├── .gitignore ├── LICENSE.txt ├── MANIFEST.in ├── README.md ├── plots ├── YosemiteGrayscale.jpg ├── YosemiteGrayscale.png ├── YosemiteRGB.jpeg ├── barplot.png ├── boxplot.png ├── density.png ├── hexbin.png ├── histogram.png ├── scatter_matrix.png └── time_series.png ├── ppsqlviz ├── __init__.py └── plotter.py └── setup.py /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | 5 | #Edited file backups 6 | *~ 7 | 8 | #Mac DS_store files 9 | *.DS_Store 10 | 11 | # C extensions 12 | *.so 13 | 14 | # Distribution / packaging 15 | .Python 16 | env/ 17 | bin/ 18 | build/ 19 | develop-eggs/ 20 | dist/ 21 | eggs/ 22 | lib/ 23 | lib64/ 24 | parts/ 25 | sdist/ 26 | var/ 27 | *.egg-info/ 28 | .installed.cfg 29 | *.egg 30 | 31 | # Installer logs 32 | pip-log.txt 33 | pip-delete-this-directory.txt 34 | 35 | # Unit test / coverage reports 36 | htmlcov/ 37 | .tox/ 38 | .coverage 39 | .cache 40 | nosetests.xml 41 | coverage.xml 42 | 43 | # Translations 44 | *.mo 45 | 46 | # Mr Developer 47 | .mr.developer.cfg 48 | .project 49 | .pydevproject 50 | 51 | # Rope 52 | .ropeproject 53 | 54 | # Django stuff: 55 | *.log 56 | *.pot 57 | 58 | # Sphinx documentation 59 | docs/_build/ 60 | -------------------------------------------------------------------------------- /LICENSE.txt: -------------------------------------------------------------------------------- 1 | Copyright Srivatsan Ramanujam (vatsan.cs@utexas.edu). 2 | 3 | Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 4 | - Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 
5 | - Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 6 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A 7 | PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 8 | LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR 9 | TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 10 | 11 | -------------------------------------------------------------------------------- /MANIFEST.in: -------------------------------------------------------------------------------- 1 | include *.txt *.md 2 | 3 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | GitHub Page 2 | ============ 3 | 4 | Pandas-via-psql (ppsqlviz) is a command line visualization utility for SQL using Pandas library in Python. 5 | Please visit the GitHub page [ppsqlviz](http://vatsan.github.io/pandas_via_psql/) for a complete tutorial. 6 | 7 | PSQL + Pandas Awesomeness 8 | ========================== 9 | 10 | [Pandas](http://pandas.pydata.org/) is a popular library in Python that is commonly used for data analysis and it provides Python equivalent of the R dataframe that is fundamental to data analysis. 
Some engineers and data scientists, however, are increasingly adopting SQL-based libraries for building large-scale machine learning algorithms. [MADlib](http://madlib.net) is one such library for scalable, parallel, in-database machine learning. 11 | 12 | While there are commercial tools to visualize data that reside in databases (example: Tableau), what's often missing in a Big Data scientist's arsenal is a command line tool to quickly visualize the output of a SQL query, without having to switch to a commercial tool or use a wrapper around a SQL engine. pandas_via_psql (ppsqlviz) shows how simple it is to redirect the output of a SQL query to some boilerplate Pandas plotting functions and quickly visualize the data from the command line. 13 | 14 | Pre-Requisites 15 | ============== 16 | 17 | ppsqlviz depends on the Pandas Python library. You should also have [PSQL](http://www.postgresql.org/docs/8.1/static/app-psql.html) or a similar SQL command line interface to connect to your database, and you should ensure that you have password-less access to your remote database (set up SSH keys appropriately). 18 | 19 | I recommend you download [Anaconda Python](https://store.continuum.io/cshop/anaconda/) from the nice folks at [Continuum Analytics](http://continuum.io/). It's got most of the essential Python scientific computing libraries pre-packaged, and with [conda](http://bokeh.pydata.org/) you can save a lot of pain in installing Python libraries. It also makes creating and managing virtual environments a piece of cake! 20 | 21 | Installation 22 | ============= 23 | 24 | You can install ppsqlviz through pip: 25 | 26 | ``` 27 | pip install ppsqlviz 28 | ``` 29 | 30 | This will install the dependent library (Pandas) if you don't already have it. I strongly encourage you to use Anaconda Python to avoid going down the rabbit hole of PyData stack dependency nightmares. 
31 | 32 | 33 | Datasets Used 34 | ============== 35 | 36 | For this demo, I'm using two publicly available datasets. 37 | * [The UCI wine quality dataset](http://archive.ics.uci.edu/ml/datasets/Wine+Quality) - Here is a sampling of rows from this dataset: 38 | 39 | ``` 40 | alcohol | mmalic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280 | proline | quality 41 | ---------+-------------+------+-------------------+-----------+---------------+------------+----------------------+-----------------+-----------------+-------+-------+---------+--------- 42 | 1 | 14.23 | 1.71 | 2.43 | 15.6 | 127 | 2.8 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065 43 | 1 | 13.2 | 1.78 | 2.14 | 11.2 | 100 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4 | 1050 44 | 1 | 13.16 | 2.36 | 2.67 | 18.6 | 101 | 2.8 | 3.24 | 0.3 | 2.81 | 5.68 | 1.03 | 3.17 | 1185 45 | 1 | 14.37 | 1.95 | 2.5 | 16.8 | 113 | 3.85 | 3.49 | 0.24 | 2.18 | 7.8 | 0.86 | 3.45 | 1480 46 | 1 | 13.24 | 2.59 | 2.87 | 21 | 118 | 2.8 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735 47 | 1 | 14.2 | 1.76 | 2.45 | 15.2 | 112 | 3.27 | 3.39 | 0.34 | 1.97 | 6.75 | 1.05 | 2.85 | 1450 48 | 1 | 14.39 | 1.87 | 2.45 | 14.6 | 96 | 2.5 | 2.52 | 0.3 | 1.98 | 5.25 | 1.02 | 3.58 | 1290 49 | 1 | 14.06 | 2.15 | 2.61 | 17.6 | 121 | 2.6 | 2.51 | 0.31 | 1.25 | 5.05 | 1.06 | 3.58 | 1295 50 | 1 | 14.83 | 1.64 | 2.17 | 14 | 97 | 2.8 | 2.98 | 0.29 | 1.98 | 5.2 | 1.08 | 2.85 | 1045 51 | 1 | 13.86 | 1.35 | 2.27 | 16 | 98 | 2.98 | 3.15 | 0.22 | 1.85 | 7.22 | 1.01 | 3.55 | 1045 52 | 1 | 14.1 | 2.16 | 2.3 | 18 | 105 | 2.95 | 3.32 | 0.22 | 2.38 | 5.75 | 1.25 | 3.17 | 1510 53 | 1 | 14.12 | 1.48 | 2.32 | 16.8 | 95 | 2.2 | 2.43 | 0.26 | 1.57 | 5 | 1.17 | 2.82 | 1280 54 | 1 | 13.75 | 1.73 | 2.41 | 16 | 89 | 2.6 | 2.76 | 0.29 | 1.81 | 5.6 | 1.15 | 2.9 | 1320 55 | 1 | 14.75 | 1.73 | 2.39 | 11.4 | 91 | 3.1 | 3.69 | 0.43 | 2.81 | 5.4 | 1.25 | 2.73 | 1150 56 | 1 | 14.38 | 
1.87 | 2.38 | 12 | 102 | 3.3 | 3.64 | 0.29 | 2.96 | 7.5 | 1.2 | 3 | 1547 57 | 1 | 13.63 | 1.81 | 2.7 | 17.2 | 112 | 2.85 | 2.91 | 0.3 | 1.46 | 7.3 | 1.28 | 2.88 | 1310 58 | 59 | ``` 60 | 61 | * [The S&P daily prices dataset](http://finance.yahoo.com/q/hp?s=%5EGSPC+Historical+Prices) - Here is a sampling of rows from this dataset: 62 | 63 | ``` 64 | dt | open | high | low | close | volume | adj_close 65 | ------------+---------+---------+---------+---------+------------+----------- 66 | 2013-09-27 | 1695.52 | 1695.52 | 1687.11 | 1691.75 | 2951700000 | 1691.75 67 | 2012-04-23 | 1378.53 | 1378.53 | 1358.79 | 1366.94 | 3654860000 | 1366.94 68 | 2012-01-18 | 1293.65 | 1308.11 | 1290.99 | 1308.04 | 4096160000 | 1308.04 69 | 2011-09-07 | 1165.85 | 1198.62 | 1165.85 | 1198.62 | 4441040000 | 1198.62 70 | 2011-06-03 | 1312.94 | 1312.94 | 1297.9 | 1300.16 | 3505030000 | 1300.16 71 | 2011-03-31 | 1327.44 | 1329.77 | 1325.03 | 1325.83 | 3566270000 | 1325.83 72 | 2010-12-28 | 1259.1 | 1259.9 | 1256.22 | 1258.51 | 2478450000 | 1258.51 73 | 2010-09-23 | 1131.1 | 1136.77 | 1122.79 | 1124.83 | 3847850000 | 1124.83 74 | 2010-07-21 | 1086.67 | 1088.96 | 1065.25 | 1069.59 | 4747180000 | 1069.59 75 | 2010-05-13 | 1170.04 | 1173.57 | 1156.14 | 1157.44 | 4870640000 | 1157.44 76 | 2010-03-10 | 1140.22 | 1148.26 | 1140.09 | 1145.61 | 5469120000 | 1145.61 77 | 2009-12-04 | 1100.43 | 1119.13 | 1096.52 | 1105.98 | 5781140000 | 1105.98 78 | 2009-07-24 | 972.16 | 979.79 | 965.95 | 979.26 | 4458300000 | 979.26 79 | 2009-02-09 | 868.24 | 875.01 | 861.65 | 869.89 | 5574370000 | 869.89 80 | 2008-11-05 | 1001.84 | 1001.84 | 949.86 | 952.77 | 5426640000 | 952.77 81 | 2008-09-02 | 1287.83 | 1303.04 | 1272.2 | 1277.58 | 4783560000 | 1277.58 82 | 2008-04-30 | 1391.22 | 1404.57 | 1384.25 | 1385.59 | 4508890000 | 1385.59 83 | 2008-01-25 | 1357.32 | 1368.56 | 1327.5 | 1330.61 | 4882250000 | 1330.61 84 | 2007-09-14 | 1483.95 | 1485.99 | 1473.18 | 1484.25 | 2641740000 | 1484.25 85 | ``` 86 | 87 | Usage 88 | 
====== 89 | Invoke Pandas plotting functions by piping in the output from a psql query. 90 | You can re-use this boilerplate code for Scatter Plots, Box Plots, Histograms and Time Series Plots on your tables. 91 | 92 | 93 | Scatter Matrix 94 | =============== 95 | This is pretty useful when you are interested in analyzing the correlation between a bunch of features in a dataset, particularly their correlation with the target attribute/label. You might then perform feature selection based on a visual output of the correlations. 96 | 97 | Here is how the scatter matrix can be created on the UCI Wine Quality Dataset: 98 | ``` 99 | home$ psql -d vatsandb -h dca -U gpadmin -c 'select * from wine;' | python -m 'ppsqlviz.plotter' scatter 100 | ``` 101 | Here is the output ![Scatter Matrix of all features from the Wine Quality Dataset](https://raw.githubusercontent.com/vatsan/pandas_via_psql/master/plots/scatter_matrix.png) 102 | 103 | 104 | 105 | Hexbin Plots 106 | ============= 107 | Scatter plots sometimes may not reveal the underlying relationship between the dimensions when multiple points overlap. 108 | For this reason, it is better to look at a 2-d histogram or a hex-bin plot. We can tap into matplotlib's hexbin plot for this. 109 | 110 | You could invoke it from your command line like so: 111 | ``` 112 | home$ psql -d vatsandb -h dca -U gpadmin -c 'select ash, flavanoids from wine;' | python -m 'ppsqlviz.plotter' hexbin 113 | ``` 114 | Here is the output ![Hexbin plot of Ash vs. Flavanoids from Wine Quality Dataset](https://raw.githubusercontent.com/vatsan/pandas_via_psql/master/plots/hexbin.png) 115 | 116 | 117 | 118 | Histogram Plot 119 | ============== 120 | To get a quick glimpse of the distribution of the data in your columns, a histogram plot of all columns is quite useful. 
121 | 122 | You could invoke it from your command line like so: 123 | ``` 124 | home$ psql -d vatsandb -h dca -U gpadmin -c 'select ash, flavanoids, hue, proline from wine;' | python -m 'ppsqlviz.plotter' hist 125 | ``` 126 | Here is the output 127 | ![Histogram Plots of some features from the Wine Quality Dataset](https://raw.githubusercontent.com/vatsan/pandas_via_psql/master/plots/histogram.png) 128 | 129 | Density Plot 130 | ============= 131 | In place of binning your data, you might consider plotting the density directly. 132 | 133 | You could invoke it from your command line like so: 134 | ``` 135 | home$ psql -d vatsandb -h dca -U gpadmin -c 'select ash, flavanoids, hue, proline from wine;' | python -m 'ppsqlviz.plotter' density 136 | ``` 137 | Here is the output ![Density Plots of some features from the Wine Quality Dataset](https://raw.github.com/vatsan/pandas_via_psql/master/plots/density.png) 138 | 139 | Box Plot 140 | ========= 141 | Box plots are useful in visually getting a feel for the quartile ranges of numerical columns in your dataset. 
You could invoke it from your command line like so: 142 | 143 | ``` 144 | home$ psql -d vatsandb -h dca -U gpadmin -c 'select ash, flavanoids, hue, proline from wine;' | python -m 'ppsqlviz.plotter' box 145 | ``` 146 | Here is the output ![Box Plot of some features from the Wine Quality Dataset](https://raw.githubusercontent.com/vatsan/pandas_via_psql/master/plots/boxplot.png) 147 | 148 | Time Series Plot 149 | ================= 150 | Again, Pandas has an impressive collection of functions for time series analysis, but to quickly visualize a time series, you can run the following from your command line: 151 | ``` 152 | home$ psql -d vatsandb -h dca -U gpadmin -c 'select dt, high, low from sandp_prices where dt > 1998 order by dt;' | python -m 'ppsqlviz.plotter' tseries 153 | ``` 154 | Here is the output ![Time Series Plotting of S&P](https://raw.githubusercontent.com/vatsan/pandas_via_psql/master/plots/time_series.png) 155 | 156 | Bar Plot 157 | ================== 158 | Bar plots are typically used to plot binned data, where the data is binned according to user-specified bins. This support is provided in pandas-via-psql. The data table is expected to consist of two columns of the same length, one each for the x and y axes. You can plot a bar plot by running the following from your command line: 159 | 160 | ``` 161 | home$ psql -d -h -U gpadmin -c 'select x*10 as binCenter, random()*100 as count from generate_series(1, 100) x;' | python -m 'ppsqlviz.plotter' bar 162 | ``` 163 | The first column always has to be the x axis (bin center). 164 | Here's the output ![Bar Plot](https://raw.githubusercontent.com/vatsan/pandas_via_psql/master/plots/barplot.png) 165 | 166 | Image Rendering 167 | =================== 168 | ppsqlviz can also display images, grayscale or RGB, using matplotlib's `imshow`, which can be quite handy when working on image processing or computer vision in SQL. 
For example, to check a binary mask after thresholding or the weights output by a deep learning algorithm, it is much easier to visualize an image than to interpret a table of intensity values. 169 | To view an image whose intensity values are stored in a table, simply select the height and width of the image (number of rows & columns) followed by a vector of intensity values ordered by row, then column. For example, to view this 270x360 pixel grayscale image, you can run the following from your command line: 170 | 171 | ``` 172 | home$ psql -d vatsandb -h dca -U gpadmin -c 'select 270 as rows, 360 as cols, intensity_values from sample_image;' | python -m 'ppsqlviz.plotter' image 173 | ``` 174 | 175 | Here is the output ![Sample Grayscale image](https://raw.githubusercontent.com/vatsan/pandas_via_psql/master/plots/YosemiteGrayscale.png) 176 | 177 | Similarly, to view an RGB image, provide the image height and width followed by a vector that holds all red intensities, then all green intensities, then all blue intensities, each ordered by row, then column. To view a sample RGB image you can run the following from your command line: 178 | 179 | ``` 180 | home$ psql -d vatsandb -h dca -U gpadmin -c 'select max(row)+1, max(col)+1, array[array_agg(red_intensity order by row,col), array_agg(green_intensity order by row,col), array_agg(blue_intensity order by row,col)] from (select * from sample_RGB_image order by row,col)t;' | python -m 'ppsqlviz.plotter' imageRGB 181 | ``` 182 | 183 | Here is the output ![Sample RGB image](https://raw.githubusercontent.com/vatsan/pandas_via_psql/master/plots/YosemiteRGB.jpeg) 184 | 185 | 186 | Author 187 | ======= 188 | 189 | Please email questions and feedback to [Srivatsan Ramanujam](https://github.com/vatsan/) at vatsan.cs@utexas.edu. 190 | 191 | Contributors 192 | ============== 193 | 194 | Thanks to [Ailey Crow](https://github.com/ailey) and [Gautam Muralidhar](https://github.com/gautamsm) for their contributions. 
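
For readers curious about what happens on the Python side of the pipe, here is a minimal sketch of the parsing step, written for Python 3 with only the standard library. It is not the code shipped in `ppsqlviz/plotter.py`: the function name `parse_psql_table` and the sample text are assumptions made for illustration. It drops the underline and the `(N rows)` footer that psql prints in its default aligned format, then splits the remaining lines on `|`.

```python
import re

# Lines psql prints around the data in its default aligned output format.
FOOTER = re.compile(r'^\(\d+ rows?\)$')   # e.g. "(2 rows)" or "(1 row)"
UNDERLINE = re.compile(r'^-+(\+-+)*$')    # e.g. "----+-----" or "-----"


def parse_psql_table(text):
    """Return (header, rows) parsed from aligned psql output.

    Hypothetical helper for illustration; every cell is returned as a string.
    """
    kept = []
    for line in text.splitlines():
        line = line.strip()
        if not line or FOOTER.match(line) or UNDERLINE.match(line):
            continue  # skip blank lines, the underline and the row-count footer
        kept.append([cell.strip() for cell in line.split('|')])
    if not kept:
        return [], []
    return kept[0], kept[1:]


# Made-up sample in the shape of the wine table shown earlier.
sample = """
  ash  | flavanoids
-------+------------
  2.43 |       3.06
  2.14 |       2.76
(2 rows)
"""

header, rows = parse_psql_table(sample)
print(header)  # ['ash', 'flavanoids']
print(rows)    # [['2.43', '3.06'], ['2.14', '2.76']]
```

The result can be handed to `pandas.DataFrame(rows, columns=header)`; the cells are still strings at that point, so numeric columns need converting (for example with `pandas.to_numeric`) before plotting. Cell values that themselves contain `|` are not handled by this sketch.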
195 | 196 | -------------------------------------------------------------------------------- /plots/YosemiteGrayscale.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vatsan/pandas_via_psql/a0dfd738f722c406611620e43ddbc7b925c308b7/plots/YosemiteGrayscale.jpg -------------------------------------------------------------------------------- /plots/YosemiteGrayscale.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vatsan/pandas_via_psql/a0dfd738f722c406611620e43ddbc7b925c308b7/plots/YosemiteGrayscale.png -------------------------------------------------------------------------------- /plots/YosemiteRGB.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vatsan/pandas_via_psql/a0dfd738f722c406611620e43ddbc7b925c308b7/plots/YosemiteRGB.jpeg -------------------------------------------------------------------------------- /plots/barplot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vatsan/pandas_via_psql/a0dfd738f722c406611620e43ddbc7b925c308b7/plots/barplot.png -------------------------------------------------------------------------------- /plots/boxplot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vatsan/pandas_via_psql/a0dfd738f722c406611620e43ddbc7b925c308b7/plots/boxplot.png -------------------------------------------------------------------------------- /plots/density.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vatsan/pandas_via_psql/a0dfd738f722c406611620e43ddbc7b925c308b7/plots/density.png -------------------------------------------------------------------------------- /plots/hexbin.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/vatsan/pandas_via_psql/a0dfd738f722c406611620e43ddbc7b925c308b7/plots/hexbin.png -------------------------------------------------------------------------------- /plots/histogram.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vatsan/pandas_via_psql/a0dfd738f722c406611620e43ddbc7b925c308b7/plots/histogram.png -------------------------------------------------------------------------------- /plots/scatter_matrix.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vatsan/pandas_via_psql/a0dfd738f722c406611620e43ddbc7b925c308b7/plots/scatter_matrix.png -------------------------------------------------------------------------------- /plots/time_series.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vatsan/pandas_via_psql/a0dfd738f722c406611620e43ddbc7b925c308b7/plots/time_series.png -------------------------------------------------------------------------------- /ppsqlviz/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/vatsan/pandas_via_psql/a0dfd738f722c406611620e43ddbc7b925c308b7/ppsqlviz/__init__.py -------------------------------------------------------------------------------- /ppsqlviz/plotter.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Pandas Plotting from STDIN. 
3 | Srivatsan Ramanujam 4 | 20 Jan 2014 5 | ============================================================================================================================ 6 | Usage: 7 | ========= 8 | Syntax: python plotter.py 9 | Examples: 10 | ========= 11 | 1) home$ psql -d vatsandb -h dca -U gpadmin -c 'select * from wine;' | python plotter.py scatter 12 | 2) home$ psql -d vatsandb -h dca -U gpadmin -c 'select ash, flavanoids, hue, proline from wine;' | python plotter.py box 13 | 3) home$ psql -d vatsandb -h dca -U gpadmin -c 'select ash, flavanoids, hue, proline from wine;' | python plotter.py hist 14 | 4) home$ psql -d vatsandb -h dca -U gpadmin -c 'select dt, high, low from sandp_prices where dt > 1998 order by dt;' | python plotter.py tseries 15 | 5) home$ psql -d vatsandb -h dca -U gpadmin -c 'select ash, flavanoids, hue, proline from wine;' | python plotter.py density 16 | 6) home$ psql -d vatsandb -h dca -U gpadmin -c 'select ash, flavanoids from wine;' | python plotter.py hexbin 17 | ============================================================================================================================ 18 | ''' 19 | 20 | import pandas as pd 21 | from pandas import DataFrame 22 | from pandas.tools.plotting import scatter_matrix 23 | import matplotlib.pyplot as plt 24 | import matplotlib.cm as cm 25 | from StringIO import StringIO 26 | import numpy as np 27 | import fileinput 28 | import re 29 | 30 | 31 | def scatterMatrix(dframe): 32 | ''' 33 | Show Scatter Matrix 34 | ''' 35 | df = DataFrame(dframe) 36 | #Rename columns to their positions so that the plot is not very cluttered. 
37 | df.columns = range(len(df.columns)) 38 | smatrix = scatter_matrix(df, alpha=0.2, figsize=(6, 6), diagonal='kde') 39 | plt.show() 40 | 41 | def hexbinPlot(dframe): 42 | ''' 43 | Show 2-d hexbin plot 44 | ''' 45 | x_label, y_label=dframe.columns[0],dframe.columns[1] 46 | x, y = dframe[x_label],dframe[y_label] 47 | hexbinplt = plt.hexbin(x,y,gridsize=30) 48 | cbar = plt.colorbar() 49 | cbar.set_label('count') 50 | plt.xlabel(x_label) 51 | plt.ylabel(y_label) 52 | plt.show() 53 | 54 | def boxPlot(dframe): 55 | ''' 56 | Show Box Plot of various fields 57 | ''' 58 | box_plot = dframe.boxplot() 59 | plt.show() 60 | 61 | def histogramPlot(dframe): 62 | ''' 63 | Show histogram of various fields 64 | ''' 65 | if(len(dframe.columns)>1): 66 | hist_plot = dframe.hist(figsize=(6, 6)) 67 | else: 68 | hist_plot = dframe.hist() 69 | plt.show() 70 | 71 | def timeSeriesPlot(dframe): 72 | ''' 73 | Show time series plot using pandas. The first column should be a date/time column. 74 | ''' 75 | #The first column should be a date column; it will be used as the index. 76 | dframe.set_index(dframe.columns[0]).plot() 77 | plt.show() 78 | 79 | def densityPlot(dframe): 80 | ''' 81 | Show Kernel Density Plots 82 | ''' 83 | if(len(dframe.columns)>1): 84 | hist_plot = dframe.plot(kind='kde',linewidth=3, figsize=(6, 6)) 85 | else: 86 | hist_plot = dframe.plot(kind='kde',linewidth=3) 87 | plt.show() 88 | 89 | def barPlot(dframe): 90 | ''' 91 | Show Bar Plot. The first column should be the x-coordinates (e.g., bin centers) and 92 | the second column should be the y-coordinates (e.g., counts). 
93 | ''' 94 | x_label, y_label=dframe.columns[0],dframe.columns[1] 95 | width = (dframe[x_label][1]-dframe[x_label][0])*0.7 96 | plt.bar(dframe[x_label],dframe[y_label],align='center',width=width) 97 | plt.xlabel(x_label) 98 | plt.ylabel(y_label) 99 | plt.show() 100 | 101 | def imgPlot(dframe): 102 | ''' 103 | Show Image 104 | ''' 105 | if(len(dframe.columns)>2): 106 | r = int(dframe.iloc[0][0]) 107 | c = int(dframe.iloc[0][1]) 108 | patch = dframe.iloc[0][2] 109 | patch = patch.replace('{','') 110 | patch = patch.replace('}','') 111 | str_pixels = patch.split(',') 112 | pixels = np.array(map(float, str_pixels)) 113 | pixels = pixels.reshape(r,c) 114 | plt.imshow(pixels,cmap = cm.Greys_r) 115 | else: 116 | print 'Usage: for image plot, 3 columns are expected, with the first two columns being the number of rows and number of columns that make up the image and the third column being the actual image pixels as a vector' 117 | plt.show() 118 | 119 | def imgRGBPlot(dframe): 120 | ''' 121 | Show RGB Image 122 | ''' 123 | if(len(dframe.columns)>2): 124 | r = int(dframe.iloc[0][0]) 125 | c = int(dframe.iloc[0][1]) 126 | patch = dframe.iloc[0][2] 127 | patch = patch.replace('{','') 128 | patch = patch.replace('}','') 129 | str_pixels = patch.split(',') 130 | pixels = np.array(map(float, str_pixels)) 131 | #The vector holds all R values, then all G, then all B; stack the three planes into an (r, c, 3) image. 132 | im = np.dstack((np.reshape([np.uint8(float(i)) for i in pixels[0:r*c]], (r,c)), 133 | np.reshape([np.uint8(float(i)) for i in pixels[r*c:2*r*c]], (r,c)), 134 | np.reshape([np.uint8(float(i)) for i in pixels[2*r*c:3*r*c]], (r,c)))) 135 | plt.imshow(im) 136 | else: 137 | print 'Usage: for image plot, 3 columns are expected, with the first two columns being the number of rows and number of columns that make up the image and the third column being the actual image pixels with R, G & B values listed sequentially as a vector' 138 | plt.show() 139 | 140 | def readTableFromPipe(plot_type): 141 | ''' 142 | Read the output of 
a SQL query from a pipe and display the requested plot 143 | ''' 144 | rows_pattern = re.compile(r'^\(\d+ rows?\)$') 145 | underline_pattern = re.compile(r'^(-+\+-+)+$') 146 | single_underline_pattern = re.compile(r'^(-)+$') 147 | data =[] 148 | for line in fileinput.input(): 149 | #Skip lines not representing header or data (blank lines, underlines, and the "(N rows)" / "(1 row)" footer) 150 | if(line.strip() and not rows_pattern.match(line) and not underline_pattern.match(line) and not single_underline_pattern.match(line)): 151 | data.append(re.sub(r'\s+','',line)) 152 | dframe = pd.read_csv(StringIO('\n'.join(data)), sep='|', index_col=False) 153 | if(plot_type=='scatter'): 154 | scatterMatrix(dframe) 155 | elif(plot_type=='box'): 156 | boxPlot(dframe) 157 | elif(plot_type=='hist'): 158 | histogramPlot(dframe) 159 | elif(plot_type=='tseries'): 160 | timeSeriesPlot(dframe) 161 | elif(plot_type=='density'): 162 | densityPlot(dframe) 163 | elif(plot_type=='hexbin'): 164 | hexbinPlot(dframe) 165 | elif(plot_type=='bar'): 166 | barPlot(dframe) 167 | elif(plot_type=='image'): 168 | imgPlot(dframe) 169 | elif(plot_type=='imageRGB'): 170 | imgRGBPlot(dframe) 171 | 172 | if(__name__ == '__main__'): 173 | from sys import argv 174 | if(len(argv)!=2): 175 | print 'Usage: python plotter.py ' 176 | else: 177 | plot_type = argv[1] 178 | #Remove the plot type from argv (else fileinput will treat it as a file name). 
179 | argv.pop() 180 | readTableFromPipe(plot_type) 181 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup, find_packages 2 | from distutils.util import convert_path 3 | import os,sys 4 | from fnmatch import fnmatchcase 5 | 6 | # Provided as an attribute, so you can append to these instead 7 | # of replicating them: 8 | standard_exclude = ('*.pyc', '*$py.class', '*~', '.*', '*.bak') 9 | standard_exclude_directories = ('.*', 'CVS', '_darcs', './build', 10 | './dist', 'EGG-INFO', '*.egg-info','plots') 11 | 12 | 13 | # (c) 2005 Ian Bicking and contributors; written for Paste (http://pythonpaste.org) 14 | # Licensed under the MIT license: http://www.opensource.org/licenses/mit-license.php 15 | # Note: you may want to copy this into your setup.py file verbatim, as 16 | # you can't import this from another package, when you don't know if 17 | # that package is installed yet. 18 | def find_package_data( 19 | where='.', package='', 20 | exclude=standard_exclude, 21 | exclude_directories=standard_exclude_directories, 22 | only_in_packages=True, 23 | show_ignored=False): 24 | """ 25 | Return a dictionary suitable for use in ``package_data`` 26 | in a distutils ``setup.py`` file. 27 | The dictionary looks like:: 28 | {'package': [files]} 29 | Where ``files`` is a list of all the files in that package that 30 | don't match anything in ``exclude``. 31 | If ``only_in_packages`` is true, then top-level directories that 32 | are not packages won't be included (but directories under packages 33 | will). 34 | Directories matching any pattern in ``exclude_directories`` will 35 | be ignored; by default directories with leading ``.``, ``CVS``, 36 | and ``_darcs`` will be ignored. 37 | If ``show_ignored`` is true, then all the files that aren't 38 | included in package data are shown on stderr (for debugging 39 | purposes). 
40 | Note patterns use wildcards, or can be exact paths (including 41 | leading ``./``), and all searching is case-insensitive. 42 | """ 43 | 44 | out = {} 45 | stack = [(convert_path(where), '', package, only_in_packages)] 46 | while stack: 47 | where, prefix, package, only_in_packages = stack.pop(0) 48 | for name in os.listdir(where): 49 | fn = os.path.join(where, name) 50 | if os.path.isdir(fn): 51 | bad_name = False 52 | for pattern in exclude_directories: 53 | if (fnmatchcase(name, pattern) 54 | or fn.lower() == pattern.lower()): 55 | bad_name = True 56 | if show_ignored: 57 | print >> sys.stderr, ( 58 | "Directory %s ignored by pattern %s" 59 | % (fn, pattern)) 60 | break 61 | if bad_name: 62 | continue 63 | if (os.path.isfile(os.path.join(fn, '__init__.py')) 64 | and not prefix): 65 | if not package: 66 | new_package = name 67 | else: 68 | new_package = package + '.' + name 69 | stack.append((fn, '', new_package, False)) 70 | else: 71 | stack.append((fn, prefix + name + '/', package, only_in_packages)) 72 | elif package or not only_in_packages: 73 | # is a file 74 | bad_name = False 75 | for pattern in exclude: 76 | if (fnmatchcase(name, pattern) 77 | or fn.lower() == pattern.lower()): 78 | bad_name = True 79 | if show_ignored: 80 | print >> sys.stderr, ( 81 | "File %s ignored by pattern %s" 82 | % (fn, pattern)) 83 | break 84 | if bad_name: 85 | continue 86 | out.setdefault(package, []).append(prefix+name) 87 | return out 88 | 89 | setup( 90 | name='ppsqlviz', 91 | version='1.0.1', 92 | author='Srivatsan Ramanujam', 93 | author_email='vatsan.cs@utexas.edu', 94 | url='http://vatsan.github.io/pandas_via_psql/', 95 | packages=find_packages(), 96 | package_data=find_package_data(only_in_packages=False,show_ignored=True), 97 | include_package_data=True, 98 | license='LICENSE.txt', 99 | description='A command line visualization utility for SQL using Pandas library in Python.', 100 | long_description=open('README.md').read(), 101 | install_requires=[ 102 | "pandas 
>= 0.13.0" 103 | ], 104 | ) 105 | --------------------------------------------------------------------------------
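
The exclusion rule that `find_package_data` applies to each file or directory in `setup.py` can be tried in isolation. The sketch below is a reduced illustration in Python 3, not part of `setup.py`: the helper name `is_excluded` and the sample names are assumptions. It mirrors the two checks in the loop above, a wildcard match on the bare name via `fnmatch.fnmatchcase` and a case-insensitive comparison of the full path against the pattern.

```python
from fnmatch import fnmatchcase

# Same default patterns as standard_exclude in setup.py.
EXCLUDE = ('*.pyc', '*$py.class', '*~', '.*', '*.bak')


def is_excluded(name, path, patterns=EXCLUDE):
    """Return the first pattern that excludes this entry, or None.

    Hypothetical helper: `name` is the bare file or directory name,
    `path` is the joined path as produced by os.path.join(where, name).
    """
    for pattern in patterns:
        # Wildcard match on the name (case-sensitive), or an exact,
        # case-insensitive match on the whole path.
        if fnmatchcase(name, pattern) or path.lower() == pattern.lower():
            return pattern
    return None


print(is_excluded('plotter.pyc', './ppsqlviz/plotter.pyc'))  # *.pyc
print(is_excluded('.gitignore', './.gitignore'))             # .*
print(is_excluded('plotter.py', './ppsqlviz/plotter.py'))    # None
print(is_excluded('Build', './Build', ('./build',)))         # ./build
```

Note that `fnmatchcase` does not fold case, so in this sketch only the exact-path comparison is case-insensitive; a name such as `FOO.PYC` would not match `*.pyc`.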