├── .gitignore
├── LICENSE.txt
├── MANIFEST.in
├── README.md
├── plots
    ├── YosemiteGrayscale.jpg
    ├── YosemiteGrayscale.png
    ├── YosemiteRGB.jpeg
    ├── barplot.png
    ├── boxplot.png
    ├── density.png
    ├── hexbin.png
    ├── histogram.png
    ├── scatter_matrix.png
    └── time_series.png
├── ppsqlviz
    ├── __init__.py
    └── plotter.py
└── setup.py


/.gitignore:
--------------------------------------------------------------------------------
 1 | # Byte-compiled / optimized / DLL files
 2 | __pycache__/
 3 | *.py[cod]
 4 | 
 5 | #Edited file backups
 6 | *~
 7 | 
 8 | #Mac DS_store files
 9 | *.DS_Store
10 | 
11 | # C extensions
12 | *.so
13 | 
14 | # Distribution / packaging
15 | .Python
16 | env/
17 | bin/
18 | build/
19 | develop-eggs/
20 | dist/
21 | eggs/
22 | lib/
23 | lib64/
24 | parts/
25 | sdist/
26 | var/
27 | *.egg-info/
28 | .installed.cfg
29 | *.egg
30 | 
31 | # Installer logs
32 | pip-log.txt
33 | pip-delete-this-directory.txt
34 | 
35 | # Unit test / coverage reports
36 | htmlcov/
37 | .tox/
38 | .coverage
39 | .cache
40 | nosetests.xml
41 | coverage.xml
42 | 
43 | # Translations
44 | *.mo
45 | 
46 | # Mr Developer
47 | .mr.developer.cfg
48 | .project
49 | .pydevproject
50 | 
51 | # Rope
52 | .ropeproject
53 | 
54 | # Django stuff:
55 | *.log
56 | *.pot
57 | 
58 | # Sphinx documentation
59 | docs/_build/
60 | 


--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
 1 | Copyright Srivatsan Ramanujam (vatsan.cs@utexas.edu).
 2 | 
 3 | Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
 4 | - Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 
 5 | - Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 
 6 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A 
 7 | PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 
 8 | LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR 
 9 | TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
10 | 
11 | 


--------------------------------------------------------------------------------
/MANIFEST.in:
--------------------------------------------------------------------------------
1 | include *.txt *.md
2 | 
3 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | GitHub Page
  2 | ============
  3 | 
  4 | Pandas-via-psql (ppsqlviz) is a command-line visualization utility for SQL, built on the Pandas library for Python.
  5 | Please visit the GitHub page [ppsqlviz](http://vatsan.github.io/pandas_via_psql/) for a complete tutorial.
  6 | 
  7 | PSQL + Pandas Awesomeness
  8 | ==========================
  9 | 
 10 | [Pandas](http://pandas.pydata.org/) is a popular Python library for data analysis. It provides a Python equivalent of the R data frame, which is fundamental to data analysis. Some engineers and data scientists, however, are increasingly adopting SQL-based libraries for building large-scale machine learning algorithms. [MADlib](http://madlib.net) is one such library for scalable, parallel, in-database machine learning.
 11 | 
 12 | While there are commercial tools for visualizing data that resides in databases (for example, Tableau), what is often missing from a Big Data scientist's arsenal is a command-line tool for quickly visualizing the output of a SQL query, without switching to a commercial tool or using a wrapper around a SQL engine. pandas_via_psql (ppsqlviz) shows how simple it is to redirect the output of a SQL query to some boilerplate Pandas plotting functions and quickly visualize the data from the command line.
 13 | 
 14 | Pre-Requisites
 15 | ==============
 16 | 
 17 | ppsqlviz depends on the Pandas Python library. You should also have [PSQL](http://www.postgresql.org/docs/8.1/static/app-psql.html) or a similar SQL command-line interface for connecting to your database, and you should ensure that you have password-less access to your remote database (set up SSH keys appropriately).
 18 | 
 19 | I recommend you download [Anaconda Python](https://store.continuum.io/cshop/anaconda/) from the nice folks at [Continuum Analytics](http://continuum.io/). It's got most of the essential Python scientific computing libraries pre-packaged and with [conda](http://bokeh.pydata.org/) you can save a lot of pain in installing python libraries. It also makes creating and managing virtual environments a piece of cake!
 20 | 
 21 | Installation
 22 | =============
 23 | 
 24 | You can install ppsqlviz through pip:
 25 | 
 26 | ```
 27 | pip install ppsqlviz
 28 | ```
 29 | 
 30 | This will install the dependent library (Pandas) if you don't already have it. I strongly encourage you to use Anaconda Python to avoid going down the rabbit hole of PyData stack dependency nightmares.
 31 | 
 32 | 
 33 | Datasets Used
 34 | ==============
 35 | 
 36 | For this demo, I'm using two publicly available datasets. 
 37 | * [The UCI wine quality dataset](http://archive.ics.uci.edu/ml/datasets/Wine+Quality) - Here is a sampling of rows from this dataset:
 38 | 
 39 | ```
 40 | alcohol | mmalic_acid | ash  | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity |  hue  | od280 | proline | quality 
 41 | ---------+-------------+------+-------------------+-----------+---------------+------------+----------------------+-----------------+-----------------+-------+-------+---------+---------
 42 |        1 |       14.23 | 1.71 |              2.43 |      15.6 |           127 |        2.8 |                 3.06 |            0.28 |            2.29 |  5.64 |  1.04 |    3.92 |    1065
 43 |        1 |        13.2 | 1.78 |              2.14 |      11.2 |           100 |       2.65 |                 2.76 |            0.26 |            1.28 |  4.38 |  1.05 |     3.4 |    1050
 44 |        1 |       13.16 | 2.36 |              2.67 |      18.6 |           101 |        2.8 |                 3.24 |             0.3 |            2.81 |  5.68 |  1.03 |    3.17 |    1185
 45 |        1 |       14.37 | 1.95 |               2.5 |      16.8 |           113 |       3.85 |                 3.49 |            0.24 |            2.18 |   7.8 |  0.86 |    3.45 |    1480
 46 |        1 |       13.24 | 2.59 |              2.87 |        21 |           118 |        2.8 |                 2.69 |            0.39 |            1.82 |  4.32 |  1.04 |    2.93 |     735
 47 |        1 |        14.2 | 1.76 |              2.45 |      15.2 |           112 |       3.27 |                 3.39 |            0.34 |            1.97 |  6.75 |  1.05 |    2.85 |    1450
 48 |        1 |       14.39 | 1.87 |              2.45 |      14.6 |            96 |        2.5 |                 2.52 |             0.3 |            1.98 |  5.25 |  1.02 |    3.58 |    1290
 49 |        1 |       14.06 | 2.15 |              2.61 |      17.6 |           121 |        2.6 |                 2.51 |            0.31 |            1.25 |  5.05 |  1.06 |    3.58 |    1295
 50 |        1 |       14.83 | 1.64 |              2.17 |        14 |            97 |        2.8 |                 2.98 |            0.29 |            1.98 |   5.2 |  1.08 |    2.85 |    1045
 51 |        1 |       13.86 | 1.35 |              2.27 |        16 |            98 |       2.98 |                 3.15 |            0.22 |            1.85 |  7.22 |  1.01 |    3.55 |    1045
 52 |        1 |        14.1 | 2.16 |               2.3 |        18 |           105 |       2.95 |                 3.32 |            0.22 |            2.38 |  5.75 |  1.25 |    3.17 |    1510
 53 |        1 |       14.12 | 1.48 |              2.32 |      16.8 |            95 |        2.2 |                 2.43 |            0.26 |            1.57 |     5 |  1.17 |    2.82 |    1280
 54 |        1 |       13.75 | 1.73 |              2.41 |        16 |            89 |        2.6 |                 2.76 |            0.29 |            1.81 |   5.6 |  1.15 |     2.9 |    1320
 55 |        1 |       14.75 | 1.73 |              2.39 |      11.4 |            91 |        3.1 |                 3.69 |            0.43 |            2.81 |   5.4 |  1.25 |    2.73 |    1150
 56 |        1 |       14.38 | 1.87 |              2.38 |        12 |           102 |        3.3 |                 3.64 |            0.29 |            2.96 |   7.5 |   1.2 |       3 |    1547
 57 |        1 |       13.63 | 1.81 |               2.7 |      17.2 |           112 |       2.85 |                 2.91 |             0.3 |            1.46 |   7.3 |  1.28 |    2.88 |    1310
 58 | 
 59 | ```
 60 | 
 61 | * [The S&P daily prices dataset](http://finance.yahoo.com/q/hp?s=%5EGSPC+Historical+Prices) - Here is a sampling of rows from this dataset:
 62 | 
 63 | ```
 64 |      dt     |  open   |  high   |   low   |  close  |   volume   | adj_close 
 65 | ------------+---------+---------+---------+---------+------------+-----------
 66 |  2013-09-27 | 1695.52 | 1695.52 | 1687.11 | 1691.75 | 2951700000 |   1691.75
 67 |  2012-04-23 | 1378.53 | 1378.53 | 1358.79 | 1366.94 | 3654860000 |   1366.94
 68 |  2012-01-18 | 1293.65 | 1308.11 | 1290.99 | 1308.04 | 4096160000 |   1308.04
 69 |  2011-09-07 | 1165.85 | 1198.62 | 1165.85 | 1198.62 | 4441040000 |   1198.62
 70 |  2011-06-03 | 1312.94 | 1312.94 |  1297.9 | 1300.16 | 3505030000 |   1300.16
 71 |  2011-03-31 | 1327.44 | 1329.77 | 1325.03 | 1325.83 | 3566270000 |   1325.83
 72 |  2010-12-28 |  1259.1 |  1259.9 | 1256.22 | 1258.51 | 2478450000 |   1258.51
 73 |  2010-09-23 |  1131.1 | 1136.77 | 1122.79 | 1124.83 | 3847850000 |   1124.83
 74 |  2010-07-21 | 1086.67 | 1088.96 | 1065.25 | 1069.59 | 4747180000 |   1069.59
 75 |  2010-05-13 | 1170.04 | 1173.57 | 1156.14 | 1157.44 | 4870640000 |   1157.44
 76 |  2010-03-10 | 1140.22 | 1148.26 | 1140.09 | 1145.61 | 5469120000 |   1145.61
 77 |  2009-12-04 | 1100.43 | 1119.13 | 1096.52 | 1105.98 | 5781140000 |   1105.98
 78 |  2009-07-24 |  972.16 |  979.79 |  965.95 |  979.26 | 4458300000 |    979.26
 79 |  2009-02-09 |  868.24 |  875.01 |  861.65 |  869.89 | 5574370000 |    869.89
 80 |  2008-11-05 | 1001.84 | 1001.84 |  949.86 |  952.77 | 5426640000 |    952.77
 81 |  2008-09-02 | 1287.83 | 1303.04 |  1272.2 | 1277.58 | 4783560000 |   1277.58
 82 |  2008-04-30 | 1391.22 | 1404.57 | 1384.25 | 1385.59 | 4508890000 |   1385.59
 83 |  2008-01-25 | 1357.32 | 1368.56 |  1327.5 | 1330.61 | 4882250000 |   1330.61
 84 |  2007-09-14 | 1483.95 | 1485.99 | 1473.18 | 1484.25 | 2641740000 |   1484.25
 85 | ```
 86 | 
 87 | Usage
 88 | ======
 89 | Invoke Pandas plotting functions by piping in the output from a psql query.
 90 | You can re-use this boiler-plate code for Scatter Plots, Box Plots, Histograms and Time Series Plots on your tables.
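
As a rough illustration of what happens on the receiving end of the pipe, the sketch below parses psql's default aligned output into a header and rows using only the standard library. It is written for Python 3 and is not the package's code: `parse_psql_table` and the sample text are hypothetical, and `plotter.py` itself hands the cleaned lines to `pandas.read_csv` with `sep='|'`.

```python
import csv
import io
import re

# Lines psql prints around the data in its default aligned format.
ROW_COUNT = re.compile(r'^\(\d+ rows?\)$')   # e.g. "(2 rows)" or "(1 row)"
UNDERLINE = re.compile(r'^-+(\+-+)*$')       # e.g. "------+------------"


def parse_psql_table(text):
    """Return (header, rows) parsed from psql's aligned output (hypothetical helper)."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines, the header underline and the trailing row count.
        if not stripped or ROW_COUNT.match(stripped) or UNDERLINE.match(stripped):
            continue
        # Drop all whitespace, as plotter.py does, leaving '|' as the delimiter.
        kept.append(re.sub(r'\s+', '', line))
    table = list(csv.reader(io.StringIO('\n'.join(kept)), delimiter='|'))
    return table[0], table[1:]


sample = """\
 ash  | flavanoids
------+------------
 2.43 |       3.06
 2.14 |       2.76
(2 rows)
"""

header, rows = parse_psql_table(sample)
print(header)  # ['ash', 'flavanoids']
print(rows)    # [['2.43', '3.06'], ['2.14', '2.76']]
```

Because all whitespace is removed before splitting, values that contain spaces (for example timestamps with a time part) would be run together, so this approach suits numeric columns and plain dates best.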
 91 | 
 92 | 
 93 | Scatter Matrix
 94 | ===============
 95 | This is pretty useful when you are interested in analyzing the correlation between a bunch of features in a dataset, particularly in their correlation with the target attribute/label. You might then perform feature selection based on a visual output of the correlations.
 96 | 
 97 | Here is how the scatter matrix can be created on the UCI Wine Quality Dataset
 98 | ```
 99 | home$ psql -d vatsandb -h dca -U gpadmin -c 'select * from wine;' | python -m 'ppsqlviz.plotter' scatter
100 | ```
101 | Here is the output ![Scatter Matrix of all features from the Wine Quality Dataset](https://raw.githubusercontent.com/vatsan/pandas_via_psql/master/plots/scatter_matrix.png)
103 | 
104 | 
105 | Hexbin Plots
106 | =============
107 | Scatter plots sometimes may not reveal the underlying relationship between the dimensions when multiple points overlap.
108 | For this reason, it is better to look at a 2-d histogram or a hex-bin plot. We can tap into `matplotlib's` hexbin plot for this.
109 | 
110 | You could invoke it from your command line like so:
111 | ```
112 | home$ psql -d vatsandb -h dca -U gpadmin -c 'select ash, flavanoids from wine;' | python -m 'ppsqlviz.plotter' hexbin
113 | ```
114 | Here is the output ![Hexbin plot of Ash vs. Flavanoids from Wine Quality Dataset](https://raw.githubusercontent.com/vatsan/pandas_via_psql/master/plots/hexbin.png)
116 | 
117 | 
118 | Histogram Plot
119 | ==============
120 | To get a quick glimpse of the distribution of the data in your columns, a histogram plot of all columns is quite useful.
121 | 
122 | You could invoke it from your command line like so:
123 | ```
124 | home$ psql -d vatsandb -h dca -U gpadmin -c 'select ash, flavanoids, hue, proline from wine;' | python -m 'ppsqlviz.plotter' hist
125 | ```
126 | Here is the output 
127 | ![Histogram Plots of some features from the Wine Quality Dataset](https://raw.githubusercontent.com/vatsan/pandas_via_psql/master/plots/histogram.png)
128 | 
129 | Density Plot
130 | =============
131 | In place of binning your data, you might consider plotting the density directly. 
132 | 
133 | You could invoke it from your command line like so:
134 | ```
135 | home$ psql -d vatsandb -h dca -U gpadmin -c 'select ash, flavanoids, hue, proline from wine;' | python -m 'ppsqlviz.plotter' density
136 | ```
137 | Here is the output ![Density Plots of some features from the Wine Quality Dataset](https://raw.github.com/vatsan/pandas_via_psql/master/plots/density.png)
138 | 
139 | Box Plot
140 | =========
141 | Box plots are useful in visually getting a feel for the quartile ranges of numerical columns in your dataset. You could invoke it from your command line like so:
142 | 
143 | ```
144 | home$ psql -d vatsandb -h dca -U gpadmin -c 'select ash, flavanoids, hue, proline from wine;' | python -m 'ppsqlviz.plotter' box
145 | ```
146 | Here is the output ![Box Plot of some features from the Wine Quality Dataset](https://raw.githubusercontent.com/vatsan/pandas_via_psql/master/plots/boxplot.png)
147 | 
148 | Time Series Plot
149 | =================
150 | Again, Pandas has an impressive collection of functions for time series analysis but to quickly visualize a time series, you can run the following from your command line:
151 | ```
152 | home$ psql -d vatsandb -h dca -U gpadmin -c 'select dt, high, low  from sandp_prices where dt > 1998 order by dt;' | python -m 'ppsqlviz.plotter' tseries
153 | ```
154 | Here is the output ![Time Series Plotting of S&P](https://raw.githubusercontent.com/vatsan/pandas_via_psql/master/plots/time_series.png)
155 | 
156 | Bar Plot
157 | ==================
158 | Bar plots are typically used to plot binned data, where the data is binned according to user-specified bins, and pandas-via-psql supports them. The query is expected to return two columns with one row per bin: the first for the x axis (bin centers) and the second for the y axis (counts). You can draw a bar plot by running the following from your command line:
159 | 
160 | ```
161 | home$ psql -d <dbname> -h <hostname> -U gpadmin -c 'select x*10 as binCenter, random()*100 as count from generate_series(1, 100) x;' | python -m 'ppsqlviz.plotter' bar 
162 | ```
163 | The first column always has to be the x axis (bin center).
164 | Here's the output ![Bar Plot](https://raw.githubusercontent.com/vatsan/pandas_via_psql/master/plots/barplot.png)
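
The bar width in `plotter.py` is derived from the first two x values, which implicitly assumes that the rows arrive sorted by bin center and evenly spaced. A minimal sketch of that calculation follows (Python 3, standard library only); the helper name `bar_width` and the bin centers are made up for illustration.

```python
def bar_width(bin_centers, fill=0.7):
    """Width of each bar: a fraction of the gap between the first two bin centers.

    Hypothetical helper mirroring the arithmetic in plotter.py's barPlot;
    `fill` is the share of the gap that the bar occupies.
    """
    if len(bin_centers) < 2:
        raise ValueError('need at least two bin centers to infer the spacing')
    return (bin_centers[1] - bin_centers[0]) * fill


centers = [10, 20, 30, 40]   # made-up, evenly spaced bin centers
width = bar_width(centers)   # 70% of the spacing of 10
print(width)
```

If the query returns bins in arbitrary order, the first two rows may not be neighbours, so sorting the query by its first column keeps this assumption true.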
165 | 
166 | Image Rendering
167 | ===================
168 | Matplotlib, which Pandas uses for plotting, can also display images, grayscale or RGB, which can be quite handy when working on image processing or computer vision in SQL. For example, to check a binary mask after thresholding or the weights output by a deep learning algorithm, it is much easier to visualize an image than to interpret a table of intensity values.
169 | To view an image whose intensity values are stored in a table, simply select the height and width of the image (number of rows & columns) followed by a vector of intensity values ordered by row, then column. For example, to view this 270x360 pixel grayscale image, you can run the following from your command line:
170 | 
171 | ```
172 | home$ psql -d vatsandb -h dca -U gpadmin -c 'select 270 as rows, 360 as cols, intensity_values from sample_image;' | python -m 'ppsqlviz.plotter' image
173 | ```
174 | 
175 | Here is the output ![Sample Grayscale image](https://raw.githubusercontent.com/vatsan/pandas_via_psql/master/plots/YosemiteGrayscale.png)
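
The reshaping step can be sketched without any plotting at all. The example below (Python 3, standard library only) parses an array literal of the form psql prints, such as `{0,64,128}`, and folds the flat, row-major vector into `rows` x `cols`. `parse_pg_array`, `to_grid` and the six sample values are hypothetical names and data chosen for illustration; `plotter.py` does the equivalent with NumPy's `reshape` before calling `plt.imshow`.

```python
def parse_pg_array(literal):
    """Parse a flat PostgreSQL array literal such as '{1,2,3}' into floats."""
    body = literal.replace('{', '').replace('}', '')
    return [float(v) for v in body.split(',')]


def to_grid(values, rows, cols):
    """Fold a flat, row-major list into `rows` lists of `cols` values each."""
    if len(values) != rows * cols:
        raise ValueError('expected %d values, got %d' % (rows * cols, len(values)))
    return [values[r * cols:(r + 1) * cols] for r in range(rows)]


pixels = parse_pg_array('{0,64,128,255,32,16}')  # made-up 2x3 image
grid = to_grid(pixels, 2, 3)
print(grid)  # [[0.0, 64.0, 128.0], [255.0, 32.0, 16.0]]
```

The length check matters in practice: if the `rows` and `cols` values selected in the query do not match the length of the intensity vector, the reshape cannot succeed.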
176 | 
177 | Similarly, to view an RGB image, provide the image height and width followed by a vector of intensity values ordered by color channel (all red values, then all green, then all blue) and, within each channel, by row, then column. To view a sample RGB image you can run the following from your command line:
178 | 
179 | ```
180 | home$ psql -d vatsandb -h dca -U gpadmin -c 'select max(row)+1, max(col)+1, array[array_agg(red_intensity order by row,col), array_agg(green_intensity order by row,col), array_agg(blue_intensity order by row,col)] from (select * from sample_RGB_image order by row,col)t;' | python -m 'ppsqlviz.plotter' imageRGB
181 | ```
182 | 
183 | Here is the output ![Sample RGB image](https://raw.githubusercontent.com/vatsan/pandas_via_psql/master/plots/YosemiteRGB.jpeg)
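
To make the expected layout of the RGB vector concrete, here is a small sketch (Python 3, standard library only) that takes a flat list holding the red plane, then the green plane, then the blue plane, and regroups it into per-pixel `(r, g, b)` tuples. The helper `planes_to_rgb` and the 2x2 sample are hypothetical; `plotter.py` achieves the same regrouping by slicing the vector into three parts and combining them with `numpy.dstack`.

```python
def planes_to_rgb(values, rows, cols):
    """Regroup [R plane | G plane | B plane] into rows x cols (r, g, b) tuples."""
    n = rows * cols
    if len(values) != 3 * n:
        raise ValueError('expected %d values, got %d' % (3 * n, len(values)))
    red, green, blue = values[:n], values[n:2 * n], values[2 * n:]
    return [
        [(red[i * cols + j], green[i * cols + j], blue[i * cols + j])
         for j in range(cols)]
        for i in range(rows)
    ]


# Made-up 2x2 image: red, green, blue and white pixels, stored plane by plane.
flat = [255, 0, 0, 255,    # red plane
        0, 255, 0, 255,    # green plane
        0, 0, 255, 255]    # blue plane
image = planes_to_rgb(flat, 2, 2)
print(image[0])  # [(255, 0, 0), (0, 255, 0)]
print(image[1])  # [(0, 0, 255), (255, 255, 255)]
```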
184 | 
185 | 
186 | Author
187 | =======
188 | 
189 | Please email questions and feedback to [Srivatsan Ramanujam](https://github.com/vatsan/) at vatsan.cs@utexas.edu
190 | 
191 | Contributors
192 | ==============
193 | 
194 | Thanks to [Ailey Crow](https://github.com/ailey) and [Gautam Muralidhar](https://github.com/gautamsm) for their contributions.
195 | 
196 | 


--------------------------------------------------------------------------------
/plots/YosemiteGrayscale.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vatsan/pandas_via_psql/a0dfd738f722c406611620e43ddbc7b925c308b7/plots/YosemiteGrayscale.jpg


--------------------------------------------------------------------------------
/plots/YosemiteGrayscale.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vatsan/pandas_via_psql/a0dfd738f722c406611620e43ddbc7b925c308b7/plots/YosemiteGrayscale.png


--------------------------------------------------------------------------------
/plots/YosemiteRGB.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vatsan/pandas_via_psql/a0dfd738f722c406611620e43ddbc7b925c308b7/plots/YosemiteRGB.jpeg


--------------------------------------------------------------------------------
/plots/barplot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vatsan/pandas_via_psql/a0dfd738f722c406611620e43ddbc7b925c308b7/plots/barplot.png


--------------------------------------------------------------------------------
/plots/boxplot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vatsan/pandas_via_psql/a0dfd738f722c406611620e43ddbc7b925c308b7/plots/boxplot.png


--------------------------------------------------------------------------------
/plots/density.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vatsan/pandas_via_psql/a0dfd738f722c406611620e43ddbc7b925c308b7/plots/density.png


--------------------------------------------------------------------------------
/plots/hexbin.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vatsan/pandas_via_psql/a0dfd738f722c406611620e43ddbc7b925c308b7/plots/hexbin.png


--------------------------------------------------------------------------------
/plots/histogram.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vatsan/pandas_via_psql/a0dfd738f722c406611620e43ddbc7b925c308b7/plots/histogram.png


--------------------------------------------------------------------------------
/plots/scatter_matrix.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vatsan/pandas_via_psql/a0dfd738f722c406611620e43ddbc7b925c308b7/plots/scatter_matrix.png


--------------------------------------------------------------------------------
/plots/time_series.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vatsan/pandas_via_psql/a0dfd738f722c406611620e43ddbc7b925c308b7/plots/time_series.png


--------------------------------------------------------------------------------
/ppsqlviz/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/vatsan/pandas_via_psql/a0dfd738f722c406611620e43ddbc7b925c308b7/ppsqlviz/__init__.py


--------------------------------------------------------------------------------
/ppsqlviz/plotter.py:
--------------------------------------------------------------------------------
  1 | '''
  2 | Pandas Plotting from STDIN.
  3 | Srivatsan Ramanujam <vatsan.cs@utexas.edu> 
  4 | 20 Jan 2014
  5 | ============================================================================================================================
  6 | Usage:
  7 | =========
  8 | Syntax: python plotter.py <hist|box|scatter|tseries|density|hexbin|bar|image|imageRGB>
  9 | Examples:
 10 | =========
 11 | 1) home$ psql -d vatsandb -h dca -U gpadmin -c 'select * from wine;' | python plotter.py scatter
 12 | 2) home$ psql -d vatsandb -h dca -U gpadmin -c 'select ash, flavanoids, hue, proline from wine;' | python plotter.py box
 13 | 3) home$ psql -d vatsandb -h dca -U gpadmin -c 'select ash, flavanoids, hue, proline from wine;' | python plotter.py hist
 14 | 4) home$ psql -d vatsandb -h dca -U gpadmin -c 'select dt, high, low  from sandp_prices where dt > 1998 order by dt;' | python plotter.py tseries
 15 | 5) home$ psql -d vatsandb -h dca -U gpadmin -c 'select ash, flavanoids, hue, proline from wine;' | python plotter.py density
 16 | 6) home$ psql -d vatsandb -h dca -U gpadmin -c 'select ash, flavanoids from wine;' | python plotter.py hexbin
 17 | ============================================================================================================================
 18 | '''
 19 | 
 20 | import pandas as pd
 21 | from pandas import DataFrame
 22 | from pandas.tools.plotting import scatter_matrix
 23 | import matplotlib.pyplot as plt
 24 | import matplotlib.cm as cm
 25 | from StringIO import StringIO
 26 | import numpy as np
 27 | import fileinput
 28 | import re
 30 | 
 31 | def scatterMatrix(dframe):
 32 |     '''
 33 |        Show Scatter Matrix
 34 |     '''
 35 |     df = DataFrame(dframe)
 36 |     #Rename columns so that the plot is not very cluttered.
 37 |     df.columns = range(len(df.columns))
 38 |     smatrix = scatter_matrix(df, alpha=0.2, figsize=(6, 6), diagonal='kde')
 39 |     plt.show()
 40 |     
 41 | def hexbinPlot(dframe):
 42 |     '''
 43 |        Show 2-d hexbin plot
 44 |     '''
 45 |     x_label, y_label=dframe.columns[0],dframe.columns[1]
 46 |     x, y = dframe[x_label],dframe[y_label]
 47 |     hexbinplt = plt.hexbin(x,y,gridsize=30)
 48 |     cbar = plt.colorbar()
 49 |     cbar.set_label('count')
 50 |     plt.xlabel(x_label)
 51 |     plt.ylabel(y_label)
 52 |     plt.show()
 53 |     
 54 | def boxPlot(dframe):
 55 |     '''
 56 |        Show Box Plot of various fields
 57 |     '''
 58 |     box_plot = dframe.boxplot()
 59 |     plt.show()
 60 |     
 61 | def histogramPlot(dframe):
 62 |     '''
 63 |        Show histogram of various fields
 64 |     '''
 65 |     if(len(dframe.columns)>1):
 66 |         hist_plot = dframe.hist(figsize=(6, 6))
 67 |     else:
 68 |         hist_plot = dframe.hist()
 69 |     plt.show()
 70 |     
 71 | def timeSeriesPlot(dframe):
 72 |     '''
 73 |        Show time series plot using pandas. The first column should be a date/time column.
 74 |     '''
 75 |     #The first column should be a date column and that will be used as an index.
 76 |     dframe.set_index(dframe.columns[0]).plot()
 77 |     plt.show()
 78 | 
 79 | def densityPlot(dframe):
 80 |     '''
 81 |        Show Kernel Density Plots
 82 |     '''
 83 |     if(len(dframe.columns)>1):
 84 |         hist_plot = dframe.plot(kind='kde',linewidth=3, figsize=(6, 6))
 85 |     else:
 86 |         hist_plot = dframe.plot(kind='kde',linewidth=3)
 87 |     plt.show()
 88 | 
 89 | def barPlot(dframe):
 90 |     '''
 91 |        Show Bar Plot. The first column should be the x-coordinates (e.g.,bin centers) and 
 92 |        the second column should be the y-coordinates (e.g., counts).
 93 |     '''
 94 |     x_label, y_label=dframe.columns[0],dframe.columns[1]
 95 |     width = (dframe[x_label][1]-dframe[x_label][0])*0.7
 96 |     plt.bar(dframe[x_label],dframe[y_label],align='center',width=width)
 97 |     plt.xlabel(x_label)
 98 |     plt.ylabel(y_label)
 99 |     plt.show() 
100 |     
101 | def imgPlot(dframe):
102 |     '''
103 |        Show Image
104 |     '''
105 |     if(len(dframe.columns)>2):
106 |         r = int(dframe.iloc[0][0])
107 |         c = int(dframe.iloc[0][1])
108 |         patch = dframe.iloc[0][2]
109 |         patch = patch.replace('{','')  # strip the braces of the SQL array literal
110 |         patch = patch.replace('}','')
111 |         str_pixels = patch.split(',')
112 |         pixels = np.array(list(map(float, str_pixels)))  # list() so this also works where map() is lazy
113 |         pixels = pixels.reshape(r,c)
114 |         plt.imshow(pixels,cmap = cm.Greys_r)
115 |     else:
116 |         print('Usage: for image plot, 3 columns are expected, with the first two columns being the number of rows and number of columns that make up the image and the third column being the actual image pixels as a vector')
117 |     plt.show()
118 | 
119 | def imgRGBPlot(dframe):
120 |     '''
121 |        Show RGB Image
122 |     '''
123 |     if(len(dframe.columns)>2):
124 |         r = int(dframe.iloc[0][0])
125 |         c = int(dframe.iloc[0][1])
126 |         patch = dframe.iloc[0][2]
127 |         patch = patch.replace('{','')  # strip the braces of the (nested) SQL array literal
128 |         patch = patch.replace('}','')
129 |         str_pixels = patch.split(',')
130 |         pixels = np.array(list(map(float, str_pixels)))  # list() so this also works where map() is lazy
131 |         # Slice out the R, G and B planes and stack them into an (r, c, 3) image.
132 |         im = np.dstack((np.reshape([np.uint8(float(i)) for i in pixels[0:r*c]], (r,c)),
133 |                 np.reshape([np.uint8(float(i)) for i in pixels[r*c:2*r*c]], (r,c)),
134 |                 np.reshape([np.uint8(float(i)) for i in pixels[2*r*c:3*r*c]], (r,c))))
135 |         plt.imshow(im)
136 |     else:
137 |         print('Usage: for image plot, 3 columns are expected, with the first two columns being the number of rows and number of columns that make up the image and the third column being the actual image pixels, with R, G, & B values listed sequentially, as a vector')
138 |     plt.show()
139 |     
140 | def readTableFromPipe(plot_type):
141 |     '''
142 |        Read the output of a SQL query from a pipe and display the requested plot
143 |     '''
144 |     rows_pattern = re.compile(r'^\(\d+ rows?\)$')
145 |     underline_pattern = re.compile(r'^(-+\+-+)+$')
146 |     single_underline_pattern = re.compile(r'^(-)+$')
147 |     data =[]
148 |     for line in fileinput.input():
149 |         #Skip lines not representing header or data
150 |         if(line.strip() and not rows_pattern.match(line) and not underline_pattern.match(line) and not single_underline_pattern.match(line)):
151 |             data.append(re.sub(r'\s+','',line))
152 |     dframe = pd.read_csv(StringIO('\n'.join(data)), sep='|', index_col=False)
153 |     if(plot_type=='scatter'):
154 |         scatterMatrix(dframe)
155 |     elif(plot_type=='box'):
156 |         boxPlot(dframe)
157 |     elif(plot_type=='hist'):
158 |         histogramPlot(dframe)
159 |     elif(plot_type=='tseries'):
160 |         timeSeriesPlot(dframe)
161 |     elif(plot_type=='density'):
162 |         densityPlot(dframe)
163 |     elif(plot_type=='hexbin'):
164 |         hexbinPlot(dframe)
165 |     elif(plot_type=='bar'):
166 |         barPlot(dframe)
167 |     elif(plot_type=='image'):
168 |         imgPlot(dframe)
169 |     elif(plot_type=='imageRGB'):
170 |         imgRGBPlot(dframe)
171 |         
172 | if(__name__ == '__main__'):
173 |     from sys import argv
174 |     if(len(argv)!=2):
175 |         print('Usage: python plotter.py <hist|box|scatter|tseries|density|hexbin|bar|image|imageRGB>')
176 |     else:
177 |         plot_type = argv[1]
178 |         #Remove arguments list from Argv (else fileinput will cry).
179 |         argv.pop()
180 |         readTableFromPipe(plot_type)
181 | 


--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
  1 | from setuptools import setup, find_packages
  2 | from distutils.util import convert_path
  3 | import os,sys
  4 | from fnmatch import fnmatchcase
  5 | 
  6 | # Provided as an attribute, so you can append to these instead
  7 | # of replicating them:
  8 | standard_exclude = ('*.pyc', '*$py.class', '*~', '.*', '*.bak')
  9 | standard_exclude_directories = ('.*', 'CVS', '_darcs', './build',
 10 |                                 './dist', 'EGG-INFO', '*.egg-info','plots')
 11 | 
 12 | 
 13 | # (c) 2005 Ian Bicking and contributors; written for Paste (http://pythonpaste.org)
 14 | # Licensed under the MIT license: http://www.opensource.org/licenses/mit-license.php
 15 | # Note: you may want to copy this into your setup.py file verbatim, as
 16 | # you can't import this from another package, when you don't know if
 17 | # that package is installed yet.
 18 | def find_package_data(
 19 |     where='.', package='',
 20 |     exclude=standard_exclude,
 21 |     exclude_directories=standard_exclude_directories,
 22 |     only_in_packages=True,
 23 |     show_ignored=False):
 24 |     """
 25 |     Return a dictionary suitable for use in ``package_data``
 26 |     in a distutils ``setup.py`` file.
 27 |     The dictionary looks like::
 28 |         {'package': [files]}
 29 |     Where ``files`` is a list of all the files in that package that
 30 |     don't match anything in ``exclude``.
 31 |     If ``only_in_packages`` is true, then top-level directories that
 32 |     are not packages won't be included (but directories under packages
 33 |     will).
 34 |     Directories matching any pattern in ``exclude_directories`` will
 35 |     be ignored; by default directories with leading ``.``, ``CVS``,
 36 |     and ``_darcs`` will be ignored.
 37 |     If ``show_ignored`` is true, then all the files that aren't
 38 |     included in package data are shown on stderr (for debugging
 39 |     purposes).
 40 |     Note patterns use wildcards, or can be exact paths (including
 41 |     leading ``./``), and all searching is case-insensitive.
 42 |     """
 43 | 
 44 |     out = {}
 45 |     stack = [(convert_path(where), '', package, only_in_packages)]
 46 |     while stack:
 47 |         where, prefix, package, only_in_packages = stack.pop(0)
 48 |         for name in os.listdir(where):
 49 |             fn = os.path.join(where, name)
 50 |             if os.path.isdir(fn):
 51 |                 bad_name = False
 52 |                 for pattern in exclude_directories:
 53 |                     if (fnmatchcase(name, pattern)
 54 |                         or fn.lower() == pattern.lower()):
 55 |                         bad_name = True
 56 |                         if show_ignored:
 57 |                             print >> sys.stderr, (
 58 |                                 "Directory %s ignored by pattern %s"
 59 |                                 % (fn, pattern))
 60 |                         break
 61 |                 if bad_name:
 62 |                     continue
 63 |                 if (os.path.isfile(os.path.join(fn, '__init__.py'))
 64 |                     and not prefix):
 65 |                     if not package:
 66 |                         new_package = name
 67 |                     else:
 68 |                         new_package = package + '.' + name
 69 |                     stack.append((fn, '', new_package, False))
 70 |                 else:
 71 |                     stack.append((fn, prefix + name + '/', package, only_in_packages))
 72 |             elif package or not only_in_packages:
 73 |                 # is a file
 74 |                 bad_name = False
 75 |                 for pattern in exclude:
 76 |                     if (fnmatchcase(name, pattern)
 77 |                         or fn.lower() == pattern.lower()):
 78 |                         bad_name = True
 79 |                         if show_ignored:
 80 |                             print >> sys.stderr, (
 81 |                                 "File %s ignored by pattern %s"
 82 |                                 % (fn, pattern))
 83 |                         break
 84 |                 if bad_name:
 85 |                     continue
 86 |                 out.setdefault(package, []).append(prefix+name)
 87 |     return out
 88 | 
 89 | setup(
 90 |     name='ppsqlviz',
 91 |     version='1.0.1',
 92 |     author='Srivatsan Ramanujam',
 93 |     author_email='vatsan.cs@utexas.edu',
 94 |     url='http://vatsan.github.io/pandas_via_psql/',
 95 |     packages=find_packages(),
 96 |     package_data=find_package_data(only_in_packages=False,show_ignored=True),
 97 |     include_package_data=True,
 98 |     license='LICENSE.txt',
 99 |     description='A command line visualization utility for SQL using Pandas library in Python.',
100 |     long_description=open('README.md').read(),
101 |     install_requires=[
102 |         "pandas >= 0.13.0"
103 |     ],
104 | )
105 | 


--------------------------------------------------------------------------------