├── .gitignore
├── .travis.yml
├── LICENSE
├── MANIFEST.in
├── README.rst
├── docs
│   ├── Makefile
│   ├── conf.py
│   └── index.rst
├── dplyr-comparison.html
├── pandas_ply
│   ├── __init__.py
│   ├── methods.py
│   ├── symbolic.py
│   └── vendor
│       ├── __init__.py
│       └── six.py
├── setup.py
└── tests
    ├── test_methods.py
    └── test_symbolic.py
/.gitignore:
--------------------------------------------------------------------------------
1 | *.pyc
2 | *.egg-info
3 | *.egg
4 | /MANIFEST
5 | /dist/
6 | /docs/_build
7 | /build/
8 |
--------------------------------------------------------------------------------
/.travis.yml:
--------------------------------------------------------------------------------
1 | sudo: false
2 | language: python
3 | python:
4 |   - "2.6"
5 |   - "2.7"
6 |   - "3.3"
7 |   - "3.4"
8 | install:
9 |   - if [[ "$TRAVIS_PYTHON_VERSION" == "2.7" ]]; then
10 |       wget https://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh -O miniconda.sh;
11 |     else
12 |       wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh;
13 |     fi
14 |   - bash miniconda.sh -b -p $HOME/miniconda
15 |   - export PATH="$HOME/miniconda/bin:$PATH"
16 |   - hash -r
17 |   - conda config --set always_yes yes --set changeps1 no
18 |   - conda update -q conda
19 |   - conda info -a
20 |
21 |   - conda create -q -n test-environment python=$TRAVIS_PYTHON_VERSION pandas nose mock
22 |   - source activate test-environment
23 |   - if [[ $TRAVIS_PYTHON_VERSION == 2.6 ]]; then conda install unittest2; fi
24 |   - python setup.py install
25 | script:
26 |   - nosetests -w tests/ -v -s
27 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright 2015 Coursera Inc.
2 |
3 | Licensed under the Apache License, Version 2.0 (the "License");
4 | you may not use this file except in compliance with the License.
5 | You may obtain a copy of the License at
6 |
7 | http://www.apache.org/licenses/LICENSE-2.0
8 |
9 | Unless required by applicable law or agreed to in writing, software
10 | distributed under the License is distributed on an "AS IS" BASIS,
11 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | See the License for the specific language governing permissions and
13 | limitations under the License.
14 |
--------------------------------------------------------------------------------
/MANIFEST.in:
--------------------------------------------------------------------------------
1 | include README.rst LICENSE
2 |
--------------------------------------------------------------------------------
/README.rst:
--------------------------------------------------------------------------------
1 | **pandas-ply**: functional data manipulation for pandas
2 | =======================================================
3 |
4 | **pandas-ply** is a thin layer which makes it easier to manipulate data with `pandas `_. In particular, it provides elegant, functional, chainable syntax in cases where **pandas** would require mutation, saved intermediate values, or other awkward constructions. In this way, it aims to move **pandas** closer to the "grammar of data manipulation" provided by the `dplyr `_ package for R.
5 |
6 | For example, take the **dplyr** code below:
7 |
8 | .. code:: r
9 |
10 |     flights %>%
11 |       group_by(year, month, day) %>%
12 |       summarise(
13 |         arr = mean(arr_delay, na.rm = TRUE),
14 |         dep = mean(dep_delay, na.rm = TRUE)
15 |       ) %>%
16 |       filter(arr > 30 & dep > 30)
17 |
18 | The most common way to express this in **pandas** is probably:
19 |
20 | .. code:: python
21 |
22 |     grouped_flights = flights.groupby(['year', 'month', 'day'])
23 |     output = pd.DataFrame()
24 |     output['arr'] = grouped_flights.arr_delay.mean()
25 |     output['dep'] = grouped_flights.dep_delay.mean()
26 |     filtered_output = output[(output.arr > 30) & (output.dep > 30)]
27 |
28 | **pandas-ply** lets you instead write:
29 |
30 | .. code:: python
31 |
32 |     (flights
33 |       .groupby(['year', 'month', 'day'])
34 |       .ply_select(
35 |         arr = X.arr_delay.mean(),
36 |         dep = X.dep_delay.mean())
37 |       .ply_where(X.arr > 30, X.dep > 30))
38 |
39 | In our opinion, this **pandas-ply** code is cleaner, more expressive, more readable, more concise, and less error-prone than the original **pandas** code.
40 |
41 | Explanatory notes on the **pandas-ply** code sample above:
42 |
43 | * **pandas-ply**'s methods (like ``ply_select`` and ``ply_where`` above) are attached directly to **pandas** objects and can be used immediately, without any wrapping or redirection. They start with a ``ply_`` prefix to distinguish them from built-in **pandas** methods.
44 | * **pandas-ply**'s methods are named for (and modelled after) SQL's operators. (But keep in mind that these operators will not always appear in the same order as they do in a SQL statement: ``SELECT a FROM b WHERE c GROUP BY d`` probably maps to ``b.ply_where(c).groupby(d).ply_select(a)``.)
45 | * **pandas-ply** includes a simple system for building "symbolic expressions" to provide as arguments to its methods. ``X`` above is an instance of ``pandas_ply.symbolic.Symbol``. Operations on this symbol produce larger compound symbolic expressions. When **pandas-ply** receives a symbolic expression as an argument, it converts it into a function. So, for instance, ``X.arr > 30`` in the above code could have instead been provided as ``lambda x: x.arr > 30``. Use of symbolic expressions allows the ``lambda x:`` to be left off, resulting in less cluttered code.
46 |
47 | Warning
48 | -------
49 |
50 | **pandas-ply** is new, and in an experimental stage of its development. The API is not yet stable. Expect the unexpected.
51 |
52 | (Pull requests are welcome. Feel free to contact us at pandas-ply@coursera.org.)
53 |
54 | Using **pandas-ply**
55 | --------------------
56 |
57 | Install **pandas-ply** with:
58 |
59 | ::
60 |
61 |     $ pip install pandas-ply
62 |
63 |
64 | Typical use of **pandas-ply** starts with:
65 |
66 | .. code:: python
67 |
68 |     import pandas as pd
69 |     from pandas_ply import install_ply, X, sym_call
70 |
71 |     install_ply(pd)
72 |
73 | After calling ``install_ply``, all **pandas** objects have **pandas-ply**'s methods attached.
74 |
75 | API reference
76 | -------------
77 |
78 | Full API reference is available at ``_.
79 |
80 | Possible TODOs
81 | --------------
82 |
83 | * Extend ``pandas``' native ``groupby`` to support symbolic expressions?
84 | * Extend ``pandas``' native ``apply`` to support symbolic expressions?
85 | * Add ``.ply_call`` to ``pandas`` objects to extend chainability?
86 | * Version of ``ply_select`` which supports later computed columns relying on earlier computed columns?
87 | * Version of ``ply_select`` which supports careful column ordering?
88 | * Better handling of indices?
89 |
90 | License
91 | -------
92 |
93 | Copyright 2015 Coursera Inc.
94 |
95 | Licensed under the Apache License, Version 2.0 (the "License");
96 | you may not use this file except in compliance with the License.
97 | You may obtain a copy of the License at
98 |
99 | http://www.apache.org/licenses/LICENSE-2.0
100 |
101 | Unless required by applicable law or agreed to in writing, software
102 | distributed under the License is distributed on an "AS IS" BASIS,
103 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
104 | See the License for the specific language governing permissions and
105 | limitations under the License.
106 |
--------------------------------------------------------------------------------
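The README above says that a symbolic expression such as `X.arr > 30` is converted into a function equivalent to `lambda x: x.arr > 30`. The sketch below illustrates one way such deferred expressions could work. It is a minimal, standard-library-only illustration and not the contents of `pandas_ply/symbolic.py`: the `Expr` class, this `to_callable` helper, and the `row` object are all hypothetical names introduced here, and only attribute access and `>` are supported.

```python
from types import SimpleNamespace


class Expr:
    """Hypothetical deferred expression (not pandas-ply's actual Symbol class)."""

    def __init__(self, evaluate):
        # `evaluate` maps a concrete object to the value of this expression.
        self._evaluate = evaluate

    def __getattr__(self, name):
        # Called only when normal lookup fails, so `X.arr` lands here.
        if name.startswith('_'):
            raise AttributeError(name)
        return Expr(lambda obj: getattr(self._evaluate(obj), name))

    def __gt__(self, other):
        # Defer the comparison until a concrete object is supplied.
        return Expr(lambda obj: self._evaluate(obj) > other)


# The root symbol evaluates to whatever object it is applied to.
X = Expr(lambda obj: obj)


def to_callable(arg):
    """Turn an Expr, a function, or a constant into a one-argument function."""
    if isinstance(arg, Expr):
        return arg._evaluate
    if callable(arg):
        return arg
    return lambda obj: arg


# A stand-in for a dataframe: any object with `arr` and `dep` attributes.
row = SimpleNamespace(arr=45, dep=10)
symbolic_result = to_callable(X.arr > 30)(row)
lambda_result = to_callable(lambda x: x.arr > 30)(row)
```

Both calls evaluate the same condition against `row`, which is the equivalence the README describes. A fuller implementation would also need deferred method calls (for expressions like `X.arr_delay.mean()`) and the remaining operators.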
/docs/Makefile:
--------------------------------------------------------------------------------
1 | # Makefile for Sphinx documentation
2 | #
3 |
4 | # You can set these variables from the command line.
5 | SPHINXOPTS =
6 | SPHINXBUILD = sphinx-build
7 | PAPER =
8 | BUILDDIR = _build
9 |
10 | # Internal variables.
11 | PAPEROPT_a4 = -D latex_paper_size=a4
12 | PAPEROPT_letter = -D latex_paper_size=letter
13 | ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
14 | # the i18n builder cannot share the environment and doctrees with the others
15 | I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
16 |
17 | .PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext
18 |
19 | help:
20 | @echo "Please use \`make <target>' where <target> is one of"
21 | @echo " html to make standalone HTML files"
22 | @echo " dirhtml to make HTML files named index.html in directories"
23 | @echo " singlehtml to make a single large HTML file"
24 | @echo " pickle to make pickle files"
25 | @echo " json to make JSON files"
26 | @echo " htmlhelp to make HTML files and a HTML help project"
27 | @echo " qthelp to make HTML files and a qthelp project"
28 | @echo " devhelp to make HTML files and a Devhelp project"
29 | @echo " epub to make an epub"
30 | @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
31 | @echo " latexpdf to make LaTeX files and run them through pdflatex"
32 | @echo " text to make text files"
33 | @echo " man to make manual pages"
34 | @echo " texinfo to make Texinfo files"
35 | @echo " info to make Texinfo files and run them through makeinfo"
36 | @echo " gettext to make PO message catalogs"
37 | @echo " changes to make an overview of all changed/added/deprecated items"
38 | @echo " linkcheck to check all external links for integrity"
39 | @echo " doctest to run all doctests embedded in the documentation (if enabled)"
40 |
41 | clean:
42 | -rm -rf $(BUILDDIR)/*
43 |
44 | html:
45 | $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
46 | @echo
47 | @echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
48 |
49 | dirhtml:
50 | $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
51 | @echo
52 | @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
53 |
54 | singlehtml:
55 | $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
56 | @echo
57 | @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
58 |
59 | pickle:
60 | $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
61 | @echo
62 | @echo "Build finished; now you can process the pickle files."
63 |
64 | json:
65 | $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
66 | @echo
67 | @echo "Build finished; now you can process the JSON files."
68 |
69 | htmlhelp:
70 | $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
71 | @echo
72 | @echo "Build finished; now you can run HTML Help Workshop with the" \
73 | ".hhp project file in $(BUILDDIR)/htmlhelp."
74 |
75 | qthelp:
76 | $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
77 | @echo
78 | @echo "Build finished; now you can run "qcollectiongenerator" with the" \
79 | ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
80 | @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/pandas-ply.qhcp"
81 | @echo "To view the help file:"
82 | @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/pandas-ply.qhc"
83 |
84 | devhelp:
85 | $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
86 | @echo
87 | @echo "Build finished."
88 | @echo "To view the help file:"
89 | @echo "# mkdir -p $$HOME/.local/share/devhelp/pandas-ply"
90 | @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/pandas-ply"
91 | @echo "# devhelp"
92 |
93 | epub:
94 | $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
95 | @echo
96 | @echo "Build finished. The epub file is in $(BUILDDIR)/epub."
97 |
98 | latex:
99 | $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
100 | @echo
101 | @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
102 | @echo "Run \`make' in that directory to run these through (pdf)latex" \
103 | "(use \`make latexpdf' here to do that automatically)."
104 |
105 | latexpdf:
106 | $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
107 | @echo "Running LaTeX files through pdflatex..."
108 | $(MAKE) -C $(BUILDDIR)/latex all-pdf
109 | @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
110 |
111 | text:
112 | $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
113 | @echo
114 | @echo "Build finished. The text files are in $(BUILDDIR)/text."
115 |
116 | man:
117 | $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
118 | @echo
119 | @echo "Build finished. The manual pages are in $(BUILDDIR)/man."
120 |
121 | texinfo:
122 | $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
123 | @echo
124 | @echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
125 | @echo "Run \`make' in that directory to run these through makeinfo" \
126 | "(use \`make info' here to do that automatically)."
127 |
128 | info:
129 | $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
130 | @echo "Running Texinfo files through makeinfo..."
131 | make -C $(BUILDDIR)/texinfo info
132 | @echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."
133 |
134 | gettext:
135 | $(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
136 | @echo
137 | @echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."
138 |
139 | changes:
140 | $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
141 | @echo
142 | @echo "The overview file is in $(BUILDDIR)/changes."
143 |
144 | linkcheck:
145 | $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
146 | @echo
147 | @echo "Link check complete; look for any errors in the above output " \
148 | "or in $(BUILDDIR)/linkcheck/output.txt."
149 |
150 | doctest:
151 | $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
152 | @echo "Testing of doctests in the sources finished, look at the " \
153 | "results in $(BUILDDIR)/doctest/output.txt."
154 |
--------------------------------------------------------------------------------
/docs/conf.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | #
3 | # pandas-ply documentation build configuration file, created by
4 | # sphinx-quickstart on Tue Nov 18 19:40:12 2014.
5 | #
6 | # This file is execfile()d with the current directory set to its containing dir.
7 | #
8 | # Note that not all possible configuration values are present in this
9 | # autogenerated file.
10 | #
11 | # All configuration values have a default; values that are commented out
12 | # serve to show the default.
13 |
14 | import sys, os
15 | import sphinx_rtd_theme
16 |
17 | # If extensions (or modules to document with autodoc) are in another directory,
18 | # add these directories to sys.path here. If the directory is relative to the
19 | # documentation root, use os.path.abspath to make it absolute, like shown here.
20 | sys.path.insert(0, os.path.abspath('..'))
21 |
22 | # -- General configuration -----------------------------------------------------
23 |
24 | # If your documentation needs a minimal Sphinx version, state it here.
25 | #needs_sphinx = '1.0'
26 |
27 | # Add any Sphinx extension module names here, as strings. They can be extensions
28 | # coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
29 | extensions = [
30 | 'sphinx.ext.autodoc',
31 | 'sphinx.ext.doctest',
32 | 'sphinx.ext.coverage',
33 | 'sphinxcontrib.napoleon'
34 | ]
35 |
36 | # Napoleon settings
37 | napoleon_google_docstring = True
38 | napoleon_numpy_docstring = True
39 | napoleon_include_private_with_doc = False
40 | napoleon_include_special_with_doc = True
41 | napoleon_use_admonition_for_examples = False
42 | napoleon_use_admonition_for_notes = False
43 | napoleon_use_admonition_for_references = False
44 | napoleon_use_ivar = False
45 | napoleon_use_param = True
46 | napoleon_use_rtype = True
47 | autodoc_member_order = 'bysource'
48 |
49 | # Add any paths that contain templates here, relative to this directory.
50 | templates_path = ['_templates']
51 |
52 | # The suffix of source filenames.
53 | source_suffix = '.rst'
54 |
55 | # The encoding of source files.
56 | #source_encoding = 'utf-8-sig'
57 |
58 | # The master toctree document.
59 | master_doc = 'index'
60 |
61 | # General information about the project.
62 | project = u'pandas-ply'
63 | copyright = u'2015, Coursera'
64 |
65 | # The version info for the project you're documenting, acts as replacement for
66 | # |version| and |release|, also used in various other places throughout the
67 | # built documents.
68 | #
69 | # The short X.Y version.
70 | version = '0.2.1'
71 | # The full version, including alpha/beta/rc tags.
72 | release = '0.2.1'
73 |
74 | # The language for content autogenerated by Sphinx. Refer to documentation
75 | # for a list of supported languages.
76 | #language = None
77 |
78 | # There are two options for replacing |today|: either, you set today to some
79 | # non-false value, then it is used:
80 | #today = ''
81 | # Else, today_fmt is used as the format for a strftime call.
82 | #today_fmt = '%B %d, %Y'
83 |
84 | # List of patterns, relative to source directory, that match files and
85 | # directories to ignore when looking for source files.
86 | exclude_patterns = ['_build']
87 |
88 | # The reST default role (used for this markup: `text`) to use for all documents.
89 | #default_role = None
90 |
91 | # If true, '()' will be appended to :func: etc. cross-reference text.
92 | #add_function_parentheses = True
93 |
94 | # If true, the current module name will be prepended to all description
95 | # unit titles (such as .. function::).
96 | #add_module_names = True
97 |
98 | # If true, sectionauthor and moduleauthor directives will be shown in the
99 | # output. They are ignored by default.
100 | #show_authors = False
101 |
102 | # The name of the Pygments (syntax highlighting) style to use.
103 | pygments_style = 'sphinx'
104 |
105 | # A list of ignored prefixes for module index sorting.
106 | #modindex_common_prefix = []
107 |
108 |
109 | # -- Options for HTML output ---------------------------------------------------
110 |
111 | # The theme to use for HTML and HTML Help pages. See the documentation for
112 | # a list of builtin themes.
113 | html_theme = 'sphinx_rtd_theme'
114 |
115 | # Theme options are theme-specific and customize the look and feel of a theme
116 | # further. For a list of options available for each theme, see the
117 | # documentation.
118 | #html_theme_options = {}
119 |
120 | # Add any paths that contain custom themes here, relative to this directory.
121 | html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
122 |
123 | # The name for this set of Sphinx documents. If None, it defaults to
124 | # "<project> v<release> documentation".
125 | #html_title = None
126 |
127 | # A shorter title for the navigation bar. Default is the same as html_title.
128 | #html_short_title = None
129 |
130 | # The name of an image file (relative to this directory) to place at the top
131 | # of the sidebar.
132 | #html_logo = None
133 |
134 | # The name of an image file (within the static path) to use as favicon of the
135 | # docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
136 | # pixels large.
137 | #html_favicon = None
138 |
139 | # Add any paths that contain custom static files (such as style sheets) here,
140 | # relative to this directory. They are copied after the builtin static files,
141 | # so a file named "default.css" will overwrite the builtin "default.css".
142 | #html_static_path = ['_static']
143 |
144 | # If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
145 | # using the given strftime format.
146 | #html_last_updated_fmt = '%b %d, %Y'
147 |
148 | # If true, SmartyPants will be used to convert quotes and dashes to
149 | # typographically correct entities.
150 | #html_use_smartypants = True
151 |
152 | # Custom sidebar templates, maps document names to template names.
153 | #html_sidebars = {}
154 |
155 | # Additional templates that should be rendered to pages, maps page names to
156 | # template names.
157 | #html_additional_pages = {}
158 |
159 | # If false, no module index is generated.
160 | #html_domain_indices = True
161 |
162 | # If false, no index is generated.
163 | #html_use_index = True
164 |
165 | # If true, the index is split into individual pages for each letter.
166 | #html_split_index = False
167 |
168 | # If true, links to the reST sources are added to the pages.
169 | #html_show_sourcelink = True
170 |
171 | # If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
172 | #html_show_sphinx = True
173 |
174 | # If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
175 | #html_show_copyright = True
176 |
177 | # If true, an OpenSearch description file will be output, and all pages will
178 | # contain a <link> tag referring to it. The value of this option must be the
179 | # base URL from which the finished HTML is served.
180 | #html_use_opensearch = ''
181 |
182 | # This is the file name suffix for HTML files (e.g. ".xhtml").
183 | #html_file_suffix = None
184 |
185 | # Output file base name for HTML help builder.
186 | #htmlhelp_basename = 'pandas-plydoc'
187 |
188 |
189 | # -- Options for LaTeX output --------------------------------------------------
190 |
191 | latex_elements = {
192 | # The paper size ('letterpaper' or 'a4paper').
193 | #'papersize': 'letterpaper',
194 |
195 | # The font size ('10pt', '11pt' or '12pt').
196 | #'pointsize': '10pt',
197 |
198 | # Additional stuff for the LaTeX preamble.
199 | #'preamble': '',
200 | }
201 |
202 | # Grouping the document tree into LaTeX files. List of tuples
203 | # (source start file, target name, title, author, documentclass [howto/manual]).
204 | latex_documents = [
205 | ('index', 'pandas-ply.tex', u'pandas-ply Documentation',
206 | u'Coursera', 'manual'),
207 | ]
208 |
209 | # The name of an image file (relative to this directory) to place at the top of
210 | # the title page.
211 | #latex_logo = None
212 |
213 | # For "manual" documents, if this is true, then toplevel headings are parts,
214 | # not chapters.
215 | #latex_use_parts = False
216 |
217 | # If true, show page references after internal links.
218 | #latex_show_pagerefs = False
219 |
220 | # If true, show URL addresses after external links.
221 | #latex_show_urls = False
222 |
223 | # Documents to append as an appendix to all manuals.
224 | #latex_appendices = []
225 |
226 | # If false, no module index is generated.
227 | #latex_domain_indices = True
228 |
229 |
230 | # -- Options for manual page output --------------------------------------------
231 |
232 | # One entry per manual page. List of tuples
233 | # (source start file, name, description, authors, manual section).
234 | man_pages = [
235 | ('index', 'pandas-ply', u'pandas-ply Documentation',
236 | [u'Coursera'], 1)
237 | ]
238 |
239 | # If true, show URL addresses after external links.
240 | #man_show_urls = False
241 |
242 |
243 | # -- Options for Texinfo output ------------------------------------------------
244 |
245 | # Grouping the document tree into Texinfo files. List of tuples
246 | # (source start file, target name, title, author,
247 | # dir menu entry, description, category)
248 | texinfo_documents = [
249 | ('index', 'pandas-ply', u'pandas-ply Documentation',
250 | u'Coursera', 'pandas-ply', 'functional data manipulation for pandas',
251 | 'Miscellaneous'),
252 | ]
253 |
254 | # Documents to append as an appendix to all manuals.
255 | #texinfo_appendices = []
256 |
257 | # If false, no module index is generated.
258 | #texinfo_domain_indices = True
259 |
260 | # How to display URL addresses: 'footnote', 'no', or 'inline'.
261 | #texinfo_show_urls = 'footnote'
262 |
--------------------------------------------------------------------------------
/docs/index.rst:
--------------------------------------------------------------------------------
1 | **pandas-ply**: functional data manipulation for pandas
2 | =======================================================
3 |
4 | **pandas-ply** is a thin layer which makes it easier to manipulate data with `pandas `_. In particular, it provides elegant, functional, chainable syntax in cases where **pandas** would require mutation, saved intermediate values, or other awkward constructions. In this way, it aims to move **pandas** closer to the "grammar of data manipulation" provided by the `dplyr `_ package for R.
5 |
6 | For example, take the **dplyr** code below:
7 |
8 | .. code:: r
9 |
10 |     flights %>%
11 |       group_by(year, month, day) %>%
12 |       summarise(
13 |         arr = mean(arr_delay, na.rm = TRUE),
14 |         dep = mean(dep_delay, na.rm = TRUE)
15 |       ) %>%
16 |       filter(arr > 30 & dep > 30)
17 |
18 | The most common way to express this in **pandas** is probably:
19 |
20 | .. code:: python
21 |
22 |     grouped_flights = flights.groupby(['year', 'month', 'day'])
23 |     output = pd.DataFrame()
24 |     output['arr'] = grouped_flights.arr_delay.mean()
25 |     output['dep'] = grouped_flights.dep_delay.mean()
26 |     filtered_output = output[(output.arr > 30) & (output.dep > 30)]
27 |
28 | **pandas-ply** lets you instead write:
29 |
30 | .. code:: python
31 |
32 |     (flights
33 |       .groupby(['year', 'month', 'day'])
34 |       .ply_select(
35 |         arr = X.arr_delay.mean(),
36 |         dep = X.dep_delay.mean())
37 |       .ply_where(X.arr > 30, X.dep > 30))
38 |
39 | In our opinion, this **pandas-ply** code is cleaner, more expressive, more readable, more concise, and less error-prone than the original **pandas** code.
40 |
41 | Explanatory notes on the **pandas-ply** code sample above:
42 |
43 | * **pandas-ply**'s methods (like ``ply_select`` and ``ply_where`` above) are attached directly to **pandas** objects and can be used immediately, without any wrapping or redirection. They start with a ``ply_`` prefix to distinguish them from built-in **pandas** methods.
44 | * **pandas-ply**'s methods are named for (and modelled after) SQL's operators. (But keep in mind that these operators will not always appear in the same order as they do in a SQL statement: ``SELECT a FROM b WHERE c GROUP BY d`` probably maps to ``b.ply_where(c).groupby(d).ply_select(a)``.)
45 | * **pandas-ply** includes a simple system for building "symbolic expressions" to provide as arguments to its methods. ``X`` above is an instance of ``pandas_ply.symbolic.Symbol``. Operations on this symbol produce larger compound symbolic expressions. When **pandas-ply** receives a symbolic expression as an argument, it converts it into a function. So, for instance, ``X.arr > 30`` in the above code could have instead been provided as ``lambda x: x.arr > 30``. Use of symbolic expressions allows the ``lambda x:`` to be left off, resulting in less cluttered code.
46 |
47 | Warning
48 | -------
49 |
50 | **pandas-ply** is new, and in an experimental stage of its development. The API is not yet stable. Expect the unexpected.
51 |
52 | (Pull requests are welcome. Feel free to contact us at pandas-ply@coursera.org.)
53 |
54 | Using **pandas-ply**
55 | --------------------
56 |
57 | Install **pandas-ply** with:
58 |
59 | ::
60 |
61 |     $ pip install pandas-ply
62 |
63 |
64 | Typical use of **pandas-ply** starts with:
65 |
66 | .. code:: python
67 |
68 |     import pandas as pd
69 |     from pandas_ply import install_ply, X, sym_call
70 |
71 |     install_ply(pd)
72 |
73 | After calling ``install_ply``, all **pandas** objects have **pandas-ply**'s methods attached.
74 |
75 | API reference
76 | -------------
77 |
78 | pandas extensions
79 | ~~~~~~~~~~~~~~~~~
80 |
81 | .. automodule:: pandas_ply.methods
82 |     :members:
83 |     :undoc-members:
84 |     :show-inheritance:
85 |
86 | `pandas_ply.symbolic`
87 | ~~~~~~~~~~~~~~~~~~~~~
88 |
89 | .. automodule:: pandas_ply.symbolic
90 |     :members:
91 |     :undoc-members:
92 |     :private-members:
93 |     :show-inheritance:
94 |
--------------------------------------------------------------------------------
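The documentation above passes several conditions to a single `ply_where` call, and `pandas_ply/methods.py` filters by the AND of all of them. The sketch below mirrors that evaluate-then-reduce pattern on plain Python lists, so it runs without pandas. The `where` function and the sample `flights` rows are hypothetical names used only for illustration; the real method combines boolean pandas objects with `&` instead.

```python
from functools import reduce


def where(rows, *conditions):
    """Keep the rows for which every condition holds (illustrative only)."""
    if not conditions:
        return list(rows)
    # Evaluate each condition against every row: one boolean mask per condition.
    masks = [[bool(condition(row)) for row in rows] for condition in conditions]
    # Fold the masks together elementwise with AND.
    combined = reduce(
        lambda left, right: [a and b for a, b in zip(left, right)], masks)
    return [row for row, keep in zip(rows, combined) if keep]


flights = [
    {'arr': 45, 'dep': 40},
    {'arr': 45, 'dep': 10},
    {'arr': 5, 'dep': 50},
]
delayed = where(flights, lambda r: r['arr'] > 30, lambda r: r['dep'] > 30)
```

Only the first row satisfies both conditions, so `delayed` holds that single row. Calling `where(flights)` with no conditions returns every row, matching the early return in `_ply_where`.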
/pandas_ply/__init__.py:
--------------------------------------------------------------------------------
1 | from .methods import install_ply
2 | from .symbolic import X, sym_call
3 |
--------------------------------------------------------------------------------
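`install_ply`, exported above, works by assigning plain functions as attributes of the pandas classes. The sketch below shows the same attachment technique on a stand-in class, so it runs without pandas. `Table`, `_ply_count`, and `install` are hypothetical names chosen for illustration and are not part of **pandas-ply**.

```python
class Table:
    """Hypothetical stand-in for a pandas class such as DataFrame."""

    def __init__(self, rows):
        self.rows = rows


def _ply_count(self):
    # Written as a free function; `self` is bound once it lives on the class.
    return len(self.rows)


def install(target_class):
    """Attach the extension method to the class, as install_ply does for pandas."""
    target_class.ply_count = _ply_count


existing = Table([1, 2, 3])   # created before installation
install(Table)
count = existing.ply_count()  # the method is looked up on the class at call time
```

Because Python looks methods up on the class at call time, objects created before `install` is called gain the new method as well. The trade-off is that the class is modified globally for the whole process, which is why the methods carry a `ply_` prefix.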
/pandas_ply/methods.py:
--------------------------------------------------------------------------------
1 | """This module contains the **pandas-ply** methods which are designed to be
2 | added to pandas objects. The methods in this module should not be used directly.
3 | Instead, the function `install_ply` should be used to attach them to the pandas
4 | classes."""
5 |
6 | from . import symbolic
7 | from .vendor.six import iteritems
8 | from .vendor.six.moves import reduce
9 |
10 | pandas = None
11 |
12 |
13 | def install_ply(pandas_to_use):
14 |     """Install `pandas-ply` onto the objects in a copy of `pandas`."""
15 |
16 |     global pandas
17 |     pandas = pandas_to_use
18 |
19 |     pandas.DataFrame.ply_where = _ply_where
20 |     pandas.DataFrame.ply_select = _ply_select
21 |
22 |     pandas.Series.ply_where = _ply_where
23 |
24 |     pandas.core.groupby.DataFrameGroupBy.ply_select = _ply_select_for_groups
25 |
26 |     pandas.core.groupby.SeriesGroupBy.ply_select = _ply_select_for_groups
27 |
28 |
29 | def _ply_where(self, *conditions):
30 |     """Filter a dataframe/series to only include rows/entries satisfying a
31 |     given set of conditions.
32 |
33 |     Analogous to SQL's ``WHERE``, or dplyr's ``filter``.
34 |
35 |     Args:
36 |         `*conditions`: Each should be a dataframe/series of booleans, a
37 |             function returning such an object when run on the input dataframe,
38 |             or a symbolic expression yielding such an object when evaluated
39 |             with Symbol(0) mapped to the input dataframe. The input dataframe
40 |             will be filtered by the AND of all the conditions.
41 |
42 |     Example:
43 |         >>> flights.ply_where(X.month == 1, X.day == 1)
44 |         [ same result as `flights[(flights.month == 1) & (flights.day == 1)]` ]
45 |     """
46 |
47 |     if not conditions:
48 |         return self
49 |
50 |     evalled_conditions = [symbolic.to_callable(condition)(self)
51 |                           for condition in conditions]
52 |     anded_evalled_conditions = reduce(
53 |         lambda x, y: x & y, evalled_conditions)
54 |     return self[anded_evalled_conditions]
55 |
56 |
57 | def _ply_select(self, *args, **kwargs):
58 | """Transform a dataframe by selecting old columns and new (computed)
59 | columns.
60 |
61 | Analogous to SQL's ``SELECT``, or dplyr's ``select`` / ``rename`` /
62 | ``mutate`` / ``transmute``.
63 |
64 | Args:
65 | `*args`: Each should be one of:
66 |
67 | ``'*'``
68 | says that all columns in the input dataframe should be
69 | included
70 | ``'column_name'``
71 | says that `column_name` in the input dataframe should be
72 | included
73 | ``'-column_name'``
74 | says that `column_name` in the input dataframe should be
75 | excluded.
76 |
77 | If any `'-column_name'` is present, then `'*'` should be
78 | present, and if `'*'` is present, no 'column_name' should be
79 | present. Column-includes and column-excludes should not overlap.
80 | `**kwargs`: Each argument name will be the name of a new column in the
81 | output dataframe, with the column's contents determined by the
82 | argument contents. These contents can be given as a dataframe, a
83 | function (taking the input dataframe as its single argument), or a
84 | symbolic expression (taking the input dataframe as ``Symbol(0)``).
85 | kwarg-provided columns override arg-provided columns.
86 |
87 | Example:
88 | >>> flights.ply_select('*',
89 | ... gain = X.arr_delay - X.dep_delay,
90 | ... speed = X.distance / X.air_time * 60)
91 | [ original dataframe, with two new computed columns added ]
92 | """
93 |
94 | input_columns = set(self.columns)
95 |
96 | has_star = False
97 | include_columns = []
98 | exclude_columns = []
99 | for arg in args:
100 | if arg == '*':
101 | if has_star:
102 | raise ValueError('ply_select received repeated stars')
103 | has_star = True
104 | elif arg in input_columns:
105 | if arg in include_columns:
106 | raise ValueError(
107 | 'ply_select received a repeated column-include (%s)' %
108 | arg)
109 | include_columns.append(arg)
110 | elif arg[0] == '-' and arg[1:] in input_columns:
111 | if arg in exclude_columns:
112 | raise ValueError(
113 | 'ply_select received a repeated column-exclude (%s)' %
114 | arg[1:])
115 | exclude_columns.append(arg[1:])
116 | else:
117 | raise ValueError(
118 | 'ply_select received a strange argument (%s)' %
119 | arg)
120 | if exclude_columns and not has_star:
121 | raise ValueError(
122 | 'ply_select received column-excludes without a star')
123 | if has_star and include_columns:
124 | raise ValueError(
125 | 'ply_select received both a star and column-includes')
126 | if set(include_columns) & set(exclude_columns):
127 | raise ValueError(
128 | 'ply_select received overlapping column-includes and ' +
129 | 'column-excludes')
130 |
131 | include_columns_inc_star = self.columns if has_star else include_columns
132 |
133 | output_columns = [col for col in include_columns_inc_star
134 | if col not in exclude_columns]
135 |
136 | # Note: This maintains self's index even if output_columns is [].
137 | to_return = self[output_columns]
138 |
139 | # Temporarily disable SettingWithCopyWarning, as setting columns on a
140 | # copy (`to_return`) is intended here.
141 | with pandas.option_context('mode.chained_assignment', None):
142 |
143 | for column_name, column_value in iteritems(kwargs):
144 | evaluated_value = symbolic.to_callable(column_value)(self)
145 | # TODO: verify that evaluated_value is a series!
146 | if column_name == 'index':
147 | to_return.index = evaluated_value
148 | else:
149 | to_return[column_name] = evaluated_value
150 |
151 | return to_return
152 |
153 |
154 | # TODO: Ensure that an empty ply_select on a groupby returns a large dataframe
155 | def _ply_select_for_groups(self, **kwargs):
156 | """Summarize a grouped dataframe or series.
157 |
158 | Analogous to SQL's ``SELECT`` (when a ``GROUP BY`` is present), or dplyr's
159 | ``summarise``.
160 |
161 | Args:
162 | `**kwargs`: Each argument name will be the name of a new column in the
163 | output dataframe, with the column's contents determined by the
164 | argument contents. These contents can be given as a dataframe, a
165 | function (taking the input grouped dataframe as its single
166 | argument), or a symbolic expression (taking the input grouped
167 | dataframe as `Symbol(0)`).
168 | """
169 |
170 | to_return = pandas.DataFrame()
171 |
172 | for column_name, column_value in iteritems(kwargs):
173 | evaluated_value = symbolic.to_callable(column_value)(self)
174 | if column_name == 'index':
175 | to_return.index = evaluated_value
176 | else:
177 | to_return[column_name] = evaluated_value
178 |
179 | return to_return
180 |
181 |
182 | class PlyDataFrame:
183 | """The following methods are added to `pandas.DataFrame`:"""
184 |
185 | ply_where = _ply_where
186 | ply_select = _ply_select
187 |
188 |
189 | class PlySeries:
190 | """The following methods are added to `pandas.Series`:"""
191 |
192 | ply_where = _ply_where
193 |
194 |
195 | class PlyDataFrameGroupBy:
196 | """The following methods are added to
197 | `pandas.core.groupby.DataFrameGroupBy`:"""
198 |
199 | ply_select = _ply_select_for_groups
200 |
201 |
202 | class PlySeriesGroupBy:
203 | """The following methods are added to
204 | `pandas.core.groupby.SeriesGroupBy`:"""
205 |
206 | ply_select = _ply_select_for_groups
207 |
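
The column-selection rules that `_ply_select` documents (`'*'`, `'column_name'`, `'-column_name'`) can be illustrated without pandas. The following is only a minimal sketch: `resolve_columns` is a hypothetical helper, not part of pandas-ply, and it skips the repeated-argument and unknown-column checks that the real method performs.

```python
def resolve_columns(all_columns, args):
    """Return the output column order for ply_select-style string arguments.

    Hypothetical helper for illustration only; it mirrors the star /
    include / exclude rules but not every validation in _ply_select.
    """
    has_star = '*' in args
    includes = [a for a in args if a != '*' and not a.startswith('-')]
    excludes = [a[1:] for a in args if a.startswith('-')]

    # Excludes only make sense relative to "all columns".
    if excludes and not has_star:
        raise ValueError('column-excludes require a star')
    # A star already includes everything, so explicit includes are rejected.
    if has_star and includes:
        raise ValueError('a star cannot be combined with column-includes')

    base = list(all_columns) if has_star else includes
    return [c for c in base if c not in excludes]


columns = ['year', 'dep_delay', 'arr_delay']
everything_but_year = resolve_columns(columns, ['*', '-year'])
reordered = resolve_columns(columns, ['arr_delay', 'year'])
```

With a star, the output keeps the input dataframe's column order; with explicit includes, it follows the order of the arguments.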
--------------------------------------------------------------------------------
/pandas_ply/symbolic.py:
--------------------------------------------------------------------------------
1 | """`ply.symbolic` is a simple system for building "symbolic expressions" to
2 | provide as arguments to **pandas-ply**'s methods (in place of lambda
3 | expressions)."""
4 |
5 | from .vendor.six import print_
6 | from .vendor.six import iteritems
7 |
8 |
9 | class Expression(object):
10 | """`Expression` is the (abstract) base class for symbolic expressions.
11 | Symbolic expressions are encoded representations of Python expressions,
12 | kept on ice until you are ready to evaluate them. Operations on
13 | symbolic expressions (like `my_expr.some_attr` or `my_expr(some_arg)` or
14 | `my_expr + 7`) are automatically turned into symbolic representations
15 | thereof -- nothing is actually done until the special evaluation method
16 | `_eval` is called.
17 | """
18 |
19 | def _eval(self, context, **options):
20 | """Evaluate a symbolic expression.
21 |
22 | Args:
23 | context: The context object for evaluation. Currently, this is a
24 | dictionary mapping symbol names to values.
25 | `**options`: Options for evaluation. Currently, the only option is
26 | `log`, which results in some debug output during evaluation if
27 | it is set to `True`.
28 |
29 | Returns:
30 | anything
31 | """
32 | raise NotImplementedError
33 |
34 | def __repr__(self):
35 | raise NotImplementedError
36 |
37 | def __getattr__(self, name):
38 | """Construct a symbolic representation of `getattr(self, name)`."""
39 | return GetAttr(self, name)
40 |
41 | def __call__(self, *args, **kwargs):
42 | """Construct a symbolic representation of `self(*args, **kwargs)`."""
43 | return Call(self, args=args, kwargs=kwargs)
44 |
45 | # New-style classes skip __getattr__ for magic methods, so we must add them
46 | # explicitly:
47 |
48 | _magic_method_names = [
49 | '__abs__', '__add__', '__and__', '__cmp__', '__complex__', '__contains__',
50 | '__delattr__', '__delete__', '__delitem__', '__delslice__', '__div__',
51 | '__divmod__', '__enter__', '__eq__', '__exit__', '__float__',
52 | '__floordiv__', '__ge__', '__get__', '__getitem__', '__getslice__',
53 | '__gt__', '__hash__', '__hex__', '__iadd__', '__iand__', '__idiv__',
54 | '__ifloordiv__', '__ilshift__', '__imod__', '__imul__', '__index__',
55 | '__int__', '__invert__', '__ior__', '__ipow__', '__irshift__', '__isub__',
56 | '__iter__', '__itruediv__', '__ixor__', '__le__', '__len__', '__long__',
57 | '__lshift__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__',
58 | '__nonzero__', '__oct__', '__or__', '__pos__', '__pow__', '__radd__',
59 | '__rand__', '__rcmp__', '__rdiv__', '__rdivmod__', '__repr__',
60 | '__reversed__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__',
61 | '__ror__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__',
62 | '__rtruediv__', '__rxor__', '__set__', '__setitem__', '__setslice__',
63 | '__str__', '__sub__', '__truediv__', '__unicode__', '__xor__',
64 | ]
65 |
66 | # Not included: [
67 | # '__call__', '__coerce__', '__del__', '__dict__', '__getattr__',
68 | # '__getattribute__', '__init__', '__new__', '__setattr__'
69 | # ]
70 |
71 | def _get_sym_magic_method(name):
72 | def magic_method(self, *args, **kwargs):
73 | return Call(GetAttr(self, name), args, kwargs)
74 | return magic_method
75 |
76 | for name in _magic_method_names:
77 | setattr(Expression, name, _get_sym_magic_method(name))
78 |
79 |
80 | # Here are the varieties of atomic / compound Expression.
81 |
82 |
83 | class Symbol(Expression):
84 | """`Symbol(name)` is an atomic symbolic expression, labelled with an
85 | arbitrary `name`."""
86 |
87 | def __init__(self, name):
88 | self._name = name
89 |
90 | def _eval(self, context, **options):
91 | if options.get('log'):
92 | print_('Symbol._eval', repr(self))
93 | result = context[self._name]
94 | if options.get('log'):
95 | print_('Returning', repr(self), '=>', repr(result))
96 | return result
97 |
98 | def __repr__(self):
99 | return 'Symbol(%s)' % repr(self._name)
100 |
101 |
102 | class GetAttr(Expression):
103 | """`GetAttr(obj, name)` is a symbolic expression representing the result of
104 | `getattr(obj, name)`. (`obj` and `name` can themselves be symbolic.)"""
105 |
106 | def __init__(self, obj, name):
107 | self._obj = obj
108 | self._name = name
109 |
110 | def _eval(self, context, **options):
111 | if options.get('log'):
112 | print_('GetAttr._eval', repr(self))
113 | evaled_obj = eval_if_symbolic(self._obj, context, **options)
114 | result = getattr(evaled_obj, self._name)
115 | if options.get('log'):
116 | print_('Returning', repr(self), '=>', repr(result))
117 | return result
118 |
119 | def __repr__(self):
120 | return 'getattr(%s, %s)' % (repr(self._obj), repr(self._name))
121 |
122 |
123 | class Call(Expression):
124 | """`Call(func, args, kwargs)` is a symbolic expression representing the
125 | result of `func(*args, **kwargs)`. (`func`, each member of the `args`
126 | iterable, and each value in the `kwargs` dictionary can themselves be
127 | symbolic)."""
128 |
129 | def __init__(self, func, args=(), kwargs=None):
130 | self._func = func
131 | self._args = args
132 | self._kwargs = {} if kwargs is None else kwargs
133 |
134 | def _eval(self, context, **options):
135 | if options.get('log'):
136 | print_('Call._eval', repr(self))
137 | evaled_func = eval_if_symbolic(self._func, context, **options)
138 | evaled_args = [eval_if_symbolic(v, context, **options)
139 | for v in self._args]
140 | evaled_kwargs = dict((k, eval_if_symbolic(v, context, **options))
141 | for k, v in iteritems(self._kwargs))
142 | result = evaled_func(*evaled_args, **evaled_kwargs)
143 | if options.get('log'):
144 | print_('Returning', repr(self), '=>', repr(result))
145 | return result
146 |
147 | def __repr__(self):
148 | return '{func}(*{args}, **{kwargs})'.format(
149 | func=repr(self._func),
150 | args=repr(self._args),
151 | kwargs=repr(self._kwargs))
152 |
153 |
154 | def eval_if_symbolic(obj, context, **options):
155 | """Evaluate an object if it is a symbolic expression; otherwise return
156 | it unchanged.
157 |
158 | Args:
159 | obj: Either a symbolic expression, or anything else (in which case this
160 | is a noop).
161 | context: Passed as an argument to `obj._eval` if `obj` is symbolic.
162 | `**options`: Passed as arguments to `obj._eval` if `obj` is symbolic.
163 |
164 | Returns:
165 | anything
166 |
167 | Examples:
168 | >>> eval_if_symbolic(Symbol('x'), {'x': 10})
169 | 10
170 | >>> eval_if_symbolic(7, {'x': 10})
171 | 7
172 | """
173 | return obj._eval(context, **options) if hasattr(obj, '_eval') else obj
174 |
175 |
176 | def to_callable(obj):
177 | """Turn an object into a callable.
178 |
179 | Args:
180 | obj: This can be
181 |
182 | * **a symbolic expression**, in which case the output callable
183 | evaluates the expression with symbols taking values from the
184 | callable's arguments (listed arguments named according to their
185 | numerical index, keyword arguments named according to their
186 | string keys),
187 | * **a callable**, in which case the output callable is just the
188 | input object, or
189 | * **anything else**, in which case the output callable is a
190 | constant function which always returns the input object.
191 |
192 | Returns:
193 | callable
194 |
195 | Examples:
196 | >>> to_callable(Symbol(0) + Symbol('x'))(3, x=4)
197 | 7
198 | >>> to_callable(lambda x: x + 1)(10)
199 | 11
200 | >>> to_callable(12)(3, x=4)
201 | 12
202 | """
203 | if hasattr(obj, '_eval'):
204 | return lambda *args, **kwargs: obj._eval(dict(enumerate(args), **kwargs))
205 | elif callable(obj):
206 | return obj
207 | else:
208 | return lambda *args, **kwargs: obj
209 |
210 |
211 | def sym_call(func, *args, **kwargs):
212 | """Construct a symbolic representation of `func(*args, **kwargs)`.
213 |
214 | This is necessary because `func(symbolic)` will not (ordinarily) know to
215 | construct a symbolic expression when it receives the symbolic
216 | expression `symbolic` as a parameter (if `func` is not itself symbolic).
217 | So instead, we write `sym_call(func, symbolic)`.
218 |
219 | Tip: If the main argument of the function is a (symbolic) DataFrame, then
220 | pandas' `pipe` method takes care of this problem without `sym_call`. For
221 | instance, while `np.sqrt(X)` won't work, `X.pipe(np.sqrt)` will.
222 |
223 | Args:
224 | func: Function to call on evaluation (can be symbolic).
225 | `*args`: Arguments to provide to `func` on evaluation (can be symbolic).
226 | `**kwargs`: Keyword arguments to provide to `func` on evaluation (can be
227 | symbolic).
228 |
229 | Returns:
230 | `ply.symbolic.Expression`
231 |
232 | Example:
233 | >>> sym_call(math.sqrt, Symbol('x'))._eval({'x': 16})
234 | 4.0
235 | """
236 |
237 | return Call(func, args=args, kwargs=kwargs)
238 |
239 | X = Symbol(0)
240 | """A Symbol for "the first argument" (for convenience)."""
241 |
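
To make the deferred-evaluation idea concrete, here is a stripped-down sketch of the same mechanism. It is an assumption-laden miniature, not the module above: the names `Expr` and `evaluate` are invented for illustration, only `__add__` is wired up as a magic method, and the logging options are omitted.

```python
class Expr(object):
    """Base class: operations build new expressions instead of running."""

    def __getattr__(self, name):
        # Reached only when normal attribute lookup fails.
        return GetAttr(self, name)

    def __call__(self, *args, **kwargs):
        return Call(self, args, kwargs)

    def __add__(self, other):
        # Magic methods bypass __getattr__, so they are defined explicitly.
        return Call(GetAttr(self, '__add__'), (other,), {})


class Symbol(Expr):
    def __init__(self, name):
        self._name = name

    def _eval(self, context):
        return context[self._name]


class GetAttr(Expr):
    def __init__(self, obj, attr):
        self._obj = obj
        self._attr = attr

    def _eval(self, context):
        return getattr(evaluate(self._obj, context), self._attr)


class Call(Expr):
    def __init__(self, func, args, kwargs):
        self._func = func
        self._args = args
        self._kwargs = kwargs

    def _eval(self, context):
        func = evaluate(self._func, context)
        args = [evaluate(a, context) for a in self._args]
        kwargs = dict((k, evaluate(v, context))
                      for k, v in self._kwargs.items())
        return func(*args, **kwargs)


def evaluate(obj, context):
    """Evaluate symbolic objects; pass anything else through unchanged."""
    return obj._eval(context) if isinstance(obj, Expr) else obj


def to_callable(expr):
    # Positional arguments are keyed by index, keyword arguments by name.
    return lambda *args, **kwargs: evaluate(
        expr, dict(enumerate(args), **kwargs))


# Nothing runs until evaluate() is called with a context.
shout = Symbol(0).upper() + Symbol('suffix')
result = evaluate(shout, {0: 'abc', 'suffix': '!'})
total = to_callable(Symbol(0) + Symbol('x'))(3, x=4)
```

Because `+` is recorded as a call to the left operand's `__add__`, mixed-type arithmetic that relies on reflected methods (for example `int + float`) may not behave as it would in plain Python; the sketch shares that limitation by design.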
--------------------------------------------------------------------------------
/pandas_ply/vendor/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/coursera/pandas-ply/2a043c620b35ffb32247c559e1b46cbe22064ebd/pandas_ply/vendor/__init__.py
--------------------------------------------------------------------------------
/pandas_ply/vendor/six.py:
--------------------------------------------------------------------------------
1 | """Utilities for writing code that runs on Python 2 and 3"""
2 |
3 | # Copyright (c) 2010-2014 Benjamin Peterson
4 | #
5 | # Permission is hereby granted, free of charge, to any person obtaining a copy
6 | # of this software and associated documentation files (the "Software"), to deal
7 | # in the Software without restriction, including without limitation the rights
8 | # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | # copies of the Software, and to permit persons to whom the Software is
10 | # furnished to do so, subject to the following conditions:
11 | #
12 | # The above copyright notice and this permission notice shall be included in all
13 | # copies or substantial portions of the Software.
14 | #
15 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | # SOFTWARE.
22 |
23 | from __future__ import absolute_import
24 |
25 | import functools
26 | import operator
27 | import sys
28 | import types
29 |
30 | __author__ = "Benjamin Peterson "
31 | __version__ = "1.8.0"
32 |
33 |
34 | # Useful for very coarse version differentiation.
35 | PY2 = sys.version_info[0] == 2
36 | PY3 = sys.version_info[0] == 3
37 |
38 | if PY3:
39 | string_types = str,
40 | integer_types = int,
41 | class_types = type,
42 | text_type = str
43 | binary_type = bytes
44 |
45 | MAXSIZE = sys.maxsize
46 | else:
47 | string_types = basestring,
48 | integer_types = (int, long)
49 | class_types = (type, types.ClassType)
50 | text_type = unicode
51 | binary_type = str
52 |
53 | if sys.platform.startswith("java"):
54 | # Jython always uses 32 bits.
55 | MAXSIZE = int((1 << 31) - 1)
56 | else:
57 | # It's possible to have sizeof(long) != sizeof(Py_ssize_t).
58 | class X(object):
59 | def __len__(self):
60 | return 1 << 31
61 | try:
62 | len(X())
63 | except OverflowError:
64 | # 32-bit
65 | MAXSIZE = int((1 << 31) - 1)
66 | else:
67 | # 64-bit
68 | MAXSIZE = int((1 << 63) - 1)
69 | del X
70 |
71 |
72 | def _add_doc(func, doc):
73 | """Add documentation to a function."""
74 | func.__doc__ = doc
75 |
76 |
77 | def _import_module(name):
78 | """Import module, returning the module after the last dot."""
79 | __import__(name)
80 | return sys.modules[name]
81 |
82 |
83 | class _LazyDescr(object):
84 |
85 | def __init__(self, name):
86 | self.name = name
87 |
88 | def __get__(self, obj, tp):
89 | result = self._resolve()
90 | setattr(obj, self.name, result) # Invokes __set__.
91 | # This is a bit ugly, but it avoids running this again.
92 | delattr(obj.__class__, self.name)
93 | return result
94 |
95 |
96 | class MovedModule(_LazyDescr):
97 |
98 | def __init__(self, name, old, new=None):
99 | super(MovedModule, self).__init__(name)
100 | if PY3:
101 | if new is None:
102 | new = name
103 | self.mod = new
104 | else:
105 | self.mod = old
106 |
107 | def _resolve(self):
108 | return _import_module(self.mod)
109 |
110 | def __getattr__(self, attr):
111 | _module = self._resolve()
112 | value = getattr(_module, attr)
113 | setattr(self, attr, value)
114 | return value
115 |
116 |
117 | class _LazyModule(types.ModuleType):
118 |
119 | def __init__(self, name):
120 | super(_LazyModule, self).__init__(name)
121 | self.__doc__ = self.__class__.__doc__
122 |
123 | def __dir__(self):
124 | attrs = ["__doc__", "__name__"]
125 | attrs += [attr.name for attr in self._moved_attributes]
126 | return attrs
127 |
128 | # Subclasses should override this
129 | _moved_attributes = []
130 |
131 |
132 | class MovedAttribute(_LazyDescr):
133 |
134 | def __init__(self, name, old_mod, new_mod, old_attr=None, new_attr=None):
135 | super(MovedAttribute, self).__init__(name)
136 | if PY3:
137 | if new_mod is None:
138 | new_mod = name
139 | self.mod = new_mod
140 | if new_attr is None:
141 | if old_attr is None:
142 | new_attr = name
143 | else:
144 | new_attr = old_attr
145 | self.attr = new_attr
146 | else:
147 | self.mod = old_mod
148 | if old_attr is None:
149 | old_attr = name
150 | self.attr = old_attr
151 |
152 | def _resolve(self):
153 | module = _import_module(self.mod)
154 | return getattr(module, self.attr)
155 |
156 |
157 | class _SixMetaPathImporter(object):
158 | """
159 | A meta path importer to import six.moves and its submodules.
160 |
161 | This class implements a PEP302 finder and loader. It should be compatible
162 | with Python 2.5 and all existing versions of Python3
163 | """
164 | def __init__(self, six_module_name):
165 | self.name = six_module_name
166 | self.known_modules = {}
167 |
168 | def _add_module(self, mod, *fullnames):
169 | for fullname in fullnames:
170 | self.known_modules[self.name + "." + fullname] = mod
171 |
172 | def _get_module(self, fullname):
173 | return self.known_modules[self.name + "." + fullname]
174 |
175 | def find_module(self, fullname, path=None):
176 | if fullname in self.known_modules:
177 | return self
178 | return None
179 |
180 | def __get_module(self, fullname):
181 | try:
182 | return self.known_modules[fullname]
183 | except KeyError:
184 | raise ImportError("This loader does not know module " + fullname)
185 |
186 | def load_module(self, fullname):
187 | try:
188 | # in case of a reload
189 | return sys.modules[fullname]
190 | except KeyError:
191 | pass
192 | mod = self.__get_module(fullname)
193 | if isinstance(mod, MovedModule):
194 | mod = mod._resolve()
195 | else:
196 | mod.__loader__ = self
197 | sys.modules[fullname] = mod
198 | return mod
199 |
200 | def is_package(self, fullname):
201 | """
202 | Return true, if the named module is a package.
203 |
204 | We need this method to get correct spec objects with
205 | Python 3.4 (see PEP451)
206 | """
207 | return hasattr(self.__get_module(fullname), "__path__")
208 |
209 | def get_code(self, fullname):
210 | """Return None
211 |
212 | Required, if is_package is implemented"""
213 | self.__get_module(fullname) # eventually raises ImportError
214 | return None
215 | get_source = get_code # same as get_code
216 |
217 | _importer = _SixMetaPathImporter(__name__)
218 |
219 |
220 | class _MovedItems(_LazyModule):
221 | """Lazy loading of moved objects"""
222 | __path__ = [] # mark as package
223 |
224 |
225 | _moved_attributes = [
226 | MovedAttribute("cStringIO", "cStringIO", "io", "StringIO"),
227 | MovedAttribute("filter", "itertools", "builtins", "ifilter", "filter"),
228 | MovedAttribute("filterfalse", "itertools", "itertools", "ifilterfalse", "filterfalse"),
229 | MovedAttribute("input", "__builtin__", "builtins", "raw_input", "input"),
230 | MovedAttribute("intern", "__builtin__", "sys"),
231 | MovedAttribute("map", "itertools", "builtins", "imap", "map"),
232 | MovedAttribute("range", "__builtin__", "builtins", "xrange", "range"),
233 | MovedAttribute("reload_module", "__builtin__", "imp", "reload"),
234 | MovedAttribute("reduce", "__builtin__", "functools"),
235 | MovedAttribute("shlex_quote", "pipes", "shlex", "quote"),
236 | MovedAttribute("StringIO", "StringIO", "io"),
237 | MovedAttribute("UserDict", "UserDict", "collections"),
238 | MovedAttribute("UserList", "UserList", "collections"),
239 | MovedAttribute("UserString", "UserString", "collections"),
240 | MovedAttribute("xrange", "__builtin__", "builtins", "xrange", "range"),
241 | MovedAttribute("zip", "itertools", "builtins", "izip", "zip"),
242 | MovedAttribute("zip_longest", "itertools", "itertools", "izip_longest", "zip_longest"),
243 |
244 | MovedModule("builtins", "__builtin__"),
245 | MovedModule("configparser", "ConfigParser"),
246 | MovedModule("copyreg", "copy_reg"),
247 | MovedModule("dbm_gnu", "gdbm", "dbm.gnu"),
248 | MovedModule("_dummy_thread", "dummy_thread", "_dummy_thread"),
249 | MovedModule("http_cookiejar", "cookielib", "http.cookiejar"),
250 | MovedModule("http_cookies", "Cookie", "http.cookies"),
251 | MovedModule("html_entities", "htmlentitydefs", "html.entities"),
252 | MovedModule("html_parser", "HTMLParser", "html.parser"),
253 | MovedModule("http_client", "httplib", "http.client"),
254 | MovedModule("email_mime_multipart", "email.MIMEMultipart", "email.mime.multipart"),
255 | MovedModule("email_mime_nonmultipart", "email.MIMENonMultipart", "email.mime.nonmultipart"),
256 | MovedModule("email_mime_text", "email.MIMEText", "email.mime.text"),
257 | MovedModule("email_mime_base", "email.MIMEBase", "email.mime.base"),
258 | MovedModule("BaseHTTPServer", "BaseHTTPServer", "http.server"),
259 | MovedModule("CGIHTTPServer", "CGIHTTPServer", "http.server"),
260 | MovedModule("SimpleHTTPServer", "SimpleHTTPServer", "http.server"),
261 | MovedModule("cPickle", "cPickle", "pickle"),
262 | MovedModule("queue", "Queue"),
263 | MovedModule("reprlib", "repr"),
264 | MovedModule("socketserver", "SocketServer"),
265 | MovedModule("_thread", "thread", "_thread"),
266 | MovedModule("tkinter", "Tkinter"),
267 | MovedModule("tkinter_dialog", "Dialog", "tkinter.dialog"),
268 | MovedModule("tkinter_filedialog", "FileDialog", "tkinter.filedialog"),
269 | MovedModule("tkinter_scrolledtext", "ScrolledText", "tkinter.scrolledtext"),
270 | MovedModule("tkinter_simpledialog", "SimpleDialog", "tkinter.simpledialog"),
271 | MovedModule("tkinter_tix", "Tix", "tkinter.tix"),
272 | MovedModule("tkinter_ttk", "ttk", "tkinter.ttk"),
273 | MovedModule("tkinter_constants", "Tkconstants", "tkinter.constants"),
274 | MovedModule("tkinter_dnd", "Tkdnd", "tkinter.dnd"),
275 | MovedModule("tkinter_colorchooser", "tkColorChooser",
276 | "tkinter.colorchooser"),
277 | MovedModule("tkinter_commondialog", "tkCommonDialog",
278 | "tkinter.commondialog"),
279 | MovedModule("tkinter_tkfiledialog", "tkFileDialog", "tkinter.filedialog"),
280 | MovedModule("tkinter_font", "tkFont", "tkinter.font"),
281 | MovedModule("tkinter_messagebox", "tkMessageBox", "tkinter.messagebox"),
282 | MovedModule("tkinter_tksimpledialog", "tkSimpleDialog",
283 | "tkinter.simpledialog"),
284 | MovedModule("urllib_parse", __name__ + ".moves.urllib_parse", "urllib.parse"),
285 | MovedModule("urllib_error", __name__ + ".moves.urllib_error", "urllib.error"),
286 | MovedModule("urllib", __name__ + ".moves.urllib", __name__ + ".moves.urllib"),
287 | MovedModule("urllib_robotparser", "robotparser", "urllib.robotparser"),
288 | MovedModule("xmlrpc_client", "xmlrpclib", "xmlrpc.client"),
289 | MovedModule("xmlrpc_server", "SimpleXMLRPCServer", "xmlrpc.server"),
290 | MovedModule("winreg", "_winreg"),
291 | ]
292 | for attr in _moved_attributes:
293 | setattr(_MovedItems, attr.name, attr)
294 | if isinstance(attr, MovedModule):
295 | _importer._add_module(attr, "moves." + attr.name)
296 | del attr
297 |
298 | _MovedItems._moved_attributes = _moved_attributes
299 |
300 | moves = _MovedItems(__name__ + ".moves")
301 | _importer._add_module(moves, "moves")
302 |
303 |
304 | class Module_six_moves_urllib_parse(_LazyModule):
305 | """Lazy loading of moved objects in six.moves.urllib_parse"""
306 |
307 |
308 | _urllib_parse_moved_attributes = [
309 | MovedAttribute("ParseResult", "urlparse", "urllib.parse"),
310 | MovedAttribute("SplitResult", "urlparse", "urllib.parse"),
311 | MovedAttribute("parse_qs", "urlparse", "urllib.parse"),
312 | MovedAttribute("parse_qsl", "urlparse", "urllib.parse"),
313 | MovedAttribute("urldefrag", "urlparse", "urllib.parse"),
314 | MovedAttribute("urljoin", "urlparse", "urllib.parse"),
315 | MovedAttribute("urlparse", "urlparse", "urllib.parse"),
316 | MovedAttribute("urlsplit", "urlparse", "urllib.parse"),
317 | MovedAttribute("urlunparse", "urlparse", "urllib.parse"),
318 | MovedAttribute("urlunsplit", "urlparse", "urllib.parse"),
319 | MovedAttribute("quote", "urllib", "urllib.parse"),
320 | MovedAttribute("quote_plus", "urllib", "urllib.parse"),
321 | MovedAttribute("unquote", "urllib", "urllib.parse"),
322 | MovedAttribute("unquote_plus", "urllib", "urllib.parse"),
323 | MovedAttribute("urlencode", "urllib", "urllib.parse"),
324 | MovedAttribute("splitquery", "urllib", "urllib.parse"),
325 | MovedAttribute("splittag", "urllib", "urllib.parse"),
326 | MovedAttribute("splituser", "urllib", "urllib.parse"),
327 | MovedAttribute("uses_fragment", "urlparse", "urllib.parse"),
328 | MovedAttribute("uses_netloc", "urlparse", "urllib.parse"),
329 | MovedAttribute("uses_params", "urlparse", "urllib.parse"),
330 | MovedAttribute("uses_query", "urlparse", "urllib.parse"),
331 | MovedAttribute("uses_relative", "urlparse", "urllib.parse"),
332 | ]
333 | for attr in _urllib_parse_moved_attributes:
334 | setattr(Module_six_moves_urllib_parse, attr.name, attr)
335 | del attr
336 |
337 | Module_six_moves_urllib_parse._moved_attributes = _urllib_parse_moved_attributes
338 |
339 | _importer._add_module(Module_six_moves_urllib_parse(__name__ + ".moves.urllib_parse"),
340 | "moves.urllib_parse", "moves.urllib.parse")
341 |
342 |
343 | class Module_six_moves_urllib_error(_LazyModule):
344 | """Lazy loading of moved objects in six.moves.urllib_error"""
345 |
346 |
347 | _urllib_error_moved_attributes = [
348 | MovedAttribute("URLError", "urllib2", "urllib.error"),
349 | MovedAttribute("HTTPError", "urllib2", "urllib.error"),
350 | MovedAttribute("ContentTooShortError", "urllib", "urllib.error"),
351 | ]
352 | for attr in _urllib_error_moved_attributes:
353 | setattr(Module_six_moves_urllib_error, attr.name, attr)
354 | del attr
355 |
356 | Module_six_moves_urllib_error._moved_attributes = _urllib_error_moved_attributes
357 |
358 | _importer._add_module(Module_six_moves_urllib_error(__name__ + ".moves.urllib.error"),
359 | "moves.urllib_error", "moves.urllib.error")
360 |
361 |
362 | class Module_six_moves_urllib_request(_LazyModule):
363 | """Lazy loading of moved objects in six.moves.urllib_request"""
364 |
365 |
366 | _urllib_request_moved_attributes = [
367 | MovedAttribute("urlopen", "urllib2", "urllib.request"),
368 | MovedAttribute("install_opener", "urllib2", "urllib.request"),
369 | MovedAttribute("build_opener", "urllib2", "urllib.request"),
370 | MovedAttribute("pathname2url", "urllib", "urllib.request"),
371 | MovedAttribute("url2pathname", "urllib", "urllib.request"),
372 | MovedAttribute("getproxies", "urllib", "urllib.request"),
373 | MovedAttribute("Request", "urllib2", "urllib.request"),
374 | MovedAttribute("OpenerDirector", "urllib2", "urllib.request"),
375 | MovedAttribute("HTTPDefaultErrorHandler", "urllib2", "urllib.request"),
376 | MovedAttribute("HTTPRedirectHandler", "urllib2", "urllib.request"),
377 | MovedAttribute("HTTPCookieProcessor", "urllib2", "urllib.request"),
378 | MovedAttribute("ProxyHandler", "urllib2", "urllib.request"),
379 | MovedAttribute("BaseHandler", "urllib2", "urllib.request"),
380 | MovedAttribute("HTTPPasswordMgr", "urllib2", "urllib.request"),
381 | MovedAttribute("HTTPPasswordMgrWithDefaultRealm", "urllib2", "urllib.request"),
382 | MovedAttribute("AbstractBasicAuthHandler", "urllib2", "urllib.request"),
383 | MovedAttribute("HTTPBasicAuthHandler", "urllib2", "urllib.request"),
384 | MovedAttribute("ProxyBasicAuthHandler", "urllib2", "urllib.request"),
385 | MovedAttribute("AbstractDigestAuthHandler", "urllib2", "urllib.request"),
386 | MovedAttribute("HTTPDigestAuthHandler", "urllib2", "urllib.request"),
387 | MovedAttribute("ProxyDigestAuthHandler", "urllib2", "urllib.request"),
388 | MovedAttribute("HTTPHandler", "urllib2", "urllib.request"),
389 | MovedAttribute("HTTPSHandler", "urllib2", "urllib.request"),
390 | MovedAttribute("FileHandler", "urllib2", "urllib.request"),
391 | MovedAttribute("FTPHandler", "urllib2", "urllib.request"),
392 | MovedAttribute("CacheFTPHandler", "urllib2", "urllib.request"),
393 | MovedAttribute("UnknownHandler", "urllib2", "urllib.request"),
394 | MovedAttribute("HTTPErrorProcessor", "urllib2", "urllib.request"),
395 | MovedAttribute("urlretrieve", "urllib", "urllib.request"),
396 | MovedAttribute("urlcleanup", "urllib", "urllib.request"),
397 | MovedAttribute("URLopener", "urllib", "urllib.request"),
398 | MovedAttribute("FancyURLopener", "urllib", "urllib.request"),
399 | MovedAttribute("proxy_bypass", "urllib", "urllib.request"),
400 | ]
401 | for attr in _urllib_request_moved_attributes:
402 | setattr(Module_six_moves_urllib_request, attr.name, attr)
403 | del attr
404 |
405 | Module_six_moves_urllib_request._moved_attributes = _urllib_request_moved_attributes
406 |
407 | _importer._add_module(Module_six_moves_urllib_request(__name__ + ".moves.urllib.request"),
408 | "moves.urllib_request", "moves.urllib.request")
409 |
410 |
411 | class Module_six_moves_urllib_response(_LazyModule):
412 | """Lazy loading of moved objects in six.moves.urllib_response"""
413 |
414 |
415 | _urllib_response_moved_attributes = [
416 | MovedAttribute("addbase", "urllib", "urllib.response"),
417 | MovedAttribute("addclosehook", "urllib", "urllib.response"),
418 | MovedAttribute("addinfo", "urllib", "urllib.response"),
419 | MovedAttribute("addinfourl", "urllib", "urllib.response"),
420 | ]
421 | for attr in _urllib_response_moved_attributes:
422 | setattr(Module_six_moves_urllib_response, attr.name, attr)
423 | del attr
424 |
425 | Module_six_moves_urllib_response._moved_attributes = _urllib_response_moved_attributes
426 |
427 | _importer._add_module(Module_six_moves_urllib_response(__name__ + ".moves.urllib.response"),
428 | "moves.urllib_response", "moves.urllib.response")
429 |
430 |
431 | class Module_six_moves_urllib_robotparser(_LazyModule):
432 | """Lazy loading of moved objects in six.moves.urllib_robotparser"""
433 |
434 |
435 | _urllib_robotparser_moved_attributes = [
436 | MovedAttribute("RobotFileParser", "robotparser", "urllib.robotparser"),
437 | ]
438 | for attr in _urllib_robotparser_moved_attributes:
439 | setattr(Module_six_moves_urllib_robotparser, attr.name, attr)
440 | del attr
441 |
442 | Module_six_moves_urllib_robotparser._moved_attributes = _urllib_robotparser_moved_attributes
443 |
444 | _importer._add_module(Module_six_moves_urllib_robotparser(__name__ + ".moves.urllib.robotparser"),
445 | "moves.urllib_robotparser", "moves.urllib.robotparser")
446 |
447 |
448 | class Module_six_moves_urllib(types.ModuleType):
449 | """Create a six.moves.urllib namespace that resembles the Python 3 namespace"""
450 | __path__ = [] # mark as package
451 | parse = _importer._get_module("moves.urllib_parse")
452 | error = _importer._get_module("moves.urllib_error")
453 | request = _importer._get_module("moves.urllib_request")
454 | response = _importer._get_module("moves.urllib_response")
455 | robotparser = _importer._get_module("moves.urllib_robotparser")
456 |
457 | def __dir__(self):
458 | return ['parse', 'error', 'request', 'response', 'robotparser']
459 |
460 | _importer._add_module(Module_six_moves_urllib(__name__ + ".moves.urllib"),
461 | "moves.urllib")
462 |
463 |
464 | def add_move(move):
465 | """Add an item to six.moves."""
466 | setattr(_MovedItems, move.name, move)
467 |
468 |
469 | def remove_move(name):
470 | """Remove item from six.moves."""
471 | try:
472 | delattr(_MovedItems, name)
473 | except AttributeError:
474 | try:
475 | del moves.__dict__[name]
476 | except KeyError:
477 | raise AttributeError("no such move, %r" % (name,))
478 |
479 |
480 | if PY3:
481 | _meth_func = "__func__"
482 | _meth_self = "__self__"
483 |
484 | _func_closure = "__closure__"
485 | _func_code = "__code__"
486 | _func_defaults = "__defaults__"
487 | _func_globals = "__globals__"
488 | else:
489 | _meth_func = "im_func"
490 | _meth_self = "im_self"
491 |
492 | _func_closure = "func_closure"
493 | _func_code = "func_code"
494 | _func_defaults = "func_defaults"
495 | _func_globals = "func_globals"
496 |
497 |
498 | try:
499 | advance_iterator = next
500 | except NameError:
501 | def advance_iterator(it):
502 | return it.next()
503 | next = advance_iterator
504 |
505 |
506 | try:
507 | callable = callable
508 | except NameError:
509 | def callable(obj):
510 | return any("__call__" in klass.__dict__ for klass in type(obj).__mro__)
511 |
512 |
513 | if PY3:
514 | def get_unbound_function(unbound):
515 | return unbound
516 |
517 | create_bound_method = types.MethodType
518 |
519 | Iterator = object
520 | else:
521 | def get_unbound_function(unbound):
522 | return unbound.im_func
523 |
524 | def create_bound_method(func, obj):
525 | return types.MethodType(func, obj, obj.__class__)
526 |
527 | class Iterator(object):
528 |
529 | def next(self):
530 | return type(self).__next__(self)
531 |
532 | callable = callable
533 | _add_doc(get_unbound_function,
534 | """Get the function out of a possibly unbound function""")
535 |
536 |
537 | get_method_function = operator.attrgetter(_meth_func)
538 | get_method_self = operator.attrgetter(_meth_self)
539 | get_function_closure = operator.attrgetter(_func_closure)
540 | get_function_code = operator.attrgetter(_func_code)
541 | get_function_defaults = operator.attrgetter(_func_defaults)
542 | get_function_globals = operator.attrgetter(_func_globals)
543 |
544 |
545 | if PY3:
546 | def iterkeys(d, **kw):
547 | return iter(d.keys(**kw))
548 |
549 | def itervalues(d, **kw):
550 | return iter(d.values(**kw))
551 |
552 | def iteritems(d, **kw):
553 | return iter(d.items(**kw))
554 |
555 | def iterlists(d, **kw):
556 | return iter(d.lists(**kw))
557 | else:
558 | def iterkeys(d, **kw):
559 | return iter(d.iterkeys(**kw))
560 |
561 | def itervalues(d, **kw):
562 | return iter(d.itervalues(**kw))
563 |
564 | def iteritems(d, **kw):
565 | return iter(d.iteritems(**kw))
566 |
567 | def iterlists(d, **kw):
568 | return iter(d.iterlists(**kw))
569 |
570 | _add_doc(iterkeys, "Return an iterator over the keys of a dictionary.")
571 | _add_doc(itervalues, "Return an iterator over the values of a dictionary.")
572 | _add_doc(iteritems,
573 | "Return an iterator over the (key, value) pairs of a dictionary.")
574 | _add_doc(iterlists,
575 | "Return an iterator over the (key, [values]) pairs of a dictionary.")
576 |
577 |
578 | if PY3:
579 | def b(s):
580 | return s.encode("latin-1")
581 | def u(s):
582 | return s
583 | unichr = chr
584 | if sys.version_info[1] <= 1:
585 | def int2byte(i):
586 | return bytes((i,))
587 | else:
588 | # This is about 2x faster than the implementation above on 3.2+
589 | int2byte = operator.methodcaller("to_bytes", 1, "big")
590 | byte2int = operator.itemgetter(0)
591 | indexbytes = operator.getitem
592 | iterbytes = iter
593 | import io
594 | StringIO = io.StringIO
595 | BytesIO = io.BytesIO
596 | else:
597 | def b(s):
598 | return s
599 | # Workaround for standalone backslash
600 | def u(s):
601 | return unicode(s.replace(r'\\', r'\\\\'), "unicode_escape")
602 | unichr = unichr
603 | int2byte = chr
604 | def byte2int(bs):
605 | return ord(bs[0])
606 | def indexbytes(buf, i):
607 | return ord(buf[i])
608 | def iterbytes(buf):
609 | return (ord(byte) for byte in buf)
610 | import StringIO
611 | StringIO = BytesIO = StringIO.StringIO
612 | _add_doc(b, """Byte literal""")
613 | _add_doc(u, """Text literal""")
614 |
615 |
616 | if PY3:
617 | exec_ = getattr(moves.builtins, "exec")
618 |
619 |
620 | def reraise(tp, value, tb=None):
621 | if value is None:
622 | value = tp()
623 | if value.__traceback__ is not tb:
624 | raise value.with_traceback(tb)
625 | raise value
626 |
627 | else:
628 | def exec_(_code_, _globs_=None, _locs_=None):
629 | """Execute code in a namespace."""
630 | if _globs_ is None:
631 | frame = sys._getframe(1)
632 | _globs_ = frame.f_globals
633 | if _locs_ is None:
634 | _locs_ = frame.f_locals
635 | del frame
636 | elif _locs_ is None:
637 | _locs_ = _globs_
638 | exec("""exec _code_ in _globs_, _locs_""")
639 |
640 |
641 | exec_("""def reraise(tp, value, tb=None):
642 | raise tp, value, tb
643 | """)
644 |
645 |
646 | print_ = getattr(moves.builtins, "print", None)
647 | if print_ is None:
648 | def print_(*args, **kwargs):
649 | """The new-style print function for Python 2.4 and 2.5."""
650 | fp = kwargs.pop("file", sys.stdout)
651 | if fp is None:
652 | return
653 | def write(data):
654 | if not isinstance(data, basestring):
655 | data = str(data)
656 | # If the file has an encoding, encode unicode with it.
657 | if (isinstance(fp, file) and
658 | isinstance(data, unicode) and
659 | fp.encoding is not None):
660 | errors = getattr(fp, "errors", None)
661 | if errors is None:
662 | errors = "strict"
663 | data = data.encode(fp.encoding, errors)
664 | fp.write(data)
665 | want_unicode = False
666 | sep = kwargs.pop("sep", None)
667 | if sep is not None:
668 | if isinstance(sep, unicode):
669 | want_unicode = True
670 | elif not isinstance(sep, str):
671 | raise TypeError("sep must be None or a string")
672 | end = kwargs.pop("end", None)
673 | if end is not None:
674 | if isinstance(end, unicode):
675 | want_unicode = True
676 | elif not isinstance(end, str):
677 | raise TypeError("end must be None or a string")
678 | if kwargs:
679 | raise TypeError("invalid keyword arguments to print()")
680 | if not want_unicode:
681 | for arg in args:
682 | if isinstance(arg, unicode):
683 | want_unicode = True
684 | break
685 | if want_unicode:
686 | newline = unicode("\n")
687 | space = unicode(" ")
688 | else:
689 | newline = "\n"
690 | space = " "
691 | if sep is None:
692 | sep = space
693 | if end is None:
694 | end = newline
695 | for i, arg in enumerate(args):
696 | if i:
697 | write(sep)
698 | write(arg)
699 | write(end)
700 |
701 | _add_doc(reraise, """Reraise an exception.""")
702 |
703 | if sys.version_info[0:2] < (3, 4):
704 | def wraps(wrapped, assigned=functools.WRAPPER_ASSIGNMENTS,
705 | updated=functools.WRAPPER_UPDATES):
706 | def wrapper(f):
707 | f = functools.wraps(wrapped)(f)
708 | f.__wrapped__ = wrapped
709 | return f
710 | return wrapper
711 | else:
712 | wraps = functools.wraps
713 |
714 | def with_metaclass(meta, *bases):
715 | """Create a base class with a metaclass."""
716 | # This requires a bit of explanation: the basic idea is to make a dummy
717 | # metaclass for one level of class instantiation that replaces itself with
718 | # the actual metaclass.
719 | class metaclass(meta):
720 | def __new__(cls, name, this_bases, d):
721 | return meta(name, bases, d)
722 | return type.__new__(metaclass, 'temporary_class', (), {})
723 |
724 |
725 | def add_metaclass(metaclass):
726 | """Class decorator for creating a class with a metaclass."""
727 | def wrapper(cls):
728 | orig_vars = cls.__dict__.copy()
729 | slots = orig_vars.get('__slots__')
730 | if slots is not None:
731 | if isinstance(slots, str):
732 | slots = [slots]
733 | for slots_var in slots:
734 | orig_vars.pop(slots_var)
735 | orig_vars.pop('__dict__', None)
736 | orig_vars.pop('__weakref__', None)
737 | return metaclass(cls.__name__, cls.__bases__, orig_vars)
738 | return wrapper
739 |
740 | # Complete the moves implementation.
741 | # This code is at the end of this module to speed up module loading.
742 | # Turn this module into a package.
743 | __path__ = [] # required for PEP 302 and PEP 451
744 | __package__ = __name__ # see PEP 366 @ReservedAssignment
745 | if globals().get("__spec__") is not None:
746 | __spec__.submodule_search_locations = [] # PEP 451 @UndefinedVariable
747 | # Remove other six meta path importers, since they cause problems. This can
748 | # happen if six is removed from sys.modules and then reloaded. (Setuptools does
749 | # this for some reason.)
750 | if sys.meta_path:
751 | for i, importer in enumerate(sys.meta_path):
752 | # Here's some real nastiness: Another "instance" of the six module might
753 | # be floating around. Therefore, we can't use isinstance() to check for
754 | # the six meta path importer, since the other six instance will have
755 | # inserted an importer with different class.
756 | if (type(importer).__name__ == "_SixMetaPathImporter" and
757 | importer.name == __name__):
758 | del sys.meta_path[i]
759 | break
760 | del i, importer
761 | # Finally, add the importer to the meta path import hook.
762 | sys.meta_path.append(_importer)
763 |
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | from distutils.core import setup
2 | setup(
3 | name = 'pandas-ply',
4 | version = '0.2.1',
5 | author = 'Coursera Inc.',
6 | author_email = 'pandas-ply@coursera.org',
7 | packages = [
8 | 'pandas_ply',
9 | 'pandas_ply.vendor',
10 | ],
11 | description = 'functional data manipulation for pandas',
12 | long_description = open('README.rst').read(),
13 | license = 'Apache License 2.0',
14 | url = 'https://github.com/coursera/pandas-ply',
15 | classifiers = [],
16 | )
17 |
--------------------------------------------------------------------------------
/tests/test_methods.py:
--------------------------------------------------------------------------------
1 | import sys
2 | if sys.version_info < (2, 7):
3 | import unittest2 as unittest
4 | else:
5 | import unittest
6 |
7 | from pandas.util.testing import assert_frame_equal
8 | from pandas.util.testing import assert_series_equal
9 | from pandas_ply.methods import install_ply
10 | from pandas_ply.symbolic import X
11 | import pandas as pd
12 |
13 | install_ply(pd)
14 |
15 |
16 | def assert_frame_equiv(df1, df2, **kwargs):
17 | """ Assert that two dataframes are equal, ignoring ordering of columns.
18 |
19 | See http://stackoverflow.com/questions/14224172/equality-in-pandas-
20 | dataframes-column-order-matters
21 | """
22 | return assert_frame_equal(
23 | df1.sort_index(axis=1),
24 | df2.sort_index(axis=1),
25 | check_names=True, **kwargs)
26 |
27 | test_df = pd.DataFrame(
28 | {'x': [1, 2, 3, 4], 'y': [4, 3, 2, 1]},
29 | columns=['x', 'y'])
30 | test_series = pd.Series([1, 2, 3, 4])
31 |
32 | test_dfsq = pd.DataFrame(
33 | {'x': [-2, -1, 0, 1, 2], 'xsq': [4, 1, 0, 1, 4]},
34 | columns=['x', 'xsq'])
35 |
36 |
37 | class PlyWhereTest(unittest.TestCase):
38 |
39 | def test_no_conditions(self):
40 | assert_frame_equal(test_df.ply_where(), test_df)
41 |
42 | def test_single_condition(self):
43 | expected = pd.DataFrame(
44 | {'x': [3, 4], 'y': [2, 1]},
45 | index=[2, 3],
46 | columns=['x', 'y'])
47 |
48 | assert_frame_equal(test_df.ply_where(test_df.x > 2.5), expected)
49 | assert_frame_equal(test_df.ply_where(lambda df: df.x > 2.5), expected)
50 | assert_frame_equal(test_df.ply_where(X.x > 2.5), expected)
51 |
52 | def test_multiple_conditions(self):
53 | expected = pd.DataFrame(
54 | {'x': [2, 3], 'y': [3, 2]},
55 | index=[1, 2],
56 | columns=['x', 'y'])
57 |
58 | lo_df = test_df.x > 1.5
59 | hi_df = test_df.x < 3.5
60 | lo_func = lambda df: df.x > 1.5
61 | hi_func = lambda df: df.x < 3.5
62 | lo_sym = X.x > 1.5
63 | hi_sym = X.x < 3.5
64 |
65 | for lo in [lo_df, lo_func, lo_sym]:
66 | for hi in [hi_df, hi_func, hi_sym]:
67 | assert_frame_equal(test_df.ply_where(lo, hi), expected)
68 |
69 |
70 | class PlyWhereForSeriesTest(unittest.TestCase):
71 |
72 | def test_no_conditions(self):
73 | assert_series_equal(test_series.ply_where(), test_series)
74 |
75 | def test_single_condition(self):
76 | expected = pd.Series([3, 4], index=[2, 3])
77 |
78 | assert_series_equal(test_series.ply_where(test_series > 2.5), expected)
79 | assert_series_equal(test_series.ply_where(lambda s: s > 2.5), expected)
80 | assert_series_equal(test_series.ply_where(X > 2.5), expected)
81 |
82 | def test_multiple_conditions(self):
83 | expected = pd.Series([2, 3], index=[1, 2])
84 |
85 | assert_series_equal(
86 | test_series.ply_where(test_series < 3.5, test_series > 1.5), expected)
87 | assert_series_equal(
88 | test_series.ply_where(test_series < 3.5, lambda s: s > 1.5), expected)
89 | assert_series_equal(
90 | test_series.ply_where(test_series < 3.5, X > 1.5), expected)
91 | assert_series_equal(
92 | test_series.ply_where(lambda s: s < 3.5, lambda s: s > 1.5), expected)
93 | assert_series_equal(
94 | test_series.ply_where(lambda s: s < 3.5, X > 1.5), expected)
95 | assert_series_equal(
96 | test_series.ply_where(X < 3.5, X > 1.5), expected)
97 |
98 |
99 | class PlySelectTest(unittest.TestCase):
100 |
101 | def test_bad_arguments(self):
102 | # Nonexistent column, include or exclude
103 | with self.assertRaises(ValueError):
104 | test_df.ply_select('z')
105 | with self.assertRaises(ValueError):
106 | test_df.ply_select('-z')
107 |
108 | # Exclude without asterisk
109 | with self.assertRaises(ValueError):
110 | test_df.ply_select('-x')
111 |
112 | # Include with asterisk
113 | with self.assertRaises(ValueError):
114 | test_df.ply_select('*', 'x')
115 |
116 | def test_noops(self):
117 | assert_frame_equal(test_df.ply_select('*'), test_df)
118 | assert_frame_equal(test_df.ply_select('x', 'y'), test_df)
119 | assert_frame_equiv(test_df.ply_select(x=X.x, y=X.y), test_df)
120 |
121 | def test_reorder(self):
122 | reordered = test_df.ply_select('y', 'x')
123 | assert_frame_equiv(reordered, test_df[['y', 'x']])
124 | self.assertEqual(list(reordered.columns), ['y', 'x'])
125 |
126 | def test_subset_via_includes(self):
127 | assert_frame_equal(test_df.ply_select('x'), test_df[['x']])
128 | assert_frame_equal(test_df.ply_select('y'), test_df[['y']])
129 |
130 | def test_subset_via_excludes(self):
131 | assert_frame_equal(test_df.ply_select('*', '-y'), test_df[['x']])
132 | assert_frame_equal(test_df.ply_select('*', '-x'), test_df[['y']])
133 |
134 | def test_empty(self):
135 | assert_frame_equal(test_df.ply_select(), test_df[[]])
136 | assert_frame_equal(test_df.ply_select('*', '-x', '-y'), test_df[[]])
137 |
138 | def test_ways_of_providing_new_columns(self):
139 | # Value
140 | assert_frame_equal(
141 | test_df.ply_select(new=5),
142 | pd.DataFrame({'new': [5, 5, 5, 5]}))
143 |
144 | # List-like (one value per row)
145 | assert_frame_equal(
146 | test_df.ply_select(new=[5, 6, 7, 8]),
147 | pd.DataFrame({'new': [5, 6, 7, 8]}))
148 |
149 | # Function
150 | assert_frame_equal(
151 | test_df.ply_select(new=lambda df: df.x),
152 | pd.DataFrame({'new': [1, 2, 3, 4]}))
153 |
154 | # Symbolic expression
155 | assert_frame_equal(
156 | test_df.ply_select(new=X.x),
157 | pd.DataFrame({'new': [1, 2, 3, 4]}))
158 |
159 | def test_old_and_new_together(self):
160 | assert_frame_equal(
161 | test_df.ply_select('x', total=X.x + X.y),
162 | pd.DataFrame(
163 | {'x': [1, 2, 3, 4], 'total': [5, 5, 5, 5]},
164 | columns=['x', 'total']))
165 |
166 | def test_kwarg_overrides_asterisk(self):
167 | assert_frame_equal(
168 | test_df.ply_select('*', y=X.x),
169 | pd.DataFrame({'x': [1, 2, 3, 4], 'y': [1, 2, 3, 4]}))
170 |
171 | def test_kwarg_overrides_column_include(self):
172 | assert_frame_equal(
173 | test_df.ply_select('x', 'y', y=X.x),
174 | pd.DataFrame({'x': [1, 2, 3, 4], 'y': [1, 2, 3, 4]}))
175 |
176 | def test_new_index(self):
177 | assert_frame_equal(
178 | test_df.ply_select('x', index=X.y),
179 | pd.DataFrame(
180 | {'x': [1, 2, 3, 4]},
181 | index=pd.Index([4, 3, 2, 1], name='y')))
182 |
183 |
184 | class PlySelectForGroupsTest(unittest.TestCase):
185 |
186 | def test_simple(self):
187 | grp = test_dfsq.groupby('xsq')
188 | assert_frame_equal(
189 | grp.ply_select(count=X.x.count()),
190 | pd.DataFrame(
191 | {'count': [1, 2, 2]},
192 | index=pd.Index([0, 1, 4], name='xsq')))
193 |
--------------------------------------------------------------------------------
/tests/test_symbolic.py:
--------------------------------------------------------------------------------
1 | import sys
2 | if sys.version_info < (2, 7):
3 | import unittest2 as unittest
4 | else:
5 | import unittest
6 | import mock
7 |
8 | from pandas_ply.symbolic import Call
9 | from pandas_ply.symbolic import GetAttr
10 | from pandas_ply.symbolic import Symbol
11 | from pandas_ply.symbolic import eval_if_symbolic
12 | from pandas_ply.symbolic import sym_call
13 | from pandas_ply.symbolic import to_callable
14 |
15 |
16 | class ExpressionTest(unittest.TestCase):
17 |
18 | # These test whether operations on symbolic expressions correctly construct
19 | # compound symbolic expressions:
20 |
21 | def test_getattr(self):
22 | expr = Symbol('some_symbol').some_attr
23 | self.assertEqual(
24 | repr(expr),
25 | "getattr(Symbol('some_symbol'), 'some_attr')")
26 |
27 | def test_call(self):
28 | expr = Symbol('some_symbol')('arg1', 'arg2', kwarg_name='kwarg value')
29 | self.assertEqual(
30 | repr(expr),
31 | "Symbol('some_symbol')(*('arg1', 'arg2'), " +
32 | "**{'kwarg_name': 'kwarg value'})")
33 |
34 | def test_ops(self):
35 | expr = Symbol('some_symbol') + 1
36 | self.assertEqual(
37 | repr(expr),
38 | "getattr(Symbol('some_symbol'), '__add__')(*(1,), **{})")
39 |
40 | expr = 1 + Symbol('some_symbol')
41 | self.assertEqual(
42 | repr(expr),
43 | "getattr(Symbol('some_symbol'), '__radd__')(*(1,), **{})")
44 |
45 | expr = Symbol('some_symbol')['key']
46 | self.assertEqual(
47 | repr(expr),
48 | "getattr(Symbol('some_symbol'), '__getitem__')(*('key',), **{})")
49 |
50 |
51 | class SymbolTest(unittest.TestCase):
52 |
53 | def test_eval(self):
54 | self.assertEqual(
55 | Symbol('some_symbol')._eval({'some_symbol': 'value'}),
56 | 'value')
57 | self.assertEqual(
58 | Symbol('some_symbol')._eval(
59 | {'some_symbol': 'value', 'other_symbol': 'irrelevant'}),
60 | 'value')
61 | with self.assertRaises(KeyError):
62 | Symbol('some_symbol')._eval({'other_symbol': 'irrelevant'})
63 |
64 | def test_repr(self):
65 | self.assertEqual(repr(Symbol('some_symbol')), "Symbol('some_symbol')")
66 |
67 |
68 | class GetAttrTest(unittest.TestCase):
69 |
70 | def test_eval_with_nonsymbolic_object(self):
71 | some_obj = mock.Mock()
72 | del some_obj._eval
73 | # Ensure constructing the expression does not access `.some_attr`.
74 | del some_obj.some_attr
75 |
76 | with self.assertRaises(AttributeError):
77 | some_obj.some_attr
78 | expr = GetAttr(some_obj, 'some_attr')
79 |
80 | some_obj.some_attr = 'attribute value'
81 |
82 | self.assertEqual(expr._eval({}), 'attribute value')
83 |
84 | def test_eval_with_symbolic_object(self):
85 | some_obj = mock.Mock()
86 | del some_obj._eval
87 | some_obj.some_attr = 'attribute value'
88 |
89 | expr = GetAttr(Symbol('some_symbol'), 'some_attr')
90 |
91 | self.assertEqual(
92 | expr._eval({'some_symbol': some_obj}),
93 | 'attribute value')
94 |
95 | def test_repr(self):
96 | self.assertEqual(
97 | repr(GetAttr('object', 'attrname')),
98 | "getattr('object', 'attrname')")
99 |
100 |
101 | class CallTest(unittest.TestCase):
102 |
103 | def test_eval_with_nonsymbolic_func(self):
104 | func = mock.Mock(return_value='return value')
105 | del func._eval # So it doesn't pretend to be symbolic
106 |
107 | expr = Call(func, ('arg1', 'arg2'), {'kwarg_name': 'kwarg value'})
108 |
109 | # Ensure constructing the expression does not call the function
110 | self.assertFalse(func.called)
111 |
112 | result = expr._eval({})
113 |
114 | func.assert_called_once_with('arg1', 'arg2', kwarg_name='kwarg value')
115 | self.assertEqual(result, 'return value')
116 |
117 | def test_eval_with_symbolic_func(self):
118 | func = mock.Mock(return_value='return value')
119 | del func._eval # So it doesn't pretend to be symbolic
120 |
121 | expr = Call(
122 | Symbol('some_symbol'),
123 | ('arg1', 'arg2'),
124 | {'kwarg_name': 'kwarg value'})
125 |
126 | result = expr._eval({'some_symbol': func})
127 |
128 | func.assert_called_once_with('arg1', 'arg2', kwarg_name='kwarg value')
129 | self.assertEqual(result, 'return value')
130 |
131 | def test_eval_with_symbolic_arg(self):
132 | func = mock.Mock(return_value='return value')
133 | del func._eval # So it doesn't pretend to be symbolic
134 |
135 | expr = Call(
136 | func,
137 | (Symbol('some_symbol'), 'arg2'),
138 | {'kwarg_name': 'kwarg value'})
139 |
140 | result = expr._eval({'some_symbol': 'arg1'})
141 |
142 | func.assert_called_once_with('arg1', 'arg2', kwarg_name='kwarg value')
143 | self.assertEqual(result, 'return value')
144 |
145 | def test_eval_with_symbolic_kwarg(self):
146 | func = mock.Mock(return_value='return value')
147 | del func._eval # So it doesn't pretend to be symbolic
148 |
149 | expr = Call(
150 | func,
151 | ('arg1', 'arg2'),
152 | {'kwarg_name': Symbol('some_symbol')})
153 |
154 | result = expr._eval({'some_symbol': 'kwarg value'})
155 |
156 | func.assert_called_once_with('arg1', 'arg2', kwarg_name='kwarg value')
157 | self.assertEqual(result, 'return value')
158 |
159 | def test_repr(self):
160 | # One arg
161 | self.assertEqual(
162 | repr(Call('func', ('arg1',), {'kwarg_name': 'kwarg value'})),
163 | "'func'(*('arg1',), **{'kwarg_name': 'kwarg value'})")
164 |
165 | # Two args
166 | self.assertEqual(
167 | repr(Call(
168 | 'func',
169 | ('arg1', 'arg2'),
170 | {'kwarg_name': 'kwarg value'})),
171 | "'func'(*('arg1', 'arg2'), **{'kwarg_name': 'kwarg value'})")
172 |
173 |
174 | class FunctionsTest(unittest.TestCase):
175 |
176 | def test_eval_if_symbolic(self):
177 | self.assertEqual(
178 | eval_if_symbolic(
179 | 'nonsymbolic',
180 | {'some_symbol': 'symbol_value'}),
181 | 'nonsymbolic')
182 | self.assertEqual(
183 | eval_if_symbolic(
184 | Symbol('some_symbol'),
185 | {'some_symbol': 'symbol_value'}),
186 | 'symbol_value')
187 |
188 | def test_to_callable_from_nonsymbolic_noncallable(self):
189 | test_callable = to_callable('nonsymbolic')
190 | self.assertEqual(
191 | test_callable('arg1', 'arg2', kwarg_name='kwarg value'),
192 | 'nonsymbolic')
193 |
194 | def test_to_callable_from_nonsymbolic_callable(self):
195 | func = mock.Mock(return_value='return value')
196 | del func._eval # So it doesn't pretend to be symbolic
197 |
198 | test_callable = to_callable(func)
199 |
200 | # Ensure running to_callable does not call the function
201 | self.assertFalse(func.called)
202 |
203 | result = test_callable('arg1', 'arg2', kwarg_name='kwarg value')
204 |
205 | func.assert_called_once_with('arg1', 'arg2', kwarg_name='kwarg value')
206 | self.assertEqual(result, 'return value')
207 |
208 | def test_to_callable_from_symbolic(self):
209 | mock_expr = mock.Mock()
210 | mock_expr._eval.return_value = 'eval return value'
211 |
212 | test_callable = to_callable(mock_expr)
213 |
214 | # Ensure running to_callable does not evaluate the expression
215 | self.assertFalse(mock_expr._eval.called)
216 |
217 | result = test_callable('arg1', 'arg2', kwarg_name='kwarg value')
218 |
219 | mock_expr._eval.assert_called_once_with(
220 | {0: 'arg1', 1: 'arg2', 'kwarg_name': 'kwarg value'})
221 | self.assertEqual(result, 'eval return value')
222 |
223 | def test_sym_call(self):
224 | expr = sym_call(
225 | 'func', Symbol('some_symbol'), 'arg1', 'arg2',
226 | kwarg_name='kwarg value')
227 | self.assertEqual(
228 | repr(expr),
229 | "'func'(*(Symbol('some_symbol'), 'arg1', 'arg2'), " +
230 | "**{'kwarg_name': 'kwarg value'})")
231 |
232 |
233 | class IntegrationTest(unittest.TestCase):
234 |
235 | def test_pythagoras(self):
236 | from math import sqrt
237 |
238 | X = Symbol('X')
239 | Y = Symbol('Y')
240 |
241 | expr = sym_call(sqrt, X ** 2 + Y ** 2)
242 | func = to_callable(expr)
243 |
244 | self.assertEqual(func(X=3, Y=4), 5)
245 |
--------------------------------------------------------------------------------