├── .gitignore ├── .travis.yml ├── LICENSE ├── MANIFEST.in ├── README.rst ├── docs ├── Makefile ├── conf.py └── index.rst ├── dplyr-comparison.html ├── pandas_ply ├── __init__.py ├── methods.py ├── symbolic.py └── vendor │ ├── __init__.py │ └── six.py ├── setup.py └── tests ├── test_methods.py └── test_symbolic.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.pyc 2 | *.egg-info 3 | *.egg 4 | /MANIFEST 5 | /dist/ 6 | /docs/_build 7 | /build/ 8 | -------------------------------------------------------------------------------- /.travis.yml: -------------------------------------------------------------------------------- 1 | sudo: false 2 | language: python 3 | python: 4 | - "2.6" 5 | - "2.7" 6 | - "3.3" 7 | - "3.4" 8 | install: 9 | - if [[ "$TRAVIS_PYTHON_VERSION" == "2.7" ]]; then 10 | wget https://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh -O miniconda.sh; 11 | else 12 | wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh; 13 | fi 14 | - bash miniconda.sh -b -p $HOME/miniconda 15 | - export PATH="$HOME/miniconda/bin:$PATH" 16 | - hash -r 17 | - conda config --set always_yes yes --set changeps1 no 18 | - conda update -q conda 19 | - conda info -a 20 | 21 | - conda create -q -n test-environment python=$TRAVIS_PYTHON_VERSION pandas nose mock 22 | - source activate test-environment 23 | - if [[ $TRAVIS_PYTHON_VERSION == 2.6 ]]; then conda install unittest2; fi 24 | - python setup.py install 25 | script: 26 | - nosetests -w tests/ -v -s 27 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright 2015 Coursera Inc. 2 | 3 | Licensed under the Apache License, Version 2.0 (the "License"); 4 | you may not use this file except in compliance with the License. 5 | You may obtain a copy of the License at 6 | 7 | http://www.apache.org/licenses/LICENSE-2.0 8 | 9 | Unless required by applicable law or agreed to in writing, software 10 | distributed under the License is distributed on an "AS IS" BASIS, 11 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | See the License for the specific language governing permissions and 13 | limitations under the License. 14 | -------------------------------------------------------------------------------- /MANIFEST.in: -------------------------------------------------------------------------------- 1 | include README.rst LICENSE 2 | -------------------------------------------------------------------------------- /README.rst: -------------------------------------------------------------------------------- 1 | **pandas-ply**: functional data manipulation for pandas 2 | ======================================================= 3 | 4 | **pandas-ply** is a thin layer which makes it easier to manipulate data with `pandas `_. In particular, it provides elegant, functional, chainable syntax in cases where **pandas** would require mutation, saved intermediate values, or other awkward constructions. In this way, it aims to move **pandas** closer to the "grammar of data manipulation" provided by the `dplyr `_ package for R. 5 | 6 | For example, take the **dplyr** code below: 7 | 8 | .. code:: r 9 | 10 | flights %>% 11 | group_by(year, month, day) %>% 12 | summarise( 13 | arr = mean(arr_delay, na.rm = TRUE), 14 | dep = mean(dep_delay, na.rm = TRUE) 15 | ) %>% 16 | filter(arr > 30 & dep > 30) 17 | 18 | The most common way to express this in **pandas** is probably: 19 | 20 | .. code:: python 21 | 22 | grouped_flights = flights.groupby(['year', 'month', 'day']) 23 | output = pd.DataFrame() 24 | output['arr'] = grouped_flights.arr_delay.mean() 25 | output['dep'] = grouped_flights.dep_delay.mean() 26 | filtered_output = output[(output.arr > 30) & (output.dep > 30)] 27 | 28 | **pandas-ply** lets you instead write: 29 | 30 | .. code:: python 31 | 32 | (flights 33 | .groupby(['year', 'month', 'day']) 34 | .ply_select( 35 | arr = X.arr_delay.mean(), 36 | dep = X.dep_delay.mean()) 37 | .ply_where(X.arr > 30, X.dep > 30)) 38 | 39 | In our opinion, this **pandas-ply** code is cleaner, more expressive, more readable, more concise, and less error-prone than the original **pandas** code. 40 | 41 | Explanatory notes on the **pandas-ply** code sample above: 42 | 43 | * **pandas-ply**'s methods (like ``ply_select`` and ``ply_where`` above) are attached directly to **pandas** objects and can be used immediately, without any wrapping or redirection. They start with a ``ply_`` prefix to distinguish them from built-in **pandas** methods. 44 | * **pandas-ply**'s methods are named for (and modelled after) SQL's operators. (But keep in mind that these operators will not always appear in the same order as they do in a SQL statement: ``SELECT a FROM b WHERE c GROUP BY d`` probably maps to ``b.ply_where(c).groupby(d).ply_select(a)``.) 45 | * **pandas-ply** includes a simple system for building "symbolic expressions" to provide as arguments to its methods. ``X`` above is an instance of ``ply.symbolic.Symbol``. Operations on this symbol produce larger compound symbolic expressions. When ``pandas-ply`` receives a symbolic expression as an argument, it converts it into a function. So, for instance, ``X.arr > 30`` in the above code could have instead been provided as ``lambda x: x.arr > 30``. Use of symbolic expressions allows the ``lambda x:`` to be left off, resulting in less cluttered code. 46 | 47 | Warning 48 | ------- 49 | 50 | **pandas-ply** is new, and in an experimental stage of its development. The API is not yet stable. Expect the unexpected. 51 | 52 | (Pull requests are welcome. Feel free to contact us at pandas-ply@coursera.org.) 53 | 54 | Using **pandas-ply** 55 | -------------------- 56 | 57 | Install **pandas-ply** with: 58 | 59 | :: 60 | 61 | $ pip install pandas-ply 62 | 63 | 64 | Typical use of **pandas-ply** starts with: 65 | 66 | .. code:: python 67 | 68 | import pandas as pd 69 | from pandas_ply import install_ply, X, sym_call 70 | 71 | install_ply(pd) 72 | 73 | After calling ``install_ply``, all **pandas** objects have **pandas-ply**'s methods attached. 74 | 75 | API reference 76 | ------------- 77 | 78 | Full API reference is available at ``_. 79 | 80 | Possible TODOs 81 | -------------- 82 | 83 | * Extend ``pandas``' native ``groupby`` to support symbolic expressions? 84 | * Extend ``pandas``' native ``apply`` to support symbolic expressions? 85 | * Add ``.ply_call`` to ``pandas`` objects to extend chainability? 86 | * Version of ``ply_select`` which supports later computed columns relying on earlier computed columns? 87 | * Version of ``ply_select`` which supports careful column ordering? 88 | * Better handling of indices? 89 | 90 | License 91 | ------- 92 | 93 | Copyright 2015 Coursera Inc. 94 | 95 | Licensed under the Apache License, Version 2.0 (the "License"); 96 | you may not use this file except in compliance with the License. 97 | You may obtain a copy of the License at 98 | 99 | http://www.apache.org/licenses/LICENSE-2.0 100 | 101 | Unless required by applicable law or agreed to in writing, software 102 | distributed under the License is distributed on an "AS IS" BASIS, 103 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 104 | See the License for the specific language governing permissions and 105 | limitations under the License. 106 | -------------------------------------------------------------------------------- /docs/Makefile: -------------------------------------------------------------------------------- 1 | # Makefile for Sphinx documentation 2 | # 3 | 4 | # You can set these variables from the command line. 5 | SPHINXOPTS = 6 | SPHINXBUILD = sphinx-build 7 | PAPER = 8 | BUILDDIR = _build 9 | 10 | # Internal variables. 11 | PAPEROPT_a4 = -D latex_paper_size=a4 12 | PAPEROPT_letter = -D latex_paper_size=letter 13 | ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . 14 | # the i18n builder cannot share the environment and doctrees with the others 15 | I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . 16 | 17 | .PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext 18 | 19 | help: 20 | @echo "Please use \`make ' where is one of" 21 | @echo " html to make standalone HTML files" 22 | @echo " dirhtml to make HTML files named index.html in directories" 23 | @echo " singlehtml to make a single large HTML file" 24 | @echo " pickle to make pickle files" 25 | @echo " json to make JSON files" 26 | @echo " htmlhelp to make HTML files and a HTML help project" 27 | @echo " qthelp to make HTML files and a qthelp project" 28 | @echo " devhelp to make HTML files and a Devhelp project" 29 | @echo " epub to make an epub" 30 | @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter" 31 | @echo " latexpdf to make LaTeX files and run them through pdflatex" 32 | @echo " text to make text files" 33 | @echo " man to make manual pages" 34 | @echo " texinfo to make Texinfo files" 35 | @echo " info to make Texinfo files and run them through makeinfo" 36 | @echo " gettext to make PO message catalogs" 37 | @echo " changes to make an overview of all changed/added/deprecated items" 38 | @echo " linkcheck to check all external links for integrity" 39 | @echo " doctest to run all doctests embedded in the documentation (if enabled)" 40 | 41 | clean: 42 | -rm -rf $(BUILDDIR)/* 43 | 44 | html: 45 | $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html 46 | @echo 47 | @echo "Build finished. The HTML pages are in $(BUILDDIR)/html." 48 | 49 | dirhtml: 50 | $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml 51 | @echo 52 | @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml." 53 | 54 | singlehtml: 55 | $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml 56 | @echo 57 | @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml." 58 | 59 | pickle: 60 | $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle 61 | @echo 62 | @echo "Build finished; now you can process the pickle files." 63 | 64 | json: 65 | $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json 66 | @echo 67 | @echo "Build finished; now you can process the JSON files." 68 | 69 | htmlhelp: 70 | $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp 71 | @echo 72 | @echo "Build finished; now you can run HTML Help Workshop with the" \ 73 | ".hhp project file in $(BUILDDIR)/htmlhelp." 74 | 75 | qthelp: 76 | $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp 77 | @echo 78 | @echo "Build finished; now you can run "qcollectiongenerator" with the" \ 79 | ".qhcp project file in $(BUILDDIR)/qthelp, like this:" 80 | @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/pandas-ply.qhcp" 81 | @echo "To view the help file:" 82 | @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/pandas-ply.qhc" 83 | 84 | devhelp: 85 | $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp 86 | @echo 87 | @echo "Build finished." 88 | @echo "To view the help file:" 89 | @echo "# mkdir -p $$HOME/.local/share/devhelp/pandas-ply" 90 | @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/pandas-ply" 91 | @echo "# devhelp" 92 | 93 | epub: 94 | $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub 95 | @echo 96 | @echo "Build finished. The epub file is in $(BUILDDIR)/epub." 97 | 98 | latex: 99 | $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex 100 | @echo 101 | @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex." 102 | @echo "Run \`make' in that directory to run these through (pdf)latex" \ 103 | "(use \`make latexpdf' here to do that automatically)." 104 | 105 | latexpdf: 106 | $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex 107 | @echo "Running LaTeX files through pdflatex..." 108 | $(MAKE) -C $(BUILDDIR)/latex all-pdf 109 | @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." 110 | 111 | text: 112 | $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text 113 | @echo 114 | @echo "Build finished. The text files are in $(BUILDDIR)/text." 115 | 116 | man: 117 | $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man 118 | @echo 119 | @echo "Build finished. The manual pages are in $(BUILDDIR)/man." 120 | 121 | texinfo: 122 | $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo 123 | @echo 124 | @echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo." 125 | @echo "Run \`make' in that directory to run these through makeinfo" \ 126 | "(use \`make info' here to do that automatically)." 127 | 128 | info: 129 | $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo 130 | @echo "Running Texinfo files through makeinfo..." 131 | make -C $(BUILDDIR)/texinfo info 132 | @echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo." 133 | 134 | gettext: 135 | $(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale 136 | @echo 137 | @echo "Build finished. The message catalogs are in $(BUILDDIR)/locale." 138 | 139 | changes: 140 | $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes 141 | @echo 142 | @echo "The overview file is in $(BUILDDIR)/changes." 143 | 144 | linkcheck: 145 | $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck 146 | @echo 147 | @echo "Link check complete; look for any errors in the above output " \ 148 | "or in $(BUILDDIR)/linkcheck/output.txt." 149 | 150 | doctest: 151 | $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest 152 | @echo "Testing of doctests in the sources finished, look at the " \ 153 | "results in $(BUILDDIR)/doctest/output.txt." 154 | -------------------------------------------------------------------------------- /docs/conf.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | # pandas-ply documentation build configuration file, created by 4 | # sphinx-quickstart on Tue Nov 18 19:40:12 2014. 5 | # 6 | # This file is execfile()d with the current directory set to its containing dir. 7 | # 8 | # Note that not all possible configuration values are present in this 9 | # autogenerated file. 10 | # 11 | # All configuration values have a default; values that are commented out 12 | # serve to show the default. 13 | 14 | import sys, os 15 | import sphinx_rtd_theme 16 | 17 | # If extensions (or modules to document with autodoc) are in another directory, 18 | # add these directories to sys.path here. If the directory is relative to the 19 | # documentation root, use os.path.abspath to make it absolute, like shown here. 20 | sys.path.insert(0, os.path.abspath('..')) 21 | 22 | # -- General configuration ----------------------------------------------------- 23 | 24 | # If your documentation needs a minimal Sphinx version, state it here. 25 | #needs_sphinx = '1.0' 26 | 27 | # Add any Sphinx extension module names here, as strings. They can be extensions 28 | # coming with Sphinx (named 'sphinx.ext.*') or your custom ones. 29 | extensions = [ 30 | 'sphinx.ext.autodoc', 31 | 'sphinx.ext.doctest', 32 | 'sphinx.ext.coverage', 33 | 'sphinxcontrib.napoleon' 34 | ] 35 | 36 | # Napoleon settings 37 | napoleon_google_docstring = True 38 | napoleon_numpy_docstring = True 39 | napoleon_include_private_with_doc = False 40 | napoleon_include_special_with_doc = True 41 | napoleon_use_admonition_for_examples = False 42 | napoleon_use_admonition_for_notes = False 43 | napoleon_use_admonition_for_references = False 44 | napoleon_use_ivar = False 45 | napoleon_use_param = True 46 | napoleon_use_rtype = True 47 | autodoc_member_order = 'bysource' 48 | 49 | # Add any paths that contain templates here, relative to this directory. 50 | templates_path = ['_templates'] 51 | 52 | # The suffix of source filenames. 53 | source_suffix = '.rst' 54 | 55 | # The encoding of source files. 56 | #source_encoding = 'utf-8-sig' 57 | 58 | # The master toctree document. 59 | master_doc = 'index' 60 | 61 | # General information about the project. 62 | project = u'pandas-ply' 63 | copyright = u'2015, Coursera' 64 | 65 | # The version info for the project you're documenting, acts as replacement for 66 | # |version| and |release|, also used in various other places throughout the 67 | # built documents. 68 | # 69 | # The short X.Y version. 70 | version = '0.2.1' 71 | # The full version, including alpha/beta/rc tags. 72 | release = '0.2.1' 73 | 74 | # The language for content autogenerated by Sphinx. Refer to documentation 75 | # for a list of supported languages. 76 | #language = None 77 | 78 | # There are two options for replacing |today|: either, you set today to some 79 | # non-false value, then it is used: 80 | #today = '' 81 | # Else, today_fmt is used as the format for a strftime call. 82 | #today_fmt = '%B %d, %Y' 83 | 84 | # List of patterns, relative to source directory, that match files and 85 | # directories to ignore when looking for source files. 86 | exclude_patterns = ['_build'] 87 | 88 | # The reST default role (used for this markup: `text`) to use for all documents. 89 | #default_role = None 90 | 91 | # If true, '()' will be appended to :func: etc. cross-reference text. 92 | #add_function_parentheses = True 93 | 94 | # If true, the current module name will be prepended to all description 95 | # unit titles (such as .. function::). 96 | #add_module_names = True 97 | 98 | # If true, sectionauthor and moduleauthor directives will be shown in the 99 | # output. They are ignored by default. 100 | #show_authors = False 101 | 102 | # The name of the Pygments (syntax highlighting) style to use. 103 | pygments_style = 'sphinx' 104 | 105 | # A list of ignored prefixes for module index sorting. 106 | #modindex_common_prefix = [] 107 | 108 | 109 | # -- Options for HTML output --------------------------------------------------- 110 | 111 | # The theme to use for HTML and HTML Help pages. See the documentation for 112 | # a list of builtin themes. 113 | html_theme = 'sphinx_rtd_theme' 114 | 115 | # Theme options are theme-specific and customize the look and feel of a theme 116 | # further. For a list of options available for each theme, see the 117 | # documentation. 118 | #html_theme_options = {} 119 | 120 | # Add any paths that contain custom themes here, relative to this directory. 121 | html_theme_path = [sphinx_rtd_theme.get_html_theme_path()] 122 | 123 | # The name for this set of Sphinx documents. If None, it defaults to 124 | # " v documentation". 125 | #html_title = None 126 | 127 | # A shorter title for the navigation bar. Default is the same as html_title. 128 | #html_short_title = None 129 | 130 | # The name of an image file (relative to this directory) to place at the top 131 | # of the sidebar. 132 | #html_logo = None 133 | 134 | # The name of an image file (within the static path) to use as favicon of the 135 | # docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 136 | # pixels large. 137 | #html_favicon = None 138 | 139 | # Add any paths that contain custom static files (such as style sheets) here, 140 | # relative to this directory. They are copied after the builtin static files, 141 | # so a file named "default.css" will overwrite the builtin "default.css". 142 | #html_static_path = ['_static'] 143 | 144 | # If not '', a 'Last updated on:' timestamp is inserted at every page bottom, 145 | # using the given strftime format. 146 | #html_last_updated_fmt = '%b %d, %Y' 147 | 148 | # If true, SmartyPants will be used to convert quotes and dashes to 149 | # typographically correct entities. 150 | #html_use_smartypants = True 151 | 152 | # Custom sidebar templates, maps document names to template names. 153 | #html_sidebars = {} 154 | 155 | # Additional templates that should be rendered to pages, maps page names to 156 | # template names. 157 | #html_additional_pages = {} 158 | 159 | # If false, no module index is generated. 160 | #html_domain_indices = True 161 | 162 | # If false, no index is generated. 163 | #html_use_index = True 164 | 165 | # If true, the index is split into individual pages for each letter. 166 | #html_split_index = False 167 | 168 | # If true, links to the reST sources are added to the pages. 169 | #html_show_sourcelink = True 170 | 171 | # If true, "Created using Sphinx" is shown in the HTML footer. Default is True. 172 | #html_show_sphinx = True 173 | 174 | # If true, "(C) Copyright ..." is shown in the HTML footer. Default is True. 175 | #html_show_copyright = True 176 | 177 | # If true, an OpenSearch description file will be output, and all pages will 178 | # contain a tag referring to it. The value of this option must be the 179 | # base URL from which the finished HTML is served. 180 | #html_use_opensearch = '' 181 | 182 | # This is the file name suffix for HTML files (e.g. ".xhtml"). 183 | #html_file_suffix = None 184 | 185 | # Output file base name for HTML help builder. 186 | #htmlhelp_basename = 'pandas-plydoc' 187 | 188 | 189 | # -- Options for LaTeX output -------------------------------------------------- 190 | 191 | latex_elements = { 192 | # The paper size ('letterpaper' or 'a4paper'). 193 | #'papersize': 'letterpaper', 194 | 195 | # The font size ('10pt', '11pt' or '12pt'). 196 | #'pointsize': '10pt', 197 | 198 | # Additional stuff for the LaTeX preamble. 199 | #'preamble': '', 200 | } 201 | 202 | # Grouping the document tree into LaTeX files. List of tuples 203 | # (source start file, target name, title, author, documentclass [howto/manual]). 204 | latex_documents = [ 205 | ('index', 'pandas-ply.tex', u'pandas-ply Documentation', 206 | u'Coursera', 'manual'), 207 | ] 208 | 209 | # The name of an image file (relative to this directory) to place at the top of 210 | # the title page. 211 | #latex_logo = None 212 | 213 | # For "manual" documents, if this is true, then toplevel headings are parts, 214 | # not chapters. 215 | #latex_use_parts = False 216 | 217 | # If true, show page references after internal links. 218 | #latex_show_pagerefs = False 219 | 220 | # If true, show URL addresses after external links. 221 | #latex_show_urls = False 222 | 223 | # Documents to append as an appendix to all manuals. 224 | #latex_appendices = [] 225 | 226 | # If false, no module index is generated. 227 | #latex_domain_indices = True 228 | 229 | 230 | # -- Options for manual page output -------------------------------------------- 231 | 232 | # One entry per manual page. List of tuples 233 | # (source start file, name, description, authors, manual section). 234 | man_pages = [ 235 | ('index', 'pandas-ply', u'pandas-ply Documentation', 236 | [u'Coursera'], 1) 237 | ] 238 | 239 | # If true, show URL addresses after external links. 240 | #man_show_urls = False 241 | 242 | 243 | # -- Options for Texinfo output ------------------------------------------------ 244 | 245 | # Grouping the document tree into Texinfo files. List of tuples 246 | # (source start file, target name, title, author, 247 | # dir menu entry, description, category) 248 | texinfo_documents = [ 249 | ('index', 'pandas-ply', u'pandas-ply Documentation', 250 | u'Coursera', 'pandas-ply', 'functional data manipulation for pandas', 251 | 'Miscellaneous'), 252 | ] 253 | 254 | # Documents to append as an appendix to all manuals. 255 | #texinfo_appendices = [] 256 | 257 | # If false, no module index is generated. 258 | #texinfo_domain_indices = True 259 | 260 | # How to display URL addresses: 'footnote', 'no', or 'inline'. 261 | #texinfo_show_urls = 'footnote' 262 | -------------------------------------------------------------------------------- /docs/index.rst: -------------------------------------------------------------------------------- 1 | **pandas-ply**: functional data manipulation for pandas 2 | ======================================================= 3 | 4 | **pandas-ply** is a thin layer which makes it easier to manipulate data with `pandas `_. In particular, it provides elegant, functional, chainable syntax in cases where **pandas** would require mutation, saved intermediate values, or other awkward constructions. In this way, it aims to move **pandas** closer to the "grammar of data manipulation" provided by the `dplyr `_ package for R. 5 | 6 | For example, take the **dplyr** code below: 7 | 8 | .. code:: r 9 | 10 | flights %>% 11 | group_by(year, month, day) %>% 12 | summarise( 13 | arr = mean(arr_delay, na.rm = TRUE), 14 | dep = mean(dep_delay, na.rm = TRUE) 15 | ) %>% 16 | filter(arr > 30 & dep > 30) 17 | 18 | The most common way to express this in **pandas** is probably: 19 | 20 | .. code:: python 21 | 22 | grouped_flights = flights.groupby(['year', 'month', 'day']) 23 | output = pd.DataFrame() 24 | output['arr'] = grouped_flights.arr_delay.mean() 25 | output['dep'] = grouped_flights.dep_delay.mean() 26 | filtered_output = output[(output.arr > 30) & (output.dep > 30)] 27 | 28 | **pandas-ply** lets you instead write: 29 | 30 | .. code:: python 31 | 32 | (flights 33 | .groupby(['year', 'month', 'day']) 34 | .ply_select( 35 | arr = X.arr_delay.mean(), 36 | dep = X.dep_delay.mean()) 37 | .ply_where(X.arr > 30, X.dep > 30)) 38 | 39 | In our opinion, this **pandas-ply** code is cleaner, more expressive, more readable, more concise, and less error-prone than the original **pandas** code. 40 | 41 | Explanatory notes on the **pandas-ply** code sample above: 42 | 43 | * **pandas-ply**'s methods (like ``ply_select`` and ``ply_where`` above) are attached directly to **pandas** objects and can be used immediately, without any wrapping or redirection. They start with a ``ply_`` prefix to distinguish them from built-in **pandas** methods. 44 | * **pandas-ply**'s methods are named for (and modelled after) SQL's operators. (But keep in mind that these operators will not always appear in the same order as they do in a SQL statement: ``SELECT a FROM b WHERE c GROUP BY d`` probably maps to ``b.ply_where(c).groupby(d).ply_select(a)``.) 45 | * **pandas-ply** includes a simple system for building "symbolic expressions" to provide as arguments to its methods. ``X`` above is an instance of ``ply.symbolic.Symbol``. Operations on this symbol produce larger compound symbolic expressions. When ``pandas-ply`` receives a symbolic expression as an argument, it converts it into a function. So, for instance, ``X.arr > 30`` in the above code could have instead been provided as ``lambda x: x.arr > 30``. Use of symbolic expressions allows the ``lambda x:`` to be left off, resulting in less cluttered code. 46 | 47 | Warning 48 | ------- 49 | 50 | **pandas-ply** is new, and in an experimental stage of its development. The API is not yet stable. Expect the unexpected. 51 | 52 | (Pull requests are welcome. Feel free to contact us at pandas-ply@coursera.org.) 53 | 54 | Using **pandas-ply** 55 | -------------------- 56 | 57 | Install **pandas-ply** with: 58 | 59 | :: 60 | 61 | $ pip install pandas-ply 62 | 63 | 64 | Typical use of **pandas-ply** starts with: 65 | 66 | .. code:: python 67 | 68 | import pandas as pd 69 | from pandas_ply import install_ply, X, sym_call 70 | 71 | install_ply(pd) 72 | 73 | After calling ``install_ply``, all **pandas** objects have **pandas-ply**'s methods attached. 74 | 75 | API reference 76 | ------------- 77 | 78 | pandas extensions 79 | ~~~~~~~~~~~~~~~~~ 80 | 81 | .. automodule:: ply.methods 82 | :members: 83 | :undoc-members: 84 | :show-inheritance: 85 | 86 | `ply.symbolic` 87 | ~~~~~~~~~~~~~~ 88 | 89 | .. automodule:: ply.symbolic 90 | :members: 91 | :undoc-members: 92 | :private-members: 93 | :show-inheritance: 94 | -------------------------------------------------------------------------------- /pandas_ply/__init__.py: -------------------------------------------------------------------------------- 1 | from .methods import install_ply 2 | from .symbolic import X, sym_call 3 | -------------------------------------------------------------------------------- /pandas_ply/methods.py: -------------------------------------------------------------------------------- 1 | """This module contains the **pandas-ply** methods which are designed to be 2 | added to panda objects. The methods in this module should not be used directly. 3 | Instead, the function `install_ply` should be used to attach them to the pandas 4 | classes.""" 5 | 6 | from . import symbolic 7 | from .vendor.six import iteritems 8 | from .vendor.six.moves import reduce 9 | 10 | pandas = None 11 | 12 | 13 | def install_ply(pandas_to_use): 14 | """Install `pandas-ply` onto the objects in a copy of `pandas`.""" 15 | 16 | global pandas 17 | pandas = pandas_to_use 18 | 19 | pandas.DataFrame.ply_where = _ply_where 20 | pandas.DataFrame.ply_select = _ply_select 21 | 22 | pandas.Series.ply_where = _ply_where 23 | 24 | pandas.core.groupby.DataFrameGroupBy.ply_select = _ply_select_for_groups 25 | 26 | pandas.core.groupby.SeriesGroupBy.ply_select = _ply_select_for_groups 27 | 28 | 29 | def _ply_where(self, *conditions): 30 | """Filter a dataframe/series to only include rows/entries satisfying a 31 | given set of conditions. 32 | 33 | Analogous to SQL's ``WHERE``, or dplyr's ``filter``. 34 | 35 | Args: 36 | `*conditions`: Each should be a dataframe/series of booleans, a 37 | function returning such an object when run on the input dataframe, 38 | or a symbolic expression yielding such an object when evaluated 39 | with Symbol(0) mapped to the input dataframe. The input dataframe 40 | will be filtered by the AND of all the conditions. 41 | 42 | Example: 43 | >>> flights.ply_where(X.month == 1, X.day == 1) 44 | [ same result as `flights[(flights.month == 1) & (flights.day == 1)]` ] 45 | """ 46 | 47 | if not conditions: 48 | return self 49 | 50 | evalled_conditions = [symbolic.to_callable(condition)(self) 51 | for condition in conditions] 52 | anded_evalled_conditions = reduce( 53 | lambda x, y: x & y, evalled_conditions) 54 | return self[anded_evalled_conditions] 55 | 56 | 57 | def _ply_select(self, *args, **kwargs): 58 | """Transform a dataframe by selecting old columns and new (computed) 59 | columns. 60 | 61 | Analogous to SQL's ``SELECT``, or dplyr's ``select`` / ``rename`` / 62 | ``mutate`` / ``transmute``. 63 | 64 | Args: 65 | `*args`: Each should be one of: 66 | 67 | ``'*'`` 68 | says that all columns in the input dataframe should be 69 | included 70 | ``'column_name'`` 71 | says that `column_name` in the input dataframe should be 72 | included 73 | ``'-column_name'`` 74 | says that `column_name` in the input dataframe should be 75 | excluded. 76 | 77 | If any `'-column_name'` is present, then `'*'` should be 78 | present, and if `'*'` is present, no 'column_name' should be 79 | present. Column-includes and column-excludes should not overlap. 80 | `**kwargs`: Each argument name will be the name of a new column in the 81 | output dataframe, with the column's contents determined by the 82 | argument contents. These contents can be given as a dataframe, a 83 | function (taking the input dataframe as its single argument), or a 84 | symbolic expression (taking the input dataframe as ``Symbol(0)``). 85 | kwarg-provided columns override arg-provided columns. 86 | 87 | Example: 88 | >>> flights.ply_select('*', 89 | ... gain = X.arr_delay - X.dep_delay, 90 | ... speed = X.distance / X.air_time * 60) 91 | [ original dataframe, with two new computed columns added ] 92 | """ 93 | 94 | input_columns = set(self.columns) 95 | 96 | has_star = False 97 | include_columns = [] 98 | exclude_columns = [] 99 | for arg in args: 100 | if arg == '*': 101 | if has_star: 102 | raise ValueError('ply_select received repeated stars') 103 | has_star = True 104 | elif arg in input_columns: 105 | if arg in include_columns: 106 | raise ValueError( 107 | 'ply_select received a repeated column-include (%s)' % 108 | arg) 109 | include_columns.append(arg) 110 | elif arg[0] == '-' and arg[1:] in input_columns: 111 | if arg in exclude_columns: 112 | raise ValueError( 113 | 'ply_select received a repeated column-exclude (%s)' % 114 | arg[1:]) 115 | exclude_columns.append(arg[1:]) 116 | else: 117 | raise ValueError( 118 | 'ply_select received a strange argument (%s)' % 119 | arg) 120 | if exclude_columns and not has_star: 121 | raise ValueError( 122 | 'ply_select received column-excludes without an star') 123 | if has_star and include_columns: 124 | raise ValueError( 125 | 'ply_select received both an star and column-includes') 126 | if set(include_columns) & set(exclude_columns): 127 | raise ValueError( 128 | 'ply_select received overlapping column-includes and ' + 129 | 'column-excludes') 130 | 131 | include_columns_inc_star = self.columns if has_star else include_columns 132 | 133 | output_columns = [col for col in include_columns_inc_star 134 | if col not in exclude_columns] 135 | 136 | # Note: This maintains self's index even if output_columns is []. 137 | to_return = self[output_columns] 138 | 139 | # Temporarily disable SettingWithCopyWarning, as setting columns on a 140 | # copy (`to_return`) is intended here. 141 | with pandas.option_context('mode.chained_assignment', None): 142 | 143 | for column_name, column_value in iteritems(kwargs): 144 | evaluated_value = symbolic.to_callable(column_value)(self) 145 | # TODO: verify that evaluated_value is a series! 146 | if column_name == 'index': 147 | to_return.index = evaluated_value 148 | else: 149 | to_return[column_name] = evaluated_value 150 | 151 | return to_return 152 | 153 | 154 | # TODO: Ensure that an empty ply_select on a groupby returns a large dataframe 155 | def _ply_select_for_groups(self, **kwargs): 156 | """Summarize a grouped dataframe or series. 157 | 158 | Analogous to SQL's ``SELECT`` (when a ``GROUP BY`` is present), or dplyr's 159 | ``summarise``. 160 | 161 | Args: 162 | `**kwargs`: Each argument name will be the name of a new column in the 163 | output dataframe, with the column's contents determined by the 164 | argument contents. These contents can be given as a dataframe, a 165 | function (taking the input grouped dataframe as its single 166 | argument), or a symbolic expression (taking the input grouped 167 | dataframe as `Symbol(0)`). 168 | """ 169 | 170 | to_return = pandas.DataFrame() 171 | 172 | for column_name, column_value in iteritems(kwargs): 173 | evaluated_value = symbolic.to_callable(column_value)(self) 174 | if column_name == 'index': 175 | to_return.index = evaluated_value 176 | else: 177 | to_return[column_name] = evaluated_value 178 | 179 | return to_return 180 | 181 | 182 | class PlyDataFrame: 183 | """The following methods are added to `pandas.DataFrame`:""" 184 | 185 | ply_where = _ply_where 186 | ply_select = _ply_select 187 | 188 | 189 | class PlySeries: 190 | """The following methods are added to `pandas.Series`:""" 191 | 192 | ply_where = _ply_where 193 | 194 | 195 | class PlyDataFrameGroupBy: 196 | """The following methods are added to 197 | `pandas.core.groupby.DataFrameGroupBy`:""" 198 | 199 | ply_select = _ply_select_for_groups 200 | 201 | 202 | class PlySeriesGroupBy: 203 | """The following methods are added to 204 | `pandas.core.groupby.SeriesGroupBy`:""" 205 | 206 | ply_select = _ply_select_for_groups 207 | -------------------------------------------------------------------------------- /pandas_ply/symbolic.py: -------------------------------------------------------------------------------- 1 | """`ply.symbolic` is a simple system for building "symbolic expressions" to 2 | provide as arguments to **pandas-ply**'s methods (in place of lambda 3 | expressions).""" 4 | 5 | from .vendor.six import print_ 6 | from .vendor.six import iteritems 7 | 8 | 9 | class Expression(object): 10 | """`Expression` is the (abstract) base class for symbolic expressions. 11 | Symbolic expressions are encoded representations of Python expressions, 12 | kept on ice until you are ready to evaluate them. Operations on 13 | symbolic expressions (like `my_expr.some_attr` or `my_expr(some_arg)` or 14 | `my_expr + 7`) are automatically turned into symbolic representations 15 | thereof -- nothing is actually done until the special evaluation method 16 | `_eval` is called. 17 | """ 18 | 19 | def _eval(self, context, **options): 20 | """Evaluate a symbolic expression. 21 | 22 | Args: 23 | context: The context object for evaluation. Currently, this is a 24 | dictionary mapping symbol names to values, 25 | `**options`: Options for evaluation. Currently, the only option is 26 | `log`, which results in some debug output during evaluation if 27 | it is set to `True`. 28 | 29 | Returns: 30 | anything 31 | """ 32 | raise NotImplementedError 33 | 34 | def __repr__(self): 35 | raise NotImplementedError 36 | 37 | def __getattr__(self, name): 38 | """Construct a symbolic representation of `getattr(self, name)`.""" 39 | return GetAttr(self, name) 40 | 41 | def __call__(self, *args, **kwargs): 42 | """Construct a symbolic representation of `self(*args, **kwargs)`.""" 43 | return Call(self, args=args, kwargs=kwargs) 44 | 45 | # New-style classes skip __getattr__ for magic methods, so we must add them 46 | # explicitly: 47 | 48 | _magic_method_names = [ 49 | '__abs__', '__add__', '__and__', '__cmp__', '__complex__', '__contains__', 50 | '__delattr__', '__delete__', '__delitem__', '__delslice__', '__div__', 51 | '__divmod__', '__enter__', '__eq__', '__exit__', '__float__', 52 | '__floordiv__', '__ge__', '__get__', '__getitem__', '__getslice__', 53 | '__gt__', '__hash__', '__hex__', '__iadd__', '__iand__', '__idiv__', 54 | '__ifloordiv__', '__ilshift__', '__imod__', '__imul__', '__index__', 55 | '__int__', '__invert__', '__ior__', '__ipow__', '__irshift__', '__isub__', 56 | '__iter__', '__itruediv__', '__ixor__', '__le__', '__len__', '__long__', 57 | '__lshift__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__', 58 | '__nonzero__', '__oct__', '__or__', '__pos__', '__pow__', '__radd__', 59 | '__rand__', '__rcmp__', '__rdiv__', '__rdivmod__', '__repr__', 60 | '__reversed__', '__rfloordiv__', '__rlshift__', '__rmod__', '__rmul__', 61 | '__ror__', '__rpow__', '__rrshift__', '__rshift__', '__rsub__', 62 | '__rtruediv__', '__rxor__', '__set__', '__setitem__', '__setslice__', 63 | '__str__', '__sub__', '__truediv__', '__unicode__', '__xor__', 64 | ] 65 | 66 | # Not included: [ 67 | # '__call__', '__coerce__', '__del__', '__dict__', '__getattr__', 68 | # '__getattribute__', '__init__', '__new__', '__setattr__' 69 | # ] 70 | 71 | def _get_sym_magic_method(name): 72 | def magic_method(self, *args, **kwargs): 73 | return Call(GetAttr(self, name), args, kwargs) 74 | return magic_method 75 | 76 | for name in _magic_method_names: 77 | setattr(Expression, name, _get_sym_magic_method(name)) 78 | 79 | 80 | # Here are the varieties of atomic / compound Expression. 81 | 82 | 83 | class Symbol(Expression): 84 | """`Symbol(name)` is an atomic symbolic expression, labelled with an 85 | arbitrary `name`.""" 86 | 87 | def __init__(self, name): 88 | self._name = name 89 | 90 | def _eval(self, context, **options): 91 | if options.get('log'): 92 | print_('Symbol._eval', repr(self)) 93 | result = context[self._name] 94 | if options.get('log'): 95 | print_('Returning', repr(self), '=>', repr(result)) 96 | return result 97 | 98 | def __repr__(self): 99 | return 'Symbol(%s)' % repr(self._name) 100 | 101 | 102 | class GetAttr(Expression): 103 | """`GetAttr(obj, name)` is a symbolic expression representing the result of 104 | `getattr(obj, name)`. (`obj` and `name` can themselves be symbolic.)""" 105 | 106 | def __init__(self, obj, name): 107 | self._obj = obj 108 | self._name = name 109 | 110 | def _eval(self, context, **options): 111 | if options.get('log'): 112 | print_('GetAttr._eval', repr(self)) 113 | evaled_obj = eval_if_symbolic(self._obj, context, **options) 114 | result = getattr(evaled_obj, self._name) 115 | if options.get('log'): 116 | print_('Returning', repr(self), '=>', repr(result)) 117 | return result 118 | 119 | def __repr__(self): 120 | return 'getattr(%s, %s)' % (repr(self._obj), repr(self._name)) 121 | 122 | 123 | class Call(Expression): 124 | """`Call(func, args, kwargs)` is a symbolic expression representing the 125 | result of `func(*args, **kwargs)`. (`func`, each member of the `args` 126 | iterable, and each value in the `kwargs` dictionary can themselves be 127 | symbolic).""" 128 | 129 | def __init__(self, func, args=[], kwargs={}): 130 | self._func = func 131 | self._args = args 132 | self._kwargs = kwargs 133 | 134 | def _eval(self, context, **options): 135 | if options.get('log'): 136 | print_('Call._eval', repr(self)) 137 | evaled_func = eval_if_symbolic(self._func, context, **options) 138 | evaled_args = [eval_if_symbolic(v, context, **options) 139 | for v in self._args] 140 | evaled_kwargs = dict((k, eval_if_symbolic(v, context, **options)) 141 | for k, v in iteritems(self._kwargs)) 142 | result = evaled_func(*evaled_args, **evaled_kwargs) 143 | if options.get('log'): 144 | print_('Returning', repr(self), '=>', repr(result)) 145 | return result 146 | 147 | def __repr__(self): 148 | return '{func}(*{args}, **{kwargs})'.format( 149 | func=repr(self._func), 150 | args=repr(self._args), 151 | kwargs=repr(self._kwargs)) 152 | 153 | 154 | def eval_if_symbolic(obj, context, **options): 155 | """Evaluate an object if it is a symbolic expression, or otherwise just 156 | returns it back. 157 | 158 | Args: 159 | obj: Either a symbolic expression, or anything else (in which case this 160 | is a noop). 161 | context: Passed as an argument to `obj._eval` if `obj` is symbolic. 162 | `**options`: Passed as arguments to `obj._eval` if `obj` is symbolic. 163 | 164 | Returns: 165 | anything 166 | 167 | Examples: 168 | >>> eval_if_symbolic(Symbol('x'), {'x': 10}) 169 | 10 170 | >>> eval_if_symbolic(7, {'x': 10}) 171 | 7 172 | """ 173 | return obj._eval(context, **options) if hasattr(obj, '_eval') else obj 174 | 175 | 176 | def to_callable(obj): 177 | """Turn an object into a callable. 178 | 179 | Args: 180 | obj: This can be 181 | 182 | * **a symbolic expression**, in which case the output callable 183 | evaluates the expression with symbols taking values from the 184 | callable's arguments (listed arguments named according to their 185 | numerical index, keyword arguments named according to their 186 | string keys), 187 | * **a callable**, in which case the output callable is just the 188 | input object, or 189 | * **anything else**, in which case the output callable is a 190 | constant function which always returns the input object. 191 | 192 | Returns: 193 | callable 194 | 195 | Examples: 196 | >>> to_callable(Symbol(0) + Symbol('x'))(3, x=4) 197 | 7 198 | >>> to_callable(lambda x: x + 1)(10) 199 | 11 200 | >>> to_callable(12)(3, x=4) 201 | 12 202 | """ 203 | if hasattr(obj, '_eval'): 204 | return lambda *args, **kwargs: obj._eval(dict(enumerate(args), **kwargs)) 205 | elif callable(obj): 206 | return obj 207 | else: 208 | return lambda *args, **kwargs: obj 209 | 210 | 211 | def sym_call(func, *args, **kwargs): 212 | """Construct a symbolic representation of `func(*args, **kwargs)`. 213 | 214 | This is necessary because `func(symbolic)` will not (ordinarily) know to 215 | construct a symbolic expression when it receives the symbolic 216 | expression `symbolic` as a parameter (if `func` is not itself symbolic). 217 | So instead, we write `sym_call(func, symbolic)`. 218 | 219 | Tip: If the main argument of the function is a (symbolic) DataFrame, then 220 | pandas' `pipe` method takes care of this problem without `sym_call`. For 221 | instance, while `np.sqrt(X)` won't work, `X.pipe(np.sqrt)` will. 222 | 223 | Args: 224 | func: Function to call on evaluation (can be symbolic). 225 | `*args`: Arguments to provide to `func` on evaluation (can be symbolic). 226 | `**kwargs`: Keyword arguments to provide to `func` on evaluation (can be 227 | symbolic). 228 | 229 | Returns: 230 | `ply.symbolic.Expression` 231 | 232 | Example: 233 | >>> sym_call(math.sqrt, Symbol('x'))._eval({'x': 16}) 234 | 4 235 | """ 236 | 237 | return Call(func, args=args, kwargs=kwargs) 238 | 239 | X = Symbol(0) 240 | """A Symbol for "the first argument" (for convenience).""" 241 | -------------------------------------------------------------------------------- /pandas_ply/vendor/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/coursera/pandas-ply/2a043c620b35ffb32247c559e1b46cbe22064ebd/pandas_ply/vendor/__init__.py -------------------------------------------------------------------------------- /pandas_ply/vendor/six.py: -------------------------------------------------------------------------------- 1 | """Utilities for writing code that runs on Python 2 and 3""" 2 | 3 | # Copyright (c) 2010-2014 Benjamin Peterson 4 | # 5 | # Permission is hereby granted, free of charge, to any person obtaining a copy 6 | # of this software and associated documentation files (the "Software"), to deal 7 | # in the Software without restriction, including without limitation the rights 8 | # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | # copies of the Software, and to permit persons to whom the Software is 10 | # furnished to do so, subject to the following conditions: 11 | # 12 | # The above copyright notice and this permission notice shall be included in all 13 | # copies or substantial portions of the Software. 14 | # 15 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | # SOFTWARE. 22 | 23 | from __future__ import absolute_import 24 | 25 | import functools 26 | import operator 27 | import sys 28 | import types 29 | 30 | __author__ = "Benjamin Peterson " 31 | __version__ = "1.8.0" 32 | 33 | 34 | # Useful for very coarse version differentiation. 35 | PY2 = sys.version_info[0] == 2 36 | PY3 = sys.version_info[0] == 3 37 | 38 | if PY3: 39 | string_types = str, 40 | integer_types = int, 41 | class_types = type, 42 | text_type = str 43 | binary_type = bytes 44 | 45 | MAXSIZE = sys.maxsize 46 | else: 47 | string_types = basestring, 48 | integer_types = (int, long) 49 | class_types = (type, types.ClassType) 50 | text_type = unicode 51 | binary_type = str 52 | 53 | if sys.platform.startswith("java"): 54 | # Jython always uses 32 bits. 55 | MAXSIZE = int((1 << 31) - 1) 56 | else: 57 | # It's possible to have sizeof(long) != sizeof(Py_ssize_t). 58 | class X(object): 59 | def __len__(self): 60 | return 1 << 31 61 | try: 62 | len(X()) 63 | except OverflowError: 64 | # 32-bit 65 | MAXSIZE = int((1 << 31) - 1) 66 | else: 67 | # 64-bit 68 | MAXSIZE = int((1 << 63) - 1) 69 | del X 70 | 71 | 72 | def _add_doc(func, doc): 73 | """Add documentation to a function.""" 74 | func.__doc__ = doc 75 | 76 | 77 | def _import_module(name): 78 | """Import module, returning the module after the last dot.""" 79 | __import__(name) 80 | return sys.modules[name] 81 | 82 | 83 | class _LazyDescr(object): 84 | 85 | def __init__(self, name): 86 | self.name = name 87 | 88 | def __get__(self, obj, tp): 89 | result = self._resolve() 90 | setattr(obj, self.name, result) # Invokes __set__. 91 | # This is a bit ugly, but it avoids running this again. 92 | delattr(obj.__class__, self.name) 93 | return result 94 | 95 | 96 | class MovedModule(_LazyDescr): 97 | 98 | def __init__(self, name, old, new=None): 99 | super(MovedModule, self).__init__(name) 100 | if PY3: 101 | if new is None: 102 | new = name 103 | self.mod = new 104 | else: 105 | self.mod = old 106 | 107 | def _resolve(self): 108 | return _import_module(self.mod) 109 | 110 | def __getattr__(self, attr): 111 | _module = self._resolve() 112 | value = getattr(_module, attr) 113 | setattr(self, attr, value) 114 | return value 115 | 116 | 117 | class _LazyModule(types.ModuleType): 118 | 119 | def __init__(self, name): 120 | super(_LazyModule, self).__init__(name) 121 | self.__doc__ = self.__class__.__doc__ 122 | 123 | def __dir__(self): 124 | attrs = ["__doc__", "__name__"] 125 | attrs += [attr.name for attr in self._moved_attributes] 126 | return attrs 127 | 128 | # Subclasses should override this 129 | _moved_attributes = [] 130 | 131 | 132 | class MovedAttribute(_LazyDescr): 133 | 134 | def __init__(self, name, old_mod, new_mod, old_attr=None, new_attr=None): 135 | super(MovedAttribute, self).__init__(name) 136 | if PY3: 137 | if new_mod is None: 138 | new_mod = name 139 | self.mod = new_mod 140 | if new_attr is None: 141 | if old_attr is None: 142 | new_attr = name 143 | else: 144 | new_attr = old_attr 145 | self.attr = new_attr 146 | else: 147 | self.mod = old_mod 148 | if old_attr is None: 149 | old_attr = name 150 | self.attr = old_attr 151 | 152 | def _resolve(self): 153 | module = _import_module(self.mod) 154 | return getattr(module, self.attr) 155 | 156 | 157 | class _SixMetaPathImporter(object): 158 | """ 159 | A meta path importer to import six.moves and its submodules. 160 | 161 | This class implements a PEP302 finder and loader. It should be compatible 162 | with Python 2.5 and all existing versions of Python3 163 | """ 164 | def __init__(self, six_module_name): 165 | self.name = six_module_name 166 | self.known_modules = {} 167 | 168 | def _add_module(self, mod, *fullnames): 169 | for fullname in fullnames: 170 | self.known_modules[self.name + "." + fullname] = mod 171 | 172 | def _get_module(self, fullname): 173 | return self.known_modules[self.name + "." + fullname] 174 | 175 | def find_module(self, fullname, path=None): 176 | if fullname in self.known_modules: 177 | return self 178 | return None 179 | 180 | def __get_module(self, fullname): 181 | try: 182 | return self.known_modules[fullname] 183 | except KeyError: 184 | raise ImportError("This loader does not know module " + fullname) 185 | 186 | def load_module(self, fullname): 187 | try: 188 | # in case of a reload 189 | return sys.modules[fullname] 190 | except KeyError: 191 | pass 192 | mod = self.__get_module(fullname) 193 | if isinstance(mod, MovedModule): 194 | mod = mod._resolve() 195 | else: 196 | mod.__loader__ = self 197 | sys.modules[fullname] = mod 198 | return mod 199 | 200 | def is_package(self, fullname): 201 | """ 202 | Return true, if the named module is a package. 203 | 204 | We need this method to get correct spec objects with 205 | Python 3.4 (see PEP451) 206 | """ 207 | return hasattr(self.__get_module(fullname), "__path__") 208 | 209 | def get_code(self, fullname): 210 | """Return None 211 | 212 | Required, if is_package is implemented""" 213 | self.__get_module(fullname) # eventually raises ImportError 214 | return None 215 | get_source = get_code # same as get_code 216 | 217 | _importer = _SixMetaPathImporter(__name__) 218 | 219 | 220 | class _MovedItems(_LazyModule): 221 | """Lazy loading of moved objects""" 222 | __path__ = [] # mark as package 223 | 224 | 225 | _moved_attributes = [ 226 | MovedAttribute("cStringIO", "cStringIO", "io", "StringIO"), 227 | MovedAttribute("filter", "itertools", "builtins", "ifilter", "filter"), 228 | MovedAttribute("filterfalse", "itertools", "itertools", "ifilterfalse", "filterfalse"), 229 | MovedAttribute("input", "__builtin__", "builtins", "raw_input", "input"), 230 | MovedAttribute("intern", "__builtin__", "sys"), 231 | MovedAttribute("map", "itertools", "builtins", "imap", "map"), 232 | MovedAttribute("range", "__builtin__", "builtins", "xrange", "range"), 233 | MovedAttribute("reload_module", "__builtin__", "imp", "reload"), 234 | MovedAttribute("reduce", "__builtin__", "functools"), 235 | MovedAttribute("shlex_quote", "pipes", "shlex", "quote"), 236 | MovedAttribute("StringIO", "StringIO", "io"), 237 | MovedAttribute("UserDict", "UserDict", "collections"), 238 | MovedAttribute("UserList", "UserList", "collections"), 239 | MovedAttribute("UserString", "UserString", "collections"), 240 | MovedAttribute("xrange", "__builtin__", "builtins", "xrange", "range"), 241 | MovedAttribute("zip", "itertools", "builtins", "izip", "zip"), 242 | MovedAttribute("zip_longest", "itertools", "itertools", "izip_longest", "zip_longest"), 243 | 244 | MovedModule("builtins", "__builtin__"), 245 | MovedModule("configparser", "ConfigParser"), 246 | MovedModule("copyreg", "copy_reg"), 247 | MovedModule("dbm_gnu", "gdbm", "dbm.gnu"), 248 | MovedModule("_dummy_thread", "dummy_thread", "_dummy_thread"), 249 | MovedModule("http_cookiejar", "cookielib", "http.cookiejar"), 250 | MovedModule("http_cookies", "Cookie", "http.cookies"), 251 | MovedModule("html_entities", "htmlentitydefs", "html.entities"), 252 | MovedModule("html_parser", "HTMLParser", "html.parser"), 253 | MovedModule("http_client", "httplib", "http.client"), 254 | MovedModule("email_mime_multipart", "email.MIMEMultipart", "email.mime.multipart"), 255 | MovedModule("email_mime_nonmultipart", "email.MIMENonMultipart", "email.mime.nonmultipart"), 256 | MovedModule("email_mime_text", "email.MIMEText", "email.mime.text"), 257 | MovedModule("email_mime_base", "email.MIMEBase", "email.mime.base"), 258 | MovedModule("BaseHTTPServer", "BaseHTTPServer", "http.server"), 259 | MovedModule("CGIHTTPServer", "CGIHTTPServer", "http.server"), 260 | MovedModule("SimpleHTTPServer", "SimpleHTTPServer", "http.server"), 261 | MovedModule("cPickle", "cPickle", "pickle"), 262 | MovedModule("queue", "Queue"), 263 | MovedModule("reprlib", "repr"), 264 | MovedModule("socketserver", "SocketServer"), 265 | MovedModule("_thread", "thread", "_thread"), 266 | MovedModule("tkinter", "Tkinter"), 267 | MovedModule("tkinter_dialog", "Dialog", "tkinter.dialog"), 268 | MovedModule("tkinter_filedialog", "FileDialog", "tkinter.filedialog"), 269 | MovedModule("tkinter_scrolledtext", "ScrolledText", "tkinter.scrolledtext"), 270 | MovedModule("tkinter_simpledialog", "SimpleDialog", "tkinter.simpledialog"), 271 | MovedModule("tkinter_tix", "Tix", "tkinter.tix"), 272 | MovedModule("tkinter_ttk", "ttk", "tkinter.ttk"), 273 | MovedModule("tkinter_constants", "Tkconstants", "tkinter.constants"), 274 | MovedModule("tkinter_dnd", "Tkdnd", "tkinter.dnd"), 275 | MovedModule("tkinter_colorchooser", "tkColorChooser", 276 | "tkinter.colorchooser"), 277 | MovedModule("tkinter_commondialog", "tkCommonDialog", 278 | "tkinter.commondialog"), 279 | MovedModule("tkinter_tkfiledialog", "tkFileDialog", "tkinter.filedialog"), 280 | MovedModule("tkinter_font", "tkFont", "tkinter.font"), 281 | MovedModule("tkinter_messagebox", "tkMessageBox", "tkinter.messagebox"), 282 | MovedModule("tkinter_tksimpledialog", "tkSimpleDialog", 283 | "tkinter.simpledialog"), 284 | MovedModule("urllib_parse", __name__ + ".moves.urllib_parse", "urllib.parse"), 285 | MovedModule("urllib_error", __name__ + ".moves.urllib_error", "urllib.error"), 286 | MovedModule("urllib", __name__ + ".moves.urllib", __name__ + ".moves.urllib"), 287 | MovedModule("urllib_robotparser", "robotparser", "urllib.robotparser"), 288 | MovedModule("xmlrpc_client", "xmlrpclib", "xmlrpc.client"), 289 | MovedModule("xmlrpc_server", "SimpleXMLRPCServer", "xmlrpc.server"), 290 | MovedModule("winreg", "_winreg"), 291 | ] 292 | for attr in _moved_attributes: 293 | setattr(_MovedItems, attr.name, attr) 294 | if isinstance(attr, MovedModule): 295 | _importer._add_module(attr, "moves." + attr.name) 296 | del attr 297 | 298 | _MovedItems._moved_attributes = _moved_attributes 299 | 300 | moves = _MovedItems(__name__ + ".moves") 301 | _importer._add_module(moves, "moves") 302 | 303 | 304 | class Module_six_moves_urllib_parse(_LazyModule): 305 | """Lazy loading of moved objects in six.moves.urllib_parse""" 306 | 307 | 308 | _urllib_parse_moved_attributes = [ 309 | MovedAttribute("ParseResult", "urlparse", "urllib.parse"), 310 | MovedAttribute("SplitResult", "urlparse", "urllib.parse"), 311 | MovedAttribute("parse_qs", "urlparse", "urllib.parse"), 312 | MovedAttribute("parse_qsl", "urlparse", "urllib.parse"), 313 | MovedAttribute("urldefrag", "urlparse", "urllib.parse"), 314 | MovedAttribute("urljoin", "urlparse", "urllib.parse"), 315 | MovedAttribute("urlparse", "urlparse", "urllib.parse"), 316 | MovedAttribute("urlsplit", "urlparse", "urllib.parse"), 317 | MovedAttribute("urlunparse", "urlparse", "urllib.parse"), 318 | MovedAttribute("urlunsplit", "urlparse", "urllib.parse"), 319 | MovedAttribute("quote", "urllib", "urllib.parse"), 320 | MovedAttribute("quote_plus", "urllib", "urllib.parse"), 321 | MovedAttribute("unquote", "urllib", "urllib.parse"), 322 | MovedAttribute("unquote_plus", "urllib", "urllib.parse"), 323 | MovedAttribute("urlencode", "urllib", "urllib.parse"), 324 | MovedAttribute("splitquery", "urllib", "urllib.parse"), 325 | MovedAttribute("splittag", "urllib", "urllib.parse"), 326 | MovedAttribute("splituser", "urllib", "urllib.parse"), 327 | MovedAttribute("uses_fragment", "urlparse", "urllib.parse"), 328 | MovedAttribute("uses_netloc", "urlparse", "urllib.parse"), 329 | MovedAttribute("uses_params", "urlparse", "urllib.parse"), 330 | MovedAttribute("uses_query", "urlparse", "urllib.parse"), 331 | MovedAttribute("uses_relative", "urlparse", "urllib.parse"), 332 | ] 333 | for attr in _urllib_parse_moved_attributes: 334 | setattr(Module_six_moves_urllib_parse, attr.name, attr) 335 | del attr 336 | 337 | Module_six_moves_urllib_parse._moved_attributes = _urllib_parse_moved_attributes 338 | 339 | _importer._add_module(Module_six_moves_urllib_parse(__name__ + ".moves.urllib_parse"), 340 | "moves.urllib_parse", "moves.urllib.parse") 341 | 342 | 343 | class Module_six_moves_urllib_error(_LazyModule): 344 | """Lazy loading of moved objects in six.moves.urllib_error""" 345 | 346 | 347 | _urllib_error_moved_attributes = [ 348 | MovedAttribute("URLError", "urllib2", "urllib.error"), 349 | MovedAttribute("HTTPError", "urllib2", "urllib.error"), 350 | MovedAttribute("ContentTooShortError", "urllib", "urllib.error"), 351 | ] 352 | for attr in _urllib_error_moved_attributes: 353 | setattr(Module_six_moves_urllib_error, attr.name, attr) 354 | del attr 355 | 356 | Module_six_moves_urllib_error._moved_attributes = _urllib_error_moved_attributes 357 | 358 | _importer._add_module(Module_six_moves_urllib_error(__name__ + ".moves.urllib.error"), 359 | "moves.urllib_error", "moves.urllib.error") 360 | 361 | 362 | class Module_six_moves_urllib_request(_LazyModule): 363 | """Lazy loading of moved objects in six.moves.urllib_request""" 364 | 365 | 366 | _urllib_request_moved_attributes = [ 367 | MovedAttribute("urlopen", "urllib2", "urllib.request"), 368 | MovedAttribute("install_opener", "urllib2", "urllib.request"), 369 | MovedAttribute("build_opener", "urllib2", "urllib.request"), 370 | MovedAttribute("pathname2url", "urllib", "urllib.request"), 371 | MovedAttribute("url2pathname", "urllib", "urllib.request"), 372 | MovedAttribute("getproxies", "urllib", "urllib.request"), 373 | MovedAttribute("Request", "urllib2", "urllib.request"), 374 | MovedAttribute("OpenerDirector", "urllib2", "urllib.request"), 375 | MovedAttribute("HTTPDefaultErrorHandler", "urllib2", "urllib.request"), 376 | MovedAttribute("HTTPRedirectHandler", "urllib2", "urllib.request"), 377 | MovedAttribute("HTTPCookieProcessor", "urllib2", "urllib.request"), 378 | MovedAttribute("ProxyHandler", "urllib2", "urllib.request"), 379 | MovedAttribute("BaseHandler", "urllib2", "urllib.request"), 380 | MovedAttribute("HTTPPasswordMgr", "urllib2", "urllib.request"), 381 | MovedAttribute("HTTPPasswordMgrWithDefaultRealm", "urllib2", "urllib.request"), 382 | MovedAttribute("AbstractBasicAuthHandler", "urllib2", "urllib.request"), 383 | MovedAttribute("HTTPBasicAuthHandler", "urllib2", "urllib.request"), 384 | MovedAttribute("ProxyBasicAuthHandler", "urllib2", "urllib.request"), 385 | MovedAttribute("AbstractDigestAuthHandler", "urllib2", "urllib.request"), 386 | MovedAttribute("HTTPDigestAuthHandler", "urllib2", "urllib.request"), 387 | MovedAttribute("ProxyDigestAuthHandler", "urllib2", "urllib.request"), 388 | MovedAttribute("HTTPHandler", "urllib2", "urllib.request"), 389 | MovedAttribute("HTTPSHandler", "urllib2", "urllib.request"), 390 | MovedAttribute("FileHandler", "urllib2", "urllib.request"), 391 | MovedAttribute("FTPHandler", "urllib2", "urllib.request"), 392 | MovedAttribute("CacheFTPHandler", "urllib2", "urllib.request"), 393 | MovedAttribute("UnknownHandler", "urllib2", "urllib.request"), 394 | MovedAttribute("HTTPErrorProcessor", "urllib2", "urllib.request"), 395 | MovedAttribute("urlretrieve", "urllib", "urllib.request"), 396 | MovedAttribute("urlcleanup", "urllib", "urllib.request"), 397 | MovedAttribute("URLopener", "urllib", "urllib.request"), 398 | MovedAttribute("FancyURLopener", "urllib", "urllib.request"), 399 | MovedAttribute("proxy_bypass", "urllib", "urllib.request"), 400 | ] 401 | for attr in _urllib_request_moved_attributes: 402 | setattr(Module_six_moves_urllib_request, attr.name, attr) 403 | del attr 404 | 405 | Module_six_moves_urllib_request._moved_attributes = _urllib_request_moved_attributes 406 | 407 | _importer._add_module(Module_six_moves_urllib_request(__name__ + ".moves.urllib.request"), 408 | "moves.urllib_request", "moves.urllib.request") 409 | 410 | 411 | class Module_six_moves_urllib_response(_LazyModule): 412 | """Lazy loading of moved objects in six.moves.urllib_response""" 413 | 414 | 415 | _urllib_response_moved_attributes = [ 416 | MovedAttribute("addbase", "urllib", "urllib.response"), 417 | MovedAttribute("addclosehook", "urllib", "urllib.response"), 418 | MovedAttribute("addinfo", "urllib", "urllib.response"), 419 | MovedAttribute("addinfourl", "urllib", "urllib.response"), 420 | ] 421 | for attr in _urllib_response_moved_attributes: 422 | setattr(Module_six_moves_urllib_response, attr.name, attr) 423 | del attr 424 | 425 | Module_six_moves_urllib_response._moved_attributes = _urllib_response_moved_attributes 426 | 427 | _importer._add_module(Module_six_moves_urllib_response(__name__ + ".moves.urllib.response"), 428 | "moves.urllib_response", "moves.urllib.response") 429 | 430 | 431 | class Module_six_moves_urllib_robotparser(_LazyModule): 432 | """Lazy loading of moved objects in six.moves.urllib_robotparser""" 433 | 434 | 435 | _urllib_robotparser_moved_attributes = [ 436 | MovedAttribute("RobotFileParser", "robotparser", "urllib.robotparser"), 437 | ] 438 | for attr in _urllib_robotparser_moved_attributes: 439 | setattr(Module_six_moves_urllib_robotparser, attr.name, attr) 440 | del attr 441 | 442 | Module_six_moves_urllib_robotparser._moved_attributes = _urllib_robotparser_moved_attributes 443 | 444 | _importer._add_module(Module_six_moves_urllib_robotparser(__name__ + ".moves.urllib.robotparser"), 445 | "moves.urllib_robotparser", "moves.urllib.robotparser") 446 | 447 | 448 | class Module_six_moves_urllib(types.ModuleType): 449 | """Create a six.moves.urllib namespace that resembles the Python 3 namespace""" 450 | __path__ = [] # mark as package 451 | parse = _importer._get_module("moves.urllib_parse") 452 | error = _importer._get_module("moves.urllib_error") 453 | request = _importer._get_module("moves.urllib_request") 454 | response = _importer._get_module("moves.urllib_response") 455 | robotparser = _importer._get_module("moves.urllib_robotparser") 456 | 457 | def __dir__(self): 458 | return ['parse', 'error', 'request', 'response', 'robotparser'] 459 | 460 | _importer._add_module(Module_six_moves_urllib(__name__ + ".moves.urllib"), 461 | "moves.urllib") 462 | 463 | 464 | def add_move(move): 465 | """Add an item to six.moves.""" 466 | setattr(_MovedItems, move.name, move) 467 | 468 | 469 | def remove_move(name): 470 | """Remove item from six.moves.""" 471 | try: 472 | delattr(_MovedItems, name) 473 | except AttributeError: 474 | try: 475 | del moves.__dict__[name] 476 | except KeyError: 477 | raise AttributeError("no such move, %r" % (name,)) 478 | 479 | 480 | if PY3: 481 | _meth_func = "__func__" 482 | _meth_self = "__self__" 483 | 484 | _func_closure = "__closure__" 485 | _func_code = "__code__" 486 | _func_defaults = "__defaults__" 487 | _func_globals = "__globals__" 488 | else: 489 | _meth_func = "im_func" 490 | _meth_self = "im_self" 491 | 492 | _func_closure = "func_closure" 493 | _func_code = "func_code" 494 | _func_defaults = "func_defaults" 495 | _func_globals = "func_globals" 496 | 497 | 498 | try: 499 | advance_iterator = next 500 | except NameError: 501 | def advance_iterator(it): 502 | return it.next() 503 | next = advance_iterator 504 | 505 | 506 | try: 507 | callable = callable 508 | except NameError: 509 | def callable(obj): 510 | return any("__call__" in klass.__dict__ for klass in type(obj).__mro__) 511 | 512 | 513 | if PY3: 514 | def get_unbound_function(unbound): 515 | return unbound 516 | 517 | create_bound_method = types.MethodType 518 | 519 | Iterator = object 520 | else: 521 | def get_unbound_function(unbound): 522 | return unbound.im_func 523 | 524 | def create_bound_method(func, obj): 525 | return types.MethodType(func, obj, obj.__class__) 526 | 527 | class Iterator(object): 528 | 529 | def next(self): 530 | return type(self).__next__(self) 531 | 532 | callable = callable 533 | _add_doc(get_unbound_function, 534 | """Get the function out of a possibly unbound function""") 535 | 536 | 537 | get_method_function = operator.attrgetter(_meth_func) 538 | get_method_self = operator.attrgetter(_meth_self) 539 | get_function_closure = operator.attrgetter(_func_closure) 540 | get_function_code = operator.attrgetter(_func_code) 541 | get_function_defaults = operator.attrgetter(_func_defaults) 542 | get_function_globals = operator.attrgetter(_func_globals) 543 | 544 | 545 | if PY3: 546 | def iterkeys(d, **kw): 547 | return iter(d.keys(**kw)) 548 | 549 | def itervalues(d, **kw): 550 | return iter(d.values(**kw)) 551 | 552 | def iteritems(d, **kw): 553 | return iter(d.items(**kw)) 554 | 555 | def iterlists(d, **kw): 556 | return iter(d.lists(**kw)) 557 | else: 558 | def iterkeys(d, **kw): 559 | return iter(d.iterkeys(**kw)) 560 | 561 | def itervalues(d, **kw): 562 | return iter(d.itervalues(**kw)) 563 | 564 | def iteritems(d, **kw): 565 | return iter(d.iteritems(**kw)) 566 | 567 | def iterlists(d, **kw): 568 | return iter(d.iterlists(**kw)) 569 | 570 | _add_doc(iterkeys, "Return an iterator over the keys of a dictionary.") 571 | _add_doc(itervalues, "Return an iterator over the values of a dictionary.") 572 | _add_doc(iteritems, 573 | "Return an iterator over the (key, value) pairs of a dictionary.") 574 | _add_doc(iterlists, 575 | "Return an iterator over the (key, [values]) pairs of a dictionary.") 576 | 577 | 578 | if PY3: 579 | def b(s): 580 | return s.encode("latin-1") 581 | def u(s): 582 | return s 583 | unichr = chr 584 | if sys.version_info[1] <= 1: 585 | def int2byte(i): 586 | return bytes((i,)) 587 | else: 588 | # This is about 2x faster than the implementation above on 3.2+ 589 | int2byte = operator.methodcaller("to_bytes", 1, "big") 590 | byte2int = operator.itemgetter(0) 591 | indexbytes = operator.getitem 592 | iterbytes = iter 593 | import io 594 | StringIO = io.StringIO 595 | BytesIO = io.BytesIO 596 | else: 597 | def b(s): 598 | return s 599 | # Workaround for standalone backslash 600 | def u(s): 601 | return unicode(s.replace(r'\\', r'\\\\'), "unicode_escape") 602 | unichr = unichr 603 | int2byte = chr 604 | def byte2int(bs): 605 | return ord(bs[0]) 606 | def indexbytes(buf, i): 607 | return ord(buf[i]) 608 | def iterbytes(buf): 609 | return (ord(byte) for byte in buf) 610 | import StringIO 611 | StringIO = BytesIO = StringIO.StringIO 612 | _add_doc(b, """Byte literal""") 613 | _add_doc(u, """Text literal""") 614 | 615 | 616 | if PY3: 617 | exec_ = getattr(moves.builtins, "exec") 618 | 619 | 620 | def reraise(tp, value, tb=None): 621 | if value is None: 622 | value = tp() 623 | if value.__traceback__ is not tb: 624 | raise value.with_traceback(tb) 625 | raise value 626 | 627 | else: 628 | def exec_(_code_, _globs_=None, _locs_=None): 629 | """Execute code in a namespace.""" 630 | if _globs_ is None: 631 | frame = sys._getframe(1) 632 | _globs_ = frame.f_globals 633 | if _locs_ is None: 634 | _locs_ = frame.f_locals 635 | del frame 636 | elif _locs_ is None: 637 | _locs_ = _globs_ 638 | exec("""exec _code_ in _globs_, _locs_""") 639 | 640 | 641 | exec_("""def reraise(tp, value, tb=None): 642 | raise tp, value, tb 643 | """) 644 | 645 | 646 | print_ = getattr(moves.builtins, "print", None) 647 | if print_ is None: 648 | def print_(*args, **kwargs): 649 | """The new-style print function for Python 2.4 and 2.5.""" 650 | fp = kwargs.pop("file", sys.stdout) 651 | if fp is None: 652 | return 653 | def write(data): 654 | if not isinstance(data, basestring): 655 | data = str(data) 656 | # If the file has an encoding, encode unicode with it. 657 | if (isinstance(fp, file) and 658 | isinstance(data, unicode) and 659 | fp.encoding is not None): 660 | errors = getattr(fp, "errors", None) 661 | if errors is None: 662 | errors = "strict" 663 | data = data.encode(fp.encoding, errors) 664 | fp.write(data) 665 | want_unicode = False 666 | sep = kwargs.pop("sep", None) 667 | if sep is not None: 668 | if isinstance(sep, unicode): 669 | want_unicode = True 670 | elif not isinstance(sep, str): 671 | raise TypeError("sep must be None or a string") 672 | end = kwargs.pop("end", None) 673 | if end is not None: 674 | if isinstance(end, unicode): 675 | want_unicode = True 676 | elif not isinstance(end, str): 677 | raise TypeError("end must be None or a string") 678 | if kwargs: 679 | raise TypeError("invalid keyword arguments to print()") 680 | if not want_unicode: 681 | for arg in args: 682 | if isinstance(arg, unicode): 683 | want_unicode = True 684 | break 685 | if want_unicode: 686 | newline = unicode("\n") 687 | space = unicode(" ") 688 | else: 689 | newline = "\n" 690 | space = " " 691 | if sep is None: 692 | sep = space 693 | if end is None: 694 | end = newline 695 | for i, arg in enumerate(args): 696 | if i: 697 | write(sep) 698 | write(arg) 699 | write(end) 700 | 701 | _add_doc(reraise, """Reraise an exception.""") 702 | 703 | if sys.version_info[0:2] < (3, 4): 704 | def wraps(wrapped, assigned=functools.WRAPPER_ASSIGNMENTS, 705 | updated=functools.WRAPPER_UPDATES): 706 | def wrapper(f): 707 | f = functools.wraps(wrapped)(f) 708 | f.__wrapped__ = wrapped 709 | return f 710 | return wrapper 711 | else: 712 | wraps = functools.wraps 713 | 714 | def with_metaclass(meta, *bases): 715 | """Create a base class with a metaclass.""" 716 | # This requires a bit of explanation: the basic idea is to make a dummy 717 | # metaclass for one level of class instantiation that replaces itself with 718 | # the actual metaclass. 719 | class metaclass(meta): 720 | def __new__(cls, name, this_bases, d): 721 | return meta(name, bases, d) 722 | return type.__new__(metaclass, 'temporary_class', (), {}) 723 | 724 | 725 | def add_metaclass(metaclass): 726 | """Class decorator for creating a class with a metaclass.""" 727 | def wrapper(cls): 728 | orig_vars = cls.__dict__.copy() 729 | slots = orig_vars.get('__slots__') 730 | if slots is not None: 731 | if isinstance(slots, str): 732 | slots = [slots] 733 | for slots_var in slots: 734 | orig_vars.pop(slots_var) 735 | orig_vars.pop('__dict__', None) 736 | orig_vars.pop('__weakref__', None) 737 | return metaclass(cls.__name__, cls.__bases__, orig_vars) 738 | return wrapper 739 | 740 | # Complete the moves implementation. 741 | # This code is at the end of this module to speed up module loading. 742 | # Turn this module into a package. 743 | __path__ = [] # required for PEP 302 and PEP 451 744 | __package__ = __name__ # see PEP 366 @ReservedAssignment 745 | if globals().get("__spec__") is not None: 746 | __spec__.submodule_search_locations = [] # PEP 451 @UndefinedVariable 747 | # Remove other six meta path importers, since they cause problems. This can 748 | # happen if six is removed from sys.modules and then reloaded. (Setuptools does 749 | # this for some reason.) 750 | if sys.meta_path: 751 | for i, importer in enumerate(sys.meta_path): 752 | # Here's some real nastiness: Another "instance" of the six module might 753 | # be floating around. Therefore, we can't use isinstance() to check for 754 | # the six meta path importer, since the other six instance will have 755 | # inserted an importer with different class. 756 | if (type(importer).__name__ == "_SixMetaPathImporter" and 757 | importer.name == __name__): 758 | del sys.meta_path[i] 759 | break 760 | del i, importer 761 | # Finally, add the importer to the meta path import hook. 762 | sys.meta_path.append(_importer) 763 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from distutils.core import setup 2 | setup( 3 | name = 'pandas-ply', 4 | version = '0.2.1', 5 | author = 'Coursera Inc.', 6 | author_email = 'pandas-ply@coursera.org', 7 | packages = [ 8 | 'pandas_ply', 9 | 'pandas_ply.vendor', 10 | ], 11 | description = 'functional data manipulation for pandas', 12 | long_description = open('README.rst').read(), 13 | license = 'Apache License 2.0', 14 | url = 'https://github.com/coursera/pandas-ply', 15 | classifiers = [], 16 | ) 17 | -------------------------------------------------------------------------------- /tests/test_methods.py: -------------------------------------------------------------------------------- 1 | import sys 2 | if sys.version_info < (2, 7): 3 | import unittest2 as unittest 4 | else: 5 | import unittest 6 | 7 | from pandas.util.testing import assert_frame_equal 8 | from pandas.util.testing import assert_series_equal 9 | from pandas_ply.methods import install_ply 10 | from pandas_ply.symbolic import X 11 | import pandas as pd 12 | 13 | install_ply(pd) 14 | 15 | 16 | def assert_frame_equiv(df1, df2, **kwargs): 17 | """ Assert that two dataframes are equal, ignoring ordering of columns. 18 | 19 | See http://stackoverflow.com/questions/14224172/equality-in-pandas- 20 | dataframes-column-order-matters 21 | """ 22 | return assert_frame_equal( 23 | df1.sort(axis=1), 24 | df2.sort(axis=1), 25 | check_names=True, **kwargs) 26 | 27 | test_df = pd.DataFrame( 28 | {'x': [1, 2, 3, 4], 'y': [4, 3, 2, 1]}, 29 | columns=['x', 'y']) 30 | test_series = pd.Series([1, 2, 3, 4]) 31 | 32 | test_dfsq = pd.DataFrame( 33 | {'x': [-2, -1, 0, 1, 2], 'xsq': [4, 1, 0, 1, 4]}, 34 | columns=['x', 'xsq']) 35 | 36 | 37 | class PlyWhereTest(unittest.TestCase): 38 | 39 | def test_no_conditions(self): 40 | assert_frame_equal(test_df.ply_where(), test_df) 41 | 42 | def test_single_condition(self): 43 | expected = pd.DataFrame( 44 | {'x': [3, 4], 'y': [2, 1]}, 45 | index=[2, 3], 46 | columns=['x', 'y']) 47 | 48 | assert_frame_equal(test_df.ply_where(test_df.x > 2.5), expected) 49 | assert_frame_equal(test_df.ply_where(lambda df: df.x > 2.5), expected) 50 | assert_frame_equal(test_df.ply_where(X.x > 2.5), expected) 51 | 52 | def test_multiple_conditions(self): 53 | expected = pd.DataFrame( 54 | {'x': [2, 3], 'y': [3, 2]}, 55 | index=[1, 2], 56 | columns=['x', 'y']) 57 | 58 | lo_df = test_df.x > 1.5 59 | hi_df = test_df.x < 3.5 60 | lo_func = lambda df: df.x > 1.5 61 | hi_func = lambda df: df.x < 3.5 62 | lo_sym = X.x > 1.5 63 | hi_sym = X.x < 3.5 64 | 65 | for lo in [lo_df, lo_func, lo_sym]: 66 | for hi in [hi_df, hi_func, hi_sym]: 67 | assert_frame_equal(test_df.ply_where(lo, hi), expected) 68 | 69 | 70 | class PlyWhereForSeriesTest(unittest.TestCase): 71 | 72 | def test_no_conditions(self): 73 | assert_series_equal(test_series.ply_where(), test_series) 74 | 75 | def test_single_condition(self): 76 | expected = pd.Series([3, 4], index=[2, 3]) 77 | 78 | assert_series_equal(test_series.ply_where(test_series > 2.5), expected) 79 | assert_series_equal(test_series.ply_where(lambda s: s > 2.5), expected) 80 | assert_series_equal(test_series.ply_where(X > 2.5), expected) 81 | 82 | def test_multiple_conditions(self): 83 | expected = pd.Series([2, 3], index=[1, 2]) 84 | 85 | assert_series_equal( 86 | test_series.ply_where(test_series < 3.5, test_series > 1.5), expected) 87 | assert_series_equal( 88 | test_series.ply_where(test_series < 3.5, lambda s: s > 1.5), expected) 89 | assert_series_equal( 90 | test_series.ply_where(test_series < 3.5, X > 1.5), expected) 91 | assert_series_equal( 92 | test_series.ply_where(lambda s: s < 3.5, lambda s: s > 1.5), expected) 93 | assert_series_equal( 94 | test_series.ply_where(lambda s: s < 3.5, X > 1.5), expected) 95 | assert_series_equal( 96 | test_series.ply_where(X < 3.5, X > 1.5), expected) 97 | 98 | 99 | class PlySelectTest(unittest.TestCase): 100 | 101 | def test_bad_arguments(self): 102 | # Nonexistent column, include or exclude 103 | with self.assertRaises(ValueError): 104 | test_df.ply_select('z') 105 | with self.assertRaises(ValueError): 106 | test_df.ply_select('-z') 107 | 108 | # Exclude without asterisk 109 | with self.assertRaises(ValueError): 110 | test_df.ply_select('-x') 111 | 112 | # Include with asterisk 113 | with self.assertRaises(ValueError): 114 | test_df.ply_select('*', 'x') 115 | 116 | def test_noops(self): 117 | assert_frame_equal(test_df.ply_select('*'), test_df) 118 | assert_frame_equal(test_df.ply_select('x', 'y'), test_df) 119 | assert_frame_equiv(test_df.ply_select(x=X.x, y=X.y), test_df) 120 | 121 | def test_reorder(self): 122 | reordered = test_df.ply_select('y', 'x') 123 | assert_frame_equiv(reordered, test_df[['y', 'x']]) 124 | self.assertEqual(list(reordered.columns), ['y', 'x']) 125 | 126 | def test_subset_via_includes(self): 127 | assert_frame_equal(test_df.ply_select('x'), test_df[['x']]) 128 | assert_frame_equal(test_df.ply_select('y'), test_df[['y']]) 129 | 130 | def test_subset_via_excludes(self): 131 | assert_frame_equal(test_df.ply_select('*', '-y'), test_df[['x']]) 132 | assert_frame_equal(test_df.ply_select('*', '-x'), test_df[['y']]) 133 | 134 | def test_empty(self): 135 | assert_frame_equal(test_df.ply_select(), test_df[[]]) 136 | assert_frame_equal(test_df.ply_select('*', '-x', '-y'), test_df[[]]) 137 | 138 | def test_ways_of_providing_new_columns(self): 139 | # Value 140 | assert_frame_equal( 141 | test_df.ply_select(new=5), 142 | pd.DataFrame({'new': [5, 5, 5, 5]})) 143 | 144 | # Dataframe-like 145 | assert_frame_equal( 146 | test_df.ply_select(new=[5, 6, 7, 8]), 147 | pd.DataFrame({'new': [5, 6, 7, 8]})) 148 | 149 | # Function 150 | assert_frame_equal( 151 | test_df.ply_select(new=lambda df: df.x), 152 | pd.DataFrame({'new': [1, 2, 3, 4]})) 153 | 154 | # Symbolic expression 155 | assert_frame_equal( 156 | test_df.ply_select(new=X.x), 157 | pd.DataFrame({'new': [1, 2, 3, 4]})) 158 | 159 | def test_old_and_new_together(self): 160 | assert_frame_equal( 161 | test_df.ply_select('x', total=X.x + X.y), 162 | pd.DataFrame( 163 | {'x': [1, 2, 3, 4], 'total': [5, 5, 5, 5]}, 164 | columns=['x', 'total'])) 165 | 166 | def test_kwarg_overrides_asterisk(self): 167 | assert_frame_equal( 168 | test_df.ply_select('*', y=X.x), 169 | pd.DataFrame({'x': [1, 2, 3, 4], 'y': [1, 2, 3, 4]})) 170 | 171 | def test_kwarg_overrides_column_include(self): 172 | assert_frame_equal( 173 | test_df.ply_select('x', 'y', y=X.x), 174 | pd.DataFrame({'x': [1, 2, 3, 4], 'y': [1, 2, 3, 4]})) 175 | 176 | def test_new_index(self): 177 | assert_frame_equal( 178 | test_df.ply_select('x', index=X.y), 179 | pd.DataFrame( 180 | {'x': [1, 2, 3, 4]}, 181 | index=pd.Index([4, 3, 2, 1], name='y'))) 182 | 183 | 184 | class PlySelectForGroupsTest(unittest.TestCase): 185 | 186 | def test_simple(self): 187 | grp = test_dfsq.groupby('xsq') 188 | assert_frame_equal( 189 | grp.ply_select(count=X.x.count()), 190 | pd.DataFrame( 191 | {'count': [1, 2, 2]}, 192 | index=pd.Index([0, 1, 4], name='xsq'))) 193 | -------------------------------------------------------------------------------- /tests/test_symbolic.py: -------------------------------------------------------------------------------- 1 | import sys 2 | if sys.version_info < (2, 7): 3 | import unittest2 as unittest 4 | else: 5 | import unittest 6 | import mock 7 | 8 | from pandas_ply.symbolic import Call 9 | from pandas_ply.symbolic import GetAttr 10 | from pandas_ply.symbolic import Symbol 11 | from pandas_ply.symbolic import eval_if_symbolic 12 | from pandas_ply.symbolic import sym_call 13 | from pandas_ply.symbolic import to_callable 14 | 15 | 16 | class ExpressionTest(unittest.TestCase): 17 | 18 | # These test whether operations on symbolic expressions correctly construct 19 | # compound symbolic expressions: 20 | 21 | def test_getattr(self): 22 | expr = Symbol('some_symbol').some_attr 23 | self.assertEqual( 24 | repr(expr), 25 | "getattr(Symbol('some_symbol'), 'some_attr')") 26 | 27 | def test_call(self): 28 | expr = Symbol('some_symbol')('arg1', 'arg2', kwarg_name='kwarg value') 29 | self.assertEqual( 30 | repr(expr), 31 | "Symbol('some_symbol')(*('arg1', 'arg2'), " + 32 | "**{'kwarg_name': 'kwarg value'})") 33 | 34 | def test_ops(self): 35 | expr = Symbol('some_symbol') + 1 36 | self.assertEqual( 37 | repr(expr), 38 | "getattr(Symbol('some_symbol'), '__add__')(*(1,), **{})") 39 | 40 | expr = 1 + Symbol('some_symbol') 41 | self.assertEqual( 42 | repr(expr), 43 | "getattr(Symbol('some_symbol'), '__radd__')(*(1,), **{})") 44 | 45 | expr = Symbol('some_symbol')['key'] 46 | self.assertEqual( 47 | repr(expr), 48 | "getattr(Symbol('some_symbol'), '__getitem__')(*('key',), **{})") 49 | 50 | 51 | class SymbolTest(unittest.TestCase): 52 | 53 | def test_eval(self): 54 | self.assertEqual( 55 | Symbol('some_symbol')._eval({'some_symbol': 'value'}), 56 | 'value') 57 | self.assertEqual( 58 | Symbol('some_symbol')._eval( 59 | {'some_symbol': 'value', 'other_symbol': 'irrelevant'}), 60 | 'value') 61 | with self.assertRaises(KeyError): 62 | Symbol('some_symbol')._eval({'other_symbol': 'irrelevant'}), 63 | 64 | def test_repr(self): 65 | self.assertEqual(repr(Symbol('some_symbol')), "Symbol('some_symbol')") 66 | 67 | 68 | class GetAttrTest(unittest.TestCase): 69 | 70 | def test_eval_with_nonsymbolic_object(self): 71 | some_obj = mock.Mock() 72 | del some_obj._eval 73 | # Ensure constructing the expression does not access `.some_attr`. 74 | del some_obj.some_attr 75 | 76 | with self.assertRaises(AttributeError): 77 | some_obj.some_attr 78 | expr = GetAttr(some_obj, 'some_attr') 79 | 80 | some_obj.some_attr = 'attribute value' 81 | 82 | self.assertEqual(expr._eval({}), 'attribute value') 83 | 84 | def test_eval_with_symbolic_object(self): 85 | some_obj = mock.Mock() 86 | del some_obj._eval 87 | some_obj.some_attr = 'attribute value' 88 | 89 | expr = GetAttr(Symbol('some_symbol'), 'some_attr') 90 | 91 | self.assertEqual( 92 | expr._eval({'some_symbol': some_obj}), 93 | 'attribute value') 94 | 95 | def test_repr(self): 96 | self.assertEqual( 97 | repr(GetAttr('object', 'attrname')), 98 | "getattr('object', 'attrname')") 99 | 100 | 101 | class CallTest(unittest.TestCase): 102 | 103 | def test_eval_with_nonsymbolic_func(self): 104 | func = mock.Mock(return_value='return value') 105 | del func._eval # So it doesn't pretend to be symbolic 106 | 107 | expr = Call(func, ('arg1', 'arg2'), {'kwarg_name': 'kwarg value'}) 108 | 109 | # Ensure constructing the expression does not call the function 110 | self.assertFalse(func.called) 111 | 112 | result = expr._eval({}) 113 | 114 | func.assert_called_once_with('arg1', 'arg2', kwarg_name='kwarg value') 115 | self.assertEqual(result, 'return value') 116 | 117 | def test_eval_with_symbolic_func(self): 118 | func = mock.Mock(return_value='return value') 119 | del func._eval # So it doesn't pretend to be symbolic 120 | 121 | expr = Call( 122 | Symbol('some_symbol'), 123 | ('arg1', 'arg2'), 124 | {'kwarg_name': 'kwarg value'}) 125 | 126 | result = expr._eval({'some_symbol': func}) 127 | 128 | func.assert_called_once_with('arg1', 'arg2', kwarg_name='kwarg value') 129 | self.assertEqual(result, 'return value') 130 | 131 | def test_eval_with_symbolic_arg(self): 132 | func = mock.Mock(return_value='return value') 133 | del func._eval # So it doesn't pretend to be symbolic 134 | 135 | expr = Call( 136 | func, 137 | (Symbol('some_symbol'), 'arg2'), 138 | {'kwarg_name': 'kwarg value'}) 139 | 140 | result = expr._eval({'some_symbol': 'arg1'}) 141 | 142 | func.assert_called_once_with('arg1', 'arg2', kwarg_name='kwarg value') 143 | self.assertEqual(result, 'return value') 144 | 145 | def test_eval_with_symbol_kwarg(self): 146 | func = mock.Mock(return_value='return value') 147 | del func._eval # So it doesn't pretend to be symbolic 148 | 149 | expr = Call( 150 | func, 151 | ('arg1', 'arg2'), 152 | {'kwarg_name': Symbol('some_symbol')}) 153 | 154 | result = expr._eval({'some_symbol': 'kwarg value'}) 155 | 156 | func.assert_called_once_with('arg1', 'arg2', kwarg_name='kwarg value') 157 | self.assertEqual(result, 'return value') 158 | 159 | def test_repr(self): 160 | # One arg 161 | self.assertEqual( 162 | repr(Call('func', ('arg1',), {'kwarg_name': 'kwarg value'})), 163 | "'func'(*('arg1',), **{'kwarg_name': 'kwarg value'})") 164 | 165 | # Two args 166 | self.assertEqual( 167 | repr(Call( 168 | 'func', 169 | ('arg1', 'arg2'), 170 | {'kwarg_name': 'kwarg value'})), 171 | "'func'(*('arg1', 'arg2'), **{'kwarg_name': 'kwarg value'})") 172 | 173 | 174 | class FunctionsTest(unittest.TestCase): 175 | 176 | def test_eval_if_symbolic(self): 177 | self.assertEqual( 178 | eval_if_symbolic( 179 | 'nonsymbolic', 180 | {'some_symbol': 'symbol_value'}), 181 | 'nonsymbolic') 182 | self.assertEqual( 183 | eval_if_symbolic( 184 | Symbol('some_symbol'), 185 | {'some_symbol': 'symbol_value'}), 186 | 'symbol_value') 187 | 188 | def test_to_callable_from_nonsymbolic_noncallable(self): 189 | test_callable = to_callable('nonsymbolic') 190 | self.assertEqual( 191 | test_callable('arg1', 'arg2', kwarg_name='kwarg value'), 192 | 'nonsymbolic') 193 | 194 | def test_to_callable_from_nonsymbolic_callable(self): 195 | func = mock.Mock(return_value='return value') 196 | del func._eval # So it doesn't pretend to be symbolic 197 | 198 | test_callable = to_callable(func) 199 | 200 | # Ensure running to_callable does not call the function 201 | self.assertFalse(func.called) 202 | 203 | result = test_callable('arg1', 'arg2', kwarg_name='kwarg value') 204 | 205 | func.assert_called_once_with('arg1', 'arg2', kwarg_name='kwarg value') 206 | self.assertEqual(result, 'return value') 207 | 208 | def test_to_callable_from_symbolic(self): 209 | mock_expr = mock.Mock() 210 | mock_expr._eval.return_value = 'eval return value' 211 | 212 | test_callable = to_callable(mock_expr) 213 | 214 | # Ensure running to_callable does not evaluate the expression 215 | self.assertFalse(mock_expr._eval.called) 216 | 217 | result = test_callable('arg1', 'arg2', kwarg_name='kwarg value') 218 | 219 | mock_expr._eval.assert_called_once_with( 220 | {0: 'arg1', 1: 'arg2', 'kwarg_name': 'kwarg value'}) 221 | self.assertEqual(result, 'eval return value') 222 | 223 | def test_sym_call(self): 224 | expr = sym_call( 225 | 'func', Symbol('some_symbol'), 'arg1', 'arg2', 226 | kwarg_name='kwarg value') 227 | self.assertEqual( 228 | repr(expr), 229 | "'func'(*(Symbol('some_symbol'), 'arg1', 'arg2'), " + 230 | "**{'kwarg_name': 'kwarg value'})") 231 | 232 | 233 | class IntegrationTest(unittest.TestCase): 234 | 235 | def test_pythagoras(self): 236 | from math import sqrt 237 | 238 | X = Symbol('X') 239 | Y = Symbol('Y') 240 | 241 | expr = sym_call(sqrt, X ** 2 + Y ** 2) 242 | func = to_callable(expr) 243 | 244 | self.assertEqual(func(X=3, Y=4), 5) 245 | --------------------------------------------------------------------------------