├── .gitignore ├── Makefile ├── README.md ├── Untitled ├── make.bat └── source ├── .DS_Store ├── 1_setup.rst ├── 2_basic_python.rst ├── 3_pandas.rst ├── 4_install_packages.rst ├── _data ├── .DS_Store ├── regression_table.tex └── wdi_indicators.csv ├── conf.py ├── googleanalytics ├── MANIFEST.in ├── README.rst ├── build │ └── lib │ │ └── sphinxcontrib │ │ ├── __init__.py │ │ └── googleanalytics.py ├── dist │ └── sphinxcontrib_googleanalytics-0.1.dev20151208-py3.4.egg ├── setup.cfg ├── setup.py ├── sphinxcontrib │ ├── __init__.py │ └── googleanalytics.py └── sphinxcontrib_googleanalytics.egg-info │ ├── PKG-INFO │ ├── SOURCES.txt │ ├── dependency_links.txt │ ├── namespace_packages.txt │ ├── not-zip-safe │ ├── requires.txt │ └── top_level.txt ├── index.rst ├── python_for_r.rst ├── python_for_stata.rst ├── r_to_python.rst ├── st_command_line.rst ├── st_git_and_github.rst ├── st_ipython.rst ├── t_big_data.rst ├── t_getting_help.rst ├── t_gis.rst ├── t_igraph.rst ├── t_scikitlearn.rst ├── t_seaborn.rst ├── t_statsmodels.rst ├── t_super_fast.rst ├── t_teaching_programming.rst ├── t_text_analysis.rst └── why_python.rst /.gitignore: -------------------------------------------------------------------------------- 1 | build/.* 2 | build/ 3 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | # Makefile for Sphinx documentation 2 | # 3 | 4 | # You can set these variables from the command line. 5 | SPHINXOPTS = 6 | SPHINXBUILD = sphinx-build 7 | PAPER = 8 | BUILDDIR = build 9 | 10 | # User-friendly check for sphinx-build 11 | ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1) 12 | $(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. 
Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/) 13 | endif 14 | 15 | # Internal variables. 16 | PAPEROPT_a4 = -D latex_paper_size=a4 17 | PAPEROPT_letter = -D latex_paper_size=letter 18 | ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) source 19 | # the i18n builder cannot share the environment and doctrees with the others 20 | I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) source 21 | 22 | .PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest coverage gettext 23 | 24 | help: 25 | @echo "Please use \`make <target>' where <target> is one of" 26 | @echo " html to make standalone HTML files" 27 | @echo " dirhtml to make HTML files named index.html in directories" 28 | @echo " singlehtml to make a single large HTML file" 29 | @echo " pickle to make pickle files" 30 | @echo " json to make JSON files" 31 | @echo " htmlhelp to make HTML files and a HTML help project" 32 | @echo " qthelp to make HTML files and a qthelp project" 33 | @echo " applehelp to make an Apple Help Book" 34 | @echo " devhelp to make HTML files and a Devhelp project" 35 | @echo " epub to make an epub" 36 | @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter" 37 | @echo " latexpdf to make LaTeX files and run them through pdflatex" 38 | @echo " latexpdfja to make LaTeX files and run them through platex/dvipdfmx" 39 | @echo " text to make text files" 40 | @echo " man to make manual pages" 41 | @echo " texinfo to make Texinfo files" 42 | @echo " info to make Texinfo files and run them through makeinfo" 43 | @echo " gettext to make PO message catalogs" 44 | @echo " changes to make an overview of all changed/added/deprecated items" 45 | @echo " xml to make Docutils-native XML files" 46 | @echo " pseudoxml to make pseudoxml-XML files for display purposes" 47 | @echo " linkcheck to check all
external links for integrity" 48 | @echo " doctest to run all doctests embedded in the documentation (if enabled)" 49 | @echo " coverage to run coverage check of the documentation (if enabled)" 50 | 51 | clean: 52 | rm -rf $(BUILDDIR)/* 53 | 54 | html: 55 | $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html 56 | @echo 57 | @echo "Build finished. The HTML pages are in $(BUILDDIR)/html." 58 | 59 | dirhtml: 60 | $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml 61 | @echo 62 | @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml." 63 | 64 | singlehtml: 65 | $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml 66 | @echo 67 | @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml." 68 | 69 | pickle: 70 | $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle 71 | @echo 72 | @echo "Build finished; now you can process the pickle files." 73 | 74 | json: 75 | $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json 76 | @echo 77 | @echo "Build finished; now you can process the JSON files." 78 | 79 | htmlhelp: 80 | $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp 81 | @echo 82 | @echo "Build finished; now you can run HTML Help Workshop with the" \ 83 | ".hhp project file in $(BUILDDIR)/htmlhelp." 84 | 85 | qthelp: 86 | $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp 87 | @echo 88 | @echo "Build finished; now you can run "qcollectiongenerator" with the" \ 89 | ".qhcp project file in $(BUILDDIR)/qthelp, like this:" 90 | @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/python_for_social_scientists.qhcp" 91 | @echo "To view the help file:" 92 | @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/python_for_social_scientists.qhc" 93 | 94 | applehelp: 95 | $(SPHINXBUILD) -b applehelp $(ALLSPHINXOPTS) $(BUILDDIR)/applehelp 96 | @echo 97 | @echo "Build finished. The help book is in $(BUILDDIR)/applehelp." 98 | @echo "N.B. 
You won't be able to view it unless you put it in" \ 99 | "~/Library/Documentation/Help or install it in your application" \ 100 | "bundle." 101 | 102 | devhelp: 103 | $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp 104 | @echo 105 | @echo "Build finished." 106 | @echo "To view the help file:" 107 | @echo "# mkdir -p $$HOME/.local/share/devhelp/python_for_social_scientists" 108 | @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/python_for_social_scientists" 109 | @echo "# devhelp" 110 | 111 | epub: 112 | $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub 113 | @echo 114 | @echo "Build finished. The epub file is in $(BUILDDIR)/epub." 115 | 116 | latex: 117 | $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex 118 | @echo 119 | @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex." 120 | @echo "Run \`make' in that directory to run these through (pdf)latex" \ 121 | "(use \`make latexpdf' here to do that automatically)." 122 | 123 | latexpdf: 124 | $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex 125 | @echo "Running LaTeX files through pdflatex..." 126 | $(MAKE) -C $(BUILDDIR)/latex all-pdf 127 | @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." 128 | 129 | latexpdfja: 130 | $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex 131 | @echo "Running LaTeX files through platex and dvipdfmx..." 132 | $(MAKE) -C $(BUILDDIR)/latex all-pdf-ja 133 | @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." 134 | 135 | text: 136 | $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text 137 | @echo 138 | @echo "Build finished. The text files are in $(BUILDDIR)/text." 139 | 140 | man: 141 | $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man 142 | @echo 143 | @echo "Build finished. The manual pages are in $(BUILDDIR)/man." 144 | 145 | texinfo: 146 | $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo 147 | @echo 148 | @echo "Build finished. 
The Texinfo files are in $(BUILDDIR)/texinfo." 149 | @echo "Run \`make' in that directory to run these through makeinfo" \ 150 | "(use \`make info' here to do that automatically)." 151 | 152 | info: 153 | $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo 154 | @echo "Running Texinfo files through makeinfo..." 155 | $(MAKE) -C $(BUILDDIR)/texinfo info 156 | @echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo." 157 | 158 | gettext: 159 | $(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale 160 | @echo 161 | @echo "Build finished. The message catalogs are in $(BUILDDIR)/locale." 162 | 163 | changes: 164 | $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes 165 | @echo 166 | @echo "The overview file is in $(BUILDDIR)/changes." 167 | 168 | linkcheck: 169 | $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck 170 | @echo 171 | @echo "Link check complete; look for any errors in the above output " \ 172 | "or in $(BUILDDIR)/linkcheck/output.txt." 173 | 174 | doctest: 175 | $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest 176 | @echo "Testing of doctests in the sources finished, look at the " \ 177 | "results in $(BUILDDIR)/doctest/output.txt." 178 | 179 | coverage: 180 | $(SPHINXBUILD) -b coverage $(ALLSPHINXOPTS) $(BUILDDIR)/coverage 181 | @echo "Testing of coverage in the sources finished, look at the " \ 182 | "results in $(BUILDDIR)/coverage/python.txt." 183 | 184 | xml: 185 | $(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml 186 | @echo 187 | @echo "Build finished. The XML files are in $(BUILDDIR)/xml." 188 | 189 | pseudoxml: 190 | $(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml 191 | @echo 192 | @echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml."
193 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | Data Analysis in Python! 2 | 3 | Welcome to the git repository for www.data-analysis-in-python.org ! 4 | 5 | Note this page and all associated content is under a CC license! 6 | 7 |

This work is licensed under a Creative Commons Attribution 4.0 International License. 8 | -------------------------------------------------------------------------------- /Untitled: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickeubank/data-analysis-in-python/0fb65e2e853176c1a10f84a93e128f59086d334c/Untitled -------------------------------------------------------------------------------- /make.bat: -------------------------------------------------------------------------------- 1 | @ECHO OFF 2 | 3 | REM Command file for Sphinx documentation 4 | 5 | if "%SPHINXBUILD%" == "" ( 6 | set SPHINXBUILD=sphinx-build 7 | ) 8 | set BUILDDIR=build 9 | set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% source 10 | set I18NSPHINXOPTS=%SPHINXOPTS% source 11 | if NOT "%PAPER%" == "" ( 12 | set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS% 13 | set I18NSPHINXOPTS=-D latex_paper_size=%PAPER% %I18NSPHINXOPTS% 14 | ) 15 | 16 | if "%1" == "" goto help 17 | 18 | if "%1" == "help" ( 19 | :help 20 | echo.Please use `make ^<target^>` where ^<target^> is one of 21 | echo. html to make standalone HTML files 22 | echo. dirhtml to make HTML files named index.html in directories 23 | echo. singlehtml to make a single large HTML file 24 | echo. pickle to make pickle files 25 | echo. json to make JSON files 26 | echo. htmlhelp to make HTML files and a HTML help project 27 | echo. qthelp to make HTML files and a qthelp project 28 | echo. devhelp to make HTML files and a Devhelp project 29 | echo. epub to make an epub 30 | echo. latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter 31 | echo. text to make text files 32 | echo. man to make manual pages 33 | echo. texinfo to make Texinfo files 34 | echo. gettext to make PO message catalogs 35 | echo. changes to make an overview over all changed/added/deprecated items 36 | echo. xml to make Docutils-native XML files 37 | echo.
pseudoxml to make pseudoxml-XML files for display purposes 38 | echo. linkcheck to check all external links for integrity 39 | echo. doctest to run all doctests embedded in the documentation if enabled 40 | echo. coverage to run coverage check of the documentation if enabled 41 | goto end 42 | ) 43 | 44 | if "%1" == "clean" ( 45 | for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i 46 | del /q /s %BUILDDIR%\* 47 | goto end 48 | ) 49 | 50 | 51 | REM Check if sphinx-build is available and fallback to Python version if any 52 | %SPHINXBUILD% 2> nul 53 | if errorlevel 9009 goto sphinx_python 54 | goto sphinx_ok 55 | 56 | :sphinx_python 57 | 58 | set SPHINXBUILD=python -m sphinx.__init__ 59 | %SPHINXBUILD% 2> nul 60 | if errorlevel 9009 ( 61 | echo. 62 | echo.The 'sphinx-build' command was not found. Make sure you have Sphinx 63 | echo.installed, then set the SPHINXBUILD environment variable to point 64 | echo.to the full path of the 'sphinx-build' executable. Alternatively you 65 | echo.may add the Sphinx directory to PATH. 66 | echo. 67 | echo.If you don't have Sphinx installed, grab it from 68 | echo.http://sphinx-doc.org/ 69 | exit /b 1 70 | ) 71 | 72 | :sphinx_ok 73 | 74 | 75 | if "%1" == "html" ( 76 | %SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html 77 | if errorlevel 1 exit /b 1 78 | echo. 79 | echo.Build finished. The HTML pages are in %BUILDDIR%/html. 80 | goto end 81 | ) 82 | 83 | if "%1" == "dirhtml" ( 84 | %SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml 85 | if errorlevel 1 exit /b 1 86 | echo. 87 | echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml. 88 | goto end 89 | ) 90 | 91 | if "%1" == "singlehtml" ( 92 | %SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml 93 | if errorlevel 1 exit /b 1 94 | echo. 95 | echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml. 
96 | goto end 97 | ) 98 | 99 | if "%1" == "pickle" ( 100 | %SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle 101 | if errorlevel 1 exit /b 1 102 | echo. 103 | echo.Build finished; now you can process the pickle files. 104 | goto end 105 | ) 106 | 107 | if "%1" == "json" ( 108 | %SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json 109 | if errorlevel 1 exit /b 1 110 | echo. 111 | echo.Build finished; now you can process the JSON files. 112 | goto end 113 | ) 114 | 115 | if "%1" == "htmlhelp" ( 116 | %SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp 117 | if errorlevel 1 exit /b 1 118 | echo. 119 | echo.Build finished; now you can run HTML Help Workshop with the ^ 120 | .hhp project file in %BUILDDIR%/htmlhelp. 121 | goto end 122 | ) 123 | 124 | if "%1" == "qthelp" ( 125 | %SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp 126 | if errorlevel 1 exit /b 1 127 | echo. 128 | echo.Build finished; now you can run "qcollectiongenerator" with the ^ 129 | .qhcp project file in %BUILDDIR%/qthelp, like this: 130 | echo.^> qcollectiongenerator %BUILDDIR%\qthelp\python_for_social_scientists.qhcp 131 | echo.To view the help file: 132 | echo.^> assistant -collectionFile %BUILDDIR%\qthelp\python_for_social_scientists.qhc 133 | goto end 134 | ) 135 | 136 | if "%1" == "devhelp" ( 137 | %SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp 138 | if errorlevel 1 exit /b 1 139 | echo. 140 | echo.Build finished. 141 | goto end 142 | ) 143 | 144 | if "%1" == "epub" ( 145 | %SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub 146 | if errorlevel 1 exit /b 1 147 | echo. 148 | echo.Build finished. The epub file is in %BUILDDIR%/epub. 149 | goto end 150 | ) 151 | 152 | if "%1" == "latex" ( 153 | %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex 154 | if errorlevel 1 exit /b 1 155 | echo. 156 | echo.Build finished; the LaTeX files are in %BUILDDIR%/latex.
157 | goto end 158 | ) 159 | 160 | if "%1" == "latexpdf" ( 161 | %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex 162 | cd %BUILDDIR%/latex 163 | make all-pdf 164 | cd %~dp0 165 | echo. 166 | echo.Build finished; the PDF files are in %BUILDDIR%/latex. 167 | goto end 168 | ) 169 | 170 | if "%1" == "latexpdfja" ( 171 | %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex 172 | cd %BUILDDIR%/latex 173 | make all-pdf-ja 174 | cd %~dp0 175 | echo. 176 | echo.Build finished; the PDF files are in %BUILDDIR%/latex. 177 | goto end 178 | ) 179 | 180 | if "%1" == "text" ( 181 | %SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text 182 | if errorlevel 1 exit /b 1 183 | echo. 184 | echo.Build finished. The text files are in %BUILDDIR%/text. 185 | goto end 186 | ) 187 | 188 | if "%1" == "man" ( 189 | %SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man 190 | if errorlevel 1 exit /b 1 191 | echo. 192 | echo.Build finished. The manual pages are in %BUILDDIR%/man. 193 | goto end 194 | ) 195 | 196 | if "%1" == "texinfo" ( 197 | %SPHINXBUILD% -b texinfo %ALLSPHINXOPTS% %BUILDDIR%/texinfo 198 | if errorlevel 1 exit /b 1 199 | echo. 200 | echo.Build finished. The Texinfo files are in %BUILDDIR%/texinfo. 201 | goto end 202 | ) 203 | 204 | if "%1" == "gettext" ( 205 | %SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale 206 | if errorlevel 1 exit /b 1 207 | echo. 208 | echo.Build finished. The message catalogs are in %BUILDDIR%/locale. 209 | goto end 210 | ) 211 | 212 | if "%1" == "changes" ( 213 | %SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes 214 | if errorlevel 1 exit /b 1 215 | echo. 216 | echo.The overview file is in %BUILDDIR%/changes. 217 | goto end 218 | ) 219 | 220 | if "%1" == "linkcheck" ( 221 | %SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck 222 | if errorlevel 1 exit /b 1 223 | echo. 224 | echo.Link check complete; look for any errors in the above output ^ 225 | or in %BUILDDIR%/linkcheck/output.txt. 
226 | goto end 227 | ) 228 | 229 | if "%1" == "doctest" ( 230 | %SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest 231 | if errorlevel 1 exit /b 1 232 | echo. 233 | echo.Testing of doctests in the sources finished, look at the ^ 234 | results in %BUILDDIR%/doctest/output.txt. 235 | goto end 236 | ) 237 | 238 | if "%1" == "coverage" ( 239 | %SPHINXBUILD% -b coverage %ALLSPHINXOPTS% %BUILDDIR%/coverage 240 | if errorlevel 1 exit /b 1 241 | echo. 242 | echo.Testing of coverage in the sources finished, look at the ^ 243 | results in %BUILDDIR%/coverage/python.txt. 244 | goto end 245 | ) 246 | 247 | if "%1" == "xml" ( 248 | %SPHINXBUILD% -b xml %ALLSPHINXOPTS% %BUILDDIR%/xml 249 | if errorlevel 1 exit /b 1 250 | echo. 251 | echo.Build finished. The XML files are in %BUILDDIR%/xml. 252 | goto end 253 | ) 254 | 255 | if "%1" == "pseudoxml" ( 256 | %SPHINXBUILD% -b pseudoxml %ALLSPHINXOPTS% %BUILDDIR%/pseudoxml 257 | if errorlevel 1 exit /b 1 258 | echo. 259 | echo.Build finished. The pseudo-XML files are in %BUILDDIR%/pseudoxml. 260 | goto end 261 | ) 262 | 263 | :end 264 | -------------------------------------------------------------------------------- /source/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickeubank/data-analysis-in-python/0fb65e2e853176c1a10f84a93e128f59086d334c/source/.DS_Store -------------------------------------------------------------------------------- /source/1_setup.rst: -------------------------------------------------------------------------------- 1 | 2 | 1. Setting Up Python 3 | ============================== 4 | 5 | 6 | 7 | Installing Python 8 | ^^^^^^^^^^^^^^^^^ 9 | 10 | First, we want to download the Anaconda Python distribution. 
The Anaconda distribution is not the only way to get Python -- indeed, there's a good chance a version of Python is already installed on your computer -- but the Anaconda installation does a number of very nice things: 11 | 12 | * A clean installation of the latest version of Python 13 | * Installation of a large number of add-on libraries that are commonly used in scientific computing (the name for the type of work most social scientists do) 14 | * Installation of a "package manager" -- a tool for installing new packages in the future and updating already installed packages. [#pip]_ 15 | 16 | To install the Anaconda distribution, just go to the `Anaconda download page `_ and pick the appropriate installer. **Make sure to get an installer for Python 3.x, not 2.x**! [#2v3]_ 17 | 18 | 19 | Running Python 20 | ^^^^^^^^^^^^^^ 21 | 22 | Once you have Python, you have to decide how to interact with it. Anaconda includes a "Launcher" app that offers several options (which you can find in the "anaconda" folder, though there's probably a shortcut on your desktop after installation), but I would suggest using the program Spyder to work with Python -- it's probably the most familiar interface for Stata and R users, since it is laid out much like Stata or RStudio. 23 | 24 | 25 | 26 | .. [#pip] If you know what pip is, Anaconda's package manager (conda) is similar to pip, but in addition to installing Python modules, it can also install the precompiled C or C++ code that underlies some fancier libraries. 27 | .. [#2v3] In 2008, Python 3 was first released. Python 3 fixed a lot of things people disliked about Python, but in the process it made some changes that meant code written in Python 2 would not work any more. To ease the transition to Python 3, both Python 2 and Python 3 have been supported for several years so people could keep running their Python 2 code until they finished the transition.
Almost everything now runs on Python 3, and since you don't have any old code to worry about, you want Python 3. -------------------------------------------------------------------------------- /source/2_basic_python.rst: -------------------------------------------------------------------------------- 1 | 2 | 2. Basic Python 3 | ========================== 4 | 5 | This site provides a number of links to recommended introductory Python tutorials (no reason to re-invent the wheel by trying to write a new one!). These tutorials each have their own strengths and weaknesses, and readers are encouraged to shop around a little. 6 | 7 | Words of Advice 8 | ^^^^^^^^^^^^^^^^^^^^^ 9 | Before that list, however, some guidance: 10 | 11 | Manage your expectations 12 | ------------------------- 13 | 14 | Python is very user friendly, but it does ask that you invest a little in learning the fundamentals of the language before you jump into applications. Languages like R and Stata were explicitly designed to have you up and running almost immediately, but that goal came with trade-offs in how the languages were designed. Python requires a little training, but pays you back for that investment down the road. 15 | 16 | Prioritize 17 | ------------ 18 | 19 | Because Python is a general-purpose language, it has many features that won't be relevant to you immediately, and some may never be relevant. Many tutorials, for example, were written by people in Silicon Valley who write apps for phones, and they emphasize the skills they use.
Here's an overview of what I think is critical to know immediately, what is important to learn soon, and what you may never need to know: 20 | 21 | Need immediately: 22 | 23 | * Data types: integers, floats, strings, booleans, lists, dictionaries, and sets (tuples are kinda optional) 24 | * Defining functions 25 | * Writing loops 26 | * Understanding mutable versus immutable data types 27 | * Methods for manipulating strings 28 | * Importing third-party modules 29 | * Reading and interpreting errors 30 | 31 | Things you'll want to know at some point, but don't need immediately: 32 | 33 | * Advanced debugging utilities (like pdb) 34 | * File input / output (most libraries you'll use have tools to simplify this for you) 35 | 36 | Don't need: 37 | 38 | * Defining or writing classes 39 | * Understanding Exceptions 40 | 41 | 42 | If you try one of the tutorials below and find my annotations inaccurate, please `send me a note `_ letting me know and I'll update my advice! 43 | 44 | One last note: in the Python world, the type of work you will probably be doing is called "scientific computing", so if you're trying to find tailored resources, that's a great search term! 45 | 46 | Interacting with Python 47 | ------------------------ 48 | 49 | There are three ways you may see people use Python: interactively with Python, interactively with iPython, and by executing a saved file. They're all basically equivalent, but I want to highlight them so you aren't surprised when you encounter them. 50 | 51 | **Interactive Use**: Most social scientists work with Python interactively -- you open a session and execute your code one line at a time, just like in Stata or R. 52 | 53 | At its core, Python has a simple, text-based interactive interface. However, there is also a program called `iPython` that many people use when working interactively with Python.
`iPython` is just a little program that sits between the user and Python itself and adds a few bells and whistles to make Python a little easier to work with. What `iPython` can do is not really important here; what is important is that you not be surprised if you find different tutorials working with slightly different interfaces. (If you see a tutorial using something called `Jupyter` notebooks, those are basically special documents that run `iPython`.) You can learn more about :doc:`iPython and Jupyter notebooks when you're ready here `. 54 | 55 | You can tell whether someone is using Python or `iPython` by looking at the prompt on the screen. Plain Python always shows a simple prompt of three angle brackets (`>>>`). `iPython` instead shows numbered prompts like `In [1]:` and has a more colorful interface that looks like this: 56 | 57 | .. ipython:: python 58 | 59 | print('hello!') 60 | 61 | **Running Files:** The other way to use Python -- more common among software developers -- is to write your Python code into a file, save it as `something.py`, then run that file by opening a command line interface and typing `python something.py`. This is analogous to opening Python, executing each line in order, then closing Python. 62 | 63 | **Don't know what I mean by Command Line?** Cool! You don't need to know much about it to work with Python, but you should know enough that you're not scared of it. Take 10 minutes to run over to :doc:`ST:Command Line ` for a quick overview. 64 | 65 | Tutorials 66 | ^^^^^^^^^^^ 67 | 68 | `Python 3 Essentials `_ from Lynda.com 69 | ----------------------------------------------------------------------------------------------------------------- 70 | **Pre-requisites:** Some experience in another language like R or Java. If you've used for-loops and defined your own functions in another language, you're probably fine. 71 | 72 | **The Good:** An incredibly clear instructor and a carefully designed class. Some assumed familiarity with programming, but not a lot.
Lynda.com also lets you speed up playback for sections that feel familiar or straightforward. Topics are carefully organized and indexed if you need to jump around. Includes a "Quick Start" section. 73 | 74 | **The Bad:** May not be free. Lynda.com tutorials are great because it's a paid service, and sadly you sometimes get what you pay for. However, many universities (Yale, Stanford, etc.) have subscriptions, so check with your institution to see if you can get free access. You can also get a 10-day free trial, or pay $30 for a month. If you're unsure, try the "Quick Start" section, which is public. 75 | 76 | **Recommended Sections:** I would recommend proceeding as follows: 77 | 78 | * Section 1 79 | * Section 2 up to "Reusing code and data with a class" 80 | * If you installed Python using the setup recommended here, skip Section 3. 81 | * Do Sections 4 - 8, 11, 13, 14, and the first two parts of Section 17 82 | 83 | **Optional Sections:** Not immediately needed, but potentially quite useful: 84 | 85 | * Section 9 86 | 87 | **Other Notes:** About a decade ago Python began a transition from version 2 to version 3 (for reasons that aren't worth getting into, this was a somewhat controversial move, but it's basically done at this point). This course was published in part to help with that transition, so if you hear digressions about how Python 3 is different from Python 2, that's why. 88 | 89 | 90 | `A Byte of Python `_ 91 | -------------------------------------------------------------------------------------------------------------------------------------------- 92 | 93 | **Pre-requisites:** Zero assumed knowledge! 94 | 95 | **The Good:** A really, really nice, free textbook for learning Python with no assumed knowledge of programming. 96 | 97 | **The Bad:** Don't have one! Very good.
98 | 99 | 100 | 101 | `Python in 10 Minutes `_ 102 | -------------------------------------------------------------------------------------------------------------------------------------------- 103 | 104 | **Pre-requisites:** Knowing how to program in another language 105 | 106 | **The Good:** If you've programmed a fair amount in another language, this is a great "get up and go" resource, but not ideal for beginners. 107 | 108 | **The Bad:** If you haven't programmed a lot in another language, not great. Also not great for mastery. 109 | 110 | 111 | 112 | `Python for Data Science `_ 113 | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 114 | 115 | **Pre-requisites:** None, it appears! 116 | 117 | **The Good:** A nice, focused, "let's get going" text-based tutorial for social scientists. Doesn't waste much time on things like classes, which I really appreciate! 118 | 119 | **The Bad:** It talks a lot about differences between Python 2 and Python 3. As noted elsewhere on this site, you should really only work in Python 3. This is Python 3 focused, but you'll have to wade through some junk about Python 2. 120 | 121 | 122 | 123 | `Python for You and Me `_ 124 | ---------------------------------------------------------------------- 125 | 126 | **Pre-requisites:** If you've done any programming in another language, you should be set. Maybe *just* a little too much assumed knowledge for someone who has never programmed, but I could be wrong on that (I'm very intolerant of assumed knowledge in teaching...). 127 | 128 | **The Good:** If you don't like video tutorials, this is a great choice. Clearly written, moves slowly and incrementally. 129 | 130 | **The Bad:** No explicit exercises to work through. 
131 | 132 | **Recommended Sections:** I would recommend proceeding as follows: 133 | 134 | * Everything up to but not including "File Handling" 135 | * Modules 136 | 137 | **Optional Sections:** Not crucial, but potentially quite helpful: 138 | 139 | * PEP8 Guidelines 140 | 141 | 142 | **Other Notes:** 143 | 144 | 145 | `Automate the Boring Stuff `_ 146 | ------------------------------------------------------------------- 147 | **Pre-requisites:** None! Though the name is a little weird, it seems like a great resource for social scientists. 148 | 149 | **The Good:** Seems like a great introduction with essentially no assumed knowledge! The holy grail for absolute beginners. Also includes video lectures (linked to YouTube at the top of each section) for those who like them. 150 | 151 | **The Bad:** The narrative voice is fun but a little verbose (kinda like this site), so it could feel a little slow for people with more background. 152 | 153 | **Recommended Sections:** 154 | 155 | * Chapters 0-6 156 | 157 | **Optional Sections:** Not crucial, but potentially quite helpful: 158 | 159 | * Chapter 7, Chapter 10 160 | 161 | 162 | A Note on Omitted Tutorials 163 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 164 | 165 | Some users will note that I have left several relatively popular Python tutorials off this list. In most cases, this is because I made an executive editorial decision early on in writing this website to make it "Python 3 Only". "Uh, what?" you say? 166 | 167 | About a decade ago, Python version 3 was released. Python 3 fixed several problems that existed in Python 2, but as a result, code written in Python 2 would often no longer work. For a while, both Python 2 and Python 3 were supported side by side, and the world has taken a while to transition to Python 3. 168 | 169 | Many popular tutorials (like Learn Python the Hard Way) are written in Python 2. As recently as a few years ago, this made sense because many important libraries weren't yet available in Python 3.
Moreover, if you plan to be a software developer (the target of many tutorials), it's still absolutely necessary that you be familiar with both Python 2 and Python 3, since old code you encounter at a company may still be in Python 2. 170 | 171 | But for social scientists, I think it just makes sense to start off with Python 3. `Almost all libraries `_ (especially in data science) have been updated, and Guido van Rossum (Python's "Benevolent Dictator for Life") has made it `very clear `_ there will never be a Python 2.8 (only bug-fix releases of 2.7). With that in mind, I've been avoiding any tutorial written in Python 2. I think that asking social scientists to learn a programming language is hard enough; also asking them to learn the language AND understand all the small differences between 2 and 3 is just a pointless invitation for confusion. 172 | 173 | 174 | 175 | .. `Dive Into Python `_ 176 | .. ----------------------------------------------------------------- 177 | .. 178 | .. Good, but moves relatively quickly for beginners. 179 | .. 180 | .. 181 | .. 182 | .. `Python the Guide `_ 183 | .. ----------------------------------------------------------------------------- 184 | .. A guide to tutorials! 185 | .. 186 | .. 187 | .. 188 | .. 189 | .. `Learn Python the Hard Way `_ 190 | .. ----------------------------------------------------------------- 191 | .. A very popular and free resource for learning Python. 192 | .. 193 | .. **The Bad:** The tutorial is written for Python 2, and the author goes out of his way to say "A programmer may try to get you to install Python 3 and learn that. Say, "When all of the Python code on your computer is Python 3, then I'll try to learn it." That should keep them busy for about 10 years. I repeat, do not use Python 3." The problem is that that decade has basically passed, and nearly all scientific computing software has now made the transition to Python 3. 194 | .. 195 | 
**Other Notes:** 196 | -------------------------------------------------------------------------------- /source/3_pandas.rst: -------------------------------------------------------------------------------- 1 | 2 | 3. Pandas 3 | ============================= 4 | 5 | Python itself does not include vectors, matrices, or dataframes as fundamental data types. As Python became an increasingly popular language, however, it was quickly realized that this was a major shortcoming, and new libraries were created that added these data types (and did so in a very, very high-performance manner) to Python. 6 | 7 | The original library that added vectors and matrices to Python was called `numpy`. But `numpy`, while very powerful, was a no-frills library. You couldn't do things like mix data types, label your columns, etc. To remedy this shortcoming, a new library was created -- built on top of `numpy` -- that added all the nice features we've come to expect from modern languages: `pandas`. 8 | 9 | 10 | Advice 11 | ^^^^^^^^^^^ 12 | 13 | Many things about `pandas` will seem familiar to users of other languages like Matlab, R, or Stata, and most differences are relatively obvious. However, there are two things that I think most tutorials under-emphasize and that I want to alert you to: 14 | 15 | * **Indices:** Most languages (like R, Matlab, and Stata) organize data based on its order. In R, for example, if you `cbind` two vectors, they are attached to one another based on the order of rows. In `pandas`, every row of a DataFrame has a name (an index label), and most things in `pandas` are designed to keep careful track of those index labels and, where possible, to make sure that when different objects are combined, they are always `aligned` according to those indices. 16 | 17 | * **Changes:** As of late 2015, `pandas` was at version 0.17. Until version 1.0, `pandas` developers are likely to feel relatively free to make changes to some core functions in `pandas` to improve the language.
For example, in late 2015 (version 0.17), the `sort` method was deprecated in favor of `sort_values` and `sort_index` to make sorting behavior more intuitive. To stay on top of changes, I strongly recommend subscribing to the `pydata Google Group Mailing List `_ so that when updates to `pandas` come out you'll be alerted and get a summary of changes. 18 | 19 | Tutorials 20 | ^^^^^^^^^^^ 21 | 22 | `Intro to Pandas (Greg Reda) `_ 23 | --------------------------------------------------------------------------------------------------------------------------------- 24 | 25 | **Format:** Text / IPython Notebooks 26 | 27 | **Summary:** A really nice, text-based tutorial. Very good for basics. 28 | 29 | `Pandas from the Ground Up (Brandon Rhodes) `_ 30 | --------------------------------------------------------------------------------------------------------------------------------- 31 | 32 | **Format:** Video, with linked IPython notebooks 33 | 34 | **Summary:** One of two very good video introductions to `pandas` (the other is the Jonathan Rocher video below). I suggest watching a little of both of these and picking the one that suits your learning style. This tutorial is a little more "let's learn the principles of Pandas in the abstract, then apply them", while the Rocher tutorial is a little more "let's learn as we do", though both mix principles and examples well. 35 | 36 | `Make sure you download the associated materials here, not from the link in the video! `_ 37 | 38 | `Analyzing and Manipulating Data with Pandas (Jonathan Rocher) `_ 39 | --------------------------------------------------------------------------------------------------------------------------------- 40 | 41 | **Format:** Video, with linked IPython notebooks 42 | 43 | **Summary:** The other very good video introduction to `pandas`. Again, I suggest watching a little of both of these and picking the one that suits your learning style. 44 | 45 | `Make sure you download the associated materials here!
`_ 46 | 47 | 48 | `Learn Pandas `_ 49 | --------------------------------------------------------------------------------------------------------------------------------- 50 | 51 | **Format:** Text / IPython Notebooks 52 | 53 | **Summary:** Nice. A little dense, and a little more focused on getting users going than on teaching basic organizing principles, but if you want to get to usable code fast, a good option. 54 | 55 | -------------------------------------------------------------------------------- /source/4_install_packages.rst: -------------------------------------------------------------------------------- 1 | 2 | 4. Installing Packages 3 | ========================== 4 | 5 | There are three tools for adding modules to your Python installation: `conda`, `anaconda`, and `pip`. 6 | 7 | `pip` is the oldest tool, and works for most things. `conda` is the newest, and is great because it can handle libraries that `pip` struggles with (libraries that require compilation, as many scientific libraries do) and is curated and managed by Continuum Analytics. And finally, anaconda.org is a user-contributed version of `conda` -- same engine, but the specific packages are managed by users on Anaconda.org, so results aren't quite as well guaranteed, but many things are available that aren't on `conda`. You access it by adding the `-c` (channel) flag to `conda install`, as in `conda install -c [channel] [library]`, where `[channel]` is the Anaconda.org user or organization that hosts the package. 8 | 9 | **An important note:** Unlike in R or Stata, you don't download and install Python libraries from inside Python. Instead, you invoke `pip`, `conda`, or `anaconda` from the "command line" -- a text-based interface for your operating system. The command line is found in the `terminal` application (on a Mac) or `PowerShell` application (on Windows), and you may also hear it referred to as `bash`. The command line may seem scary, but you literally just need to open `terminal` (on a Mac, it's in Applications/Utilities) or PowerShell and type `conda install [library]` or `pip install [library]`.
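Once `conda install [library]` or `pip install [library]` finishes at the command line, one quick way to confirm it worked is to ask Python whether it can now find the library. The sketch below is only an illustration: `is_installed` is a hypothetical helper name (not part of any library), and it relies on the standard-library function `importlib.util.find_spec`, which looks a module up without importing it. Note that the name you import is not always the name you install (for example, `pip install scikit-learn` provides the module `sklearn`).

```python
# Hypothetical helper (not part of any library): after installing a library
# from the command line, check whether this Python installation can find it.
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be found by this Python
    installation, without actually importing it."""
    return importlib.util.find_spec(module_name) is not None


# Example check; swap in whatever library you just installed.
for name in ["pandas", "numpy", "statsmodels"]:
    if is_installed(name):
        print(name + " is installed")
    else:
        print(name + " was not found -- try `conda install " + name + "`")
```

If a library is reported as missing even though the install command succeeded, a common cause is that the command line installed it into a different Python installation than the one you are running.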
10 | 11 | Generally, I would attempt to install libraries as follows: 12 | 13 | * Try `conda install [library]` 14 | * Try `pip install [library]` 15 | * Use `conda install -c [channel] [library]` 16 | 17 | Here are tutorials for each: 18 | 19 | * `conda and anaconda `_ (ignore all the stuff about "environments" -- not relevant at this stage!) 20 | * `pip `_ Note that pip is included in the Anaconda distribution, so you can ignore all the stuff about setting up pip in the guide above the "Use pip for installing" entry, and you don't need to worry about anything below "Upgrading packages". 21 | 22 | 23 | -------------------------------------------------------------------------------- /source/_data/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickeubank/data-analysis-in-python/0fb65e2e853176c1a10f84a93e128f59086d334c/source/_data/.DS_Store -------------------------------------------------------------------------------- /source/_data/regression_table.tex: -------------------------------------------------------------------------------- 1 | \begin{center} 2 | \begin{tabular}{lclc} 3 | \toprule 4 | \textbf{Dep. Variable:} & life_expectancy & \textbf{ R-squared: } & 0.580 \\ 5 | \textbf{Model:} & OLS & \textbf{ Adj. R-squared: } & 0.574 \\ 6 | \textbf{Method:} & Least Squares & \textbf{ F-statistic: } & 83.84 \\ 7 | \textbf{Date:} & Sun, 07 Aug 2016 & \textbf{ Prob (F-statistic):} & 7.43e-36 \\ 8 | \textbf{Time:} & 09:27:24 & \textbf{ Log-Likelihood: } & -692.02 \\ 9 | \textbf{No. Observations:} & 217 & \textbf{ AIC: } & 1392. \\ 10 | \textbf{Df Residuals:} & 213 & \textbf{ BIC: } & 1406. \\ 11 | \textbf{Df Model:} & 3 & \textbf{ } & \\ 12 | \bottomrule 13 | \end{tabular} 14 | \begin{tabular}{lccccc} 15 | & \textbf{coef} & \textbf{std err} & \textbf{t} & \textbf{P$>$$|$t$|$} & \textbf{[95.0\% Conf.
Int.]} \\ 16 | \midrule 17 | \textbf{Intercept} & 69.7352 & 0.918 & 75.969 & 0.000 & 67.926 71.545 \\ 18 | \textbf{C(low_income)[T.True]} & -10.6282 & 1.203 & -8.832 & 0.000 & -13.000 -8.256 \\ 19 | \textbf{population_density} & -0.0003 & 0.000 & -0.851 & 0.396 & -0.001 0.000 \\ 20 | \textbf{gdp_per_cap} & 0.0002 & 3.96e-05 & 4.564 & 0.000 & 0.000 0.000 \\ 21 | \bottomrule 22 | \end{tabular} 23 | \begin{tabular}{lclc} 24 | \textbf{Omnibus:} & 75.439 & \textbf{ Durbin-Watson: } & 2.145 \\ 25 | \textbf{Prob(Omnibus):} & 0.000 & \textbf{ Jarque-Bera (JB): } & 205.379 \\ 26 | \textbf{Skew:} & -1.527 & \textbf{ Prob(JB): } & 2.53e-45 \\ 27 | \textbf{Kurtosis:} & 6.659 & \textbf{ Cond. No. } & 7.67e+04 \\ 28 | \bottomrule 29 | \end{tabular} 30 | %\caption{OLS Regression Results} 31 | \end{center} -------------------------------------------------------------------------------- /source/_data/wdi_indicators.csv: -------------------------------------------------------------------------------- 1 | year,country_name,country_code,gdp_per_cap,literacy_rate,life_expectancy,population_density,region 2 | 2011,Afghanistan,AFG,1712.58872,31.74111748,60.06536585,44.1276338, 3 | 2011,Albania,ALB,9640.130216,96.84529877,77.16321951,106.0138686, 4 | 2011,Algeria,DZA,12964.82721,,70.75168293,15.4160958, 5 | 2011,American Samoa,ASM,,,,276.58, 6 | 2011,Andorra,ADO,,,,175.1617021, 7 | 2011,Angola,AGO,6524.481181,,51.05931707,17.6003016, 8 | 2011,Antigua and Barbuda,ATG,19987.92414,,75.50036585,200.3454545, 9 | 2011,Arab World,ARB,14555.34651,,69.73832617,26.48489576, 10 | 2011,Argentina,ARG,,,75.83860976,15.22116718, 11 | 2011,Armenia,ARM,6803.482997,99.74441528,74.33219512,104.2495258, 12 | 2011,Aruba,ABW,36015.07046,,75.08039024,566.3111111, 13 | 2011,Australia,AUS,41705.94508,,81.89512195,2.90798641, 14 | 2011,Austria,AUT,44028.77404,,80.98292683,101.6503501, 15 | 2011,Azerbaijan,AZE,15754.15236,99.77863312,70.55129268,110.9763362, 16 | 2011,"Bahamas, 
The",BHS,22665.16756,,74.75490244,36.63446553, 17 | 2011,Bahrain,BHR,39676.71702,,76.40204878,1702.756193, 18 | 2011,Bangladesh,BGD,2579.335048,,69.89180488,1178.502051, 19 | 2011,Barbados,BRB,15401.54965,,74.96714634,652.5627907, 20 | 2011,Belarus,BLR,16603.38953,,70.55365854,46.68572273, 21 | 2011,Belgium,BEL,40945.91637,,80.58536585,364.8528402, 22 | 2011,Belize,BLZ,7857.224198,,73.48780488,14.43195967, 23 | 2011,Benin,BEN,1643.378006,,58.94490244,86.72748315, 24 | 2011,Bermuda,BMU,54984.51039,,81.15268293,1291.28, 25 | 2011,Bhutan,BTN,6882.688407,,67.46073171,19.21048351, 26 | 2011,Bolivia,BOL,5595.180889,92.22614288,66.62692683,9.303275178, 27 | 2011,Bosnia and Herzegovina,BIH,9265.015888,,75.96241463,74.84980469, 28 | 2011,Botswana,BWA,13634.26846,,46.66895122,3.68730436, 29 | 2011,Brazil,BRA,14830.89956,91.41124725,73.34736585,23.99069458, 30 | 2011,Brunei Darussalam,BRN,73265.12163,96.08555603,78.18102439,75.79563567, 31 | 2011,Bulgaria,BGR,15278.42195,98.35244751,74.16341463,67.68909359, 32 | 2011,Burkina Faso,BFA,1470.435184,,55.4404878,58.87006944, 33 | 2011,Burundi,BDI,712.6704834,,53.13656098,381.2364097, 34 | 2011,Cabo Verde,CPV,6147.974534,,74.20682927,122.8682382, 35 | 2011,Cambodia,KHM,2648.650018,,71.05109756,82.67107976, 36 | 2011,Cameroon,CMR,2614.495561,,54.13736585,44.67657761, 37 | 2011,Canada,CAN,41567.43177,,81.06831707,3.776625307, 38 | 2011,Caribbean small states,CSS,14020.03556,,71.65454923,17.03519081, 39 | 2011,Cayman Islands,CYM,49902.14655,,,235.75, 40 | 2011,Central African Republic,CAF,893.6794586,,48.79304878,7.272950978, 41 | 2011,Central Europe and the Baltics,CEB,21678.67018,,75.9369075,94.34498842, 42 | 2011,Chad,TCD,1862.351166,,50.23619512,9.766925032, 43 | 2011,Channel Islands,CHI,,,79.97863415,844.6789474, 44 | 2011,Chile,CHL,20266.03603,96.70301056,79.3065122,23.134586, 45 | 2011,China,CHN,10274.49435,,75.042,143.1721123, 46 | 2011,Colombia,COL,11496.52725,93.58053589,73.57385366,41.82644975, 47 | 
2011,Comoros,COM,1396.033351,,60.4254878,384.7243418, 48 | 2011,"Congo, Dem. Rep.",ZAR,617.1479614,,49.30070732,30.03346905, 49 | 2011,"Congo, Rep.",COG,5632.435875,79.31117249,57.77521951,12.23260615, 50 | 2011,Costa Rica,CRI,13072.14256,97.40658569,79.48956098,90.09962789, 51 | 2011,Cote d'Ivoire,CIV,2546.906919,,50.04756098,64.79299371, 52 | 2011,Croatia,HRV,20571.24888,99.12535858,76.77560976,76.49431737, 53 | 2011,Cuba,CUB,18924.76327,,78.89382927,106.3845359, 54 | 2011,Curacao,CUW,,,77.47317073,339.7094595, 55 | 2011,Cyprus,CYP,32983.11391,98.67842865,79.47195122,120.8489177, 56 | 2011,Czech Republic,CZE,28603.49445,,77.87317073,135.8892802, 57 | 2011,Denmark,DNK,43314.05636,,79.8,131.2885223, 58 | 2011,Djibouti,DJI,2782.971292,,60.80307317,36.31587575, 59 | 2011,Dominica,DMA,10332.3809,,,95.20266667, 60 | 2011,Dominican Republic,DOM,11375.62039,90.10626984,73.01304878,207.5153146, 61 | 2011,East Asia & Pacific (all income levels),EAS,12362.22162,,74.54782419,90.99453118, 62 | 2011,East Asia & Pacific (developing only),EAP,9445.12164,,73.66096173,124.2691459, 63 | 2011,Ecuador,ECU,9926.952496,91.5868988,75.91660976,61.11000161, 64 | 2011,"Egypt, Arab Rep.",EGY,10071.20526,,70.67856098,84.17061028, 65 | 2011,El Salvador,SLV,7596.509131,85.49398804,71.86870732,292.2397683, 66 | 2011,Equatorial Guinea,GNQ,33515.54469,,52.08490244,26.77069519, 67 | 2011,Eritrea,ERI,1411.157638,,61.71092683,47.42146535, 68 | 2011,Estonia,EST,23575.68715,99.86277771,76.22926829,31.31490918, 69 | 2011,Ethiopia,ETH,1165.216838,,62.25285366,89.858696, 70 | 2011,Euro area,EMU,37275.31176,,81.30274125,125.843392, 71 | 2011,Europe & Central Asia (all income levels),ECS,27186.73211,,76.36788893,32.54290735, 72 | 2011,Europe & Central Asia (developing only),ECA,12678.55621,,71.89905958,41.42791803, 73 | 2011,European Union,EUU,34520.52794,,80.29947443,119.2043691, 74 | 2011,Faeroe Islands,FRO,,,80.84634146,34.73638968, 75 | 2011,Fiji,FJI,7232.648686,,69.56397561,47.4727422, 76 | 
2011,Finland,FIN,40251.37351,,80.47073171,17.73099477, 77 | 2011,Fragile and conflict affected situations,FCS,3204.910423,,59.64876329,30.52870667, 78 | 2011,France,FRA,37325.30434,,82.11463415,119.3351121, 79 | 2011,French Polynesia,PYF,,,75.91602439,74.00601093, 80 | 2011,Gabon,GAB,16766.33287,,62.69334146,6.121387822, 81 | 2011,"Gambia, The",GMB,1532.494004,,58.37480488,172.8358696, 82 | 2011,Georgia,GEO,6322.499298,,73.80978049,78.43783897, 83 | 2011,Germany,DEU,42079.87575,,80.74146341,234.6731495, 84 | 2011,Ghana,GHA,3430.857389,,60.78858537,109.5565747, 85 | 2011,Greece,GRC,26675.45723,,80.73170732,86.29335144, 86 | 2011,Greenland,GRL,,,71.00170732,0.138603971, 87 | 2011,Grenada,GRD,11219.48673,,72.47636585,309.0294118, 88 | 2011,Guam,GUM,,,78.36058537,297.8851852, 89 | 2011,Guatemala,GTM,6798.850903,,71.32790244,140.4374767, 90 | 2011,Guinea,GIN,1183.855846,,55.59019512,46.05384584, 91 | 2011,Guinea-Bissau,GNB,1413.946945,,53.79609756,59.51312233, 92 | 2011,Guyana,GUY,6077.186038,,65.88007317,3.83989332, 93 | 2011,Haiti,HTI,1562.304129,,62.30597561,368.1019594, 94 | 2011,Heavily indebted poor countries (HIPC),HPC,1842.465726,,57.67490371,32.70239541, 95 | 2011,High income,HIC,37477.93334,,78.76868118,24.97468964, 96 | 2011,High income: nonOECD,NOC,30112.69632,,73.39688439,13.6521203, 97 | 2011,High income: OECD,OEC,39837.51725,,80.39680424,33.37565148, 98 | 2011,Honduras,HND,4433.721957,85.1233139,73.17319512,68.11523818, 99 | 2011,"Hong Kong SAR, China",HKG,50085.95933,,83.42195122,6734.857143, 100 | 2011,Hungary,HUN,22523.83071,,74.85853659,110.1483155, 101 | 2011,Iceland,ISL,39619.44153,,82.35853659,3.182184539, 102 | 2011,India,IND,4685.863732,69.3025589,65.9584878,419.5648482, 103 | 2011,Indonesia,IDN,8870.284189,92.81190491,70.39156098,135.1359616, 104 | 2011,"Iran, Islamic Rep.",IRN,17480.18235,,73.44931707,46.16641921, 105 | 2011,Iraq,IRQ,13226.94139,,69.01578049,73.24136812, 106 | 2011,Ireland,IRL,44912.65491,,80.74634146,66.43626071, 107 | 
2011,Isle of Man,IMY,,,,149.2438596, 108 | 2011,Israel,ISR,30182.50117,,81.65609756,358.8632163, 109 | 2011,Italy,ITA,35901.25902,,82.18780488,201.8747841, 110 | 2011,Jamaica,JAM,8484.585699,,73.07639024,249.2925208, 111 | 2011,Japan,JPN,34315.79841,,82.59121951,350.6117787, 112 | 2011,Jordan,JOR,11292.20029,95.90444946,73.59187805,69.62153638, 113 | 2011,Kazakhstan,KAZ,20772.06612,,68.98,6.132755491, 114 | 2011,Kenya,KEN,2622.730169,,60.37090244,72.77638894, 115 | 2011,Kiribati,KIR,1659.293952,,68.20960976,129.2123457, 116 | 2011,"Korea, Dem. Rep.",PRK,,,69.18880488,204.5624035, 117 | 2011,"Korea, Rep.",KOR,31327.1269,,80.96707317,511.9761391, 118 | 2011,Kosovo,KSV,8226.86375,,70.14878049,164.5041793, 119 | 2011,Kuwait,KWT,76306.4367,,74.2582439,181.7722222, 120 | 2011,Kyrgyz Republic,KGZ,2920.60321,,69.60243902,28.75182482, 121 | 2011,Lao PDR,LAO,4233.381248,,67.35495122,27.58626083, 122 | 2011,Latin America & Caribbean (all income levels),LCN,14327.27982,,74.40124685,30.22081192, 123 | 2011,Latin America & Caribbean (developing only),LAC,13160.64775,,74.08586574,32.80254469, 124 | 2011,Latvia,LVA,19405.47239,99.89590454,73.57560976,33.1142926, 125 | 2011,Least developed countries: UN classification,LDC,1981.368032,,60.6220804,43.0239505, 126 | 2011,Lebanon,LBN,16409.25831,,79.55892683,428.9967742, 127 | 2011,Lesotho,LSO,2297.432657,,48.2197561,66.96146245, 128 | 2011,Liberia,LBR,732.6330069,,59.85987805,42.35438123, 129 | 2011,Libya,LBY,11023.43714,,74.98746341,3.574031849, 130 | 2011,Liechtenstein,LIE,,,81.79268293,228.35625, 131 | 2011,Lithuania,LTU,22530.30357,99.81559753,73.56341463,48.31533012, 132 | 2011,Low & middle income,LMY,7567.249092,,68.49788054,75.44675418, 133 | 2011,Low income,LIC,1401.750794,,58.23376937,42.91637777, 134 | 2011,Lower middle income,LMC,5127.597479,,66.12759698,135.3948814, 135 | 2011,Luxembourg,LUX,91469.08769,,80.98780488,200.1339768, 136 | 2011,"Macao SAR, China",MAC,117101.0242,95.64003754,79.90982927,18283.67893, 137 | 
2011,"Macedonia, FYR",MKD,11641.24517,,74.8745122,81.9146709, 138 | 2011,Madagascar,MDG,1371.648575,,63.79890244,37.26203732, 139 | 2011,Malawi,MWI,758.5815996,,54.13995122,161.5062898, 140 | 2011,Malaysia,MYS,21233.35025,,74.66839024,86.9668848, 141 | 2011,Maldives,MDV,12444.4319,,77.19287805,1128.726667, 142 | 2011,Mali,MLI,1528.400308,33.56093979,54.19082927,12.81695064, 143 | 2011,Malta,MLT,28177.54023,93.30735779,80.74634146,1300.8375, 144 | 2011,Marshall Islands,MHL,3507.201705,,,291.8944444, 145 | 2011,Mauritania,MRT,3375.046539,,61.18695122,3.573514117, 146 | 2011,Mauritius,MUS,16179.29153,89.24983215,73.26682927,616.9477833, 147 | 2011,Mexico,MEX,15754.19716,93.51998138,76.91417073,61.9178842, 148 | 2011,"Micronesia, Fed. Sts.",FSM,3410.284084,,68.73797561,147.8228571, 149 | 2011,Middle East & North Africa (all income levels),MEA,16896.0381,,71.93335956,34.97715472, 150 | 2011,Middle East & North Africa (developing only),MNA,11490.48371,,71.17017601,39.0736381, 151 | 2011,Middle income,MIC,8268.130908,,69.66204847,82.54289095, 152 | 2011,Moldova,MDA,4179.214809,,68.57529268,124.0975355, 153 | 2011,Monaco,MCO,,,,18594.5, 154 | 2011,Mongolia,MNG,8889.345538,,67.12470732,1.775968743, 155 | 2011,Montenegro,MNE,14081.77802,98.44220734,74.53936585,46.10252788, 156 | 2011,Morocco,MAR,6603.171083,67.08415985,70.40912195,72.89259243, 157 | 2011,Mozambique,MOZ,956.6069415,,49.48782927,31.81276355, 158 | 2011,Myanmar,MMR,,,64.7612439,79.78908448, 159 | 2011,Namibia,NAM,8626.823577,,63.27717073,2.72098653, 160 | 2011,Nepal,NPL,2042.140687,59.62725067,67.54956098,189.6005371, 161 | 2011,Netherlands,NLD,46388.29359,,81.20487805,495.0496441, 162 | 2011,New Caledonia,NCL,,,76.72682927,13.89496718, 163 | 2011,New Zealand,NZL,32283.03429,,80.90487805,16.64957654, 164 | 2011,Nicaragua,NIC,4223.427239,,74.13470732,48.26148413, 165 | 2011,Niger,NER,807.1930209,,57.48263415,13.37845188, 166 | 2011,Nigeria,NGA,5230.598854,,51.7102439,179.8156165, 167 | 2011,North 
America,NAC,48967.34558,,78.8827233,18.97534979, 168 | 2011,Northern Mariana Islands,MNP,,,,115.726087, 169 | 2011,Norway,NOR,62736.68047,,81.29512195,13.56100152, 170 | 2011,Not classified,INX,,,,, 171 | 2011,OECD members,OED,36230.12369,,79.71810307,36.37464029, 172 | 2011,Oman,OMN,42479.20132,,76.32314634,10.37157674, 173 | 2011,Other small states,OSS,8400.345912,,59.04757131,10.58946645, 174 | 2011,Pacific island small states,PSS,4773.190404,,69.47360442,34.74300405, 175 | 2011,Pakistan,PAK,4322.534383,54.73801804,66.28387805,225.2875259, 176 | 2011,Palau,PLW,13033.65817,,,44.79565217, 177 | 2011,Panama,PAN,16511.05477,,77.15680488,49.52890772, 178 | 2011,Papua New Guinea,PNG,2345.065877,,62.16212195,15.45990372, 179 | 2011,Paraguay,PRY,7504.677492,,72.11346341,15.84133652, 180 | 2011,Peru,PER,10378.57187,,74.21053659,23.24991484, 181 | 2011,Philippines,PHL,5754.112442,,68.3914878,316.9374283, 182 | 2011,Poland,POL,22333.49091,,76.74634146,124.2962969, 183 | 2011,Portugal,PRT,26932.41149,94.47705078,80.47073171,115.2697893, 184 | 2011,Puerto Rico,PRI,34121.07729,,78.35885366,415.6449831, 185 | 2011,Qatar,QAT,134117.4309,96.40686798,78.29968293,164.1203273, 186 | 2011,Romania,ROM,17362.88789,98.60428619,74.56341463,87.53324934, 187 | 2011,Russian Federation,RUS,22569.81427,,69.65853659,8.729437799, 188 | 2011,Rwanda,RWA,1397.224691,,62.92241463,427.9055128, 189 | 2011,Samoa,WSM,5674.478554,98.97325897,72.69578049,66.23109541, 190 | 2011,San Marino,SMR,,,83.32321508,515.6333333, 191 | 2011,Sao Tome and Principe,STP,2938.116031,,66.00326829,181.9229167, 192 | 2011,Saudi Arabia,SAU,47474.04338,,75.288,13.39190209, 193 | 2011,Senegal,SEN,2159.022888,52.05195999,63.04258537,69.37621669, 194 | 2011,Serbia,SRB,12571.89036,97.96240997,74.53658537,82.7132289, 195 | 2011,Seychelles,SYC,22556.41933,,72.72439024,190.0891304, 196 | 2011,Sierra Leone,SLE,1389.52981,,45.10258537,81.8635079, 197 | 2011,Singapore,SGP,74949.06837,96.17615509,81.74390244,7363.210227, 198 | 
2011,Sint Maarten (Dutch part),SXM,36327.2319,,,983.3823529, 199 | 2011,Slovak Republic,SVK,25066.09589,,75.95853659,112.2605224, 200 | 2011,Slovenia,SVN,28491.9137,,79.97073171,101.9286495, 201 | 2011,Small states,SST,9464.098804,,62.8122819,12.37488819, 202 | 2011,Solomon Islands,SLB,1976.4069,,67.29221951,19.20857449, 203 | 2011,Somalia,SOM,,,54.35795122,15.63214525, 204 | 2011,South Africa,ZAF,12290.89436,93.10214233,55.29565854,42.49765393, 205 | 2011,South Asia,SAS,4402.624818,,66.38451738,346.3109867, 206 | 2011,South Sudan,SSD,3461.727293,,54.04556098,, 207 | 2011,Spain,ESP,32673.96047,97.78375244,82.47560976,93.50783588, 208 | 2011,Sri Lanka,LKA,8111.554142,,73.89929268,332.7858396, 209 | 2011,St. Kitts and Nevis,KNA,20571.97012,,,203.8384615, 210 | 2011,St. Lucia,LCA,10534.19432,,74.55295122,293.8983607, 211 | 2011,St. Martin (French part),MAF,,,78.87073171,562.7757353, 212 | 2011,St. Vincent and the Grenadines,VCT,9884.934792,,72.29943902,280.3615385, 213 | 2011,Sub-Saharan Africa (all income levels),SSF,3231.794985,,55.86019297,38.02136148, 214 | 2011,Sub-Saharan Africa (developing only),SSA,3204.46192,,55.8617092,38.03177401, 215 | 2011,Sudan,SDN,3478.252239,,61.67941463,19.96141204, 216 | 2011,Suriname,SUR,15122.88216,,70.58112195,3.355378205, 217 | 2011,Swaziland,SWZ,5844.887402,,48.66139024,70.49174419, 218 | 2011,Sweden,SWE,43709.21145,,81.80243902,23.02776478, 219 | 2011,Switzerland,CHE,54550.69248,,82.69512195,200.2327665, 220 | 2011,Syrian Arab Republic,SYR,,,74.77058537,114.7465937, 221 | 2011,"Taiwan, China",TWN,,,,, 222 | 2011,Tajikistan,TJK,2229.445742,,67.13521951,55.40100743, 223 | 2011,Tanzania,TZA,2243.493321,,60.07465854,53.19823662, 224 | 2011,Thailand,THA,12735.54732,,74.00890244,130.9537435, 225 | 2011,Timor-Leste,TMP,1939.614423,,66.49180488,75.34579691, 226 | 2011,Togo,TGO,1255.087825,60.40994644,55.80765854,120.7240118, 227 | 2011,Tonga,TON,5030.652213,99.38553619,72.3352439,144.9888889, 228 | 2011,Trinidad and 
Tobago,TTO,28706.65533,,69.70856098,260.1929825, 229 | 2011,Tunisia,TUN,10235.03337,79.65390778,74.34390244,68.70365602, 230 | 2011,Turkey,TUR,17873.69844,94.10609436,74.5404878,95.1098216, 231 | 2011,Turkmenistan,TKM,11360.50215,,65.16360976,10.86687805, 232 | 2011,Turks and Caicos Islands,TCA,,,,33.39684211, 233 | 2011,Tuvalu,TUV,3488.248359,,,328.1333333, 234 | 2011,Uganda,UGA,1648.531066,,58.01753659,170.8574806, 235 | 2011,Ukraine,UKR,8281.867126,,70.80926829,78.89611959, 236 | 2011,United Arab Emirates,ARE,57416.9743,,76.781,104.4823206, 237 | 2011,United Kingdom,GBR,36549.4228,,80.95121951,261.4761212, 238 | 2011,United States,USA,49781.35749,,78.64146341,34.07754667, 239 | 2011,Upper middle income,UMC,12044.73854,,73.88157871,56.30583589, 240 | 2011,Uruguay,URY,17904.81986,98.33589935,76.76426829,19.3441321, 241 | 2011,Uzbekistan,UZB,4412.474193,,67.98039024,68.96897038, 242 | 2011,Vanuatu,VUT,2915.239621,,71.12804878,19.84216571, 243 | 2011,"Venezuela, RB",VEN,17001.91324,94.77022552,74.33097561,33.36276968, 244 | 2011,Vietnam,VNM,4716.9762,,75.45790244,283.2908698, 245 | 2011,Virgin Islands (U.S.),VIR,,,79.37317073,302.24, 246 | 2011,West Bank and Gaza,WBG,4358.655078,95.26724243,72.82943902,652.3340532, 247 | 2011,World,WLD,13420.89202,,70.51435535,54.01171897, 248 | 2011,"Yemen, Rep.",YEM,3616.243909,,62.7184878,45.90211565, 249 | 2011,Zambia,ZMB,3381.356194,,55.83314634,19.29475242, 250 | 2011,Zimbabwe,ZWE,1523.621786,83.5827179,55.94621951,36.85043815, -------------------------------------------------------------------------------- /source/conf.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # -*- coding: utf-8 -*- 3 | # 4 | # python_for_social_scientists documentation build configuration file, created by 5 | # sphinx-quickstart on Fri Nov 6 13:49:59 2015. 6 | # 7 | # This file is execfile()d with the current directory set to its 8 | # containing dir. 
9 | # 10 | # Note that not all possible configuration values are present in this 11 | # autogenerated file. 12 | # 13 | # All configuration values have a default; values that are commented out 14 | # serve to show the default. 15 | 16 | import sys 17 | import os 18 | import shlex 19 | 20 | # If extensions (or modules to document with autodoc) are in another directory, 21 | # add these directories to sys.path here. If the directory is relative to the 22 | # documentation root, use os.path.abspath to make it absolute, like shown here. 23 | #sys.path.insert(0, os.path.abspath('.')) 24 | 25 | # -- General configuration ------------------------------------------------ 26 | 27 | # If your documentation needs a minimal Sphinx version, state it here. 28 | #needs_sphinx = '1.0' 29 | 30 | # Add any Sphinx extension module names here, as strings. They can be 31 | # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom 32 | # ones. 33 | extensions = [ 34 | 'sphinx.ext.doctest', 35 | 'sphinx.ext.intersphinx', 36 | 'sphinx.ext.todo', 37 | 'sphinx.ext.coverage', 38 | 'sphinx.ext.mathjax', 39 | 'sphinx.ext.ifconfig', 40 | 'sphinx.ext.viewcode', 41 | 'IPython.sphinxext.ipython_console_highlighting', 42 | 'IPython.sphinxext.ipython_directive', 43 | 'sphinxcontrib.googleanalytics' 44 | ] 45 | 46 | # Google Analytics 47 | googleanalytics_id = 'UA-75136222-1' 48 | googleanalytics_enabled = True 49 | 50 | 51 | # Add any paths that contain templates here, relative to this directory. 52 | templates_path = ['_templates'] 53 | 54 | # The suffix(es) of source filenames. 55 | # You can specify multiple suffix as a list of string: 56 | # source_suffix = ['.rst', '.md'] 57 | source_suffix = '.rst' 58 | 59 | # The encoding of source files. 60 | #source_encoding = 'utf-8-sig' 61 | 62 | # The master toctree document. 63 | master_doc = 'index' 64 | 65 | # General information about the project.
66 | project = 'Data Analysis in Python' 67 | copyright = '2015, Nick Eubank' 68 | author = 'Nick Eubank' 69 | 70 | # The version info for the project you're documenting, acts as replacement for 71 | # |version| and |release|, also used in various other places throughout the 72 | # built documents. 73 | # 74 | # The short X.Y version. 75 | version = '0.1' 76 | # The full version, including alpha/beta/rc tags. 77 | release = '0.1' 78 | 79 | # The language for content autogenerated by Sphinx. Refer to documentation 80 | # for a list of supported languages. 81 | # 82 | # This is also used if you do content translation via gettext catalogs. 83 | # Usually you set "language" from the command line for these cases. 84 | language = None 85 | 86 | # There are two options for replacing |today|: either, you set today to some 87 | # non-false value, then it is used: 88 | #today = '' 89 | # Else, today_fmt is used as the format for a strftime call. 90 | #today_fmt = '%B %d, %Y' 91 | 92 | # List of patterns, relative to source directory, that match files and 93 | # directories to ignore when looking for source files. 94 | exclude_patterns = [] 95 | 96 | # The reST default role (used for this markup: `text`) to use for all 97 | # documents. 98 | #default_role = None 99 | 100 | # If true, '()' will be appended to :func: etc. cross-reference text. 101 | #add_function_parentheses = True 102 | 103 | # If true, the current module name will be prepended to all description 104 | # unit titles (such as .. function::). 105 | #add_module_names = True 106 | 107 | # If true, sectionauthor and moduleauthor directives will be shown in the 108 | # output. They are ignored by default. 109 | #show_authors = False 110 | 111 | # The name of the Pygments (syntax highlighting) style to use. 112 | pygments_style = 'sphinx' 113 | 114 | # A list of ignored prefixes for module index sorting. 
115 | #modindex_common_prefix = [] 116 | 117 | # If true, keep warnings as "system message" paragraphs in the built documents. 118 | #keep_warnings = False 119 | 120 | # If true, `todo` and `todoList` produce output, else they produce nothing. 121 | todo_include_todos = True 122 | 123 | 124 | # -- Options for HTML output ---------------------------------------------- 125 | 126 | # The theme to use for HTML and HTML Help pages. See the documentation for 127 | # a list of builtin themes. 128 | html_theme = 'alabaster' 129 | 130 | # Theme options are theme-specific and customize the look and feel of a theme 131 | # further. For a list of options available for each theme, see the 132 | # documentation. 133 | html_theme_options = {'github_button':False} 134 | 135 | # Add any paths that contain custom themes here, relative to this directory. 136 | #html_theme_path = [] 137 | 138 | # The name for this set of Sphinx documents. If None, it defaults to 139 | # " v documentation". 140 | #html_title = None 141 | 142 | # A shorter title for the navigation bar. Default is the same as html_title. 143 | #html_short_title = None 144 | 145 | # The name of an image file (relative to this directory) to place at the top 146 | # of the sidebar. 147 | #html_logo = None 148 | 149 | # The name of an image file (within the static path) to use as favicon of the 150 | # docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 151 | # pixels large. 152 | #html_favicon = None 153 | 154 | # Add any paths that contain custom static files (such as style sheets) here, 155 | # relative to this directory. They are copied after the builtin static files, 156 | # so a file named "default.css" will overwrite the builtin "default.css". 157 | html_static_path = ['_static'] 158 | 159 | # Add any extra paths that contain custom files (such as robots.txt or 160 | # .htaccess) here, relative to this directory. These files are copied 161 | # directly to the root of the documentation. 
162 | #html_extra_path = [] 163 | 164 | # If not '', a 'Last updated on:' timestamp is inserted at every page bottom, 165 | # using the given strftime format. 166 | #html_last_updated_fmt = '%b %d, %Y' 167 | 168 | # If true, SmartyPants will be used to convert quotes and dashes to 169 | # typographically correct entities. 170 | #html_use_smartypants = True 171 | 172 | # Custom sidebar templates, maps document names to template names. 173 | html_sidebars = { 174 | '**': [ 175 | 'about.html', 176 | 'navigation.html', 177 | 'searchbox.html' 178 | ] 179 | } 180 | 181 | # Additional templates that should be rendered to pages, maps page names to 182 | # template names. 183 | #html_additional_pages = {} 184 | 185 | # If false, no module index is generated. 186 | #html_domain_indices = True 187 | 188 | # If false, no index is generated. 189 | #html_use_index = True 190 | 191 | # If true, the index is split into individual pages for each letter. 192 | #html_split_index = False 193 | 194 | # If true, links to the reST sources are added to the pages. 195 | #html_show_sourcelink = True 196 | 197 | # If true, "Created using Sphinx" is shown in the HTML footer. Default is True. 198 | #html_show_sphinx = True 199 | 200 | # If true, "(C) Copyright ..." is shown in the HTML footer. Default is True. 201 | #html_show_copyright = True 202 | 203 | # If true, an OpenSearch description file will be output, and all pages will 204 | # contain a tag referring to it. The value of this option must be the 205 | # base URL from which the finished HTML is served. 206 | #html_use_opensearch = '' 207 | 208 | # This is the file name suffix for HTML files (e.g. ".xhtml"). 209 | #html_file_suffix = None 210 | 211 | # Language to be used for generating the HTML full-text search index. 
212 | # Sphinx supports the following languages: 213 | # 'da', 'de', 'en', 'es', 'fi', 'fr', 'hu', 'it', 'ja' 214 | # 'nl', 'no', 'pt', 'ro', 'ru', 'sv', 'tr' 215 | #html_search_language = 'en' 216 | 217 | # A dictionary with options for the search language support, empty by default. 218 | # Now only 'ja' uses this config value 219 | #html_search_options = {'type': 'default'} 220 | 221 | # The name of a javascript file (relative to the configuration directory) that 222 | # implements a search results scorer. If empty, the default will be used. 223 | #html_search_scorer = 'scorer.js' 224 | 225 | # Output file base name for HTML help builder. 226 | htmlhelp_basename = 'python_for_social_scientistsdoc' 227 | 228 | # -- Options for LaTeX output --------------------------------------------- 229 | 230 | latex_elements = { 231 | # The paper size ('letterpaper' or 'a4paper'). 232 | #'papersize': 'letterpaper', 233 | 234 | # The font size ('10pt', '11pt' or '12pt'). 235 | #'pointsize': '10pt', 236 | 237 | # Additional stuff for the LaTeX preamble. 238 | #'preamble': '', 239 | 240 | # Latex figure (float) alignment 241 | #'figure_align': 'htbp', 242 | } 243 | 244 | # Grouping the document tree into LaTeX files. List of tuples 245 | # (source start file, target name, title, 246 | # author, documentclass [howto, manual, or own class]). 247 | latex_documents = [ 248 | (master_doc, 'data_analysis_in_python.tex', r'data\_analysis\_in\_python Documentation', 249 | 'Nick Eubank', 'manual'), 250 | ] 251 | 252 | # The name of an image file (relative to this directory) to place at the top of 253 | # the title page. 254 | #latex_logo = None 255 | 256 | # For "manual" documents, if this is true, then toplevel headings are parts, 257 | # not chapters. 258 | #latex_use_parts = False 259 | 260 | # If true, show page references after internal links. 261 | #latex_show_pagerefs = False 262 | 263 | # If true, show URL addresses after external links.
264 | #latex_show_urls = False 265 | 266 | # Documents to append as an appendix to all manuals. 267 | #latex_appendices = [] 268 | 269 | # If false, no module index is generated. 270 | #latex_domain_indices = True 271 | 272 | 273 | # -- Options for manual page output --------------------------------------- 274 | 275 | # One entry per manual page. List of tuples 276 | # (source start file, name, description, authors, manual section). 277 | man_pages = [ 278 | (master_doc, 'data_analysis_in_python', 'data_analysis_in_python Documentation', 279 | [author], 1) 280 | ] 281 | 282 | # If true, show URL addresses after external links. 283 | #man_show_urls = False 284 | 285 | 286 | # -- Options for Texinfo output ------------------------------------------- 287 | 288 | # Grouping the document tree into Texinfo files. List of tuples 289 | # (source start file, target name, title, author, 290 | # dir menu entry, description, category) 291 | texinfo_documents = [ 292 | (master_doc, 'data_analysis_in_python', 'data_analysis_in_python Documentation', 293 | author, 'data_analysis_in_python', 'One line description of project.', 294 | 'Miscellaneous'), 295 | ] 296 | 297 | # Documents to append as an appendix to all manuals. 298 | #texinfo_appendices = [] 299 | 300 | # If false, no module index is generated. 301 | #texinfo_domain_indices = True 302 | 303 | # How to display URL addresses: 'footnote', 'no', or 'inline'. 304 | #texinfo_show_urls = 'footnote' 305 | 306 | # If true, do not generate a @detailmenu in the "Top" node's menu. 307 | #texinfo_no_detailmenu = False 308 | 309 | 310 | # Example configuration for intersphinx: refer to the Python standard library. 
311 | intersphinx_mapping = {'python': ('https://docs.python.org/3', None)} 312 | -------------------------------------------------------------------------------- /source/googleanalytics/MANIFEST.in: -------------------------------------------------------------------------------- 1 | include README 2 | include LICENSE 3 | include CHANGES.* 4 | -------------------------------------------------------------------------------- /source/googleanalytics/README.rst: -------------------------------------------------------------------------------- 1 | .. -*- restructuredtext -*- 2 | 3 | =========================================== 4 | Google Analytics extension for Sphinx 5 | =========================================== 6 | 7 | :author: Domen Kožar 8 | 9 | 10 | About 11 | ===== 12 | 13 | This extension allows you to track generated html files 14 | with the Google Analytics web service. 15 | 16 | 17 | Installing from sphinx-contrib checkout 18 | --------------------------------------- 19 | 20 | Checkout sphinx-contrib:: 21 | 22 | $ hg clone https://bitbucket.org/birkenfeld/sphinx-contrib/ 23 | 24 | Change into the googleanalytics directory:: 25 | 26 | $ cd sphinx-contrib/googleanalytics 27 | 28 | Install the module:: 29 | 30 | $ python setup.py install 31 | 32 | 33 | Enabling the extension in Sphinx_ 34 | --------------------------------- 35 | 36 | Just add ``sphinxcontrib.googleanalytics`` to the list of extensions in the ``conf.py`` 37 | file. For example:: 38 | 39 | extensions = ['sphinxcontrib.googleanalytics'] 40 | 41 | 42 | Configuration 43 | ------------- 44 | 45 | Two configuration values are added to Sphinx_. They can be set in the 46 | ``conf.py`` file: 47 | 48 | ``googleanalytics_id`` : 49 | UA id for your site, example:: 50 | googleanalytics_id = 'UA-123-123-123' 51 | 52 | ``googleanalytics_enabled`` : 53 | True by default, use it to turn off tracking. 54 | 55 | 56 | .. Links: 57 | .. _gnuplot: http://www.gnuplot.info/ 58 | ..
_Sphinx: http://sphinx.pocoo.org/ 59 | 60 | -------------------------------------------------------------------------------- /source/googleanalytics/build/lib/sphinxcontrib/__init__.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | sphinxcontrib 4 | ~~~~~~~~~~~~~ 5 | 6 | This package is a namespace package that contains all extensions 7 | distributed in the ``sphinx-contrib`` distribution. 8 | 9 | :copyright: Copyright 2007-2009 by the Sphinx team, see AUTHORS. 10 | :license: BSD, see LICENSE for details. 11 | """ 12 | 13 | __import__('pkg_resources').declare_namespace(__name__) 14 | 15 | -------------------------------------------------------------------------------- /source/googleanalytics/build/lib/sphinxcontrib/googleanalytics.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | from sphinx.application import ExtensionError 5 | 6 | def add_ga_javascript(app, pagename, templatename, context, doctree): 7 | if not app.config.googleanalytics_enabled: 8 | return 9 | 10 | metatags = context.get('metatags', '') 11 | metatags += """""" % app.config.googleanalytics_id 23 | context['metatags'] = metatags 24 | 25 | def check_config(app): 26 | if not app.config.googleanalytics_id: 27 | raise ExtensionError("'googleanalytics_id' config value must be set for ga statistics to function properly.") 28 | 29 | def setup(app): 30 | app.add_config_value('googleanalytics_id', '', 'html') 31 | app.add_config_value('googleanalytics_enabled', True, 'html') 32 | app.connect('html-page-context', add_ga_javascript) 33 | app.connect('builder-inited', check_config) 34 | return {'version': '0.1'} 35 | -------------------------------------------------------------------------------- /source/googleanalytics/dist/sphinxcontrib_googleanalytics-0.1.dev20151208-py3.4.egg: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/nickeubank/data-analysis-in-python/0fb65e2e853176c1a10f84a93e128f59086d334c/source/googleanalytics/dist/sphinxcontrib_googleanalytics-0.1.dev20151208-py3.4.egg -------------------------------------------------------------------------------- /source/googleanalytics/setup.cfg: -------------------------------------------------------------------------------- 1 | [egg_info] 2 | tag_build = dev 3 | tag_date = true 4 | 5 | [aliases] 6 | release = egg_info -RDb '' 7 | -------------------------------------------------------------------------------- /source/googleanalytics/setup.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | import os 4 | from setuptools import setup, find_packages 5 | 6 | HERE = os.path.dirname(os.path.abspath(__file__)) 7 | long_desc = open(os.path.join(HERE, 'README.rst')).read() 8 | 9 | requires = ['Sphinx>=0.6'] 10 | 11 | setup( 12 | name='sphinxcontrib-googleanalytics', 13 | version='0.1', 14 | url='http://bitbucket.org/birkenfeld/sphinx-contrib', 15 | download_url='http://pypi.python.org/pypi/sphinxcontrib-googleanalytics', 16 | license='BSD', 17 | author='Domen Kozar', 18 | author_email='domen@dev.si', 19 | description='Sphinx extension googleanalytics', 20 | long_description=long_desc, 21 | zip_safe=False, 22 | classifiers=[ 23 | 'Development Status :: 4 - Beta', 24 | 'Environment :: Console', 25 | 'Environment :: Web Environment', 26 | 'Intended Audience :: Developers', 27 | 'License :: OSI Approved :: BSD License', 28 | 'Operating System :: OS Independent', 29 | 'Programming Language :: Python', 30 | 'Topic :: Documentation', 31 | 'Topic :: Utilities', 32 | ], 33 | platforms='any', 34 | packages=find_packages(), 35 | include_package_data=True, 36 | install_requires=requires, 37 | namespace_packages=['sphinxcontrib'], 38 | ) 39 | 
-------------------------------------------------------------------------------- /source/googleanalytics/sphinxcontrib/__init__.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | sphinxcontrib 4 | ~~~~~~~~~~~~~ 5 | 6 | This package is a namespace package that contains all extensions 7 | distributed in the ``sphinx-contrib`` distribution. 8 | 9 | :copyright: Copyright 2007-2009 by the Sphinx team, see AUTHORS. 10 | :license: BSD, see LICENSE for details. 11 | """ 12 | 13 | __import__('pkg_resources').declare_namespace(__name__) 14 | 15 | -------------------------------------------------------------------------------- /source/googleanalytics/sphinxcontrib/googleanalytics.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | from sphinx.application import ExtensionError 5 | 6 | def add_ga_javascript(app, pagename, templatename, context, doctree): 7 | if not app.config.googleanalytics_enabled: 8 | return 9 | 10 | metatags = context.get('metatags', '') 11 | metatags += """""" % app.config.googleanalytics_id 23 | context['metatags'] = metatags 24 | 25 | def check_config(app): 26 | if not app.config.googleanalytics_id: 27 | raise ExtensionError("'googleanalytics_id' config value must be set for ga statistics to function properly.") 28 | 29 | def setup(app): 30 | app.add_config_value('googleanalytics_id', '', 'html') 31 | app.add_config_value('googleanalytics_enabled', True, 'html') 32 | app.connect('html-page-context', add_ga_javascript) 33 | app.connect('builder-inited', check_config) 34 | return {'version': '0.1'} 35 | -------------------------------------------------------------------------------- /source/googleanalytics/sphinxcontrib_googleanalytics.egg-info/PKG-INFO: -------------------------------------------------------------------------------- 1 | Metadata-Version: 1.1 2 | Name: 
sphinxcontrib-googleanalytics 3 | Version: 0.1.dev20151208 4 | Summary: Sphinx extension googleanalytics 5 | Home-page: http://bitbucket.org/birkenfeld/sphinx-contrib 6 | Author: Domen Kozar 7 | Author-email: domen@dev.si 8 | License: BSD 9 | Download-URL: http://pypi.python.org/pypi/sphinxcontrib-googleanalytics 10 | Description: .. -*- restructuredtext -*- 11 | 12 | =========================================== 13 | Google Analytics extension for Sphinx 14 | =========================================== 15 | 16 | :author: Domen Kožar 17 | 18 | 19 | About 20 | ===== 21 | 22 | This extensions allows you to track generated html files 23 | with Google Analytics web service. 24 | 25 | 26 | Installing from sphinx-contrib checkout 27 | --------------------------------------- 28 | 29 | Checkout sphinx-contrib:: 30 | 31 | $ hg clone https://bitbucket.org/birkenfeld/sphinx-contrib/ 32 | 33 | Change into the googleanalytics directory:: 34 | 35 | $ cd sphinx-contrib/googleanalytics 36 | 37 | Install the module:: 38 | 39 | $ python setup.py install 40 | 41 | 42 | Enabling the extension in Sphinx_ 43 | --------------------------------- 44 | 45 | Just add ``sphinxcontrib.googleanalytics`` to the list of extensions in the ``conf.py`` 46 | file. For example:: 47 | 48 | extensions = ['sphinxcontrib.googleanalytics'] 49 | 50 | 51 | Configuration 52 | ------------- 53 | 54 | For now one optional configuration is added to Sphinx_. It can be set in 55 | ``conf.py`` file: 56 | 57 | ``googleanalytics_id`` : 58 | UA id for your site, example:: 59 | googleanalytics_id = 'UA-123-123-123' 60 | 61 | ``googleanalytics_enabled`` : 62 | True by default, use it to turn off tracking. 63 | 64 | 65 | .. Links: 66 | .. _gnuplot: http://www.gnuplot.info/ 67 | .. 
_Sphinx: http://sphinx.pocoo.org/ 68 | 69 | 70 | Platform: any 71 | Classifier: Development Status :: 4 - Beta 72 | Classifier: Environment :: Console 73 | Classifier: Environment :: Web Environment 74 | Classifier: Intended Audience :: Developers 75 | Classifier: License :: OSI Approved :: BSD License 76 | Classifier: Operating System :: OS Independent 77 | Classifier: Programming Language :: Python 78 | Classifier: Topic :: Documentation 79 | Classifier: Topic :: Utilities 80 | -------------------------------------------------------------------------------- /source/googleanalytics/sphinxcontrib_googleanalytics.egg-info/SOURCES.txt: -------------------------------------------------------------------------------- 1 | MANIFEST.in 2 | README.rst 3 | setup.cfg 4 | setup.py 5 | sphinxcontrib/__init__.py 6 | sphinxcontrib/googleanalytics.py 7 | sphinxcontrib_googleanalytics.egg-info/PKG-INFO 8 | sphinxcontrib_googleanalytics.egg-info/SOURCES.txt 9 | sphinxcontrib_googleanalytics.egg-info/dependency_links.txt 10 | sphinxcontrib_googleanalytics.egg-info/namespace_packages.txt 11 | sphinxcontrib_googleanalytics.egg-info/not-zip-safe 12 | sphinxcontrib_googleanalytics.egg-info/requires.txt 13 | sphinxcontrib_googleanalytics.egg-info/top_level.txt -------------------------------------------------------------------------------- /source/googleanalytics/sphinxcontrib_googleanalytics.egg-info/dependency_links.txt: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /source/googleanalytics/sphinxcontrib_googleanalytics.egg-info/namespace_packages.txt: -------------------------------------------------------------------------------- 1 | sphinxcontrib 2 | -------------------------------------------------------------------------------- /source/googleanalytics/sphinxcontrib_googleanalytics.egg-info/not-zip-safe: 
-------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /source/googleanalytics/sphinxcontrib_googleanalytics.egg-info/requires.txt: -------------------------------------------------------------------------------- 1 | Sphinx>=0.6 2 | -------------------------------------------------------------------------------- /source/googleanalytics/sphinxcontrib_googleanalytics.egg-info/top_level.txt: -------------------------------------------------------------------------------- 1 | sphinxcontrib 2 | -------------------------------------------------------------------------------- /source/index.rst: -------------------------------------------------------------------------------- 1 | .. data_analysis_in_python documentation master file, created by 2 | sphinx-quickstart on Fri Nov 6 13:49:59 2015. 3 | You can adapt this file completely to your liking, but it should at least 4 | contain the root `toctree` directive. 5 | 6 | Welcome to Data Analysis in Python! 7 | ======================================================== 8 | 9 | 10 | Python is an increasingly popular tool for data analysis. In recent years, a number of libraries have reached maturity, allowing R and Stata users to take advantage of the beauty, flexibility, and performance of Python without sacrificing the functionality these older programs have accumulated over the years. 11 | 12 | This site is designed to offer an introduction to Python specifically tailored for social scientists and people doing applied data analysis -- users with little or no serious programming experience who just want to get things done, and who have experience with programs like R and Stata but are anxious for something better. 13 | 14 | Components 15 | ^^^^^^^^^^^^^ 16 | 17 | 1. **Core Skill Sequence**: A collection of four numbered tutorials that cover core skills everyone needs to work in Python in social science. 
I recommend you visit these in sequence -- a page on setting up Python on your computer using the Anaconda distribution, an intro to Python for those not familiar with the language, an introduction to the `pandas` library for working with tabular data (analogous to `data.frames` in R, or everything you ever did in Stata), and a guide to installing libraries to expand Python. 18 | 19 | 2. **Specific Resources for Different Research Topics:** "topic" pages, which you should feel free to jump between as appropriate for your purposes: :doc:`statsmodels, quantecon, and stan for econometrics `, :doc:`machine learning with scikit-learn `, :doc:`seaborn and ggplot ` for graphing, :doc:`network analysis using igraph `, :doc:`geo-spatial analysis`, :doc:`ways to accelerate Python `, :doc:`big data tools `, and :doc:`text analysis libraries `. The topic pages also include two topics that are a little unusual, but I think potentially quite useful: a guide to :doc:`getting effective help online `, and resources on evidence-based research on :doc:`how to teach programming ` for anyone teaching this material. 20 | 21 | 3. **Resources for Other Software Tools:** Pages on tools and programs you may come across while using Python, with a description of each tool, guidance on what you most need to know, and links to other tutorials. These include pages on the :doc:`Command Line `, :doc:`iPython `, and :doc:`Git and Github `. 22 | 23 | 24 | Ready to get started? Head on over to :doc:`Setup `! 25 | 26 | Questions or comments? `Please send them my way! `_ Feedback of all sorts is greatly appreciated, and if you have any experience with GitHub, `suggested changes to this site can also be submitted as pull requests here `_. 27 | Contents: 28 | 29 | ..
toctree:: 30 | :maxdepth: 1 31 | 32 | why_python 33 | python_for_r 34 | python_for_stata 35 | 1_setup 36 | 2_basic_python 37 | 3_pandas 38 | 4_install_packages 39 | t_statsmodels 40 | t_scikitlearn 41 | t_seaborn 42 | t_gis 43 | t_igraph 44 | t_super_fast 45 | t_big_data 46 | t_text_analysis 47 | t_getting_help 48 | t_teaching_programming 49 | r_to_python 50 | st_ipython 51 | st_command_line 52 | st_git_and_github 53 | -------------------------------------------------------------------------------- /source/python_for_r.rst: -------------------------------------------------------------------------------- 1 | 2 | Note to R Users 3 | ========================= 4 | 5 | Welcome R users! 6 | 7 | Many of the recent innovations in Python tools for data science were created by frustrated R users, which means you'll find that many of the things you love about R are preserved in Python, while many of the frustrations you may have run into have been addressed. Below are some links to great tutorials on Python (no reason to reinvent the wheel), but first let's highlight some of the things you as an R user will find especially useful/surprising. 8 | 9 | * **Analogues for `data.frames` and `vectors` are in the `pandas` library, not Python itself**: When you watch basic Python tutorials, you'll find yourself wondering where the vectors and data.frames are -- don't worry! They exist, but they're in a different library called `pandas`, so basic Python tutorials won't talk about them. This site is organized to point you to basic Python tutorials that cover just what you need to know, then move you to `pandas` tutorials, which is what you'll be using for most of your work. 10 | * **Periods are operators in Python**: In R, it's common to use periods in variable names (like: `my.list`). There are many syntax differences between R and Python, but for some reason people really struggle with this one -- periods are an operator in Python, so don't use them in names!
The convention is to use underscores, like: `my_list`. 11 | * **In Python you always count starting at 0, not 1**: For example, the first item in a list is item 0, and the first row of a data frame is row 0, not 1. 12 | 13 | 14 | Key Concepts: References 15 | ^^^^^^^^^^^^^^^^^^^^^^^^ 16 | 17 | Before you dive into general-purpose tutorials on Python syntax, there's one large conceptual change you should be aware of coming from R. 18 | 19 | Say you make a new list as follows: 20 | 21 | ``my.list <- list(1,2,3)`` 22 | 23 | In R, a variable (like ``my.list``) behaves as if it simply *is* the object (the list 1, 2, 3) associated with it. But this is actually a sleight of hand used by R to hide something fundamental about how computers work, and it does not happen in Python. 24 | 25 | In Python, when you create an object like a `list`, Python puts that list somewhere in memory, kind of like you might put something big on a shelf somewhere in a warehouse. The variable associated with that `list` (say, ``my_list``) is not the same as that list -- it actually just stores the location on the shelves where you placed that list. And because this behavior is normal in most languages, you may not see it emphasized in Python tutorials written by programmers who aren't coming from R. 26 | 27 | The reason this matters is that multiple variables can point at the same item on a shelf. If you change that item through one variable, the item on the shelf itself changes, so when you look at it through any other variable pointing at the same item, you will see the change there too. For example: 28 | 29 | .. ipython:: python 30 | 31 | # Make a new list 32 | x = [1, 2, 3] 33 | # Make a new variable y and assign it x. In R this would effectively make a copy. 34 | y = x 35 | # Add to the end of the list 36 | x.append(4) 37 | # We see this new addition is now at the end of x 38 | x 39 | 40 | # But look! It's also at the end of y! 41 | # That's because they both point at the same list on the shelf.
42 | y 43 | 44 | 45 | If what you want to do is make a *copy* of x, use the list's `copy()` method: 46 | 47 | .. ipython:: python 48 | 49 | x = [1, 2, 3] 50 | y = x 51 | y_copy = x.copy() 52 | x.append(4) 53 | y_copy 54 | 55 | If you want to see if two variables point to the same thing, you can use the `is` operator, which tests whether two variables point to the same place in memory / same shelf: 56 | 57 | .. ipython:: python 58 | 59 | x is y 60 | x is y_copy 61 | 62 | Mutable versus Immutable Types 63 | """""""""""""""""""""""""""""" 64 | 65 | However, there is one important wrinkle. Certain data types in Python are called "immutable", meaning that if you try to change them, Python can't modify the item that's already sitting on the shelf; instead, it has to create a new item on a different shelf and redirect the variable to point at that new item. The immutable types you'll run into most often are strings, plain numbers, and tuples (a Python data structure you don't really need to worry about now). So: 66 | 67 | .. ipython:: python 68 | 69 | # Make x a simple number 70 | x = 5 71 | y = x 72 | 73 | # Modify x 74 | x = x + 1 75 | x 76 | 77 | # y is unchanged because x + 1 actually created a new "6" on a new shelf, and x changed from pointing 78 | # to 5 to pointing to 6 79 | y 80 | 81 | OK, that's it -- that's the one big, weird conceptual change to be aware of! 82 | 83 | Next Steps 84 | ^^^^^^^^^^ 85 | 86 | OK, so here's the best approach to getting going with Python: 87 | 88 | #. Do a basic Python tutorial (won't talk about data frames or vector data) 89 | #. `pandas` tutorial (where you find analogues to data frames and vectors) 90 | #. `statsmodels` tutorial (for econometrics) 91 | #.
`seaborn` tutorial (the closest equivalent to ggplot) 92 | -------------------------------------------------------------------------------- /source/python_for_stata.rst: -------------------------------------------------------------------------------- 1 | 2 | Note to Stata Users 3 | ============================ 4 | 5 | Welcome Stata users! 6 | 7 | There's good news and bad news for you Stata users (of whom I am definitely one -- I spent 6 years working exclusively in Stata). The good news is that you have no bad habits developed from R, and you're about to enter a world of phenomenal flexibility and power. 8 | 9 | The main thing to keep in mind during your transition to Python is that there will be things in Python that will frustrate you. The beauty of Stata is that it does one thing, and one thing only: manipulate tabular data-sets. As a result of this narrow focus, Stata has been able to develop a very clean, simple, intuitive syntax for that specific application. 10 | 11 | The advantage Python has over Stata is that it can do all sorts of things Stata can't (or can't do well) -- network analysis, geo-spatial analysis, web-scraping, text analysis, etc. The downside of this flexibility is that Python syntax will often feel more cumbersome. That's just a compromise you'll have to decide you can live with if you want to work with Python -- sorry! 12 | 13 | 14 | Sense of Perspective 15 | ^^^^^^^^^^^^^^^^^^^^ 16 | 17 | In Stata, you typically work with one dataset in memory at a time. In Python, everything you're used to doing in Stata happens inside a single `pandas` DataFrame object, which is just one of many objects (other DataFrames, lists, model results, etc.) you can have open at once. Think of it as "zooming out". 18 | 19 | 20 | Macros 21 | ^^^^^^^^^^^^^ 22 | 23 | If you're someone who is used to working with macros (using the `local` or `global` commands) -- either directly or in for-loops -- you have both an advantage and disadvantage moving into Python. On the plus side, you've been introduced to the idea of generalizing your code, which is great. On the minus side, you've gotten used to a way of using variables that's pretty unique to Stata.
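In Python, the closest analogue to a `local` macro holding a set of control variables is an ordinary list stored in a variable. The sketch below is a minimal illustration, assuming a regression of `income` on `gender` plus controls (all variable names are illustrative, not from a real dataset): rather than having text pasted into a command before it runs, you build the model formula yourself from the list.

```python
# An ordinary Python list plays the role of Stata's
#   local controls "age age2 education"
controls = ["age", "age2", "education"]

# Build the regression formula explicitly: nothing is substituted into a
# command behind the scenes; we simply join the strings ourselves.
formula = "income ~ gender + " + " + ".join(controls)
print(formula)  # income ~ gender + age + age2 + education

# Looping over the list takes the place of a Stata `foreach` over a macro:
# here we build one formula per control variable.
single_control_formulas = [f"income ~ gender + {c}" for c in controls]
for fml in single_control_formulas:
    print(fml)
```

A string like `formula` could then be passed to the `statsmodels` formula interface, for example `smf.ols(formula, data=df).fit()` after `import statsmodels.formula.api as smf`, where `df` is a hypothetical `pandas` DataFrame containing those columns.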
24 | 25 | In Stata, macros work by replacing part of a command before that command is executed. For example, if I write:: 26 | 27 | local controls "age age2 education" 28 | reg income gender `controls' 29 | 30 | Then what happens is, before running the second line of that code, Stata replaces `\`controls\'` with the string "age age2 education", then executes the resulting code (`reg income gender age age2 education`). In other words, the macro is expanded into plain text before the command itself runs. 31 | 32 | Unfortunately, Python doesn't work this way, and neither do most other programming languages. You'll see quickly how variables work in Python; just don't be surprised when your intuition from macros fails you. -------------------------------------------------------------------------------- /source/r_to_python.rst: -------------------------------------------------------------------------------- 1 | 2 | R-to-Python Table 3 | ======================= 4 | 5 | *Note this section is still very preliminary. Please check back for updates.* 6 | 7 | If you know of any existing sources for this type of table, `please send me an email letting me know! `_ 8 | 9 | Outside Sources 10 | ^^^^^^^^^^^^^^^^ 11 | 12 | `A nice table of pandas-to-R translations.
`_ 13 | 14 | Commands 15 | ^^^^^^^^^ 16 | 17 | 18 | =========== =================== ======================= 19 | R Python Notes 20 | =========== =================== ======================= 21 | library() import 22 | lapply() map 23 | ddplyr apply (`pandas`) 24 | lm ols (`statsmodels`) 25 | =========== =================== ======================= 26 | 27 | 28 | Libraries 29 | ^^^^^^^^^ 30 | 31 | =========== =================== ============================ 32 | R Python Notes 33 | =========== =================== ============================ 34 | ggplot seaborn 35 | rgdal geopandas / fiona / fiona and rasterio are 36 | rasterio embedded in GeoPandas 37 | sp geopandas / shapely GeoPandas builds off shapely 38 | rgeos geopandas / shapely GeoPandas builds off shapely 39 | =========== =================== ============================ 40 | -------------------------------------------------------------------------------- /source/st_command_line.rst: -------------------------------------------------------------------------------- 1 | 2 | ST: Command Line 3 | ===================== 4 | 5 | What is the Command Line? 6 | ------------------------------------------------------ 7 | Most of use are accustomed to interacting with our operating system by moving around the mouse and clicking on images of folders, and choosing options from drop down menus. This interface is called the "Graphical User Interface" or GUI. But you can also interact with your operating system using text-commands the same way you work with R, Stata, or Python with text commands. The interface for interacting with your operating system using text commands is called "the command line". 8 | 9 | Well, actually, it has lots of names. Here I'll call it the "command line", but on a Mac, you will also heard it referred to as "the terminal", "command prompt", "UNIX", "bash", and "the shell". On Windows, you may heard it referred to as "DOS", "Command Prompt", "PowerShell". These are all basically the same. 
10 | 11 | Here's the most important thing to know about the Command Line: it's not as scary as it seems. In fact, the command line is an incredibly simple tool that is mostly just used to move from folder to folder in your operating system and either open files or execute other programs. 12 | 13 | 14 | Intro to Command Line 15 | ------------------------------------------------------ 16 | 17 | How the command line works depends on your operating system, so the resources below are split into OSX (Mac) resources and Windows resources. 18 | 19 | 20 | Mac Users / Linux Users / UNIX Users 21 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 22 | 23 | First, I'm gonna let you in on a little secret: OSX and Linux are both built on, or closely modeled on, the same fundamental operating system (UNIX), so when you're working through the command line, they behave almost identically. They share a common standard called "POSIX", and this "family" is often referred to by that name. 24 | 25 | `Quick, great gentle intro `_. It gets a little lost around minute 10, but the first part is really all you need to know. 26 | 27 | 28 | The general syntax for command line commands is: 29 | 30 | `[name of program or command] [space] [options or arguments passed to program]` 31 | 32 | That's it. So for example, you've probably seen in your tutorials people writing a Python file, saving it, and then running it by opening Terminal, navigating to the folder that file is in, and typing: 33 | 34 | `python the_name_of_file_with_code.py` 35 | 36 | They're just launching Python (first word), and telling it to run the file `the_name_of_file_with_code.py`. 37 | 38 | 39 | 40 | 41 | Windows Users 42 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 43 | 44 | 45 | `Here's a great tutorial `_, but only read the sections "What is Command Prompt?" and "Working with Files and Directories". The stuff on Java doesn't really matter.
--------------------------------------------------------------------------------
/source/st_git_and_github.rst:
--------------------------------------------------------------------------------

ST: Git and GitHub
=====================

*STILL IN PROGRESS! Check back for updates*

What is Git?
--------------
Git is a tool developed to help different people work on anything written in plain text (usually code of some sort) at the same time without getting in each other's way. It's not only how big software projects (both commercial and open-source) let many people add and change code at once; it's also increasingly a tool for collaboration on things like lesson plans and tutorials. See the links below for more.

What is GitHub?
----------------

GitHub is a website where people can host projects (git repositories) that they want other people to be able to contribute to using git. For example, this `entire website is on GitHub `_, and if you see something here you don't like, you can submit a change on GitHub that, if accepted, will show up here!


Tutorials
-----------
Here are some basic tutorials.


`Software-Carpentry's Git Tutorial `_
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**Pre-requisites:** A little experience with command line tools (:doc:`see here `) is a good idea, but even that's probably not really required.

**The Good:** An open-source tutorial (so people have been refining it over and over) written mostly for natural scientists, NOT software developers. Really nice!

**The Bad:** Not quite as in-depth as you might want if you really want to dive in full-bore.


`Lynda.com's Git Essential Training `_
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**Pre-requisites:** A little experience with command line tools (:doc:`see here `) is a good idea, but even that's probably not really required.

**The Good:** A really, incredibly clear instructor and a carefully designed class. It assumes some familiarity with programming, but not a lot. Lynda.com also lets you speed up playback for sections that feel familiar or straightforward. Topics are carefully organized and indexed if you need to jump around.

**The Bad:** May not be free. Lynda.com tutorials are great because it's a paid service, and sadly, you sometimes get what you pay for. However, many universities (Yale, Stanford, etc.) have subscriptions, so check with your institution to see if you can get free access. You can also get a 10-day free trial, or pay $30 for a month.

In addition, it probably offers MORE depth than almost any of us need, but you can figure out pretty quickly what you need.

--------------------------------------------------------------------------------
/source/st_ipython.rst:
--------------------------------------------------------------------------------

ST: IPython
=====================

*STILL IN PROGRESS! Check back for updates*


What is IPython?
^^^^^^^^^^^^^^^^^^^^^^^^^^

`Great quick intro `_


What is an IPython Notebook?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

`More here `_
--------------------------------------------------------------------------------
/source/t_big_data.rst:
--------------------------------------------------------------------------------

Big Data / Parallelization
=============================

What *Is* Big Data?
^^^^^^^^^^^^^^^^^^^^
Excellent question!
The truth is the term has a different meaning to different people. Within Political Science, for example, the term is often used for any project that studies correlations in relatively large datasets. On this site, however, we will use a more precise and practical definition of two types of *big data*:

* **Big Data**: data that is too big to load and work with entirely in RAM

  + hundreds of gigabytes to a few terabytes of data

* **Very Big Data**: data that is too big to load and work with entirely in RAM, and also too big to store on a single computer's hard drive

  + petabytes of data

Note this distinction is my own: out in the world, you will find "big data" sometimes means using unusually large datasets, sometimes means data that doesn't fit into RAM, and sometimes means data that won't fit on a single computer's hard drive.

Further, note that programs very often make copies of data as you work, so if your data only just fits into RAM (say your data is 7GB and you have 10GB of RAM), then you may be working with *Big Data*!

Why this definition? When your computer is manipulating data (defining variables, running regressions, etc.), what it's actually doing is grabbing a little bit of data from storage, moving it to the processor, doing math with it, and then putting it back in storage.

To simplify somewhat, we can think of your computer as having two forms of storage: RAM (sometimes called main memory) and your hard drive. But these two forms of memory are *very, very different*. Depending on the hardware, your computer can grab data from RAM up to 100,000 times faster than it can grab data from the hard drive. Because of this, your computer is much happier (and performs faster) when it's able to keep all the data you're working with in RAM.
Indeed, if you're moving data back and forth to your hard drive, that's almost certainly your biggest bottleneck: on modern computers, doing actual computations is almost instantaneous compared to moving data to and from the hard drive.

With that in mind, if you can't keep all your data in RAM, you have to use special tools that minimize the amount of time your computer spends going back and forth to the hard drive for data.


Big Data Tools
^^^^^^^^^^^^^^^

Easy to work with, but somewhat limited: `dask`
------------------------------------------------

`Dask` is a new Python tool for working with data that doesn't fit into memory (and for parallelizing operations). It was written to work basically just like `pandas`, so it's quite easy to get started with. The only catch is that it only supports a subset of `pandas` functions at this point, so it will do a lot, but not everything.

* `Dask Overview `_
* `Long Tutorial `_

Dask is probably best for *big data*, but I'm not sure it can really handle *very big data* (data you can't put on one computer's hard drive). The authors aim to fix that, but I'm not sure the system is there yet.

More involved, but more powerful: `pyspark`
---------------------------------------------

`Spark` is something like the second generation of a platform called Hadoop for working with data spread across lots and lots of computers. `pyspark` is a great tool for controlling `spark` from Python. `spark` is *very* powerful, but it's not a tool created for Python users; it's an entire computing system you'll have to learn about to use.

That said, there are attempts to make `PySpark` easier for `pandas` users, like `Sparkling Pandas` (`tutorial here `_).

Unlike `Dask`, `Spark` and `PySpark` were built not just for *big data* (data that doesn't fit in RAM), but specifically for *very big data* (data that won't even fit on a single computer's hard drive). So if you have *very big data*, this is probably the way to go.


--------------------------------------------------------------------------------
/source/t_getting_help.rst:
--------------------------------------------------------------------------------

Getting Help
=============================

Getting help online is actually a bit of an acquired skill, so here's some advice to get you started.

Resources
^^^^^^^^^^^^^^^

* **StackOverflow:** `StackOverflow `_ (SO) is by far the most useful resource on the internet for programming-related problems, and it can be useful in two different ways:

  + Search for existing answers: People have asked LOTS of questions on SO, so if you have a problem, there's a good chance you can find the answer there already. Always start with this.
  + Posting your question: This requires some skill (see below), but if the answer to your question can't already be found online, this is often the best way to figure it out.

* **Mailing Lists:** Most open source communities have a mailing list where you can send your questions. If your question is very specific to a library or tool, this is often the best place to go. Here are a few examples:

  + `PyData `_: Great source for help with `pandas` and other data tools.
  + `PyStatsmodels `_: Help with the statsmodels package.
  + `iGraph `_: Amazing help with iGraph issues.


Posting a Question
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Few things amaze me as much as the willingness of strangers to help one another with their programming problems. To get the most out of your request for assistance, however, follow a few easy guidelines:

* **Show your work:** People like to help people who are stuck, but hate spending time assisting someone with a problem they could have solved with Google. So to get the best help, make sure your request includes:

  + *A description of things you've tried yourself*
  + *Links to other sites you've visited that seemed like they might help but didn't*: People are often quick to say "hey, this has been asked elsewhere" and throw a link at you, so add links to the other sites you've checked and some information about why those solutions don't work.

* **If possible, include a reproducible example:** Nothing makes it more likely you'll get help than posting a chunk of code (5-20 lines?) that someone else can copy and paste into Python to recreate the problem. This isn't always possible, of course, but really do try.

  + Whittling your code down to a "minimal reproducible example" is also one of the best ways to find problems yourself, so doing this will often lead you to solve your problem before you even post!

--------------------------------------------------------------------------------
/source/t_gis.rst:
--------------------------------------------------------------------------------

GIS in Python
================

There are two sets of tools for using GIS in Python: the first is using Python scripts to control ArcGIS, a popular (but expensive) commercial platform; the second is using native Python tools.


GeoPandas
^^^^^^^^^^^
The all-in-one GIS platform for Python is `GeoPandas`, which extends the popular `Pandas` library to also support spatial data. GeoPandas recently released version 0.2, and you can find `docs for 0.2 here `_. Hopefully they're pretty good (full disclosure: I wrote many of them!).

You can also find `a full course on geospatial analysis using GeoPandas `_.

There is also a short stand-alone `tutorial here `_.


Native Python GIS Tools
-------------------------
GeoPandas bundles a lot of separate libraries, but if you don't want to use GeoPandas, you are welcome to use these libraries on their own. For context, here are the main Python GIS libraries:

* `Fiona `_: Tools for importing and exporting vector data in various formats, like shapefiles.

* `Rasterio `_: Tools for importing and exporting raster data in various formats.

* `PyProj `_: Tools for defining and transforming the datum and projections of spatial data.

* `Shapely `_: Tools for spatial analytics, like testing for intersections, measuring areas, etc. Note that this is basically a tool for analyzing 2-dimensional Cartesian shapes; it has no facilities for managing projections. You have to handle projections with PyProj before you start manipulating shapes with Shapely.

* `RTree `_: Spatial analytics (like intersections) can be relatively computationally difficult and thus slow. For example, if you want to do something like a spatial join of millions of points to a shapefile of polygons, you want to use what's called a "spatial index" tool like RTree. Basically, for each point, RTree will very quickly identify a list of polygons with which that point *might* intersect (this list will always include the polygon that the point actually intersects with, but also some others; in other words, it has no false negatives, but lots of false positives).
  You then use a slower but more accurate tool like Shapely to check whether your point lies in each of these candidates and find the one true intersecting polygon.

* `CartoPy `_ and `Descartes `_: Cartography tools for making pretty maps. Cartopy is basically the successor to Basemap, which you may also read about on some forums.

(Side note: most of these libraries are actually just Python interfaces for extremely fast C / C++ libraries published by the OSGeo collective, which back most geospatial tools regardless of the language you're using.)

Here's a `good tutorial on all major libraries and GeoPandas `_. The only problem is that the presenter (who actually started GeoPandas) doesn't get to GeoPandas until 2 hours 32 minutes in (`direct link to that point `_). The whole tutorial is great if you want to really understand the ecosystem, but there's much to be said for jumping to the GeoPandas section.


ArcPy: Controlling ArcGIS using Python
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To use ArcPy (the Python module for manipulating ArcGIS), you first need an ArcGIS license.

If you have ArcGIS and are familiar with its use through the normal point-and-click interface, you can find an `arcpy tutorial here `_ that I wrote a few years ago. It assumes no real knowledge of Python, so it may be a little slow, and it's a little old and clunky, but it should cover what you need to know.

--------------------------------------------------------------------------------
/source/t_igraph.rst:
--------------------------------------------------------------------------------

Network Analysis
=============================

Just use `iGraph `_.

If you're doing community detection, make sure to get the `louvain-igraph` module, which adds the most cutting-edge algorithms to `iGraph`. `You can get it here. `_

`NetworkX` is the other big network analysis library, but it's *much* slower than `iGraph`. The reason is that `iGraph` is written in C, so it's orders of magnitude faster than `NetworkX`, which is written entirely in native Python. For small graphs, `NetworkX` is fine, but for moderately sized networks (10,000 nodes or more) you really want to use `iGraph`.

If you get stuck working with `iGraph`, `the mailing list community `_ is *amazingly* helpful and supportive.
--------------------------------------------------------------------------------
/source/t_scikitlearn.rst:
--------------------------------------------------------------------------------

Machine learning
===================================

The primary library for machine learning in Python is ``scikit-learn``, which has its `own great tutorial page here `_.

If you're wondering about the difference between ``statsmodels`` and ``scikit-learn``, the short answer is that there's no easy answer.

``statsmodels`` is primarily written for and by econometricians, while ``scikit-learn`` is primarily written for and by computer scientists and people doing machine learning. But the relationship between "econometrics" and "machine learning" is complicated. In very broad terms, machine learning tends to focus on prediction while econometrics tends to focus on testing hypotheses, but that's somewhat simplistic.

The reason is that econometrics and machine learning both developed when people in specific disciplines (economics and computer science, respectively) branched off from statistics to develop tools tailored to their own areas. For several decades, econometrics and machine learning more or less developed independently and in parallel, each borrowing from statistics, but neither really paying attention to the other.
As a result, there are some places where the two fields use the same tools but refer to them with different nomenclature, and other places where they actually do fundamentally different things.

--------------------------------------------------------------------------------
/source/t_seaborn.rst:
--------------------------------------------------------------------------------

Plotting
=====================

The low-level library for making figures in Python is called `matplotlib`. It's very powerful, but also a little *too* low-level for most social science uses, so it's probably not your best bet. Instead, most people use either `seaborn` or `ggplot` (meant to duplicate the syntax and functionality of `ggplot` in R).

ggplot
^^^^^^^^^^^^^^^^^^^^^

If you're accustomed to using `ggplot` in R, then good news: the folks at yhat have duplicated its functionality in Python! This library is still quite young, but seems very promising.

* `ggplot Overview `_
* `Installation `_

seaborn
^^^^^^^^^^^^^^^^^^^^^

The other very popular library for plotting is called `seaborn`. It's built on top of `matplotlib`, and basically allows you to make common statistical plots more easily. If you come from Stata, think of `seaborn` as your `twoway`; if you come from `R`, it's your `ggplot`.

* `Main page `_
* `Gallery of example plots (with the code that made them) `_
* `Introduction to seaborn `_
* `Tutorial `_. (Note this is not the default tutorial on the `seaborn` site, but another one, hidden away, that I actually much prefer!)

At the time of writing, `seaborn` did not come with Anaconda by default, but it can easily be added by typing ``conda install seaborn`` in your terminal window.
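
As a rough sketch of the ``seaborn`` workflow described above, the example below draws a scatter plot with a fitted regression line from a ``pandas`` DataFrame in a single ``sns.regplot`` call. It assumes ``seaborn``, ``matplotlib``, and ``pandas`` are installed; the data values and the output file name ``life_expectancy.png`` are made up purely for illustration. The non-interactive ``Agg`` backend is selected so the script also runs on machines without a display.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen, so no window or display is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data: made-up values for six imaginary countries
df = pd.DataFrame({
    "gdp_per_cap": [800, 1500, 4000, 9000, 20000, 45000],
    "life_expectancy": [55, 60, 66, 72, 77, 81],
})

# One call draws the scatter points, a linear fit line, and a confidence band,
# similar in spirit to layering a scatter and a fitted line with twoway in Stata
ax = sns.regplot(data=df, x="gdp_per_cap", y="life_expectancy")
ax.set_xlabel("GDP per capita")
ax.set_ylabel("Life expectancy (years)")

plt.savefig("life_expectancy.png")  # hypothetical output file name
```

``regplot`` fits a simple linear regression by default and bootstraps its confidence band, so it is a quick visual check rather than a substitute for estimating the model itself in ``statsmodels``.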

--------------------------------------------------------------------------------
/source/t_statsmodels.rst:
--------------------------------------------------------------------------------

Econometrics
=========================

There are three main sets of tools for econometrics: `statsmodels`, `quantecon`, and (for Bayesians) `stan`.

Stats with StatsModels
^^^^^^^^^^^^^^^^^^^^^^^
`statsmodels` is the go-to library for doing econometrics (linear regression, logit regression, etc.).

You can find a `good tutorial here `_, and a brand new book built around `statsmodels` `here `_ (with lots of `example code here `_).

The most important things are also covered on `the statsmodels page here `_, especially the pages on OLS `here `_ and `here `_.

(If you want to do machine learning, by the way, or want to know the difference between ``statsmodels`` and the machine learning library ``scikit-learn``, head over to :doc:`Machine Learning with scikit-learn `.)

Here are some simple illustrative examples of standard OLS:

.. ipython:: python

    @suppress
    import os
    @suppress
    os.chdir("source/_data")

    # Load pandas and the formula interface of statsmodels
    import pandas as pd
    import statsmodels.formula.api as smf

    # Load a CSV dataset of World Development Indicators
    my_data = pd.read_csv('wdi_indicators.csv')

    # Look at the first three rows
    my_data.head(3)

    # OLS: regress life expectancy on population density and GDP per capita
    results = smf.ols('life_expectancy ~ population_density + gdp_per_cap',
                      data=my_data).fit()
    print(results.summary())


    # Categorical variables are easy: wrap them in C() inside the formula.
    # First, make a boolean indicator for low-income countries
    my_data['low_income'] = my_data['gdp_per_cap'] < 4000

    results2 = smf.ols('life_expectancy ~ population_density + gdp_per_cap + C(low_income)',
                       data=my_data).fit()

    print(results2.summary())


    # Heteroskedasticity-robust standard errors (HC1 is also the default)
    results2_robust = results2.get_robustcov_results(cov_type='HC1')
    print(results2_robust.summary())


    # Output to LaTeX
    latex = results2_robust.summary().as_latex()
    latex

    # Save to disk
    with open("regression_table.tex", "w") as text_file:
        text_file.write(latex)

QuantEcon
^^^^^^^^^^
QuantEcon is a new library specifically for economists, with some tools not found in `statsmodels`. `A full index is here `_.

PyStan
^^^^^^^^^^
PyStan is the Python interface for the Stan library, a set of tools for statisticians, especially Bayesians. You can find resources on `Stan `_ in general, and on `PyStan in particular here `_.

--------------------------------------------------------------------------------
/source/t_super_fast.rst:
--------------------------------------------------------------------------------

Making Python faster
=======================

Tuning your code is a very easy way to waste lots of time. With that in mind, think carefully about how much energy it actually makes sense to invest in speeding up your code.
If you have something that takes an hour to run, it may be annoying, but at the end of the day it may not be worth spending 2 hours figuring out how to speed it up if you only need to run it a couple of times to clean your data.

But if you do indeed need to make code you've written faster, here are some words of guidance, written as a kind of checklist that approaches the problem from the easiest to the hardest solutions.

1. Find Your Bottlenecks!
--------------------------------------------

There's no reason to tune a line of code that is only responsible for 1/100 of your running time, so before you invest in speeding up your code, figure out what's slowing it down, a process known as "profiling" your code. Thankfully, because this is so important, there are lots of tools (called profilers) for measuring exactly how long your computer spends on each step in a block of code.

The easiest way to profile your Python code is with the IPython ``%prun`` command (`documented here `_). If you haven't used IPython magic commands before, `you can learn more about them here `_.

2. Check your memory management
--------------------------------------------

When your computer is manipulating data (defining variables, running regressions, etc.), what it's actually doing is grabbing a little bit of data from storage, moving it to the processor, doing math with it, and then putting it back in storage.

To simplify somewhat, we can think of your computer as having two forms of storage: RAM (sometimes called main memory) and your hard drive. But these two forms of memory are very, very different. Depending on the hardware, your computer can grab data from RAM up to 100,000 times faster than it can grab data from the hard drive. Because of this, your computer is much happier (and performs faster) when it's able to keep all the data you're working with in RAM.
Indeed, if you're moving data back and forth to your hard drive, that's almost certainly your biggest bottleneck: on modern computers, doing actual computations is almost instantaneous compared to moving data to and from the hard drive.

`You can learn more about how to tell if your computer is wasting time going back and forth to the hard drive here. `_

**Important:** Just because your dataset seems small enough to fit into RAM doesn't mean this isn't relevant for you! Programs often make copies of your data when they manipulate it, so even if your dataset is 2GB and you have 8GB of RAM, the program you're using can very easily end up using all 8GB!

If you can't get your data into memory, head on over to the :doc:`Big Data page ` for appropriate tools!

3. Use other people's (compiled) functions
--------------------------------------------

As a general rule, code you write in Python is slow. But don't worry, it's not your fault: Python was designed to minimize the time it takes to write code, not the time it takes for code to run. Anything written in pure Python will be slow compared to code written in a fast language like C or C++.

Interestingly, though, functions that other people have written and made available in Python libraries are often actually written in faster ("compiled") languages, like C or C++. As a result, whenever you have the choice between writing a function yourself or using a function from an established library, you're almost always better off using the one someone else wrote.

For example, looping over a vector and adding up each value will be much slower than using a dedicated function from `pandas` to do the same thing.

Now, using other people's functions is not fool-proof: some people write their libraries in Python (not a compiled language like C++), so their functions may run as slowly as your own.
So if you can, check the documentation for whatever library you want to use to see whether it was written in C / C++ or not! But for big libraries (like pandas, numpy, etc.), you can probably assume the dedicated functions in the library are faster than anything you'll ever write.


4. Use a "Just-In-Time Compiler" like numba
--------------------------------------------

If what's slowing you down is some type of numerical operation, there's an incredibly easy tool for speeding up your code: `numba`. Basically, you add one line above the function you want to speed up, and if the function only uses a certain subset of operations, it can immediately speed up by 10x to 100x or more.

`Details of using numba with pandas `_

If you want to know much more (like how it all works), you can find a `tutorial here `_.

5. Parallelize
---------------
Parallelization (which has its own Parallelization page) is often the first thing social scientists turn to in order to speed up code. Whether this is the right decision depends a lot on your situation, but here are a few facts:

* Parallelization is "sub-linear", meaning if you parallelize across two cores, you can expect a ~1.8x speedup at best. (More generally, for N cores, expect speedups of roughly 0.8N.)
* Improving how your code is written can often yield much higher returns than parallelization, on the order of 5x, 10x, or 100x speed improvements.
* Using code that was written in C++ by someone else can yield 10x or 100x returns.

So parallelization certainly isn't the best way to speed up your code. But if you read this and find yourself thinking "um, I still don't really understand what makes code fast versus slow" or "I've already tried all that and it's still too slow!", it's a good option.

`Here are some added considerations and vocabulary for parallelization. `_

(I almost never use parallelization in my own work, so I'm not sure of the easiest resources. When I have, I've done it with `IPython parallelization `_, though I don't know that it's the best tool.)

6. Cython
--------------------------------------------

One reason that Python code is relatively slow is that every time code executes, Python has to spend time checking the type of each variable (integer, floating point number, string, etc.). You can speed up Python massively by adding information about the types of your variables using a tool called Cython. It's not super-difficult to use, but it's also not nearly as easy as a tool like numba, so use it only if needed. (It's also debatable whether it's easier than parallelization.)

`Here's a great tutorial on everything you could ever want to know about Cython. `_

--------------------------------------------------------------------------------
/source/t_teaching_programming.rst:
--------------------------------------------------------------------------------

Teaching Programming
=============================

*STILL IN PROGRESS! Check back for updates*

`Read this `_

--------------------------------------------------------------------------------
/source/t_text_analysis.rst:
--------------------------------------------------------------------------------

Text Analysis
=============================

Python is phenomenally good for text analysis, and there are a few good libraries out there you can use.

Natural Language Tool Kit (NLTK)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The most used library in social science is probably `the "Natural Language Tool Kit", normally referred to as "NLTK" `_. The library has lots of tools, and is very user friendly.

Moreover, the NLTK site includes not only some simple examples on the main page, but also the full contents of the book `"Natural Language Processing with Python – Analyzing Text with the Natural Language Toolkit" `_ by Steven Bird, Ewan Klein, and Edward Loper (talk about generous!). If you'd like a paper copy, you can also find it on `Amazon here `_.


TextBlob
^^^^^^^^^^^^^
`TextBlob `_ is a nice-looking new library that seems to have a *very* streamlined interface. The library appears to use NLTK and another library (`pattern `_) in the background, so much of its functionality should be similar to NLTK itself.


Stanford CoreNLP
^^^^^^^^^^^^^^^^^

There are some other libraries that are somewhat more powerful, but not as user-friendly. One of the most powerful libraries for natural language processing is the `Stanford CoreNLP `_ library. The library itself is written in Java (not Python), but a number of people have written Python interfaces (also called wrappers or APIs) for it, `which you can find here `_. Again, these are a little harder to use and the documentation is not as good, but they're good options if you run into limitations with NLTK.

--------------------------------------------------------------------------------
/source/why_python.rst:
--------------------------------------------------------------------------------

Why Python?
==============================

It's a great language
^^^^^^^^^^^^^^^^^^^^^

The best reason to learn Python is also the hardest to articulate to someone who is just starting to work with Python: in terms of structure and syntax, it's a beautifully designed, intuitive, but exceedingly powerful general-purpose programming language.

Python was explicitly designed (a) so that code written in Python would be easy for humans to read, and (b) to minimize the amount of time required to write code.
Indeed, its ease of use is one reason that, `according to a recent study, 8 of the top 10 CS programs in the country `_ use Python in their intro to computer science classes.

Generalizable skills > non-generalizable skills
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

At the same time, however, it's a **real, general-purpose programming language.** Major companies like `Google `_ and `Dropbox `_ use Python in their core applications.

This sets Python apart from "Domain Specific Languages" like R that are highly tuned to serve only a specific purpose (like statistics) and to work for a specific audience. R is an implementation of the S language, which John Chambers designed with the goal of making a language that non-programmers could get started with quickly, but which could also be used by "power users". To a large degree he succeeded, as evidenced by the uptake of S and R. But in trying to make the language so accessible to non-programmers, many compromises were made. R only really serves one purpose (statistical analysis), and its syntax has all sorts of oddities and warts that come from this original bargain. Python does require a little more training to get started with (though not that much more), but as a result there's **no ceiling to what you can do with Python.** If you learn Python, you're learning a full programming language. This means that if you ever need to work in a different language like Java or C, understand code someone else has written, or otherwise deal with a programming problem, your background in a real programming language will give you a good conceptual foundation for whatever you come across. Indeed, this is another reason top CS programs teach in Python.
And if you're at all interested in doing *computational* social science, building a generalizable programming skill makes you more flexible. R is great if you just want to run regressions or do things that fit perfectly into the mold someone has created with an R function. But as social scientists keep finding new sources of data (like text) and new ways to analyze them, the more literate you are in general programming, the better prepared you will be to steal tools from other disciplines and to write new tools yourself. 20 | 21 | Python only, or Python and ... 22 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 23 | 24 | Personally, I find the idea of working in a single programming environment incredibly appealing. I first came to Python because I was doing my econometrics in Stata, my GIS work in ArcGIS, and my network analysis in R, and I wanted to unify my workflow. For me, one of the best parts of Python is that I'm confident I can do anything I want in this one environment. 25 | 26 | But not everyone feels that way, and many people use Python AND other tools like R, moving back and forth depending on the application at hand. Even if you plan to mix and match, Python's generality pays off: anecdotally, many people say that getting better at Python has made them much better programmers, not just in Python but also in R or Stata. 27 | 28 | Performance 29 | ^^^^^^^^^^^ 30 | 31 | Performance is not a concern in the vast majority of social science applications, so it is not one of the top reasons to choose Python. However, if you find yourself in a situation where it does matter, Python can have major performance advantages over many other high-level languages, including Matlab and R, in terms of both computation speed and memory use (R is a notorious memory hog).
32 | 33 | More importantly, though, :doc:`there are new tools that make it possible to write code in Python ` that runs at nearly the speed of code written in C or FORTRAN, often orders of magnitude faster than R or native Python. Again, this is a second-order consideration in most cases, but it is another example of how Python gives you options no matter what the future brings. 34 | 35 | 36 | Why NOT Python? 37 | --------------- 38 | 39 | In my view, there is one huge reason to choose R over Python: colleagues. If many of the people around you work in R, then choosing R means (a) you can turn to the person next to you and ask for help, and (b) collaborating on co-authored work will be easier. Python has a great support community and mailing lists, but there is no substitute for in-person help. 40 | 41 | 42 | Want some more opinions? 43 | ------------------------ 44 | 45 | * `Python as a teaching language `_ 46 | * `Python versus R `_ 47 | --------------------------------------------------------------------------------