├── 22__pandas-how-to-filter-results-of-value_counts.patch
├── README.md
├── notebooks
    ├── Books
    │   └── Think Python
    │   │   ├── Chapter_3_Functions_1.ipynb
    │   │   ├── Chapter_4__Case_study_interface_design.ipynb
    │   │   ├── Chapter_5__Conditionals_and_recursion.ipynb
    │   │   ├── Chapter_6__Fruitful_functions.ipynb
    │   │   ├── Think_Python_Chapter_10__Lists.ipynb
    │   │   ├── Think_Python_Chapter_11__Dictionaries.ipynb
    │   │   ├── Think_Python_Chapter_12__Tuples.ipynb
    │   │   ├── Think_Python_Chapter_7__Iteration.ipynb
    │   │   ├── Think_Python_Chapter_8__Strings.ipynb
    │   │   ├── Think_Python_Chapter_9__Case_study_A_word_play.ipynb
    │   │   ├── ch7_debug.py
    │   │   └── strings_in_python.png
    ├── DataFrame_column_transformations.ipynb
    ├── Dataframe_to_json_nested.ipynb
    ├── How_to_extract_information_from_excel_with_Python_and_Pandas.ipynb
    ├── IPython tricks 2019.ipynb
    ├── Image_validation_with_Python.ipynb
    ├── Load_multiple_CSV_files_into_a_single _Dataframe.ipynb
    ├── Pandas count and percentage by value for a column.ipynb
    ├── Pandas is column is contained in another column in the same row.ipynb
    ├── Pandas search in column, every column and regex.ipynb
    ├── Python Extract Table from PDF.ipynb
    ├── Python group and sort a list of lists by a specific index,pattern.ipynb
    ├── Python_group_or_sort_list_of_lists_by_common_element.ipynb
    ├── Q&A
    │   └── Questions_and_Answers_1_Improve_OCR_and_tabula_range.ipynb
    ├── Scrape wiki tables with pandas and python.ipynb
    ├── What_is_the_usage_of_*_asterisk_in_Python.ipynb
    ├── csv
    │   ├── data.csv.zip
    │   ├── data_201901.csv
    │   ├── data_201902.csv
    │   ├── data_202001.csv
    │   ├── data_202002.csv
    │   └── excel
    │   │   └── example.xlsx
    ├── pandas
    │   ├── 20._Pandas_-_value_counts_-_multiple_columns%2C_all_columns_and_bad_data.ipynb
    │   ├── 21. pandas-dataframe-sampling-rows-or-columns.ipynb
    │   ├── 22.pandas-how-to-filter-results-of-value_counts.ipynb
    │   ├── 23.pandas-typeerror-unhashable-type-list-dict.ipynb
    │   ├── 24-pandas-check-value-column-contained-another-column-same-row.ipynb
    │   ├── 25_Pandas_Create_A_Matplotlib_Scatterplot_From_A_Dataframe.ipynb
    │   ├── 26.pandas-display-all-columns-and-show-more-rows.ipynb
    │   ├── How_to_Optimize_and_Speed_Up_Pandas.ipynb
    │   ├── Pandas_Crosstab_-_cross_tabulation_of_two_factors_examples.ipynb
    │   ├── Pandas_How_add_new_column_existing_DataFrame.ipynb
    │   ├── Pandas_Select_rows_between_two_dates_-_DataFrame_or_CSV_file.ipynb
    │   ├── Pandas_compare_columns_in_two_Dataframes.ipynb
    │   ├── Pandas_count_values_in_a_column_of_type_list.ipynb
    │   ├── Pandas_extract_url_or_dates_from_column.ipynb
    │   ├── Python_Pandas_find_and_drop_duplicate_data.ipynb
    │   ├── map_the_headers_to_a_column_with_pandas.ipynb
    │   └── pandas-use-list-values-select-rows-column.ipynb
    ├── python
    │   ├── Files
    │   │   └── How_to_merge_multiple_CSV_files_with_Python.ipynb
    │   └── JSON
    │   │   ├── 41._Create_a_table_in_MySQL_Database_from_python_dictionary.ipynb
    │   │   └── 42._Convert_MySQL_table_to_Pandas_DataFrame_Python_dictionary.ipynb
    ├── python_problems
    │   └── Python_problems_for_beginners_1.ipynb
    └── youtube
    │   └── Youtube-PewDiePie.ipynb
├── scripts
    ├── 1.python_wrap_lines.py
    └── __init__.py
└── test.py


/22__pandas-how-to-filter-results-of-value_counts.patch:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/softhints/python/a256a054d74ca397f41874b3e26f1c4b84214432/22__pandas-how-to-filter-results-of-value_counts.patch


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # python
  2 | Jupyter notebooks and datasets for the interesting pandas/python/data science video series.
  3 | 
  4 | # Contribution
  5 | 
  6 | Feel free to contribute or suggest new ideas. To get in touch, send an [email](mailto:grouprivl@gmail.com?subject=[GitHub]%20Source%20Python).
  7 | 
  8 | You can find nice guides to contributing on GitHub:
  9 | * [Contributing to projects](https://docs.github.com/en/get-started/quickstart/contributing-to-projects)
 10 | * [Step-by-step guide to contributing on GitHub](https://www.dataschool.io/how-to-contribute-on-github/)
 11 | 
 12 | # Who is this repo for?
 13 | 
 14 | For people who are interested in data science, data analysis, and finding interesting insights in data. This repository is related to the sites:
 15 | * [DataScientYst.com - Data Science Tutorials, Exercises, Guides, Videos with Python and Pandas](https://datascientyst.com/)
 16 | * [SoftHints.com - Python, Pandas, Linux, SQL Tutorials and Guides](https://softhints.com/)
 17 | 
 18 | where you can find more interesting articles. 
 19 | 
 20 | A new website dedicated to Pandas and Data Science has been started: https://datascientyst.com/. It is better organized and covers topics in many areas.
 21 | 
 22 | 
 23 | The YouTube channel is:
 24 | 
 25 | * [SoftHints Youtube](https://www.youtube.com/@softhints/)
 26 | * [Popular Videos](https://www.youtube.com/@softhints/videos)
 27 | 
 28 | # Latest Videos
 29 | 
 30 | ## Pandas
 31 | 
 32 | 0. [Pandas Tutorial : How to split columns of dataframe](https://www.youtube.com/watch?v=cCoGsFVPVh0&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv)
 33 | 1. [Pandas Tutorial : How to split dataframe by string or date](https://www.youtube.com/watch?v=7sgDvC4k6Xg&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv)
 34 | 2. [Easily extract tables from websites with pandas and python](https://www.youtube.com/watch?v=OXA_ZD1gR6A&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv)
 35 | 3. [Easily extract information from excel with Python and Pandas](https://www.youtube.com/watch?v=hJMH_1o8eU0&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv)
 36 | 4. [Extract tabular data from PDF with Python - Tabula, Camelot, PyPDF2](https://www.youtube.com/watch?v=702lkQbZx50&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv)
 37 | 5. [Pandas is column part of another column in the same row of dataframe](https://www.youtube.com/watch?v=duOHHDqI40c&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv)
 38 | 6. [Load multiple CSV files into a single  Dataframe](https://www.youtube.com/watch?v=30ndwJm1I5c&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv)
 39 | 7. [Analyze top youtube channels 2019 with pandas - PewDiePie I](https://www.youtube.com/watch?v=mG9OnH9R5yM&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv)
 40 | 8. [dataframe column transformations ( str, int, category, concat)](https://www.youtube.com/watch?v=5pbRivDYzko&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv)
 41 | 9. [Pandas DataFrame generate n-level hierarchical JSON](https://www.youtube.com/watch?v=lCcE-0bykRU&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv)
 42 | 10. [Pandas How add new column existing DataFrame](https://www.youtube.com/watch?v=UvCO5gKQqtE&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv)
 43 | 11. [Python Pandas find and drop duplicate data](https://www.youtube.com/watch?v=4ixLp8aFomw&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv)
 44 | 12. [Map the headers to a column with pandas?](https://www.youtube.com/watch?v=3g6KG_8zq0E&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv)
 45 | 13. [Pandas count values in a column of type list](https://www.youtube.com/watch?v=lx7KFd6BPcg&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv)
 46 | 14. [How to Optimize and Speed Up Pandas](https://www.youtube.com/watch?v=nW5ltiwV-6Y&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv)
 47 | 15. [Pandas count and percentage by value for a column](https://www.youtube.com/watch?v=P5pxJkv71BU&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv)
 48 | 16. [Pandas use a list of values to select rows from a column](https://www.youtube.com/watch?v=jlSbo5wmTPQ&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv)
 49 | 
 50 | 
 51 | ## python
 52 | 
 53 | 0. [python string split by separator](https://www.youtube.com/watch?v=iBsg75W2Vig&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 54 | 1. [python random number generation examples](https://www.youtube.com/watch?v=WDTnZgSreL4&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 55 | 2. [bilingual programming education in java and python](https://www.youtube.com/watch?v=eEHBjP06WSI&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 56 | 3. [biggest programmer salaries 2018](https://www.youtube.com/watch?v=X2bUUkWC7dE&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 57 | 4. [python extract text from image or pdf](https://www.youtube.com/watch?v=PK-GvWWQ03g&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 58 | 5. [Python read validate and import CSV JSON file to MySQL](https://www.youtube.com/watch?v=WbW0rHCX2UU&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 59 | 6. [python regex match date](https://www.youtube.com/watch?v=o8Je7hPgsdU&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 60 | 7. [python regex cheat sheet with examples](https://www.youtube.com/watch?v=o_CSmob64uU&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 61 | 8. [python string methods tutorial](https://www.youtube.com/watch?v=7yuPVq9DtV0&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 62 | 9. [python shuffle list](https://www.youtube.com/watch?v=WFRBxz6AeZI&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 63 | 10. [Easy install of Python and PyCharm on Windows](https://www.youtube.com/watch?v=cDOlBRzHRI0&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 64 | 11. [learn python for beginners complete tutorial 2018](https://www.youtube.com/watch?v=hnc3bGtYQsQ&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 65 | 12. [think python chapter 2](https://www.youtube.com/watch?v=A6EIl677ntQ&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 66 | 13. [Python/Java bad and good code comments examples](https://www.youtube.com/watch?v=SRCToEkq7to&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 67 | 14. [intellij pycharm surround string quote](https://www.youtube.com/watch?v=AgRHEGB8Urs&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 68 | 15. [Top Five Most Annoying Programming Mistakes For Beginners with Python](https://www.youtube.com/watch?v=JToPoYip-C4&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 69 | 16. [No Python Interpreter Configured For The Module - PyCharm/IntelliJ](https://www.youtube.com/watch?v=mkKDI6y2kyE&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 70 | 17. [python split string into list examples](https://www.youtube.com/watch?v=T8EfomTlcfA&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 71 | 18. [How to migrate/update virtualenv from Python 3.5 to 3.6](https://www.youtube.com/watch?v=cFTB5EJUxzw&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 72 | 19. [Python String Remove Last n Characters](https://www.youtube.com/watch?v=hZHfdOKFlAw&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 73 | 20. [Python Pandas 7 examples of filters and lambda apply](https://www.youtube.com/watch?v=7nYkJctgSSA&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 74 | 21. [The simplest way to run python headless test with Chrome on Ubuntu](https://www.youtube.com/watch?v=BdppFIT_lIs&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 75 | 22. [Python 3 Simple Examples get current folder and go to parent](https://www.youtube.com/watch?v=tQ_9a6UhUQs&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 76 | 23. [python join/merge list two and more lists](https://www.youtube.com/watch?v=-zcJ4uB7XUo&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 77 | 24. [Easy way to convert dictionary to SQL insert with Python](https://www.youtube.com/watch?v=hUXGQwTSfMs&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 78 | 25. [Python 3 detect and prevent TypeError-s](https://www.youtube.com/watch?v=DJd0JYaVkqA&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 79 | 26. [The right way to declare multiple variables in Python](https://www.youtube.com/watch?v=8OoLg39nNlo&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 80 | 27. [Python uninstall a module installed with pip install and virtual environment](https://www.youtube.com/watch?v=03ahRfkfwME&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 81 | 28. [python performance profiling in pycharm](https://www.youtube.com/watch?v=EZ-im7m8630&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 82 | 29. [Python Cumulative Sum per Group with Pandas](https://www.youtube.com/watch?v=1tCbvYv_ibw&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 83 | 30. [PyCharm - Breakpoints, Favorites, TODOs simple examples](https://www.youtube.com/watch?v=_fNZLrz97kg&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 84 | 31. [Python 3 simple ways to list files and folders](https://www.youtube.com/watch?v=oJdubyyJNIQ&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 85 | 32. [Python 3 elegant way to find most/less common element in a list](https://www.youtube.com/watch?v=P4LonC3puS4&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 86 | 33. [clock angle problem final](https://www.youtube.com/watch?v=eIRhXharV7k&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 87 | 34. [Python 3 List Comprehension Tutorial for beginners](https://www.youtube.com/watch?v=DmSephyJNtQ&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 88 | 35. [python 3 how to remove white spaces](https://www.youtube.com/watch?v=0k0fvqikaoE&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 89 | 36. [Pandas Tutorial : How to split dataframe by string or date](https://www.youtube.com/watch?v=7sgDvC4k6Xg&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 90 | 37. [improve your programming skills with fun](https://www.youtube.com/watch?v=uoAV7651Op0&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 91 | 38. [pandas dataframe search for string in all columns filter regex](https://www.youtube.com/watch?v=vbHFIALhSWE&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 92 | 39. [Pandas is column part of another column in the same row of dataframe](https://www.youtube.com/watch?v=duOHHDqI40c&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 93 | 40. [Easily extract tables from websites with pandas and python](https://www.youtube.com/watch?v=OXA_ZD1gR6A&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 94 | 41. [Easily extract information from excel with Python and Pandas](https://www.youtube.com/watch?v=hJMH_1o8eU0&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 95 | 42. [Python asterisk argument or What is the usage of *   asterisk in Python](https://www.youtube.com/watch?v=JBm8iptLnuA&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 96 | 43. [Easy Image validation with Python - valid image, blank or pattern](https://www.youtube.com/watch?v=HMB4zrP_-HY&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 97 | 44. [Pandas DataFrame generate n-level hierarchical JSON](https://www.youtube.com/watch?v=lCcE-0bykRU&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 98 | 45. [Python group or sort list of lists by common element](https://www.youtube.com/watch?v=zVQJQxpedm8&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
 99 | 46. [Think Python: Chapter 3 Functions 3.2](https://www.youtube.com/watch?v=Ol3Dwucax9U&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
100 | 47. [Questions and Answers 1 Improve OCR and tabula range](https://www.youtube.com/watch?v=nrF_Rgh88no&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
101 | 48. [Map the headers to a column with pandas?](https://www.youtube.com/watch?v=3g6KG_8zq0E&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C)
102 | 


--------------------------------------------------------------------------------
/notebooks/Books/Think Python/Chapter_3_Functions_1.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "# Think Python: How to Think Like a Computer Scientist"
  8 |    ]
  9 |   },
 10 |   {
 11 |    "cell_type": "markdown",
 12 |    "metadata": {},
 13 |    "source": [
 14 |     "## Chapter 3  Functions\n",
 15 |     "\n",
 16 |     "* Function calls\n",
 17 |     "* Math functions\n",
 18 |     "* Composition\n",
 19 |     "* Adding new functions\n",
 20 |     "* Definitions and uses\n",
 21 |     "* Flow of execution\n",
 22 |     "* Parameters and arguments\n",
 23 |     "------\n",
 24 |     "* Variables and parameters are local\n",
 25 |     "* Stack diagrams\n",
 26 |     "* Fruitful functions and void functions\n",
 27 |     "* Why functions?\n",
 28 |     "* Debugging\n",
 29 |     "* Glossary\n",
 30 |     "* Exercises\n",
 31 |     "\n",
 32 |     "\n",
 33 |     "> In the context of programming, a function is a named sequence of statements that performs a computation. When you define a function, you specify the name and the sequence of statements. Later, you can “call” the function by name."
 34 |    ]
 35 |   },
 36 |   {
 37 |    "cell_type": "markdown",
 38 |    "metadata": {},
 39 |    "source": [
 40 |     "### Functions best practices\n",
 41 |     "\n",
 42 |     "* has a name that fits its functionality\n",
 43 |     "* does one thing and only one thing\n",
 44 |     "* has documentation\n",
 45 |     "* is relatively short"
 46 |    ]
 47 |   },
 48 |   {
 49 |    "cell_type": "markdown",
 50 |    "metadata": {},
 51 |    "source": [
 52 |     "## 3.1  Function calls"
 53 |    ]
 54 |   },
 55 |   {
 56 |    "cell_type": "code",
 57 |    "execution_count": 2,
 58 |    "metadata": {},
 59 |    "outputs": [
 60 |     {
 61 |      "data": {
 62 |       "text/plain": [
 63 |        "str"
 64 |       ]
 65 |      },
 66 |      "execution_count": 2,
 67 |      "metadata": {},
 68 |      "output_type": "execute_result"
 69 |     }
 70 |    ],
 71 |    "source": [
 72 |     "# type is the function name\n",
 73 |     "# 'a' is the argument\n",
 74 |     "\n",
 75 |     "type('a')"
 76 |    ]
 77 |   },
 78 |   {
 79 |    "cell_type": "markdown",
 80 |    "metadata": {},
 81 |    "source": [
 82 |     "> a function “takes” an argument and “returns” a result"
 83 |    ]
 84 |   },
 85 |   {
 86 |    "cell_type": "code",
 87 |    "execution_count": 3,
 88 |    "metadata": {},
 89 |    "outputs": [
 90 |     {
 91 |      "data": {
 92 |       "text/plain": [
 93 |        "32"
 94 |       ]
 95 |      },
 96 |      "execution_count": 3,
 97 |      "metadata": {},
 98 |      "output_type": "execute_result"
 99 |     }
100 |    ],
101 |    "source": [
102 |     "int('32')"
103 |    ]
104 |   },
105 |   {
106 |    "cell_type": "code",
107 |    "execution_count": 4,
108 |    "metadata": {},
109 |    "outputs": [
110 |     {
111 |      "ename": "ValueError",
112 |      "evalue": "invalid literal for int() with base 10: 'Hello'",
113 |      "traceback": [
114 |       "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
115 |       "\u001b[0;31mValueError\u001b[0m                                Traceback (most recent call last)",
116 |       "\u001b[0;32m<ipython-input-4-6765ce49acfe>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'Hello'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
117 |       "\u001b[0;31mValueError\u001b[0m: invalid literal for int() with base 10: 'Hello'"
118 |      ],
119 |      "output_type": "error"
120 |     }
121 |    ],
122 |    "source": [
123 |     "int('Hello')"
124 |    ]
125 |   },
126 |   {
127 |    "cell_type": "code",
128 |    "execution_count": 5,
129 |    "metadata": {},
130 |    "outputs": [
131 |     {
132 |      "data": {
133 |       "text/plain": [
134 |        "3"
135 |       ]
136 |      },
137 |      "execution_count": 5,
138 |      "metadata": {},
139 |      "output_type": "execute_result"
140 |     }
141 |    ],
142 |    "source": [
143 |     "int(3.99999)"
144 |    ]
145 |   },
146 |   {
147 |    "cell_type": "code",
148 |    "execution_count": 6,
149 |    "metadata": {},
150 |    "outputs": [
151 |     {
152 |      "data": {
153 |       "text/plain": [
154 |        "-2"
155 |       ]
156 |      },
157 |      "execution_count": 6,
158 |      "metadata": {},
159 |      "output_type": "execute_result"
160 |     }
161 |    ],
162 |    "source": [
163 |     "int(-2.3)"
164 |    ]
165 |   },
166 |   {
167 |    "cell_type": "code",
168 |    "execution_count": 7,
169 |    "metadata": {},
170 |    "outputs": [
171 |     {
172 |      "data": {
173 |       "text/plain": [
174 |        "32.0"
175 |       ]
176 |      },
177 |      "execution_count": 7,
178 |      "metadata": {},
179 |      "output_type": "execute_result"
180 |     }
181 |    ],
182 |    "source": [
183 |     "float(32)"
184 |    ]
185 |   },
186 |   {
187 |    "cell_type": "code",
188 |    "execution_count": 8,
189 |    "metadata": {},
190 |    "outputs": [
191 |     {
192 |      "data": {
193 |       "text/plain": [
194 |        "3.14159"
195 |       ]
196 |      },
197 |      "execution_count": 8,
198 |      "metadata": {},
199 |      "output_type": "execute_result"
200 |     }
201 |    ],
202 |    "source": [
203 |     "float('3.14159')"
204 |    ]
205 |   },
206 |   {
207 |    "cell_type": "code",
208 |    "execution_count": 9,
209 |    "metadata": {},
210 |    "outputs": [
211 |     {
212 |      "data": {
213 |       "text/plain": [
214 |        "'3.14159'"
215 |       ]
216 |      },
217 |      "execution_count": 9,
218 |      "metadata": {},
219 |      "output_type": "execute_result"
220 |     }
221 |    ],
222 |    "source": [
223 |     "str(3.14159)"
224 |    ]
225 |   },
226 |   {
227 |    "cell_type": "code",
228 |    "execution_count": 10,
229 |    "metadata": {},
230 |    "outputs": [
231 |     {
232 |      "data": {
233 |       "text/plain": [
234 |        "'32'"
235 |       ]
236 |      },
237 |      "execution_count": 10,
238 |      "metadata": {},
239 |      "output_type": "execute_result"
240 |     }
241 |    ],
242 |    "source": [
243 |     "str(32)"
244 |    ]
245 |   },
246 |   {
247 |    "cell_type": "markdown",
248 |    "metadata": {},
249 |    "source": [
250 |     "## 3.2  Math functions"
251 |    ]
252 |   },
253 |   {
254 |    "cell_type": "markdown",
255 |    "metadata": {},
256 |    "source": [
257 |     "> Python has a math module that provides most of the familiar mathematical functions. A module is a file that contains a collection of related functions."
258 |    ]
259 |   },
260 |   {
261 |    "cell_type": "code",
262 |    "execution_count": 11,
263 |    "metadata": {},
264 |    "outputs": [
265 |     {
266 |      "data": {
267 |       "text/plain": [
268 |        "<module 'math' (built-in)>"
269 |       ]
270 |      },
271 |      "execution_count": 11,
272 |      "metadata": {},
273 |      "output_type": "execute_result"
274 |     }
275 |    ],
276 |    "source": [
277 |     "import math\n",
278 |     "math"
279 |    ]
280 |   },
281 |   {
282 |    "cell_type": "markdown",
283 |    "metadata": {},
284 |    "source": [
285 |     "> This format is called dot notation."
286 |    ]
287 |   },
288 |   {
289 |    "cell_type": "code",
290 |    "execution_count": 12,
291 |    "metadata": {},
292 |    "outputs": [
293 |     {
294 |      "data": {
295 |       "text/plain": [
296 |        "2.2184874961635637"
297 |       ]
298 |      },
299 |      "execution_count": 12,
300 |      "metadata": {},
301 |      "output_type": "execute_result"
302 |     }
303 |    ],
304 |    "source": [
305 |     "# This example uses math.log10 to compute a signal-to-noise ratio in decibels \n",
306 |     "\n",
307 |     "signal_power = 5\n",
308 |     "noise_power = 3\n",
309 |     "ratio = signal_power / noise_power\n",
310 |     "decibels = 10 * math.log10(ratio)\n",
311 |     "decibels"
312 |    ]
313 |   },
314 |   {
315 |    "cell_type": "code",
316 |    "execution_count": null,
317 |    "metadata": {},
318 |    "outputs": [],
319 |    "source": [
320 |     "# The second example finds the sine of radians. The name of the variable is a \n",
321 |     "# hint that sin and the other trigonometric functions (cos, tan, etc.) take arguments in radians. \n",
322 |     "# To convert from degrees to radians, divide by 180 and multiply by π:\n",
323 |     "\n",
324 |     "radians = 0.7\n",
325 |     "height = math.sin(radians)\n",
326 |     "height"
327 |    ]
328 |   },
329 |   {
330 |    "cell_type": "code",
331 |    "execution_count": null,
332 |    "metadata": {},
333 |    "outputs": [],
334 |    "source": [
335 |     "# The expression math.pi gets the variable pi from the math module. Its value is a \n",
336 |     "# floating-point approximation of π, accurate to about 15 digits.\n",
337 |     "\n",
338 |     "degrees = 45\n",
339 |     "radians = degrees / 180.0 * math.pi\n",
340 |     "math.sin(radians)"
341 |    ]
342 |   },
343 |   {
344 |    "cell_type": "code",
345 |    "execution_count": null,
346 |    "metadata": {},
347 |    "outputs": [],
348 |    "source": [
349 |     "# verify the previous result by comparing it with the square root of 2 divided by 2\n",
350 |     "\n",
351 |     "math.sqrt(2) / 2.0"
352 |    ]
353 |   },
354 |   {
355 |    "cell_type": "markdown",
356 |    "metadata": {},
357 |    "source": [
358 |     "> add meaningful and descriptive comments to your functions"
359 |    ]
360 |   },
361 |   {
362 |    "cell_type": "markdown",
363 |    "metadata": {},
364 |    "source": [
365 |     "## 3.3 Composition"
366 |    ]
367 |   },
368 |   {
369 |    "cell_type": "markdown",
370 |    "metadata": {},
371 |    "source": [
372 |     "> One of the most useful features of programming languages is their ability to take small building blocks and compose them."
373 |    ]
374 |   },
375 |   {
376 |    "cell_type": "code",
377 |    "execution_count": 13,
378 |    "metadata": {},
379 |    "outputs": [
380 |     {
381 |      "ename": "NameError",
382 |      "evalue": "name 'degrees' is not defined",
383 |      "traceback": [
384 |       "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
385 |       "\u001b[0;31mNameError\u001b[0m                                 Traceback (most recent call last)",
386 |       "\u001b[0;32m<ipython-input-13-d9e18bdeee91>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mx\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mmath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msin\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdegrees\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0;36m360.0\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0;36m2\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0mmath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpi\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m      2\u001b[0m \u001b[0mx\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
387 |       "\u001b[0;31mNameError\u001b[0m: name 'degrees' is not defined"
388 |      ],
389 |      "output_type": "error"
390 |     }
391 |    ],
392 |    "source": [
393 |     "x = math.sin(degrees / 360.0 * 2 * math.pi)\n",
394 |     "x"
395 |    ]
396 |   },
397 |   {
398 |    "cell_type": "code",
399 |    "execution_count": 14,
400 |    "metadata": {},
401 |    "outputs": [
402 |     {
403 |      "data": {
404 |       "text/plain": [
405 |        "0.01745240643728351"
406 |       ]
407 |      },
408 |      "execution_count": 14,
409 |      "metadata": {},
410 |      "output_type": "execute_result"
411 |     }
412 |    ],
413 |    "source": [
414 |     "x = math.sin(1 / 360.0 * 2 * math.pi)\n",
415 |     "x"
416 |    ]
417 |   },
418 |   {
419 |    "cell_type": "code",
420 |    "execution_count": null,
421 |    "metadata": {},
422 |    "outputs": [],
423 |    "source": [
424 |     "x = math.exp(math.log(x+1))\n",
425 |     "x"
426 |    ]
427 |   },
428 |   {
429 |    "cell_type": "code",
430 |    "execution_count": 15,
431 |    "metadata": {},
432 |    "outputs": [
433 |     {
434 |      "data": {
435 |       "text/plain": [
436 |        "600"
437 |       ]
438 |      },
439 |      "execution_count": 15,
440 |      "metadata": {},
441 |      "output_type": "execute_result"
442 |     }
443 |    ],
444 |    "source": [
445 |     "hours = 10\n",
446 |     "minutes = hours * 60\n",
447 |     "minutes"
448 |    ]
449 |   },
450 |   {
451 |    "cell_type": "code",
452 |    "execution_count": 16,
453 |    "metadata": {},
454 |    "outputs": [
455 |     {
456 |      "ename": "SyntaxError",
457 |      "evalue": "can't assign to operator (<ipython-input-16-9442855be4e0>, line 1)",
458 |      "traceback": [
459 |       "\u001b[0;36m  File \u001b[0;32m\"<ipython-input-16-9442855be4e0>\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m    hours * 60 = minutes\u001b[0m\n\u001b[0m                        ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m can't assign to operator\n"
460 |      ],
461 |      "output_type": "error"
462 |     }
463 |    ],
464 |    "source": [
465 |     "hours * 60 = minutes"
466 |    ]
467 |   },
468 |   {
469 |    "cell_type": "markdown",
470 |    "metadata": {},
471 |    "source": [
472 |     "> avoid confusing and misleading compositions\n",
473 |     "\n",
474 |     "> keep to the KISS principle - keep it simple, stupid"
475 |    ]
476 |   },
477 |   {
478 |    "cell_type": "markdown",
479 |    "metadata": {},
480 |    "source": [
481 |     "## 3.4  Adding new functions"
482 |    ]
483 |   },
484 |   {
485 |    "cell_type": "markdown",
486 |    "metadata": {},
487 |    "source": [
488 |     "> A function definition specifies the name of a new function and the sequence of statements that run when the function is called."
489 |    ]
490 |   },
491 |   {
492 |    "cell_type": "code",
493 |    "execution_count": 17,
494 |    "metadata": {},
495 |    "outputs": [],
496 |    "source": [
497 |     "# def - a keyword that indicates that this is a function definition\n",
498 |     "# print_lyrics - the function name\n",
499 |     "# () - empty parentheses indicate that this function doesn’t take any arguments\n",
500 |     "\n",
501 |     "def print_lyrics():\n",
502 |     "    print(\"I'm a lumberjack, and I'm okay.\")\n",
503 |     "    print(\"I sleep all night and I work all day.\")"
504 |    ]
505 |   },
506 |   {
507 |    "cell_type": "markdown",
508 |    "metadata": {},
509 |    "source": [
510 |     "> The first line of the function definition is called the header; the rest is called the body. \n",
511 |     "\n",
512 |     "> Single quotes and double quotes do the same thing in most situations."
513 |    ]
514 |   },
515 |   {
516 |    "cell_type": "code",
517 |    "execution_count": 18,
518 |    "metadata": {},
519 |    "outputs": [
520 |     {
521 |      "data": {
522 |       "text/plain": [
523 |        "function"
524 |       ]
525 |      },
526 |      "execution_count": 18,
527 |      "metadata": {},
528 |      "output_type": "execute_result"
529 |     }
530 |    ],
531 |    "source": [
532 |     "type(print_lyrics)"
533 |    ]
534 |   },
535 |   {
536 |    "cell_type": "code",
537 |    "execution_count": 19,
538 |    "metadata": {},
539 |    "outputs": [
540 |     {
541 |      "name": "stdout",
542 |      "output_type": "stream",
543 |      "text": [
544 |       "<function print_lyrics at 0x7f60bc408b70>\n"
545 |      ]
546 |     }
547 |    ],
548 |    "source": [
549 |     "print(print_lyrics)"
550 |    ]
551 |   },
552 |   {
553 |    "cell_type": "markdown",
554 |    "metadata": {},
555 |    "source": [
556 |     "> The syntax for calling the new function is the same as for built-in functions:"
557 |    ]
558 |   },
559 |   {
560 |    "cell_type": "code",
561 |    "execution_count": 20,
562 |    "metadata": {},
563 |    "outputs": [
564 |     {
565 |      "name": "stdout",
566 |      "output_type": "stream",
567 |      "text": [
568 |       "I'm a lumberjack, and I'm okay.\n",
569 |       "I sleep all night and I work all day.\n"
570 |      ]
571 |     }
572 |    ],
573 |    "source": [
574 |     "print_lyrics()"
575 |    ]
576 |   },
577 |   {
578 |    "cell_type": "code",
579 |    "execution_count": 21,
580 |    "metadata": {},
581 |    "outputs": [],
582 |    "source": [
583 |     "def repeat_lyrics():\n",
584 |     "    print_lyrics()\n",
585 |     "    print_lyrics()"
586 |    ]
587 |   },
588 |   {
589 |    "cell_type": "code",
590 |    "execution_count": 22,
591 |    "metadata": {},
592 |    "outputs": [
593 |     {
594 |      "name": "stdout",
595 |      "output_type": "stream",
596 |      "text": [
597 |       "I'm a lumberjack, and I'm okay.\n",
598 |       "I sleep all night and I work all day.\n",
599 |       "I'm a lumberjack, and I'm okay.\n",
600 |       "I sleep all night and I work all day.\n"
601 |      ]
602 |     }
603 |    ],
604 |    "source": [
605 |     "repeat_lyrics()"
606 |    ]
607 |   },
608 |   {
609 |    "cell_type": "markdown",
610 |    "metadata": {},
611 |    "source": [
612 |     "## 3.5  Definitions and uses"
613 |    ]
614 |   },
615 |   {
616 |    "cell_type": "markdown",
617 |    "metadata": {},
618 |    "source": [
619 |     "> This program contains two function definitions: print_lyrics and repeat_lyrics. Function definitions get executed just like other statements, but the effect is to create function objects.\n",
620 |     "\n",
621 |     ">  You have to create a function before you can run it. In other words, the function definition has to run before the function gets called."
622 |    ]
623 |   },
624 |   {
625 |    "cell_type": "code",
626 |    "execution_count": 24,
627 |    "metadata": {},
628 |    "outputs": [
629 |     {
630 |      "name": "stdout",
631 |      "output_type": "stream",
632 |      "text": [
633 |       "I'm a lumberjack, and I'm okay.\n",
634 |       "I sleep all night and I work all day.\n",
635 |       "I'm a lumberjack, and I'm okay.\n",
636 |       "I sleep all night and I work all day.\n"
637 |      ]
638 |     }
639 |    ],
640 |    "source": [
641 |     "def print_lyrics():\n",
642 |     "    print(\"I'm a lumberjack, and I'm okay.\")\n",
643 |     "    print(\"I sleep all night and I work all day.\")\n",
644 |     "\n",
645 |     "def repeat_lyrics():\n",
646 |     "    print_lyrics()\n",
647 |     "    print_lyrics()\n",
648 |     "\n",
649 |     "repeat_lyrics()"
650 |    ]
651 |   },
652 |   {
653 |    "cell_type": "code",
654 |    "execution_count": 25,
655 |    "metadata": {},
656 |    "outputs": [
657 |     {
658 |      "ename": "NameError",
659 |      "evalue": "name 'repeat_lyrics_new' is not defined",
660 |      "traceback": [
661 |       "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
662 |       "\u001b[0;31mNameError\u001b[0m                                 Traceback (most recent call last)",
663 |       "\u001b[0;32m<ipython-input-25-ac3014ab3823>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mrepeat_lyrics_new\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m      2\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      3\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mrepeat_lyrics_new\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      4\u001b[0m     \u001b[0mprint_lyrics\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      5\u001b[0m     \u001b[0mprint_lyrics\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
664 |       "\u001b[0;31mNameError\u001b[0m: name 'repeat_lyrics_new' is not defined"
665 |      ],
666 |      "output_type": "error"
667 |     }
668 |    ],
669 |    "source": [
670 |     "repeat_lyrics_new()\n",
671 |     "\n",
672 |     "def repeat_lyrics_new():\n",
673 |     "    print_lyrics()\n",
674 |     "    print_lyrics()\n",
675 |     "\n"
676 |    ]
677 |   },
678 |   {
679 |    "cell_type": "markdown",
680 |    "metadata": {},
681 |    "source": [
682 |     "## 3.6  Flow of execution"
683 |    ]
684 |   },
685 |   {
686 |    "cell_type": "markdown",
687 |    "metadata": {},
688 |    "source": [
689 |     "> To ensure that a function is defined before its first use, you have to know the order statements run in, which is called the flow of execution.\n",
690 |     "\n",
691 |     "> Execution always begins at the first statement of the program. Statements are run one at a time, in order from top to bottom.\n",
692 |     "\n",
693 |     "> In summary, when you read a program, you don’t always want to read from top to bottom. Sometimes it makes more sense if you follow the flow of execution.\n",
694 |     "\n"
695 |    ]
696 |   },
697 |   {
698 |    "cell_type": "code",
699 |    "execution_count": 26,
700 |    "metadata": {},
701 |    "outputs": [
702 |     {
703 |      "name": "stdout",
704 |      "output_type": "stream",
705 |      "text": [
706 |       "1\n",
707 |       "3\n",
708 |       "2\n"
709 |      ]
710 |     }
711 |    ],
712 |    "source": [
713 |     "def print_lyrics_1():\n",
714 |     "    print(\"1\")\n",
715 |     "\n",
716 |     "def print_lyrics_2():\n",
717 |     "    print(\"2\")  \n",
718 |     "    \n",
719 |     "def print_lyrics_3():\n",
720 |     "    print(\"3\")\n",
721 |     "\n",
722 |     "def repeat_lyrics():\n",
723 |     "    print_lyrics_1()\n",
724 |     "    print_lyrics_3()\n",
725 |     "    print_lyrics_2()\n",
726 |     "\n",
727 |     "repeat_lyrics()"
728 |    ]
729 |   },
730 |   {
731 |    "cell_type": "markdown",
732 |    "metadata": {},
733 |    "source": [
734 |     "## 3.7  Parameters and arguments"
735 |    ]
736 |   },
737 |   {
738 |    "cell_type": "markdown",
739 |    "metadata": {},
740 |    "source": [
741 |     "> Some of the functions we have seen require arguments. For example, when you call math.sin you pass a number as an argument. Some functions take more than one argument: math.pow takes two, the base and the exponent.\n",
742 |     "\n",
743 |     "> Inside the function, the arguments are assigned to variables called parameters. "
744 |    ]
745 |   },
746 |   {
747 |    "cell_type": "code",
748 |    "execution_count": 27,
749 |    "metadata": {},
750 |    "outputs": [],
751 |    "source": [
752 |     "def print_twice(bruce):\n",
753 |     "    print(bruce)\n",
754 |     "    print(bruce)"
755 |    ]
756 |   },
757 |   {
758 |    "cell_type": "code",
759 |    "execution_count": 28,
760 |    "metadata": {},
761 |    "outputs": [
762 |     {
763 |      "name": "stdout",
764 |      "output_type": "stream",
765 |      "text": [
766 |       "Spam\n",
767 |       "Spam\n",
768 |       "42\n",
769 |       "42\n",
770 |       "3.141592653589793\n",
771 |       "3.141592653589793\n"
772 |      ]
773 |     }
774 |    ],
775 |    "source": [
776 |     "print_twice('Spam')\n",
777 |     "print_twice(42)\n",
778 |     "print_twice(math.pi)"
779 |    ]
780 |   },
781 |   {
782 |    "cell_type": "code",
783 |    "execution_count": 29,
784 |    "metadata": {},
785 |    "outputs": [
786 |     {
787 |      "name": "stdout",
788 |      "output_type": "stream",
789 |      "text": [
790 |       "Spam Spam Spam Spam \n",
791 |       "Spam Spam Spam Spam \n"
792 |      ]
793 |     }
794 |    ],
795 |    "source": [
796 |     "print_twice('Spam '*4)"
797 |    ]
798 |   },
799 |   {
800 |    "cell_type": "code",
801 |    "execution_count": 30,
802 |    "metadata": {},
803 |    "outputs": [
804 |     {
805 |      "name": "stdout",
806 |      "output_type": "stream",
807 |      "text": [
808 |       "-1.0\n",
809 |       "-1.0\n"
810 |      ]
811 |     }
812 |    ],
813 |    "source": [
814 |     "print_twice(math.cos(math.pi))"
815 |    ]
816 |   },
817 |   {
818 |    "cell_type": "markdown",
819 |    "metadata": {},
820 |    "source": [
821 |     "> The argument is evaluated before the function is called, so in the examples the expressions 'Spam '*4 and math.cos(math.pi) are only evaluated once.\n",
822 |     "\n",
823 |     "> The name of the variable we pass as an argument (michael) has nothing to do with the name of the parameter (bruce)."
824 |    ]
825 |   },
826 |   {
827 |    "cell_type": "code",
828 |    "execution_count": 31,
829 |    "metadata": {},
830 |    "outputs": [
831 |     {
832 |      "name": "stdout",
833 |      "output_type": "stream",
834 |      "text": [
835 |       "Eric, the half a bee.\n",
836 |       "Eric, the half a bee.\n"
837 |      ]
838 |     }
839 |    ],
840 |    "source": [
841 |     "michael = 'Eric, the half a bee.'\n",
842 |     "print_twice(michael)"
843 |    ]
844 |   },
845 |   {
846 |    "cell_type": "code",
847 |    "execution_count": null,
848 |    "metadata": {},
849 |    "outputs": [],
850 |    "source": []
851 |   }
852 |  ],
853 |  "metadata": {
854 |   "kernelspec": {
855 |    "display_name": "Python 3",
856 |    "language": "python",
857 |    "name": "python3"
858 |   },
859 |   "language_info": {
860 |    "codemirror_mode": {
861 |     "name": "ipython",
862 |     "version": 3
863 |    },
864 |    "file_extension": ".py",
865 |    "mimetype": "text/x-python",
866 |    "name": "python",
867 |    "nbconvert_exporter": "python",
868 |    "pygments_lexer": "ipython3",
869 |    "version": "3.6.7"
870 |   }
871 |  },
872 |  "nbformat": 4,
873 |  "nbformat_minor": 2
874 | }
875 | 


--------------------------------------------------------------------------------
/notebooks/Books/Think Python/Chapter_4__Case_study_interface_design.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "## Chapter 4  Case study: interface design\n",
  8 |     "\n",
  9 |     "> This chapter presents a case study that demonstrates a process for designing functions that work together.\n",
 10 |     "\n",
 11 |     "\n",
 12 |     "\n",
 13 |     "* The turtle module\n",
 14 |     "* Simple repetition\n",
 15 |     "* Exercises\n",
 16 |     "* **Encapsulation**\n",
 17 |     "* **Generalization**\n",
 18 |     "* **Interface design**\n",
 19 |     "* **Refactoring**\n",
 20 |     "* **A development plan**\n",
 21 |     "* **docstring**\n",
 22 |     "* Debugging"
 23 |    ]
 24 |   },
 25 |   {
 26 |    "cell_type": "markdown",
 27 |    "metadata": {},
 28 |    "source": [
 29 |     "## 4.1 The turtle module"
 30 |    ]
 31 |   },
 32 |   {
 33 |    "cell_type": "code",
 34 |    "execution_count": 1,
 35 |    "metadata": {},
 36 |    "outputs": [
 37 |     {
 38 |      "data": {
 39 |       "text/plain": [
 40 |        "'0.23.4'"
 41 |       ]
 42 |      },
 43 |      "execution_count": 1,
 44 |      "metadata": {},
 45 |      "output_type": "execute_result"
 46 |     }
 47 |    ],
 48 |    "source": [
 49 |     "import pandas\n",
 50 |     "pandas.__version__"
 51 |    ]
 52 |   },
 53 |   {
 54 |    "cell_type": "code",
 55 |    "execution_count": 2,
 56 |    "metadata": {},
 57 |    "outputs": [],
 58 |    "source": [
 59 |     "import turtle"
 60 |    ]
 61 |   },
 62 |   {
 63 |    "cell_type": "markdown",
 64 |    "metadata": {},
 65 |    "source": [
 66 |     "> The turtle module (with a lowercase ’t’) provides a function called Turtle (with an uppercase ’T’) that creates a Turtle object, which we assign to a variable named bob. Printing bob displays something like:"
 67 |    ]
 68 |   },
 69 |   {
 70 |    "cell_type": "code",
 71 |    "execution_count": null,
 72 |    "metadata": {
 73 |     "scrolled": true
 74 |    },
 75 |    "outputs": [
 76 |     {
 77 |      "name": "stdout",
 78 |      "output_type": "stream",
 79 |      "text": [
 80 |       "<turtle.Turtle object at 0x7febbee7b898>\n"
 81 |      ]
 82 |     }
 83 |    ],
 84 |    "source": [
 85 |     "# mypolygon.py\n",
 86 |     "import turtle\n",
 87 |     "bob = turtle.Turtle()\n",
 88 |     "print(bob)\n",
 89 |     "turtle.mainloop()\n",
 90 |     "\n",
 91 |     "import os\n",
 92 |     "os._exit(00)"
 93 |    ]
 94 |   },
 95 |   {
 96 |    "cell_type": "code",
 97 |    "execution_count": null,
 98 |    "metadata": {},
 99 |    "outputs": [],
100 |    "source": [
101 |     "# draw a right angle\n",
102 |     "import turtle\n",
103 |     "bob = turtle.Turtle()\n",
104 |     "bob.fd(100)\n",
105 |     "bob.lt(90)\n",
106 |     "bob.fd(100)\n",
107 |     "turtle.mainloop()\n",
108 |     "\n",
109 |     "import os\n",
110 |     "os._exit(00)"
111 |    ]
112 |   },
113 |   {
114 |    "cell_type": "code",
115 |    "execution_count": null,
116 |    "metadata": {},
117 |    "outputs": [],
118 |    "source": [
119 |     "# A method is similar to a function, but it uses slightly different syntax. \n",
120 |     "import turtle\n",
121 |     "bob = turtle.Turtle()\n",
122 |     "bob.fd(100)\n",
123 |     "bob.lt(90)\n",
124 |     "bob.fd(100)\n",
125 |     "turtle.mainloop()\n",
126 |     "\n",
127 |     "import os\n",
128 |     "os._exit(00)"
129 |    ]
130 |   },
131 |   {
132 |    "cell_type": "markdown",
133 |    "metadata": {},
134 |    "source": [
135 |     "* A **function** is a piece of code that is called by name. It can be passed data to operate on (i.e. the parameters) and can optionally return data (the return value). All data that is passed to a function is explicitly passed.<br/><br/>\n",
136 |     "\n",
137 |     "* A **method** is a piece of code that is called by a name **that is associated with an object**. In most respects it is identical to a function except for two key differences:\n",
138 |     "    * A method is implicitly passed the object on which it was called.\n",
139 |     "    * A method is able to operate on data that is contained within the class."
140 |    ]
141 |   },
142 |   {
143 |    "cell_type": "markdown",
144 |    "metadata": {},
145 |    "source": [
146 |     "## 4.2  Simple repetition"
147 |    ]
148 |   },
149 |   {
150 |    "cell_type": "code",
151 |    "execution_count": null,
152 |    "metadata": {
153 |     "scrolled": true
154 |    },
155 |    "outputs": [],
156 |    "source": [
157 |     "# square\n",
158 |     "import turtle\n",
159 |     "bob3 = turtle.Turtle()\n",
160 |     "\n",
161 |     "bob3.fd(100)\n",
162 |     "bob3.lt(90)\n",
163 |     "\n",
164 |     "bob3.fd(100)\n",
165 |     "bob3.lt(90)\n",
166 |     "\n",
167 |     "bob3.fd(100)\n",
168 |     "bob3.lt(90)\n",
169 |     "\n",
170 |     "bob3.fd(100)\n",
171 |     "\n",
172 |     "turtle.mainloop()\n",
173 |     "\n",
174 |     "import os\n",
175 |     "os._exit(00)"
176 |    ]
177 |   },
178 |   {
179 |    "cell_type": "markdown",
180 |    "metadata": {},
181 |    "source": [
182 |     "> A for statement is also called a loop because the flow of execution runs through the body and then loops back to the top"
183 |    ]
184 |   },
185 |   {
186 |    "cell_type": "code",
187 |    "execution_count": 1,
188 |    "metadata": {},
189 |    "outputs": [
190 |     {
191 |      "name": "stdout",
192 |      "output_type": "stream",
193 |      "text": [
194 |       "Hello!\n",
195 |       "Hello!\n",
196 |       "Hello!\n",
197 |       "Hello!\n"
198 |      ]
199 |     }
200 |    ],
201 |    "source": [
202 |     "for i in range(4):\n",
203 |     "    print('Hello!')"
204 |    ]
205 |   },
206 |   {
207 |    "cell_type": "code",
208 |    "execution_count": null,
209 |    "metadata": {},
210 |    "outputs": [],
211 |    "source": [
212 |     "# square \n",
213 |     "import turtle\n",
214 |     "bob = turtle.Turtle()\n",
215 |     "for i in range(4):\n",
216 |     "    bob.fd(100)\n",
217 |     "    bob.lt(90)\n",
218 |     "\n",
219 |     "turtle.done()\n",
220 |     "\n",
221 |     "import os\n",
222 |     "os._exit(00)"
223 |    ]
224 |   },
225 |   {
226 |    "cell_type": "markdown",
227 |    "metadata": {},
228 |    "source": [
229 |     "Did you notice the difference between the two programs?\n",
230 |     "\n",
231 |     "**The art of cognitive blindspots | Kyle Eschen**\n",
232 |     "\n",
233 |     "https://www.youtube.com/watch?reload=9&v=OOG65rSM5fA"
234 |    ]
235 |   },
236 |   {
237 |    "cell_type": "markdown",
238 |    "metadata": {},
239 |    "source": [
240 |     "## 4.3  Exercises\n",
241 |     "\n",
242 |     "1. Write a function called square that takes a parameter named t, which is a turtle. It should use the turtle to draw a square.\n",
243 |     "Write a function call that passes bob as an argument to square, and then run the program again.<br /><br />\n",
244 |     "\n",
245 |     "2. Add another parameter, named length, to square. Modify the body so length of the sides is length, and then modify the function call to provide a second argument. Run the program again. Test your program with a range of values for length.<br /><br />\n",
246 |     "\n",
247 |     "3. Make a copy of square and change the name to polygon. Add another parameter named n and modify the body so it draws an n-sided regular polygon. Hint: The exterior angles of an n-sided regular polygon are 360/n degrees.<br /><br />\n",
248 |     "\n",
249 |     "4. Write a function called circle that takes a turtle, t, and radius, r, as parameters and that draws an approximate circle by calling polygon with an appropriate length and number of sides. Test your function with a range of values of r.\n",
250 |     "Hint: figure out the circumference of the circle and make sure that length * n = circumference.<br /><br />\n",
251 |     "\n",
252 |     "5. Make a more general version of circle called arc that takes an additional parameter angle, which determines what fraction of a circle to draw. angle is in units of degrees, so when angle=360, arc should draw a complete circle."
253 |    ]
254 |   },
255 |   {
256 |    "cell_type": "markdown",
257 |    "metadata": {},
258 |    "source": [
259 |     "## 4.4  Encapsulation\n",
260 |     "\n",
261 |     "> Wrapping a piece of code up in a function is called encapsulation. \n",
262 |     "\n",
263 |     "The major advantages: \n",
264 |     "* code re-use\n",
265 |     "* shorter programs (it is more concise to call a function twice than to copy and paste the body)"
266 |    ]
267 |   },
268 |   {
269 |    "cell_type": "code",
270 |    "execution_count": null,
271 |    "metadata": {},
272 |    "outputs": [],
273 |    "source": [
274 |     "# square \n",
275 |     "import turtle\n",
276 |     "\n",
277 |     "def square(t):\n",
278 |     "    for i in range(4):\n",
279 |     "        t.fd(100)\n",
280 |     "        t.lt(90)\n",
281 |     "\n",
282 |     "bob = turtle.Turtle()\n",
283 |     "square(bob)\n",
284 |     "turtle.done()\n",
285 |     "\n",
286 |     "import os\n",
287 |     "os._exit(00)"
288 |    ]
289 |   },
290 |   {
291 |    "cell_type": "markdown",
292 |    "metadata": {},
293 |    "source": [
294 |     "> The innermost statements, fd and lt are indented twice to show that they are inside the for loop, which is inside the function definition. The next line, square(bob), is flush with the left margin, which indicates the end of both the for loop and the function definition."
295 |    ]
296 |   },
297 |   {
298 |    "cell_type": "markdown",
299 |    "metadata": {},
300 |    "source": [
301 |     "> Inside the function, t refers to the same turtle bob, so t.lt(90) has the same effect as bob.lt(90). In that case, why not call the parameter bob? "
302 |    ]
303 |   },
304 |   {
305 |    "cell_type": "markdown",
306 |    "metadata": {},
307 |    "source": [
308 |     "## 4.5  Generalization\n",
309 |     "\n",
310 |     "> Adding a parameter to a function is called generalization because it makes the function more general: in the previous version, the square is always the same size; in this version it can be any size."
311 |    ]
312 |   },
313 |   {
314 |    "cell_type": "code",
315 |    "execution_count": null,
316 |    "metadata": {},
317 |    "outputs": [],
318 |    "source": [
319 |     "# add a length parameter to square.  \n",
320 |     "import turtle\n",
321 |     "\n",
322 |     "def square(t, length):\n",
323 |     "    for i in range(4):\n",
324 |     "        t.fd(length)\n",
325 |     "        t.lt(90)\n",
326 |     "\n",
327 |     "\n",
328 |     "\n",
329 |     "bob = turtle.Turtle()\n",
330 |     "square(bob, 100)\n",
331 |     "\n",
332 |     "turtle.done()\n",
333 |     "\n",
334 |     "import os\n",
335 |     "os._exit(00)"
336 |    ]
337 |   },
338 |   {
339 |    "cell_type": "code",
340 |    "execution_count": null,
341 |    "metadata": {},
342 |    "outputs": [],
343 |    "source": [
344 |     "# Instead of drawing squares, polygon draws regular polygons with any number of sides.\n",
345 |     "import turtle\n",
346 |     "\n",
347 |     "def polygon(t, n, length):\n",
348 |     "    angle = 360 / n\n",
349 |     "    for i in range(n):\n",
350 |     "        t.fd(length)\n",
351 |     "        t.lt(angle)\n",
352 |     "\n",
353 |     "bob = turtle.Turtle()\n",
354 |     "polygon(bob, 21, 70)\n",
355 |     "\n",
356 |     "turtle.done()\n",
357 |     "\n",
358 |     "import os\n",
359 |     "os._exit(00)"
360 |    ]
361 |   },
362 |   {
363 |    "cell_type": "markdown",
364 |    "metadata": {},
365 |    "source": [
366 |     "> When a function has more than a few numeric arguments, it is easy to forget what they are, or what order they should be in. In that case it is often a good idea to include the names of the parameters in the argument list:\n",
367 |     "\n",
368 |     "```python\n",
369 |     "polygon(bob, n=7, length=70)\n",
370 |     "```\n",
371 |     "> These are called keyword arguments because they include the parameter names as “keywords” (not to be confused with Python keywords like while and def)."
372 |    ]
373 |   },
374 |   {
375 |    "cell_type": "markdown",
376 |    "metadata": {},
377 |    "source": [
378 |     "## 4.6 Interface design\n",
379 |     "\n",
380 |     "> The interface of a function is a summary of how it is used: \n",
381 |     "\n",
382 |     "* what are the parameters? \n",
383 |     "* What does the function do? \n",
384 |     "* And what is the return value? \n",
385 |     "\n",
386 |     "> An interface is “clean” if it allows the caller to do what they want without dealing with unnecessary details.\n",
387 |     "\n"
388 |    ]
389 |   },
390 |   {
391 |    "cell_type": "code",
392 |    "execution_count": null,
393 |    "metadata": {},
394 |    "outputs": [],
395 |    "source": [
396 |     "# The next step is to write circle, which takes a radius, r, as a parameter. \n",
397 |     "import turtle\n",
398 |     "import math\n",
399 |     "\n",
400 |     "def polygon(t, n, length):\n",
401 |     "    angle = 360 / n\n",
402 |     "    for i in range(n):\n",
403 |     "        t.fd(length)\n",
404 |     "        t.lt(angle)\n",
405 |     "\n",
406 |     "def circle(t, r):\n",
407 |     "    circumference = 2 * math.pi * r\n",
408 |     "    n = 50\n",
409 |     "    length = circumference / n\n",
410 |     "    polygon(t, n, length)\n",
411 |     "\n",
412 |     "bob = turtle.Turtle()\n",
413 |     "circle(bob, 75)\n",
414 |     "\n",
415 |     "turtle.done()\n",
416 |     "\n",
417 |     "import os\n",
418 |     "os._exit(00)"
419 |    ]
420 |   },
421 |   {
422 |    "cell_type": "code",
423 |    "execution_count": null,
424 |    "metadata": {},
425 |    "outputs": [],
426 |    "source": [
427 |     "# One limitation of the previous solution is that n is a constant; here n scales with the circumference.\n",
428 |     "import turtle\n",
429 |     "import math\n",
430 |     "\n",
431 |     "def polygon(t, n, length):\n",
432 |     "    angle = 360 / n\n",
433 |     "    for i in range(n):\n",
434 |     "        t.fd(length)\n",
435 |     "        t.lt(angle)\n",
436 |     "\n",
437 |     "def circle(t, r):\n",
438 |     "    circumference = 2 * math.pi * r\n",
439 |     "    n = int(circumference / 3) + 3\n",
440 |     "    length = circumference / n\n",
441 |     "    polygon(t, n, length)\n",
442 |     "\n",
443 |     "bob = turtle.Turtle()\n",
444 |     "circle(bob, 75)\n",
445 |     "\n",
446 |     "turtle.done()\n",
447 |     "\n",
448 |     "import os\n",
449 |     "os._exit(00)"
450 |    ]
451 |   },
452 |   {
453 |    "cell_type": "markdown",
454 |    "metadata": {},
455 |    "source": [
456 |     "## 4.7  Refactoring\n",
457 |     "\n",
458 |     "> This process—rearranging a program to improve interfaces and facilitate code re-use—is called refactoring. In this case, we noticed that there was similar code in arc and polygon, so we “factored it out” into polyline."
459 |    ]
460 |   },
461 |   {
462 |    "cell_type": "code",
463 |    "execution_count": null,
464 |    "metadata": {
465 |     "scrolled": true
466 |    },
467 |    "outputs": [],
468 |    "source": [
469 |     "# start from a copy of polygon and transform it into arc\n",
470 |     "import turtle\n",
471 |     "import math\n",
472 |     "\n",
473 |     "def arc(t, r, angle):\n",
474 |     "    arc_length = 2 * math.pi * r * angle / 360\n",
475 |     "    n = int(arc_length / 3) + 1\n",
476 |     "    step_length = arc_length / n\n",
477 |     "    step_angle = angle / n\n",
478 |     "    \n",
479 |     "    for i in range(n):\n",
480 |     "        t.fd(step_length)\n",
481 |     "        t.lt(step_angle)\n",
482 |     "\n",
483 |     "bob = turtle.Turtle()\n",
484 |     "arc(bob, 100, 180)\n",
485 |     "\n",
486 |     "turtle.done()\n",
487 |     "\n",
488 |     "import os\n",
489 |     "os._exit(00)"
490 |    ]
491 |   },
492 |   {
493 |    "cell_type": "code",
494 |    "execution_count": null,
495 |    "metadata": {},
496 |    "outputs": [],
497 |    "source": [
498 |     "# general function polyline\n",
499 |     "# rewrite polygon and arc to use polyline\n",
500 |     "\n",
501 |     "import turtle\n",
502 |     "import math\n",
503 |     "\n",
504 |     "def polyline(t, n, length, angle):\n",
505 |     "    for i in range(n):\n",
506 |     "        t.fd(length)\n",
507 |     "        t.lt(angle)\n",
508 |     "\n",
509 |     "def polygon(t, n, length):\n",
510 |     "    angle = 360.0 / n\n",
511 |     "    polyline(t, n, length, angle)\n",
512 |     "\n",
513 |     "def arc(t, r, angle):\n",
514 |     "    arc_length = 2 * math.pi * r * angle / 360\n",
515 |     "    n = int(arc_length / 3) + 1\n",
516 |     "    step_length = arc_length / n\n",
517 |     "    step_angle = float(angle) / n\n",
518 |     "    polyline(t, n, step_length, step_angle)\n",
519 |     "    \n",
520 |     "def circle(t, r):\n",
521 |     "    arc(t, r, 360)\n",
522 |     "\n",
523 |     "bob = turtle.Turtle()\n",
524 |     "arc(bob, 100, 180)\n",
525 |     "\n",
526 |     "turtle.done()\n",
527 |     "\n",
528 |     "import os\n",
529 |     "os._exit(00)"
530 |    ]
531 |   },
532 |   {
533 |    "cell_type": "markdown",
534 |    "metadata": {},
535 |    "source": [
536 |     "## 4.8  A development plan\n",
537 |     "\n",
538 |     "1. Start by writing a small program with no function definitions. <br/><br/>\n",
539 |     "2. Once you get the program working, identify a coherent piece of it, encapsulate the piece in a function and give it a name.<br/><br/>\n",
540 |     "3. Generalize the function by adding appropriate parameters.<br/><br/>\n",
541 |     "4. Repeat steps 1–3 until you have a set of working functions. Copy and paste working code to avoid retyping (and re-debugging).<br/><br/>\n",
542 |     "5. Look for opportunities to improve the program by refactoring. For example, if you have similar code in several places, consider factoring it into an appropriately general function.<br/><br/>\n"
543 |    ]
544 |   },
545 |   {
546 |    "cell_type": "markdown",
547 |    "metadata": {},
548 |    "source": [
549 |     "## 4.9  docstring\n",
550 |     "\n",
551 |     "> A docstring is a string at the beginning of a function that explains the interface (“doc” is short for “documentation”)."
552 |    ]
553 |   },
554 |   {
555 |    "cell_type": "code",
556 |    "execution_count": 1,
557 |    "metadata": {},
558 |    "outputs": [
559 |     {
560 |      "name": "stdout",
561 |      "output_type": "stream",
562 |      "text": [
563 |       "polyline\n",
564 |       "square\n"
565 |      ]
566 |     }
567 |    ],
568 |    "source": [
569 |     "import turtle\n",
570 |     "\n",
571 |     "def polyline():\n",
572 |     "    \"\"\"Draws n line segments with the given length and\n",
573 |     "    angle (in degrees) between them.  t is a turtle.\n",
574 |     "    \"\"\"  \n",
575 |     "    print('polyline')\n",
576 |     "    #for i in range(n):\n",
577 |     "    #    t.fd(length)\n",
578 |     "    #    t.lt(angle)\n",
579 |     "    \n",
580 |     "def square():\n",
581 |     "    print('square')\n",
582 |     "    \n",
583 |     "polyline()       \n",
584 |     "square()"
585 |    ]
586 |   },
587 |   {
588 |    "cell_type": "markdown",
589 |    "metadata": {},
590 |    "source": [
591 |     "## 4.10  Debugging\n",
592 |     "\n",
593 |     "> If the preconditions are satisfied and the postconditions are not, the bug is in the function. If your pre- and postconditions are clear, they can help with debugging."
594 |    ]
595 |   }
596 |  ],
597 |  "metadata": {
598 |   "kernelspec": {
599 |    "display_name": "Python 3",
600 |    "language": "python",
601 |    "name": "python3"
602 |   },
603 |   "language_info": {
604 |    "codemirror_mode": {
605 |     "name": "ipython",
606 |     "version": 3
607 |    },
608 |    "file_extension": ".py",
609 |    "mimetype": "text/x-python",
610 |    "name": "python",
611 |    "nbconvert_exporter": "python",
612 |    "pygments_lexer": "ipython3",
613 |    "version": "3.6.8"
614 |   }
615 |  },
616 |  "nbformat": 4,
617 |  "nbformat_minor": 2
618 | }
619 | 


--------------------------------------------------------------------------------
/notebooks/Books/Think Python/ch7_debug.py:
--------------------------------------------------------------------------------
 1 | def pascal(n):
 2 |     row_map = {0: {n: 1}}
 3 | 
 4 |     for i in range(1, n):
 5 |         row_list = {}
 6 | 
 7 |         prev_row = row_map[i - 1]
 8 |         for k, v in row_map[i - 1].items():
 9 | 
10 |             if k + 1 in row_list:
11 |                 row_list[k + 1] = row_list[k + 1]
12 |             else:
13 |                 row_list[k + 1] = prev_row.get(k, 0) + prev_row.get(k + 2, 0)
14 | 
15 |             if k - 1 in row_list:
16 |                 row_list[k - 1] = row_list[k - 1]
17 |             else:
18 |                 row_list[k - 1] = prev_row.get(k, 0) + prev_row.get(k - 2, 0)
19 | 
20 |         row_map[i] = row_list
21 | 
22 |     for k, v in row_map.items():
23 |         print(f'k: {k}, v: {v}')
24 | 
25 |     for k, v in row_map.items():
26 |         # print(f'k: {k}, v: {v}')
27 |         count = 0
28 |         for kk, vv in sorted(v.items()):
29 |             count = count + 1
30 |             if count == 1:
31 |                 print('   ' * kk + f'{vv:3}', end='')
32 |             else:
33 |                 print('   ' + f'{vv:3}', end='')
34 |         print()
35 | 
36 | 
37 | def fibMemo(i, memo=None):
38 |     memo = {} if memo is None else memo  # one cache shared by all recursive calls
39 |     if i in memo:
40 |         return memo[i]
41 |     if i <= 2:
42 |         return 1
43 |     else:
44 |         f = fibMemo(i - 1, memo) + fibMemo(i - 2, memo)
45 |         memo[i] = f
46 |     # print("calc", i, memo)
47 |     return f
48 | 
49 | 
50 | x = fibMemo(4)
51 | print(x)
52 | pascal(x)
53 | 


--------------------------------------------------------------------------------
/notebooks/Books/Think Python/strings_in_python.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/softhints/python/a256a054d74ca397f41874b3e26f1c4b84214432/notebooks/Books/Think Python/strings_in_python.png


--------------------------------------------------------------------------------
/notebooks/IPython tricks 2019.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "# IPython/Jupyter Notebook tricks for advanced users in 2019"
  8 |    ]
  9 |   },
 10 |   {
 11 |    "cell_type": "markdown",
 12 |    "metadata": {},
 13 |    "source": [
 14 |     "## Suppress output in IPython Notebook "
 15 |    ]
 16 |   },
 17 |   {
 18 |    "cell_type": "markdown",
 19 |    "metadata": {},
 20 |    "source": [
 21 |     "Simple expression: a trailing semicolon hides the displayed result"
 22 |    ]
 23 |   },
 24 |   {
 25 |    "cell_type": "code",
 26 |    "execution_count": 1,
 27 |    "metadata": {},
 28 |    "outputs": [
 29 |     {
 30 |      "data": {
 31 |       "text/plain": [
 32 |        "4"
 33 |       ]
 34 |      },
 35 |      "execution_count": 1,
 36 |      "metadata": {},
 37 |      "output_type": "execute_result"
 38 |     }
 39 |    ],
 40 |    "source": [
 41 |     "2*2"
 42 |    ]
 43 |   },
 44 |   {
 45 |    "cell_type": "code",
 46 |    "execution_count": 2,
 47 |    "metadata": {},
 48 |    "outputs": [],
 49 |    "source": [
 50 |     "2*2;"
 51 |    ]
 52 |   },
 53 |   {
 54 |    "cell_type": "markdown",
 55 |    "metadata": {},
 56 |    "source": [
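 57 |     "Function that prints: a trailing semicolon does not hide printed output, but the `%%capture` cell magic does"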
 58 |    ]
 59 |   },
 60 |   {
 61 |    "cell_type": "code",
 62 |    "execution_count": 4,
 63 |    "metadata": {},
 64 |    "outputs": [
 65 |     {
 66 |      "name": "stdout",
 67 |      "output_type": "stream",
 68 |      "text": [
 69 |       "Private Message\n"
 70 |      ]
 71 |     }
 72 |    ],
 73 |    "source": [
 74 |     "def myfunc():\n",
 75 |     "    print('Private Message')\n",
 76 |     "myfunc();"
 77 |    ]
 78 |   },
 79 |   {
 80 |    "cell_type": "code",
 81 |    "execution_count": 5,
 82 |    "metadata": {},
 83 |    "outputs": [],
 84 |    "source": [
 85 |     "%%capture\n",
 86 |     "def myfunc():\n",
 87 |     "    print('Private Message')\n",
 88 |     "myfunc()"
 89 |    ]
 90 |   },
 91 |   {
 92 |    "cell_type": "markdown",
 93 |    "metadata": {},
 94 |    "source": [
 95 |     "Function that prints, variant 2: capture the output with `io.capture_output()`"
 96 |    ]
 97 |   },
 98 |   {
 99 |    "cell_type": "code",
100 |    "execution_count": 6,
101 |    "metadata": {},
102 |    "outputs": [
103 |     {
104 |      "name": "stdout",
105 |      "output_type": "stream",
106 |      "text": [
107 |       "Private Message\n"
108 |      ]
109 |     }
110 |    ],
111 |    "source": [
112 |     "def myfunc():\n",
113 |     "    print('Private Message')\n",
114 |     "    \n",
115 |     "myfunc()"
116 |    ]
117 |   },
118 |   {
119 |    "cell_type": "code",
120 |    "execution_count": 7,
121 |    "metadata": {},
122 |    "outputs": [],
123 |    "source": [
124 |     "from IPython.utils import io\n",
125 |     "\n",
126 |     "def myfunc():\n",
127 |     "    print('Private Message')\n",
128 |     "\n",
129 |     "with io.capture_output() as captured:\n",
130 |     "    myfunc()"
131 |    ]
132 |   },
133 |   {
134 |    "cell_type": "markdown",
135 |    "metadata": {},
136 |    "source": [
137 |     "## Get function docs and arguments in IPython Notebook"
138 |    ]
139 |   },
140 |   {
141 |    "cell_type": "code",
142 |    "execution_count": 8,
143 |    "metadata": {},
144 |    "outputs": [],
145 |    "source": [
146 |     "import numpy\n",
147 |     "table_list = [1,2,3,4,4]\n",
148 |     "l = numpy.array_split(table_list, len(table_list) // 4)"
149 |    ]
150 |   },
151 |   {
152 |    "cell_type": "code",
153 |    "execution_count": 9,
154 |    "metadata": {},
155 |    "outputs": [],
156 |    "source": [
157 |     "?"
158 |    ]
159 |   },
160 |   {
161 |    "cell_type": "code",
162 |    "execution_count": null,
163 |    "metadata": {},
164 |    "outputs": [],
165 |    "source": [
166 |     "? numpy.array_split"
167 |    ]
168 |   },
169 |   {
170 |    "cell_type": "markdown",
171 |    "metadata": {},
172 |    "source": [
173 |     "## Change theme in IPython Notebook\n",
174 |     "\n",
175 |     "Install the module with:\n",
176 |     "\n",
177 |     "`pip install jupyterthemes`\n",
178 |     "\n",
179 |     "Apply a theme:\n",
180 |     "\n",
181 |     "`jt -t chesterish`\n",
182 |     "\n",
183 |     "Restore the default theme:\n",
184 |     "\n",
185 |     "`jt -r`\n",
186 |     "\n",
187 |     "This can also be done from inside a Jupyter notebook:\n",
188 |     "\n",
189 |     "`!jt -r`"
190 |    ]
191 |   },
192 |   {
193 |    "cell_type": "code",
194 |    "execution_count": null,
195 |    "metadata": {},
196 |    "outputs": [],
197 |    "source": [
198 |     "!jt -r"
199 |    ]
200 |   },
201 |   {
202 |    "cell_type": "code",
203 |    "execution_count": null,
204 |    "metadata": {},
205 |    "outputs": [],
206 |    "source": [
207 |     "!jt -t chesterish"
208 |    ]
209 |   },
210 |   {
211 |    "cell_type": "markdown",
212 |    "metadata": {},
213 |    "source": [
214 |     "## Bonus: some useful Jupyter notebook shell commands (run with `!`)"
215 |    ]
216 |   },
217 |   {
218 |    "cell_type": "code",
219 |    "execution_count": null,
220 |    "metadata": {},
221 |    "outputs": [],
222 |    "source": [
223 |     "!jupyter kernelspec list"
224 |    ]
225 |   },
226 |   {
227 |    "cell_type": "code",
228 |    "execution_count": null,
229 |    "metadata": {},
230 |    "outputs": [],
231 |    "source": [
232 |     "import numpy\n",
233 |     "print(numpy.__path__)"
234 |    ]
235 |   },
236 |   {
237 |    "cell_type": "code",
238 |    "execution_count": 18,
239 |    "metadata": {},
240 |    "outputs": [
241 |     {
242 |      "name": "stdout",
243 |      "output_type": "stream",
244 |      "text": [
245 |       "Python 3.6.7\r\n"
246 |      ]
247 |     }
248 |    ],
249 |    "source": [
250 |     "!python -V"
251 |    ]
252 |   },
253 |   {
254 |    "cell_type": "code",
255 |    "execution_count": null,
256 |    "metadata": {},
257 |    "outputs": [],
258 |    "source": [
259 |     "!which python"
260 |    ]
261 |   },
262 |   {
263 |    "cell_type": "code",
264 |    "execution_count": 17,
265 |    "metadata": {},
266 |    "outputs": [
267 |     {
268 |      "name": "stdout",
269 |      "output_type": "stream",
270 |      "text": [
271 |       "appdirs==1.4.3\r\n",
272 |       "asn1crypto==0.24.0\r\n",
273 |       "atomicwrites==1.2.1\r\n",
274 |       "attrs==18.2.0\r\n",
275 |       "backcall==0.1.0\r\n",
276 |       "black==18.9b0\r\n",
277 |       "bleach==3.0.2\r\n",
278 |       "boto==2.49.0\r\n",
279 |       "boto3==1.9.67\r\n",
280 |       "botocore==1.12.67\r\n",
281 |       "camelot-py==0.7.1\r\n",
282 |       "certifi==2018.8.24\r\n",
283 |       "cffi==1.11.5\r\n",
284 |       "chardet==3.0.4\r\n",
285 |       "Click==7.0\r\n",
286 |       "cryptography==2.3.1\r\n",
287 |       "cycler==0.10.0\r\n",
288 |       "decorator==4.3.0\r\n",
289 |       "defusedxml==0.5.0\r\n",
290 |       "distro==1.3.0\r\n",
291 |       "docutils==0.14\r\n",
292 |       "entrypoints==0.2.3\r\n",
293 |       "et-xmlfile==1.0.1\r\n",
294 |       "filelock==3.0.10\r\n",
295 |       "idna==2.7\r\n",
296 |       "ipykernel==5.1.0\r\n",
297 |       "ipython==7.2.0\r\n",
298 |       "ipython-genutils==0.2.0\r\n",
299 |       "ipywidgets==7.4.2\r\n",
300 |       "jdcal==1.4\r\n",
301 |       "jedi==0.13.1\r\n",
302 |       "Jinja2==2.10\r\n",
303 |       "jira==2.0.0\r\n",
304 |       "jmespath==0.9.3\r\n",
305 |       "jsonref==0.2\r\n",
306 |       "jsonschema==2.6.0\r\n",
307 |       "jupyter==1.0.0\r\n",
308 |       "jupyter-client==5.2.3\r\n",
309 |       "jupyter-console==6.0.0\r\n",
310 |       "jupyter-core==4.4.0\r\n",
311 |       "jupyterthemes==0.20.0\r\n",
312 |       "kiwisolver==1.0.1\r\n",
313 |       "lesscpy==0.13.0\r\n",
314 |       "lxml==4.3.0\r\n",
315 |       "MarkupSafe==1.1.0\r\n",
316 |       "matplotlib==3.0.0\r\n",
317 |       "mistune==0.8.4\r\n",
318 |       "more-itertools==5.0.0\r\n",
319 |       "nbconvert==5.4.0\r\n",
320 |       "nbformat==4.4.0\r\n",
321 |       "notebook==5.7.2\r\n",
322 |       "numpy==1.15.1\r\n",
323 |       "oauthlib==2.1.0\r\n",
324 |       "opencv-python==4.0.0.21\r\n",
325 |       "openpyxl==2.5.14\r\n",
326 |       "packaging==16.8\r\n",
327 |       "pandas==0.23.4\r\n",
328 |       "pandocfilters==1.4.2\r\n",
329 |       "parso==0.3.1\r\n",
330 |       "pbr==4.2.0\r\n",
331 |       "pdfminer.six==20181108\r\n",
332 |       "pexpect==4.6.0\r\n",
333 |       "pickleshare==0.7.5\r\n",
334 |       "Pillow==5.2.0\r\n",
335 |       "pkg-resources==0.0.0\r\n",
336 |       "pluggy==0.8.1\r\n",
337 |       "ply==3.11\r\n",
338 |       "prometheus-client==0.4.2\r\n",
339 |       "prompt-toolkit==2.0.7\r\n",
340 |       "ptyprocess==0.6.0\r\n",
341 |       "py==1.7.0\r\n",
342 |       "py-spy==0.1.8\r\n",
343 |       "pycodestyle==2.3.1\r\n",
344 |       "pycparser==2.18\r\n",
345 |       "pycryptodome==3.7.3\r\n",
346 |       "Pygments==2.3.0\r\n",
347 |       "PyJWT==1.6.4\r\n",
348 |       "PyMySQL==0.9.2\r\n",
349 |       "pyparsing==2.2.0\r\n",
350 |       "PyPDF2==1.26.0\r\n",
351 |       "pytesseract==0.2.4\r\n",
352 |       "pytest==4.1.1\r\n",
353 |       "python-dateutil==2.7.3\r\n",
354 |       "pytz==2018.5\r\n",
355 |       "pyzmq==17.1.2\r\n",
356 |       "qtconsole==4.4.3\r\n",
357 |       "requests==2.19.1\r\n",
358 |       "requests-oauthlib==1.0.0\r\n",
359 |       "requests-toolbelt==0.8.0\r\n",
360 |       "retrying==1.3.3\r\n",
361 |       "s3transfer==0.1.13\r\n",
362 |       "scrapinghub==2.0.3\r\n",
363 |       "selenium==3.14.0\r\n",
364 |       "Send2Trash==1.5.0\r\n",
365 |       "simplejson==3.10.0\r\n",
366 |       "six==1.10.0\r\n",
367 |       "sortedcontainers==2.1.0\r\n",
368 |       "style==1.1.0\r\n",
369 |       "tabula-py==1.3.1\r\n",
370 |       "tabulate==0.8.2\r\n",
371 |       "terminado==0.8.1\r\n",
372 |       "testpath==0.4.2\r\n",
373 |       "toml==0.10.0\r\n",
374 |       "tornado==5.1.1\r\n",
375 |       "tox==3.7.0\r\n",
376 |       "traitlets==4.3.2\r\n",
377 |       "update==0.0.1\r\n",
378 |       "urllib3==1.23\r\n",
379 |       "virtualenv==16.3.0\r\n",
380 |       "Wand==0.4.4\r\n",
381 |       "wcwidth==0.1.7\r\n",
382 |       "webencodings==0.5.1\r\n",
383 |       "widgetsnbextension==3.4.2\r\n"
384 |      ]
385 |     }
386 |    ],
387 |    "source": [
388 |     "!pip freeze"
389 |    ]
390 |   },
391 |   {
392 |    "cell_type": "code",
393 |    "execution_count": null,
394 |    "metadata": {},
395 |    "outputs": [],
396 |    "source": [
397 |     "!echo $PATH  "
398 |    ]
399 |   },
400 |   {
401 |    "cell_type": "markdown",
402 |    "metadata": {},
403 |    "source": [
404 |     "## Bonus 2: Top 10 most useful IPython keyboard shortcuts"
405 |    ]
406 |   },
407 |   {
408 |    "cell_type": "markdown",
409 |    "metadata": {},
410 |    "source": [
411 |     "* <kbd>Shift</kbd> + <kbd>Enter</kbd> - run cell\n",
412 |     "* <kbd>Alt</kbd> + <kbd>Enter</kbd> - run cell, insert below\n",
413 |     "* <kbd>Ctrl</kbd> + <kbd>m</kbd>, <kbd>c</kbd> - copy cell\n",
414 |     "* <kbd>Ctrl</kbd> + <kbd>m</kbd>, <kbd>v</kbd> - paste cell\n",
415 |     "* <kbd>Ctrl</kbd> + <kbd>m</kbd>, <kbd>l</kbd> - toggle line numbers\n",
416 |     "* <kbd>Ctrl</kbd> + <kbd>m</kbd>, <kbd>j</kbd> - move cell\n",
417 |     "* <kbd>Ctrl</kbd> + <kbd>m</kbd>, <kbd>y</kbd> - code cell\n",
418 |     "* <kbd>Ctrl</kbd> + <kbd>m</kbd>, <kbd>m</kbd> - markdown cell\n",
419 |     "* <kbd>Ctrl</kbd> + <kbd>m</kbd>, <kbd>.</kbd> - restart kernel\n",
420 |     "* <kbd>Ctrl</kbd> + <kbd>m</kbd>, <kbd>h</kbd> - show keyboard shortcuts"
421 |    ]
422 |   },
423 |   {
424 |    "cell_type": "code",
425 |    "execution_count": 21,
426 |    "metadata": {},
427 |    "outputs": [
428 |     {
429 |      "data": {
430 |       "text/plain": [
431 |        "2"
432 |       ]
433 |      },
434 |      "execution_count": 21,
435 |      "metadata": {},
436 |      "output_type": "execute_result"
437 |     }
438 |    ],
439 |    "source": [
440 |     "1+1"
441 |    ]
442 |   },
443 |   {
444 |    "cell_type": "code",
445 |    "execution_count": null,
446 |    "metadata": {},
447 |    "outputs": [],
448 |    "source": []
449 |   },
450 |   {
451 |    "cell_type": "code",
452 |    "execution_count": 22,
453 |    "metadata": {},
454 |    "outputs": [
455 |     {
456 |      "data": {
457 |       "text/plain": [
458 |        "2"
459 |       ]
460 |      },
461 |      "execution_count": 22,
462 |      "metadata": {},
463 |      "output_type": "execute_result"
464 |     }
465 |    ],
466 |    "source": [
467 |     "1+1\n",
468 |     "## markdown"
469 |    ]
470 |   },
471 |   {
472 |    "cell_type": "code",
473 |    "execution_count": null,
474 |    "metadata": {},
475 |    "outputs": [],
476 |    "source": []
477 |   },
478 |   {
479 |    "cell_type": "code",
480 |    "execution_count": null,
481 |    "metadata": {},
482 |    "outputs": [],
483 |    "source": []
484 |   },
485 |   {
486 |    "cell_type": "code",
487 |    "execution_count": null,
488 |    "metadata": {},
489 |    "outputs": [],
490 |    "source": []
491 |   },
492 |   {
493 |    "cell_type": "code",
494 |    "execution_count": null,
495 |    "metadata": {},
496 |    "outputs": [],
497 |    "source": []
498 |   },
499 |   {
500 |    "cell_type": "code",
501 |    "execution_count": null,
502 |    "metadata": {},
503 |    "outputs": [],
504 |    "source": []
505 |   },
506 |   {
507 |    "cell_type": "code",
508 |    "execution_count": null,
509 |    "metadata": {},
510 |    "outputs": [],
511 |    "source": []
512 |   }
513 |  ],
514 |  "metadata": {
515 |   "kernelspec": {
516 |    "display_name": "Python 3",
517 |    "language": "python",
518 |    "name": "python3"
519 |   },
520 |   "language_info": {
521 |    "codemirror_mode": {
522 |     "name": "ipython",
523 |     "version": 3
524 |    },
525 |    "file_extension": ".py",
526 |    "mimetype": "text/x-python",
527 |    "name": "python",
528 |    "nbconvert_exporter": "python",
529 |    "pygments_lexer": "ipython3",
530 |    "version": "3.6.7"
531 |   }
532 |  },
533 |  "nbformat": 4,
534 |  "nbformat_minor": 2
535 | }
536 | 
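
A minimal sketch, not part of the notebook above: outside IPython, where `%%capture` and `IPython.utils.io.capture_output` are not available, printed output can be kept out of the console with the standard library's `contextlib.redirect_stdout`. The names `buffer` and `captured_text` are hypothetical; `myfunc` mirrors the notebook's example function.

```python
import contextlib
import io


def myfunc():
    print('Private Message')


# Everything printed inside the with-block goes into the buffer
# instead of the console or notebook output area.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    myfunc()

# The captured text stays available for later inspection.
captured_text = buffer.getvalue()
```

This only redirects text written through Python's `sys.stdout`; output from child processes started with `!` is not affected.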


--------------------------------------------------------------------------------
/notebooks/Image_validation_with_Python.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "## Image validation with Python\n",
  8 |     "\n",
  9 |     "* is the file a valid image\n",
 10 |     "    * check the file extension\n",
 11 |     "    * check the file with PIL\n",
 12 |     "* is the image blank\n",
 13 |     "* does the image contain a pattern\n",
 14 |     "\n",
 15 |     "#### possible future video:\n",
 16 |     "* multiple image validation\n",
 17 |     "* validating an image URL without download\n",
 18 |     "* search for an image within an image"
 19 |    ]
 20 |   },
 21 |   {
 22 |    "cell_type": "markdown",
 23 |    "metadata": {},
 24 |    "source": [
 25 |     "## is the file a valid image"
 26 |    ]
 27 |   },
 28 |   {
 29 |    "cell_type": "code",
 30 |    "execution_count": 6,
 31 |    "metadata": {},
 32 |    "outputs": [
 33 |     {
 34 |      "data": {
 35 |       "text/plain": [
 36 |        "False"
 37 |       ]
 38 |      },
 39 |      "execution_count": 6,
 40 |      "metadata": {},
 41 |      "output_type": "execute_result"
 42 |     }
 43 |    ],
 44 |    "source": [
 45 |     "# check file extension\n",
 46 |     "test_img = './csv/movie_metadata.csv'\n",
 47 |     "test_img.lower().endswith(('.png', '.jpg', '.jpeg'))"
 48 |    ]
 49 |   },
 50 |   {
 51 |    "cell_type": "code",
 52 |    "execution_count": 7,
 53 |    "metadata": {},
 54 |    "outputs": [
 55 |     {
 56 |      "data": {
 57 |       "text/plain": [
 58 |        "True"
 59 |       ]
 60 |      },
 61 |      "execution_count": 7,
 62 |      "metadata": {},
 63 |      "output_type": "execute_result"
 64 |     }
 65 |    ],
 66 |    "source": [
 67 |     "# check file extension\n",
 68 |     "test_img = './csv/Selection_001.png'\n",
 69 |     "test_img.lower().endswith(('.png', '.jpg', '.jpeg'))"
 70 |    ]
 71 |   },
 72 |   {
 73 |    "cell_type": "markdown",
 74 |    "metadata": {},
 75 |    "source": [
 76 |     "### check the file with PIL\n",
 77 |     "\n",
 78 |     "`pip install Pillow`"
 79 |    ]
 80 |   },
 81 |   {
 82 |    "cell_type": "code",
 83 |    "execution_count": 25,
 84 |    "metadata": {},
 85 |    "outputs": [],
 86 |    "source": [
 87 |     "from PIL import Image\n",
 88 |     "def is_jpg(filename):\n",
 89 |     "    # despite its name, this accepts both PNG and JPEG files\n",
 90 |     "    try:\n",
 91 |     "        with Image.open(filename) as i:\n",
 92 |     "            return i.format in ['PNG', 'JPEG']\n",
 93 |     "    except IOError:\n",
 94 |     "        return False"
 95 |    ]
 96 |   },
 97 |   {
 98 |    "cell_type": "code",
 99 |    "execution_count": 26,
100 |    "metadata": {},
101 |    "outputs": [
102 |     {
103 |      "data": {
104 |       "text/plain": [
105 |        "False"
106 |       ]
107 |      },
108 |      "execution_count": 26,
109 |      "metadata": {},
110 |      "output_type": "execute_result"
111 |     }
112 |    ],
113 |    "source": [
114 |     "is_jpg('./csv')  "
115 |    ]
116 |   },
117 |   {
118 |    "cell_type": "code",
119 |    "execution_count": 27,
120 |    "metadata": {},
121 |    "outputs": [
122 |     {
123 |      "data": {
124 |       "text/plain": [
125 |        "False"
126 |       ]
127 |      },
128 |      "execution_count": 27,
129 |      "metadata": {},
130 |      "output_type": "execute_result"
131 |     }
132 |    ],
133 |    "source": [
134 |     "is_jpg('./csv/movie_metadata.csv')  "
135 |    ]
136 |   },
137 |   {
138 |    "cell_type": "code",
139 |    "execution_count": 28,
140 |    "metadata": {},
141 |    "outputs": [
142 |     {
143 |      "data": {
144 |       "text/plain": [
145 |        "True"
146 |       ]
147 |      },
148 |      "execution_count": 28,
149 |      "metadata": {},
150 |      "output_type": "execute_result"
151 |     }
152 |    ],
153 |    "source": [
154 |     "is_jpg('./csv/Selection_001.png')  "
155 |    ]
156 |   },
157 |   {
158 |    "cell_type": "code",
159 |    "execution_count": 29,
160 |    "metadata": {},
161 |    "outputs": [
162 |     {
163 |      "data": {
164 |       "text/plain": [
165 |        "True"
166 |       ]
167 |      },
168 |      "execution_count": 29,
169 |      "metadata": {},
170 |      "output_type": "execute_result"
171 |     }
172 |    ],
173 |    "source": [
174 |     "is_jpg('./csv/Selection_001.png')  "
175 |    ]
176 |   },
177 |   {
178 |    "cell_type": "code",
179 |    "execution_count": 30,
180 |    "metadata": {},
181 |    "outputs": [
182 |     {
183 |      "data": {
184 |       "text/plain": [
185 |        "True"
186 |       ]
187 |      },
188 |      "execution_count": 30,
189 |      "metadata": {},
190 |      "output_type": "execute_result"
191 |     }
192 |    ],
193 |    "source": [
194 |     "is_jpg('./csv/fire-and-water-2354583_960_720.jpg')  "
195 |    ]
196 |   },
197 |   {
198 |    "cell_type": "markdown",
199 |    "metadata": {},
200 |    "source": [
201 |     "## is the image blank"
202 |    ]
203 |   },
204 |   {
205 |    "cell_type": "markdown",
206 |    "metadata": {},
207 |    "source": [
208 |     "<img src=\"https://cdn.pixabay.com/photo/2013/03/29/07/34/girl-97433_960_720.jpg\" style=\"width: 200px;\"/>"
209 |    ]
210 |   },
211 |   {
212 |    "cell_type": "code",
213 |    "execution_count": 5,
214 |    "metadata": {},
215 |    "outputs": [
216 |     {
217 |      "name": "stdout",
218 |      "output_type": "stream",
219 |      "text": [
220 |       "None\n"
221 |      ]
222 |     }
223 |    ],
224 |    "source": [
225 |     "import json\n",
226 |     "from io import BytesIO\n",
227 |     "from PIL import Image\n",
228 |     "import requests\n",
229 |     "\n",
230 |     "remote_file = 'https://cdn.pixabay.com/photo/2013/03/29/07/34/girl-97433_960_720.jpg'\n",
231 |     "\n",
232 |     "response = requests.get(remote_file)\n",
233 |     "img = Image.open(BytesIO(response.content))\n",
234 |     "\n",
235 |     "clrs = img.getcolors()\n",
236 |     "print(clrs)"
237 |    ]
238 |   },
239 |   {
240 |    "cell_type": "code",
241 |    "execution_count": 31,
242 |    "metadata": {},
243 |    "outputs": [
244 |     {
245 |      "data": {
246 |       "text/html": [
247 |        "<img src=\"./csv/Selection_139.png\"/>"
248 |       ],
249 |       "text/plain": [
250 |        "<IPython.core.display.Image object>"
251 |       ]
252 |      },
253 |      "execution_count": 31,
254 |      "metadata": {},
255 |      "output_type": "execute_result"
256 |     }
257 |    ],
258 |    "source": [
259 |     "from IPython.display import Image\n",
260 |     "from IPython.core.display import HTML \n",
261 |     "\n",
262 |     "color_image = './csv/Selection_139.png'\n",
263 |     "\n",
264 |     "Image(url= color_image)"
265 |    ]
266 |   },
267 |   {
268 |    "cell_type": "code",
269 |    "execution_count": 32,
270 |    "metadata": {},
271 |    "outputs": [
272 |     {
273 |      "name": "stdout",
274 |      "output_type": "stream",
275 |      "text": [
276 |       "None\n"
277 |      ]
278 |     }
279 |    ],
280 |    "source": [
281 |     "import json\n",
282 |     "from io import BytesIO\n",
283 |     "from PIL import Image\n",
284 |     "import requests\n",
285 |     "\n",
286 |     "img = Image.open(color_image)\n",
287 |     "\n",
288 |     "clrs = img.getcolors()\n",
289 |     "print(clrs)"
290 |    ]
291 |   },
292 |   {
293 |    "cell_type": "code",
294 |    "execution_count": 33,
295 |    "metadata": {},
296 |    "outputs": [
297 |     {
298 |      "data": {
299 |       "text/html": [
300 |        "<img src=\"./csv/Selection_140.png\"/>"
301 |       ],
302 |       "text/plain": [
303 |        "<IPython.core.display.Image object>"
304 |       ]
305 |      },
306 |      "execution_count": 33,
307 |      "metadata": {},
308 |      "output_type": "execute_result"
309 |     }
310 |    ],
311 |    "source": [
312 |     "from IPython.display import Image\n",
313 |     "from IPython.core.display import HTML \n",
314 |     "\n",
315 |     "blank_image = './csv/Selection_140.png'\n",
316 |     "\n",
317 |     "Image(url= blank_image)"
318 |    ]
319 |   },
320 |   {
321 |    "cell_type": "code",
322 |    "execution_count": 34,
323 |    "metadata": {},
324 |    "outputs": [
325 |     {
326 |      "name": "stdout",
327 |      "output_type": "stream",
328 |      "text": [
329 |       "[(49128, (238, 238, 238))]\n"
330 |      ]
331 |     }
332 |    ],
333 |    "source": [
334 |     "import json\n",
335 |     "from io import BytesIO\n",
336 |     "from PIL import Image\n",
337 |     "import requests\n",
338 |     "\n",
339 |     "img = Image.open(blank_image)\n",
340 |     "\n",
341 |     "clrs = img.getcolors()\n",
342 |     "print(clrs)"
343 |    ]
344 |   },
345 |   {
346 |    "cell_type": "markdown",
347 |    "metadata": {},
348 |    "source": [
349 |     "## does the image contain a pattern"
350 |    ]
351 |   },
352 |   {
353 |    "cell_type": "markdown",
354 |    "metadata": {},
355 |    "source": [
356 |     "<img src=\"./csv/coin.png\" alt=\"Drawing\" style=\"width: 100px;\"/>\n",
357 |     "<img src=\"./csv/image_with_coin.jpg\" alt=\"Drawing\" style=\"width: 600px;\"/>"
358 |    ]
359 |   },
360 |   {
361 |    "cell_type": "code",
362 |    "execution_count": null,
363 |    "metadata": {},
364 |    "outputs": [],
365 |    "source": [
366 |     "import cv2\n",
367 |     "import numpy as np\n",
368 |     "\n",
369 |     "img_rgb = cv2.imread('./csv/image_with_coin.jpg')\n",
370 |     "template = cv2.imread('./csv/coin.png')\n",
371 |     "h, w = template.shape[:-1]  # shape is (rows, cols, channels)\n",
372 |     "\n",
373 |     "res = cv2.matchTemplate(img_rgb, template, cv2.TM_CCOEFF_NORMED)\n",
374 |     "threshold = .8\n",
375 |     "loc = np.where(res >= threshold)\n",
376 |     "for pt in zip(*loc[::-1]): \n",
377 |     "    cv2.rectangle(img_rgb, pt, (pt[0] + w, pt[1] + h), (0, 0, 255), 2)\n",
378 |     "\n",
379 |     "cv2.imwrite('./csv/result.png', img_rgb)"
380 |    ]
381 |   },
382 |   {
383 |    "cell_type": "code",
384 |    "execution_count": null,
385 |    "metadata": {},
386 |    "outputs": [],
387 |    "source": [
388 |     "from IPython.display import Image\n",
389 |     "from IPython.core.display import HTML \n",
390 |     "\n",
391 |     "Image(url= './csv/result.png')"
392 |    ]
393 |   },
394 |   {
395 |    "cell_type": "code",
396 |    "execution_count": null,
397 |    "metadata": {},
398 |    "outputs": [],
399 |    "source": []
400 |   }
401 |  ],
402 |  "metadata": {
403 |   "kernelspec": {
404 |    "display_name": "Python 3",
405 |    "language": "python",
406 |    "name": "python3"
407 |   },
408 |   "language_info": {
409 |    "codemirror_mode": {
410 |     "name": "ipython",
411 |     "version": 3
412 |    },
413 |    "file_extension": ".py",
414 |    "mimetype": "text/x-python",
415 |    "name": "python",
416 |    "nbconvert_exporter": "python",
417 |    "pygments_lexer": "ipython3",
418 |    "version": "3.6.7"
419 |   }
420 |  },
421 |  "nbformat": 4,
422 |  "nbformat_minor": 2
423 | }
424 | 
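
A minimal sketch, not part of the notebook above: a standard-library-only check that sits between the extension test and the Pillow test. It reads the first bytes of the file and compares them with the PNG and JPEG signatures. The function name `sniff_image_format` and the constant names are hypothetical. Unlike `Image.open`, this does not decode the image, so a truncated or corrupt file with a valid header would still pass.

```python
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'  # fixed 8-byte PNG file signature
JPEG_SIGNATURE = b'\xff\xd8\xff'      # JPEG start-of-image marker plus next marker byte


def sniff_image_format(path):
    """Return 'PNG', 'JPEG' or None based on the file's leading bytes."""
    try:
        with open(path, 'rb') as fh:
            header = fh.read(8)
    except OSError:
        # Missing files, directories and unreadable paths are treated as "not an image".
        return None
    if header.startswith(PNG_SIGNATURE):
        return 'PNG'
    if header.startswith(JPEG_SIGNATURE):
        return 'JPEG'
    return None
```

A call such as `sniff_image_format('./csv/Selection_001.png')` would return a format name or `None`, so the result can be compared against an allowed list in the same way as `i.format` in `is_jpg`.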


--------------------------------------------------------------------------------
/notebooks/Load_multiple_CSV_files_into_a_single _Dataframe.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "code",
  5 |    "execution_count": 1,
  6 |    "metadata": {},
  7 |    "outputs": [],
  8 |    "source": [
  9 |     "import pandas as pd\n",
 10 |     "pd.set_option('display.max_colwidth', -1)"
 11 |    ]
 12 |   },
 13 |   {
 14 |    "cell_type": "markdown",
 15 |    "metadata": {},
 16 |    "source": [
 17 |     "## Rename multiple CSV files in a folder with Python"
 18 |    ]
 19 |   },
 20 |   {
 21 |    "cell_type": "code",
 22 |    "execution_count": 3,
 23 |    "metadata": {},
 24 |    "outputs": [],
 25 |    "source": [
 26 |     "import glob, os\n",
 27 |     "\n",
 28 |     "def rename(dir, pathAndFilename, pattern, titlePattern):\n",
 29 |     "    os.rename(pathAndFilename, os.path.join(dir, titlePattern))\n",
 30 |     "\n",
 31 |     "# search for csv files in the working folder     \n",
 32 |     "path = os.path.expanduser(\"~/Projects/MYP/Datasets/test/*.csv\")\n",
 33 |     "\n",
 34 |     "# iterate and rename them one by one with the number of the iteration\n",
 35 |     "for i, fname in enumerate(glob.glob(path)):\n",
 36 |     "    rename(os.path.expanduser('~/Projects/MYP/Datasets/test/'), fname, r'*.csv', r'test{}.csv'.format(i))"
 37 |    ]
 38 |   },
 39 |   {
 40 |    "cell_type": "markdown",
 41 |    "metadata": {},
 42 |    "source": [
 43 |     "## Load several files into a DataFrame"
 44 |    ]
 45 |   },
 46 |   {
 47 |    "cell_type": "code",
 48 |    "execution_count": 5,
 49 |    "metadata": {},
 50 |    "outputs": [
 51 |     {
 52 |      "name": "stdout",
 53 |      "output_type": "stream",
 54 |      "text": [
 55 |       "(541, 7)\n",
 56 |       "(550, 7)\n",
 57 |       "(1641, 7)\n"
 58 |      ]
 59 |     }
 60 |    ],
 61 |    "source": [
 62 |     "# change separator for CSV file\n",
 63 |     "df1 = pd.read_csv('~/Projects/MYP/Datasets/test/test0.csv', sep=\"@\")\n",
 64 |     "df2 = pd.read_csv('~/Projects/MYP/Datasets/test/test1.csv', sep=\"@\")\n",
 65 |     "df3 = pd.read_csv('~/Projects/MYP/Datasets/test/test1.csv', sep=\"@\")\n",
 66 |     "\n",
 67 |     "frames = [df1, df2, df3]\n",
 68 |     "\n",
 69 |     "# concatenate multiple data CSV files\n",
 70 |     "all_frames = pd.concat(frames)  # avoid shadowing the built-in all()\n",
 71 |     "\n",
 72 |     "print(df1.shape)\n",
 73 |     "print(df2.shape)\n",
 74 |     "print(all_frames.shape)"
 75 |    ]
 76 |   },
 77 |   {
 78 |    "cell_type": "markdown",
 79 |    "metadata": {},
 80 |    "source": [
 81 |     "## Dynamically load multiple CSV files into a DataFrame"
 82 |    ]
 83 |   },
 84 |   {
 85 |    "cell_type": "code",
 86 |    "execution_count": 9,
 87 |    "metadata": {},
 88 |    "outputs": [
 89 |     {
 90 |      "data": {
 91 |       "text/html": [
 92 |        "<div>\n",
 93 |        "<style scoped>\n",
 94 |        "    .dataframe tbody tr th:only-of-type {\n",
 95 |        "        vertical-align: middle;\n",
 96 |        "    }\n",
 97 |        "\n",
 98 |        "    .dataframe tbody tr th {\n",
 99 |        "        vertical-align: top;\n",
100 |        "    }\n",
101 |        "\n",
102 |        "    .dataframe thead th {\n",
103 |        "        text-align: right;\n",
104 |        "    }\n",
105 |        "</style>\n",
106 |        "<table border=\"1\" class=\"dataframe\">\n",
107 |        "  <thead>\n",
108 |        "    <tr style=\"text-align: right;\">\n",
109 |        "      <th></th>\n",
110 |        "      <th>title</th>\n",
111 |        "      <th>Views</th>\n",
112 |        "      <th>Like</th>\n",
113 |        "      <th>Dislike</th>\n",
114 |        "      <th>Comment</th>\n",
115 |        "      <th>channel</th>\n",
116 |        "    </tr>\n",
117 |        "  </thead>\n",
118 |        "  <tbody>\n",
119 |        "    <tr>\n",
120 |        "      <th>215</th>\n",
121 |        "      <td>Turning Google Earth into SimCity 2000</td>\n",
122 |        "      <td>168175.0</td>\n",
123 |        "      <td>3251.0</td>\n",
124 |        "      <td>1125.0</td>\n",
125 |        "      <td>215.0</td>\n",
126 |        "      <td>test0.csv</td>\n",
127 |        "    </tr>\n",
128 |        "    <tr>\n",
129 |        "      <th>301</th>\n",
130 |        "      <td>Microservices + Events + Docker = A Perfect Trio</td>\n",
131 |        "      <td>161110.0</td>\n",
132 |        "      <td>3213.0</td>\n",
133 |        "      <td>50.0</td>\n",
134 |        "      <td>83.0</td>\n",
135 |        "      <td>test0.csv</td>\n",
136 |        "    </tr>\n",
137 |        "    <tr>\n",
138 |        "      <th>265</th>\n",
139 |        "      <td>PHP in 2018 by the Creator of PHP</td>\n",
140 |        "      <td>164577.0</td>\n",
141 |        "      <td>3557.0</td>\n",
142 |        "      <td>69.0</td>\n",
143 |        "      <td>384.0</td>\n",
144 |        "      <td>test0.csv</td>\n",
145 |        "    </tr>\n",
146 |        "    <tr>\n",
147 |        "      <th>468</th>\n",
148 |        "      <td>Developing Blockchain Software</td>\n",
149 |        "      <td>169484.0</td>\n",
150 |        "      <td>2512.0</td>\n",
151 |        "      <td>116.0</td>\n",
152 |        "      <td>133.0</td>\n",
153 |        "      <td>test0.csv</td>\n",
154 |        "    </tr>\n",
155 |        "    <tr>\n",
156 |        "      <th>398</th>\n",
157 |        "      <td>VS Code: The Last Editor You'll Ever Need</td>\n",
158 |        "      <td>172738.0</td>\n",
159 |        "      <td>1930.0</td>\n",
160 |        "      <td>194.0</td>\n",
161 |        "      <td>340.0</td>\n",
162 |        "      <td>test0.csv</td>\n",
163 |        "    </tr>\n",
164 |        "    <tr>\n",
165 |        "      <th>175</th>\n",
166 |        "      <td>Coding Challenge #74: Clock with p5.js</td>\n",
167 |        "      <td>232227.0</td>\n",
168 |        "      <td>4609.0</td>\n",
169 |        "      <td>68.0</td>\n",
170 |        "      <td>289.0</td>\n",
171 |        "      <td>test1.csv</td>\n",
172 |        "    </tr>\n",
173 |        "    <tr>\n",
174 |        "      <th>373</th>\n",
175 |        "      <td>Coding Challenge #12: The Lorenz Attractor in Processing</td>\n",
176 |        "      <td>217172.0</td>\n",
177 |        "      <td>3680.0</td>\n",
178 |        "      <td>43.0</td>\n",
179 |        "      <td>333.0</td>\n",
180 |        "      <td>test1.csv</td>\n",
181 |        "    </tr>\n",
182 |        "    <tr>\n",
183 |        "      <th>447</th>\n",
184 |        "      <td>10.4: Loading JSON data from a URL (Asynchronous Callbacks!) - p5.js Tutorial</td>\n",
185 |        "      <td>218081.0</td>\n",
186 |        "      <td>2120.0</td>\n",
187 |        "      <td>79.0</td>\n",
188 |        "      <td>240.0</td>\n",
189 |        "      <td>test1.csv</td>\n",
190 |        "    </tr>\n",
191 |        "    <tr>\n",
192 |        "      <th>269</th>\n",
193 |        "      <td>The Coding Train!</td>\n",
194 |        "      <td>218635.0</td>\n",
195 |        "      <td>2482.0</td>\n",
196 |        "      <td>83.0</td>\n",
197 |        "      <td>324.0</td>\n",
198 |        "      <td>test1.csv</td>\n",
199 |        "    </tr>\n",
200 |        "    <tr>\n",
201 |        "      <th>193</th>\n",
202 |        "      <td>Coding Challenge #71: Minesweeper</td>\n",
203 |        "      <td>220816.0</td>\n",
204 |        "      <td>3334.0</td>\n",
205 |        "      <td>71.0</td>\n",
206 |        "      <td>401.0</td>\n",
207 |        "      <td>test1.csv</td>\n",
208 |        "    </tr>\n",
209 |        "  </tbody>\n",
210 |        "</table>\n",
211 |        "</div>"
212 |       ],
213 |       "text/plain": [
214 |        "                                                                             title  \\\n",
215 |        "215  Turning Google Earth into SimCity 2000                                          \n",
216 |        "301  Microservices + Events + Docker = A Perfect Trio                                \n",
217 |        "265  PHP in 2018 by the Creator of PHP                                               \n",
218 |        "468  Developing Blockchain Software                                                  \n",
219 |        "398  VS Code: The Last Editor You'll Ever Need                                       \n",
220 |        "175  Coding Challenge #74: Clock with p5.js                                          \n",
221 |        "373  Coding Challenge #12: The Lorenz Attractor in Processing                        \n",
222 |        "447  10.4: Loading JSON data from a URL (Asynchronous Callbacks!) - p5.js Tutorial   \n",
223 |        "269  The Coding Train!                                                               \n",
224 |        "193  Coding Challenge #71: Minesweeper                                               \n",
225 |        "\n",
226 |        "        Views    Like  Dislike  Comment    channel  \n",
227 |        "215  168175.0  3251.0  1125.0   215.0    test0.csv  \n",
228 |        "301  161110.0  3213.0  50.0     83.0     test0.csv  \n",
229 |        "265  164577.0  3557.0  69.0     384.0    test0.csv  \n",
230 |        "468  169484.0  2512.0  116.0    133.0    test0.csv  \n",
231 |        "398  172738.0  1930.0  194.0    340.0    test0.csv  \n",
232 |        "175  232227.0  4609.0  68.0     289.0    test1.csv  \n",
233 |        "373  217172.0  3680.0  43.0     333.0    test1.csv  \n",
234 |        "447  218081.0  2120.0  79.0     240.0    test1.csv  \n",
235 |        "269  218635.0  2482.0  83.0     324.0    test1.csv  \n",
236 |        "193  220816.0  3334.0  71.0     401.0    test1.csv  "
237 |       ]
238 |      },
239 |      "execution_count": 9,
240 |      "metadata": {},
241 |      "output_type": "execute_result"
242 |     }
243 |    ],
244 |    "source": [
245 |     "import glob\n",
246 |     "\n",
247 |     "frames = []  # one small DataFrame per CSV file\n",
248 |     "path = os.path.expanduser(\"~/Projects/MYP/Datasets/test/*.csv\")\n",
249 |     "\n",
250 |     "for fname in glob.glob(path):\n",
251 |     "    tail = os.path.basename(fname)  # file name, used as the channel label\n",
252 |     "    df = pd.read_csv(fname, sep=\"@\")\n",
253 |     "    df2 = df.sort_values(by=['Views'], ascending=False).drop(['Favorite', 'videoID'], axis=1).iloc[15:20].copy()\n",
254 |     "    df2['channel'] = tail\n",
255 |     "    frames.append(df2)\n",
256 |     "result = pd.concat(frames)  # concatenate once instead of inside the loop\n",
257 |     "result.sort_values(by=['channel']).iloc[0:10]"
258 |    ]
259 |   },
260 |   {
261 |    "cell_type": "markdown",
262 |    "metadata": {},
263 |    "source": [
264 |     "## Generate clickable links with pandas and Jupyter notebook"
265 |    ]
266 |   },
267 |   {
268 |    "cell_type": "code",
269 |    "execution_count": 11,
270 |    "metadata": {},
271 |    "outputs": [
272 |     {
273 |      "data": {
274 |       "text/html": [
275 |        "<table border=\"1\" class=\"dataframe\">\n",
276 |        "  <thead>\n",
277 |        "    <tr style=\"text-align: right;\">\n",
278 |        "      <th></th>\n",
279 |        "      <th>title</th>\n",
280 |        "      <th>Views</th>\n",
281 |        "      <th>Like</th>\n",
282 |        "      <th>Dislike</th>\n",
283 |        "      <th>Favorite</th>\n",
284 |        "      <th>Comment</th>\n",
285 |        "      <th>videoID</th>\n",
286 |        "      <th>nameurl</th>\n",
287 |        "    </tr>\n",
288 |        "  </thead>\n",
289 |        "  <tbody>\n",
290 |        "    <tr>\n",
291 |        "      <th>20</th>\n",
292 |        "      <td>How To...</td>\n",
293 |        "      <td>80620.0</td>\n",
294 |        "      <td>121.0</td>\n",
295 |        "      <td>13.0</td>\n",
296 |        "      <td>0.0</td>\n",
297 |        "      <td>13.0</td>\n",
298 |        "      <td>https:...</td>\n",
299 |        "      <td><a hre...</td>\n",
300 |        "    </tr>\n",
301 |        "    <tr>\n",
302 |        "      <th>21</th>\n",
303 |        "      <td>How To...</td>\n",
304 |        "      <td>165533.0</td>\n",
305 |        "      <td>432.0</td>\n",
306 |        "      <td>143.0</td>\n",
307 |        "      <td>0.0</td>\n",
308 |        "      <td>17.0</td>\n",
309 |        "      <td>https:...</td>\n",
310 |        "      <td><a hre...</td>\n",
311 |        "    </tr>\n",
312 |        "    <tr>\n",
313 |        "      <th>22</th>\n",
314 |        "      <td>How To...</td>\n",
315 |        "      <td>29636.0</td>\n",
316 |        "      <td>99.0</td>\n",
317 |        "      <td>16.0</td>\n",
318 |        "      <td>0.0</td>\n",
319 |        "      <td>8.0</td>\n",
320 |        "      <td>https:...</td>\n",
321 |        "      <td><a hre...</td>\n",
322 |        "    </tr>\n",
323 |        "    <tr>\n",
324 |        "      <th>23</th>\n",
325 |        "      <td>How to...</td>\n",
326 |        "      <td>409.0</td>\n",
327 |        "      <td>4.0</td>\n",
328 |        "      <td>0.0</td>\n",
329 |        "      <td>0.0</td>\n",
330 |        "      <td>0.0</td>\n",
331 |        "      <td>https:...</td>\n",
332 |        "      <td><a hre...</td>\n",
333 |        "    </tr>\n",
334 |        "    <tr>\n",
335 |        "      <th>24</th>\n",
336 |        "      <td>How to...</td>\n",
337 |        "      <td>31358.0</td>\n",
338 |        "      <td>59.0</td>\n",
339 |        "      <td>33.0</td>\n",
340 |        "      <td>0.0</td>\n",
341 |        "      <td>2.0</td>\n",
342 |        "      <td>https:...</td>\n",
343 |        "      <td><a hre...</td>\n",
344 |        "    </tr>\n",
345 |        "    <tr>\n",
346 |        "      <th>25</th>\n",
347 |        "      <td>How To...</td>\n",
348 |        "      <td>85887.0</td>\n",
349 |        "      <td>272.0</td>\n",
350 |        "      <td>76.0</td>\n",
351 |        "      <td>0.0</td>\n",
352 |        "      <td>4.0</td>\n",
353 |        "      <td>https:...</td>\n",
354 |        "      <td><a hre...</td>\n",
355 |        "    </tr>\n",
356 |        "    <tr>\n",
357 |        "      <th>26</th>\n",
358 |        "      <td>How To...</td>\n",
359 |        "      <td>61449.0</td>\n",
360 |        "      <td>95.0</td>\n",
361 |        "      <td>34.0</td>\n",
362 |        "      <td>0.0</td>\n",
363 |        "      <td>0.0</td>\n",
364 |        "      <td>https:...</td>\n",
365 |        "      <td><a hre...</td>\n",
366 |        "    </tr>\n",
367 |        "    <tr>\n",
368 |        "      <th>27</th>\n",
369 |        "      <td>How To...</td>\n",
370 |        "      <td>262342.0</td>\n",
371 |        "      <td>1440.0</td>\n",
372 |        "      <td>93.0</td>\n",
373 |        "      <td>0.0</td>\n",
374 |        "      <td>447.0</td>\n",
375 |        "      <td>https:...</td>\n",
376 |        "      <td><a hre...</td>\n",
377 |        "    </tr>\n",
378 |        "    <tr>\n",
379 |        "      <th>28</th>\n",
380 |        "      <td>How To...</td>\n",
381 |        "      <td>154661.0</td>\n",
382 |        "      <td>453.0</td>\n",
383 |        "      <td>122.0</td>\n",
384 |        "      <td>0.0</td>\n",
385 |        "      <td>11.0</td>\n",
386 |        "      <td>https:...</td>\n",
387 |        "      <td><a hre...</td>\n",
388 |        "    </tr>\n",
389 |        "    <tr>\n",
390 |        "      <th>29</th>\n",
391 |        "      <td>How To...</td>\n",
392 |        "      <td>109787.0</td>\n",
393 |        "      <td>257.0</td>\n",
394 |        "      <td>40.0</td>\n",
395 |        "      <td>0.0</td>\n",
396 |        "      <td>22.0</td>\n",
397 |        "      <td>https:...</td>\n",
398 |        "      <td><a hre...</td>\n",
399 |        "    </tr>\n",
400 |        "  </tbody>\n",
401 |        "</table>"
402 |       ],
403 |       "text/plain": [
404 |        "<IPython.core.display.HTML object>"
405 |       ]
406 |      },
407 |      "execution_count": 11,
408 |      "metadata": {},
409 |      "output_type": "execute_result"
410 |     }
411 |    ],
412 |    "source": [
413 |     "from IPython.display import HTML\n",
414 |     "\n",
415 |     "# wrap each URL in an <a> tag and store it in a new column\n",
416 |     "df['nameurl'] = df['videoID'].apply(lambda x: '<a href=\"{}\">XXXXX</a>'.format(x))\n",
417 |     "\n",
418 |     "\n",
419 |     "\n",
420 |     "# do not truncate cell text: a small max_colwidth cuts the <a> tag and breaks the link (use -1 on pandas < 1.0)\n",
421 |     "pd.set_option('display.max_colwidth', None)\n",
422 |     "\n",
423 |     "# render the frame as raw HTML so the links are clickable\n",
424 |     "HTML(df.iloc[20:30].to_html(escape=False))"
425 |    ]
426 |   },
427 |   {
428 |    "cell_type": "code",
429 |    "execution_count": null,
430 |    "metadata": {},
431 |    "outputs": [],
432 |    "source": []
433 |   }
434 |  ],
435 |  "metadata": {
436 |   "kernelspec": {
437 |    "display_name": "Python 3",
438 |    "language": "python",
439 |    "name": "python3"
440 |   },
441 |   "language_info": {
442 |    "codemirror_mode": {
443 |     "name": "ipython",
444 |     "version": 3
445 |    },
446 |    "file_extension": ".py",
447 |    "mimetype": "text/x-python",
448 |    "name": "python",
449 |    "nbconvert_exporter": "python",
450 |    "pygments_lexer": "ipython3",
451 |    "version": "3.6.7"
452 |   }
453 |  },
454 |  "nbformat": 4,
455 |  "nbformat_minor": 1
456 | }
457 | 
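A minimal, self-contained sketch of the clickable-link technique from the last section of the notebook above. The sample titles and `example.com` URLs are hypothetical stand-ins for the notebook's `videoID` column, and `pd.option_context("display.max_colwidth", None)` assumes pandas 1.0 or later.

```python
import pandas as pd

# Hypothetical sample data; the notebook reads real URLs from a `videoID` column.
df = pd.DataFrame({
    "title": ["First video", "Second video"],
    "url": ["https://example.com/a", "https://example.com/b"],
})

# Build an <a> tag per row, using the title as the visible link text.
df["link"] = [
    '<a href="{}">{}</a>'.format(url, title)
    for url, title in zip(df["url"], df["title"])
]

# escape=False keeps the tags as markup instead of escaping them to &lt;a ...&gt;.
# Lifting max_colwidth guards against long tags being cut off.
with pd.option_context("display.max_colwidth", None):
    html = df[["title", "link"]].to_html(escape=False)

print(html)
```

In a Jupyter notebook, passing `html` to `IPython.display.HTML` renders the table with working links.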


--------------------------------------------------------------------------------
/notebooks/Pandas count and percentage by value for a column.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "## Pandas count and percentage by value for a column\n",
  8 |     "\n",
  9 |     "* read a table from a remote PDF\n",
 10 |     "* calculate the count and percentage of each value\n",
 11 |     "* format the percentage for more readable output\n",
 12 |     "\n",
 13 |     "Bonus\n",
 14 |     "\n",
 15 |     "* pandas column renaming"
 16 |    ]
 17 |   },
 18 |   {
 19 |    "cell_type": "code",
 20 |    "execution_count": 1,
 21 |    "metadata": {},
 22 |    "outputs": [
 23 |     {
 24 |      "data": {
 25 |       "text/html": [
 26 |        "<div>\n",
 27 |        "<style scoped>\n",
 28 |        "    .dataframe tbody tr th:only-of-type {\n",
 29 |        "        vertical-align: middle;\n",
 30 |        "    }\n",
 31 |        "\n",
 32 |        "    .dataframe tbody tr th {\n",
 33 |        "        vertical-align: top;\n",
 34 |        "    }\n",
 35 |        "\n",
 36 |        "    .dataframe thead th {\n",
 37 |        "        text-align: right;\n",
 38 |        "    }\n",
 39 |        "</style>\n",
 40 |        "<table border=\"1\" class=\"dataframe\">\n",
 41 |        "  <thead>\n",
 42 |        "    <tr style=\"text-align: right;\">\n",
 43 |        "      <th></th>\n",
 44 |        "      <th>food</th>\n",
 45 |        "      <th>Portion size</th>\n",
 46 |        "      <th>per 100 grams</th>\n",
 47 |        "      <th>energy</th>\n",
 48 |        "    </tr>\n",
 49 |        "  </thead>\n",
 50 |        "  <tbody>\n",
 51 |        "    <tr>\n",
 52 |        "      <th>0</th>\n",
 53 |        "      <td>Fish cake</td>\n",
 54 |        "      <td>90 cals per cake</td>\n",
 55 |        "      <td>200 cals</td>\n",
 56 |        "      <td>Medium</td>\n",
 57 |        "    </tr>\n",
 58 |        "    <tr>\n",
 59 |        "      <th>1</th>\n",
 60 |        "      <td>Fish fingers</td>\n",
 61 |        "      <td>50 cals per piece</td>\n",
 62 |        "      <td>220 cals</td>\n",
 63 |        "      <td>Medium</td>\n",
 64 |        "    </tr>\n",
 65 |        "    <tr>\n",
 66 |        "      <th>2</th>\n",
 67 |        "      <td>Gammon</td>\n",
 68 |        "      <td>320 cals</td>\n",
 69 |        "      <td>280 cals</td>\n",
 70 |        "      <td>Med-High</td>\n",
 71 |        "    </tr>\n",
 72 |        "    <tr>\n",
 73 |        "      <th>3</th>\n",
 74 |        "      <td>Haddock fresh</td>\n",
 75 |        "      <td>200 cals</td>\n",
 76 |        "      <td>110 cals</td>\n",
 77 |        "      <td>Low calorie</td>\n",
 78 |        "    </tr>\n",
 79 |        "    <tr>\n",
 80 |        "      <th>4</th>\n",
 81 |        "      <td>Halibut fresh</td>\n",
 82 |        "      <td>220 cals</td>\n",
 83 |        "      <td>125 cals</td>\n",
 84 |        "      <td>Low calorie</td>\n",
 85 |        "    </tr>\n",
 86 |        "  </tbody>\n",
 87 |        "</table>\n",
 88 |        "</div>"
 89 |       ],
 90 |       "text/plain": [
 91 |        "            food      Portion size  per 100 grams       energy\n",
 92 |        "0      Fish cake   90 cals per cake      200 cals       Medium\n",
 93 |        "1   Fish fingers  50 cals per piece      220 cals       Medium\n",
 94 |        "2         Gammon           320 cals      280 cals     Med-High\n",
 95 |        "3  Haddock fresh           200 cals      110 cals  Low calorie\n",
 96 |        "4  Halibut fresh           220 cals      125 cals  Low calorie"
 97 |       ]
 98 |      },
 99 |      "execution_count": 1,
100 |      "metadata": {},
101 |      "output_type": "execute_result"
102 |     }
103 |    ],
104 |    "source": [
105 |     "from tabula import read_pdf\n",
106 |     "import pandas as pd\n",
107 |     "df = read_pdf(\"http://www.uncledavesenterprise.com/file/health/Food%20Calories%20List.pdf\", pages=3, pandas_options={'header': None})\n",
108 |     "df.columns = ['food', 'Portion size', 'per 100 grams', 'energy']\n",
109 |     "df.head()"
110 |    ]
111 |   },
112 |   {
113 |    "cell_type": "code",
114 |    "execution_count": 2,
115 |    "metadata": {},
116 |    "outputs": [],
117 |    "source": [
118 |     "s = df.energy"
119 |    ]
120 |   },
121 |   {
122 |    "cell_type": "code",
123 |    "execution_count": 3,
124 |    "metadata": {},
125 |    "outputs": [
126 |     {
127 |      "data": {
128 |       "text/plain": [
129 |        "Medium         14\n",
130 |        "High            6\n",
131 |        "Low calorie     4\n",
132 |        "Med-High        4\n",
133 |        "Low-Med         1\n",
134 |        "Low- Med        1\n",
135 |        "Name: energy, dtype: int64"
136 |       ]
137 |      },
138 |      "execution_count": 3,
139 |      "metadata": {},
140 |      "output_type": "execute_result"
141 |     }
142 |    ],
143 |    "source": [
144 |     "counts = s.value_counts()\n",
145 |     "counts"
146 |    ]
147 |   },
148 |   {
149 |    "cell_type": "code",
150 |    "execution_count": 4,
151 |    "metadata": {},
152 |    "outputs": [
153 |     {
154 |      "data": {
155 |       "text/plain": [
156 |        "Medium         0.466667\n",
157 |        "High           0.200000\n",
158 |        "Low calorie    0.133333\n",
159 |        "Med-High       0.133333\n",
160 |        "Low-Med        0.033333\n",
161 |        "Low- Med       0.033333\n",
162 |        "Name: energy, dtype: float64"
163 |       ]
164 |      },
165 |      "execution_count": 4,
166 |      "metadata": {},
167 |      "output_type": "execute_result"
168 |     }
169 |    ],
170 |    "source": [
171 |     "percent = s.value_counts(normalize=True)\n",
172 |     "percent"
173 |    ]
174 |   },
175 |   {
176 |    "cell_type": "code",
177 |    "execution_count": 5,
178 |    "metadata": {},
179 |    "outputs": [
180 |     {
181 |      "data": {
182 |       "text/plain": [
183 |        "Medium         46.7%\n",
184 |        "High           20.0%\n",
185 |        "Low calorie    13.3%\n",
186 |        "Med-High       13.3%\n",
187 |        "Low-Med         3.3%\n",
188 |        "Low- Med        3.3%\n",
189 |        "Name: energy, dtype: object"
190 |       ]
191 |      },
192 |      "execution_count": 5,
193 |      "metadata": {},
194 |      "output_type": "execute_result"
195 |     }
196 |    ],
197 |    "source": [
198 |     "percent100 = s.value_counts(normalize=True).mul(100).round(1).astype(str) + '%'\n",
199 |     "percent100"
200 |    ]
201 |   },
202 |   {
203 |    "cell_type": "code",
204 |    "execution_count": 6,
205 |    "metadata": {},
206 |    "outputs": [
207 |     {
208 |      "data": {
209 |       "text/html": [
210 |        "<div>\n",
211 |        "<style scoped>\n",
212 |        "    .dataframe tbody tr th:only-of-type {\n",
213 |        "        vertical-align: middle;\n",
214 |        "    }\n",
215 |        "\n",
216 |        "    .dataframe tbody tr th {\n",
217 |        "        vertical-align: top;\n",
218 |        "    }\n",
219 |        "\n",
220 |        "    .dataframe thead th {\n",
221 |        "        text-align: right;\n",
222 |        "    }\n",
223 |        "</style>\n",
224 |        "<table border=\"1\" class=\"dataframe\">\n",
225 |        "  <thead>\n",
226 |        "    <tr style=\"text-align: right;\">\n",
227 |        "      <th></th>\n",
228 |        "      <th>counts</th>\n",
229 |        "      <th>per</th>\n",
230 |        "      <th>per100</th>\n",
231 |        "    </tr>\n",
232 |        "  </thead>\n",
233 |        "  <tbody>\n",
234 |        "    <tr>\n",
235 |        "      <th>Medium</th>\n",
236 |        "      <td>14</td>\n",
237 |        "      <td>0.466667</td>\n",
238 |        "      <td>46.7%</td>\n",
239 |        "    </tr>\n",
240 |        "    <tr>\n",
241 |        "      <th>High</th>\n",
242 |        "      <td>6</td>\n",
243 |        "      <td>0.200000</td>\n",
244 |        "      <td>20.0%</td>\n",
245 |        "    </tr>\n",
246 |        "    <tr>\n",
247 |        "      <th>Low calorie</th>\n",
248 |        "      <td>4</td>\n",
249 |        "      <td>0.133333</td>\n",
250 |        "      <td>13.3%</td>\n",
251 |        "    </tr>\n",
252 |        "    <tr>\n",
253 |        "      <th>Med-High</th>\n",
254 |        "      <td>4</td>\n",
255 |        "      <td>0.133333</td>\n",
256 |        "      <td>13.3%</td>\n",
257 |        "    </tr>\n",
258 |        "    <tr>\n",
259 |        "      <th>Low-Med</th>\n",
260 |        "      <td>1</td>\n",
261 |        "      <td>0.033333</td>\n",
262 |        "      <td>3.3%</td>\n",
263 |        "    </tr>\n",
264 |        "    <tr>\n",
265 |        "      <th>Low- Med</th>\n",
266 |        "      <td>1</td>\n",
267 |        "      <td>0.033333</td>\n",
268 |        "      <td>3.3%</td>\n",
269 |        "    </tr>\n",
270 |        "  </tbody>\n",
271 |        "</table>\n",
272 |        "</div>"
273 |       ],
274 |       "text/plain": [
275 |        "             counts       per per100\n",
276 |        "Medium           14  0.466667  46.7%\n",
277 |        "High              6  0.200000  20.0%\n",
278 |        "Low calorie       4  0.133333  13.3%\n",
279 |        "Med-High          4  0.133333  13.3%\n",
280 |        "Low-Med           1  0.033333   3.3%\n",
281 |        "Low- Med          1  0.033333   3.3%"
282 |       ]
283 |      },
284 |      "execution_count": 6,
285 |      "metadata": {},
286 |      "output_type": "execute_result"
287 |     }
288 |    ],
289 |    "source": [
290 |     "pd.DataFrame({'counts': counts, 'per': percent, 'per100': percent100})"
291 |    ]
292 |   },
293 |   {
294 |    "cell_type": "code",
295 |    "execution_count": null,
296 |    "metadata": {},
297 |    "outputs": [],
298 |    "source": [
299 |     "s = df.energy\n",
300 |     "counts = s.value_counts()\n",
301 |     "percent = s.value_counts(normalize=True)\n",
302 |     "percent100 = s.value_counts(normalize=True).mul(100).round(1).astype(str) + '%'\n",
303 |     "pd.DataFrame({'counts': counts, 'per': percent, 'per100': percent100})"
304 |    ]
305 |   }
306 |  ],
307 |  "metadata": {
308 |   "kernelspec": {
309 |    "display_name": "Python 3",
310 |    "language": "python",
311 |    "name": "python3"
312 |   },
313 |   "language_info": {
314 |    "codemirror_mode": {
315 |     "name": "ipython",
316 |     "version": 3
317 |    },
318 |    "file_extension": ".py",
319 |    "mimetype": "text/x-python",
320 |    "name": "python",
321 |    "nbconvert_exporter": "python",
322 |    "pygments_lexer": "ipython3",
323 |    "version": "3.6.7"
324 |   }
325 |  },
326 |  "nbformat": 4,
327 |  "nbformat_minor": 2
328 | }
329 | 
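The count-and-percentage steps in the notebook above can be tried without the remote PDF. This is a small sketch under that assumption: the `energy` values below are made-up sample data, not the contents of the PDF.

```python
import pandas as pd

# Hypothetical sample standing in for the `energy` column read from the PDF.
energy = pd.Series(
    ["Medium", "Medium", "High", "Low calorie", "Medium", "High"],
    name="energy",
)

counts = energy.value_counts()                 # absolute frequency per value
percent = energy.value_counts(normalize=True)  # share of the total, 0..1
percent100 = percent.mul(100).round(1).astype(str) + "%"  # e.g. "33.3%"

# The three Series share the same index, so they align row by row.
summary = pd.DataFrame({"counts": counts, "per": percent, "per100": percent100})
print(summary)
```

Note that `per100` holds strings, so it is suited to display only; sort or filter on `per` instead.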


--------------------------------------------------------------------------------
/notebooks/Python group and sort a list of lists by a specific index,pattern.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "code",
  5 |    "execution_count": 1,
  6 |    "metadata": {},
  7 |    "outputs": [],
  8 |    "source": [
  9 |     "movies = [\n",
 10 |     "1, \"Avatar\" ,'good',\n",
 11 |     "2, \"Titanic\" ,'not bad',\n",
 12 |     "3, \"Star Wars: The Force Awakens\" ,'good',\n",
 13 |     "4, \"Jurassic World\" ,'good',\n",
 14 |     "5, \"The Avengers\" ,'not bad',\n",
 15 |     "6, \"Furious 7\" ,'not bad',\n",
 16 |     "7, \"Avengers: Age of Ultron\" ,'good',\n",
 17 |     "8, \"Harry Potter and the Deathly Hallows – Part 2\" ,'not bad',\n",
 18 |     "9, \"Frozen\" ,'good',\n",
 19 |     "\n",
 20 |     "\n",
 21 |     "\"The Birth of a Nation\" ,1915,\n",
 22 |     "\"The Birth of a Nation\" ,1940,\n",
 23 |     "\"Gone with the Wind\" ,1940,\n",
 24 |     "\"Gone with the Wind\" ,1963,\n",
 25 |     "\"Gone with the Wind\" ,1963,\n",
 26 |     "\"The Sound of Music\" ,1966]"
 27 |    ]
 28 |   },
 29 |   {
 30 |    "cell_type": "code",
 31 |    "execution_count": null,
 32 |    "metadata": {},
 33 |    "outputs": [],
 34 |    "source": [
 35 |     "def sortGroupList(list_unsorted, category, category2, short=True):\n",
 36 |     "    listx = []  # records tagged with category2\n",
 37 |     "    listy = []  # records tagged with category\n",
 38 |     "    last_section = 0  # index just past the last matched (id, title, tag) triple\n",
 39 |     "    for i in range(0, len(list_unsorted) - 2, 3):\n",
 40 |     "        if list_unsorted[i + 2] == category:\n",
 41 |     "            listy.append(list_unsorted[i])\n",
 42 |     "            listy.append(list_unsorted[i + 1])\n",
 43 |     "            if not short:\n",
 44 |     "                listy.append(list_unsorted[i + 2])\n",
 45 |     "            last_section = i + 3\n",
 46 |     "        elif list_unsorted[i + 2] == category2:\n",
 47 |     "            listx.append(list_unsorted[i])\n",
 48 |     "            listx.append(list_unsorted[i + 1])\n",
 49 |     "            if not short:\n",
 50 |     "                listx.append(list_unsorted[i + 2])\n",
 51 |     "            last_section = i + 3\n",
 52 |     "    header_category = [' - ' + category + ' - ']\n",
 53 |     "    header_category2 = [' - ' + category2 + ' - ']\n",
 54 |     "    header_category3 = [' - '  + ' - ']\n",
 55 |     "    return header_category + listy + header_category2 + listx + header_category3 + list_unsorted[last_section:]"
 56 |    ]
 57 |   },
 58 |   {
 59 |    "cell_type": "code",
 60 |    "execution_count": null,
 61 |    "metadata": {},
 62 |    "outputs": [],
 63 |    "source": [
 64 |     "sortGroupList(movies, 'good', 'not bad')"
 65 |    ]
 66 |   },
 67 |   {
 68 |    "cell_type": "code",
 69 |    "execution_count": null,
 70 |    "metadata": {},
 71 |    "outputs": [],
 72 |    "source": [
 73 |     "movies = [\n",
 74 |     "1, \"Avatar\" ,2009,\n",
 75 |     "2, \"Titanic\" ,1997,\n",
 76 |     "3, \"Star Wars: The Force Awakens\" ,2015,\n",
 77 |     "4, \"Jurassic World\" ,2015,\n",
 78 |     "5, \"The Avengers\" ,2012,\n",
 79 |     "6, \"Furious 7\" ,2015,\n",
 80 |     "7, \"Avengers: Age of Ultron\" ,2015,\n",
 81 |     "8, \"Harry Potter and the Deathly Hallows – Part 2\" ,2011,\n",
 82 |     "9, \"Frozen\" ,2013,\n",
 83 |     "\n",
 84 |     "\n",
 85 |     "\"The Birth of a Nation\" ,1915,\n",
 86 |     "\"The Birth of a Nation\" ,1940,\n",
 87 |     "\"Gone with the Wind\" ,1940,\n",
 88 |     "\"Gone with the Wind\" ,1963,\n",
 89 |     "\"The Sound of Music\" ,1966]"
 90 |    ]
 91 |   },
 92 |   {
 93 |    "cell_type": "code",
 94 |    "execution_count": null,
 95 |    "metadata": {},
 96 |    "outputs": [],
 97 |    "source": [
 98 |     "print(len(movies))"
 99 |    ]
100 |   },
101 |   {
102 |    "cell_type": "code",
103 |    "execution_count": null,
104 |    "metadata": {},
105 |    "outputs": [],
106 |    "source": [
107 |     "years = list(range(1997, 2015))  # ints, to match the release years stored in movies\n",
108 |     "years"
109 |    ]
110 |   },
111 |   {
112 |    "cell_type": "code",
113 |    "execution_count": null,
114 |    "metadata": {},
115 |    "outputs": [],
116 |    "source": [
117 |     "def sortGroupList(list_unsorted):\n",
118 |     "    listx = []\n",
119 |     "    listy = []\n",
120 |     "    for i in range(0, len(list_unsorted), 3):\n",
121 |     "        if list_unsorted[i + 2] in years:\n",
122 |     "            listy.append(list_unsorted[i])\n",
123 |     "            listy.append(list_unsorted[i + 1])\n",
124 |     "            listy.append(list_unsorted[i + 2])\n",
125 |     "        else:\n",
126 |     "            listx.append(list_unsorted[i])\n",
127 |     "            listx.append(list_unsorted[i + 1])\n",
128 |     "            listx.append(list_unsorted[i + 2])\n",
129 |     "    for i in listy:\n",
130 |     "        print(i)\n",
131 |     "    for i in listx:\n",
132 |     "        print(i)"
133 |    ]
134 |   },
135 |   {
136 |    "cell_type": "code",
137 |    "execution_count": null,
138 |    "metadata": {},
139 |    "outputs": [],
140 |    "source": [
141 |     "sortGroupList(movies)"
142 |    ]
143 |   },
144 |   {
145 |    "cell_type": "code",
146 |    "execution_count": null,
147 |    "metadata": {},
148 |    "outputs": [],
149 |    "source": [
150 |     "import pandas as pd\n",
151 |     "movies = [\n",
152 |     "1, \"Avatar\" ,'good',\n",
153 |     "2, \"Titanic\" ,'not bad',\n",
154 |     "3, \"Star Wars: The Force Awakens\" ,'good',\n",
155 |     "4, \"Jurassic World\" ,'good',\n",
156 |     "5, \"The Avengers\" ,'not bad',\n",
157 |     "6, \"Furious 7\" ,'not bad',\n",
158 |     "7, \"Avengers: Age of Ultron\" ,'good',\n",
159 |     "8, \"Harry Potter and the Deathly Hallows – Part 2\" ,'not bad',\n",
160 |     "9, \"Frozen\" ,'good',\n",
161 |     "\n",
162 |     "\n",
163 |     "\"The Birth of a Nation\" ,1915,\n",
164 |     "\"The Birth of a Nation\" ,1940,\n",
165 |     "\"Gone with the Wind\" ,1940,\n",
166 |     "\"Gone with the Wind\" ,1963,\n",
167 |     "\"The Sound of Music\" ,1966]\n",
168 |     "df = pd.DataFrame(movies)"
169 |    ]
170 |   },
171 |   {
172 |    "cell_type": "code",
173 |    "execution_count": null,
174 |    "metadata": {},
175 |    "outputs": [],
176 |    "source": [
177 |     "df"
178 |    ]
179 |   },
180 |   {
181 |    "cell_type": "code",
182 |    "execution_count": 2,
183 |    "metadata": {},
184 |    "outputs": [],
185 |    "source": [
186 |     "import pandas as pd\n",
187 |     "types = []\n",
188 |     "raw_list = []\n",
189 |     "for e in movies:\n",
190 |     "    types.append(type(e))\n",
191 |     "    if isinstance(e, int):\n",
192 |     "        raw_list.append(1)\n",
193 |     "    else:\n",
194 |     "        raw_list.append(0)\n",
195 |     "df1 = pd.DataFrame({'elem': movies, 'types': types})"
196 |    ]
197 |   },
198 |   {
199 |    "cell_type": "code",
200 |    "execution_count": 3,
201 |    "metadata": {},
202 |    "outputs": [],
203 |    "source": [
204 |     "raw_list = [1,\n",
205 |     " 0,\n",
206 |     " 0,\n",
207 |     " 1,\n",
208 |     " 0,\n",
209 |     " 0,\n",
210 |     " 1,\n",
211 |     " 0,\n",
212 |     " 0,\n",
213 |     " 1,\n",
214 |     " 0,\n",
215 |     " 0,\n",
216 |     " 1,\n",
217 |     " 0,\n",
218 |     " 0,\n",
219 |     " 1,\n",
220 |     " 0,\n",
221 |     " 0,\n",
222 |     " 1,\n",
223 |     " 0,\n",
224 |     " 0,\n",
225 |     " 1,\n",
226 |     " 0,\n",
227 |     " 0,\n",
228 |     " 1,\n",
229 |     " 0,\n",
230 |     " 0,\n",
231 |     " 0,\n",
232 |     " 1,\n",
233 |     " 0,\n",
234 |     " 1,\n",
235 |     " 0,\n",
236 |     " 1,\n",
237 |     " 0,\n",
238 |     " 1,\n",
239 |     " 0,\n",
240 |     " 1]\n",
241 |     "movies = [\n",
242 |     "1, \"Avatar\" ,'good',\n",
243 |     "2, \"Titanic\" ,'not bad',\n",
244 |     "3, \"Star Wars: The Force Awakens\" ,'good',\n",
245 |     "4, \"Jurassic World\" ,'good',\n",
246 |     "5, \"The Avengers\" ,'not bad',\n",
247 |     "6, \"Furious 7\" ,'not bad',\n",
248 |     "7, \"Avengers: Age of Ultron\" ,'good',\n",
249 |     "8, \"Harry Potter and the Deathly Hallows – Part 2\" ,'not bad',\n",
250 |     "9, \"Frozen\" ,'good',\n",
251 |     "\n",
252 |     "\n",
253 |     "\"The Birth of a Nation\" ,1915,\n",
254 |     "\"The Birth of a Nation\" ,1940,\n",
255 |     "\"Gone with the Wind\" ,1940,\n",
256 |     "\"Gone with the Wind\" ,1963,\n",
257 |     "\"The Sound of Music\" ,1966]"
258 |    ]
259 |   },
260 |   {
261 |    "cell_type": "code",
262 |    "execution_count": 4,
263 |    "metadata": {},
264 |    "outputs": [
265 |     {
266 |      "name": "stdout",
267 |      "output_type": "stream",
268 |      "text": [
269 |       "[0, 1]\n",
270 |       "[0, 1]\n",
271 |       "[0, 1]\n",
272 |       "[0, 1]\n",
273 |       "[0, 1]\n"
274 |      ]
275 |     },
276 |     {
277 |      "data": {
278 |       "text/plain": [
279 |        "[[1, 'Avatar', 'good'],\n",
280 |        " [2, 'Titanic', 'not bad'],\n",
281 |        " [3, 'Star Wars: The Force Awakens', 'good'],\n",
282 |        " [4, 'Jurassic World', 'good'],\n",
283 |        " [5, 'The Avengers', 'not bad'],\n",
284 |        " [6, 'Furious 7', 'not bad'],\n",
285 |        " [7, 'Avengers: Age of Ultron', 'good'],\n",
286 |        " [8, 'Harry Potter and the Deathly Hallows – Part 2', 'not bad'],\n",
287 |        " [9, 'Frozen', 'good']]"
288 |       ]
289 |      },
290 |      "execution_count": 4,
291 |      "metadata": {},
292 |      "output_type": "execute_result"
293 |     }
294 |    ],
295 |    "source": [
296 |     "pattern1 = [1, 0, 0]  # int, str, str: id, title, rating\n",
297 |     "pattern2 = [0, 1]  # str, int: title, year\n",
298 |     "\n",
299 |     "len1 = len(pattern1)\n",
300 |     "len2 = len(pattern2)\n",
301 |     "\n",
302 |     "output1 = []  # records matching pattern1\n",
303 |     "output2 = []  # everything else, consumed in chunks of len2\n",
304 |     "\n",
305 |     "while raw_list:\n",
306 |     "    if raw_list[:len1] == pattern1:\n",
307 |     "        output1.append(movies[:len1])\n",
308 |     "        raw_list = raw_list[len1:]\n",
309 |     "        movies = movies[len1:]\n",
310 |     "    else:\n",
311 |     "        print(raw_list[:len2])\n",
312 |     "        output2.append(movies[:len2])\n",
313 |     "        raw_list = raw_list[len2:]\n",
314 |     "        movies = movies[len2:]\n",
315 |     "\n",
316 |     "output1"
317 |    ]
318 |   },
319 |   {
320 |    "cell_type": "code",
321 |    "execution_count": 5,
322 |    "metadata": {},
323 |    "outputs": [
324 |     {
325 |      "data": {
326 |       "text/plain": [
327 |        "[['The Birth of a Nation', 1915],\n",
328 |        " ['The Birth of a Nation', 1940],\n",
329 |        " ['Gone with the Wind', 1940],\n",
330 |        " ['Gone with the Wind', 1963],\n",
331 |        " ['The Sound of Music', 1966]]"
332 |       ]
333 |      },
334 |      "execution_count": 5,
335 |      "metadata": {},
336 |      "output_type": "execute_result"
337 |     }
338 |    ],
339 |    "source": [
340 |     "output2"
341 |    ]
342 |   },
343 |   {
344 |    "cell_type": "code",
345 |    "execution_count": 6,
346 |    "metadata": {},
347 |    "outputs": [],
348 |    "source": [
349 |     "\n",
350 |     "new_list = sorted(output1, key=lambda x: x[2])"
351 |    ]
352 |   },
353 |   {
354 |    "cell_type": "code",
355 |    "execution_count": 7,
356 |    "metadata": {},
357 |    "outputs": [
358 |     {
359 |      "data": {
360 |       "text/plain": [
361 |        "[[1, 'Avatar', 'good'],\n",
362 |        " [3, 'Star Wars: The Force Awakens', 'good'],\n",
363 |        " [4, 'Jurassic World', 'good'],\n",
364 |        " [7, 'Avengers: Age of Ultron', 'good'],\n",
365 |        " [9, 'Frozen', 'good'],\n",
366 |        " [2, 'Titanic', 'not bad'],\n",
367 |        " [5, 'The Avengers', 'not bad'],\n",
368 |        " [6, 'Furious 7', 'not bad'],\n",
369 |        " [8, 'Harry Potter and the Deathly Hallows – Part 2', 'not bad']]"
370 |       ]
371 |      },
372 |      "execution_count": 7,
373 |      "metadata": {},
374 |      "output_type": "execute_result"
375 |     }
376 |    ],
377 |    "source": [
378 |     "new_list"
379 |    ]
380 |   },
381 |   {
382 |    "cell_type": "markdown",
383 |    "metadata": {},
384 |    "source": [
385 |     "## Make groups in a Python list"
386 |    ]
387 |   },
388 |   {
389 |    "cell_type": "markdown",
390 |    "metadata": {},
391 |    "source": [
392 |     "#### Simple grouping"
393 |    ]
394 |   },
395 |   {
396 |    "cell_type": "code",
397 |    "execution_count": null,
398 |    "metadata": {},
399 |    "outputs": [],
400 |    "source": []
401 |   }
402 |  ],
403 |  "metadata": {
404 |   "kernelspec": {
405 |    "display_name": "Python 3",
406 |    "language": "python",
407 |    "name": "python3"
408 |   },
409 |   "language_info": {
410 |    "codemirror_mode": {
411 |     "name": "ipython",
412 |     "version": 3
413 |    },
414 |    "file_extension": ".py",
415 |    "mimetype": "text/x-python",
416 |    "name": "python",
417 |    "nbconvert_exporter": "python",
418 |    "pygments_lexer": "ipython3",
419 |    "version": "3.6.7"
420 |   }
421 |  },
422 |  "nbformat": 4,
423 |  "nbformat_minor": 1
424 | }
425 | 
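
The "Simple grouping" cell at the end of this notebook is empty. A minimal sketch of what such a grouping could look like follows. It uses `itertools.groupby`, which the notebook itself does not use, and a few sample rows in the same `[rank, title, rating]` shape as the notebook's `new_list` output. The names `movies`, `by_rating`, and `groups` are assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows (assumed) in the [rank, title, rating] shape shown in the notebook output.
movies = [
    [1, 'Avatar', 'good'],
    [2, 'Titanic', 'not bad'],
    [3, 'Star Wars: The Force Awakens', 'good'],
    [5, 'The Avengers', 'not bad'],
]

# groupby only merges adjacent rows, so sort by the grouping key first.
by_rating = sorted(movies, key=itemgetter(2))

# Map each rating to the titles that carry it.
groups = {
    rating: [row[1] for row in rows]
    for rating, rows in groupby(by_rating, key=itemgetter(2))
}

print(groups)
```

Because `sorted` is stable, titles keep their original relative order inside each group.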


--------------------------------------------------------------------------------
/notebooks/Python_group_or_sort_list_of_lists_by_common_element.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "## Python group or sort list of lists by common element\n",
  8 |     "\n",
  9 |     " * Grouping a list of lists by position\n",
 10 |     " * Grouping a list of lists by key\n",
 11 |     " * Sorting and grouping flattened lists of lists\n",
 12 |     " * Grouping a list of lists of different sizes\n",
 13 |     " \n",
 14 |     " #### Bonus tips\n",
 15 |     " \n",
 16 |     " \n",
 17 |     " * Sorting the elements of a list of lists\n",
 18 |     " * Sorting maps by key or value\n",
 19 |     " * Iterating over a list every two elements\n",
 20 |     " * Iterating over a list every N elements"
 21 |    ]
 22 |   },
 23 |   {
 24 |    "cell_type": "code",
 25 |    "execution_count": null,
 26 |    "metadata": {},
 27 |    "outputs": [],
 28 |    "source": [
 29 |     "# equally sized list of lists\n",
 30 |     "[[\"Linux\", 0], [\"Windows 7\",1], [\"Ubuntu\",0], [\"Windows 10\",1], [\"MacOS\",2], [\"Linux Mint\",0]]\n",
 31 |     "\n",
 32 |     "# differently sized list of lists\n",
 33 |     "[[\"Linux\", 0, 22], [\"Windows 7\",1 , 5, 6], [\"Ubuntu\",0], [\"Linux Mint\"]]\n",
 34 |     "\n",
 35 |     "# flattened list\n",
 36 |     "[\"Linux\", 0, \"Windows 7\",1, \"Ubuntu\",0, \"Windows 10\",1, \"MacOS\",2, \"Linux Mint\",0]"
 37 |    ]
 38 |   },
 39 |   {
 40 |    "cell_type": "markdown",
 41 |    "metadata": {},
 42 |    "source": [
 43 |     "#### Grouping a list of lists by position (size 2)"
 44 |    ]
 45 |   },
 46 |   {
 47 |    "cell_type": "code",
 48 |    "execution_count": 1,
 49 |    "metadata": {},
 50 |    "outputs": [
 51 |     {
 52 |      "data": {
 53 |       "text/plain": [
 54 |        "[['Linux', 'Ubuntu', 'Linux Mint'], ['Windows 7', 'Windows 10'], ['MacOS']]"
 55 |       ]
 56 |      },
 57 |      "execution_count": 1,
 58 |      "metadata": {},
 59 |      "output_type": "execute_result"
 60 |     }
 61 |    ],
 62 |    "source": [
 63 |     "# equally sized list of lists\n",
 64 |     "raw_list = [[\"Linux\", 0], [\"Windows 7\",1], [\"Ubuntu\",0], [\"Windows 10\",1], [\"MacOS\",2], [\"Linux Mint\",0]]\n",
 65 |     "\n",
 66 |     "keys = set(map(lambda x:x[1], raw_list))\n",
 67 |     "new_list = [[y[0] for y in raw_list if y[1]==x] for x in keys]\n",
 68 |     "\n",
 69 |     "new_list"
 70 |    ]
 71 |   },
 72 |   {
 73 |    "cell_type": "code",
 74 |    "execution_count": 2,
 75 |    "metadata": {},
 76 |    "outputs": [
 77 |     {
 78 |      "data": {
 79 |       "text/plain": [
 80 |        "{0: ['Linux', 'Ubuntu', 'Linux Mint'],\n",
 81 |        " 1: ['Windows 7', 'Windows 10'],\n",
 82 |        " 2: ['MacOS']}"
 83 |       ]
 84 |      },
 85 |      "execution_count": 2,
 86 |      "metadata": {},
 87 |      "output_type": "execute_result"
 88 |     }
 89 |    ],
 90 |    "source": [
 91 |     "\n",
 92 |     "raw_list = [[\"Linux\", 0], [\"Windows 7\",1], [\"Ubuntu\",0], [\"Windows 10\",1], [\"MacOS\",2], [\"Linux Mint\",0]]\n",
 93 |     "\n",
 94 |     "keys = set(map(lambda x:x[1], raw_list))\n",
 95 |     "new_list = {x:[y[0] for y in raw_list if y[1]==x] for x in keys}\n",
 96 |     "\n",
 97 |     "new_list"
 98 |    ]
 99 |   },
100 |   {
101 |    "cell_type": "markdown",
102 |    "metadata": {},
103 |    "source": [
104 |     "#### Grouping a list of lists by position (size 4)"
105 |    ]
106 |   },
107 |   {
108 |    "cell_type": "code",
109 |    "execution_count": 5,
110 |    "metadata": {},
111 |    "outputs": [
112 |     {
113 |      "data": {
114 |       "text/plain": [
115 |        "{'Ubuntu': [['Xenial Xerus', 0.4], ['Bionic Beaver', 0]],\n",
116 |        " 'Linux Mint': [['Rosa', 17.3], ['Sonya', 18.2]]}"
117 |       ]
118 |      },
119 |      "execution_count": 5,
120 |      "metadata": {},
121 |      "output_type": "execute_result"
122 |     }
123 |    ],
124 |    "source": [
125 |     "raw_list = [\n",
126 |     "    ['Linux Mint', 17, 'Rosa', 17.3], \n",
127 |     "    ['Linux Mint', 18, 'Sonya', 18.2],\n",
128 |     "    ['Ubuntu', 16, 'Xenial Xerus', 0.4],\n",
129 |     "    ['Ubuntu', 18, 'Bionic Beaver', 0]]\n",
130 |     "\n",
131 |     "keys = set(map(lambda x:x[0], raw_list))\n",
132 |     "unsorted_map = {x:[y[2:] for y in raw_list if y[0]==x] for x in keys}\n",
133 |     "\n",
134 |     "unsorted_map"
135 |    ]
136 |   },
137 |   {
138 |    "cell_type": "markdown",
139 |    "metadata": {},
140 |    "source": [
141 |     "#### List of lists of different sizes"
142 |    ]
143 |   },
144 |   {
145 |    "cell_type": "code",
146 |    "execution_count": 7,
147 |    "metadata": {},
148 |    "outputs": [],
149 |    "source": [
150 |     "raw_list = [\n",
151 |     "    ['Linux Mint', 17, 'Rosa', 17.3], \n",
152 |     "    ['Linux Mint', 18, 'Sonya', 18.2],\n",
153 |     "    ['Ubuntu', 16, 'Xenial Xerus', 0.4],\n",
154 |     "    ['Ubuntu', 18, 'Bionic Beaver', 0],\n",
155 |     "    \n",
156 |     "    ['Windows', 7, 'Home'],\n",
157 |     "    ['Windows', 7, 'Profesional'],\n",
158 |     "    ['Windows', 10, 'Ultimate']\n",
159 |     "]"
160 |    ]
161 |   },
162 |   {
163 |    "cell_type": "code",
164 |    "execution_count": 9,
165 |    "metadata": {},
166 |    "outputs": [
167 |     {
168 |      "data": {
169 |       "text/plain": [
170 |        "{'Ubuntu': [[16, 'Xenial Xerus'], [18, 'Bionic Beaver']],\n",
171 |        " 'Linux Mint': [[17, 'Rosa'], [18, 'Sonya']],\n",
172 |        " 'Windows': [[7, 'Home'], [7, 'Profesional'], [10, 'Ultimate']]}"
173 |       ]
174 |      },
175 |      "execution_count": 9,
176 |      "metadata": {},
177 |      "output_type": "execute_result"
178 |     }
179 |    ],
180 |    "source": [
181 |     "keys = set(map(lambda x:x[0], raw_list))\n",
182 |     "unsorted_map = {x:[y[1:3] for y in raw_list if y[0]==x] for x in keys}\n",
183 |     "unsorted_map"
184 |    ]
185 |   },
186 |   {
187 |    "cell_type": "markdown",
188 |    "metadata": {},
189 |    "source": [
190 |     "#### Sort a Python map (dict) by key or value"
191 |    ]
192 |   },
193 |   {
194 |    "cell_type": "code",
195 |    "execution_count": 10,
196 |    "metadata": {},
197 |    "outputs": [
198 |     {
199 |      "data": {
200 |       "text/plain": [
201 |        "['Linux Mint', 'Ubuntu', 'Windows']"
202 |       ]
203 |      },
204 |      "execution_count": 10,
205 |      "metadata": {},
206 |      "output_type": "execute_result"
207 |     }
208 |    ],
209 |    "source": [
210 |     "sorted(unsorted_map.keys())"
211 |    ]
212 |   },
213 |   {
214 |    "cell_type": "code",
215 |    "execution_count": 11,
216 |    "metadata": {},
217 |    "outputs": [
218 |     {
219 |      "data": {
220 |       "text/plain": [
221 |        "[[[7, 'Home'], [7, 'Profesional'], [10, 'Ultimate']],\n",
222 |        " [[16, 'Xenial Xerus'], [18, 'Bionic Beaver']],\n",
223 |        " [[17, 'Rosa'], [18, 'Sonya']]]"
224 |       ]
225 |      },
226 |      "execution_count": 11,
227 |      "metadata": {},
228 |      "output_type": "execute_result"
229 |     }
230 |    ],
231 |    "source": [
232 |     "sorted(unsorted_map.values())"
233 |    ]
234 |   },
235 |   {
236 |    "cell_type": "markdown",
237 |    "metadata": {},
238 |    "source": [
239 |     "### Sort list of lists by key"
240 |    ]
241 |   },
242 |   {
243 |    "cell_type": "code",
244 |    "execution_count": 12,
245 |    "metadata": {},
246 |    "outputs": [
247 |     {
248 |      "name": "stdout",
249 |      "output_type": "stream",
250 |      "text": [
251 |       "Linux Mint: [[17, 'Rosa'], [18, 'Sonya']]\n",
252 |       "Ubuntu: [[16, 'Xenial Xerus'], [18, 'Bionic Beaver']]\n",
253 |       "Windows: [[7, 'Home'], [7, 'Profesional'], [10, 'Ultimate']]\n"
254 |      ]
255 |     }
256 |    ],
257 |    "source": [
258 |     "for key in sorted(unsorted_map.keys()):\n",
259 |     "    print(\"%s: %s\" % (key, unsorted_map[key]))"
260 |    ]
261 |   },
262 |   {
263 |    "cell_type": "code",
264 |    "execution_count": 13,
265 |    "metadata": {},
266 |    "outputs": [
267 |     {
268 |      "name": "stdout",
269 |      "output_type": "stream",
270 |      "text": [
271 |       "Windows: [[7, 'Home'], [7, 'Profesional'], [10, 'Ultimate']]\n",
272 |       "Ubuntu: [[16, 'Xenial Xerus'], [18, 'Bionic Beaver']]\n",
273 |       "Linux Mint: [[17, 'Rosa'], [18, 'Sonya']]\n"
274 |      ]
275 |     }
276 |    ],
277 |    "source": [
278 |     "for key in sorted(unsorted_map.keys(), reverse=True):\n",
279 |     "    print(\"%s: %s\" % (key, unsorted_map[key]))"
280 |    ]
281 |   },
282 |   {
283 |    "cell_type": "markdown",
284 |    "metadata": {},
285 |    "source": [
286 |     "### Sort and group flattened lists of lists"
287 |    ]
288 |   },
289 |   {
290 |    "cell_type": "code",
291 |    "execution_count": 17,
292 |    "metadata": {},
293 |    "outputs": [],
294 |    "source": [
295 |     "os_list = [\n",
296 |     "    'Ubuntu 18',\n",
297 |     "    'This article informs you about Ubuntu 18.04 release date,',\n",
298 |     "    'Released',\n",
299 |     "    \n",
300 |     "    'Ubuntu 20',\n",
301 |     "    'The desktop image allows you to try Ubuntu without changing y..',\n",
302 |     "    'Not Released',\n",
303 |     "    \n",
304 |     "    'Ubuntu 19',\n",
305 |     "    'Ubuntu is an open source software operating system that runs from',\n",
306 |     "    'Released',\n",
307 |     "    \n",
308 |     "    'Linux mint 18',\n",
309 |     "    'Linux Mint is an elegant, easy to use, up to date and comfortable',\n",
310 |     "    'Released',\n",
311 |     "    \n",
312 |     "    'Linux mint 20',\n",
313 |     "    'Suggestion: For Mint 20 to go full Debian',\n",
314 |     "    'Not Released',\n",
315 |     "    \n",
316 |     "    'Linux mint 19',\n",
317 |     "    'Linux Mint 19 is a long term support release which will be supported until 2023',\n",
318 |     "    'Released',\n",
319 |     "\n",
320 |     "    'Windows 7',\n",
321 |     "    'Windows 7 is a personal computer operating system that was ..',\n",
322 |     "    'Windows 10',\n",
323 |     "    'Windows 10 is a series of personal computer operating systems',\n",
324 |     "    \"Windows XP\",\n",
325 |     "    'Windows XP is old, and Microsoft no longer provides official support']"
326 |    ]
327 |   },
328 |   {
329 |    "cell_type": "code",
330 |    "execution_count": 14,
331 |    "metadata": {},
332 |    "outputs": [
333 |     {
334 |      "name": "stdout",
335 |      "output_type": "stream",
336 |      "text": [
337 |       "[1, 2]\n",
338 |       "[3, 4]\n",
339 |       "[5, 6]\n"
340 |      ]
341 |     }
342 |    ],
343 |    "source": [
344 |     "# iterating over every two elements\n",
345 |     "test_list = [1, 2, 3, 4, 5, 6]\n",
346 |     "\n",
347 |     "for i in range(0, len(test_list), 2):\n",
348 |     "    print(test_list[i:i+2])"
349 |    ]
350 |   },
351 |   {
352 |    "cell_type": "code",
353 |    "execution_count": 15,
354 |    "metadata": {},
355 |    "outputs": [
356 |     {
357 |      "name": "stdout",
358 |      "output_type": "stream",
359 |      "text": [
360 |       "[0, 1, 2]\n",
361 |       "[3, 4, 5]\n",
362 |       "[6, 7, 8]\n",
363 |       "[9]\n"
364 |      ]
365 |     }
366 |    ],
367 |    "source": [
368 |     "# iterating over every N elements\n",
369 |     "test_list = list(range(0, 10))\n",
370 |     "\n",
371 |     "for i in range(0, len(test_list), 3):\n",
372 |     "    print(test_list[i:i+3])"
373 |    ]
374 |   },
375 |   {
376 |    "cell_type": "code",
377 |    "execution_count": 18,
378 |    "metadata": {},
379 |    "outputs": [
380 |     {
381 |      "data": {
382 |       "text/plain": [
383 |        "['Windows 7',\n",
384 |        " 'Windows 7 is a personal computer operating system that was ..',\n",
385 |        " 'Windows 10',\n",
386 |        " 'Windows 10 is a series of personal computer operating systems',\n",
387 |        " 'Windows XP',\n",
388 |        " 'Windows XP is old, and Microsoft no longer provides official support']"
389 |       ]
390 |      },
391 |      "execution_count": 18,
392 |      "metadata": {},
393 |      "output_type": "execute_result"
394 |     }
395 |    ],
396 |    "source": [
397 |     "list3 = []\n",
398 |     "last = 0\n",
399 |     "for i in range(0, len(os_list), 3):\n",
400 |     "    if i+2 < len(os_list) and os_list[i+2] in ['Released', 'Not Released']:\n",
401 |     "        list3.append(os_list[i:i+3])\n",
402 |     "        last = i+3\n",
403 |     "list2 = os_list[last:]\n",
404 |     "list2"
405 |    ]
406 |   },
407 |   {
408 |    "cell_type": "code",
409 |    "execution_count": 19,
410 |    "metadata": {},
411 |    "outputs": [
412 |     {
413 |      "data": {
414 |       "text/plain": [
415 |        "[['Ubuntu 18',\n",
416 |        "  'This article informs you about Ubuntu 18.04 release date,',\n",
417 |        "  'Released'],\n",
418 |        " ['Ubuntu 20',\n",
419 |        "  'The desktop image allows you to try Ubuntu without changing y..',\n",
420 |        "  'Not Released'],\n",
421 |        " ['Ubuntu 19',\n",
422 |        "  'Ubuntu is an open source software operating system that runs from',\n",
423 |        "  'Released'],\n",
424 |        " ['Linux mint 18',\n",
425 |        "  'Linux Mint is an elegant, easy to use, up to date and comfortable',\n",
426 |        "  'Released'],\n",
427 |        " ['Linux mint 20',\n",
428 |        "  'Suggestion: For Mint 20 to go full Debian',\n",
429 |        "  'Not Released'],\n",
430 |        " ['Linux mint 19',\n",
431 |        "  'Linux Mint 19 is a long term support release which will be supported until 2023',\n",
432 |        "  'Released']]"
433 |       ]
434 |      },
435 |      "execution_count": 19,
436 |      "metadata": {},
437 |      "output_type": "execute_result"
438 |     }
439 |    ],
440 |    "source": [
441 |     "list3"
442 |    ]
443 |   },
444 |   {
445 |    "cell_type": "code",
446 |    "execution_count": 20,
447 |    "metadata": {},
448 |    "outputs": [],
449 |    "source": [
450 |     "def sortList(working_list, category, category2):\n",
451 |     "    listx = []  # name/description pairs whose status equals category2\n",
452 |     "    listy = []  # name/description pairs whose status equals category\n",
453 |     "    last_section = 0  # index of the last status entry that matched\n",
454 |     "    for i in range(0, len(working_list) - 2, 3):  # use the argument, not the global os_list\n",
455 |     "        if working_list[i + 2] == category:\n",
456 |     "            listy.append(working_list[i])\n",
457 |     "            listy.append(working_list[i + 1])\n",
458 |     "            last_section = i + 2\n",
459 |     "        elif working_list[i + 2] == category2:\n",
460 |     "            listx.append(working_list[i])\n",
461 |     "            listx.append(working_list[i + 1])\n",
462 |     "            last_section = i + 2\n",
463 |     "\n",
464 |     "    if last_section > 0:\n",
465 |     "        listz = working_list[(last_section + 1):]  # everything after the last matched triple\n",
466 |     "    else:\n",
467 |     "        listz = working_list[(last_section):]\n",
468 |     "\n",
469 |     "    return listx, listy, listz"
470 |    ]
471 |   },
472 |   {
473 |    "cell_type": "code",
474 |    "execution_count": 21,
475 |    "metadata": {},
476 |    "outputs": [
477 |     {
478 |      "data": {
479 |       "text/plain": [
480 |        "['Ubuntu 20',\n",
481 |        " 'The desktop image allows you to try Ubuntu without changing y..',\n",
482 |        " 'Linux mint 20',\n",
483 |        " 'Suggestion: For Mint 20 to go full Debian']"
484 |       ]
485 |      },
486 |      "execution_count": 21,
487 |      "metadata": {},
488 |      "output_type": "execute_result"
489 |     }
490 |    ],
491 |    "source": [
492 |     "listx, listy, listz = sortList(os_list, 'Released', 'Not Released')\n",
493 |     "listx"
494 |    ]
495 |   },
496 |   {
497 |    "cell_type": "code",
498 |    "execution_count": 22,
499 |    "metadata": {},
500 |    "outputs": [
501 |     {
502 |      "data": {
503 |       "text/plain": [
504 |        "['Ubuntu 18',\n",
505 |        " 'This article informs you about Ubuntu 18.04 release date,',\n",
506 |        " 'Ubuntu 19',\n",
507 |        " 'Ubuntu is an open source software operating system that runs from',\n",
508 |        " 'Linux mint 18',\n",
509 |        " 'Linux Mint is an elegant, easy to use, up to date and comfortable',\n",
510 |        " 'Linux mint 19',\n",
511 |        " 'Linux Mint 19 is a long term support release which will be supported until 2023']"
512 |       ]
513 |      },
514 |      "execution_count": 22,
515 |      "metadata": {},
516 |      "output_type": "execute_result"
517 |     }
518 |    ],
519 |    "source": [
520 |     "listy"
521 |    ]
522 |   },
523 |   {
524 |    "cell_type": "code",
525 |    "execution_count": 23,
526 |    "metadata": {},
527 |    "outputs": [
528 |     {
529 |      "data": {
530 |       "text/plain": [
531 |        "['Windows 7',\n",
532 |        " 'Windows 7 is a personal computer operating system that was ..',\n",
533 |        " 'Windows 10',\n",
534 |        " 'Windows 10 is a series of personal computer operating systems',\n",
535 |        " 'Windows XP',\n",
536 |        " 'Windows XP is old, and Microsoft no longer provides official support']"
537 |       ]
538 |      },
539 |      "execution_count": 23,
540 |      "metadata": {},
541 |      "output_type": "execute_result"
542 |     }
543 |    ],
544 |    "source": [
545 |     "listz"
546 |    ]
547 |   },
548 |   {
549 |    "cell_type": "markdown",
550 |    "metadata": {},
551 |    "source": [
552 |     "## Generic solution for a flattened list"
553 |    ]
554 |   },
555 |   {
556 |    "cell_type": "code",
557 |    "execution_count": 24,
558 |    "metadata": {},
559 |    "outputs": [
560 |     {
561 |      "data": {
562 |       "text/plain": [
563 |        "[['Ubuntu 18',\n",
564 |        "  'This article informs you about Ubuntu 18.04 release date,',\n",
565 |        "  'Released'],\n",
566 |        " ['Ubuntu 20',\n",
567 |        "  'The desktop image allows you to try Ubuntu without changing y..',\n",
568 |        "  'Not Released'],\n",
569 |        " ['Ubuntu 19',\n",
570 |        "  'Ubuntu is an open source software operating system that runs from',\n",
571 |        "  'Released'],\n",
572 |        " ['Linux mint 18',\n",
573 |        "  'Linux Mint is an elegant, easy to use, up to date and comfortable',\n",
574 |        "  'Released'],\n",
575 |        " ['Linux mint 20',\n",
576 |        "  'Suggestion: For Mint 20 to go full Debian',\n",
577 |        "  'Not Released'],\n",
578 |        " ['Linux mint 19',\n",
579 |        "  'Linux Mint 19 is a long term support release which will be supported until 2023',\n",
580 |        "  'Released']]"
581 |       ]
582 |      },
583 |      "execution_count": 24,
584 |      "metadata": {},
585 |      "output_type": "execute_result"
586 |     }
587 |    ],
588 |    "source": [
589 |     "os_list = [\n",
590 |     "    \n",
591 |     "    \n",
592 |     "    'Windows 10',\n",
593 |     "    'Windows 10 is a series of personal computer operating systems',\n",
594 |     "    \"Windows XP\",\n",
595 |     "    'Windows XP is old, and Microsoft no longer provides official support',\n",
596 |     "    \n",
597 |     "    'Ubuntu 18',\n",
598 |     "    'This article informs you about Ubuntu 18.04 release date,',\n",
599 |     "    'Released',\n",
600 |     "\n",
601 |     "    'Ubuntu 20',\n",
602 |     "    'The desktop image allows you to try Ubuntu without changing y..',\n",
603 |     "    'Not Released',\n",
604 |     "\n",
605 |     "    'Windows 7',\n",
606 |     "    'Windows 7 is a personal computer operating system that was ..',\n",
607 |     "\n",
608 |     "    'Ubuntu 19',\n",
609 |     "    'Ubuntu is an open source software operating system that runs from',\n",
610 |     "    'Released',\n",
611 |     "\n",
612 |     "    'Linux mint 18',\n",
613 |     "    'Linux Mint is an elegant, easy to use, up to date and comfortable',\n",
614 |     "    'Released',\n",
615 |     "\n",
616 |     "    'Linux mint 20',\n",
617 |     "    'Suggestion: For Mint 20 to go full Debian',\n",
618 |     "    'Not Released',\n",
619 |     "\n",
620 |     "    'Linux mint 19',\n",
621 |     "    'Linux Mint 19 is a long term support release which will be supported until 2023',\n",
622 |     "    'Released',\n",
623 |     "\n",
624 |     "]\n",
625 |     "\n",
626 |     "list3 = []\n",
627 |     "list2 = []\n",
628 |     "cur = 0\n",
629 |     "\n",
630 |     "os_list_tmp = os_list\n",
631 |     "\n",
632 |     "while os_list_tmp:  # keep going until every element has been consumed\n",
633 |     "    # a status in the third slot marks a triple; anything else is a name/description pair\n",
634 |     "    if len(os_list_tmp) > 2 and os_list_tmp[2] in ['Released', 'Not Released']:\n",
635 |     "        list3.append(os_list_tmp[:3])\n",
636 |     "        cur = 3\n",
637 |     "    else:\n",
638 |     "        list2.append(os_list_tmp[:2])\n",
639 |     "        cur = 2\n",
640 |     "    os_list_tmp = os_list_tmp[cur:]\n",
641 |     "list3"
642 |    ]
643 |   },
644 |   {
645 |    "cell_type": "code",
646 |    "execution_count": 25,
647 |    "metadata": {},
648 |    "outputs": [
649 |     {
650 |      "data": {
651 |       "text/plain": [
652 |        "[['Windows 10',\n",
653 |        "  'Windows 10 is a series of personal computer operating systems'],\n",
654 |        " ['Windows XP',\n",
655 |        "  'Windows XP is old, and Microsoft no longer provides official support'],\n",
656 |        " ['Windows 7',\n",
657 |        "  'Windows 7 is a personal computer operating system that was ..']]"
658 |       ]
659 |      },
660 |      "execution_count": 25,
661 |      "metadata": {},
662 |      "output_type": "execute_result"
663 |     }
664 |    ],
665 |    "source": [
666 |     "list2"
667 |    ]
668 |   },
669 |   {
670 |    "cell_type": "code",
671 |    "execution_count": null,
672 |    "metadata": {},
673 |    "outputs": [],
674 |    "source": []
675 |   },
676 |   {
677 |    "cell_type": "code",
678 |    "execution_count": null,
679 |    "metadata": {},
680 |    "outputs": [],
681 |    "source": []
682 |   }
683 |  ],
684 |  "metadata": {
685 |   "kernelspec": {
686 |    "display_name": "Python 3",
687 |    "language": "python",
688 |    "name": "python3"
689 |   },
690 |   "language_info": {
691 |    "codemirror_mode": {
692 |     "name": "ipython",
693 |     "version": 3
694 |    },
695 |    "file_extension": ".py",
696 |    "mimetype": "text/x-python",
697 |    "name": "python",
698 |    "nbconvert_exporter": "python",
699 |    "pygments_lexer": "ipython3",
700 |    "version": "3.6.7"
701 |   }
702 |  },
703 |  "nbformat": 4,
704 |  "nbformat_minor": 2
705 | }
706 | 
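
The grouping cells in this notebook build a `set` of keys and then rescan `raw_list` once per key. As an alternative, the sketch below groups the same size-2 records in a single pass with `collections.defaultdict`. This is a swapped-in technique, not the notebook's own method, and the names `grouped` and `result` are assumptions made for illustration.

```python
from collections import defaultdict

# Same equally sized list of lists used in the notebook's first grouping cell.
raw_list = [["Linux", 0], ["Windows 7", 1], ["Ubuntu", 0],
            ["Windows 10", 1], ["MacOS", 2], ["Linux Mint", 0]]

# One pass: append each name to the bucket for its key.
grouped = defaultdict(list)
for name, key in raw_list:
    grouped[key].append(name)

# Convert to a plain dict, ordered by key, for display.
result = {key: grouped[key] for key in sorted(grouped)}
print(result)
```

The single pass touches each record once, which matters mostly when there are many distinct keys. The comprehension-based version in the notebook remains shorter for small lists.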


--------------------------------------------------------------------------------
/notebooks/Q&A/Questions_and_Answers_1_Improve_OCR_and_tabula_range.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "## Questions and Answers 1: Improve OCR and tabula range"
  8 |    ]
  9 |   },
 10 |   {
 11 |    "cell_type": "markdown",
 12 |    "metadata": {},
 13 |    "source": [
 14 |     "## Question 1\n",
 15 |     "\n",
 16 |     "#### Extract tabular data from PDF with Python - Tabula, Camelot, PyPDF2\n",
 17 |     "\n",
 18 |     "https://youtu.be/702lkQbZx50\n",
 19 |     "\n",
 20 |     "![Question 1](../images/Selection_177.png)\n",
 21 |     "\n"
 22 |    ]
 23 |   },
 24 |   {
 25 |    "cell_type": "code",
 26 |    "execution_count": 1,
 27 |    "metadata": {},
 28 |    "outputs": [
 29 |     {
 30 |      "data": {
 31 |       "text/plain": [
 32 |        "(29, 4)"
 33 |       ]
 34 |      },
 35 |      "execution_count": 1,
 36 |      "metadata": {},
 37 |      "output_type": "execute_result"
 38 |     }
 39 |    ],
 40 |    "source": [
 41 |     "from tabula import read_pdf\n",
 42 |     "df = read_pdf(\"http://www.uncledavesenterprise.com/file/health/Food%20Calories%20List.pdf\", pages=3)\n",
 43 |     "df.shape"
 44 |    ]
 45 |   },
 46 |   {
 47 |    "cell_type": "code",
 48 |    "execution_count": 2,
 49 |    "metadata": {},
 50 |    "outputs": [
 51 |     {
 52 |      "data": {
 53 |       "text/plain": [
 54 |        "(69, 5)"
 55 |       ]
 56 |      },
 57 |      "execution_count": 2,
 58 |      "metadata": {},
 59 |      "output_type": "execute_result"
 60 |     }
 61 |    ],
 62 |    "source": [
 63 |     "# specify a page range: pages 1 to 3\n",
 64 |     "df = read_pdf(\"http://www.uncledavesenterprise.com/file/health/Food%20Calories%20List.pdf\", pages='1-3')\n",
 65 |     "df.shape"
 66 |    ]
 67 |   },
 68 |   {
 69 |    "cell_type": "code",
 70 |    "execution_count": 3,
 71 |    "metadata": {},
 72 |    "outputs": [
 73 |     {
 74 |      "data": {
 75 |       "text/html": [
 76 |        "<div>\n",
 77 |        "<style scoped>\n",
 78 |        "    .dataframe tbody tr th:only-of-type {\n",
 79 |        "        vertical-align: middle;\n",
 80 |        "    }\n",
 81 |        "\n",
 82 |        "    .dataframe tbody tr th {\n",
 83 |        "        vertical-align: top;\n",
 84 |        "    }\n",
 85 |        "\n",
 86 |        "    .dataframe thead th {\n",
 87 |        "        text-align: right;\n",
 88 |        "    }\n",
 89 |        "</style>\n",
 90 |        "<table border=\"1\" class=\"dataframe\">\n",
 91 |        "  <thead>\n",
 92 |        "    <tr style=\"text-align: right;\">\n",
 93 |        "      <th></th>\n",
 94 |        "      <th>BREADS &amp; CEREALS</th>\n",
 95 |        "      <th>Portion size *</th>\n",
 96 |        "      <th>per 100 grams (3.5 oz)</th>\n",
 97 |        "      <th>Unnamed: 3</th>\n",
 98 |        "      <th>energy content</th>\n",
 99 |        "    </tr>\n",
100 |        "  </thead>\n",
101 |        "  <tbody>\n",
102 |        "    <tr>\n",
103 |        "      <th>0</th>\n",
104 |        "      <td>Bagel ( 1 average )</td>\n",
105 |        "      <td>140 cals (45g)</td>\n",
106 |        "      <td>310 cals</td>\n",
107 |        "      <td>NaN</td>\n",
108 |        "      <td>Medium</td>\n",
109 |        "    </tr>\n",
110 |        "    <tr>\n",
111 |        "      <th>1</th>\n",
112 |        "      <td>Biscuit digestives</td>\n",
113 |        "      <td>86 cals (per biscuit)</td>\n",
114 |        "      <td>480 cals</td>\n",
115 |        "      <td>NaN</td>\n",
116 |        "      <td>High</td>\n",
117 |        "    </tr>\n",
118 |        "    <tr>\n",
119 |        "      <th>2</th>\n",
120 |        "      <td>Jaffa cake</td>\n",
121 |        "      <td>48 cals (per biscuit)</td>\n",
122 |        "      <td>370 cals</td>\n",
123 |        "      <td>NaN</td>\n",
124 |        "      <td>Med-High</td>\n",
125 |        "    </tr>\n",
126 |        "    <tr>\n",
127 |        "      <th>3</th>\n",
128 |        "      <td>Bread white (thick slice)</td>\n",
129 |        "      <td>96  cals (1 slice 40g)</td>\n",
130 |        "      <td>240 cals</td>\n",
131 |        "      <td>NaN</td>\n",
132 |        "      <td>Medium</td>\n",
133 |        "    </tr>\n",
134 |        "    <tr>\n",
135 |        "      <th>4</th>\n",
136 |        "      <td>Bread wholemeal (thick)</td>\n",
137 |        "      <td>88  cals (1 slice 40g)</td>\n",
138 |        "      <td>220 cals</td>\n",
139 |        "      <td>NaN</td>\n",
140 |        "      <td>Low-med</td>\n",
141 |        "    </tr>\n",
142 |        "  </tbody>\n",
143 |        "</table>\n",
144 |        "</div>"
145 |       ],
146 |       "text/plain": [
147 |        "            BREADS & CEREALS          Portion size * per 100 grams (3.5 oz)  \\\n",
148 |        "0        Bagel ( 1 average )          140 cals (45g)               310 cals   \n",
149 |        "1         Biscuit digestives   86 cals (per biscuit)               480 cals   \n",
150 |        "2                 Jaffa cake   48 cals (per biscuit)               370 cals   \n",
151 |        "3  Bread white (thick slice)  96  cals (1 slice 40g)               240 cals   \n",
152 |        "4    Bread wholemeal (thick)  88  cals (1 slice 40g)               220 cals   \n",
153 |        "\n",
154 |        "  Unnamed: 3 energy content  \n",
155 |        "0        NaN         Medium  \n",
156 |        "1        NaN           High  \n",
157 |        "2        NaN       Med-High  \n",
158 |        "3        NaN         Medium  \n",
159 |        "4        NaN        Low-med  "
160 |       ]
161 |      },
162 |      "execution_count": 3,
163 |      "metadata": {},
164 |      "output_type": "execute_result"
165 |     }
166 |    ],
167 |    "source": [
168 |     "df.head()"
169 |    ]
170 |   },
171 |   {
172 |    "cell_type": "code",
173 |    "execution_count": 4,
174 |    "metadata": {},
175 |    "outputs": [
176 |     {
177 |      "data": {
178 |       "text/html": [
179 |        "<div>\n",
180 |        "<style scoped>\n",
181 |        "    .dataframe tbody tr th:only-of-type {\n",
182 |        "        vertical-align: middle;\n",
183 |        "    }\n",
184 |        "\n",
185 |        "    .dataframe tbody tr th {\n",
186 |        "        vertical-align: top;\n",
187 |        "    }\n",
188 |        "\n",
189 |        "    .dataframe thead th {\n",
190 |        "        text-align: right;\n",
191 |        "    }\n",
192 |        "</style>\n",
193 |        "<table border=\"1\" class=\"dataframe\">\n",
194 |        "  <thead>\n",
195 |        "    <tr style=\"text-align: right;\">\n",
196 |        "      <th></th>\n",
197 |        "      <th>BREADS &amp; CEREALS</th>\n",
198 |        "      <th>Portion size *</th>\n",
199 |        "      <th>per 100 grams (3.5 oz)</th>\n",
200 |        "      <th>Unnamed: 3</th>\n",
201 |        "      <th>energy content</th>\n",
202 |        "    </tr>\n",
203 |        "  </thead>\n",
204 |        "  <tbody>\n",
205 |        "    <tr>\n",
206 |        "      <th>64</th>\n",
207 |        "      <td>Sausage pork fried</td>\n",
208 |        "      <td>250 cals</td>\n",
209 |        "      <td>320 cals</td>\n",
210 |        "      <td>High</td>\n",
211 |        "      <td>NaN</td>\n",
212 |        "    </tr>\n",
213 |        "    <tr>\n",
214 |        "      <th>65</th>\n",
215 |        "      <td>Sausage pork grilled</td>\n",
216 |        "      <td>220 cals</td>\n",
217 |        "      <td>280 cals</td>\n",
218 |        "      <td>Med-High</td>\n",
219 |        "      <td>NaN</td>\n",
220 |        "    </tr>\n",
221 |        "    <tr>\n",
222 |        "      <th>66</th>\n",
223 |        "      <td>Sausage roll</td>\n",
224 |        "      <td>290 cals</td>\n",
225 |        "      <td>480 cals</td>\n",
226 |        "      <td>High</td>\n",
227 |        "      <td>NaN</td>\n",
228 |        "    </tr>\n",
229 |        "    <tr>\n",
230 |        "      <th>67</th>\n",
231 |        "      <td>Scampi fried in oil</td>\n",
232 |        "      <td>400 cals</td>\n",
233 |        "      <td>340 cals</td>\n",
234 |        "      <td>High</td>\n",
235 |        "      <td>NaN</td>\n",
236 |        "    </tr>\n",
237 |        "    <tr>\n",
238 |        "      <th>68</th>\n",
239 |        "      <td>Steak &amp; kidney pie</td>\n",
240 |        "      <td>400 cals</td>\n",
241 |        "      <td>350 cals</td>\n",
242 |        "      <td>High</td>\n",
243 |        "      <td>NaN</td>\n",
244 |        "    </tr>\n",
245 |        "  </tbody>\n",
246 |        "</table>\n",
247 |        "</div>"
248 |       ],
249 |       "text/plain": [
250 |        "        BREADS & CEREALS Portion size * per 100 grams (3.5 oz) Unnamed: 3  \\\n",
251 |        "64    Sausage pork fried       250 cals               320 cals       High   \n",
252 |        "65  Sausage pork grilled       220 cals               280 cals   Med-High   \n",
253 |        "66          Sausage roll       290 cals               480 cals       High   \n",
254 |        "67   Scampi fried in oil       400 cals               340 cals       High   \n",
255 |        "68    Steak & kidney pie       400 cals               350 cals       High   \n",
256 |        "\n",
257 |        "   energy content  \n",
258 |        "64            NaN  \n",
259 |        "65            NaN  \n",
260 |        "66            NaN  \n",
261 |        "67            NaN  \n",
262 |        "68            NaN  "
263 |       ]
264 |      },
265 |      "execution_count": 4,
266 |      "metadata": {},
267 |      "output_type": "execute_result"
268 |     }
269 |    ],
270 |    "source": [
271 |     "df.tail()"
272 |    ]
273 |   },
274 |   {
275 |    "cell_type": "code",
276 |    "execution_count": 5,
277 |    "metadata": {},
278 |    "outputs": [
279 |     {
280 |      "data": {
281 |       "text/plain": [
282 |        "(69, 5)"
283 |       ]
284 |      },
285 |      "execution_count": 5,
286 |      "metadata": {},
287 |      "output_type": "execute_result"
288 |     }
289 |    ],
290 |    "source": [
291 |     "# build a page-range string covering pages 1 to 3\n",
292 |     "pages = str(1) + '-' + str(3)  # equivalent to '1-3'\n",
293 |     "df = read_pdf(\"http://www.uncledavesenterprise.com/file/health/Food%20Calories%20List.pdf\", pages=pages)\n",
294 |     "df.shape"
295 |    ]
296 |   },
297 |   {
298 |    "cell_type": "code",
299 |    "execution_count": 6,
300 |    "metadata": {},
301 |    "outputs": [
302 |     {
303 |      "data": {
304 |       "text/plain": [
305 |        "(69, 5)"
306 |       ]
307 |      },
308 |      "execution_count": 6,
309 |      "metadata": {},
310 |      "output_type": "execute_result"
311 |     }
312 |    ],
313 |    "source": [
314 |     "# list all possible pages\n",
315 |     "df = read_pdf(\"http://www.uncledavesenterprise.com/file/health/Food%20Calories%20List.pdf\", pages=[1,2,3])\n",
316 |     "df.shape"
317 |    ]
318 |   },
319 |   {
320 |    "cell_type": "code",
321 |    "execution_count": 7,
322 |    "metadata": {},
323 |    "outputs": [
324 |     {
325 |      "data": {
326 |       "text/plain": [
327 |        "(69, 5)"
328 |       ]
329 |      },
330 |      "execution_count": 7,
331 |      "metadata": {},
332 |      "output_type": "execute_result"
333 |     }
334 |    ],
335 |    "source": [
336 |     "# build the same page list (1, 2, 3) with range\n",
337 |     "pages = list(range(1, 4))\n",
338 |     "df = read_pdf(\"http://www.uncledavesenterprise.com/file/health/Food%20Calories%20List.pdf\", pages=pages)\n",
339 |     "df.shape"
340 |    ]
341 |   },
342 |   {
343 |    "cell_type": "markdown",
344 |    "metadata": {},
345 |    "source": [
346 |     "## Question 2\n",
347 |     "\n",
348 |     "#### python extract text from image or pdf\n",
349 |     "\n",
350 |     "https://youtu.be/PK-GvWWQ03g\n",
351 |     "\n",
352 |     "![Question ](../images/Selection_178.png)\n",
353 |     "\n",
354 |     "python extract text from image or pdf\n",
355 |     "\n",
356 |     "https://blog.softhints.com/python-extract-text-from-image-or-pdf/\n",
357 |     "\n",
358 |     "Improve OCR Accuracy With Advanced Image Preprocessing\n",
359 |     "\n",
360 |     "https://docparser.com/blog/improve-ocr-accuracy/"
361 |    ]
362 |   },
363 |   {
364 |    "cell_type": "markdown",
365 |    "metadata": {},
366 |    "source": [
367 |     "![Question ](../images/Selection_174.png)\n"
368 |    ]
369 |   },
370 |   {
371 |    "cell_type": "code",
372 |    "execution_count": 8,
373 |    "metadata": {},
374 |    "outputs": [],
375 |    "source": [
376 |     "from PIL import Image\n",
377 |     "import pytesseract"
378 |    ]
379 |   },
380 |   {
381 |    "cell_type": "code",
382 |    "execution_count": 9,
383 |    "metadata": {},
384 |    "outputs": [
385 |     {
386 |      "name": "stdout",
387 |      "output_type": "stream",
388 |      "text": [
389 |       "Java\n",
390 |       "\n",
391 |       "Python\n",
392 |       "\n",
393 |       "public class JavaPyramid1 {\n",
394 |       "public static void main(String[] args) {\n",
395 |       "for(int i=1; i<=5; i++) {\n",
396 |       "for(int j=0; j<i; j++) {\n",
397 |       "System.out.print(\"*\");\n",
398 |       "\n",
399 |       "//generate a new line\n",
400 |       "System.out.printin(\"\");\n",
401 |       "\n",
402 |       "def create_pyramid(rows) :\n",
403 |       "for i in range(rows) :\n",
404 |       "print('*' * (i+1))\n"
405 |      ]
406 |     }
407 |    ],
408 |    "source": [
409 |     "im = Image.open(\"../images/Selection_174.png\")\n",
410 |     "text = pytesseract.image_to_string(im)\n",
411 |     "print(text)"
412 |    ]
413 |   },
414 |   {
415 |    "cell_type": "markdown",
416 |    "metadata": {},
417 |    "source": [
418 |     "![Question ](../images/Selection_171.png)"
419 |    ]
420 |   },
421 |   {
422 |    "cell_type": "code",
423 |    "execution_count": 10,
424 |    "metadata": {},
425 |    "outputs": [
426 |     {
427 |      "name": "stdout",
428 |      "output_type": "stream",
429 |      "text": [
430 |       "* def create_pyramid(rows) :\n",
431 |       "* — for i in range(rows) :\n",
432 |       "*      print(’*' * (i+1))\n"
433 |      ]
434 |     }
435 |    ],
436 |    "source": [
437 |     "# How to get spaces and indentation\n",
438 |     "\n",
439 |     "im = Image.open(\"../images/Selection_171.png\")\n",
440 |     "text = pytesseract.image_to_string(im, config='-c preserve_interword_spaces=1')\n",
441 |     "print(text)"
442 |    ]
443 |   },
444 |   {
445 |    "cell_type": "code",
446 |    "execution_count": 11,
447 |    "metadata": {},
448 |    "outputs": [],
449 |    "source": [
450 |     "# Improve OCR - change the image size\n",
451 |     "\n",
452 |     "s = im.size\n",
453 |     "im.show()\n",
454 |     "newimg = im.resize((s[0]*2, s[1]*2), Image.LANCZOS)  # Image.ANTIALIAS was removed in Pillow 10\n",
455 |     "newimg.show()"
456 |    ]
457 |   },
458 |   {
459 |    "cell_type": "code",
460 |    "execution_count": 13,
461 |    "metadata": {},
462 |    "outputs": [],
463 |    "source": [
464 |     "# Improve OCR - change the contrast\n",
465 |     "\n",
466 |     "from PIL import Image, ImageEnhance\n",
467 |     "\n",
468 |     "contrast = ImageEnhance.Contrast(im)\n",
469 |     "contrast.enhance(2).show()  # factor 2 is an example value; 1.0 leaves the image unchanged\n"
470 |    ]
471 |   },
472 |   {
473 |    "cell_type": "code",
474 |    "execution_count": 14,
475 |    "metadata": {},
476 |    "outputs": [],
477 |    "source": [
478 |     "# Improve OCR - convert to black and white\n",
479 |     "\n",
480 |     "black_white = im.convert('1') # convert image to black and white\n",
481 |     "black_white.show()"
482 |    ]
483 |   },
484 |   {
485 |    "cell_type": "code",
486 |    "execution_count": null,
487 |    "metadata": {},
488 |    "outputs": [],
489 |    "source": []
490 |   }
491 |  ],
492 |  "metadata": {
493 |   "kernelspec": {
494 |    "display_name": "Python 3",
495 |    "language": "python",
496 |    "name": "python3"
497 |   },
498 |   "language_info": {
499 |    "codemirror_mode": {
500 |     "name": "ipython",
501 |     "version": 3
502 |    },
503 |    "file_extension": ".py",
504 |    "mimetype": "text/x-python",
505 |    "name": "python",
506 |    "nbconvert_exporter": "python",
507 |    "pygments_lexer": "ipython3",
508 |    "version": "3.6.7"
509 |   }
510 |  },
511 |  "nbformat": 4,
512 |  "nbformat_minor": 2
513 | }
514 | 
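
Note on the `pages` argument used in the notebook above: the three `read_pdf` calls select the same pages with a range string (`'1-3'`), an explicit list (`[1, 2, 3]`) and a list built with `range`. A minimal sketch of two helpers that produce either form is shown below. The helper names `page_spec_string` and `page_spec_list` are hypothetical and are not part of tabula-py; only the shapes of the values they return come from the notebook.

```python
def _check_bounds(first, last):
    """Reject page ranges that cannot exist (pages are 1-based)."""
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")


def page_spec_string(first, last):
    """Return a range string such as '1-3' (hypothetical helper)."""
    _check_bounds(first, last)
    return f"{first}-{last}"


def page_spec_list(first, last):
    """Return an explicit page list such as [1, 2, 3] (hypothetical helper)."""
    _check_bounds(first, last)
    # range() excludes its stop value, so add 1 to include the last page.
    return list(range(first, last + 1))


print(page_spec_string(1, 3))
print(page_spec_list(1, 3))
```

Either value can be passed as `pages=` in the calls above. The string form is shorter, while the list form makes it easy to skip individual pages.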


--------------------------------------------------------------------------------
/notebooks/What_is_the_usage_of_*_asterisk_in_Python.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "# What is the usage of * - asterisk in Python\n",
  8 |     "\n",
  9 |     "* For multiplication and power operations.\n",
 10 |     "* Extending collections\n",
 11 |     "* Unpacking\n",
 12 |     "* Positional arguments and keyword arguments"
 13 |    ]
 14 |   },
 15 |   {
 16 |    "cell_type": "markdown",
 17 |    "metadata": {},
 18 |    "source": [
 19 |     "## For multiplication and power operations."
 20 |    ]
 21 |   },
 22 |   {
 23 |    "cell_type": "code",
 24 |    "execution_count": 1,
 25 |    "metadata": {},
 26 |    "outputs": [
 27 |     {
 28 |      "data": {
 29 |       "text/plain": [
 30 |        "30"
 31 |       ]
 32 |      },
 33 |      "execution_count": 1,
 34 |      "metadata": {},
 35 |      "output_type": "execute_result"
 36 |     }
 37 |    ],
 38 |    "source": [
 39 |     "5 * 6"
 40 |    ]
 41 |   },
 42 |   {
 43 |    "cell_type": "code",
 44 |    "execution_count": 2,
 45 |    "metadata": {},
 46 |    "outputs": [
 47 |     {
 48 |      "data": {
 49 |       "text/plain": [
 50 |        "4"
 51 |       ]
 52 |      },
 53 |      "execution_count": 2,
 54 |      "metadata": {},
 55 |      "output_type": "execute_result"
 56 |     }
 57 |    ],
 58 |    "source": [
 59 |     "2 ** 2"
 60 |    ]
 61 |   },
 62 |   {
 63 |    "cell_type": "code",
 64 |    "execution_count": 3,
 65 |    "metadata": {},
 66 |    "outputs": [
 67 |     {
 68 |      "ename": "SyntaxError",
 69 |      "evalue": "invalid syntax (<ipython-input-3-51a6a06f7259>, line 1)",
 70 |      "traceback": [
 71 |       "\u001b[0;36m  File \u001b[0;32m\"<ipython-input-3-51a6a06f7259>\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m    2 *** 2\u001b[0m\n\u001b[0m        ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n"
 72 |      ],
 73 |      "output_type": "error"
 74 |     }
 75 |    ],
 76 |    "source": [
 77 |     "2 *** 2"
 78 |    ]
 79 |   },
 80 |   {
 81 |    "cell_type": "code",
 82 |    "execution_count": 4,
 83 |    "metadata": {},
 84 |    "outputs": [
 85 |     {
 86 |      "data": {
 87 |       "text/plain": [
 88 |        "'aaaaa'"
 89 |       ]
 90 |      },
 91 |      "execution_count": 4,
 92 |      "metadata": {},
 93 |      "output_type": "execute_result"
 94 |     }
 95 |    ],
 96 |    "source": [
 97 |     "'a' * 5"
 98 |    ]
 99 |   },
100 |   {
101 |    "cell_type": "code",
102 |    "execution_count": 5,
103 |    "metadata": {},
104 |    "outputs": [
105 |     {
106 |      "data": {
107 |       "text/plain": [
108 |        "'ffffff'"
109 |       ]
110 |      },
111 |      "execution_count": 5,
112 |      "metadata": {},
113 |      "output_type": "execute_result"
114 |     }
115 |    ],
116 |    "source": [
117 |     "'fff' * 2"
118 |    ]
119 |   },
120 |   {
121 |    "cell_type": "markdown",
122 |    "metadata": {},
123 |    "source": [
124 |     "## Extending collections"
125 |    ]
126 |   },
127 |   {
128 |    "cell_type": "code",
129 |    "execution_count": 6,
130 |    "metadata": {},
131 |    "outputs": [
132 |     {
133 |      "data": {
134 |       "text/plain": [
135 |        "[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]"
136 |       ]
137 |      },
138 |      "execution_count": 6,
139 |      "metadata": {},
140 |      "output_type": "execute_result"
141 |     }
142 |    ],
143 |    "source": [
144 |     "[0] * 20 "
145 |    ]
146 |   },
147 |   {
148 |    "cell_type": "code",
149 |    "execution_count": 7,
150 |    "metadata": {},
151 |    "outputs": [
152 |     {
153 |      "data": {
154 |       "text/plain": [
155 |        "[0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]"
156 |       ]
157 |      },
158 |      "execution_count": 7,
159 |      "metadata": {},
160 |      "output_type": "execute_result"
161 |     }
162 |    ],
163 |    "source": [
164 |     "[0, 1 , 2] * 5"
165 |    ]
166 |   },
167 |   {
168 |    "cell_type": "code",
169 |    "execution_count": 8,
170 |    "metadata": {},
171 |    "outputs": [
172 |     {
173 |      "data": {
174 |       "text/plain": [
175 |        "[[0, 1, 2], [3], [0, 1, 2], [3]]"
176 |       ]
177 |      },
178 |      "execution_count": 8,
179 |      "metadata": {},
180 |      "output_type": "execute_result"
181 |     }
182 |    ],
183 |    "source": [
184 |     "[[0, 1 , 2], [3]] * 2"
185 |    ]
186 |   },
187 |   {
188 |    "cell_type": "markdown",
189 |    "metadata": {},
190 |    "source": [
191 |     "## Unpacking"
192 |    ]
193 |   },
194 |   {
195 |    "cell_type": "code",
196 |    "execution_count": 9,
197 |    "metadata": {},
198 |    "outputs": [
199 |     {
200 |      "data": {
201 |       "text/plain": [
202 |        "[1, 3, 5, 7, 9]"
203 |       ]
204 |      },
205 |      "execution_count": 9,
206 |      "metadata": {},
207 |      "output_type": "execute_result"
208 |     }
209 |    ],
210 |    "source": [
211 |     "odds = [1, 3, 5, 7, 9]\n",
212 |     "*x, = odds\n",
213 |     "x"
214 |    ]
215 |   },
216 |   {
217 |    "cell_type": "code",
218 |    "execution_count": 10,
219 |    "metadata": {},
220 |    "outputs": [
221 |     {
222 |      "data": {
223 |       "text/plain": [
224 |        "[1, 3, 5, 7]"
225 |       ]
226 |      },
227 |      "execution_count": 10,
228 |      "metadata": {},
229 |      "output_type": "execute_result"
230 |     }
231 |    ],
232 |    "source": [
233 |     "*x,y = odds\n",
234 |     "x"
235 |    ]
236 |   },
237 |   {
238 |    "cell_type": "code",
239 |    "execution_count": 11,
240 |    "metadata": {},
241 |    "outputs": [
242 |     {
243 |      "data": {
244 |       "text/plain": [
245 |        "9"
246 |       ]
247 |      },
248 |      "execution_count": 11,
249 |      "metadata": {},
250 |      "output_type": "execute_result"
251 |     }
252 |    ],
253 |    "source": [
254 |     "y"
255 |    ]
256 |   },
257 |   {
258 |    "cell_type": "code",
259 |    "execution_count": 12,
260 |    "metadata": {},
261 |    "outputs": [],
262 |    "source": [
263 |     "x, *y, z = odds"
264 |    ]
265 |   },
266 |   {
267 |    "cell_type": "code",
268 |    "execution_count": 13,
269 |    "metadata": {},
270 |    "outputs": [
271 |     {
272 |      "data": {
273 |       "text/plain": [
274 |        "[3, 5, 7]"
275 |       ]
276 |      },
277 |      "execution_count": 13,
278 |      "metadata": {},
279 |      "output_type": "execute_result"
280 |     }
281 |    ],
282 |    "source": [
283 |     "y"
284 |    ]
285 |   },
286 |   {
287 |    "cell_type": "code",
288 |    "execution_count": 14,
289 |    "metadata": {},
290 |    "outputs": [
291 |     {
292 |      "name": "stdout",
293 |      "output_type": "stream",
294 |      "text": [
295 |       "(1, 3, 5, 7, 9)\n",
296 |       "([1, 3, 5, 7, 9],)\n"
297 |      ]
298 |     }
299 |    ],
300 |    "source": [
301 |     "odds = [1, 3, 5, 7, 9]\n",
302 |     "\n",
303 |     "def sum_all(*numbers):\n",
304 |     "    print(numbers)\n",
305 |     "\n",
306 |     "sum_all(*odds)\n",
307 |     "\n",
308 |     "sum_all(odds)"
309 |    ]
310 |   },
311 |   {
312 |    "cell_type": "markdown",
313 |    "metadata": {},
314 |    "source": [
315 |     "## Positional arguments and keyword arguments"
316 |    ]
317 |   },
318 |   {
319 |    "cell_type": "code",
320 |    "execution_count": 15,
321 |    "metadata": {},
322 |    "outputs": [
323 |     {
324 |      "name": "stdout",
325 |      "output_type": "stream",
326 |      "text": [
327 |       "('x', 'y', 'z', 'w', 'v')\n"
328 |      ]
329 |     }
330 |    ],
331 |    "source": [
332 |     "def print_all(*args):\n",
333 |     "    print(args) \n",
334 |     "print_all('x', 'y', 'z', 'w', 'v')"
335 |    ]
336 |   },
337 |   {
338 |    "cell_type": "code",
339 |    "execution_count": 16,
340 |    "metadata": {},
341 |    "outputs": [
342 |     {
343 |      "name": "stdout",
344 |      "output_type": "stream",
345 |      "text": [
346 |       "{'x': 'x', 'y': 'y', 'z': 'z', 'w': 'w', 'v': 'v'}\n"
347 |      ]
348 |     }
349 |    ],
350 |    "source": [
351 |     "def print_all(**kwargs):\n",
352 |     "    print(kwargs)\n",
353 |     "print_all(x='x', y='y', z='z', w='w', v='v')"
354 |    ]
355 |   },
356 |   {
357 |    "cell_type": "code",
358 |    "execution_count": null,
359 |    "metadata": {},
360 |    "outputs": [],
361 |    "source": []
362 |   }
363 |  ],
364 |  "metadata": {
365 |   "kernelspec": {
366 |    "display_name": "Python 3",
367 |    "language": "python",
368 |    "name": "python3"
369 |   },
370 |   "language_info": {
371 |    "codemirror_mode": {
372 |     "name": "ipython",
373 |     "version": 3
374 |    },
375 |    "file_extension": ".py",
376 |    "mimetype": "text/x-python",
377 |    "name": "python",
378 |    "nbconvert_exporter": "python",
379 |    "pygments_lexer": "ipython3",
380 |    "version": "3.6.7"
381 |   }
382 |  },
383 |  "nbformat": 4,
384 |  "nbformat_minor": 2
385 | }
386 | 
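
Recap of the notebook above as one script: a minimal sketch that gathers the unpacking and argument-collecting uses of `*` and `**` in one place. The function name `describe` and the `options` dictionary are illustrative and do not appear in the notebook. The last part adds one caveat to the "Extending collections" section: multiplying a list that contains lists repeats references to the same inner list instead of copying it.

```python
def describe(*args, **kwargs):
    """Collect positional arguments in a tuple and keyword arguments in a dict."""
    return args, kwargs


odds = [1, 3, 5, 7, 9]

# Starred assignment: the starred name takes whatever the other names leave over.
first, *middle, last = odds

# Unpacking at the call site: * spreads a sequence, ** spreads a mapping.
options = {"sep": "-", "end": "!"}
positional, keyword = describe(*odds, **options)

# Unpacking inside list and dict literals builds new collections.
extended = [*odds, 11]
combined = {**options, "flush": True}

# Caveat: list multiplication repeats references to the same inner list.
shared = [[0]] * 3
shared[0].append(1)  # the change is visible through every element

independent = [[0] for _ in range(3)]
independent[0].append(1)  # only the first inner list changes

print(first, middle, last)
print(positional, keyword)
print(shared, independent)
```

When the inner lists must be independent, the list comprehension form is the safer choice.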


--------------------------------------------------------------------------------
/notebooks/csv/data.csv.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/softhints/python/a256a054d74ca397f41874b3e26f1c4b84214432/notebooks/csv/data.csv.zip


--------------------------------------------------------------------------------
/notebooks/csv/data_201901.csv:
--------------------------------------------------------------------------------
1 | col1,col2,col3
2 | A,B,1
3 | AA,BB,2


--------------------------------------------------------------------------------
/notebooks/csv/data_201902.csv:
--------------------------------------------------------------------------------
1 | col1,col2,col3
2 | C,D,3
3 | CC,DD,4


--------------------------------------------------------------------------------
/notebooks/csv/data_202001.csv:
--------------------------------------------------------------------------------
1 | col1,col2,col3,col4
2 | E,F,5,e5
3 | EE,FF,6,ee6


--------------------------------------------------------------------------------
/notebooks/csv/data_202002.csv:
--------------------------------------------------------------------------------
1 | col1,col2,col3,col5
2 | H,J,7,77
3 | HH,JJ,8,88


--------------------------------------------------------------------------------
/notebooks/csv/excel/example.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/softhints/python/a256a054d74ca397f41874b3e26f1c4b84214432/notebooks/csv/excel/example.xlsx


--------------------------------------------------------------------------------
/notebooks/pandas/Pandas_Select_rows_between_two_dates_-_DataFrame_or_CSV_file.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "# Pandas : Select rows between two dates - DataFrame or CSV file"
  8 |    ]
  9 |   },
 10 |   {
 11 |    "cell_type": "markdown",
 12 |    "metadata": {},
 13 |    "source": [
 14 |     "## Resources\n",
 15 |     "\n",
 16 |     "* [pandas.to_datetime](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html)\n",
 17 |     "* [pandas.DataFrame.between_time](https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.DataFrame.between_time.html)\n",
 18 |     "* [pandas.DataFrame.loc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html)"
 19 |    ]
 20 |   },
 21 |   {
 22 |    "cell_type": "markdown",
 23 |    "metadata": {},
 24 |    "source": [
 25 |     "## Use cases\n",
 26 |     "\n",
 27 |     "* Pandas: Verify columns containing dates\n",
 28 |     "* Convert string to datetime in DataFrame\n",
 29 |     "* Select rows between two dates\n",
 30 |     "    * 1. Select rows based on dates with loc\n",
 31 |     "    * 2. Series method between\n",
 32 |     "    * 3. Select rows between two times\n",
 33 |     "    * 4. Select rows based on dates without loc\n",
 34 |     "    * 5. Use mask to mark the records\n",
 35 |     "    * 6. Select records from last month/30 days "
 36 |    ]
 37 |   },
 38 |   {
 39 |    "cell_type": "markdown",
 40 |    "metadata": {},
 41 |    "source": [
 42 |     "## Step 1: Import Pandas and read data"
 43 |    ]
 44 |   },
 45 |   {
 46 |    "cell_type": "code",
 47 |    "execution_count": 1,
 48 |    "metadata": {},
 49 |    "outputs": [
 50 |     {
 51 |      "data": {
 52 |       "text/html": [
 53 |        "<div>\n",
 54 |        "<style scoped>\n",
 55 |        "    .dataframe tbody tr th:only-of-type {\n",
 56 |        "        vertical-align: middle;\n",
 57 |        "    }\n",
 58 |        "\n",
 59 |        "    .dataframe tbody tr th {\n",
 60 |        "        vertical-align: top;\n",
 61 |        "    }\n",
 62 |        "\n",
 63 |        "    .dataframe thead th {\n",
 64 |        "        text-align: right;\n",
 65 |        "    }\n",
 66 |        "</style>\n",
 67 |        "<table border=\"1\" class=\"dataframe\">\n",
 68 |        "  <thead>\n",
 69 |        "    <tr style=\"text-align: right;\">\n",
 70 |        "      <th></th>\n",
 71 |        "      <th>loading_datetime</th>\n",
 72 |        "      <th>pages</th>\n",
 73 |        "      <th>title</th>\n",
 74 |        "      <th>datetime_col</th>\n",
 75 |        "    </tr>\n",
 76 |        "  </thead>\n",
 77 |        "  <tbody>\n",
 78 |        "    <tr>\n",
 79 |        "      <th>0</th>\n",
 80 |        "      <td>2019-10-28 19:56:03</td>\n",
 81 |        "      <td>main</td>\n",
 82 |        "      <td>&lt;GET https://www.wikipedia.org/&gt; (The Free En...</td>\n",
 83 |        "      <td>2019-10-29 9:06:03</td>\n",
 84 |        "    </tr>\n",
 85 |        "    <tr>\n",
 86 |        "      <th>1</th>\n",
 87 |        "      <td>2019-10-29 19:56:03</td>\n",
 88 |        "      <td>english</td>\n",
 89 |        "      <td>&lt;GET https://en.wikipedia.org/wiki/Main_Page&gt;...</td>\n",
 90 |        "      <td>2019-10-31 11:16:43</td>\n",
 91 |        "    </tr>\n",
 92 |        "    <tr>\n",
 93 |        "      <th>2</th>\n",
 94 |        "      <td>2019-10-29 19:56:03</td>\n",
 95 |        "      <td>italiano</td>\n",
 96 |        "      <td>&lt;GET https://it.wikipedia.org/wiki/Pagina_pri...</td>\n",
 97 |        "      <td>2019-10-30 21:15:23</td>\n",
 98 |        "    </tr>\n",
 99 |        "    <tr>\n",
100 |        "      <th>3</th>\n",
101 |        "      <td>2019-10-30 19:56:03</td>\n",
102 |        "      <td>português</td>\n",
103 |        "      <td>&lt;GET https://pt.wikipedia.org/wiki/Wikip%C3%A...</td>\n",
104 |        "      <td>2019-10-30 20:26:35</td>\n",
105 |        "    </tr>\n",
106 |        "  </tbody>\n",
107 |        "</table>\n",
108 |        "</div>"
109 |       ],
110 |       "text/plain": [
111 |        "      loading_datetime       pages  \\\n",
112 |        "0  2019-10-28 19:56:03        main   \n",
113 |        "1  2019-10-29 19:56:03     english   \n",
114 |        "2  2019-10-29 19:56:03    italiano   \n",
115 |        "3  2019-10-30 19:56:03   português   \n",
116 |        "\n",
117 |        "                                               title          datetime_col  \n",
118 |        "0   <GET https://www.wikipedia.org/> (The Free En...    2019-10-29 9:06:03  \n",
119 |        "1   <GET https://en.wikipedia.org/wiki/Main_Page>...   2019-10-31 11:16:43  \n",
120 |        "2   <GET https://it.wikipedia.org/wiki/Pagina_pri...   2019-10-30 21:15:23  \n",
121 |        "3   <GET https://pt.wikipedia.org/wiki/Wikip%C3%A...   2019-10-30 20:26:35  "
122 |       ]
123 |      },
124 |      "execution_count": 1,
125 |      "metadata": {},
126 |      "output_type": "execute_result"
127 |     }
128 |    ],
129 |    "source": [
130 |     "import pandas as pd\n",
131 |     "df = pd.read_csv(\"../csv/data.csv\")\n",
132 |     "df"
133 |    ]
134 |   },
135 |   {
136 |    "cell_type": "markdown",
137 |    "metadata": {},
138 |    "source": [
139 |     "## Step 2: Pandas: Verify columns containing dates"
140 |    ]
141 |   },
142 |   {
143 |    "cell_type": "code",
144 |    "execution_count": 2,
145 |    "metadata": {},
146 |    "outputs": [
147 |     {
148 |      "data": {
149 |       "text/plain": [
150 |        "loading_datetime    object\n",
151 |        "pages               object\n",
152 |        "title               object\n",
153 |        "datetime_col        object\n",
154 |        "dtype: object"
155 |       ]
156 |      },
157 |      "execution_count": 2,
158 |      "metadata": {},
159 |      "output_type": "execute_result"
160 |     }
161 |    ],
162 |    "source": [
163 |     "df.dtypes"
164 |    ]
165 |   },
166 |   {
167 |    "cell_type": "code",
168 |    "execution_count": 3,
169 |    "metadata": {},
170 |    "outputs": [
171 |     {
172 |      "data": {
173 |       "text/plain": [
174 |        "0      2019-10-29 9:06:03\n",
175 |        "1     2019-10-31 11:16:43\n",
176 |        "2     2019-10-30 21:15:23\n",
177 |        "3     2019-10-30 20:26:35\n",
178 |        "Name: datetime_col, dtype: object"
179 |       ]
180 |      },
181 |      "execution_count": 3,
182 |      "metadata": {},
183 |      "output_type": "execute_result"
184 |     }
185 |    ],
186 |    "source": [
187 |     "df.datetime_col"
188 |    ]
189 |   },
190 |   {
191 |    "cell_type": "code",
192 |    "execution_count": 4,
193 |    "metadata": {},
194 |    "outputs": [],
195 |    "source": [
196 |     "dateCols = ['loading_datetime']\n",
197 |     "df = pd.read_csv(\"../csv/data.csv\", parse_dates=dateCols)"
198 |    ]
199 |   },
200 |   {
201 |    "cell_type": "code",
202 |    "execution_count": 5,
203 |    "metadata": {},
204 |    "outputs": [
205 |     {
206 |      "data": {
207 |       "text/plain": [
208 |        "loading_datetime    datetime64[ns]\n",
209 |        "pages                       object\n",
210 |        "title                       object\n",
211 |        "datetime_col                object\n",
212 |        "dtype: object"
213 |       ]
214 |      },
215 |      "execution_count": 5,
216 |      "metadata": {},
217 |      "output_type": "execute_result"
218 |     }
219 |    ],
220 |    "source": [
221 |     "df.dtypes"
222 |    ]
223 |   },
224 |   {
225 |    "cell_type": "markdown",
226 |    "metadata": {},
227 |    "source": [
228 |     "## Step 3: Convert string to datetime in DataFrame"
229 |    ]
230 |   },
231 |   {
232 |    "cell_type": "code",
233 |    "execution_count": 6,
234 |    "metadata": {},
235 |    "outputs": [],
236 |    "source": [
237 |     "df.datetime_col=pd.to_datetime(df.datetime_col)"
238 |    ]
239 |   },
240 |   {
241 |    "cell_type": "code",
242 |    "execution_count": 7,
243 |    "metadata": {},
244 |    "outputs": [
245 |     {
246 |      "data": {
247 |       "text/plain": [
248 |        "loading_datetime    datetime64[ns]\n",
249 |        "pages                       object\n",
250 |        "title                       object\n",
251 |        "datetime_col        datetime64[ns]\n",
252 |        "dtype: object"
253 |       ]
254 |      },
255 |      "execution_count": 7,
256 |      "metadata": {},
257 |      "output_type": "execute_result"
258 |     }
259 |    ],
260 |    "source": [
261 |     "df.dtypes"
262 |    ]
263 |   },
264 |   {
265 |    "cell_type": "code",
266 |    "execution_count": 8,
267 |    "metadata": {},
268 |    "outputs": [],
269 |    "source": [
270 |     "df.datetime_col=pd.to_datetime(df.datetime_col, utc=True)"
271 |    ]
272 |   },
273 |   {
274 |    "cell_type": "code",
275 |    "execution_count": 9,
276 |    "metadata": {},
277 |    "outputs": [
278 |     {
279 |      "data": {
280 |       "text/plain": [
281 |        "loading_datetime         datetime64[ns]\n",
282 |        "pages                            object\n",
283 |        "title                            object\n",
284 |        "datetime_col        datetime64[ns, UTC]\n",
285 |        "dtype: object"
286 |       ]
287 |      },
288 |      "execution_count": 9,
289 |      "metadata": {},
290 |      "output_type": "execute_result"
291 |     }
292 |    ],
293 |    "source": [
294 |     "df.dtypes"
295 |    ]
296 |   },
297 |   {
298 |    "cell_type": "markdown",
299 |    "metadata": {},
300 |    "source": [
301 |     "## Step 4: Select rows between two dates"
302 |    ]
303 |   },
304 |   {
305 |    "cell_type": "markdown",
306 |    "metadata": {},
307 |    "source": [
308 |     "#### 1. Select rows based on dates with loc"
309 |    ]
310 |   },
311 |   {
312 |    "cell_type": "code",
313 |    "execution_count": 10,
314 |    "metadata": {},
315 |    "outputs": [
316 |     {
317 |      "data": {
318 |       "text/html": [
319 |        "<div>\n",
320 |        "<style scoped>\n",
321 |        "    .dataframe tbody tr th:only-of-type {\n",
322 |        "        vertical-align: middle;\n",
323 |        "    }\n",
324 |        "\n",
325 |        "    .dataframe tbody tr th {\n",
326 |        "        vertical-align: top;\n",
327 |        "    }\n",
328 |        "\n",
329 |        "    .dataframe thead th {\n",
330 |        "        text-align: right;\n",
331 |        "    }\n",
332 |        "</style>\n",
333 |        "<table border=\"1\" class=\"dataframe\">\n",
334 |        "  <thead>\n",
335 |        "    <tr style=\"text-align: right;\">\n",
336 |        "      <th></th>\n",
337 |        "      <th>loading_datetime</th>\n",
338 |        "      <th>pages</th>\n",
339 |        "      <th>title</th>\n",
340 |        "      <th>datetime_col</th>\n",
341 |        "    </tr>\n",
342 |        "  </thead>\n",
343 |        "  <tbody>\n",
344 |        "    <tr>\n",
345 |        "      <th>1</th>\n",
346 |        "      <td>2019-10-29 19:56:03</td>\n",
347 |        "      <td>english</td>\n",
348 |        "      <td>&lt;GET https://en.wikipedia.org/wiki/Main_Page&gt;...</td>\n",
349 |        "      <td>2019-10-31 11:16:43+00:00</td>\n",
350 |        "    </tr>\n",
351 |        "    <tr>\n",
352 |        "      <th>2</th>\n",
353 |        "      <td>2019-10-29 19:56:03</td>\n",
354 |        "      <td>italiano</td>\n",
355 |        "      <td>&lt;GET https://it.wikipedia.org/wiki/Pagina_pri...</td>\n",
356 |        "      <td>2019-10-30 21:15:23+00:00</td>\n",
357 |        "    </tr>\n",
358 |        "  </tbody>\n",
359 |        "</table>\n",
360 |        "</div>"
361 |       ],
362 |       "text/plain": [
363 |        "     loading_datetime      pages  \\\n",
364 |        "1 2019-10-29 19:56:03    english   \n",
365 |        "2 2019-10-29 19:56:03   italiano   \n",
366 |        "\n",
367 |        "                                               title              datetime_col  \n",
368 |        "1   <GET https://en.wikipedia.org/wiki/Main_Page>... 2019-10-31 11:16:43+00:00  \n",
369 |        "2   <GET https://it.wikipedia.org/wiki/Pagina_pri... 2019-10-30 21:15:23+00:00  "
370 |       ]
371 |      },
372 |      "execution_count": 10,
373 |      "metadata": {},
374 |      "output_type": "execute_result"
375 |     }
376 |    ],
377 |    "source": [
378 |     "start_date = pd.to_datetime('2019-10-30 20:41', utc=True)\n",
379 |     "end_date = pd.to_datetime('2020-05-13 08:55', utc=True)\n",
380 |     "\n",
381 |     "df.loc[(df['datetime_col'] > start_date) & (df['datetime_col'] < end_date)]"
382 |    ]
383 |   },
384 |   {
385 |    "cell_type": "markdown",
386 |    "metadata": {},
387 |    "source": [
388 |     "#### 2. Series method `between`"
389 |    ]
390 |   },
391 |   {
392 |    "cell_type": "code",
393 |    "execution_count": null,
394 |    "metadata": {},
395 |    "outputs": [],
396 |    "source": [
397 |     "start_date = pd.to_datetime('2019-10-30 20:41', utc=True)\n",
398 |     "end_date = pd.to_datetime('2020-05-13 08:55', utc=True)\n",
399 |     "\n",
400 |     "df[df.datetime_col.between(start_date, end_date)]"
401 |    ]
402 |   },
403 |   {
404 |    "cell_type": "markdown",
405 |    "metadata": {},
406 |    "source": [
407 |     "#### 3. Select rows between two times"
408 |    ]
409 |   },
410 |   {
411 |    "cell_type": "code",
412 |    "execution_count": 11,
413 |    "metadata": {},
414 |    "outputs": [
415 |     {
416 |      "data": {
417 |       "text/html": [
418 |        "<div>\n",
419 |        "<style scoped>\n",
420 |        "    .dataframe tbody tr th:only-of-type {\n",
421 |        "        vertical-align: middle;\n",
422 |        "    }\n",
423 |        "\n",
424 |        "    .dataframe tbody tr th {\n",
425 |        "        vertical-align: top;\n",
426 |        "    }\n",
427 |        "\n",
428 |        "    .dataframe thead th {\n",
429 |        "        text-align: right;\n",
430 |        "    }\n",
431 |        "</style>\n",
432 |        "<table border=\"1\" class=\"dataframe\">\n",
433 |        "  <thead>\n",
434 |        "    <tr style=\"text-align: right;\">\n",
435 |        "      <th></th>\n",
436 |        "      <th>loading_datetime</th>\n",
437 |        "      <th>pages</th>\n",
438 |        "      <th>title</th>\n",
439 |        "    </tr>\n",
440 |        "    <tr>\n",
441 |        "      <th>datetime_col</th>\n",
442 |        "      <th></th>\n",
443 |        "      <th></th>\n",
444 |        "      <th></th>\n",
445 |        "    </tr>\n",
446 |        "  </thead>\n",
447 |        "  <tbody>\n",
448 |        "    <tr>\n",
449 |        "      <th>2019-10-30 21:15:23+00:00</th>\n",
450 |        "      <td>2019-10-29 19:56:03</td>\n",
451 |        "      <td>italiano</td>\n",
452 |        "      <td>&lt;GET https://it.wikipedia.org/wiki/Pagina_pri...</td>\n",
453 |        "    </tr>\n",
454 |        "  </tbody>\n",
455 |        "</table>\n",
456 |        "</div>"
457 |       ],
458 |       "text/plain": [
459 |        "                             loading_datetime      pages  \\\n",
460 |        "datetime_col                                               \n",
461 |        "2019-10-30 21:15:23+00:00 2019-10-29 19:56:03   italiano   \n",
462 |        "\n",
463 |        "                                                                       title  \n",
464 |        "datetime_col                                                                  \n",
465 |        "2019-10-30 21:15:23+00:00   <GET https://it.wikipedia.org/wiki/Pagina_pri...  "
466 |       ]
467 |      },
468 |      "execution_count": 11,
469 |      "metadata": {},
470 |      "output_type": "execute_result"
471 |     }
472 |    ],
473 |    "source": [
474 |     "df2 = df.copy()\n",
475 |     "df2 = df2.set_index(['datetime_col'])\n",
476 |     "df2.between_time('21:10', '23:50')"
477 |    ]
478 |   },
479 |   {
480 |    "cell_type": "markdown",
481 |    "metadata": {},
482 |    "source": [
483 |     "#### 4. Select rows based on dates without loc"
484 |    ]
485 |   },
486 |   {
487 |    "cell_type": "code",
488 |    "execution_count": null,
489 |    "metadata": {},
490 |    "outputs": [],
491 |    "source": [
492 |     "df[(df['datetime_col'] > '2018-12-02') & (df['datetime_col'] <= '2018-12-03 23:26:10+00:00')]"
493 |    ]
494 |   },
495 |   {
496 |    "cell_type": "markdown",
497 |    "metadata": {},
498 |    "source": [
499 |     "#### 5. Select records from last month/30 days"
500 |    ]
501 |   },
502 |   {
503 |    "cell_type": "code",
504 |    "execution_count": 12,
505 |    "metadata": {},
506 |    "outputs": [
507 |     {
508 |      "data": {
509 |       "text/html": [
510 |        "<div>\n",
511 |        "<style scoped>\n",
512 |        "    .dataframe tbody tr th:only-of-type {\n",
513 |        "        vertical-align: middle;\n",
514 |        "    }\n",
515 |        "\n",
516 |        "    .dataframe tbody tr th {\n",
517 |        "        vertical-align: top;\n",
518 |        "    }\n",
519 |        "\n",
520 |        "    .dataframe thead th {\n",
521 |        "        text-align: right;\n",
522 |        "    }\n",
523 |        "</style>\n",
524 |        "<table border=\"1\" class=\"dataframe\">\n",
525 |        "  <thead>\n",
526 |        "    <tr style=\"text-align: right;\">\n",
527 |        "      <th></th>\n",
528 |        "      <th>loading_datetime</th>\n",
529 |        "      <th>pages</th>\n",
530 |        "      <th>title</th>\n",
531 |        "      <th>datetime_col</th>\n",
532 |        "    </tr>\n",
533 |        "  </thead>\n",
534 |        "  <tbody>\n",
535 |        "    <tr>\n",
536 |        "      <th>1</th>\n",
537 |        "      <td>2019-10-29 19:56:03</td>\n",
538 |        "      <td>english</td>\n",
539 |        "      <td>&lt;GET https://en.wikipedia.org/wiki/Main_Page&gt;...</td>\n",
540 |        "      <td>2019-10-31 11:16:43+00:00</td>\n",
541 |        "    </tr>\n",
542 |        "  </tbody>\n",
543 |        "</table>\n",
544 |        "</div>"
545 |       ],
546 |       "text/plain": [
547 |        "     loading_datetime     pages  \\\n",
548 |        "1 2019-10-29 19:56:03   english   \n",
549 |        "\n",
550 |        "                                               title              datetime_col  \n",
551 |        "1   <GET https://en.wikipedia.org/wiki/Main_Page>... 2019-10-31 11:16:43+00:00  "
552 |       ]
553 |      },
554 |      "execution_count": 12,
555 |      "metadata": {},
556 |      "output_type": "execute_result"
557 |     }
558 |    ],
559 |    "source": [
560 |     "df[df[\"datetime_col\"] >= (pd.to_datetime('11/30/2019', utc=True) - pd.Timedelta(days=30))]"
561 |    ]
562 |   },
563 |   {
564 |    "cell_type": "code",
565 |    "execution_count": null,
566 |    "metadata": {},
567 |    "outputs": [],
568 |    "source": []
569 |   }
570 |  ],
571 |  "metadata": {
572 |   "kernelspec": {
573 |    "display_name": "Python 3",
574 |    "language": "python",
575 |    "name": "python3"
576 |   },
577 |   "language_info": {
578 |    "codemirror_mode": {
579 |     "name": "ipython",
580 |     "version": 3
581 |    },
582 |    "file_extension": ".py",
583 |    "mimetype": "text/x-python",
584 |    "name": "python",
585 |    "nbconvert_exporter": "python",
586 |    "pygments_lexer": "ipython3",
587 |    "version": "3.6.9"
588 |   }
589 |  },
590 |  "nbformat": 4,
591 |  "nbformat_minor": 2
592 | }
593 | 
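
A compact, self-contained sketch of the date filters used in the notebook above. The three-row DataFrame is made-up sample data (an assumption, not the notebook's dataset); only the column names follow the notebook. It contrasts a strict boolean mask, `Series.between` (which includes both endpoints by default) and `DataFrame.between_time` (which looks only at the time of day and needs a DatetimeIndex).

```python
import pandas as pd

# Hypothetical sample data standing in for the notebook's DataFrame.
df = pd.DataFrame({
    "pages": ["deutsch", "english", "italiano"],
    "datetime_col": pd.to_datetime(
        ["2019-10-29 08:00:00", "2019-10-31 11:16:43", "2019-10-30 21:15:23"],
        utc=True,
    ),
})

start_date = pd.to_datetime("2019-10-30 20:41", utc=True)
end_date = pd.to_datetime("2020-05-13 08:55", utc=True)

# Strict comparison: a row exactly on a boundary would be excluded.
mask_rows = df.loc[(df["datetime_col"] > start_date) & (df["datetime_col"] < end_date)]

# Series.between includes both endpoints by default.
between_rows = df[df["datetime_col"].between(start_date, end_date)]

# between_time ignores the date part and requires a DatetimeIndex.
evening_rows = df.set_index("datetime_col").between_time("21:10", "23:50")

print(mask_rows["pages"].tolist())
print(between_rows["pages"].tolist())
print(evening_rows["pages"].tolist())
```

With this sample the mask and `between` select the same rows, because no timestamp falls exactly on a boundary; they differ only when one does.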


--------------------------------------------------------------------------------
/notebooks/python/Files/How_to_merge_multiple_CSV_files_with_Python.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "# How to merge multiple CSV files with Python\n",
  8 |     "This notebook covers:\n",
  9 |     "\n",
 10 |     "* Steps to merge multiple CSV (identical) files with Python\n",
 11 |     "* Steps to merge multiple CSV (identical) files with Python, tracing the source file\n",
 12 |     "* Combine multiple CSV files when the columns are different\n",
 13 |     "* Bonus: Merge multiple files with Windows/Linux"
 14 |    ]
 15 |   },
 16 |   {
 17 |    "cell_type": "code",
 18 |    "execution_count": 2,
 19 |    "metadata": {},
 20 |    "outputs": [
 21 |     {
 22 |      "data": {
 23 |       "text/plain": [
 24 |        "['../../csv/data_202001.csv',\n",
 25 |        " '../../csv/data_202002.csv',\n",
 26 |        " '../../csv/data_201902.csv',\n",
 27 |        " '../../csv/data_201901.csv']"
 28 |       ]
 29 |      },
 30 |      "metadata": {},
 31 |      "output_type": "display_data"
 32 |     },
 33 |     {
 34 |      "data": {
 35 |       "text/html": [
 36 |        "<div>\n",
 37 |        "<style scoped>\n",
 38 |        "    .dataframe tbody tr th:only-of-type {\n",
 39 |        "        vertical-align: middle;\n",
 40 |        "    }\n",
 41 |        "\n",
 42 |        "    .dataframe tbody tr th {\n",
 43 |        "        vertical-align: top;\n",
 44 |        "    }\n",
 45 |        "\n",
 46 |        "    .dataframe thead th {\n",
 47 |        "        text-align: right;\n",
 48 |        "    }\n",
 49 |        "</style>\n",
 50 |        "<table border=\"1\" class=\"dataframe\">\n",
 51 |        "  <thead>\n",
 52 |        "    <tr style=\"text-align: right;\">\n",
 53 |        "      <th></th>\n",
 54 |        "      <th>col1</th>\n",
 55 |        "      <th>col2</th>\n",
 56 |        "      <th>col3</th>\n",
 57 |        "      <th>col4</th>\n",
 58 |        "    </tr>\n",
 59 |        "  </thead>\n",
 60 |        "  <tbody>\n",
 61 |        "    <tr>\n",
 62 |        "      <th>0</th>\n",
 63 |        "      <td>E</td>\n",
 64 |        "      <td>F</td>\n",
 65 |        "      <td>5</td>\n",
 66 |        "      <td>e5</td>\n",
 67 |        "    </tr>\n",
 68 |        "    <tr>\n",
 69 |        "      <th>1</th>\n",
 70 |        "      <td>EE</td>\n",
 71 |        "      <td>FF</td>\n",
 72 |        "      <td>6</td>\n",
 73 |        "      <td>ee6</td>\n",
 74 |        "    </tr>\n",
 75 |        "  </tbody>\n",
 76 |        "</table>\n",
 77 |        "</div>"
 78 |       ],
 79 |       "text/plain": [
 80 |        "  col1 col2  col3 col4\n",
 81 |        "0    E    F     5   e5\n",
 82 |        "1   EE   FF     6  ee6"
 83 |       ]
 84 |      },
 85 |      "metadata": {},
 86 |      "output_type": "display_data"
 87 |     },
 88 |     {
 89 |      "data": {
 90 |       "text/html": [
 91 |        "<div>\n",
 92 |        "<style scoped>\n",
 93 |        "    .dataframe tbody tr th:only-of-type {\n",
 94 |        "        vertical-align: middle;\n",
 95 |        "    }\n",
 96 |        "\n",
 97 |        "    .dataframe tbody tr th {\n",
 98 |        "        vertical-align: top;\n",
 99 |        "    }\n",
100 |        "\n",
101 |        "    .dataframe thead th {\n",
102 |        "        text-align: right;\n",
103 |        "    }\n",
104 |        "</style>\n",
105 |        "<table border=\"1\" class=\"dataframe\">\n",
106 |        "  <thead>\n",
107 |        "    <tr style=\"text-align: right;\">\n",
108 |        "      <th></th>\n",
109 |        "      <th>col1</th>\n",
110 |        "      <th>col2</th>\n",
111 |        "      <th>col3</th>\n",
112 |        "      <th>col5</th>\n",
113 |        "    </tr>\n",
114 |        "  </thead>\n",
115 |        "  <tbody>\n",
116 |        "    <tr>\n",
117 |        "      <th>0</th>\n",
118 |        "      <td>H</td>\n",
119 |        "      <td>J</td>\n",
120 |        "      <td>7</td>\n",
121 |        "      <td>77</td>\n",
122 |        "    </tr>\n",
123 |        "    <tr>\n",
124 |        "      <th>1</th>\n",
125 |        "      <td>HH</td>\n",
126 |        "      <td>JJ</td>\n",
127 |        "      <td>8</td>\n",
128 |        "      <td>88</td>\n",
129 |        "    </tr>\n",
130 |        "  </tbody>\n",
131 |        "</table>\n",
132 |        "</div>"
133 |       ],
134 |       "text/plain": [
135 |        "  col1 col2  col3  col5\n",
136 |        "0    H    J     7    77\n",
137 |        "1   HH   JJ     8    88"
138 |       ]
139 |      },
140 |      "metadata": {},
141 |      "output_type": "display_data"
142 |     },
143 |     {
144 |      "data": {
145 |       "text/html": [
146 |        "<div>\n",
147 |        "<style scoped>\n",
148 |        "    .dataframe tbody tr th:only-of-type {\n",
149 |        "        vertical-align: middle;\n",
150 |        "    }\n",
151 |        "\n",
152 |        "    .dataframe tbody tr th {\n",
153 |        "        vertical-align: top;\n",
154 |        "    }\n",
155 |        "\n",
156 |        "    .dataframe thead th {\n",
157 |        "        text-align: right;\n",
158 |        "    }\n",
159 |        "</style>\n",
160 |        "<table border=\"1\" class=\"dataframe\">\n",
161 |        "  <thead>\n",
162 |        "    <tr style=\"text-align: right;\">\n",
163 |        "      <th></th>\n",
164 |        "      <th>col1</th>\n",
165 |        "      <th>col2</th>\n",
166 |        "      <th>col3</th>\n",
167 |        "    </tr>\n",
168 |        "  </thead>\n",
169 |        "  <tbody>\n",
170 |        "    <tr>\n",
171 |        "      <th>0</th>\n",
172 |        "      <td>C</td>\n",
173 |        "      <td>D</td>\n",
174 |        "      <td>3</td>\n",
175 |        "    </tr>\n",
176 |        "    <tr>\n",
177 |        "      <th>1</th>\n",
178 |        "      <td>CC</td>\n",
179 |        "      <td>DD</td>\n",
180 |        "      <td>4</td>\n",
181 |        "    </tr>\n",
182 |        "  </tbody>\n",
183 |        "</table>\n",
184 |        "</div>"
185 |       ],
186 |       "text/plain": [
187 |        "  col1 col2  col3\n",
188 |        "0    C    D     3\n",
189 |        "1   CC   DD     4"
190 |       ]
191 |      },
192 |      "metadata": {},
193 |      "output_type": "display_data"
194 |     },
195 |     {
196 |      "data": {
197 |       "text/html": [
198 |        "<div>\n",
199 |        "<style scoped>\n",
200 |        "    .dataframe tbody tr th:only-of-type {\n",
201 |        "        vertical-align: middle;\n",
202 |        "    }\n",
203 |        "\n",
204 |        "    .dataframe tbody tr th {\n",
205 |        "        vertical-align: top;\n",
206 |        "    }\n",
207 |        "\n",
208 |        "    .dataframe thead th {\n",
209 |        "        text-align: right;\n",
210 |        "    }\n",
211 |        "</style>\n",
212 |        "<table border=\"1\" class=\"dataframe\">\n",
213 |        "  <thead>\n",
214 |        "    <tr style=\"text-align: right;\">\n",
215 |        "      <th></th>\n",
216 |        "      <th>col1</th>\n",
217 |        "      <th>col2</th>\n",
218 |        "      <th>col3</th>\n",
219 |        "    </tr>\n",
220 |        "  </thead>\n",
221 |        "  <tbody>\n",
222 |        "    <tr>\n",
223 |        "      <th>0</th>\n",
224 |        "      <td>A</td>\n",
225 |        "      <td>B</td>\n",
226 |        "      <td>1</td>\n",
227 |        "    </tr>\n",
228 |        "    <tr>\n",
229 |        "      <th>1</th>\n",
230 |        "      <td>AA</td>\n",
231 |        "      <td>BB</td>\n",
232 |        "      <td>2</td>\n",
233 |        "    </tr>\n",
234 |        "  </tbody>\n",
235 |        "</table>\n",
236 |        "</div>"
237 |       ],
238 |       "text/plain": [
239 |        "  col1 col2  col3\n",
240 |        "0    A    B     1\n",
241 |        "1   AA   BB     2"
242 |       ]
243 |      },
244 |      "metadata": {},
245 |      "output_type": "display_data"
246 |     }
247 |    ],
248 |    "source": [
249 |     "all_files = glob.glob(os.path.join(path, \"data_*.csv\"))\n",
250 |     "display(all_files)\n",
251 |     "for f in all_files:\n",
252 |     "    display(pd.read_csv(f, sep=','))"
253 |    ]
254 |   },
255 |   {
256 |    "cell_type": "markdown",
257 |    "metadata": {},
258 |    "source": [
259 |     "## 1. Steps to merge multiple CSV (identical) files with Python"
260 |    ]
261 |   },
262 |   {
263 |    "cell_type": "code",
264 |    "execution_count": 3,
265 |    "metadata": {},
266 |    "outputs": [],
267 |    "source": [
268 |     "import os, glob\n",
269 |     "import pandas as pd\n",
270 |     "\n",
271 |     "path = \"../../csv/\"\n",
272 |     "#path = \"/home/user/data\"\n",
273 |     "\n",
274 |     "all_files = glob.glob(os.path.join(path, \"data_2019*.csv\"))\n",
275 |     "\n",
276 |     "all_csv = (pd.read_csv(f, sep=',') for f in all_files)\n",
277 |     "df_merged = pd.concat(all_csv, ignore_index=True)\n",
278 |     "df_merged.to_csv(\"merged.csv\")"
279 |    ]
280 |   },
281 |   {
282 |    "cell_type": "code",
283 |    "execution_count": 4,
284 |    "metadata": {},
285 |    "outputs": [
286 |     {
287 |      "data": {
288 |       "text/html": [
289 |        "<div>\n",
290 |        "<style scoped>\n",
291 |        "    .dataframe tbody tr th:only-of-type {\n",
292 |        "        vertical-align: middle;\n",
293 |        "    }\n",
294 |        "\n",
295 |        "    .dataframe tbody tr th {\n",
296 |        "        vertical-align: top;\n",
297 |        "    }\n",
298 |        "\n",
299 |        "    .dataframe thead th {\n",
300 |        "        text-align: right;\n",
301 |        "    }\n",
302 |        "</style>\n",
303 |        "<table border=\"1\" class=\"dataframe\">\n",
304 |        "  <thead>\n",
305 |        "    <tr style=\"text-align: right;\">\n",
306 |        "      <th></th>\n",
307 |        "      <th>Unnamed: 0</th>\n",
308 |        "      <th>col1</th>\n",
309 |        "      <th>col2</th>\n",
310 |        "      <th>col3</th>\n",
311 |        "    </tr>\n",
312 |        "  </thead>\n",
313 |        "  <tbody>\n",
314 |        "    <tr>\n",
315 |        "      <th>0</th>\n",
316 |        "      <td>0</td>\n",
317 |        "      <td>C</td>\n",
318 |        "      <td>D</td>\n",
319 |        "      <td>3</td>\n",
320 |        "    </tr>\n",
321 |        "    <tr>\n",
322 |        "      <th>1</th>\n",
323 |        "      <td>1</td>\n",
324 |        "      <td>CC</td>\n",
325 |        "      <td>DD</td>\n",
326 |        "      <td>4</td>\n",
327 |        "    </tr>\n",
328 |        "    <tr>\n",
329 |        "      <th>2</th>\n",
330 |        "      <td>2</td>\n",
331 |        "      <td>A</td>\n",
332 |        "      <td>B</td>\n",
333 |        "      <td>1</td>\n",
334 |        "    </tr>\n",
335 |        "    <tr>\n",
336 |        "      <th>3</th>\n",
337 |        "      <td>3</td>\n",
338 |        "      <td>AA</td>\n",
339 |        "      <td>BB</td>\n",
340 |        "      <td>2</td>\n",
341 |        "    </tr>\n",
342 |        "  </tbody>\n",
343 |        "</table>\n",
344 |        "</div>"
345 |       ],
346 |       "text/plain": [
347 |        "   Unnamed: 0 col1 col2  col3\n",
348 |        "0           0    C    D     3\n",
349 |        "1           1   CC   DD     4\n",
350 |        "2           2    A    B     1\n",
351 |        "3           3   AA   BB     2"
352 |       ]
353 |      },
354 |      "execution_count": 4,
355 |      "metadata": {},
356 |      "output_type": "execute_result"
357 |     }
358 |    ],
359 |    "source": [
360 |     "pd.read_csv('merged.csv')"
361 |    ]
362 |   },
363 |   {
364 |    "cell_type": "markdown",
365 |    "metadata": {},
366 |    "source": [
367 |     "## 2. Steps to merge multiple CSV (identical) files with Python, tracing the source file"
368 |    ]
369 |   },
370 |   {
371 |    "cell_type": "code",
372 |    "execution_count": 5,
373 |    "metadata": {},
374 |    "outputs": [
375 |     {
376 |      "data": {
377 |       "text/html": [
378 |        "<div>\n",
379 |        "<style scoped>\n",
380 |        "    .dataframe tbody tr th:only-of-type {\n",
381 |        "        vertical-align: middle;\n",
382 |        "    }\n",
383 |        "\n",
384 |        "    .dataframe tbody tr th {\n",
385 |        "        vertical-align: top;\n",
386 |        "    }\n",
387 |        "\n",
388 |        "    .dataframe thead th {\n",
389 |        "        text-align: right;\n",
390 |        "    }\n",
391 |        "</style>\n",
392 |        "<table border=\"1\" class=\"dataframe\">\n",
393 |        "  <thead>\n",
394 |        "    <tr style=\"text-align: right;\">\n",
395 |        "      <th></th>\n",
396 |        "      <th>col1</th>\n",
397 |        "      <th>col2</th>\n",
398 |        "      <th>col3</th>\n",
399 |        "      <th>file</th>\n",
400 |        "    </tr>\n",
401 |        "  </thead>\n",
402 |        "  <tbody>\n",
403 |        "    <tr>\n",
404 |        "      <th>0</th>\n",
405 |        "      <td>C</td>\n",
406 |        "      <td>D</td>\n",
407 |        "      <td>3</td>\n",
408 |        "      <td>data_201902.csv</td>\n",
409 |        "    </tr>\n",
410 |        "    <tr>\n",
411 |        "      <th>1</th>\n",
412 |        "      <td>CC</td>\n",
413 |        "      <td>DD</td>\n",
414 |        "      <td>4</td>\n",
415 |        "      <td>data_201902.csv</td>\n",
416 |        "    </tr>\n",
417 |        "    <tr>\n",
418 |        "      <th>2</th>\n",
419 |        "      <td>A</td>\n",
420 |        "      <td>B</td>\n",
421 |        "      <td>1</td>\n",
422 |        "      <td>data_201901.csv</td>\n",
423 |        "    </tr>\n",
424 |        "    <tr>\n",
425 |        "      <th>3</th>\n",
426 |        "      <td>AA</td>\n",
427 |        "      <td>BB</td>\n",
428 |        "      <td>2</td>\n",
429 |        "      <td>data_201901.csv</td>\n",
430 |        "    </tr>\n",
431 |        "  </tbody>\n",
432 |        "</table>\n",
433 |        "</div>"
434 |       ],
435 |       "text/plain": [
436 |        "  col1 col2  col3             file\n",
437 |        "0    C    D     3  data_201902.csv\n",
438 |        "1   CC   DD     4  data_201902.csv\n",
439 |        "2    A    B     1  data_201901.csv\n",
440 |        "3   AA   BB     2  data_201901.csv"
441 |       ]
442 |      },
443 |      "execution_count": 5,
444 |      "metadata": {},
445 |      "output_type": "execute_result"
446 |     }
447 |    ],
448 |    "source": [
449 |     "import os, glob\n",
450 |     "import pandas as pd\n",
451 |     "\n",
452 |     "path = \"../../csv/\"\n",
453 |     "\n",
454 |     "all_files = glob.glob(os.path.join(path, \"data_2019*.csv\"))\n",
455 |     "\n",
456 |     "all_df = []\n",
457 |     "for f in all_files:\n",
458 |     "    df = pd.read_csv(f, sep=',')\n",
459 |     "    df['file'] = os.path.basename(f)\n",
460 |     "    all_df.append(df)\n",
461 |     "    \n",
462 |     "merged_df = pd.concat(all_df, ignore_index=True)\n",
463 |     "merged_df"
464 |    ]
465 |   },
466 |   {
467 |    "cell_type": "markdown",
468 |    "metadata": {},
469 |    "source": [
470 |     "## 3. Combine multiple CSV files when the columns are different"
471 |    ]
472 |   },
473 |   {
474 |    "cell_type": "code",
475 |    "execution_count": 6,
476 |    "metadata": {},
477 |    "outputs": [
478 |     {
479 |      "data": {
480 |       "text/html": [
481 |        "<div>\n",
482 |        "<style scoped>\n",
483 |        "    .dataframe tbody tr th:only-of-type {\n",
484 |        "        vertical-align: middle;\n",
485 |        "    }\n",
486 |        "\n",
487 |        "    .dataframe tbody tr th {\n",
488 |        "        vertical-align: top;\n",
489 |        "    }\n",
490 |        "\n",
491 |        "    .dataframe thead th {\n",
492 |        "        text-align: right;\n",
493 |        "    }\n",
494 |        "</style>\n",
495 |        "<table border=\"1\" class=\"dataframe\">\n",
496 |        "  <thead>\n",
497 |        "    <tr style=\"text-align: right;\">\n",
498 |        "      <th></th>\n",
499 |        "      <th>col1</th>\n",
500 |        "      <th>col2</th>\n",
501 |        "      <th>col3</th>\n",
502 |        "      <th>col4</th>\n",
503 |        "      <th>col5</th>\n",
504 |        "      <th>file</th>\n",
505 |        "    </tr>\n",
506 |        "  </thead>\n",
507 |        "  <tbody>\n",
508 |        "    <tr>\n",
509 |        "      <th>0</th>\n",
510 |        "      <td>E</td>\n",
511 |        "      <td>F</td>\n",
512 |        "      <td>5</td>\n",
513 |        "      <td>e5</td>\n",
514 |        "      <td>NaN</td>\n",
515 |        "      <td>data_202001.csv</td>\n",
516 |        "    </tr>\n",
517 |        "    <tr>\n",
518 |        "      <th>1</th>\n",
519 |        "      <td>EE</td>\n",
520 |        "      <td>FF</td>\n",
521 |        "      <td>6</td>\n",
522 |        "      <td>ee6</td>\n",
523 |        "      <td>NaN</td>\n",
524 |        "      <td>data_202001.csv</td>\n",
525 |        "    </tr>\n",
526 |        "    <tr>\n",
527 |        "      <th>2</th>\n",
528 |        "      <td>H</td>\n",
529 |        "      <td>J</td>\n",
530 |        "      <td>7</td>\n",
531 |        "      <td>NaN</td>\n",
532 |        "      <td>77.0</td>\n",
533 |        "      <td>data_202002.csv</td>\n",
534 |        "    </tr>\n",
535 |        "    <tr>\n",
536 |        "      <th>3</th>\n",
537 |        "      <td>HH</td>\n",
538 |        "      <td>JJ</td>\n",
539 |        "      <td>8</td>\n",
540 |        "      <td>NaN</td>\n",
541 |        "      <td>88.0</td>\n",
542 |        "      <td>data_202002.csv</td>\n",
543 |        "    </tr>\n",
544 |        "    <tr>\n",
545 |        "      <th>4</th>\n",
546 |        "      <td>C</td>\n",
547 |        "      <td>D</td>\n",
548 |        "      <td>3</td>\n",
549 |        "      <td>NaN</td>\n",
550 |        "      <td>NaN</td>\n",
551 |        "      <td>data_201902.csv</td>\n",
552 |        "    </tr>\n",
553 |        "    <tr>\n",
554 |        "      <th>5</th>\n",
555 |        "      <td>CC</td>\n",
556 |        "      <td>DD</td>\n",
557 |        "      <td>4</td>\n",
558 |        "      <td>NaN</td>\n",
559 |        "      <td>NaN</td>\n",
560 |        "      <td>data_201902.csv</td>\n",
561 |        "    </tr>\n",
562 |        "    <tr>\n",
563 |        "      <th>6</th>\n",
564 |        "      <td>A</td>\n",
565 |        "      <td>B</td>\n",
566 |        "      <td>1</td>\n",
567 |        "      <td>NaN</td>\n",
568 |        "      <td>NaN</td>\n",
569 |        "      <td>data_201901.csv</td>\n",
570 |        "    </tr>\n",
571 |        "    <tr>\n",
572 |        "      <th>7</th>\n",
573 |        "      <td>AA</td>\n",
574 |        "      <td>BB</td>\n",
575 |        "      <td>2</td>\n",
576 |        "      <td>NaN</td>\n",
577 |        "      <td>NaN</td>\n",
578 |        "      <td>data_201901.csv</td>\n",
579 |        "    </tr>\n",
580 |        "  </tbody>\n",
581 |        "</table>\n",
582 |        "</div>"
583 |       ],
584 |       "text/plain": [
585 |        "  col1 col2  col3 col4  col5             file\n",
586 |        "0    E    F     5   e5   NaN  data_202001.csv\n",
587 |        "1   EE   FF     6  ee6   NaN  data_202001.csv\n",
588 |        "2    H    J     7  NaN  77.0  data_202002.csv\n",
589 |        "3   HH   JJ     8  NaN  88.0  data_202002.csv\n",
590 |        "4    C    D     3  NaN   NaN  data_201902.csv\n",
591 |        "5   CC   DD     4  NaN   NaN  data_201902.csv\n",
592 |        "6    A    B     1  NaN   NaN  data_201901.csv\n",
593 |        "7   AA   BB     2  NaN   NaN  data_201901.csv"
594 |       ]
595 |      },
596 |      "execution_count": 6,
597 |      "metadata": {},
598 |      "output_type": "execute_result"
599 |     }
600 |    ],
601 |    "source": [
602 |     "import os, glob\n",
603 |     "import pandas as pd\n",
604 |     "\n",
605 |     "path = \"../../csv/\"\n",
606 |     "\n",
607 |     "all_files = glob.glob(os.path.join(path, \"data_*.csv\"))\n",
608 |     "\n",
609 |     "\n",
610 |     "all_df = []\n",
611 |     "for f in all_files:\n",
612 |     "    df = pd.read_csv(f, sep=',')\n",
613 |     "    df['file'] = os.path.basename(f)\n",
614 |     "    all_df.append(df)\n",
615 |     "    \n",
616 |     "merged_df = pd.concat(all_df, ignore_index=True, sort=True)\n",
617 |     "merged_df"
618 |    ]
619 |   },
620 |   {
621 |    "cell_type": "markdown",
622 |    "metadata": {},
623 |    "source": [
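624 |     "## 4. Bonus: Merge multiple files with Windows/Linux\n",
625 |     "\n",
626 |     "Linux (keeps the header of the first file and skips the header line of every other file)\n",
627 |     "\n",
628 |     "`awk 'FNR==1 && NR!=1 {next} {print}' data_*.csv > merged.csv`\n",
629 |     "\n",
630 |     "Windows (plain concatenation, so the header row of every file is copied too)\n",
631 |     "\n",
632 |     "`C:\\> copy data_*.csv merged.csv`"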
633 |    ]
634 |   },
635 |   {
636 |    "cell_type": "code",
637 |    "execution_count": null,
638 |    "metadata": {},
639 |    "outputs": [],
640 |    "source": []
641 |   }
642 |  ],
643 |  "metadata": {
644 |   "kernelspec": {
645 |    "display_name": "Python 3",
646 |    "language": "python",
647 |    "name": "python3"
648 |   },
649 |   "language_info": {
650 |    "codemirror_mode": {
651 |     "name": "ipython",
652 |     "version": 3
653 |    },
654 |    "file_extension": ".py",
655 |    "mimetype": "text/x-python",
656 |    "name": "python",
657 |    "nbconvert_exporter": "python",
658 |    "pygments_lexer": "ipython3",
659 |    "version": "3.6.9"
660 |   }
661 |  },
662 |  "nbformat": 4,
663 |  "nbformat_minor": 2
664 | }
665 | 
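
A self-contained sketch of the merge steps from the notebook above, under a few assumptions: the three CSV files are made-up samples written to a temporary directory (they mimic the notebook's `data_*.csv` layout), and the names `samples`, `frames` and `reloaded` are illustrative only. It differs from the notebook in three small ways: `sorted()` makes the file order deterministic, `os.path.basename` extracts the file name portably, and `index=False` keeps the row index out of the output, which avoids the `Unnamed: 0` column seen when `merged.csv` is read back.

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical sample files, written to a temporary directory so the sketch runs anywhere.
tmp_dir = tempfile.mkdtemp()
samples = {
    "data_201901.csv": "col1,col2,col3\nA,B,1\nAA,BB,2\n",
    "data_201902.csv": "col1,col2,col3\nC,D,3\nCC,DD,4\n",
    "data_202001.csv": "col1,col2,col3,col4\nE,F,5,e5\nEE,FF,6,ee6\n",
}
for name, text in samples.items():
    with open(os.path.join(tmp_dir, name), "w") as fh:
        fh.write(text)

# glob returns files in arbitrary order; sorting makes the row order reproducible.
all_files = sorted(glob.glob(os.path.join(tmp_dir, "data_*.csv")))

frames = []
for f in all_files:
    frame = pd.read_csv(f)
    frame["file"] = os.path.basename(f)  # remember which file each row came from
    frames.append(frame)

# Columns missing from a file are filled with NaN in the merged result.
merged_df = pd.concat(frames, ignore_index=True, sort=False)

# index=False: do not write the row index as an extra column.
out_path = os.path.join(tmp_dir, "merged.csv")
merged_df.to_csv(out_path, index=False)
reloaded = pd.read_csv(out_path)
print(reloaded)
```

Writing the output under a name that does not match the `data_*.csv` pattern keeps a second run from picking up the merged file as an input.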


--------------------------------------------------------------------------------
/notebooks/python/JSON/41._Create_a_table_in_MySQL_Database_from_python_dictionary.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "# 41. Create a table in SQL (MySQL Database) from a Python dictionary\n",
  8 |     "\n",
  9 |     "\n",
 10 |     "[Python convert normal JSON to JSON separated lines 3 examples](https://blog.softhints.com/python-convert-json-to-json-lines/)\n",
 11 |     "\n",
 12 |     "* Pandas DataFrame to MySQL\n",
 13 |     "* Create table from Python Dict\n",
 14 |     "* Connect MySQL database and Python\n",
 15 |     "    * SQLAlchemy\n",
 16 |     "    * PyMySQL"
 17 |    ]
 18 |   },
 19 |   {
 20 |    "cell_type": "markdown",
 21 |    "metadata": {},
 22 |    "source": [
 23 |     "Python dict that will be converted to a database table\n",
 24 |     "\n",
 25 |     "```json\n",
 26 |     "{\"id\":1,\"label\":\"A\",\"size\":\"S\"}\n",
 27 |     "{\"id\":2,\"label\":\"B\",\"size\":\"XL\"}\n",
 28 |     "{\"id\":3,\"label\":\"C\",\"size\":\"XXl\"}\n",
 29 |     "```"
 30 |    ]
 31 |   },
 32 |   {
 33 |    "cell_type": "markdown",
 34 |    "metadata": {},
 35 |    "source": [
 36 |     "## Step 1: Read/Create a Python dict"
 37 |    ]
 38 |   },
 39 |   {
 40 |    "cell_type": "code",
 41 |    "execution_count": 1,
 42 |    "metadata": {},
 43 |    "outputs": [
 44 |     {
 45 |      "data": {
 46 |       "text/html": [
 47 |        "<div>\n",
 48 |        "<style scoped>\n",
 49 |        "    .dataframe tbody tr th:only-of-type {\n",
 50 |        "        vertical-align: middle;\n",
 51 |        "    }\n",
 52 |        "\n",
 53 |        "    .dataframe tbody tr th {\n",
 54 |        "        vertical-align: top;\n",
 55 |        "    }\n",
 56 |        "\n",
 57 |        "    .dataframe thead th {\n",
 58 |        "        text-align: right;\n",
 59 |        "    }\n",
 60 |        "</style>\n",
 61 |        "<table border=\"1\" class=\"dataframe\">\n",
 62 |        "  <thead>\n",
 63 |        "    <tr style=\"text-align: right;\">\n",
 64 |        "      <th></th>\n",
 65 |        "      <th>id</th>\n",
 66 |        "      <th>label</th>\n",
 67 |        "      <th>size</th>\n",
 68 |        "    </tr>\n",
 69 |        "  </thead>\n",
 70 |        "  <tbody>\n",
 71 |        "    <tr>\n",
 72 |        "      <th>0</th>\n",
 73 |        "      <td>1</td>\n",
 74 |        "      <td>A</td>\n",
 75 |        "      <td>S</td>\n",
 76 |        "    </tr>\n",
 77 |        "    <tr>\n",
 78 |        "      <th>1</th>\n",
 79 |        "      <td>2</td>\n",
 80 |        "      <td>B</td>\n",
 81 |        "      <td>XL</td>\n",
 82 |        "    </tr>\n",
 83 |        "    <tr>\n",
 84 |        "      <th>2</th>\n",
 85 |        "      <td>3</td>\n",
 86 |        "      <td>C</td>\n",
 87 |        "      <td>XXl</td>\n",
 88 |        "    </tr>\n",
 89 |        "  </tbody>\n",
 90 |        "</table>\n",
 91 |        "</div>"
 92 |       ],
 93 |       "text/plain": [
 94 |        "   id label size\n",
 95 |        "0   1     A    S\n",
 96 |        "1   2     B   XL\n",
 97 |        "2   3     C  XXl"
 98 |       ]
 99 |      },
100 |      "execution_count": 1,
101 |      "metadata": {},
102 |      "output_type": "execute_result"
103 |     }
104 |    ],
105 |    "source": [
106 |     "import pandas as pd\n",
107 |     "\n",
108 |     "# read normal JSON with pandas\n",
109 |     "df = pd.read_json('/home/vanx/Downloads/old/normal_json.json')\n",
110 |     "\n",
111 |     "df.head()"
112 |    ]
113 |   },
114 |   {
115 |    "cell_type": "code",
116 |    "execution_count": 4,
117 |    "metadata": {},
118 |    "outputs": [
119 |     {
120 |      "data": {
121 |       "text/plain": [
122 |        "{'id': {0: 1, 1: 2, 2: 3},\n",
123 |        " 'label': {0: 'A', 1: 'B', 2: 'C'},\n",
124 |        " 'size': {0: 'S', 1: 'XL', 2: 'XXl'}}"
125 |       ]
126 |      },
127 |      "execution_count": 4,
128 |      "metadata": {},
129 |      "output_type": "execute_result"
130 |     }
131 |    ],
132 |    "source": [
133 |     "data_dict = df.to_dict()\n",
134 |     "data_dict"
135 |    ]
136 |   },
137 |   {
138 |    "cell_type": "code",
139 |    "execution_count": 5,
140 |    "metadata": {},
141 |    "outputs": [
142 |     {
143 |      "data": {
144 |       "text/html": [
145 |        "<div>\n",
146 |        "<style scoped>\n",
147 |        "    .dataframe tbody tr th:only-of-type {\n",
148 |        "        vertical-align: middle;\n",
149 |        "    }\n",
150 |        "\n",
151 |        "    .dataframe tbody tr th {\n",
152 |        "        vertical-align: top;\n",
153 |        "    }\n",
154 |        "\n",
155 |        "    .dataframe thead th {\n",
156 |        "        text-align: right;\n",
157 |        "    }\n",
158 |        "</style>\n",
159 |        "<table border=\"1\" class=\"dataframe\">\n",
160 |        "  <thead>\n",
161 |        "    <tr style=\"text-align: right;\">\n",
162 |        "      <th></th>\n",
163 |        "      <th>id</th>\n",
164 |        "      <th>label</th>\n",
165 |        "      <th>size</th>\n",
166 |        "    </tr>\n",
167 |        "  </thead>\n",
168 |        "  <tbody>\n",
169 |        "    <tr>\n",
170 |        "      <th>0</th>\n",
171 |        "      <td>1</td>\n",
172 |        "      <td>A</td>\n",
173 |        "      <td>S</td>\n",
174 |        "    </tr>\n",
175 |        "    <tr>\n",
176 |        "      <th>1</th>\n",
177 |        "      <td>2</td>\n",
178 |        "      <td>B</td>\n",
179 |        "      <td>XL</td>\n",
180 |        "    </tr>\n",
181 |        "    <tr>\n",
182 |        "      <th>2</th>\n",
183 |        "      <td>3</td>\n",
184 |        "      <td>C</td>\n",
185 |        "      <td>XXl</td>\n",
186 |        "    </tr>\n",
187 |        "  </tbody>\n",
188 |        "</table>\n",
189 |        "</div>"
190 |       ],
191 |       "text/plain": [
192 |        "   id label size\n",
193 |        "0   1     A    S\n",
194 |        "1   2     B   XL\n",
195 |        "2   3     C  XXl"
196 |       ]
197 |      },
198 |      "execution_count": 5,
199 |      "metadata": {},
200 |      "output_type": "execute_result"
201 |     }
202 |    ],
203 |    "source": [
204 |     "df2 = pd.DataFrame.from_dict(data_dict)\n",
205 |     "df2.head()"
206 |    ]
207 |   },
208 |   {
209 |    "cell_type": "markdown",
210 |    "metadata": {},
211 |    "source": [
212 |     "## Step 2: Pandas DataFrame to MySQL table with SQLAlchemy"
213 |    ]
214 |   },
215 |   {
216 |    "cell_type": "code",
217 |    "execution_count": 6,
218 |    "metadata": {},
219 |    "outputs": [],
220 |    "source": [
221 |     "# connect\n",
222 |     "from sqlalchemy import create_engine\n",
223 |     "cnx = create_engine('mysql+pymysql://test:pass@localhost/test')    "
224 |    ]
225 |   },
226 |   {
227 |    "cell_type": "code",
228 |    "execution_count": 7,
229 |    "metadata": {},
230 |    "outputs": [],
231 |    "source": [
232 |     "# create table from DataFrame\n",
233 |     "df.to_sql('test', cnx, if_exists='replace', index = False)"
234 |    ]
235 |   },
236 |   {
237 |    "cell_type": "code",
238 |    "execution_count": 8,
239 |    "metadata": {},
240 |    "outputs": [
241 |     {
242 |      "data": {
243 |       "text/html": [
244 |        "<div>\n",
245 |        "<style scoped>\n",
246 |        "    .dataframe tbody tr th:only-of-type {\n",
247 |        "        vertical-align: middle;\n",
248 |        "    }\n",
249 |        "\n",
250 |        "    .dataframe tbody tr th {\n",
251 |        "        vertical-align: top;\n",
252 |        "    }\n",
253 |        "\n",
254 |        "    .dataframe thead th {\n",
255 |        "        text-align: right;\n",
256 |        "    }\n",
257 |        "</style>\n",
258 |        "<table border=\"1\" class=\"dataframe\">\n",
259 |        "  <thead>\n",
260 |        "    <tr style=\"text-align: right;\">\n",
261 |        "      <th></th>\n",
262 |        "      <th>id</th>\n",
263 |        "      <th>label</th>\n",
264 |        "      <th>size</th>\n",
265 |        "    </tr>\n",
266 |        "  </thead>\n",
267 |        "  <tbody>\n",
268 |        "    <tr>\n",
269 |        "      <th>0</th>\n",
270 |        "      <td>1</td>\n",
271 |        "      <td>A</td>\n",
272 |        "      <td>S</td>\n",
273 |        "    </tr>\n",
274 |        "    <tr>\n",
275 |        "      <th>1</th>\n",
276 |        "      <td>2</td>\n",
277 |        "      <td>B</td>\n",
278 |        "      <td>XL</td>\n",
279 |        "    </tr>\n",
280 |        "    <tr>\n",
281 |        "      <th>2</th>\n",
282 |        "      <td>3</td>\n",
283 |        "      <td>C</td>\n",
284 |        "      <td>XXl</td>\n",
285 |        "    </tr>\n",
286 |        "  </tbody>\n",
287 |        "</table>\n",
288 |        "</div>"
289 |       ],
290 |       "text/plain": [
291 |        "   id label size\n",
292 |        "0   1     A    S\n",
293 |        "1   2     B   XL\n",
294 |        "2   3     C  XXl"
295 |       ]
296 |      },
297 |      "execution_count": 8,
298 |      "metadata": {},
299 |      "output_type": "execute_result"
300 |     }
301 |    ],
302 |    "source": [
303 |     "# query table\n",
304 |     "df = pd.read_sql('SELECT * FROM test', cnx)\n",
305 |     "df.head()"
306 |    ]
307 |   },
308 |   {
309 |    "cell_type": "markdown",
310 |    "metadata": {},
311 |    "source": [
312 |     "## Step 3: Python Dict Insert Records Into a MySQL Database"
313 |    ]
314 |   },
315 |   {
316 |    "cell_type": "code",
317 |    "execution_count": 18,
318 |    "metadata": {},
319 |    "outputs": [],
320 |    "source": [
321 |     "# connect\n",
322 |     "import pymysql\n",
323 |     "\n",
324 |     "connection = pymysql.connect(host='localhost',\n",
325 |     "                             user='test',\n",
326 |     "                             password='pass',\n",
327 |     "                             db='test')\n",
328 |     "cursor = connection.cursor()"
329 |    ]
330 |   },
331 |   {
332 |    "cell_type": "code",
333 |    "execution_count": 19,
334 |    "metadata": {},
335 |    "outputs": [
336 |     {
337 |      "data": {
338 |       "text/plain": [
339 |        "0"
340 |       ]
341 |      },
342 |      "execution_count": 19,
343 |      "metadata": {},
344 |      "output_type": "execute_result"
345 |     }
346 |    ],
347 |    "source": [
348 |     "# Create table\n",
349 |     "cols = df.columns\n",
350 |     "table_name = 'test'\n",
351 |     "ddl = \"\"\n",
352 |     "for col in cols:\n",
353 |     "    ddl += \"`{}` text,\".format(col)\n",
354 |     "\n",
355 |     "sql_create = \"CREATE TABLE IF NOT EXISTS `{}` ({}) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;\".format(table_name, ddl[:-1])\n",
356 |     "cursor.execute(sql_create)"
357 |    ]
358 |   },
359 |   {
360 |    "cell_type": "code",
361 |    "execution_count": 20,
362 |    "metadata": {},
363 |    "outputs": [],
364 |    "source": [
365 |     "# insert data\n",
366 |     "cols = \"`,`\".join([str(i) for i in df.columns.tolist()])\n",
367 |     "\n",
 368 |     "# insert the DataFrame rows one by one with a parameterized query\n",
369 |     "for i,row in df.iterrows():\n",
370 |     "    sql = \"INSERT INTO `test` (`\" +cols + \"`) VALUES (\" + \"%s,\"*(len(row)-1) + \"%s)\"\n",
371 |     "    cursor.execute(sql, tuple(row))\n",
372 |     "    connection.commit()"
373 |    ]
374 |   },
375 |   {
376 |    "cell_type": "code",
377 |    "execution_count": 21,
378 |    "metadata": {},
379 |    "outputs": [
380 |     {
381 |      "name": "stdout",
382 |      "output_type": "stream",
383 |      "text": [
384 |       "('1', 'A', 'S')\n",
385 |       "('2', 'B', 'XL')\n",
386 |       "('3', 'C', 'XXl')\n"
387 |      ]
388 |     }
389 |    ],
390 |    "source": [
391 |     "# read\n",
392 |     "sql = \"SELECT * FROM test\"\n",
393 |     "cursor.execute(sql)\n",
394 |     "result = cursor.fetchall()\n",
395 |     "for i in result:\n",
396 |     "    print(i)"
397 |    ]
398 |   }
399 |  ],
400 |  "metadata": {
401 |   "kernelspec": {
402 |    "display_name": "Python 3",
403 |    "language": "python",
404 |    "name": "python3"
405 |   },
406 |   "language_info": {
407 |    "codemirror_mode": {
408 |     "name": "ipython",
409 |     "version": 3
410 |    },
411 |    "file_extension": ".py",
412 |    "mimetype": "text/x-python",
413 |    "name": "python",
414 |    "nbconvert_exporter": "python",
415 |    "pygments_lexer": "ipython3",
416 |    "version": "3.6.9"
417 |   }
418 |  },
419 |  "nbformat": 4,
420 |  "nbformat_minor": 2
421 | }
422 | 
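Step 3 of the notebook above issues one `INSERT` and one `commit()` per row. The following is a minimal sketch of a batched variant that sends all rows through `executemany` and commits once. It assumes an in-memory `sqlite3` database as a stand-in for MySQL so that it runs without a server; the helper `insert_records` and its `placeholder` and `quote` parameters are hypothetical names introduced for illustration, not part of the notebook.

```python
import sqlite3

# Records in the same shape as the sample JSON used in the notebook
records = [
    {"id": 1, "label": "A", "size": "S"},
    {"id": 2, "label": "B", "size": "XL"},
    {"id": 3, "label": "C", "size": "XXl"},
]


def insert_records(connection, table_name, records, placeholder="?", quote='"'):
    """Insert a list of dicts with one parameterized statement and one commit.

    Assumption: every dict has the same keys as the first one.
    """
    columns = list(records[0])
    column_sql = ", ".join(f"{quote}{col}{quote}" for col in columns)
    value_sql = ", ".join([placeholder] * len(columns))
    sql = f"INSERT INTO {quote}{table_name}{quote} ({column_sql}) VALUES ({value_sql})"
    rows = [tuple(record[col] for col in columns) for record in records]

    cursor = connection.cursor()
    cursor.executemany(sql, rows)  # values are passed separately from the SQL text
    connection.commit()            # single commit for the whole batch
    return len(rows)


# sqlite3 stands in for MySQL here (assumption, so the sketch runs anywhere)
connection = sqlite3.connect(":memory:")
connection.execute('CREATE TABLE "test" ("id" INTEGER, "label" TEXT, "size" TEXT)')

inserted = insert_records(connection, "test", records)
stored = connection.execute('SELECT * FROM "test" ORDER BY "id"').fetchall()
print(inserted, stored)
```

With a PyMySQL connection, the same helper would presumably be called with `placeholder="%s"` and a backtick as `quote`, matching the SQL that the notebook builds by hand.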


--------------------------------------------------------------------------------
/notebooks/python/JSON/42._Convert_MySQL_table_to_Pandas_DataFrame_Python_dictionary.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
   7 |     "# 42. Convert MySQL table to Pandas DataFrame (Python dictionary)\n",
  8 |     "\n",
  9 |     "\n",
 10 |     "[How to Convert MySQL Table to Pandas DataFrame / Python Dictionary](https://blog.softhints.com/convert-mysql-table-pandas-dataframe-python-dictionary/)\n",
 11 |     "\n",
  12 |     "* [PyMySQL](https://pypi.org/project/PyMySQL/) + [SQLAlchemy](https://pypi.org/project/SQLAlchemy/) - the shortest and easiest way to convert a MySQL table to a Python dict\n",
  13 |     "* [mysql.connector](https://pypi.org/project/mysql-connector-python/)\n",
  14 |     "* [pyodbc](https://pypi.org/project/pyodbc/) - connects to the MySQL database, reads the table and converts it to a DataFrame or Python dict."
 15 |    ]
 16 |   },
 17 |   {
 18 |    "cell_type": "markdown",
 19 |    "metadata": {},
 20 |    "source": [
 21 |     "![](https://blog.softhints.com/content/images/2020/11/MySQL_table_to_Pandas_DataFrame_to_Python_dict.png)"
 22 |    ]
 23 |   },
 24 |   {
 25 |    "cell_type": "code",
 26 |    "execution_count": 7,
 27 |    "metadata": {},
 28 |    "outputs": [],
 29 |    "source": [
 30 |     "password = ''"
 31 |    ]
 32 |   },
 33 |   {
 34 |    "cell_type": "markdown",
 35 |    "metadata": {},
 36 |    "source": [
 37 |     "## 1: Convert MySQL Table to DataFrame with PyMySQL + SQLAlchemy "
 38 |    ]
 39 |   },
 40 |   {
 41 |    "cell_type": "code",
 42 |    "execution_count": 2,
 43 |    "metadata": {},
 44 |    "outputs": [
 45 |     {
 46 |      "data": {
 47 |       "text/plain": [
 48 |        "{'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},\n",
 49 |        " 'name': {0: 'Emma', 1: 'Ann', 2: 'Kim', 3: 'Olivia', 4: 'Victoria'}}"
 50 |       ]
 51 |      },
 52 |      "execution_count": 2,
 53 |      "metadata": {},
 54 |      "output_type": "execute_result"
 55 |     }
 56 |    ],
 57 |    "source": [
 58 |     "from sqlalchemy import create_engine\n",
 59 |     "import pymysql\n",
 60 |     "import pandas as pd\n",
 61 |     "\n",
 62 |     "db_connection_str = 'mysql+pymysql://root:' + password + '@localhost:3306/test'\n",
 63 |     "db_connection = create_engine(db_connection_str)\n",
 64 |     "\n",
 65 |     "df = pd.read_sql('SELECT * FROM girls', con=db_connection)\n",
 66 |     "df.to_dict()"
 67 |    ]
 68 |   },
 69 |   {
 70 |    "cell_type": "code",
 71 |    "execution_count": 3,
 72 |    "metadata": {},
 73 |    "outputs": [
 74 |     {
 75 |      "data": {
 76 |       "text/plain": [
 77 |        "[{'id': 1, 'name': 'Emma'},\n",
 78 |        " {'id': 2, 'name': 'Ann'},\n",
 79 |        " {'id': 3, 'name': 'Kim'},\n",
 80 |        " {'id': 4, 'name': 'Olivia'},\n",
 81 |        " {'id': 5, 'name': 'Victoria'}]"
 82 |       ]
 83 |      },
 84 |      "execution_count": 3,
 85 |      "metadata": {},
 86 |      "output_type": "execute_result"
 87 |     }
 88 |    ],
 89 |    "source": [
 90 |     "df.to_dict('records')"
 91 |    ]
 92 |   },
 93 |   {
 94 |    "cell_type": "code",
 95 |    "execution_count": 4,
 96 |    "metadata": {},
 97 |    "outputs": [
 98 |     {
 99 |      "data": {
100 |       "text/plain": [
101 |        "{'id': [1, 2, 3, 4, 5], 'name': ['Emma', 'Ann', 'Kim', 'Olivia', 'Victoria']}"
102 |       ]
103 |      },
104 |      "execution_count": 4,
105 |      "metadata": {},
106 |      "output_type": "execute_result"
107 |     }
108 |    ],
109 |    "source": [
110 |     "df.to_dict('list')"
111 |    ]
112 |   },
113 |   {
114 |    "cell_type": "code",
115 |    "execution_count": 5,
116 |    "metadata": {},
117 |    "outputs": [
118 |     {
119 |      "data": {
120 |       "text/plain": [
121 |        "{0: {'id': 1, 'name': 'Emma'},\n",
122 |        " 1: {'id': 2, 'name': 'Ann'},\n",
123 |        " 2: {'id': 3, 'name': 'Kim'},\n",
124 |        " 3: {'id': 4, 'name': 'Olivia'},\n",
125 |        " 4: {'id': 5, 'name': 'Victoria'}}"
126 |       ]
127 |      },
128 |      "execution_count": 5,
129 |      "metadata": {},
130 |      "output_type": "execute_result"
131 |     }
132 |    ],
133 |    "source": [
134 |     "df.to_dict('index')"
135 |    ]
136 |   },
137 |   {
138 |    "cell_type": "markdown",
139 |    "metadata": {},
140 |    "source": [
141 |     "## 2: Convert MySQL Table to DataFrame with mysql.connector"
142 |    ]
143 |   },
144 |   {
145 |    "cell_type": "code",
146 |    "execution_count": 6,
147 |    "metadata": {},
148 |    "outputs": [
149 |     {
150 |      "data": {
151 |       "text/plain": [
152 |        "{0: {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},\n",
153 |        " 1: {0: bytearray(b'Emma'),\n",
154 |        "  1: bytearray(b'Ann'),\n",
155 |        "  2: bytearray(b'Kim'),\n",
156 |        "  3: bytearray(b'Olivia'),\n",
157 |        "  4: bytearray(b'Victoria')}}"
158 |       ]
159 |      },
160 |      "execution_count": 6,
161 |      "metadata": {},
162 |      "output_type": "execute_result"
163 |     }
164 |    ],
165 |    "source": [
166 |     "import pandas as pd\n",
167 |     "import mysql.connector\n",
168 |     "\n",
169 |     "# Setup MySQL connection\n",
170 |     "db = mysql.connector.connect(\n",
 171 |     "    host=\"localhost\",    # your host, usually localhost\n",
 172 |     "    user=\"root\",         # your username\n",
 173 |     "    password=password,   # your password\n",
 174 |     "    database=\"test\"      # name of the database\n",
175 |     ")   \n",
176 |     "\n",
177 |     "# You must create a Cursor object. It will let you execute all the queries you need\n",
178 |     "cur = db.cursor()\n",
179 |     "\n",
180 |     "# Use all the SQL you like\n",
181 |     "cur.execute(\"SELECT * FROM girls\")\n",
182 |     "\n",
183 |     "# Put it all to a data frame\n",
184 |     "df_sql_data = pd.DataFrame(cur.fetchall())\n",
185 |     "\n",
186 |     "# Close the session\n",
187 |     "db.close()\n",
188 |     "\n",
189 |     "# Show the data\n",
190 |     "df_sql_data.to_dict()"
191 |    ]
192 |   },
193 |   {
194 |    "cell_type": "code",
195 |    "execution_count": null,
196 |    "metadata": {},
197 |    "outputs": [],
198 |    "source": []
199 |   }
200 |  ],
201 |  "metadata": {
202 |   "kernelspec": {
203 |    "display_name": "Python 3",
204 |    "language": "python",
205 |    "name": "python3"
206 |   },
207 |   "language_info": {
208 |    "codemirror_mode": {
209 |     "name": "ipython",
210 |     "version": 3
211 |    },
212 |    "file_extension": ".py",
213 |    "mimetype": "text/x-python",
214 |    "name": "python",
215 |    "nbconvert_exporter": "python",
216 |    "pygments_lexer": "ipython3",
217 |    "version": "3.8.4"
218 |   }
219 |  },
220 |  "nbformat": 4,
221 |  "nbformat_minor": 2
222 | }
223 | 
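In section 2 of the notebook above, the rows from `cur.fetchall()` go into a DataFrame without column names, so the resulting dict is keyed by `0` and `1` rather than `id` and `name`. The DB-API cursor exposes the column names through `cursor.description`. The sketch below pairs them with each row. It assumes an in-memory `sqlite3` database as a stand-in for MySQL so that it runs without a server; `rows_to_records` is a hypothetical helper name.

```python
import sqlite3


def rows_to_records(cursor):
    """Turn the rows of an executed DB-API cursor into a list of dicts."""
    # The first item of every description entry is the column name
    columns = [column[0] for column in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# sqlite3 stands in for MySQL here (assumption, so the sketch runs anywhere)
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE girls (id INTEGER, name TEXT)")
connection.executemany(
    "INSERT INTO girls VALUES (?, ?)",
    [(1, "Emma"), (2, "Ann"), (3, "Kim")],
)

cursor = connection.execute("SELECT * FROM girls ORDER BY id")
records = rows_to_records(cursor)
print(records)
connection.close()
```

The same list of column names can be passed as `columns=` to `pd.DataFrame(...)`, which should give `to_dict()` output keyed by column name, as in section 1.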


--------------------------------------------------------------------------------
/notebooks/python_problems/Python_problems_for_beginners_1.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "## Python problems for beginners"
  8 |    ]
  9 |   },
 10 |   {
 11 |    "cell_type": "markdown",
 12 |    "metadata": {},
 13 |    "source": [
 14 |     "## Problem 1 Triangle\n",
 15 |     "\n",
  16 |     "Write a simple program that prints a star pattern in Python 3.x for any n:\n",
 17 |     "\n",
 18 |     "Example n=5\n",
 19 |     "\n",
 20 |     "    * \n",
 21 |     "    * * \n",
 22 |     "    * * * \n",
 23 |     "    * * * * \n",
 24 |     "    * * * * * "
 25 |    ]
 26 |   },
 27 |   {
 28 |    "cell_type": "code",
 29 |    "execution_count": 20,
 30 |    "metadata": {},
 31 |    "outputs": [
 32 |     {
 33 |      "name": "stdout",
 34 |      "output_type": "stream",
 35 |      "text": [
 36 |       "\n",
 37 |       "* \n",
 38 |       "* * \n",
 39 |       "* * * \n",
 40 |       "* * * * \n",
 41 |       "* * * * * \n"
 42 |      ]
 43 |     }
 44 |    ],
 45 |    "source": [
 46 |     "n = 5\n",
 47 |     "\n",
 48 |     "for i in range(0, n+1):\n",
 49 |     "    print('* ' * i)"
 50 |    ]
 51 |   },
 52 |   {
 53 |    "cell_type": "code",
 54 |    "execution_count": 22,
 55 |    "metadata": {},
 56 |    "outputs": [
 57 |     {
 58 |      "name": "stdout",
 59 |      "output_type": "stream",
 60 |      "text": [
 61 |       "7\n"
 62 |      ]
 63 |     }
 64 |    ],
 65 |    "source": [
 66 |     "n = n +2\n",
 67 |     "print(n)"
 68 |    ]
 69 |   },
 70 |   {
 71 |    "cell_type": "code",
 72 |    "execution_count": 33,
 73 |    "metadata": {},
 74 |    "outputs": [
 75 |     {
 76 |      "name": "stdout",
 77 |      "output_type": "stream",
 78 |      "text": [
 79 |       "ddd"
 80 |      ]
 81 |     }
 82 |    ],
 83 |    "source": [
 84 |     "for x in ['a', 's', 'd']:\n",
 85 |     "    print('d', end='')"
 86 |    ]
 87 |   },
 88 |   {
 89 |    "cell_type": "code",
 90 |    "execution_count": 31,
 91 |    "metadata": {},
 92 |    "outputs": [
 93 |     {
 94 |      "data": {
 95 |       "text/plain": [
 96 |        "[3, 6, 9]"
 97 |       ]
 98 |      },
 99 |      "execution_count": 31,
100 |      "metadata": {},
101 |      "output_type": "execute_result"
102 |     }
103 |    ],
104 |    "source": [
105 |     "list(range(3,10,3))"
106 |    ]
107 |   },
108 |   {
109 |    "cell_type": "markdown",
110 |    "metadata": {},
111 |    "source": [
112 |     "## Problem 2 Triangle with numbers\n",
113 |     "\n",
 114 |     "Write a simple program that prints a triangle (with the numbers 1..i on line i) in Python 3.x for any n:\n",
115 |     "\n",
116 |     "Example n=4\n",
117 |     "\n",
118 |     "    1 \n",
119 |     "    1 2 \n",
120 |     "    1 2 3 \n",
121 |     "    1 2 3 4"
122 |    ]
123 |   },
124 |   {
125 |    "cell_type": "code",
126 |    "execution_count": 38,
127 |    "metadata": {},
128 |    "outputs": [
129 |     {
130 |      "name": "stdout",
131 |      "output_type": "stream",
132 |      "text": [
133 |       "x\n",
134 |       "\n",
135 |       "x\n",
136 |       "y\n",
137 |       "1 \n",
138 |       "x\n",
139 |       "y\n",
140 |       "1 y\n",
141 |       "2 \n",
142 |       "x\n",
143 |       "y\n",
144 |       "1 y\n",
145 |       "2 y\n",
146 |       "3 \n",
147 |       "x\n",
148 |       "y\n",
149 |       "1 y\n",
150 |       "2 y\n",
151 |       "3 y\n",
152 |       "4 \n"
153 |      ]
154 |     }
155 |    ],
156 |    "source": [
157 |     "n = 4\n",
158 |     "\n",
159 |     "for i in range(0, n+1):\n",
160 |     "    for j in range(1, i + 1):\n",
161 |     "        print(j, end=' ')\n",
162 |     "    print()"
163 |    ]
164 |   },
165 |   {
166 |    "cell_type": "markdown",
167 |    "metadata": {},
168 |    "source": [
169 |     "## Homework 1 Triangle with letters    \n",
170 |     "\n",
 171 |     "Write a simple program that prints a triangle (with consecutive letters) in Python 3.x for any n:\n",
 172 |     "\n",
 173 |     "Example n=5\n",
174 |     "\n",
175 |     "    A \n",
176 |     "    B C \n",
177 |     "    D E F \n",
178 |     "    G H I J \n",
179 |     "    K L M N O "
180 |    ]
181 |   },
182 |   {
183 |    "cell_type": "markdown",
184 |    "metadata": {},
185 |    "source": [
186 |     "## Homework 2 Diagonal of numbers\n",
187 |     "\n",
 188 |     "Write a simple program that prints a diagonal pattern (numbers 0..n) in Python 3.x for any n:\n",
189 |     "\n",
190 |     "Example n=4\n",
191 |     "\n",
192 |     "    0\n",
193 |     "     1\n",
194 |     "      2\n",
195 |     "       3\n",
196 |     "        4"
197 |    ]
198 |   },
199 |   {
200 |    "cell_type": "markdown",
201 |    "metadata": {},
202 |    "source": [
203 |     "## Homework 3 Pyramid\n",
204 |     "\n",
 205 |     "Write a simple program that prints a pyramid pattern in Python 3.x for any n:\n",
206 |     "\n",
207 |     "Example n=3\n",
208 |     "\n",
209 |     "        * \n",
210 |     "      * * * \n",
211 |     "    * * * * * "
212 |    ]
213 |   },
214 |   {
215 |    "cell_type": "code",
216 |    "execution_count": null,
217 |    "metadata": {},
218 |    "outputs": [],
219 |    "source": [
 220 |     "# row 0 -> 1 star\n",
 221 |     "# row 1 -> 3 stars\n",
 222 |     "# row 2 -> 5 stars\n",
 223 |     "#\n",
 224 |     "# stars in row i (0-based): 2 * i + 1"
225 |    ]
226 |   },
227 |   {
228 |    "cell_type": "code",
229 |    "execution_count": 64,
230 |    "metadata": {},
231 |    "outputs": [
232 |     {
233 |      "name": "stdout",
234 |      "output_type": "stream",
235 |      "text": [
236 |       "    *    \n",
237 |       "  * * *  \n",
238 |       "* * * * * \n"
239 |      ]
240 |     }
241 |    ],
242 |    "source": [
243 |     "n = 3\n",
244 |     "\n",
245 |     "for i in range(n):\n",
 246 |     "    row = '* ' * (2 * i + 1) # row i (0-based) holds 2 * i + 1 stars\n",
247 |     "    print(row.center(n * 3))"
248 |    ]
249 |   },
250 |   {
251 |    "cell_type": "code",
252 |    "execution_count": 70,
253 |    "metadata": {},
254 |    "outputs": [
255 |     {
256 |      "name": "stdout",
257 |      "output_type": "stream",
258 |      "text": [
259 |       "    *\n",
260 |       "   ***\n",
261 |       "  *****\n",
262 |       " *******\n",
263 |       "*********\n"
264 |      ]
265 |     }
266 |    ],
267 |    "source": [
268 |     "n = 5\n",
269 |     "\n",
270 |     "for i in range(n):\n",
271 |     "    print( ' ' * (n-i-1), end='')\n",
272 |     "    print('*' * (2 * i + 1))"
273 |    ]
274 |   }
275 |  ],
276 |  "metadata": {
277 |   "kernelspec": {
278 |    "display_name": "Python 3",
279 |    "language": "python",
280 |    "name": "python3"
281 |   },
282 |   "language_info": {
283 |    "codemirror_mode": {
284 |     "name": "ipython",
285 |     "version": 3
286 |    },
287 |    "file_extension": ".py",
288 |    "mimetype": "text/x-python",
289 |    "name": "python",
290 |    "nbconvert_exporter": "python",
291 |    "pygments_lexer": "ipython3",
292 |    "version": "3.6.7"
293 |   }
294 |  },
295 |  "nbformat": 4,
296 |  "nbformat_minor": 2
297 | }
298 | 
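One possible approach to Homework 1 (triangle with consecutive letters) from the notebook above, offered as a sketch and not as the intended solution. The function name `letter_triangle` is made up for illustration, and wrapping around to `A` after `Z` is an assumption, since the exercise does not say what should happen once the alphabet runs out.

```python
import itertools
import string


def letter_triangle(n):
    """Return n rows; row i (1-based) holds the next i consecutive letters."""
    # cycle() restarts at 'A' after 'Z' (assumption, see the note above)
    letters = itertools.cycle(string.ascii_uppercase)
    rows = []
    for i in range(1, n + 1):
        rows.append(" ".join(next(letters) for _ in range(i)))
    return rows


for row in letter_triangle(5):
    print(row)
```

Returning the rows as a list keeps the pattern logic separate from printing, which makes it easy to check the result for a given `n`.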


--------------------------------------------------------------------------------
/scripts/1.python_wrap_lines.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | 
 3 | size = 80
 4 | file = 'budo'
 5 | folder = os.path.expanduser('~/Documents/Fortunes/')
 6 | 
 7 | # Read and store the entire file line by line
 8 | with open(f'{folder}{file}.txt') as reader:
 9 |     proverbs = reader.readlines()
10 | 
11 | # wrap/collate a line at the first separator (",", " " or ".") at or after `size`
12 | def collate(text, size):
13 |     new_text = []
14 |     text = text.rstrip('\n')
15 |     while len(text) > size:
16 |         found = [pos for pos in (text.find(sep, size) for sep in ', .') if pos >= 0]  # find() gives -1 if absent
17 |         if not found:
18 |             break
19 |         split_char = min(found)
20 |         piece = text[:split_char + 1].rstrip()  # keep a trailing "," or ".", drop a trailing space
21 |         if piece:
22 |             new_text.append(piece)
23 |         text = text[split_char + 1:].lstrip()
24 |     if text:
25 |         new_text.append(text)
26 |     return new_text
27 | 
28 | # write collated information to new(same) file
29 | with open(f'{folder}{file}.txt', 'w') as writer:
30 |     for wisdom in proverbs:
31 |         if len(wisdom) > size:
32 |             collated = collate(wisdom, size)
33 |             for short in collated:
34 |                 writer.write(short)
35 |                 writer.write('\n')
36 |         else:
37 |             writer.write(wisdom)
38 | 
39 | # Executing Shell Commands with Python
40 | import os
41 | myCmd = f'strfile -c % {folder}{file}.txt {folder}{file}.txt.dat'
42 | os.system(myCmd)
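The script above splits a long line at the first separator found at or after `size`, so wrapped lines can end up longer than `size`. As an alternative sketch, not the author's method, the standard library's `textwrap.wrap` breaks at whitespace so that every line stays within the given width. The helper name `wrap_proverb` and the sample sentence are made up for illustration.

```python
import textwrap


def wrap_proverb(text, width=80):
    """Wrap one proverb into lines of at most `width` characters."""
    return textwrap.wrap(text.strip(), width=width)


sample = (
    "He who asks is a fool for five minutes, "
    "but he who does not ask remains a fool forever."
)
lines = wrap_proverb(sample, width=40)
for line in lines:
    print(line)
```

Unlike `collate`, this variant never splits on a comma or period alone. It only breaks between words, which keeps the punctuation attached to the preceding word.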


--------------------------------------------------------------------------------
/scripts/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/softhints/python/a256a054d74ca397f41874b3e26f1c4b84214432/scripts/__init__.py


--------------------------------------------------------------------------------
/test.py:
--------------------------------------------------------------------------------
1 | import urllib.parse
2 | 
3 | f = '25 Pandas Create A Matplotlib Scatterplot From A Dataframe '
4 | ff = urllib.parse.quote_plus(f)
5 | print(ff.replace('+', '_'))


--------------------------------------------------------------------------------