├── 22__pandas-how-to-filter-results-of-value_counts.patch ├── README.md ├── notebooks ├── Books │ └── Think Python │ │ ├── Chapter_3_Functions_1.ipynb │ │ ├── Chapter_4__Case_study_interface_design.ipynb │ │ ├── Chapter_5__Conditionals_and_recursion.ipynb │ │ ├── Chapter_6__Fruitful_functions.ipynb │ │ ├── Think_Python_Chapter_10__Lists.ipynb │ │ ├── Think_Python_Chapter_11__Dictionaries.ipynb │ │ ├── Think_Python_Chapter_12__Tuples.ipynb │ │ ├── Think_Python_Chapter_7__Iteration.ipynb │ │ ├── Think_Python_Chapter_8__Strings.ipynb │ │ ├── Think_Python_Chapter_9__Case_study_A_word_play.ipynb │ │ ├── ch7_debug.py │ │ └── strings_in_python.png ├── DataFrame_column_transformations.ipynb ├── Dataframe_to_json_nested.ipynb ├── How_to_extract_information_from_excel_with_Python_and_Pandas.ipynb ├── IPython tricks 2019.ipynb ├── Image_validation_with_Python.ipynb ├── Load_multiple_CSV_files_into_a_single _Dataframe.ipynb ├── Pandas count and percentage by value for a column.ipynb ├── Pandas is column is contained in another column in the same row.ipynb ├── Pandas search in column, every column and regex.ipynb ├── Python Extract Table from PDF.ipynb ├── Python group and sort a list of lists by a specific index,pattern.ipynb ├── Python_group_or_sort_list_of_lists_by_common_element.ipynb ├── Q&A │ └── Questions_and_Answers_1_Improve_OCR_and_tabula_range.ipynb ├── Scrape wiki tables with pandas and python.ipynb ├── What_is_the_usage_of_*_asterisk_in_Python.ipynb ├── csv │ ├── data.csv.zip │ ├── data_201901.csv │ ├── data_201902.csv │ ├── data_202001.csv │ ├── data_202002.csv │ └── excel │ │ └── example.xlsx ├── pandas │ ├── 20._Pandas_-_value_counts_-_multiple_columns%2C_all_columns_and_bad_data.ipynb │ ├── 21. 
pandas-dataframe-sampling-rows-or-columns.ipynb │ ├── 22.pandas-how-to-filter-results-of-value_counts.ipynb │ ├── 23.pandas-typeerror-unhashable-type-list-dict.ipynb │ ├── 24-pandas-check-value-column-contained-another-column-same-row.ipynb │ ├── 25_Pandas_Create_A_Matplotlib_Scatterplot_From_A_Dataframe.ipynb │ ├── 26.pandas-display-all-columns-and-show-more-rows.ipynb │ ├── How_to_Optimize_and_Speed_Up_Pandas.ipynb │ ├── Pandas_Crosstab_-_cross_tabulation_of_two_factors_examples.ipynb │ ├── Pandas_How_add_new_column_existing_DataFrame.ipynb │ ├── Pandas_Select_rows_between_two_dates_-_DataFrame_or_CSV_file.ipynb │ ├── Pandas_compare_columns_in_two_Dataframes.ipynb │ ├── Pandas_count_values_in_a_column_of_type_list.ipynb │ ├── Pandas_extract_url_or_dates_from_column.ipynb │ ├── Python_Pandas_find_and_drop_duplicate_data.ipynb │ ├── map_the_headers_to_a_column_with_pandas.ipynb │ └── pandas-use-list-values-select-rows-column.ipynb ├── python │ ├── Files │ │ └── How_to_merge_multiple_CSV_files_with_Python.ipynb │ └── JSON │ │ ├── 41._Create_a_table_in_MySQL_Database_from_python_dictionary.ipynb │ │ └── 42._Convert_MySQL_table_to_Pandas_DataFrame_Python_dictionary.ipynb ├── python_problems │ └── Python_problems_for_beginners_1.ipynb └── youtube │ └── Youtube-PewDiePie.ipynb ├── scripts ├── 1.python_wrap_lines.py └── __init__.py └── test.py /22__pandas-how-to-filter-results-of-value_counts.patch: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/softhints/python/a256a054d74ca397f41874b3e26f1c4b84214432/22__pandas-how-to-filter-results-of-value_counts.patch -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # python 2 | Jupyter notebooks and datasets for the interesting pandas/python/data science video series. 
3 | 4 | # Contribution 5 | 6 | Feel free to contribute or suggest new ideas. To get in touch, write by [mail](mailto:grouprivl@gmail.com?subject=[GitHub]%20Source%20Python). 7 | 8 | Good guides to contributing on GitHub: 9 | * [Contributing to projects](https://docs.github.com/en/get-started/quickstart/contributing-to-projects) 10 | * [Step-by-step guide to contributing on GitHub](https://www.dataschool.io/how-to-contribute-on-github/) 11 | 12 | # Who is this repo for? 13 | 14 | For people who are interested in data science, data analysis and finding interesting insights in data. This repository accompanies the sites: 15 | * [DataScientYst.com - Data Science Tutorials, Exercises, Guides, Videos with Python and Pandas](https://datascientyst.com/) 16 | * [SoftHints.com - Python, Pandas, Linux, SQL Tutorials and Guides](https://softhints.com/) 17 | 18 | where you can find more articles. 19 | 20 | A new website dedicated to Pandas and data science has been launched: https://datascientyst.com/. It is better organized and covers topics in many areas. 21 | 22 | 23 | The YouTube channel: 24 | 25 | * [SoftHints Youtube](https://www.youtube.com/@softhints/) 26 | * [Popular Videos](https://www.youtube.com/@softhints/videos) 27 | 28 | # Latest Videos 29 | 30 | ## Pandas 31 | 32 | 0. [Pandas Tutorial : How to split columns of dataframe](https://www.youtube.com/watch?v=cCoGsFVPVh0&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) 33 | 1. [Pandas Tutorial : How to split dataframe by string or date](https://www.youtube.com/watch?v=7sgDvC4k6Xg&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) 34 | 2. [Easily extract tables from websites with pandas and python](https://www.youtube.com/watch?v=OXA_ZD1gR6A&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) 35 | 3. [Easily extract information from excel with Python and Pandas](https://www.youtube.com/watch?v=hJMH_1o8eU0&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) 36 | 4. 
[Extract tabular data from PDF with Python - Tabula, Camelot, PyPDF2](https://www.youtube.com/watch?v=702lkQbZx50&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) 37 | 5. [Pandas is column part of another column in the same row of dataframe](https://www.youtube.com/watch?v=duOHHDqI40c&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) 38 | 6. [Load multiple CSV files into a single Dataframe](https://www.youtube.com/watch?v=30ndwJm1I5c&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) 39 | 7. [Analyze top youtube channels 2019 with pandas - PewDiePie I](https://www.youtube.com/watch?v=mG9OnH9R5yM&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) 40 | 8. [dataframe column transformations ( str, int, category, concat)](https://www.youtube.com/watch?v=5pbRivDYzko&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) 41 | 9. [Pandas DataFrame generate n-level hierarchical JSON](https://www.youtube.com/watch?v=lCcE-0bykRU&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) 42 | 10. [Pandas How add new column existing DataFrame](https://www.youtube.com/watch?v=UvCO5gKQqtE&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) 43 | 11. [Python Pandas find and drop duplicate data](https://www.youtube.com/watch?v=4ixLp8aFomw&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) 44 | 12. [Map the headers to a column with pandas?](https://www.youtube.com/watch?v=3g6KG_8zq0E&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) 45 | 13. [Pandas count values in a column of type list](https://www.youtube.com/watch?v=lx7KFd6BPcg&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) 46 | 14. [How to Optimize and Speed Up Pandas](https://www.youtube.com/watch?v=nW5ltiwV-6Y&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) 47 | 15. [Pandas count and percentage by value for a column](https://www.youtube.com/watch?v=P5pxJkv71BU&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) 48 | 16. [Pandas use a list of values to select rows from a column](https://www.youtube.com/watch?v=jlSbo5wmTPQ&list=PLeicpQTG639FTJ-daMp7YWmQLBH3zumXv) 49 | 50 | 51 | ## python 52 | 53 | 0. 
[python string split by separator](https://www.youtube.com/watch?v=iBsg75W2Vig&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 54 | 1. [python random number generation examples](https://www.youtube.com/watch?v=WDTnZgSreL4&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 55 | 2. [bilingual programming education in java and python](https://www.youtube.com/watch?v=eEHBjP06WSI&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 56 | 3. [biggest programmer salaries 2018](https://www.youtube.com/watch?v=X2bUUkWC7dE&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 57 | 4. [python extract text from image or pdf](https://www.youtube.com/watch?v=PK-GvWWQ03g&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 58 | 5. [Python read validate and import CSV JSON file to MySQL](https://www.youtube.com/watch?v=WbW0rHCX2UU&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 59 | 6. [python regex match date](https://www.youtube.com/watch?v=o8Je7hPgsdU&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 60 | 7. [python regex cheat sheet with examples](https://www.youtube.com/watch?v=o_CSmob64uU&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 61 | 8. [python string methods tutorial](https://www.youtube.com/watch?v=7yuPVq9DtV0&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 62 | 9. [python shuffle list](https://www.youtube.com/watch?v=WFRBxz6AeZI&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 63 | 10. [Easy install of Python and PyCharm on Windows](https://www.youtube.com/watch?v=cDOlBRzHRI0&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 64 | 11. [learn python for beginners complete tutorial 2018](https://www.youtube.com/watch?v=hnc3bGtYQsQ&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 65 | 12. [think python chaper 2](https://www.youtube.com/watch?v=A6EIl677ntQ&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 66 | 13. [Python/Java bad and good code comments examples](https://www.youtube.com/watch?v=SRCToEkq7to&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 67 | 14. 
[intellij pycharm surround string quote](https://www.youtube.com/watch?v=AgRHEGB8Urs&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 68 | 15. [Top Five Most Annoying Programming Mistakes For Beginners with Python](https://www.youtube.com/watch?v=JToPoYip-C4&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 69 | 16. [No Python Interpreter Configured For The Module - PyCharm/IntelliJ](https://www.youtube.com/watch?v=mkKDI6y2kyE&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 70 | 17. [python split string into list examples](https://www.youtube.com/watch?v=T8EfomTlcfA&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 71 | 18. [How to migrate/update virtualenv from Python 3.5 to 3.6](https://www.youtube.com/watch?v=cFTB5EJUxzw&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 72 | 19. [Python String Remove Last n Characters](https://www.youtube.com/watch?v=hZHfdOKFlAw&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 73 | 20. [Python Pandas 7 examples of filters and lambda apply](https://www.youtube.com/watch?v=7nYkJctgSSA&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 74 | 21. [The simplest way to run python headless test with Chrome on Ubuntu](https://www.youtube.com/watch?v=BdppFIT_lIs&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 75 | 22. [Python 3 Simple Examples get current folder and go to parent](https://www.youtube.com/watch?v=tQ_9a6UhUQs&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 76 | 23. [python join/merge list two and more lists](https://www.youtube.com/watch?v=-zcJ4uB7XUo&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 77 | 24. [Easy way to convert dictionary to SQL insert with Python](https://www.youtube.com/watch?v=hUXGQwTSfMs&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 78 | 25. [Python 3 detect and prevent TypeError-s](https://www.youtube.com/watch?v=DJd0JYaVkqA&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 79 | 26. [The right way to declare multiple variables in Python](https://www.youtube.com/watch?v=8OoLg39nNlo&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 80 | 27. 
[Python uninstall a module installed with pip install and virtual envirornment](https://www.youtube.com/watch?v=03ahRfkfwME&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 81 | 28. [python performance profiling in pycharm](https://www.youtube.com/watch?v=EZ-im7m8630&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 82 | 29. [Python Cumulative Sum per Group with Pandas](https://www.youtube.com/watch?v=1tCbvYv_ibw&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 83 | 30. [PyCharm - Breakpoints, Favorites, TODOs simple examples](https://www.youtube.com/watch?v=_fNZLrz97kg&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 84 | 31. [Python 3 simple ways to list files and folders](https://www.youtube.com/watch?v=oJdubyyJNIQ&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 85 | 32. [Python 3 elegant way to find most/less common element in a list](https://www.youtube.com/watch?v=P4LonC3puS4&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 86 | 33. [clock angle problem final](https://www.youtube.com/watch?v=eIRhXharV7k&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 87 | 34. [Python 3 List Comprehension Tutorial for beginners](https://www.youtube.com/watch?v=DmSephyJNtQ&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 88 | 35. [python 3 how to remove white spaces](https://www.youtube.com/watch?v=0k0fvqikaoE&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 89 | 36. [Pandas Tutorial : How to split dataframe by string or date](https://www.youtube.com/watch?v=7sgDvC4k6Xg&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 90 | 37. [improve your programming skills with fun](https://www.youtube.com/watch?v=uoAV7651Op0&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 91 | 38. [pandas dataframe search for string in all columns filter regex](https://www.youtube.com/watch?v=vbHFIALhSWE&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 92 | 39. [Pandas is column part of another column in the same row of dataframe](https://www.youtube.com/watch?v=duOHHDqI40c&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 93 | 40. 
[Easily extract tables from websites with pandas and python](https://www.youtube.com/watch?v=OXA_ZD1gR6A&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 94 | 41. [Easily extract information from excel with Python and Pandas](https://www.youtube.com/watch?v=hJMH_1o8eU0&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 95 | 42. [Python asterisk argument or What is the usage of * asterisk in Python](https://www.youtube.com/watch?v=JBm8iptLnuA&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 96 | 43. [Easy Image validation with Python - valid image, blank or pattern](https://www.youtube.com/watch?v=HMB4zrP_-HY&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 97 | 44. [Pandas DataFrame generate n-level hierarchical JSON](https://www.youtube.com/watch?v=lCcE-0bykRU&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 98 | 45. [Python group or sort list of lists by common element](https://www.youtube.com/watch?v=zVQJQxpedm8&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 99 | 46. [Think Python: Chapter 3 Functions 3.2](https://www.youtube.com/watch?v=Ol3Dwucax9U&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 100 | 47. [Questions and Answers 1 Improve OCR and tabula range](https://www.youtube.com/watch?v=nrF_Rgh88no&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 101 | 48. 
[Map the headers to a column with pandas?](https://www.youtube.com/watch?v=3g6KG_8zq0E&list=PLeicpQTG639HMut5w0WfLz684cSCMBD4C) 102 | -------------------------------------------------------------------------------- /notebooks/Books/Think Python/Chapter_3_Functions_1.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Think Python: How to Think Like a Computer Scientist" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## Chapter 3 Functions\n", 15 | "\n", 16 | "* Function calls\n", 17 | "* Math functions\n", 18 | "* Composition\n", 19 | "* Adding new functions\n", 20 | "* Definitions and uses\n", 21 | "* Flow of execution\n", 22 | "* Parameters and arguments\n", 23 | "------\n", 24 | "* Variables and parameters are local\n", 25 | "* Stack diagrams\n", 26 | "* Fruitful functions and void functions\n", 27 | "* Why functions?\n", 28 | "* Debugging\n", 29 | "* Glossary\n", 30 | "* Exercises\n", 31 | "\n", 32 | "\n", 33 | "> In the context of programming, a function is a named sequence of statements that performs a computation. When you define a function, you specify the name and the sequence of statements. Later, you can “call” the function by name." 
34 | ] 35 | }, 36 | { 37 | "cell_type": "markdown", 38 | "metadata": {}, 39 | "source": [ 40 | "### Functions best practices\n", 41 | "\n", 42 | "* the name should describe what the function does\n", 43 | "* it should do one thing and only one thing\n", 44 | "* it should be documented\n", 45 | "* it should be relatively short" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": {}, 51 | "source": [ 52 | "## 3.1 Function calls" 53 | ] 54 | }, 55 | { 56 | "cell_type": "code", 57 | "execution_count": 2, 58 | "metadata": {}, 59 | "outputs": [ 60 | { 61 | "data": { 62 | "text/plain": [ 63 | "str" 64 | ] 65 | }, 66 | "execution_count": 2, 67 | "metadata": {}, 68 | "output_type": "execute_result" 69 | } 70 | ], 71 | "source": [ 72 | "# type is the function name\n", 73 | "# 'a' is the argument\n", 74 | "\n", 75 | "type('a')" 76 | ] 77 | }, 78 | { 79 | "cell_type": "markdown", 80 | "metadata": {}, 81 | "source": [ 82 | "> a function “takes” an argument and “returns” a result" 83 | ] 84 | }, 85 | { 86 | "cell_type": "code", 87 | "execution_count": 3, 88 | "metadata": {}, 89 | "outputs": [ 90 | { 91 | "data": { 92 | "text/plain": [ 93 | "32" 94 | ] 95 | }, 96 | "execution_count": 3, 97 | "metadata": {}, 98 | "output_type": "execute_result" 99 | } 100 | ], 101 | "source": [ 102 | "int('32')" 103 | ] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "execution_count": 4, 108 | "metadata": {}, 109 | "outputs": [ 110 | { 111 | "ename": "ValueError", 112 | "evalue": "invalid literal for int() with base 10: 'Hello'", 113 | "traceback": [ 114 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 115 | "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", 116 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'Hello'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 117 | 
"\u001b[0;31mValueError\u001b[0m: invalid literal for int() with base 10: 'Hello'" 118 | ], 119 | "output_type": "error" 120 | } 121 | ], 122 | "source": [ 123 | "int('Hello')" 124 | ] 125 | }, 126 | { 127 | "cell_type": "code", 128 | "execution_count": 5, 129 | "metadata": {}, 130 | "outputs": [ 131 | { 132 | "data": { 133 | "text/plain": [ 134 | "3" 135 | ] 136 | }, 137 | "execution_count": 5, 138 | "metadata": {}, 139 | "output_type": "execute_result" 140 | } 141 | ], 142 | "source": [ 143 | "int(3.99999)" 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": 6, 149 | "metadata": {}, 150 | "outputs": [ 151 | { 152 | "data": { 153 | "text/plain": [ 154 | "-2" 155 | ] 156 | }, 157 | "execution_count": 6, 158 | "metadata": {}, 159 | "output_type": "execute_result" 160 | } 161 | ], 162 | "source": [ 163 | "int(-2.3)" 164 | ] 165 | }, 166 | { 167 | "cell_type": "code", 168 | "execution_count": 7, 169 | "metadata": {}, 170 | "outputs": [ 171 | { 172 | "data": { 173 | "text/plain": [ 174 | "32.0" 175 | ] 176 | }, 177 | "execution_count": 7, 178 | "metadata": {}, 179 | "output_type": "execute_result" 180 | } 181 | ], 182 | "source": [ 183 | "float(32)" 184 | ] 185 | }, 186 | { 187 | "cell_type": "code", 188 | "execution_count": 8, 189 | "metadata": {}, 190 | "outputs": [ 191 | { 192 | "data": { 193 | "text/plain": [ 194 | "3.14159" 195 | ] 196 | }, 197 | "execution_count": 8, 198 | "metadata": {}, 199 | "output_type": "execute_result" 200 | } 201 | ], 202 | "source": [ 203 | "float('3.14159')" 204 | ] 205 | }, 206 | { 207 | "cell_type": "code", 208 | "execution_count": 9, 209 | "metadata": {}, 210 | "outputs": [ 211 | { 212 | "data": { 213 | "text/plain": [ 214 | "'3.14159'" 215 | ] 216 | }, 217 | "execution_count": 9, 218 | "metadata": {}, 219 | "output_type": "execute_result" 220 | } 221 | ], 222 | "source": [ 223 | "str(3.14159)" 224 | ] 225 | }, 226 | { 227 | "cell_type": "code", 228 | "execution_count": 10, 229 | "metadata": {}, 230 | 
"outputs": [ 231 | { 232 | "data": { 233 | "text/plain": [ 234 | "'32'" 235 | ] 236 | }, 237 | "execution_count": 10, 238 | "metadata": {}, 239 | "output_type": "execute_result" 240 | } 241 | ], 242 | "source": [ 243 | "str(32)" 244 | ] 245 | }, 246 | { 247 | "cell_type": "markdown", 248 | "metadata": {}, 249 | "source": [ 250 | "## 3.2 Math functions" 251 | ] 252 | }, 253 | { 254 | "cell_type": "markdown", 255 | "metadata": {}, 256 | "source": [ 257 | "> Python has a math module that provides most of the familiar mathematical functions. A module is a file that contains a collection of related functions." 258 | ] 259 | }, 260 | { 261 | "cell_type": "code", 262 | "execution_count": 11, 263 | "metadata": {}, 264 | "outputs": [ 265 | { 266 | "data": { 267 | "text/plain": [ 268 | "" 269 | ] 270 | }, 271 | "execution_count": 11, 272 | "metadata": {}, 273 | "output_type": "execute_result" 274 | } 275 | ], 276 | "source": [ 277 | "import math\n", 278 | "math" 279 | ] 280 | }, 281 | { 282 | "cell_type": "markdown", 283 | "metadata": {}, 284 | "source": [ 285 | "> This format is called dot notation." 286 | ] 287 | }, 288 | { 289 | "cell_type": "code", 290 | "execution_count": 12, 291 | "metadata": {}, 292 | "outputs": [ 293 | { 294 | "data": { 295 | "text/plain": [ 296 | "2.2184874961635637" 297 | ] 298 | }, 299 | "execution_count": 12, 300 | "metadata": {}, 301 | "output_type": "execute_result" 302 | } 303 | ], 304 | "source": [ 305 | "# This example uses math.log10 to compute a signal-to-noise ratio in decibels \n", 306 | "\n", 307 | "signal_power = 5\n", 308 | "noise_power = 3\n", 309 | "ratio = signal_power / noise_power\n", 310 | "decibels = 10 * math.log10(ratio)\n", 311 | "decibels" 312 | ] 313 | }, 314 | { 315 | "cell_type": "code", 316 | "execution_count": null, 317 | "metadata": {}, 318 | "outputs": [], 319 | "source": [ 320 | "#The second example finds the sine of radians. 
The name of the variable is a \n", 321 | "# hint that sin and the other trigonometric functions (cos, tan, etc.) take arguments in radians. \n", 322 | "# To convert from degrees to radians, divide by 180 and multiply by π:\n", 323 | "\n", 324 | "radians = 0.7\n", 325 | "height = math.sin(radians)\n", 326 | "height" 327 | ] 328 | }, 329 | { 330 | "cell_type": "code", 331 | "execution_count": null, 332 | "metadata": {}, 333 | "outputs": [], 334 | "source": [ 335 | "# The expression math.pi gets the variable pi from the math module. Its value is a \n", 336 | "# floating-point approximation of π, accurate to about 15 digits.\n", 337 | "\n", 338 | "degrees = 45\n", 339 | "radians = degrees / 180.0 * math.pi\n", 340 | "math.sin(radians)" 341 | ] 342 | }, 343 | { 344 | "cell_type": "code", 345 | "execution_count": null, 346 | "metadata": {}, 347 | "outputs": [], 348 | "source": [ 349 | "# verify the previous result by\n", 350 | "\n", 351 | "math.sqrt(2) / 2.0" 352 | ] 353 | }, 354 | { 355 | "cell_type": "markdown", 356 | "metadata": {}, 357 | "source": [ 358 | "> add meaningful and descriptive comments to your functions" 359 | ] 360 | }, 361 | { 362 | "cell_type": "markdown", 363 | "metadata": {}, 364 | "source": [ 365 | "## 3.3 Composition" 366 | ] 367 | }, 368 | { 369 | "cell_type": "markdown", 370 | "metadata": {}, 371 | "source": [ 372 | "> One of the most useful features of programming languages is their ability to take small building blocks and compose them." 
373 | ] 374 | }, 375 | { 376 | "cell_type": "code", 377 | "execution_count": 13, 378 | "metadata": {}, 379 | "outputs": [ 380 | { 381 | "ename": "NameError", 382 | "evalue": "name 'degrees' is not defined", 383 | "traceback": [ 384 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 385 | "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", 386 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mx\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mmath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msin\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdegrees\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0;36m360.0\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0;36m2\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0mmath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpi\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0mx\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 387 | "\u001b[0;31mNameError\u001b[0m: name 'degrees' is not defined" 388 | ], 389 | "output_type": "error" 390 | } 391 | ], 392 | "source": [ 393 | "x = math.sin(degrees / 360.0 * 2 * math.pi)\n", 394 | "x" 395 | ] 396 | }, 397 | { 398 | "cell_type": "code", 399 | "execution_count": 14, 400 | "metadata": {}, 401 | "outputs": [ 402 | { 403 | "data": { 404 | "text/plain": [ 405 | "0.01745240643728351" 406 | ] 407 | }, 408 | "execution_count": 14, 409 | "metadata": {}, 410 | "output_type": "execute_result" 411 | } 412 | ], 413 | "source": [ 414 | "x = math.sin(1 / 360.0 * 2 * math.pi)\n", 415 | "x" 416 | ] 417 | }, 418 | { 419 | "cell_type": "code", 420 | "execution_count": null, 421 | "metadata": {}, 422 | "outputs": [], 423 | "source": [ 424 | "x = math.exp(math.log(x+1))\n", 425 | "x" 426 | ] 427 | }, 428 | { 429 | "cell_type": "code", 430 | "execution_count": 15, 431 | "metadata": {}, 432 | "outputs": [ 433 | { 434 | "data": { 435 | "text/plain": [ 436 | 
"600" 437 | ] 438 | }, 439 | "execution_count": 15, 440 | "metadata": {}, 441 | "output_type": "execute_result" 442 | } 443 | ], 444 | "source": [ 445 | "hours = 10\n", 446 | "minutes = hours * 60\n", 447 | "minutes" 448 | ] 449 | }, 450 | { 451 | "cell_type": "code", 452 | "execution_count": 16, 453 | "metadata": {}, 454 | "outputs": [ 455 | { 456 | "ename": "SyntaxError", 457 | "evalue": "can't assign to operator (, line 1)", 458 | "traceback": [ 459 | "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m hours * 60 = minutes\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m can't assign to operator\n" 460 | ], 461 | "output_type": "error" 462 | } 463 | ], 464 | "source": [ 465 | "hours * 60 = minutes" 466 | ] 467 | }, 468 | { 469 | "cell_type": "markdown", 470 | "metadata": {}, 471 | "source": [ 472 | "> avoid confusing and misleading compositions\n", 473 | "\n", 474 | "> keep to the KISS principle - keep it simple, stupid" 475 | ] 476 | }, 477 | { 478 | "cell_type": "markdown", 479 | "metadata": {}, 480 | "source": [ 481 | "## 3.4 Adding new functions" 482 | ] 483 | }, 484 | { 485 | "cell_type": "markdown", 486 | "metadata": {}, 487 | "source": [ 488 | "> A function definition specifies the name of a new function and the sequence of statements that run when the function is called." 
489 | ] 490 | }, 491 | { 492 | "cell_type": "code", 493 | "execution_count": 17, 494 | "metadata": {}, 495 | "outputs": [], 496 | "source": [ 497 | "# def - is a keyword that indicates that this is a function definition\n", 498 | "# print_lyrics - the function name\n", 499 | "# () - indicate that this function doesn’t take any arguments.\n", 500 | "\n", 501 | "def print_lyrics():\n", 502 | " print(\"I'm a lumberjack, and I'm okay.\")\n", 503 | " print(\"I sleep all night and I work all day.\")" 504 | ] 505 | }, 506 | { 507 | "cell_type": "markdown", 508 | "metadata": {}, 509 | "source": [ 510 | "> The first line of the function definition is called the header; the rest is called the body. \n", 511 | "\n", 512 | "> Single quotes and double quotes do the same thing in most situations;" 513 | ] 514 | }, 515 | { 516 | "cell_type": "code", 517 | "execution_count": 18, 518 | "metadata": {}, 519 | "outputs": [ 520 | { 521 | "data": { 522 | "text/plain": [ 523 | "function" 524 | ] 525 | }, 526 | "execution_count": 18, 527 | "metadata": {}, 528 | "output_type": "execute_result" 529 | } 530 | ], 531 | "source": [ 532 | "type(print_lyrics)" 533 | ] 534 | }, 535 | { 536 | "cell_type": "code", 537 | "execution_count": 19, 538 | "metadata": {}, 539 | "outputs": [ 540 | { 541 | "name": "stdout", 542 | "output_type": "stream", 543 | "text": [ 544 | "\n" 545 | ] 546 | } 547 | ], 548 | "source": [ 549 | "print(print_lyrics)" 550 | ] 551 | }, 552 | { 553 | "cell_type": "markdown", 554 | "metadata": {}, 555 | "source": [ 556 | "> The syntax for calling the new function is the same as for built-in functions:" 557 | ] 558 | }, 559 | { 560 | "cell_type": "code", 561 | "execution_count": 20, 562 | "metadata": {}, 563 | "outputs": [ 564 | { 565 | "name": "stdout", 566 | "output_type": "stream", 567 | "text": [ 568 | "I'm a lumberjack, and I'm okay.\n", 569 | "I sleep all night and I work all day.\n" 570 | ] 571 | } 572 | ], 573 | "source": [ 574 | "print_lyrics()" 575 | ] 576 | }, 577 | { 
578 | "cell_type": "code", 579 | "execution_count": 21, 580 | "metadata": {}, 581 | "outputs": [], 582 | "source": [ 583 | "def repeat_lyrics():\n", 584 | " print_lyrics()\n", 585 | " print_lyrics()" 586 | ] 587 | }, 588 | { 589 | "cell_type": "code", 590 | "execution_count": 22, 591 | "metadata": {}, 592 | "outputs": [ 593 | { 594 | "name": "stdout", 595 | "output_type": "stream", 596 | "text": [ 597 | "I'm a lumberjack, and I'm okay.\n", 598 | "I sleep all night and I work all day.\n", 599 | "I'm a lumberjack, and I'm okay.\n", 600 | "I sleep all night and I work all day.\n" 601 | ] 602 | } 603 | ], 604 | "source": [ 605 | "repeat_lyrics()" 606 | ] 607 | }, 608 | { 609 | "cell_type": "markdown", 610 | "metadata": {}, 611 | "source": [ 612 | "## 3.5 Definitions and uses" 613 | ] 614 | }, 615 | { 616 | "cell_type": "markdown", 617 | "metadata": {}, 618 | "source": [ 619 | "> This program contains two function definitions: print_lyrics and repeat_lyrics. Function definitions get executed just like other statements, but the effect is to create function objects.\n", 620 | "\n", 621 | "> You have to create a function before you can run it. In other words, the function definition has to run before the function gets called." 
622 | ] 623 | }, 624 | { 625 | "cell_type": "code", 626 | "execution_count": 24, 627 | "metadata": {}, 628 | "outputs": [ 629 | { 630 | "name": "stdout", 631 | "output_type": "stream", 632 | "text": [ 633 | "I'm a lumberjack, and I'm okay.\n", 634 | "I sleep all night and I work all day.\n", 635 | "I'm a lumberjack, and I'm okay.\n", 636 | "I sleep all night and I work all day.\n" 637 | ] 638 | } 639 | ], 640 | "source": [ 641 | "def print_lyrics():\n", 642 | " print(\"I'm a lumberjack, and I'm okay.\")\n", 643 | " print(\"I sleep all night and I work all day.\")\n", 644 | "\n", 645 | "def repeat_lyrics():\n", 646 | " print_lyrics()\n", 647 | " print_lyrics()\n", 648 | "\n", 649 | "repeat_lyrics()" 650 | ] 651 | }, 652 | { 653 | "cell_type": "code", 654 | "execution_count": 25, 655 | "metadata": {}, 656 | "outputs": [ 657 | { 658 | "ename": "NameError", 659 | "evalue": "name 'repeat_lyrics_new' is not defined", 660 | "traceback": [ 661 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 662 | "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", 663 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mrepeat_lyrics_new\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mrepeat_lyrics_new\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0mprint_lyrics\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mprint_lyrics\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 664 | "\u001b[0;31mNameError\u001b[0m: name 'repeat_lyrics_new' is 
not defined" 665 | ], 666 | "output_type": "error" 667 | } 668 | ], 669 | "source": [ 670 | "repeat_lyrics_new()\n", 671 | "\n", 672 | "def repeat_lyrics_new():\n", 673 | " print_lyrics()\n", 674 | " print_lyrics()\n", 675 | "\n" 676 | ] 677 | }, 678 | { 679 | "cell_type": "markdown", 680 | "metadata": {}, 681 | "source": [ 682 | "## 3.6 Flow of execution" 683 | ] 684 | }, 685 | { 686 | "cell_type": "markdown", 687 | "metadata": {}, 688 | "source": [ 689 | "> To ensure that a function is defined before its first use, you have to know the order statements run in, which is called the flow of execution.\n", 690 | "\n", 691 | "> Execution always begins at the first statement of the program. Statements are run one at a time, in order from top to bottom.\n", 692 | "\n", 693 | "> In summary, when you read a program, you don’t always want to read from top to bottom. Sometimes it makes more sense if you follow the flow of execution.\n", 694 | "\n" 695 | ] 696 | }, 697 | { 698 | "cell_type": "code", 699 | "execution_count": 26, 700 | "metadata": {}, 701 | "outputs": [ 702 | { 703 | "name": "stdout", 704 | "output_type": "stream", 705 | "text": [ 706 | "1\n", 707 | "3\n", 708 | "2\n" 709 | ] 710 | } 711 | ], 712 | "source": [ 713 | "def print_lyrics_1():\n", 714 | " print(\"1\")\n", 715 | "\n", 716 | "def print_lyrics_2():\n", 717 | " print(\"2\") \n", 718 | " \n", 719 | "def print_lyrics_3():\n", 720 | " print(\"3\")\n", 721 | "\n", 722 | "def repeat_lyrics():\n", 723 | " print_lyrics_1()\n", 724 | " print_lyrics_3()\n", 725 | " print_lyrics_2()\n", 726 | "\n", 727 | "repeat_lyrics()" 728 | ] 729 | }, 730 | { 731 | "cell_type": "markdown", 732 | "metadata": {}, 733 | "source": [ 734 | "## 3.7 Parameters and arguments" 735 | ] 736 | }, 737 | { 738 | "cell_type": "markdown", 739 | "metadata": {}, 740 | "source": [ 741 | "> Some of the functions we have seen require arguments. For example, when you call math.sin you pass a number as an argument. 
Some functions take more than one argument: math.pow takes two, the base and the exponent.\n", 742 | "\n", 743 | "> Inside the function, the arguments are assigned to variables called parameters. " 744 | ] 745 | }, 746 | { 747 | "cell_type": "code", 748 | "execution_count": 27, 749 | "metadata": {}, 750 | "outputs": [], 751 | "source": [ 752 | "def print_twice(bruce):\n", 753 | " print(bruce)\n", 754 | " print(bruce)" 755 | ] 756 | }, 757 | { 758 | "cell_type": "code", 759 | "execution_count": 28, 760 | "metadata": {}, 761 | "outputs": [ 762 | { 763 | "name": "stdout", 764 | "output_type": "stream", 765 | "text": [ 766 | "Spam\n", 767 | "Spam\n", 768 | "42\n", 769 | "42\n", 770 | "3.141592653589793\n", 771 | "3.141592653589793\n" 772 | ] 773 | } 774 | ], 775 | "source": [ 776 | "print_twice('Spam')\n", 777 | "print_twice(42)\n", 778 | "print_twice(math.pi)" 779 | ] 780 | }, 781 | { 782 | "cell_type": "code", 783 | "execution_count": 29, 784 | "metadata": {}, 785 | "outputs": [ 786 | { 787 | "name": "stdout", 788 | "output_type": "stream", 789 | "text": [ 790 | "Spam Spam Spam Spam \n", 791 | "Spam Spam Spam Spam \n" 792 | ] 793 | } 794 | ], 795 | "source": [ 796 | "print_twice('Spam '*4)" 797 | ] 798 | }, 799 | { 800 | "cell_type": "code", 801 | "execution_count": 30, 802 | "metadata": {}, 803 | "outputs": [ 804 | { 805 | "name": "stdout", 806 | "output_type": "stream", 807 | "text": [ 808 | "-1.0\n", 809 | "-1.0\n" 810 | ] 811 | } 812 | ], 813 | "source": [ 814 | "print_twice(math.cos(math.pi))" 815 | ] 816 | }, 817 | { 818 | "cell_type": "markdown", 819 | "metadata": {}, 820 | "source": [ 821 | "> The argument is evaluated before the function is called, so in the examples the expressions 'Spam '*4 and math.cos(math.pi) are only evaluated once\n", 822 | "\n", 823 | "> The name of the variable we pass as an argument (michael) has nothing to do with the name of the parameter (bruce)." 
824 | ] 825 | }, 826 | { 827 | "cell_type": "code", 828 | "execution_count": 31, 829 | "metadata": {}, 830 | "outputs": [ 831 | { 832 | "name": "stdout", 833 | "output_type": "stream", 834 | "text": [ 835 | "Eric, the half a bee.\n", 836 | "Eric, the half a bee.\n" 837 | ] 838 | } 839 | ], 840 | "source": [ 841 | "michael = 'Eric, the half a bee.'\n", 842 | "print_twice(michael)" 843 | ] 844 | }, 845 | { 846 | "cell_type": "code", 847 | "execution_count": null, 848 | "metadata": {}, 849 | "outputs": [], 850 | "source": [] 851 | } 852 | ], 853 | "metadata": { 854 | "kernelspec": { 855 | "display_name": "Python 3", 856 | "language": "python", 857 | "name": "python3" 858 | }, 859 | "language_info": { 860 | "codemirror_mode": { 861 | "name": "ipython", 862 | "version": 3 863 | }, 864 | "file_extension": ".py", 865 | "mimetype": "text/x-python", 866 | "name": "python", 867 | "nbconvert_exporter": "python", 868 | "pygments_lexer": "ipython3", 869 | "version": "3.6.7" 870 | } 871 | }, 872 | "nbformat": 4, 873 | "nbformat_minor": 2 874 | } 875 | -------------------------------------------------------------------------------- /notebooks/Books/Think Python/Chapter_4__Case_study_interface_design.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Chapter 4 Case study: interface design\n", 8 | "\n", 9 | "> This chapter presents a case study that demonstrates a process for designing functions that work together.\n", 10 | "\n", 11 | "\n", 12 | "\n", 13 | "* The turtle module\n", 14 | "* Simple repetition\n", 15 | "* Exercises\n", 16 | "* **Encapsulation**\n", 17 | "* **Generalization**\n", 18 | "* **Interface design**\n", 19 | "* **Refactoring**\n", 20 | "* **A development plan**\n", 21 | "* **docstring**\n", 22 | "* Debugging" 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "## 4.1 The turtle 
module" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 1, 35 | "metadata": {}, 36 | "outputs": [ 37 | { 38 | "data": { 39 | "text/plain": [ 40 | "'0.23.4'" 41 | ] 42 | }, 43 | "execution_count": 1, 44 | "metadata": {}, 45 | "output_type": "execute_result" 46 | } 47 | ], 48 | "source": [ 49 | "import pandas\n", 50 | "pandas.__version__" 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": 2, 56 | "metadata": {}, 57 | "outputs": [], 58 | "source": [ 59 | "import turtle" 60 | ] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": {}, 65 | "source": [ 66 | "> The turtle module (with a lowercase ’t’) provides a function called Turtle (with an uppercase ’T’) that creates a Turtle object, which we assign to a variable named bob. Printing bob displays something like:" 67 | ] 68 | }, 69 | { 70 | "cell_type": "code", 71 | "execution_count": null, 72 | "metadata": { 73 | "scrolled": true 74 | }, 75 | "outputs": [ 76 | { 77 | "name": "stdout", 78 | "output_type": "stream", 79 | "text": [ 80 | "\n" 81 | ] 82 | } 83 | ], 84 | "source": [ 85 | "# mypolygon.py\n", 86 | "import turtle\n", 87 | "bob = turtle.Turtle()\n", 88 | "print(bob)\n", 89 | "turtle.mainloop()\n", 90 | "\n", 91 | "import os\n", 92 | "os._exit(00)" 93 | ] 94 | }, 95 | { 96 | "cell_type": "code", 97 | "execution_count": null, 98 | "metadata": {}, 99 | "outputs": [], 100 | "source": [ 101 | "# draw a right angle\n", 102 | "import turtle\n", 103 | "bob = turtle.Turtle()\n", 104 | "bob.fd(100)\n", 105 | "bob.lt(90)\n", 106 | "bob.fd(100)\n", 107 | "turtle.mainloop()\n", 108 | "\n", 109 | "import os\n", 110 | "os._exit(00)" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": null, 116 | "metadata": {}, 117 | "outputs": [], 118 | "source": [ 119 | "# A method is similar to a function, but it uses slightly different syntax. 
\n", 120 | "import turtle\n", 121 | "bob = turtle.Turtle()\n", 122 | "bob.fd(100)\n", 123 | "bob.lt(90)\n", 124 | "bob.fd(100)\n", 125 | "turtle.mainloop()\n", 126 | "\n", 127 | "import os\n", 128 | "os._exit(00)" 129 | ] 130 | }, 131 | { 132 | "cell_type": "markdown", 133 | "metadata": {}, 134 | "source": [ 135 | "* A **function** is a piece of code that is called by name. It can be passed data to operate on (i.e. the parameters) and can optionally return data (the return value). All data that is passed to a function is explicitly passed.

\n", 136 | "\n", 137 | "* A **method** is a piece of code that is called by a name **that is associated with an object**. In most respects it is identical to a function except for two key differences:\n", 138 | " * A method is implicitly passed the object on which it was called.\n", 139 | " * A method is able to operate on data that is contained within the class" 140 | ] 141 | }, 142 | { 143 | "cell_type": "markdown", 144 | "metadata": {}, 145 | "source": [ 146 | "## 4.2 Simple repetition" 147 | ] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "execution_count": null, 152 | "metadata": { 153 | "scrolled": true 154 | }, 155 | "outputs": [], 156 | "source": [ 157 | "# square\n", 158 | "import turtle\n", 159 | "bob3 = turtle.Turtle()\n", 160 | "\n", 161 | "bob3.fd(100)\n", 162 | "bob3.lt(90)\n", 163 | "\n", 164 | "bob3.fd(100)\n", 165 | "bob3.lt(90)\n", 166 | "\n", 167 | "bob3.fd(100)\n", 168 | "bob3.lt(90)\n", 169 | "\n", 170 | "bob3.fd(100)\n", 171 | "\n", 172 | "turtle.mainloop()\n", 173 | "\n", 174 | "import os\n", 175 | "os._exit(00)" 176 | ] 177 | }, 178 | { 179 | "cell_type": "markdown", 180 | "metadata": {}, 181 | "source": [ 182 | "> A for statement is also called a loop because the flow of execution runs through the body and then loops back to the top" 183 | ] 184 | }, 185 | { 186 | "cell_type": "code", 187 | "execution_count": 1, 188 | "metadata": {}, 189 | "outputs": [ 190 | { 191 | "name": "stdout", 192 | "output_type": "stream", 193 | "text": [ 194 | "Hello!\n", 195 | "Hello!\n", 196 | "Hello!\n", 197 | "Hello!\n" 198 | ] 199 | } 200 | ], 201 | "source": [ 202 | "for i in range(4):\n", 203 | " print('Hello!')" 204 | ] 205 | }, 206 | { 207 | "cell_type": "code", 208 | "execution_count": null, 209 | "metadata": {}, 210 | "outputs": [], 211 | "source": [ 212 | "# square \n", 213 | "import turtle\n", 214 | "bob = turtle.Turtle()\n", 215 | "for i in range(4):\n", 216 | " bob.fd(100)\n", 217 | " bob.lt(90)\n", 218 | "\n", 219 | "turtle.done()\n", 220 | 
"\n", 221 | "import os\n", 222 | "os._exit(00)" 223 | ] 224 | }, 225 | { 226 | "cell_type": "markdown", 227 | "metadata": {}, 228 | "source": [ 229 | "Did you notice the difference between both programs?\n", 230 | "\n", 231 | "**The art of cognitive blindspots | Kyle Eschen**\n", 232 | "\n", 233 | "https://www.youtube.com/watch?reload=9&v=OOG65rSM5fA" 234 | ] 235 | }, 236 | { 237 | "cell_type": "markdown", 238 | "metadata": {}, 239 | "source": [ 240 | "## 4.3 Exercises\n", 241 | "\n", 242 | "1. Write a function called square that takes a parameter named t, which is a turtle. It should use the turtle to draw a square.\n", 243 | "Write a function call that passes bob as an argument to square, and then run the program again.

\n", 244 | "\n", 245 | "2. Add another parameter, named length, to square. Modify the body so length of the sides is length, and then modify the function call to provide a second argument. Run the program again. Test your program with a range of values for length.

\n", 246 | "\n", 247 | "3. Make a copy of square and change the name to polygon. Add another parameter named n and modify the body so it draws an n-sided regular polygon. Hint: The exterior angles of an n-sided regular polygon are 360/n degrees.

\n", 248 | "\n", 249 | "4. Write a function called circle that takes a turtle, t, and radius, r, as parameters and that draws an approximate circle by calling polygon with an appropriate length and number of sides. Test your function with a range of values of r.\n", 250 | "Hint: figure out the circumference of the circle and make sure that length * n = circumference.

\n", 251 | "\n", 252 | "5. Make a more general version of circle called arc that takes an additional parameter angle, which determines what fraction of a circle to draw. angle is in units of degrees, so when angle=360, arc should draw a complete circle." 253 | ] 254 | }, 255 | { 256 | "cell_type": "markdown", 257 | "metadata": {}, 258 | "source": [ 259 | "## 4.4 Encapsulation\n", 260 | "\n", 261 | "> Wrapping a piece of code up in a function is called encapsulation. \n", 262 | "\n", 263 | "The major advantages: \n", 264 | "* code re-use\n", 265 | "* shorter programs (it is more concise to call a function twice than to copy and paste the body)" 266 | ] 267 | }, 268 | { 269 | "cell_type": "code", 270 | "execution_count": null, 271 | "metadata": {}, 272 | "outputs": [], 273 | "source": [ 274 | "# square \n", 275 | "import turtle\n", 276 | "\n", 277 | "def square(t):\n", 278 | " for i in range(4):\n", 279 | " t.fd(100)\n", 280 | " t.lt(90)\n", 281 | "\n", 282 | "bob = turtle.Turtle()\n", 283 | "square(bob)\n", 284 | "turtle.done()\n", 285 | "\n", 286 | "import os\n", 287 | "os._exit(00)" 288 | ] 289 | }, 290 | { 291 | "cell_type": "markdown", 292 | "metadata": {}, 293 | "source": [ 294 | "> The innermost statements, fd and lt are indented twice to show that they are inside the for loop, which is inside the function definition. The next line, square(bob), is flush with the left margin, which indicates the end of both the for loop and the function definition." 295 | ] 296 | }, 297 | { 298 | "cell_type": "markdown", 299 | "metadata": {}, 300 | "source": [ 301 | "> Inside the function, t refers to the same turtle bob, so t.lt(90) has the same effect as bob.lt(90). In that case, why not call the parameter bob? 
" 302 | ] 303 | }, 304 | { 305 | "cell_type": "markdown", 306 | "metadata": {}, 307 | "source": [ 308 | "## 4.5 Generalization\n", 309 | "\n", 310 | "> Adding a parameter to a function is called generalization because it makes the function more general: in the previous version, the square is always the same size; in this version it can be any size." 311 | ] 312 | }, 313 | { 314 | "cell_type": "code", 315 | "execution_count": null, 316 | "metadata": {}, 317 | "outputs": [], 318 | "source": [ 319 | "# add a length parameter to square. \n", 320 | "import turtle\n", 321 | "\n", 322 | "def square(t, length):\n", 323 | " for i in range(4):\n", 324 | " t.fd(length)\n", 325 | " t.lt(90)\n", 326 | "\n", 327 | "\n", 328 | "\n", 329 | "bob = turtle.Turtle()\n", 330 | "square(bob, 100)\n", 331 | "\n", 332 | "turtle.done()\n", 333 | "\n", 334 | "import os\n", 335 | "os._exit(00)" 336 | ] 337 | }, 338 | { 339 | "cell_type": "code", 340 | "execution_count": null, 341 | "metadata": {}, 342 | "outputs": [], 343 | "source": [ 344 | "# Instead of drawing squares, polygon draws regular polygons with any number of sides.\n", 345 | "import turtle\n", 346 | "\n", 347 | "def polygon(t, n, length):\n", 348 | " angle = 360 / n\n", 349 | " for i in range(n):\n", 350 | " t.fd(length)\n", 351 | " t.lt(angle)\n", 352 | "\n", 353 | "bob = turtle.Turtle()\n", 354 | "polygon(bob, 21, 70)\n", 355 | "\n", 356 | "turtle.done()\n", 357 | "\n", 358 | "import os\n", 359 | "os._exit(00)" 360 | ] 361 | }, 362 | { 363 | "cell_type": "markdown", 364 | "metadata": {}, 365 | "source": [ 366 | "> When a function has more than a few numeric arguments, it is easy to forget what they are, or what order they should be in. 
In that case it is often a good idea to include the names of the parameters in the argument list:\n", 367 | "\n", 368 | "```python\n", 369 | "polygon(bob, n=7, length=70)\n```\n", 370 | "\n", 371 | "> These are called keyword arguments because they include the parameter names as “keywords” (not to be confused with Python keywords like while and def)." 372 | ] 373 | }, 374 | { 375 | "cell_type": "markdown", 376 | "metadata": {}, 377 | "source": [ 378 | "## 4.6 Interface design\n", 379 | "\n", 380 | "> The interface of a function is a summary of how it is used: \n", 381 | "\n", 382 | "* What are the parameters? \n", 383 | "* What does the function do? \n", 384 | "* And what is the return value? \n", 385 | "\n", 386 | "> An interface is “clean” if it allows the caller to do what they want without dealing with unnecessary details.\n", 387 | "\n" 388 | ] 389 | }, 390 | { 391 | "cell_type": "code", 392 | "execution_count": null, 393 | "metadata": {}, 394 | "outputs": [], 395 | "source": [ 396 | "# The next step is to write circle, which takes a radius, r, as a parameter. 
\n", 397 | "import turtle\n", 398 | "import math\n", 399 | "\n", 400 | "def polygon(t, n, length):\n", 401 | " angle = 360 / n\n", 402 | " for i in range(n):\n", 403 | " t.fd(length)\n", 404 | " t.lt(angle)\n", 405 | "\n", 406 | "def circle(t, r):\n", 407 | " circumference = 2 * math.pi * r\n", 408 | " n = 50\n", 409 | " length = circumference / n\n", 410 | " polygon(t, n, length)\n", 411 | "\n", 412 | "bob = turtle.Turtle()\n", 413 | "circle(bob, 75)\n", 414 | "\n", 415 | "turtle.done()\n", 416 | "\n", 417 | "import os\n", 418 | "os._exit(00)" 419 | ] 420 | }, 421 | { 422 | "cell_type": "code", 423 | "execution_count": null, 424 | "metadata": {}, 425 | "outputs": [], 426 | "source": [ 427 | "# One limitation of this solution is that n is a constant,\n", 428 | "import turtle\n", 429 | "import math\n", 430 | "\n", 431 | "def polygon(t, n, length):\n", 432 | " angle = 360 / n\n", 433 | " for i in range(n):\n", 434 | " t.fd(length)\n", 435 | " t.lt(angle)\n", 436 | "\n", 437 | "def circle(t, r):\n", 438 | " circumference = 2 * math.pi * r\n", 439 | " n = int(circumference / 3) + 3\n", 440 | " length = circumference / n\n", 441 | " polygon(t, n, length)\n", 442 | "\n", 443 | "bob = turtle.Turtle()\n", 444 | "circle(bob, 75)\n", 445 | "\n", 446 | "turtle.done()\n", 447 | "\n", 448 | "import os\n", 449 | "os._exit(00)" 450 | ] 451 | }, 452 | { 453 | "cell_type": "markdown", 454 | "metadata": {}, 455 | "source": [ 456 | "## 4.7 Refactoring\n", 457 | "\n", 458 | "> This process—rearranging a program to improve interfaces and facilitate code re-use—is called refactoring. In this case, we noticed that there was similar code in arc and polygon, so we “factored it out” into polyline." 
459 | ] 460 | }, 461 | { 462 | "cell_type": "code", 463 | "execution_count": null, 464 | "metadata": { 465 | "scrolled": true 466 | }, 467 | "outputs": [], 468 | "source": [ 469 | "# copy of polygon and transform it into arc\n", 470 | "import turtle\n", 471 | "import math\n", 472 | "\n", 473 | "def arc(t, r, angle):\n", 474 | " arc_length = 2 * math.pi * r * angle / 360\n", 475 | " n = int(arc_length / 3) + 1\n", 476 | " step_length = arc_length / n\n", 477 | " step_angle = angle / n\n", 478 | " \n", 479 | " for i in range(n):\n", 480 | " t.fd(step_length)\n", 481 | " t.lt(step_angle)\n", 482 | "\n", 483 | "bob = turtle.Turtle()\n", 484 | "arc(bob, 100, 180)\n", 485 | "\n", 486 | "turtle.done()\n", 487 | "\n", 488 | "import os\n", 489 | "os._exit(00)" 490 | ] 491 | }, 492 | { 493 | "cell_type": "code", 494 | "execution_count": null, 495 | "metadata": {}, 496 | "outputs": [], 497 | "source": [ 498 | "# general function polyline\n", 499 | "# rewrite polygon and arc to use polyline\n", 500 | "\n", 501 | "import turtle\n", 502 | "import math\n", 503 | "\n", 504 | "def polyline(t, n, length, angle):\n", 505 | " for i in range(n):\n", 506 | " t.fd(length)\n", 507 | " t.lt(angle)\n", 508 | "\n", 509 | "def polygon(t, n, length):\n", 510 | " angle = 360.0 / n\n", 511 | " polyline(t, n, length, angle)\n", 512 | "\n", 513 | "def arc(t, r, angle):\n", 514 | " arc_length = 2 * math.pi * r * angle / 360\n", 515 | " n = int(arc_length / 3) + 1\n", 516 | " step_length = arc_length / n\n", 517 | " step_angle = float(angle) / n\n", 518 | " polyline(t, n, step_length, step_angle)\n", 519 | " \n", 520 | "def circle(t, r):\n", 521 | " arc(t, r, 360)\n", 522 | "\n", 523 | "bob = turtle.Turtle()\n", 524 | "arc(bob, 100, 180)\n", 525 | "\n", 526 | "turtle.done()\n", 527 | "\n", 528 | "import os\n", 529 | "os._exit(00)" 530 | ] 531 | }, 532 | { 533 | "cell_type": "markdown", 534 | "metadata": {}, 535 | "source": [ 536 | "## 4.8 A development plan\n", 537 | "\n", 538 | "1. 
Start by writing a small program with no function definitions.

\n", 539 | "2. Once you get the program working, identify a coherent piece of it, encapsulate the piece in a function and give it a name.

\n", 540 | "3. Generalize the function by adding appropriate parameters.

\n", 541 | "4. Repeat steps 1–3 until you have a set of working functions. Copy and paste working code to avoid retyping (and re-debugging).

\n", 542 | "5. Look for opportunities to improve the program by refactoring. For example, if you have similar code in several places, consider factoring it into an appropriately general function.

\n" 543 | ] 544 | }, 545 | { 546 | "cell_type": "markdown", 547 | "metadata": {}, 548 | "source": [ 549 | "## 4.9 docstring\n", 550 | "\n", 551 | "> A docstring is a string at the beginning of a function that explains the interface (“doc” is short for “documentation”)." 552 | ] 553 | }, 554 | { 555 | "cell_type": "code", 556 | "execution_count": 1, 557 | "metadata": {}, 558 | "outputs": [ 559 | { 560 | "name": "stdout", 561 | "output_type": "stream", 562 | "text": [ 563 | "polyline\n", 564 | "square\n" 565 | ] 566 | } 567 | ], 568 | "source": [ 569 | "import turtle\n", 570 | "\n", 571 | "def polyline():\n", 572 | " \"\"\"Draws n line segments with the given length and\n", 573 | " angle (in degrees) between them. t is a turtle.\n", 574 | " \"\"\" \n", 575 | " print('polyline')\n", 576 | " #for i in range(n):\n", 577 | " # t.fd(length)\n", 578 | " # t.lt(angle)\n", 579 | " \n", 580 | "def square():\n", 581 | " print('square')\n", 582 | " \n", 583 | "polyline() \n", 584 | "square()" 585 | ] 586 | }, 587 | { 588 | "cell_type": "markdown", 589 | "metadata": {}, 590 | "source": [ 591 | "## 4.10 Debugging\n", 592 | "\n", 593 | "> If the preconditions are satisfied and the postconditions are not, the bug is in the function. If your pre- and postconditions are clear, they can help with debugging." 
594 | ] 595 | } 596 | ], 597 | "metadata": { 598 | "kernelspec": { 599 | "display_name": "Python 3", 600 | "language": "python", 601 | "name": "python3" 602 | }, 603 | "language_info": { 604 | "codemirror_mode": { 605 | "name": "ipython", 606 | "version": 3 607 | }, 608 | "file_extension": ".py", 609 | "mimetype": "text/x-python", 610 | "name": "python", 611 | "nbconvert_exporter": "python", 612 | "pygments_lexer": "ipython3", 613 | "version": "3.6.8" 614 | } 615 | }, 616 | "nbformat": 4, 617 | "nbformat_minor": 2 618 | } 619 | -------------------------------------------------------------------------------- /notebooks/Books/Think Python/ch7_debug.py: -------------------------------------------------------------------------------- 1 | def pascal(n): 2 | row_map = {0: {n: 1}} 3 | 4 | for i in range(1, n): 5 | row_list = {} 6 | 7 | prev_row = row_map[i - 1] 8 | for k, v in row_map[i - 1].items(): 9 | 10 | if k + 1 in row_list: 11 | row_list[k + 1] = row_list[k + 1] 12 | else: 13 | row_list[k + 1] = prev_row.get(k, 0) + prev_row.get(k + 2, 0) 14 | 15 | if k - 1 in row_list: 16 | row_list[k - 1] = row_list[k - 1] 17 | else: 18 | row_list[k - 1] = prev_row.get(k, 0) + prev_row.get(k - 2, 0) 19 | 20 | row_map[i] = row_list 21 | 22 | for k, v in row_map.items(): 23 | print(f'k: {k}, v: {v}') 24 | 25 | for k, v in row_map.items(): 26 | # print(f'k: {k}, v: {v}') 27 | count = 0 28 | for kk, vv in sorted(v.items()): 29 | count = count + 1 30 | if count == 1: 31 | print(' ' * kk + f'{vv:3}', end='') 32 | else: 33 | print(' ' + f'{vv:3}', end='') 34 | print() 35 | 36 | 37 | def fibMemo(i): 38 | memo = {} 39 | if i in memo: 40 | return memo[i] 41 | if i <= 2: 42 | return 1 43 | else: 44 | f = fibMemo(i - 1) + fibMemo(i - 2) 45 | memo[i] = f 46 | # print("calc", i, memo) 47 | return f 48 | 49 | 50 | x = fibMemo(4) 51 | print(x) 52 | pascal(x) 53 | -------------------------------------------------------------------------------- /notebooks/Books/Think Python/strings_in_python.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/softhints/python/a256a054d74ca397f41874b3e26f1c4b84214432/notebooks/Books/Think Python/strings_in_python.png -------------------------------------------------------------------------------- /notebooks/IPython tricks 2019.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# IPython/Jupyter Notebook tricks for advanced users in 2019" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## Suppress output in IPython Notebook" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "Simple expression: a trailing semicolon suppresses the displayed result" 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": 1, 27 | "metadata": {}, 28 | "outputs": [ 29 | { 30 | "data": { 31 | "text/plain": [ 32 | "4" 33 | ] 34 | }, 35 | "execution_count": 1, 36 | "metadata": {}, 37 | "output_type": "execute_result" 38 | } 39 | ], 40 | "source": [ 41 | "2*2" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 2, 47 | "metadata": {}, 48 | "outputs": [], 49 | "source": [ 50 | "2*2;" 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": {}, 56 | "source": [ 57 | "Function that prints: a trailing semicolon does not hide printed text, but `%%capture` does" 58 | ] 59 | }, 60 | { 61 | "cell_type": "code", 62 | "execution_count": 4, 63 | "metadata": {}, 64 | "outputs": [ 65 | { 66 | "name": "stdout", 67 | "output_type": "stream", 68 | "text": [ 69 | "Private Message\n" 70 | ] 71 | } 72 | ], 73 | "source": [ 74 | "def myfunc():\n", 75 | " print('Private Message')\n", 76 | "myfunc();" 77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": 5, 82 | "metadata": {}, 83 | "outputs": [], 84 | "source": [ 85 | "%%capture\n", 86 | "def myfunc():\n", 87 | " print('Private Message')\n", 88 | "myfunc()" 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | 
"metadata": {}, 94 | "source": [ 95 | "function 2" 96 | ] 97 | }, 98 | { 99 | "cell_type": "code", 100 | "execution_count": 6, 101 | "metadata": {}, 102 | "outputs": [ 103 | { 104 | "name": "stdout", 105 | "output_type": "stream", 106 | "text": [ 107 | "Private Message\n" 108 | ] 109 | } 110 | ], 111 | "source": [ 112 | "def myfunc():\n", 113 | " print('Private Message')\n", 114 | " \n", 115 | "myfunc()" 116 | ] 117 | }, 118 | { 119 | "cell_type": "code", 120 | "execution_count": 7, 121 | "metadata": {}, 122 | "outputs": [], 123 | "source": [ 124 | "from IPython.utils import io\n", 125 | "\n", 126 | "def myfunc():\n", 127 | " print('Private Message')\n", 128 | "\n", 129 | "with io.capture_output() as captured:\n", 130 | " myfunc()" 131 | ] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "metadata": {}, 136 | "source": [ 137 | "## Get function docs and arguments IPython Notebook " 138 | ] 139 | }, 140 | { 141 | "cell_type": "code", 142 | "execution_count": 8, 143 | "metadata": {}, 144 | "outputs": [], 145 | "source": [ 146 | "import numpy\n", 147 | "table_list = [1,2,3,4,4]\n", 148 | "l = numpy.array_split(table_list, len(table_list)/4)" 149 | ] 150 | }, 151 | { 152 | "cell_type": "code", 153 | "execution_count": 9, 154 | "metadata": {}, 155 | "outputs": [], 156 | "source": [ 157 | "?" 158 | ] 159 | }, 160 | { 161 | "cell_type": "code", 162 | "execution_count": null, 163 | "metadata": {}, 164 | "outputs": [], 165 | "source": [ 166 | "? 
numpy.array_split" 167 | ] 168 | }, 169 | { 170 | "cell_type": "markdown", 171 | "metadata": {}, 172 | "source": [ 173 | "## Change the IPython Notebook theme\n", 174 | "\n", 175 | "Install the module with:\n", 176 | "\n", 177 | "`pip install jupyterthemes`\n", 178 | "\n", 179 | "Apply a theme:\n", 180 | "\n", 181 | "`jt -t chesterish`\n", 182 | "\n", 183 | "Restore the default theme:\n", 184 | "\n", 185 | "`jt -r`\n", 186 | "\n", 187 | "The same commands can be run inside a Jupyter notebook by prefixing them with `!`:\n", 188 | "\n", 189 | "`!jt -r`" 190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "execution_count": null, 195 | "metadata": {}, 196 | "outputs": [], 197 | "source": [ 198 | "!jt -r" 199 | ] 200 | }, 201 | { 202 | "cell_type": "code", 203 | "execution_count": null, 204 | "metadata": {}, 205 | "outputs": [], 206 | "source": [ 207 | "!jt -t chesterish" 208 | ] 209 | }, 210 | { 211 | "cell_type": "markdown", 212 | "metadata": {}, 213 | "source": [ 214 | "## Bonus: some useful shell commands to run from a Jupyter notebook" 215 | ] 216 | }, 217 | { 218 | "cell_type": "code", 219 | "execution_count": null, 220 | "metadata": {}, 221 | "outputs": [], 222 | "source": [ 223 | "!jupyter kernelspec list" 224 | ] 225 | }, 226 | { 227 | "cell_type": "code", 228 | "execution_count": null, 229 | "metadata": {}, 230 | "outputs": [], 231 | "source": [ 232 | "import numpy\n", 233 | "print(numpy.__path__)" 234 | ] 235 | }, 236 | { 237 | "cell_type": "code", 238 | "execution_count": 18, 239 | "metadata": {}, 240 | "outputs": [ 241 | { 242 | "name": "stdout", 243 | "output_type": "stream", 244 | "text": [ 245 | "Python 3.6.7\r\n" 246 | ] 247 | } 248 | ], 249 | "source": [ 250 | "!python -V" 251 | ] 252 | }, 253 | { 254 | "cell_type": "code", 255 | "execution_count": null, 256 | "metadata": {}, 257 | "outputs": [], 258 | "source": [ 259 | "!which python" 260 | ] 261 | }, 262 | { 263 | "cell_type": "code", 264 | "execution_count": 17, 265 | "metadata": {}, 266 | "outputs": [ 267 | { 268 | "name": "stdout", 269 | "output_type": 
"stream", 270 | "text": [ 271 | "appdirs==1.4.3\r\n", 272 | "asn1crypto==0.24.0\r\n", 273 | "atomicwrites==1.2.1\r\n", 274 | "attrs==18.2.0\r\n", 275 | "backcall==0.1.0\r\n", 276 | "black==18.9b0\r\n", 277 | "bleach==3.0.2\r\n", 278 | "boto==2.49.0\r\n", 279 | "boto3==1.9.67\r\n", 280 | "botocore==1.12.67\r\n", 281 | "camelot-py==0.7.1\r\n", 282 | "certifi==2018.8.24\r\n", 283 | "cffi==1.11.5\r\n", 284 | "chardet==3.0.4\r\n", 285 | "Click==7.0\r\n", 286 | "cryptography==2.3.1\r\n", 287 | "cycler==0.10.0\r\n", 288 | "decorator==4.3.0\r\n", 289 | "defusedxml==0.5.0\r\n", 290 | "distro==1.3.0\r\n", 291 | "docutils==0.14\r\n", 292 | "entrypoints==0.2.3\r\n", 293 | "et-xmlfile==1.0.1\r\n", 294 | "filelock==3.0.10\r\n", 295 | "idna==2.7\r\n", 296 | "ipykernel==5.1.0\r\n", 297 | "ipython==7.2.0\r\n", 298 | "ipython-genutils==0.2.0\r\n", 299 | "ipywidgets==7.4.2\r\n", 300 | "jdcal==1.4\r\n", 301 | "jedi==0.13.1\r\n", 302 | "Jinja2==2.10\r\n", 303 | "jira==2.0.0\r\n", 304 | "jmespath==0.9.3\r\n", 305 | "jsonref==0.2\r\n", 306 | "jsonschema==2.6.0\r\n", 307 | "jupyter==1.0.0\r\n", 308 | "jupyter-client==5.2.3\r\n", 309 | "jupyter-console==6.0.0\r\n", 310 | "jupyter-core==4.4.0\r\n", 311 | "jupyterthemes==0.20.0\r\n", 312 | "kiwisolver==1.0.1\r\n", 313 | "lesscpy==0.13.0\r\n", 314 | "lxml==4.3.0\r\n", 315 | "MarkupSafe==1.1.0\r\n", 316 | "matplotlib==3.0.0\r\n", 317 | "mistune==0.8.4\r\n", 318 | "more-itertools==5.0.0\r\n", 319 | "nbconvert==5.4.0\r\n", 320 | "nbformat==4.4.0\r\n", 321 | "notebook==5.7.2\r\n", 322 | "numpy==1.15.1\r\n", 323 | "oauthlib==2.1.0\r\n", 324 | "opencv-python==4.0.0.21\r\n", 325 | "openpyxl==2.5.14\r\n", 326 | "packaging==16.8\r\n", 327 | "pandas==0.23.4\r\n", 328 | "pandocfilters==1.4.2\r\n", 329 | "parso==0.3.1\r\n", 330 | "pbr==4.2.0\r\n", 331 | "pdfminer.six==20181108\r\n", 332 | "pexpect==4.6.0\r\n", 333 | "pickleshare==0.7.5\r\n", 334 | "Pillow==5.2.0\r\n", 335 | "pkg-resources==0.0.0\r\n", 336 | "pluggy==0.8.1\r\n", 337 | "ply==3.11\r\n", 338 
| "prometheus-client==0.4.2\r\n", 339 | "prompt-toolkit==2.0.7\r\n", 340 | "ptyprocess==0.6.0\r\n", 341 | "py==1.7.0\r\n", 342 | "py-spy==0.1.8\r\n", 343 | "pycodestyle==2.3.1\r\n", 344 | "pycparser==2.18\r\n", 345 | "pycryptodome==3.7.3\r\n", 346 | "Pygments==2.3.0\r\n", 347 | "PyJWT==1.6.4\r\n", 348 | "PyMySQL==0.9.2\r\n", 349 | "pyparsing==2.2.0\r\n", 350 | "PyPDF2==1.26.0\r\n", 351 | "pytesseract==0.2.4\r\n", 352 | "pytest==4.1.1\r\n", 353 | "python-dateutil==2.7.3\r\n", 354 | "pytz==2018.5\r\n", 355 | "pyzmq==17.1.2\r\n", 356 | "qtconsole==4.4.3\r\n", 357 | "requests==2.19.1\r\n", 358 | "requests-oauthlib==1.0.0\r\n", 359 | "requests-toolbelt==0.8.0\r\n", 360 | "retrying==1.3.3\r\n", 361 | "s3transfer==0.1.13\r\n", 362 | "scrapinghub==2.0.3\r\n", 363 | "selenium==3.14.0\r\n", 364 | "Send2Trash==1.5.0\r\n", 365 | "simplejson==3.10.0\r\n", 366 | "six==1.10.0\r\n", 367 | "sortedcontainers==2.1.0\r\n", 368 | "style==1.1.0\r\n", 369 | "tabula-py==1.3.1\r\n", 370 | "tabulate==0.8.2\r\n", 371 | "terminado==0.8.1\r\n", 372 | "testpath==0.4.2\r\n", 373 | "toml==0.10.0\r\n", 374 | "tornado==5.1.1\r\n", 375 | "tox==3.7.0\r\n", 376 | "traitlets==4.3.2\r\n", 377 | "update==0.0.1\r\n", 378 | "urllib3==1.23\r\n", 379 | "virtualenv==16.3.0\r\n", 380 | "Wand==0.4.4\r\n", 381 | "wcwidth==0.1.7\r\n", 382 | "webencodings==0.5.1\r\n", 383 | "widgetsnbextension==3.4.2\r\n" 384 | ] 385 | } 386 | ], 387 | "source": [ 388 | "!pip freeze" 389 | ] 390 | }, 391 | { 392 | "cell_type": "code", 393 | "execution_count": null, 394 | "metadata": {}, 395 | "outputs": [], 396 | "source": [ 397 | "!echo $PATH " 398 | ] 399 | }, 400 | { 401 | "cell_type": "markdown", 402 | "metadata": {}, 403 | "source": [ 404 | "## Bonus 2: Top 10 most useful ipython key shortcuts" 405 | ] 406 | }, 407 | { 408 | "cell_type": "markdown", 409 | "metadata": {}, 410 | "source": [ 411 | "* Shift + Enter - \trun cell\n", 412 | "* Alt + Enter - \trun cell, insert below\n", 413 | "* Ctrl + m, c - \tcopy cell\n", 414 | "* 
Ctrl + m, v - \tpaste cell\n", 415 | "* Ctrl + m, l - \ttoggle line numbers\n", 416 | "* Ctrl + m, j -\tmove cell\n", 417 | "* Ctrl + m, y -\tcode cell\n", 418 | "* Ctrl + m, m -\tmarkdown cell\n", 419 | "* Ctrl + m, . -\trestart kernel\n", 420 | "* Ctrl + m, h -\tshow keyboard shortcuts" 421 | ] 422 | }, 423 | { 424 | "cell_type": "code", 425 | "execution_count": 21, 426 | "metadata": {}, 427 | "outputs": [ 428 | { 429 | "data": { 430 | "text/plain": [ 431 | "2" 432 | ] 433 | }, 434 | "execution_count": 21, 435 | "metadata": {}, 436 | "output_type": "execute_result" 437 | } 438 | ], 439 | "source": [ 440 | "1+1" 441 | ] 442 | }, 443 | { 444 | "cell_type": "code", 445 | "execution_count": null, 446 | "metadata": {}, 447 | "outputs": [], 448 | "source": [] 449 | }, 450 | { 451 | "cell_type": "code", 452 | "execution_count": 22, 453 | "metadata": {}, 454 | "outputs": [ 455 | { 456 | "data": { 457 | "text/plain": [ 458 | "2" 459 | ] 460 | }, 461 | "execution_count": 22, 462 | "metadata": {}, 463 | "output_type": "execute_result" 464 | } 465 | ], 466 | "source": [ 467 | "1+1\n", 468 | "## markdown" 469 | ] 470 | }, 471 | { 472 | "cell_type": "code", 473 | "execution_count": null, 474 | "metadata": {}, 475 | "outputs": [], 476 | "source": [] 477 | }, 478 | { 479 | "cell_type": "code", 480 | "execution_count": null, 481 | "metadata": {}, 482 | "outputs": [], 483 | "source": [] 484 | }, 485 | { 486 | "cell_type": "code", 487 | "execution_count": null, 488 | "metadata": {}, 489 | "outputs": [], 490 | "source": [] 491 | }, 492 | { 493 | "cell_type": "code", 494 | "execution_count": null, 495 | "metadata": {}, 496 | "outputs": [], 497 | "source": [] 498 | }, 499 | { 500 | "cell_type": "code", 501 | "execution_count": null, 502 | "metadata": {}, 503 | "outputs": [], 504 | "source": [] 505 | }, 506 | { 507 | "cell_type": "code", 508 | "execution_count": null, 509 | "metadata": {}, 510 | "outputs": [], 511 | "source": [] 512 | } 513 | ], 514 | "metadata": { 515 | "kernelspec": 
{ 516 | "display_name": "Python 3", 517 | "language": "python", 518 | "name": "python3" 519 | }, 520 | "language_info": { 521 | "codemirror_mode": { 522 | "name": "ipython", 523 | "version": 3 524 | }, 525 | "file_extension": ".py", 526 | "mimetype": "text/x-python", 527 | "name": "python", 528 | "nbconvert_exporter": "python", 529 | "pygments_lexer": "ipython3", 530 | "version": "3.6.7" 531 | } 532 | }, 533 | "nbformat": 4, 534 | "nbformat_minor": 2 535 | } 536 | -------------------------------------------------------------------------------- /notebooks/Image_validation_with_Python.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Image validation with Python\n", 8 | "\n", 9 | "* is the file a valid image\n", 10 | " * check file extension\n", 11 | " * check the file with PIL\n", 12 | "* is the image blank\n", 13 | "* does the image contain a pattern\n", 14 | "\n", 15 | "#### possible future video:\n", 16 | "* multiple image validation\n", 17 | "* validate an image URL without download\n", 18 | "* search image in image" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "## is the file a valid image" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": 6, 31 | "metadata": {}, 32 | "outputs": [ 33 | { 34 | "data": { 35 | "text/plain": [ 36 | "False" 37 | ] 38 | }, 39 | "execution_count": 6, 40 | "metadata": {}, 41 | "output_type": "execute_result" 42 | } 43 | ], 44 | "source": [ 45 | "# check file extension\n", 46 | "test_img = './csv/movie_metadata.csv'\n", 47 | "test_img.lower().endswith(('.png', '.jpg', '.jpeg'))" 48 | ] 49 | }, 50 | { 51 | "cell_type": "code", 52 | "execution_count": 7, 53 | "metadata": {}, 54 | "outputs": [ 55 | { 56 | "data": { 57 | "text/plain": [ 58 | "True" 59 | ] 60 | }, 61 | "execution_count": 7, 62 | "metadata": {}, 63 | "output_type": 
"execute_result" 64 | } 65 | ], 66 | "source": [ 67 | "# check file extension\n", 68 | "test_img = './csv/Selection_001.png'\n", 69 | "test_img.lower().endswith(('.png', '.jpg', '.jpeg'))" 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": {}, 75 | "source": [ 76 | "### check the file with pil\n", 77 | "\n", 78 | "`pip install Pillow`" 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": 25, 84 | "metadata": {}, 85 | "outputs": [], 86 | "source": [ 87 | "from PIL import Image\n", 88 | "def is_jpg(filename):\n", 89 | " try:\n", 90 | " i=Image.open(filename)\n", 91 | " return i.format in ['PNG', 'JPEG']\n", 92 | " except IOError:\n", 93 | " return False\n", 94 | " " 95 | ] 96 | }, 97 | { 98 | "cell_type": "code", 99 | "execution_count": 26, 100 | "metadata": {}, 101 | "outputs": [ 102 | { 103 | "data": { 104 | "text/plain": [ 105 | "False" 106 | ] 107 | }, 108 | "execution_count": 26, 109 | "metadata": {}, 110 | "output_type": "execute_result" 111 | } 112 | ], 113 | "source": [ 114 | "is_jpg('./csv') " 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": 27, 120 | "metadata": {}, 121 | "outputs": [ 122 | { 123 | "data": { 124 | "text/plain": [ 125 | "False" 126 | ] 127 | }, 128 | "execution_count": 27, 129 | "metadata": {}, 130 | "output_type": "execute_result" 131 | } 132 | ], 133 | "source": [ 134 | "is_jpg('./csv/movie_metadata.csv') " 135 | ] 136 | }, 137 | { 138 | "cell_type": "code", 139 | "execution_count": 28, 140 | "metadata": {}, 141 | "outputs": [ 142 | { 143 | "data": { 144 | "text/plain": [ 145 | "True" 146 | ] 147 | }, 148 | "execution_count": 28, 149 | "metadata": {}, 150 | "output_type": "execute_result" 151 | } 152 | ], 153 | "source": [ 154 | "is_jpg('./csv/Selection_001.png') " 155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": 29, 160 | "metadata": {}, 161 | "outputs": [ 162 | { 163 | "data": { 164 | "text/plain": [ 165 | "True" 166 | ] 167 | }, 168 | 
"execution_count": 29, 169 | "metadata": {}, 170 | "output_type": "execute_result" 171 | } 172 | ], 173 | "source": [ 174 | "is_jpg('./csv/Selection_001.png') " 175 | ] 176 | }, 177 | { 178 | "cell_type": "code", 179 | "execution_count": 30, 180 | "metadata": {}, 181 | "outputs": [ 182 | { 183 | "data": { 184 | "text/plain": [ 185 | "True" 186 | ] 187 | }, 188 | "execution_count": 30, 189 | "metadata": {}, 190 | "output_type": "execute_result" 191 | } 192 | ], 193 | "source": [ 194 | "is_jpg('./csv/fire-and-water-2354583_960_720.jpg') " 195 | ] 196 | }, 197 | { 198 | "cell_type": "markdown", 199 | "metadata": {}, 200 | "source": [ 201 | "## is the image blank" 202 | ] 203 | }, 204 | { 205 | "cell_type": "markdown", 206 | "metadata": {}, 207 | "source": [ 208 | "" 209 | ] 210 | }, 211 | { 212 | "cell_type": "code", 213 | "execution_count": 5, 214 | "metadata": {}, 215 | "outputs": [ 216 | { 217 | "name": "stdout", 218 | "output_type": "stream", 219 | "text": [ 220 | "None\n" 221 | ] 222 | } 223 | ], 224 | "source": [ 225 | "import json\n", 226 | "from io import BytesIO\n", 227 | "from PIL import Image\n", 228 | "import requests\n", 229 | "\n", 230 | "remote_file = 'https://cdn.pixabay.com/photo/2013/03/29/07/34/girl-97433_960_720.jpg'\n", 231 | "\n", 232 | "response = requests.get(remote_file)\n", 233 | "img = Image.open(BytesIO(response.content))\n", 234 | "\n", 235 | "clrs = img.getcolors()\n", 236 | "print(clrs)" 237 | ] 238 | }, 239 | { 240 | "cell_type": "code", 241 | "execution_count": 31, 242 | "metadata": {}, 243 | "outputs": [ 244 | { 245 | "data": { 246 | "text/html": [ 247 | "" 248 | ], 249 | "text/plain": [ 250 | "" 251 | ] 252 | }, 253 | "execution_count": 31, 254 | "metadata": {}, 255 | "output_type": "execute_result" 256 | } 257 | ], 258 | "source": [ 259 | "from IPython.display import Image\n", 260 | "from IPython.core.display import HTML \n", 261 | "\n", 262 | "color_image = './csv/Selection_139.png'\n", 263 | "\n", 264 | "Image(url= color_image)" 
265 | ] 266 | }, 267 | { 268 | "cell_type": "code", 269 | "execution_count": 32, 270 | "metadata": {}, 271 | "outputs": [ 272 | { 273 | "name": "stdout", 274 | "output_type": "stream", 275 | "text": [ 276 | "None\n" 277 | ] 278 | } 279 | ], 280 | "source": [ 281 | "import json\n", 282 | "from io import BytesIO\n", 283 | "from PIL import Image\n", 284 | "import requests\n", 285 | "\n", 286 | "img = Image.open(color_image)\n", 287 | "\n", 288 | "clrs = img.getcolors()  # None means more than 256 distinct colors (the default maxcolors)\n", 289 | "print(clrs)" 290 | ] 291 | }, 292 | { 293 | "cell_type": "code", 294 | "execution_count": 33, 295 | "metadata": {}, 296 | "outputs": [ 297 | { 298 | "data": { 299 | "text/html": [ 300 | "" 301 | ], 302 | "text/plain": [ 303 | "" 304 | ] 305 | }, 306 | "execution_count": 33, 307 | "metadata": {}, 308 | "output_type": "execute_result" 309 | } 310 | ], 311 | "source": [ 312 | "from IPython.display import Image\n", 313 | "from IPython.core.display import HTML \n", 314 | "\n", 315 | "blank_image = './csv/Selection_140.png'\n", 316 | "\n", 317 | "Image(url= blank_image)" 318 | ] 319 | }, 320 | { 321 | "cell_type": "code", 322 | "execution_count": 34, 323 | "metadata": {}, 324 | "outputs": [ 325 | { 326 | "name": "stdout", 327 | "output_type": "stream", 328 | "text": [ 329 | "[(49128, (238, 238, 238))]\n" 330 | ] 331 | } 332 | ], 333 | "source": [ 334 | "import json\n", 335 | "from io import BytesIO\n", 336 | "from PIL import Image\n", 337 | "import requests\n", 338 | "\n", 339 | "img = Image.open(blank_image)\n", 340 | "\n", 341 | "clrs = img.getcolors()  # a single (count, color) entry means the image is one flat color\n", 342 | "print(clrs)" 343 | ] 344 | }, 345 | { 346 | "cell_type": "markdown", 347 | "metadata": {}, 348 | "source": [ 349 | "## does the image contain a pattern" 350 | ] 351 | }, 352 | { 353 | "cell_type": "markdown", 354 | "metadata": {}, 355 | "source": [ 356 | "\"Drawing\"\n", 357 | "\"Drawing\"" 358 | ] 359 | }, 360 | { 361 | "cell_type": "code", 362 | "execution_count": null, 363 | "metadata": {}, 364 | "outputs": [], 365 | "source": [ 366 | 
"import cv2\n", 367 | "import numpy as np\n", 368 | "\n", 369 | "img_rgb = cv2.imread('./csv/image_with_coin.jpg')\n", 370 | "template = cv2.imread('./csv/coin.png')\n", 371 | "h, w = template.shape[:-1]  # shape is (rows, cols, channels), i.e. height first\n", 372 | "\n", 373 | "res = cv2.matchTemplate(img_rgb, template, cv2.TM_CCOEFF_NORMED)\n", 374 | "threshold = .8\n", 375 | "loc = np.where(res >= threshold)  # (row indices, column indices) of good matches\n", 376 | "for pt in zip(*loc[::-1]):  # reversed, so pt is (x, y)\n", 377 | " cv2.rectangle(img_rgb, pt, (pt[0] + w, pt[1] + h), (0, 0, 255), 2)\n", 378 | "\n", 379 | "cv2.imwrite('./csv/result.png', img_rgb)" 380 | ] 381 | }, 382 | { 383 | "cell_type": "code", 384 | "execution_count": null, 385 | "metadata": {}, 386 | "outputs": [], 387 | "source": [ 388 | "from IPython.display import Image\n", 389 | "from IPython.core.display import HTML \n", 390 | "\n", 391 | "Image(url= './csv/result.png')" 392 | ] 393 | }, 394 | { 395 | "cell_type": "code", 396 | "execution_count": null, 397 | "metadata": {}, 398 | "outputs": [], 399 | "source": [] 400 | } 401 | ], 402 | "metadata": { 403 | "kernelspec": { 404 | "display_name": "Python 3", 405 | "language": "python", 406 | "name": "python3" 407 | }, 408 | "language_info": { 409 | "codemirror_mode": { 410 | "name": "ipython", 411 | "version": 3 412 | }, 413 | "file_extension": ".py", 414 | "mimetype": "text/x-python", 415 | "name": "python", 416 | "nbconvert_exporter": "python", 417 | "pygments_lexer": "ipython3", 418 | "version": "3.6.7" 419 | } 420 | }, 421 | "nbformat": 4, 422 | "nbformat_minor": 2 423 | } 424 | -------------------------------------------------------------------------------- /notebooks/Load_multiple_CSV_files_into_a_single _Dataframe.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import pandas as pd\n", 10 | "pd.set_option('display.max_colwidth', None)  # None disables truncation (pandas >= 1.0; older versions used -1)" 11 | ] 12 | }, 13 | { 14 | "cell_type": "markdown", 
15 | "metadata": {}, 16 | "source": [ 17 | "## Rename multiple CSV files in a folder with Python" 18 | ] 19 | }, 20 | { 21 | "cell_type": "code", 22 | "execution_count": 3, 23 | "metadata": {}, 24 | "outputs": [], 25 | "source": [ 26 | "import glob, os\n", 27 | "\n", 28 | "def rename(dir, pathAndFilename, pattern, titlePattern):\n", 29 | " os.rename(pathAndFilename, os.path.join(dir, titlePattern))\n", 30 | "\n", 31 | "# search for csv files in the working folder \n", 32 | "path = os.path.expanduser(\"~/Projects/MYP/Datasets/test/*.csv\")\n", 33 | "\n", 34 | "# iterate and rename them one by one with the number of the iteration\n", 35 | "for i, fname in enumerate(glob.glob(path)):\n", 36 | " rename(os.path.expanduser('~/Projects/MYP/Datasets/test/'), fname, r'*.csv', r'test{}.csv'.format(i))" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "## Load several files into Dataframe" 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": 5, 49 | "metadata": {}, 50 | "outputs": [ 51 | { 52 | "name": "stdout", 53 | "output_type": "stream", 54 | "text": [ 55 | "(541, 7)\n", 56 | "(550, 7)\n", 57 | "(1641, 7)\n" 58 | ] 59 | } 60 | ], 61 | "source": [ 62 | "# change separator for CSV file\n", 63 | "df1 = pd.read_csv('~/Projects/MYP/Datasets/test/test0.csv', sep=\"@\")\n", 64 | "df2 = pd.read_csv('~/Projects/MYP/Datasets/test/test1.csv', sep=\"@\")\n", 65 | "df3 = pd.read_csv('~/Projects/MYP/Datasets/test/test1.csv', sep=\"@\")\n", 66 | "\n", 67 | "frames = [df1, df2, df3]\n", 68 | "\n", 69 | "# concatenate multiple data CSV files\n", 70 | "all = pd.concat(frames)\n", 71 | "\n", 72 | "print(df1.shape)\n", 73 | "print(df2.shape)\n", 74 | "print(all.shape)" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": {}, 80 | "source": [ 81 | "## Dynamically Load multiple csv file into Dataframe" 82 | ] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "execution_count": 9, 87 | "metadata": {}, 88 | "outputs": [ 
89 | { 90 | "data": { 91 | "text/html": [ 92 | "<table>\n",
"<tr><th></th><th>title</th><th>Views</th><th>Like</th><th>Dislike</th><th>Comment</th><th>channel</th></tr>\n",
"<tr><th>215</th><td>Turning Google Earth into SimCity 2000</td><td>168175.0</td><td>3251.0</td><td>1125.0</td><td>215.0</td><td>test0.csv</td></tr>\n",
"<tr><th>301</th><td>Microservices + Events + Docker = A Perfect Trio</td><td>161110.0</td><td>3213.0</td><td>50.0</td><td>83.0</td><td>test0.csv</td></tr>\n",
"<tr><th>265</th><td>PHP in 2018 by the Creator of PHP</td><td>164577.0</td><td>3557.0</td><td>69.0</td><td>384.0</td><td>test0.csv</td></tr>\n",
"<tr><th>468</th><td>Developing Blockchain Software</td><td>169484.0</td><td>2512.0</td><td>116.0</td><td>133.0</td><td>test0.csv</td></tr>\n",
"<tr><th>398</th><td>VS Code: The Last Editor You'll Ever Need</td><td>172738.0</td><td>1930.0</td><td>194.0</td><td>340.0</td><td>test0.csv</td></tr>\n",
"<tr><th>175</th><td>Coding Challenge #74: Clock with p5.js</td><td>232227.0</td><td>4609.0</td><td>68.0</td><td>289.0</td><td>test1.csv</td></tr>\n",
"<tr><th>373</th><td>Coding Challenge #12: The Lorenz Attractor in Processing</td><td>217172.0</td><td>3680.0</td><td>43.0</td><td>333.0</td><td>test1.csv</td></tr>\n",
"<tr><th>447</th><td>10.4: Loading JSON data from a URL (Asynchronous Callbacks!) - p5.js Tutorial</td><td>218081.0</td><td>2120.0</td><td>79.0</td><td>240.0</td><td>test1.csv</td></tr>\n",
"<tr><th>269</th><td>The Coding Train!</td><td>218635.0</td><td>2482.0</td><td>83.0</td><td>324.0</td><td>test1.csv</td></tr>\n",
"<tr><th>193</th><td>Coding Challenge #71: Minesweeper</td><td>220816.0</td><td>3334.0</td><td>71.0</td><td>401.0</td><td>test1.csv</td></tr>\n",
"</table>
" 212 | ], 213 | "text/plain": [ 214 | " title \\\n", 215 | "215 Turning Google Earth into SimCity 2000 \n", 216 | "301 Microservices + Events + Docker = A Perfect Trio \n", 217 | "265 PHP in 2018 by the Creator of PHP \n", 218 | "468 Developing Blockchain Software \n", 219 | "398 VS Code: The Last Editor You'll Ever Need \n", 220 | "175 Coding Challenge #74: Clock with p5.js \n", 221 | "373 Coding Challenge #12: The Lorenz Attractor in Processing \n", 222 | "447 10.4: Loading JSON data from a URL (Asynchronous Callbacks!) - p5.js Tutorial \n", 223 | "269 The Coding Train! \n", 224 | "193 Coding Challenge #71: Minesweeper \n", 225 | "\n", 226 | " Views Like Dislike Comment channel \n", 227 | "215 168175.0 3251.0 1125.0 215.0 test0.csv \n", 228 | "301 161110.0 3213.0 50.0 83.0 test0.csv \n", 229 | "265 164577.0 3557.0 69.0 384.0 test0.csv \n", 230 | "468 169484.0 2512.0 116.0 133.0 test0.csv \n", 231 | "398 172738.0 1930.0 194.0 340.0 test0.csv \n", 232 | "175 232227.0 4609.0 68.0 289.0 test1.csv \n", 233 | "373 217172.0 3680.0 43.0 333.0 test1.csv \n", 234 | "447 218081.0 2120.0 79.0 240.0 test1.csv \n", 235 | "269 218635.0 2482.0 83.0 324.0 test1.csv \n", 236 | "193 220816.0 3334.0 71.0 401.0 test1.csv " 237 | ] 238 | }, 239 | "execution_count": 9, 240 | "metadata": {}, 241 | "output_type": "execute_result" 242 | } 243 | ], 244 | "source": [ 245 | "import glob\n", 246 | "\n", 247 | "result = pd.DataFrame()\n", 248 | "\n", 249 | "path = os.path.expanduser(\"~/Projects/MYP/Datasets/test/*.csv\")\n", 250 | "\n", 251 | "for fname in glob.glob(path):\n", 252 | " head, tail = os.path.split(fname)\n", 253 | " df = pd.read_csv(fname, sep=\"@\")\n", 254 | " df2 = df.sort_values(by=['Views'], ascending=False).drop(['Favorite', 'videoID'], axis=1).iloc[15:20,:]\n", 255 | " df2['channel'] = tail\n", 256 | " result = pd.concat([result, df2])\n", 257 | "result.sort_values(by=['channel']).iloc[0:10,] " 258 | ] 259 | }, 260 | { 261 | "cell_type": "markdown", 262 | "metadata": {}, 
263 | "source": [ 264 | "## Generate clickable links with pandas and Jupyter notebook" 265 | ] 266 | }, 267 | { 268 | "cell_type": "code", 269 | "execution_count": 11, 270 | "metadata": {}, 271 | "outputs": [ 272 | { 273 | "data": { 274 | "text/html": [ 275 | "<table>\n",
"<tr><th></th><th>title</th><th>Views</th><th>Like</th><th>Dislike</th><th>Favorite</th><th>Comment</th><th>videoID</th><th>nameurl</th></tr>\n",
"<tr><th>20</th><td>How To...</td><td>80620.0</td><td>121.0</td><td>13.0</td><td>0.0</td><td>13.0</td><td>https:...</td><td></td></tr>\n",
"<tr><th>21</th><td>How To...</td><td>165533.0</td><td>432.0</td><td>143.0</td><td>0.0</td><td>17.0</td><td>https:...</td><td></td></tr>\n",
"<tr><th>22</th><td>How To...</td><td>29636.0</td><td>99.0</td><td>16.0</td><td>0.0</td><td>8.0</td><td>https:...</td><td></td></tr>\n",
"<tr><th>23</th><td>How to...</td><td>409.0</td><td>4.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>https:...</td><td></td></tr>\n",
"<tr><th>24</th><td>How to...</td><td>31358.0</td><td>59.0</td><td>33.0</td><td>0.0</td><td>2.0</td><td>https:...</td><td></td></tr>\n",
"<tr><th>25</th><td>How To...</td><td>85887.0</td><td>272.0</td><td>76.0</td><td>0.0</td><td>4.0</td><td>https:...</td><td></td></tr>\n",
"<tr><th>26</th><td>How To...</td><td>61449.0</td><td>95.0</td><td>34.0</td><td>0.0</td><td>0.0</td><td>https:...</td><td></td></tr>\n",
"<tr><th>27</th><td>How To...</td><td>262342.0</td><td>1440.0</td><td>93.0</td><td>0.0</td><td>447.0</td><td>https:...</td><td></td></tr>\n",
"<tr><th>28</th><td>How To...</td><td>154661.0</td><td>453.0</td><td>122.0</td><td>0.0</td><td>11.0</td><td>https:...</td><td></td></tr>\n",
"<tr><th>29</th><td>How To...</td><td>109787.0</td><td>257.0</td><td>40.0</td><td>0.0</td><td>22.0</td><td>https:...</td><td></td></tr>\n",
"</table>" 402 | ], 403 | "text/plain": [ 404 | "" 405 | ] 406 | }, 407 | "execution_count": 11, 408 | "metadata": {}, 409 | "output_type": "execute_result" 410 | } 411 | ], 412 | "source": [ 413 | "from IPython.display import HTML\n", 414 | "\n", 415 | "# convert url column into href tag and add it as a new column to dataframe (assumes videoID holds the full URL)\n", 416 | "df['nameurl'] = df['videoID'].apply(lambda x: '<a href=\"{0}\">XXXXX</a>'.format(x))\n", 417 | "\n", 418 | "\n", 419 | "\n", 420 | "# otherwise the link will be blank\n", 421 | "pd.set_option('display.max_colwidth', 10)\n", 422 | "\n", 423 | "# in order to display HTML code\n", 424 | "HTML(df.iloc[20:30,] .to_html(escape=False))" 425 | ] 426 | }, 427 | { 428 | "cell_type": "code", 429 | "execution_count": null, 430 | "metadata": {}, 431 | "outputs": [], 432 | "source": [] 433 | } 434 | ], 435 | "metadata": { 436 | "kernelspec": { 437 | "display_name": "Python 3", 438 | "language": "python", 439 | "name": "python3" 440 | }, 441 | "language_info": { 442 | "codemirror_mode": { 443 | "name": "ipython", 444 | "version": 3 445 | }, 446 | "file_extension": ".py", 447 | "mimetype": "text/x-python", 448 | "name": "python", 449 | "nbconvert_exporter": "python", 450 | "pygments_lexer": "ipython3", 451 | "version": "3.6.7" 452 | } 453 | }, 454 | "nbformat": 4, 455 | "nbformat_minor": 1 456 | } 457 | -------------------------------------------------------------------------------- /notebooks/Pandas count and percentage by value for a column.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Pandas count and percentage by value for a column\n", 8 | "\n", 9 | "* read remote data from pdf\n", 10 | "* calculate count and percent\n", 11 | "* format percent in better output\n", 12 | "\n", 13 | "Bonus\n", 14 | "\n", 15 | "* pandas column renaming" 16 | ] 17 | }, 18 | { 19 | "cell_type": "code", 20 | "execution_count": 1, 21 | "metadata": {}, 22 
| "outputs": [ 23 | { 24 | "data": { 25 | "text/html": [ 26 | "<table>\n",
"<tr><th></th><th>food</th><th>Portion size </th><th>per 100 grams</th><th>energy</th></tr>\n",
"<tr><th>0</th><td>Fish cake</td><td>90 cals per cake</td><td>200 cals</td><td>Medium</td></tr>\n",
"<tr><th>1</th><td>Fish fingers</td><td>50 cals per piece</td><td>220 cals</td><td>Medium</td></tr>\n",
"<tr><th>2</th><td>Gammon</td><td>320 cals</td><td>280 cals</td><td>Med-High</td></tr>\n",
"<tr><th>3</th><td>Haddock fresh</td><td>200 cals</td><td>110 cals</td><td>Low calorie</td></tr>\n",
"<tr><th>4</th><td>Halibut fresh</td><td>220 cals</td><td>125 cals</td><td>Low calorie</td></tr>\n",
"</table>
" 89 | ], 90 | "text/plain": [ 91 | " food Portion size per 100 grams energy\n", 92 | "0 Fish cake 90 cals per cake 200 cals Medium\n", 93 | "1 Fish fingers 50 cals per piece 220 cals Medium\n", 94 | "2 Gammon 320 cals 280 cals Med-High\n", 95 | "3 Haddock fresh 200 cals 110 cals Low calorie\n", 96 | "4 Halibut fresh 220 cals 125 cals Low calorie" 97 | ] 98 | }, 99 | "execution_count": 1, 100 | "metadata": {}, 101 | "output_type": "execute_result" 102 | } 103 | ], 104 | "source": [ 105 | "from tabula import read_pdf\n", 106 | "import pandas as pd\n", 107 | "df = read_pdf(\"http://www.uncledavesenterprise.com/file/health/Food%20Calories%20List.pdf\", pages=3, pandas_options={'header': None})\n", 108 | "df.columns = ['food', 'Portion size ', 'per 100 grams', 'energy']\n", 109 | "df.head()" 110 | ] 111 | }, 112 | { 113 | "cell_type": "code", 114 | "execution_count": 2, 115 | "metadata": {}, 116 | "outputs": [], 117 | "source": [ 118 | "s = df.energy" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": 3, 124 | "metadata": {}, 125 | "outputs": [ 126 | { 127 | "data": { 128 | "text/plain": [ 129 | "Medium 14\n", 130 | "High 6\n", 131 | "Low calorie 4\n", 132 | "Med-High 4\n", 133 | "Low-Med 1\n", 134 | "Low- Med 1\n", 135 | "Name: energy, dtype: int64" 136 | ] 137 | }, 138 | "execution_count": 3, 139 | "metadata": {}, 140 | "output_type": "execute_result" 141 | } 142 | ], 143 | "source": [ 144 | "counts = s.value_counts()\n", 145 | "counts" 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": 4, 151 | "metadata": {}, 152 | "outputs": [ 153 | { 154 | "data": { 155 | "text/plain": [ 156 | "Medium 0.466667\n", 157 | "High 0.200000\n", 158 | "Low calorie 0.133333\n", 159 | "Med-High 0.133333\n", 160 | "Low-Med 0.033333\n", 161 | "Low- Med 0.033333\n", 162 | "Name: energy, dtype: float64" 163 | ] 164 | }, 165 | "execution_count": 4, 166 | "metadata": {}, 167 | "output_type": "execute_result" 168 | } 169 | ], 170 | "source": [ 
171 | "percent = s.value_counts(normalize=True)\n", 172 | "percent" 173 | ] 174 | }, 175 | { 176 | "cell_type": "code", 177 | "execution_count": 5, 178 | "metadata": {}, 179 | "outputs": [ 180 | { 181 | "data": { 182 | "text/plain": [ 183 | "Medium 46.7%\n", 184 | "High 20.0%\n", 185 | "Low calorie 13.3%\n", 186 | "Med-High 13.3%\n", 187 | "Low-Med 3.3%\n", 188 | "Low- Med 3.3%\n", 189 | "Name: energy, dtype: object" 190 | ] 191 | }, 192 | "execution_count": 5, 193 | "metadata": {}, 194 | "output_type": "execute_result" 195 | } 196 | ], 197 | "source": [ 198 | "percent100 = s.value_counts(normalize=True).mul(100).round(1).astype(str) + '%'\n", 199 | "percent100" 200 | ] 201 | }, 202 | { 203 | "cell_type": "code", 204 | "execution_count": 6, 205 | "metadata": {}, 206 | "outputs": [ 207 | { 208 | "data": { 209 | "text/html": [ 210 | "<table>\n",
"<tr><th></th><th>counts</th><th>per</th><th>per100</th></tr>\n",
"<tr><th>Medium</th><td>14</td><td>0.466667</td><td>46.7%</td></tr>\n",
"<tr><th>High</th><td>6</td><td>0.200000</td><td>20.0%</td></tr>\n",
"<tr><th>Low calorie</th><td>4</td><td>0.133333</td><td>13.3%</td></tr>\n",
"<tr><th>Med-High</th><td>4</td><td>0.133333</td><td>13.3%</td></tr>\n",
"<tr><th>Low-Med</th><td>1</td><td>0.033333</td><td>3.3%</td></tr>\n",
"<tr><th>Low- Med</th><td>1</td><td>0.033333</td><td>3.3%</td></tr>\n",
"</table>
" 273 | ], 274 | "text/plain": [ 275 | " counts per per100\n", 276 | "Medium 14 0.466667 46.7%\n", 277 | "High 6 0.200000 20.0%\n", 278 | "Low calorie 4 0.133333 13.3%\n", 279 | "Med-High 4 0.133333 13.3%\n", 280 | "Low-Med 1 0.033333 3.3%\n", 281 | "Low- Med 1 0.033333 3.3%" 282 | ] 283 | }, 284 | "execution_count": 6, 285 | "metadata": {}, 286 | "output_type": "execute_result" 287 | } 288 | ], 289 | "source": [ 290 | "pd.DataFrame({'counts': counts, 'per': percent, 'per100': percent100})" 291 | ] 292 | }, 293 | { 294 | "cell_type": "code", 295 | "execution_count": null, 296 | "metadata": {}, 297 | "outputs": [], 298 | "source": [ 299 | "s = df.energy\n", 300 | "counts = s.value_counts()\n", 301 | "percent = s.value_counts(normalize=True)\n", 302 | "percent100 = s.value_counts(normalize=True).mul(100).round(1).astype(str) + '%'\n", 303 | "pd.DataFrame({'counts': counts, 'per': percent, 'per100': percent100})" 304 | ] 305 | } 306 | ], 307 | "metadata": { 308 | "kernelspec": { 309 | "display_name": "Python 3", 310 | "language": "python", 311 | "name": "python3" 312 | }, 313 | "language_info": { 314 | "codemirror_mode": { 315 | "name": "ipython", 316 | "version": 3 317 | }, 318 | "file_extension": ".py", 319 | "mimetype": "text/x-python", 320 | "name": "python", 321 | "nbconvert_exporter": "python", 322 | "pygments_lexer": "ipython3", 323 | "version": "3.6.7" 324 | } 325 | }, 326 | "nbformat": 4, 327 | "nbformat_minor": 2 328 | } 329 | -------------------------------------------------------------------------------- /notebooks/Python group and sort a list of lists by a specific index,pattern.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "movies = [\n", 10 | "1, \"Avatar\" ,'good',\n", 11 | "2, \"Titanic\" ,'not bad',\n", 12 | "3, \"Star Wars: The Force Awakens\" ,'good',\n", 13 | "4, 
\"Jurassic World\" ,'good',\n", 14 | "5, \"The Avengers\" ,'not bad',\n", 15 | "6, \"Furious 7\" ,'not bad',\n", 16 | "7, \"Avengers: Age of Ultron\" ,'good',\n", 17 | "8, \"Harry Potter and the Deathly Hallows – Part 2\" ,'not bad',\n", 18 | "9, \"Frozen\" ,'good',\n", 19 | "\n", 20 | "\n", 21 | "\"The Birth of a Nation\" ,1915,\n", 22 | "\"The Birth of a Nation\" ,1940,\n", 23 | "\"Gone with the Wind\" ,1940,\n", 24 | "\"Gone with the Wind\" ,1963,\n", 25 | "\"Gone with the Wind\" ,1963,\n", 26 | "\"The Sound of Music\" ,1966]" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": null, 32 | "metadata": {}, 33 | "outputs": [], 34 | "source": [ 35 | "def sortGroupList(list_unsorted, category, category2, short=True):\n", 36 | " listx = []\n", 37 | " listy = []\n", 38 | " last_section = 0\n", 39 | " for i in range(0, len(list_unsorted), 3):\n", 40 | " if list_unsorted[i + 2] == category:\n", 41 | " listy.append(list_unsorted[i])\n", 42 | " listy.append(list_unsorted[i + 1])\n", 43 | " if not short:\n", 44 | " listy.append(list_unsorted[i + 2])\n", 45 | " last_section = i+2\n", 46 | " elif list_unsorted[i + 2] == category2:\n", 47 | " listx.append(list_unsorted[i])\n", 48 | " listx.append(list_unsorted[i + 1])\n", 49 | " if not short:\n", 50 | " listx.append(list_unsorted[i + 2])\n", 51 | " last_section = i + 2\n", 52 | " header_category = [' - ' + category + ' - ']\n", 53 | " header_category2 = [' - ' + category2 + ' - ']\n", 54 | " header_category3 = [' - ' + ' - ']\n", 55 | " return header_category + listy + header_category2 + listx + header_category3 + list_unsorted[last_section:]" 56 | ] 57 | }, 58 | { 59 | "cell_type": "code", 60 | "execution_count": null, 61 | "metadata": {}, 62 | "outputs": [], 63 | "source": [ 64 | "sortGroupList(movies, 'good', 'not bad')" 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": null, 70 | "metadata": {}, 71 | "outputs": [], 72 | "source": [ 73 | "movies = [\n", 74 | "1, \"Avatar\" 
,2009,\n", 75 | "2, \"Titanic\" ,1997,\n", 76 | "3, \"Star Wars: The Force Awakens\" ,2015,\n", 77 | "4, \"Jurassic World\" ,2015,\n", 78 | "5, \"The Avengers\" ,2012,\n", 79 | "6, \"Furious 7\" ,2015,\n", 80 | "7, \"Avengers: Age of Ultron\" ,2015,\n", 81 | "8, \"Harry Potter and the Deathly Hallows – Part 2\" ,2011,\n", 82 | "9, \"Frozen\" ,2013,\n", 83 | "\n", 84 | "\n", 85 | "\"The Birth of a Nation\" ,1915,\n", 86 | "\"The Birth of a Nation\" ,1940,\n", 87 | "\"Gone with the Wind\" ,1940,\n", 88 | "\"Gone with the Wind\" ,1963,\n", 89 | "\"The Sound of Music\" ,1966]" 90 | ] 91 | }, 92 | { 93 | "cell_type": "code", 94 | "execution_count": null, 95 | "metadata": {}, 96 | "outputs": [], 97 | "source": [ 98 | "print(len(movies))" 99 | ] 100 | }, 101 | { 102 | "cell_type": "code", 103 | "execution_count": null, 104 | "metadata": {}, 105 | "outputs": [], 106 | "source": [ 107 | "years = [str(x) for x in range(1997, 2015)]\n", 108 | "years" 109 | ] 110 | }, 111 | { 112 | "cell_type": "code", 113 | "execution_count": null, 114 | "metadata": {}, 115 | "outputs": [], 116 | "source": [ 117 | "def sortGroupList(list_unsorted):\n", 118 | " listx = []\n", 119 | " listy = []\n", 120 | " for i in range(0, len(list_unsorted), 3):\n", 121 | " if list_unsorted[i + 2] in years:\n", 122 | " listy.append(list_unsorted[i])\n", 123 | " listy.append(list_unsorted[i + 1])\n", 124 | " listy.append(list_unsorted[i + 2])\n", 125 | " else:\n", 126 | " listx.append(list_unsorted[i])\n", 127 | " listx.append(list_unsorted[i + 1])\n", 128 | " listx.append(list_unsorted[i + 2])\n", 129 | " for i in listy:\n", 130 | " print(i)\n", 131 | " for i in listx:\n", 132 | " print(i)" 133 | ] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "execution_count": null, 138 | "metadata": {}, 139 | "outputs": [], 140 | "source": [ 141 | "sortGroupList(movies)" 142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "execution_count": null, 147 | "metadata": {}, 148 | "outputs": [], 149 | "source": [ 150 | 
"\n", 151 | "movies = [\n", 152 | "1, \"Avatar\" ,'good',\n", 153 | "2, \"Titanic\" ,'not bad',\n", 154 | "3, \"Star Wars: The Force Awakens\" ,'good',\n", 155 | "4, \"Jurassic World\" ,'good',\n", 156 | "5, \"The Avengers\" ,'not bad',\n", 157 | "6, \"Furious 7\" ,'not bad',\n", 158 | "7, \"Avengers: Age of Ultron\" ,'good',\n", 159 | "8, \"Harry Potter and the Deathly Hallows – Part 2\" ,'not bad',\n", 160 | "9, \"Frozen\" ,'good',\n", 161 | "\n", 162 | "\n", 163 | "\"The Birth of a Nation\" ,1915,\n", 164 | "\"The Birth of a Nation\" ,1940,\n", 165 | "\"Gone with the Wind\" ,1940,\n", 166 | "\"Gone with the Wind\" ,1963,\n", 167 | "\"The Sound of Music\" ,1966]\n", 168 | "df = pd.DataFrame(movies)" 169 | ] 170 | }, 171 | { 172 | "cell_type": "code", 173 | "execution_count": null, 174 | "metadata": {}, 175 | "outputs": [], 176 | "source": [ 177 | "df" 178 | ] 179 | }, 180 | { 181 | "cell_type": "code", 182 | "execution_count": 2, 183 | "metadata": {}, 184 | "outputs": [], 185 | "source": [ 186 | "import pandas as pd\n", 187 | "types = []\n", 188 | "raw_list = []\n", 189 | "for e in movies:\n", 190 | " types.append(type(e))\n", 191 | " if isinstance(e, int):\n", 192 | " raw_list.append(1)\n", 193 | " else:\n", 194 | " raw_list.append(0)\n", 195 | "df1 = pd.DataFrame({'elem':movies, 'types':types}) " 196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": 3, 201 | "metadata": {}, 202 | "outputs": [], 203 | "source": [ 204 | "raw_list = [1,\n", 205 | " 0,\n", 206 | " 0,\n", 207 | " 1,\n", 208 | " 0,\n", 209 | " 0,\n", 210 | " 1,\n", 211 | " 0,\n", 212 | " 0,\n", 213 | " 1,\n", 214 | " 0,\n", 215 | " 0,\n", 216 | " 1,\n", 217 | " 0,\n", 218 | " 0,\n", 219 | " 1,\n", 220 | " 0,\n", 221 | " 0,\n", 222 | " 1,\n", 223 | " 0,\n", 224 | " 0,\n", 225 | " 1,\n", 226 | " 0,\n", 227 | " 0,\n", 228 | " 1,\n", 229 | " 0,\n", 230 | " 0,\n", 231 | " 0,\n", 232 | " 1,\n", 233 | " 0,\n", 234 | " 1,\n", 235 | " 0,\n", 236 | " 1,\n", 237 | " 0,\n", 238 | " 1,\n", 
239 | " 0,\n", 240 | " 1]\n", 241 | "movies = [\n", 242 | "1, \"Avatar\" ,'good',\n", 243 | "2, \"Titanic\" ,'not bad',\n", 244 | "3, \"Star Wars: The Force Awakens\" ,'good',\n", 245 | "4, \"Jurassic World\" ,'good',\n", 246 | "5, \"The Avengers\" ,'not bad',\n", 247 | "6, \"Furious 7\" ,'not bad',\n", 248 | "7, \"Avengers: Age of Ultron\" ,'good',\n", 249 | "8, \"Harry Potter and the Deathly Hallows – Part 2\" ,'not bad',\n", 250 | "9, \"Frozen\" ,'good',\n", 251 | "\n", 252 | "\n", 253 | "\"The Birth of a Nation\" ,1915,\n", 254 | "\"The Birth of a Nation\" ,1940,\n", 255 | "\"Gone with the Wind\" ,1940,\n", 256 | "\"Gone with the Wind\" ,1963,\n", 257 | "\"The Sound of Music\" ,1966]" 258 | ] 259 | }, 260 | { 261 | "cell_type": "code", 262 | "execution_count": 4, 263 | "metadata": {}, 264 | "outputs": [ 265 | { 266 | "name": "stdout", 267 | "output_type": "stream", 268 | "text": [ 269 | "[0, 1]\n", 270 | "[0, 1]\n", 271 | "[0, 1]\n", 272 | "[0, 1]\n", 273 | "[0, 1]\n" 274 | ] 275 | }, 276 | { 277 | "data": { 278 | "text/plain": [ 279 | "[[1, 'Avatar', 'good'],\n", 280 | " [2, 'Titanic', 'not bad'],\n", 281 | " [3, 'Star Wars: The Force Awakens', 'good'],\n", 282 | " [4, 'Jurassic World', 'good'],\n", 283 | " [5, 'The Avengers', 'not bad'],\n", 284 | " [6, 'Furious 7', 'not bad'],\n", 285 | " [7, 'Avengers: Age of Ultron', 'good'],\n", 286 | " [8, 'Harry Potter and the Deathly Hallows – Part 2', 'not bad'],\n", 287 | " [9, 'Frozen', 'good']]" 288 | ] 289 | }, 290 | "execution_count": 4, 291 | "metadata": {}, 292 | "output_type": "execute_result" 293 | } 294 | ], 295 | "source": [ 296 | "patern1 = [1, 0, 0]\n", 297 | "patern2 = [1, 0]\n", 298 | "\n", 299 | "len1 = len(patern1)\n", 300 | "len2 = len(patern2)\n", 301 | "\n", 302 | "output1 = []\n", 303 | "output2 = []\n", 304 | "\n", 305 | "while(raw_list):\n", 306 | " if raw_list[:len1] == patern1: \n", 307 | " output1.append(movies[:len1])\n", 308 | " raw_list = raw_list[len1:]\n", 309 | " movies = 
movies[len1:]\n", 310 | " else:\n", 311 | " print(raw_list[:len2])\n", 312 | " output2.append(movies[:len2])\n", 313 | " raw_list = raw_list[len2:]\n", 314 | " movies = movies[len2:]\n", 315 | " \n", 316 | "output1" 317 | ] 318 | }, 319 | { 320 | "cell_type": "code", 321 | "execution_count": 5, 322 | "metadata": {}, 323 | "outputs": [ 324 | { 325 | "data": { 326 | "text/plain": [ 327 | "[['The Birth of a Nation', 1915],\n", 328 | " ['The Birth of a Nation', 1940],\n", 329 | " ['Gone with the Wind', 1940],\n", 330 | " ['Gone with the Wind', 1963],\n", 331 | " ['The Sound of Music', 1966]]" 332 | ] 333 | }, 334 | "execution_count": 5, 335 | "metadata": {}, 336 | "output_type": "execute_result" 337 | } 338 | ], 339 | "source": [ 340 | "output2" 341 | ] 342 | }, 343 | { 344 | "cell_type": "code", 345 | "execution_count": 6, 346 | "metadata": {}, 347 | "outputs": [], 348 | "source": [ 349 | "\n", 350 | "new_list = sorted(output1, key=lambda x: x[2])" 351 | ] 352 | }, 353 | { 354 | "cell_type": "code", 355 | "execution_count": 7, 356 | "metadata": {}, 357 | "outputs": [ 358 | { 359 | "data": { 360 | "text/plain": [ 361 | "[[1, 'Avatar', 'good'],\n", 362 | " [3, 'Star Wars: The Force Awakens', 'good'],\n", 363 | " [4, 'Jurassic World', 'good'],\n", 364 | " [7, 'Avengers: Age of Ultron', 'good'],\n", 365 | " [9, 'Frozen', 'good'],\n", 366 | " [2, 'Titanic', 'not bad'],\n", 367 | " [5, 'The Avengers', 'not bad'],\n", 368 | " [6, 'Furious 7', 'not bad'],\n", 369 | " [8, 'Harry Potter and the Deathly Hallows – Part 2', 'not bad']]" 370 | ] 371 | }, 372 | "execution_count": 7, 373 | "metadata": {}, 374 | "output_type": "execute_result" 375 | } 376 | ], 377 | "source": [ 378 | "new_list" 379 | ] 380 | }, 381 | { 382 | "cell_type": "markdown", 383 | "metadata": {}, 384 | "source": [ 385 | "## Python make groups in a list" 386 | ] 387 | }, 388 | { 389 | "cell_type": "markdown", 390 | "metadata": {}, 391 | "source": [ 392 | "#### Simple grouping" 393 | ] 394 | }, 395 | { 396 | 
"cell_type": "code", 397 | "execution_count": null, 398 | "metadata": {}, 399 | "outputs": [], 400 | "source": [] 401 | } 402 | ], 403 | "metadata": { 404 | "kernelspec": { 405 | "display_name": "Python 3", 406 | "language": "python", 407 | "name": "python3" 408 | }, 409 | "language_info": { 410 | "codemirror_mode": { 411 | "name": "ipython", 412 | "version": 3 413 | }, 414 | "file_extension": ".py", 415 | "mimetype": "text/x-python", 416 | "name": "python", 417 | "nbconvert_exporter": "python", 418 | "pygments_lexer": "ipython3", 419 | "version": "3.6.7" 420 | } 421 | }, 422 | "nbformat": 4, 423 | "nbformat_minor": 1 424 | } 425 | -------------------------------------------------------------------------------- /notebooks/Python_group_or_sort_list_of_lists_by_common_element.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Python group or sort list of lists by common element\n", 8 | "\n", 9 | " * Grouping of lists of list by position\n", 10 | " * Grouping of lists of list by key\n", 11 | " * Sort and group flatten lists of lists\n", 12 | " * Grouping list of lists different sizes\n", 13 | " \n", 14 | " #### Bonus tips\n", 15 | " \n", 16 | " \n", 17 | " * Sort list of lists elements\n", 18 | " * sort maps by key or value\n", 19 | " * Iterating list over every two elements\n", 20 | " * Iterating list over every N elements" 21 | ] 22 | }, 23 | { 24 | "cell_type": "code", 25 | "execution_count": null, 26 | "metadata": {}, 27 | "outputs": [], 28 | "source": [ 29 | "# equaly sized list of lists \n", 30 | "[[\"Linux\", 0], [\"Windows 7\",1], [\"Ubuntu\",0], [\"Windows 10\",1], [\"MacOS\",2], [\"Linux Mint\",0]]\n", 31 | "\n", 32 | "# Different sized list of lists \n", 33 | "[[\"Linux\", 0, 22], [\"Windows 7\",1 , 5, 6], [\"Ubuntu\",0], [\"Linux Mint\"]]\n", 34 | "\n", 35 | "# flatten\n", 36 | "[\"Linux\", 0, \"Windows 7\",1, 
\"Ubuntu\",0, \"Windows 10\",1, \"MacOS\",2, \"Linux Mint\",0]" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "#### Grouping of lists of list by position (size 2)" 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": 1, 49 | "metadata": {}, 50 | "outputs": [ 51 | { 52 | "data": { 53 | "text/plain": [ 54 | "[['Linux', 'Ubuntu', 'Linux Mint'], ['Windows 7', 'Windows 10'], ['MacOS']]" 55 | ] 56 | }, 57 | "execution_count": 1, 58 | "metadata": {}, 59 | "output_type": "execute_result" 60 | } 61 | ], 62 | "source": [ 63 | "# equaly sized list of lists \n", 64 | "raw_list = [[\"Linux\", 0], [\"Windows 7\",1], [\"Ubuntu\",0], [\"Windows 10\",1], [\"MacOS\",2], [\"Linux Mint\",0]]\n", 65 | "\n", 66 | "keys = set(map(lambda x:x[1], raw_list))\n", 67 | "new_list = [[y[0] for y in raw_list if y[1]==x] for x in keys]\n", 68 | "\n", 69 | "new_list" 70 | ] 71 | }, 72 | { 73 | "cell_type": "code", 74 | "execution_count": 2, 75 | "metadata": {}, 76 | "outputs": [ 77 | { 78 | "data": { 79 | "text/plain": [ 80 | "{0: ['Linux', 'Ubuntu', 'Linux Mint'],\n", 81 | " 1: ['Windows 7', 'Windows 10'],\n", 82 | " 2: ['MacOS']}" 83 | ] 84 | }, 85 | "execution_count": 2, 86 | "metadata": {}, 87 | "output_type": "execute_result" 88 | } 89 | ], 90 | "source": [ 91 | "\n", 92 | "raw_list = [[\"Linux\", 0], [\"Windows 7\",1], [\"Ubuntu\",0], [\"Windows 10\",1], [\"MacOS\",2], [\"Linux Mint\",0]]\n", 93 | "\n", 94 | "keys = set(map(lambda x:x[1], raw_list))\n", 95 | "new_list = {x:[y[0] for y in raw_list if y[1]==x] for x in keys}\n", 96 | "\n", 97 | "new_list" 98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "metadata": {}, 103 | "source": [ 104 | "#### Grouping of lists of list by position (size 4)" 105 | ] 106 | }, 107 | { 108 | "cell_type": "code", 109 | "execution_count": 5, 110 | "metadata": {}, 111 | "outputs": [ 112 | { 113 | "data": { 114 | "text/plain": [ 115 | "{'Ubuntu': [['Xenial Xerus', 0.4], ['Bionic 
Beaver', 0]],\n", 116 | " 'Linux Mint': [['Rosa', 17.3], ['Sonya', 18.2]]}" 117 | ] 118 | }, 119 | "execution_count": 5, 120 | "metadata": {}, 121 | "output_type": "execute_result" 122 | } 123 | ], 124 | "source": [ 125 | "raw_list = [\n", 126 | " ['Linux Mint', 17, 'Rosa', 17.3], \n", 127 | " ['Linux Mint', 18, 'Sonya', 18.2],\n", 128 | " ['Ubuntu', 16, 'Xenial Xerus', 0.4],\n", 129 | " ['Ubuntu', 18, 'Bionic Beaver', 0]]\n", 130 | "\n", 131 | "keys = set(map(lambda x:x[0], raw_list))\n", 132 | "unsorted_map = {x:[y[2:] for y in raw_list if y[0]==x] for x in keys}\n", 133 | "\n", 134 | "unsorted_map" 135 | ] 136 | }, 137 | { 138 | "cell_type": "markdown", 139 | "metadata": {}, 140 | "source": [ 141 | "#### List of list different size" 142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "execution_count": 7, 147 | "metadata": {}, 148 | "outputs": [], 149 | "source": [ 150 | "raw_list = [\n", 151 | " ['Linux Mint', 17, 'Rosa', 17.3], \n", 152 | " ['Linux Mint', 18, 'Sonya', 18.2],\n", 153 | " ['Ubuntu', 16, 'Xenial Xerus', 0.4],\n", 154 | " ['Ubuntu', 18, 'Bionic Beaver', 0],\n", 155 | " \n", 156 | " ['Windows', 7, 'Home'],\n", 157 | " ['Windows', 7, 'Profesional'],\n", 158 | " ['Windows', 10, 'Ultimate']\n", 159 | "]" 160 | ] 161 | }, 162 | { 163 | "cell_type": "code", 164 | "execution_count": 9, 165 | "metadata": {}, 166 | "outputs": [ 167 | { 168 | "data": { 169 | "text/plain": [ 170 | "{'Ubuntu': [[16, 'Xenial Xerus'], [18, 'Bionic Beaver']],\n", 171 | " 'Linux Mint': [[17, 'Rosa'], [18, 'Sonya']],\n", 172 | " 'Windows': [[7, 'Home'], [7, 'Profesional'], [10, 'Ultimate']]}" 173 | ] 174 | }, 175 | "execution_count": 9, 176 | "metadata": {}, 177 | "output_type": "execute_result" 178 | } 179 | ], 180 | "source": [ 181 | "keys = set(map(lambda x:x[0], raw_list))\n", 182 | "unsorted_map = {x:[y[1:3] for y in raw_list if y[0]==x] for x in keys}\n", 183 | "unsorted_map" 184 | ] 185 | }, 186 | { 187 | "cell_type": "markdown", 188 | "metadata": {}, 189 | "source": 
[ 190 | "#### Sort python map by key or value" 191 | ] 192 | }, 193 | { 194 | "cell_type": "code", 195 | "execution_count": 10, 196 | "metadata": {}, 197 | "outputs": [ 198 | { 199 | "data": { 200 | "text/plain": [ 201 | "['Linux Mint', 'Ubuntu', 'Windows']" 202 | ] 203 | }, 204 | "execution_count": 10, 205 | "metadata": {}, 206 | "output_type": "execute_result" 207 | } 208 | ], 209 | "source": [ 210 | "sorted(unsorted_map.keys())" 211 | ] 212 | }, 213 | { 214 | "cell_type": "code", 215 | "execution_count": 11, 216 | "metadata": {}, 217 | "outputs": [ 218 | { 219 | "data": { 220 | "text/plain": [ 221 | "[[[7, 'Home'], [7, 'Profesional'], [10, 'Ultimate']],\n", 222 | " [[16, 'Xenial Xerus'], [18, 'Bionic Beaver']],\n", 223 | " [[17, 'Rosa'], [18, 'Sonya']]]" 224 | ] 225 | }, 226 | "execution_count": 11, 227 | "metadata": {}, 228 | "output_type": "execute_result" 229 | } 230 | ], 231 | "source": [ 232 | "sorted(unsorted_map.values())" 233 | ] 234 | }, 235 | { 236 | "cell_type": "markdown", 237 | "metadata": {}, 238 | "source": [ 239 | "### Sort list of lists by key" 240 | ] 241 | }, 242 | { 243 | "cell_type": "code", 244 | "execution_count": 12, 245 | "metadata": {}, 246 | "outputs": [ 247 | { 248 | "name": "stdout", 249 | "output_type": "stream", 250 | "text": [ 251 | "Linux Mint: [[17, 'Rosa'], [18, 'Sonya']]\n", 252 | "Ubuntu: [[16, 'Xenial Xerus'], [18, 'Bionic Beaver']]\n", 253 | "Windows: [[7, 'Home'], [7, 'Profesional'], [10, 'Ultimate']]\n" 254 | ] 255 | } 256 | ], 257 | "source": [ 258 | "for key in sorted(unsorted_map.keys()):\n", 259 | " print (\"%s: %s\" % (key, unsorted_map[key]))" 260 | ] 261 | }, 262 | { 263 | "cell_type": "code", 264 | "execution_count": 13, 265 | "metadata": {}, 266 | "outputs": [ 267 | { 268 | "name": "stdout", 269 | "output_type": "stream", 270 | "text": [ 271 | "Windows: [[7, 'Home'], [7, 'Profesional'], [10, 'Ultimate']]\n", 272 | "Ubuntu: [[16, 'Xenial Xerus'], [18, 'Bionic Beaver']]\n", 273 | "Linux Mint: [[17, 'Rosa'], [18, 
'Sonya']]\n" 274 | ] 275 | } 276 | ], 277 | "source": [ 278 | "for key in sorted(unsorted_map.keys(), reverse=True):\n", 279 | " print (\"%s: %s\" % (key, unsorted_map[key]))" 280 | ] 281 | }, 282 | { 283 | "cell_type": "markdown", 284 | "metadata": {}, 285 | "source": [ 286 | "### Sort and group flatten lists of lists" 287 | ] 288 | }, 289 | { 290 | "cell_type": "code", 291 | "execution_count": 17, 292 | "metadata": {}, 293 | "outputs": [], 294 | "source": [ 295 | "os_list = [\n", 296 | " 'Ubuntu 18',\n", 297 | " 'This article informs you about Ubuntu 18.04 release date,',\n", 298 | " 'Released',\n", 299 | " \n", 300 | " 'Ubuntu 20',\n", 301 | " 'The desktop image allows you to try Ubuntu without changing y..',\n", 302 | " 'Not Released',\n", 303 | " \n", 304 | " 'Ubuntu 19',\n", 305 | " 'Ubuntu is an open source software operating system that runs from',\n", 306 | " 'Released',\n", 307 | " \n", 308 | " 'Linux mint 18',\n", 309 | " 'Linux Mint is an elegant, easy to use, up to date and comfortable',\n", 310 | " 'Released',\n", 311 | " \n", 312 | " 'Linux mint 20',\n", 313 | " 'Suggestion: For Mint 20 to go full Debian',\n", 314 | " 'Not Released',\n", 315 | " \n", 316 | " 'Linux mint 19',\n", 317 | " 'Linux Mint 19 is a long term support release which will be supported until 2023',\n", 318 | " 'Released',\n", 319 | "\n", 320 | " 'Windows 7',\n", 321 | " 'Windows 7 is a personal computer operating system that was ..',\n", 322 | " 'Windows 10',\n", 323 | " 'Windows 10 is a series of personal computer operating systems',\n", 324 | " \"Windows XP\",\n", 325 | " 'Windows XP is old, and Microsoft no longer provides official support']" 326 | ] 327 | }, 328 | { 329 | "cell_type": "code", 330 | "execution_count": 14, 331 | "metadata": {}, 332 | "outputs": [ 333 | { 334 | "name": "stdout", 335 | "output_type": "stream", 336 | "text": [ 337 | "[1, 2]\n", 338 | "[3, 4]\n", 339 | "[5, 6]\n" 340 | ] 341 | } 342 | ], 343 | "source": [ 344 | "# iterating over every two 
elements\n", 345 | "test_list = [1, 2, 3, 4, 5, 6]\n", 346 | "\n", 347 | "for i in range(0, len(test_list), 2):\n", 348 | " print (test_list[i:i+2])" 349 | ] 350 | }, 351 | { 352 | "cell_type": "code", 353 | "execution_count": 15, 354 | "metadata": {}, 355 | "outputs": [ 356 | { 357 | "name": "stdout", 358 | "output_type": "stream", 359 | "text": [ 360 | "[0, 1, 2]\n", 361 | "[3, 4, 5]\n", 362 | "[6, 7, 8]\n", 363 | "[9]\n" 364 | ] 365 | } 366 | ], 367 | "source": [ 368 | "# iterating over every N elements\n", 369 | "test_list = list(range(0, 10))\n", 370 | "\n", 371 | "for i in range(0, len(test_list), 3):\n", 372 | " print (test_list[i:i+3])" 373 | ] 374 | }, 375 | { 376 | "cell_type": "code", 377 | "execution_count": 18, 378 | "metadata": {}, 379 | "outputs": [ 380 | { 381 | "data": { 382 | "text/plain": [ 383 | "['Windows 7',\n", 384 | " 'Windows 7 is a personal computer operating system that was ..',\n", 385 | " 'Windows 10',\n", 386 | " 'Windows 10 is a series of personal computer operating systems',\n", 387 | " 'Windows XP',\n", 388 | " 'Windows XP is old, and Microsoft no longer provides official support']" 389 | ] 390 | }, 391 | "execution_count": 18, 392 | "metadata": {}, 393 | "output_type": "execute_result" 394 | } 395 | ], 396 | "source": [ 397 | "list3 = []\n", 398 | "last = 0\n", 399 | "for i in range(0, len(os_list), 3):\n", 400 | " if i+2 < len(os_list) and os_list[i+2] in ['Released', 'Not Released']:\n", 401 | " list3.append(os_list[i:i+3])\n", 402 | " last = i+3\n", 403 | "list2 = os_list[last:]\n", 404 | "list2" 405 | ] 406 | }, 407 | { 408 | "cell_type": "code", 409 | "execution_count": 19, 410 | "metadata": {}, 411 | "outputs": [ 412 | { 413 | "data": { 414 | "text/plain": [ 415 | "[['Ubuntu 18',\n", 416 | " 'This article informs you about Ubuntu 18.04 release date,',\n", 417 | " 'Released'],\n", 418 | " ['Ubuntu 20',\n", 419 | " 'The desktop image allows you to try Ubuntu without changing y..',\n", 420 | " 'Not Released'],\n", 421 | " 
['Ubuntu 19',\n", 422 | " 'Ubuntu is an open source software operating system that runs from',\n", 423 | " 'Released'],\n", 424 | " ['Linux mint 18',\n", 425 | " 'Linux Mint is an elegant, easy to use, up to date and comfortable',\n", 426 | " 'Released'],\n", 427 | " ['Linux mint 20',\n", 428 | " 'Suggestion: For Mint 20 to go full Debian',\n", 429 | " 'Not Released'],\n", 430 | " ['Linux mint 19',\n", 431 | " 'Linux Mint 19 is a long term support release which will be supported until 2023',\n", 432 | " 'Released']]" 433 | ] 434 | }, 435 | "execution_count": 19, 436 | "metadata": {}, 437 | "output_type": "execute_result" 438 | } 439 | ], 440 | "source": [ 441 | "list3" 442 | ] 443 | }, 444 | { 445 | "cell_type": "code", 446 | "execution_count": 20, 447 | "metadata": {}, 448 | "outputs": [], 449 | "source": [ 450 | "def sortList(working_list, category, category2):\n", 451 | " listx = []\n", 452 | " listy = []\n", 453 | " last_section = 0\n", 454 | " for i in range(0, len(working_list) - 3, 3):\n", 455 | " if working_list[i + 2] == category:\n", 456 | " listy.append(working_list[i])\n", 457 | " listy.append(working_list[i + 1])\n", 458 | " last_section = i + 2\n", 459 | " elif working_list[i + 2] == category2:\n", 460 | " listx.append(working_list[i])\n", 461 | " listx.append(working_list[i + 1])\n", 462 | " last_section = i + 2\n", 463 | "\n", 464 | " if last_section > 0:\n", 465 | " listz = working_list[(last_section + 1):]\n", 466 | " else:\n", 467 | " listz = working_list[(last_section):]\n", 468 | "\n", 469 | " return listx, listy, listz" 470 | ] 471 | }, 472 | { 473 | "cell_type": "code", 474 | "execution_count": 21, 475 | "metadata": {}, 476 | "outputs": [ 477 | { 478 | "data": { 479 | "text/plain": [ 480 | "['Ubuntu 20',\n", 481 | " 'The desktop image allows you to try Ubuntu without changing y..',\n", 482 | " 'Linux mint 20',\n", 483 | " 'Suggestion: For Mint 20 to go full Debian']" 484 | ] 485 | }, 486 | "execution_count": 21, 487 | "metadata": {}, 488 | 
"output_type": "execute_result" 489 | } 490 | ], 491 | "source": [ 492 | "listx, listy, listz = sortList(os_list, 'Released', 'Not Released')\n", 493 | "listx" 494 | ] 495 | }, 496 | { 497 | "cell_type": "code", 498 | "execution_count": 22, 499 | "metadata": {}, 500 | "outputs": [ 501 | { 502 | "data": { 503 | "text/plain": [ 504 | "['Ubuntu 18',\n", 505 | " 'This article informs you about Ubuntu 18.04 release date,',\n", 506 | " 'Ubuntu 19',\n", 507 | " 'Ubuntu is an open source software operating system that runs from',\n", 508 | " 'Linux mint 18',\n", 509 | " 'Linux Mint is an elegant, easy to use, up to date and comfortable',\n", 510 | " 'Linux mint 19',\n", 511 | " 'Linux Mint 19 is a long term support release which will be supported until 2023']" 512 | ] 513 | }, 514 | "execution_count": 22, 515 | "metadata": {}, 516 | "output_type": "execute_result" 517 | } 518 | ], 519 | "source": [ 520 | "listy" 521 | ] 522 | }, 523 | { 524 | "cell_type": "code", 525 | "execution_count": 23, 526 | "metadata": {}, 527 | "outputs": [ 528 | { 529 | "data": { 530 | "text/plain": [ 531 | "['Windows 7',\n", 532 | " 'Windows 7 is a personal computer operating system that was ..',\n", 533 | " 'Windows 10',\n", 534 | " 'Windows 10 is a series of personal computer operating systems',\n", 535 | " 'Windows XP',\n", 536 | " 'Windows XP is old, and Microsoft no longer provides official support']" 537 | ] 538 | }, 539 | "execution_count": 23, 540 | "metadata": {}, 541 | "output_type": "execute_result" 542 | } 543 | ], 544 | "source": [ 545 | "listz" 546 | ] 547 | }, 548 | { 549 | "cell_type": "markdown", 550 | "metadata": {}, 551 | "source": [ 552 | "## Generic solution for flatten list" 553 | ] 554 | }, 555 | { 556 | "cell_type": "code", 557 | "execution_count": 24, 558 | "metadata": {}, 559 | "outputs": [ 560 | { 561 | "data": { 562 | "text/plain": [ 563 | "[['Ubuntu 18',\n", 564 | " 'This article informs you about Ubuntu 18.04 release date,',\n", 565 | " 'Released'],\n", 566 | " 
['Ubuntu 20',\n", 567 | " 'The desktop image allows you to try Ubuntu without changing y..',\n", 568 | " 'Not Released'],\n", 569 | " ['Ubuntu 19',\n", 570 | " 'Ubuntu is an open source software operating system that runs from',\n", 571 | " 'Released'],\n", 572 | " ['Linux mint 18',\n", 573 | " 'Linux Mint is an elegant, easy to use, up to date and comfortable',\n", 574 | " 'Released'],\n", 575 | " ['Linux mint 20',\n", 576 | " 'Suggestion: For Mint 20 to go full Debian',\n", 577 | " 'Not Released'],\n", 578 | " ['Linux mint 19',\n", 579 | " 'Linux Mint 19 is a long term support release which will be supported until 2023',\n", 580 | " 'Released']]" 581 | ] 582 | }, 583 | "execution_count": 24, 584 | "metadata": {}, 585 | "output_type": "execute_result" 586 | } 587 | ], 588 | "source": [ 589 | "os_list = [\n", 590 | " \n", 591 | " \n", 592 | " 'Windows 10',\n", 593 | " 'Windows 10 is a series of personal computer operating systems',\n", 594 | " \"Windows XP\",\n", 595 | " 'Windows XP is old, and Microsoft no longer provides official support',\n", 596 | " \n", 597 | " 'Ubuntu 18',\n", 598 | " 'This article informs you about Ubuntu 18.04 release date,',\n", 599 | " 'Released',\n", 600 | "\n", 601 | " 'Ubuntu 20',\n", 602 | " 'The desktop image allows you to try Ubuntu without changing y..',\n", 603 | " 'Not Released',\n", 604 | "\n", 605 | " 'Windows 7',\n", 606 | " 'Windows 7 is a personal computer operating system that was ..',\n", 607 | "\n", 608 | " 'Ubuntu 19',\n", 609 | " 'Ubuntu is an open source software operating system that runs from',\n", 610 | " 'Released',\n", 611 | "\n", 612 | " 'Linux mint 18',\n", 613 | " 'Linux Mint is an elegant, easy to use, up to date and comfortable',\n", 614 | " 'Released',\n", 615 | "\n", 616 | " 'Linux mint 20',\n", 617 | " 'Suggestion: For Mint 20 to go full Debian',\n", 618 | " 'Not Released',\n", 619 | "\n", 620 | " 'Linux mint 19',\n", 621 | " 'Linux Mint 19 is a long term support release which will be supported until 
2023',\n", 622 | " 'Released',\n", 623 | "\n", 624 | "]\n", 625 | "\n", 626 | "list3 = []\n", 627 | "list2 = []\n", 628 | "cur = 0\n", 629 | "\n", 630 | "os_list_tmp = os_list\n", 631 | "\n", 632 | "while os_list_tmp:\n", 633 | " cur = 0\n", 634 | " if cur+2 < len(os_list_tmp) and os_list_tmp[cur+2] in ['Released', 'Not Released']:\n", 635 | " list3.append(os_list_tmp[cur:cur+3])\n", 636 | " cur = cur + 3\n", 637 | " else:\n", 638 | " list2.append(os_list_tmp[cur:cur+2])\n", 639 | " cur = cur + 2\n", 640 | " os_list_tmp = os_list_tmp[cur:]\n", 641 | "list3" 642 | ] 643 | }, 644 | { 645 | "cell_type": "code", 646 | "execution_count": 25, 647 | "metadata": {}, 648 | "outputs": [ 649 | { 650 | "data": { 651 | "text/plain": [ 652 | "[['Windows 10',\n", 653 | " 'Windows 10 is a series of personal computer operating systems'],\n", 654 | " ['Windows XP',\n", 655 | " 'Windows XP is old, and Microsoft no longer provides official support'],\n", 656 | " ['Windows 7',\n", 657 | " 'Windows 7 is a personal computer operating system that was ..']]" 658 | ] 659 | }, 660 | "execution_count": 25, 661 | "metadata": {}, 662 | "output_type": "execute_result" 663 | } 664 | ], 665 | "source": [ 666 | "list2" 667 | ] 668 | }, 669 | { 670 | "cell_type": "code", 671 | "execution_count": null, 672 | "metadata": {}, 673 | "outputs": [], 674 | "source": [] 675 | }, 676 | { 677 | "cell_type": "code", 678 | "execution_count": null, 679 | "metadata": {}, 680 | "outputs": [], 681 | "source": [] 682 | } 683 | ], 684 | "metadata": { 685 | "kernelspec": { 686 | "display_name": "Python 3", 687 | "language": "python", 688 | "name": "python3" 689 | }, 690 | "language_info": { 691 | "codemirror_mode": { 692 | "name": "ipython", 693 | "version": 3 694 | }, 695 | "file_extension": ".py", 696 | "mimetype": "text/x-python", 697 | "name": "python", 698 | "nbconvert_exporter": "python", 699 | "pygments_lexer": "ipython3", 700 | "version": "3.6.7" 701 | } 702 | }, 703 | "nbformat": 4, 704 | 
"nbformat_minor": 2 705 | } 706 | -------------------------------------------------------------------------------- /notebooks/Q&A/Questions_and_Answers_1_Improve_OCR_and_tabula_range.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Questions and Answers 2 Improve OCR and tabula range" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## Question 1\n", 15 | "\n", 16 | "#### Extract tabular data from PDF with Python - Tabula, Camelot, PyPDF2\n", 17 | "\n", 18 | "https://youtu.be/702lkQbZx50\n", 19 | "\n", 20 | "![Question 1](../images/Selection_177.png)\n", 21 | "\n" 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": 1, 27 | "metadata": {}, 28 | "outputs": [ 29 | { 30 | "data": { 31 | "text/plain": [ 32 | "(29, 4)" 33 | ] 34 | }, 35 | "execution_count": 1, 36 | "metadata": {}, 37 | "output_type": "execute_result" 38 | } 39 | ], 40 | "source": [ 41 | "from tabula import read_pdf\n", 42 | "df = read_pdf(\"http://www.uncledavesenterprise.com/file/health/Food%20Calories%20List.pdf\", pages=3)\n", 43 | "df.shape" 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": 2, 49 | "metadata": {}, 50 | "outputs": [ 51 | { 52 | "data": { 53 | "text/plain": [ 54 | "(69, 5)" 55 | ] 56 | }, 57 | "execution_count": 2, 58 | "metadata": {}, 59 | "output_type": "execute_result" 60 | } 61 | ], 62 | "source": [ 63 | "# specify page range 1 to 3 page\n", 64 | "df = read_pdf(\"http://www.uncledavesenterprise.com/file/health/Food%20Calories%20List.pdf\", pages='1-3')\n", 65 | "df.shape" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": 3, 71 | "metadata": {}, 72 | "outputs": [ 73 | { 74 | "data": { 75 | "text/html": [ 76 | "
\n", 77 | "\n", 90 | "\n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | "
| | BREADS & CEREALS | Portion size * | per 100 grams (3.5 oz) | Unnamed: 3 | energy content |
| 0 | Bagel ( 1 average ) | 140 cals (45g) | 310 cals | NaN | Medium |
| 1 | Biscuit digestives | 86 cals (per biscuit) | 480 cals | NaN | High |
| 2 | Jaffa cake | 48 cals (per biscuit) | 370 cals | NaN | Med-High |
| 3 | Bread white (thick slice) | 96 cals (1 slice 40g) | 240 cals | NaN | Medium |
| 4 | Bread wholemeal (thick) | 88 cals (1 slice 40g) | 220 cals | NaN | Low-med |
\n", 144 | "
" 145 | ], 146 | "text/plain": [ 147 | " BREADS & CEREALS Portion size * per 100 grams (3.5 oz) \\\n", 148 | "0 Bagel ( 1 average ) 140 cals (45g) 310 cals \n", 149 | "1 Biscuit digestives 86 cals (per biscuit) 480 cals \n", 150 | "2 Jaffa cake 48 cals (per biscuit) 370 cals \n", 151 | "3 Bread white (thick slice) 96 cals (1 slice 40g) 240 cals \n", 152 | "4 Bread wholemeal (thick) 88 cals (1 slice 40g) 220 cals \n", 153 | "\n", 154 | " Unnamed: 3 energy content \n", 155 | "0 NaN Medium \n", 156 | "1 NaN High \n", 157 | "2 NaN Med-High \n", 158 | "3 NaN Medium \n", 159 | "4 NaN Low-med " 160 | ] 161 | }, 162 | "execution_count": 3, 163 | "metadata": {}, 164 | "output_type": "execute_result" 165 | } 166 | ], 167 | "source": [ 168 | "df.head()" 169 | ] 170 | }, 171 | { 172 | "cell_type": "code", 173 | "execution_count": 4, 174 | "metadata": {}, 175 | "outputs": [ 176 | { 177 | "data": { 178 | "text/html": [ 179 | "
\n", 180 | "\n", 193 | "\n", 194 | " \n", 195 | " \n", 196 | " \n", 197 | " \n", 198 | " \n", 199 | " \n", 200 | " \n", 201 | " \n", 202 | " \n", 203 | " \n", 204 | " \n", 205 | " \n", 206 | " \n", 207 | " \n", 208 | " \n", 209 | " \n", 210 | " \n", 211 | " \n", 212 | " \n", 213 | " \n", 214 | " \n", 215 | " \n", 216 | " \n", 217 | " \n", 218 | " \n", 219 | " \n", 220 | " \n", 221 | " \n", 222 | " \n", 223 | " \n", 224 | " \n", 225 | " \n", 226 | " \n", 227 | " \n", 228 | " \n", 229 | " \n", 230 | " \n", 231 | " \n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | "
| | BREADS & CEREALS | Portion size * | per 100 grams (3.5 oz) | Unnamed: 3 | energy content |
| 64 | Sausage pork fried | 250 cals | 320 cals | High | NaN |
| 65 | Sausage pork grilled | 220 cals | 280 cals | Med-High | NaN |
| 66 | Sausage roll | 290 cals | 480 cals | High | NaN |
| 67 | Scampi fried in oil | 400 cals | 340 cals | High | NaN |
| 68 | Steak & kidney pie | 400 cals | 350 cals | High | NaN |
\n", 247 | "
" 248 | ], 249 | "text/plain": [ 250 | " BREADS & CEREALS Portion size * per 100 grams (3.5 oz) Unnamed: 3 \\\n", 251 | "64 Sausage pork fried 250 cals 320 cals High \n", 252 | "65 Sausage pork grilled 220 cals 280 cals Med-High \n", 253 | "66 Sausage roll 290 cals 480 cals High \n", 254 | "67 Scampi fried in oil 400 cals 340 cals High \n", 255 | "68 Steak & kidney pie 400 cals 350 cals High \n", 256 | "\n", 257 | " energy content \n", 258 | "64 NaN \n", 259 | "65 NaN \n", 260 | "66 NaN \n", 261 | "67 NaN \n", 262 | "68 NaN " 263 | ] 264 | }, 265 | "execution_count": 4, 266 | "metadata": {}, 267 | "output_type": "execute_result" 268 | } 269 | ], 270 | "source": [ 271 | "df.tail()" 272 | ] 273 | }, 274 | { 275 | "cell_type": "code", 276 | "execution_count": 5, 277 | "metadata": {}, 278 | "outputs": [ 279 | { 280 | "data": { 281 | "text/plain": [ 282 | "(69, 5)" 283 | ] 284 | }, 285 | "execution_count": 5, 286 | "metadata": {}, 287 | "output_type": "execute_result" 288 | } 289 | ], 290 | "source": [ 291 | "# create page range 1 to 3 page\n", 292 | "pages=(str(1)+'-'+str(3))\n", 293 | "df = read_pdf(\"http://www.uncledavesenterprise.com/file/health/Food%20Calories%20List.pdf\", pages=pages)\n", 294 | "df.shape" 295 | ] 296 | }, 297 | { 298 | "cell_type": "code", 299 | "execution_count": 6, 300 | "metadata": {}, 301 | "outputs": [ 302 | { 303 | "data": { 304 | "text/plain": [ 305 | "(69, 5)" 306 | ] 307 | }, 308 | "execution_count": 6, 309 | "metadata": {}, 310 | "output_type": "execute_result" 311 | } 312 | ], 313 | "source": [ 314 | "# list all possible pages\n", 315 | "df = read_pdf(\"http://www.uncledavesenterprise.com/file/health/Food%20Calories%20List.pdf\", pages=[1,2,3])\n", 316 | "df.shape" 317 | ] 318 | }, 319 | { 320 | "cell_type": "code", 321 | "execution_count": 7, 322 | "metadata": {}, 323 | "outputs": [ 324 | { 325 | "data": { 326 | "text/plain": [ 327 | "(69, 5)" 328 | ] 329 | }, 330 | "execution_count": 7, 331 | "metadata": {}, 332 | "output_type": 
"execute_result" 333 | } 334 | ], 335 | "source": [ 336 | "# list all possible pages using range\n", 337 | "pages = list(range(1, 4))\n", 338 | "df = read_pdf(\"http://www.uncledavesenterprise.com/file/health/Food%20Calories%20List.pdf\", pages=pages)\n", 339 | "df.shape" 340 | ] 341 | }, 342 | { 343 | "cell_type": "markdown", 344 | "metadata": {}, 345 | "source": [ 346 | "## Question 2\n", 347 | "\n", 348 | "#### python extract text from image or pdf\n", 349 | "\n", 350 | "https://youtu.be/PK-GvWWQ03g\n", 351 | "\n", 352 | "![Question ](../images/Selection_178.png)\n", 353 | "\n", 354 | "python extract text from image or pdf\n", 355 | "\n", 356 | "https://blog.softhints.com/python-extract-text-from-image-or-pdf/\n", 357 | "\n", 358 | "Improve OCR Accuracy With Advanced Image Preprocessing\n", 359 | "\n", 360 | "https://docparser.com/blog/improve-ocr-accuracy/" 361 | ] 362 | }, 363 | { 364 | "cell_type": "markdown", 365 | "metadata": {}, 366 | "source": [ 367 | "![Question ](../images/Selection_174.png)\n" 368 | ] 369 | }, 370 | { 371 | "cell_type": "code", 372 | "execution_count": 8, 373 | "metadata": {}, 374 | "outputs": [], 375 | "source": [ 376 | "from PIL import Image\n", 377 | "import pytesseract" 378 | ] 379 | }, 380 | { 381 | "cell_type": "code", 382 | "execution_count": 9, 383 | "metadata": {}, 384 | "outputs": [ 385 | { 386 | "name": "stdout", 387 | "output_type": "stream", 388 | "text": [ 389 | "Java\n", 390 | "\n", 391 | "Python\n", 392 | "\n", 393 | "public class JavaPyramid1 {\n", 394 | "public static void main(String[] args) {\n", 395 | "for(int i=1; i<=5; i++) {\n", 396 | "for(int j=0; j, line 1)", 70 | "traceback": [ 71 | "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m 2 *** 2\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n" 72 | ], 73 | "output_type": "error" 74 | } 75 | ], 76 | "source": [ 77 | "2 *** 2" 78 | ] 79 | }, 80 | { 81 | "cell_type": 
"code", 82 | "execution_count": 4, 83 | "metadata": {}, 84 | "outputs": [ 85 | { 86 | "data": { 87 | "text/plain": [ 88 | "'aaaaa'" 89 | ] 90 | }, 91 | "execution_count": 4, 92 | "metadata": {}, 93 | "output_type": "execute_result" 94 | } 95 | ], 96 | "source": [ 97 | "'a' * 5" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": 5, 103 | "metadata": {}, 104 | "outputs": [ 105 | { 106 | "data": { 107 | "text/plain": [ 108 | "'ffffff'" 109 | ] 110 | }, 111 | "execution_count": 5, 112 | "metadata": {}, 113 | "output_type": "execute_result" 114 | } 115 | ], 116 | "source": [ 117 | "'fff' * 2" 118 | ] 119 | }, 120 | { 121 | "cell_type": "markdown", 122 | "metadata": {}, 123 | "source": [ 124 | "## Extending collections" 125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": 6, 130 | "metadata": {}, 131 | "outputs": [ 132 | { 133 | "data": { 134 | "text/plain": [ 135 | "[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]" 136 | ] 137 | }, 138 | "execution_count": 6, 139 | "metadata": {}, 140 | "output_type": "execute_result" 141 | } 142 | ], 143 | "source": [ 144 | "[0] * 20 " 145 | ] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "execution_count": 7, 150 | "metadata": {}, 151 | "outputs": [ 152 | { 153 | "data": { 154 | "text/plain": [ 155 | "[0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]" 156 | ] 157 | }, 158 | "execution_count": 7, 159 | "metadata": {}, 160 | "output_type": "execute_result" 161 | } 162 | ], 163 | "source": [ 164 | "[0, 1 , 2] * 5" 165 | ] 166 | }, 167 | { 168 | "cell_type": "code", 169 | "execution_count": 8, 170 | "metadata": {}, 171 | "outputs": [ 172 | { 173 | "data": { 174 | "text/plain": [ 175 | "[[0, 1, 2], [3], [0, 1, 2], [3]]" 176 | ] 177 | }, 178 | "execution_count": 8, 179 | "metadata": {}, 180 | "output_type": "execute_result" 181 | } 182 | ], 183 | "source": [ 184 | "[[0, 1 , 2], [3]] * 2" 185 | ] 186 | }, 187 | { 188 | "cell_type": "markdown", 189 | "metadata": {}, 190 | "source": [ 
191 | "## Unpacking" 192 | ] 193 | }, 194 | { 195 | "cell_type": "code", 196 | "execution_count": 9, 197 | "metadata": {}, 198 | "outputs": [ 199 | { 200 | "data": { 201 | "text/plain": [ 202 | "[1, 3, 5, 7, 9]" 203 | ] 204 | }, 205 | "execution_count": 9, 206 | "metadata": {}, 207 | "output_type": "execute_result" 208 | } 209 | ], 210 | "source": [ 211 | "odds = [1, 3, 5, 7, 9]\n", 212 | "*x, = odds\n", 213 | "x" 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "execution_count": 10, 219 | "metadata": {}, 220 | "outputs": [ 221 | { 222 | "data": { 223 | "text/plain": [ 224 | "[1, 3, 5, 7]" 225 | ] 226 | }, 227 | "execution_count": 10, 228 | "metadata": {}, 229 | "output_type": "execute_result" 230 | } 231 | ], 232 | "source": [ 233 | "*x,y = odds\n", 234 | "x" 235 | ] 236 | }, 237 | { 238 | "cell_type": "code", 239 | "execution_count": 11, 240 | "metadata": {}, 241 | "outputs": [ 242 | { 243 | "data": { 244 | "text/plain": [ 245 | "9" 246 | ] 247 | }, 248 | "execution_count": 11, 249 | "metadata": {}, 250 | "output_type": "execute_result" 251 | } 252 | ], 253 | "source": [ 254 | "y" 255 | ] 256 | }, 257 | { 258 | "cell_type": "code", 259 | "execution_count": 12, 260 | "metadata": {}, 261 | "outputs": [], 262 | "source": [ 263 | "x, *y, z = odds" 264 | ] 265 | }, 266 | { 267 | "cell_type": "code", 268 | "execution_count": 13, 269 | "metadata": {}, 270 | "outputs": [ 271 | { 272 | "data": { 273 | "text/plain": [ 274 | "[3, 5, 7]" 275 | ] 276 | }, 277 | "execution_count": 13, 278 | "metadata": {}, 279 | "output_type": "execute_result" 280 | } 281 | ], 282 | "source": [ 283 | "y" 284 | ] 285 | }, 286 | { 287 | "cell_type": "code", 288 | "execution_count": 14, 289 | "metadata": {}, 290 | "outputs": [ 291 | { 292 | "name": "stdout", 293 | "output_type": "stream", 294 | "text": [ 295 | "(1, 3, 5, 7, 9)\n", 296 | "([1, 3, 5, 7, 9],)\n" 297 | ] 298 | } 299 | ], 300 | "source": [ 301 | "odds = [1, 3, 5, 7, 9]\n", 302 | "\n", 303 | "def sum_all(*numbers):\n", 304 | 
" print(numbers)\n", 305 | "\n", 306 | "sum_all(*odds)\n", 307 | "\n", 308 | "sum_all(odds)" 309 | ] 310 | }, 311 | { 312 | "cell_type": "markdown", 313 | "metadata": {}, 314 | "source": [ 315 | "## positional arguments and keyword arguments" 316 | ] 317 | }, 318 | { 319 | "cell_type": "code", 320 | "execution_count": 15, 321 | "metadata": {}, 322 | "outputs": [ 323 | { 324 | "name": "stdout", 325 | "output_type": "stream", 326 | "text": [ 327 | "('x', 'y', 'z', 'w', 'v')\n" 328 | ] 329 | } 330 | ], 331 | "source": [ 332 | "def print_all(*args):\n", 333 | " print(args) \n", 334 | "print_all('x', 'y', 'z', 'w', 'v')" 335 | ] 336 | }, 337 | { 338 | "cell_type": "code", 339 | "execution_count": 16, 340 | "metadata": {}, 341 | "outputs": [ 342 | { 343 | "name": "stdout", 344 | "output_type": "stream", 345 | "text": [ 346 | "{'x': 'x', 'y': 'y', 'z': 'z', 'w': 'w', 'v': 'v'}\n" 347 | ] 348 | } 349 | ], 350 | "source": [ 351 | "def print_all(**kwargs):\n", 352 | " print(kwargs)\n", 353 | "print_all(x='x', y='y', z='z', w='w', v='v')" 354 | ] 355 | }, 356 | { 357 | "cell_type": "code", 358 | "execution_count": null, 359 | "metadata": {}, 360 | "outputs": [], 361 | "source": [] 362 | } 363 | ], 364 | "metadata": { 365 | "kernelspec": { 366 | "display_name": "Python 3", 367 | "language": "python", 368 | "name": "python3" 369 | }, 370 | "language_info": { 371 | "codemirror_mode": { 372 | "name": "ipython", 373 | "version": 3 374 | }, 375 | "file_extension": ".py", 376 | "mimetype": "text/x-python", 377 | "name": "python", 378 | "nbconvert_exporter": "python", 379 | "pygments_lexer": "ipython3", 380 | "version": "3.6.7" 381 | } 382 | }, 383 | "nbformat": 4, 384 | "nbformat_minor": 2 385 | } 386 | -------------------------------------------------------------------------------- /notebooks/csv/data.csv.zip: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/softhints/python/a256a054d74ca397f41874b3e26f1c4b84214432/notebooks/csv/data.csv.zip -------------------------------------------------------------------------------- /notebooks/csv/data_201901.csv: -------------------------------------------------------------------------------- 1 | col1,col2,col3 2 | A,B,1 3 | AA,BB,2 -------------------------------------------------------------------------------- /notebooks/csv/data_201902.csv: -------------------------------------------------------------------------------- 1 | col1,col2,col3 2 | C,D,3 3 | CC,DD,4 -------------------------------------------------------------------------------- /notebooks/csv/data_202001.csv: -------------------------------------------------------------------------------- 1 | col1,col2,col3,col4 2 | E,F,5,e5 3 | EE,FF,6,ee6 -------------------------------------------------------------------------------- /notebooks/csv/data_202002.csv: -------------------------------------------------------------------------------- 1 | col1,col2,col3,col5 2 | H,J,7,77 3 | HH,JJ,8,88 -------------------------------------------------------------------------------- /notebooks/csv/excel/example.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/softhints/python/a256a054d74ca397f41874b3e26f1c4b84214432/notebooks/csv/excel/example.xlsx -------------------------------------------------------------------------------- /notebooks/pandas/Pandas_Select_rows_between_two_dates_-_DataFrame_or_CSV_file.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Pandas : Select rows between two dates - DataFrame or CSV file" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## Resources\n", 15 | "\n", 16 | "* 
[pandas.to_datetime](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html)\n", 17 | "* [pandas.DataFrame.between_time](https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.DataFrame.between_time.html)\n", 18 | "* [pandas.DataFrame.loc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html)" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "## Use cases\n", 26 | "\n", 27 | "* Pandas: Verify columns containing dates\n", 28 | "* Convert string to datetime in DataFrame\n", 29 | "* Select rows between two dates\n", 30 | " * 1. Select rows based on dates with loc\n", 31 | " * 2. Series method between\n", 32 | " * 3. Select rows between two times\n", 33 | " * 4. Select rows based on dates without loc\n", 34 | " * 5. Use mask to mark the records\n", 35 | " * 6. Select records from last month/30 days " 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": {}, 41 | "source": [ 42 | "## Step 1: Import Pandas and read data" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": 1, 48 | "metadata": {}, 49 | "outputs": [ 50 | { 51 | "data": { 52 | "text/html": [ 53 | "
\n", 54 | "\n", 67 | "\n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | "
   loading_datetime     pages      title                                             datetime_col
0  2019-10-28 19:56:03  main       <GET https://www.wikipedia.org/> (The Free En...  2019-10-29 9:06:03
1  2019-10-29 19:56:03  english    <GET https://en.wikipedia.org/wiki/Main_Page>...  2019-10-31 11:16:43
2  2019-10-29 19:56:03  italiano   <GET https://it.wikipedia.org/wiki/Pagina_pri...  2019-10-30 21:15:23
3  2019-10-30 19:56:03  português  <GET https://pt.wikipedia.org/wiki/Wikip%C3%A...  2019-10-30 20:26:35
\n", 108 | "
" 109 | ], 110 | "text/plain": [ 111 | " loading_datetime pages \\\n", 112 | "0 2019-10-28 19:56:03 main \n", 113 | "1 2019-10-29 19:56:03 english \n", 114 | "2 2019-10-29 19:56:03 italiano \n", 115 | "3 2019-10-30 19:56:03 português \n", 116 | "\n", 117 | " title datetime_col \n", 118 | "0 (The Free En... 2019-10-29 9:06:03 \n", 119 | "1 ... 2019-10-31 11:16:43 \n", 120 | "2 \n", 320 | "\n", 333 | "\n", 334 | " \n", 335 | " \n", 336 | " \n", 337 | " \n", 338 | " \n", 339 | " \n", 340 | " \n", 341 | " \n", 342 | " \n", 343 | " \n", 344 | " \n", 345 | " \n", 346 | " \n", 347 | " \n", 348 | " \n", 349 | " \n", 350 | " \n", 351 | " \n", 352 | " \n", 353 | " \n", 354 | " \n", 355 | " \n", 356 | " \n", 357 | " \n", 358 | " \n", 359 | "
   loading_datetime     pages     title                                             datetime_col
1  2019-10-29 19:56:03  english   <GET https://en.wikipedia.org/wiki/Main_Page>...  2019-10-31 11:16:43+00:00
2  2019-10-29 19:56:03  italiano  <GET https://it.wikipedia.org/wiki/Pagina_pri...  2019-10-30 21:15:23+00:00
\n", 360 | "" 361 | ], 362 | "text/plain": [ 363 | " loading_datetime pages \\\n", 364 | "1 2019-10-29 19:56:03 english \n", 365 | "2 2019-10-29 19:56:03 italiano \n", 366 | "\n", 367 | " title datetime_col \n", 368 | "1 ... 2019-10-31 11:16:43+00:00 \n", 369 | "2 start_date) & (df['datetime_col'] < end_date)]" 382 | ] 383 | }, 384 | { 385 | "cell_type": "markdown", 386 | "metadata": {}, 387 | "source": [ 388 | "#### 2. Series method between" 389 | ] 390 | }, 391 | { 392 | "cell_type": "code", 393 | "execution_count": null, 394 | "metadata": {}, 395 | "outputs": [], 396 | "source": [ 397 | "start_date = pd.to_datetime('2019-10-30 20:41', utc= True)\n", 398 | "end_date = pd.to_datetime('5/13/2020 8:55', utc= True)\n", 399 | "\n", 400 | "df[df.datetime_col.between(start_date, end_date)]" 401 | ] 402 | }, 403 | { 404 | "cell_type": "markdown", 405 | "metadata": {}, 406 | "source": [ 407 | "#### 3. Select rows between two times" 408 | ] 409 | }, 410 | { 411 | "cell_type": "code", 412 | "execution_count": 11, 413 | "metadata": {}, 414 | "outputs": [ 415 | { 416 | "data": { 417 | "text/html": [ 418 | "
\n", 419 | "\n", 432 | "\n", 433 | " \n", 434 | " \n", 435 | " \n", 436 | " \n", 437 | " \n", 438 | " \n", 439 | " \n", 440 | " \n", 441 | " \n", 442 | " \n", 443 | " \n", 444 | " \n", 445 | " \n", 446 | " \n", 447 | " \n", 448 | " \n", 449 | " \n", 450 | " \n", 451 | " \n", 452 | " \n", 453 | " \n", 454 | " \n", 455 | "
                           loading_datetime     pages     title
datetime_col
2019-10-30 21:15:23+00:00  2019-10-29 19:56:03  italiano  <GET https://it.wikipedia.org/wiki/Pagina_pri...
\n", 456 | "
" 457 | ], 458 | "text/plain": [ 459 | " loading_datetime pages \\\n", 460 | "datetime_col \n", 461 | "2019-10-30 21:15:23+00:00 2019-10-29 19:56:03 italiano \n", 462 | "\n", 463 | " title \n", 464 | "datetime_col \n", 465 | "2019-10-30 21:15:23+00:00 '2018-12-02') & (df['datetime_col'] <= '2018-12-03 23:26:10+00:00')]" 493 | ] 494 | }, 495 | { 496 | "cell_type": "markdown", 497 | "metadata": {}, 498 | "source": [ 499 | "#### 6. Select records from last month/30 days " 500 | ] 501 | }, 502 | { 503 | "cell_type": "code", 504 | "execution_count": 12, 505 | "metadata": {}, 506 | "outputs": [ 507 | { 508 | "data": { 509 | "text/html": [ 510 | "
\n", 511 | "\n", 524 | "\n", 525 | " \n", 526 | " \n", 527 | " \n", 528 | " \n", 529 | " \n", 530 | " \n", 531 | " \n", 532 | " \n", 533 | " \n", 534 | " \n", 535 | " \n", 536 | " \n", 537 | " \n", 538 | " \n", 539 | " \n", 540 | " \n", 541 | " \n", 542 | " \n", 543 | "
   loading_datetime     pages    title                                             datetime_col
1  2019-10-29 19:56:03  english  <GET https://en.wikipedia.org/wiki/Main_Page>...  2019-10-31 11:16:43+00:00
\n", 544 | "
" 545 | ], 546 | "text/plain": [ 547 | " loading_datetime pages \\\n", 548 | "1 2019-10-29 19:56:03 english \n", 549 | "\n", 550 | " title datetime_col \n", 551 | "1 ... 2019-10-31 11:16:43+00:00 " 552 | ] 553 | }, 554 | "execution_count": 12, 555 | "metadata": {}, 556 | "output_type": "execute_result" 557 | } 558 | ], 559 | "source": [ 560 | "df[df[\"datetime_col\"] >= (pd.to_datetime('11/30/2019', utc=True) - pd.Timedelta(days=30))]" 561 | ] 562 | }, 563 | { 564 | "cell_type": "code", 565 | "execution_count": null, 566 | "metadata": {}, 567 | "outputs": [], 568 | "source": [] 569 | } 570 | ], 571 | "metadata": { 572 | "kernelspec": { 573 | "display_name": "Python 3", 574 | "language": "python", 575 | "name": "python3" 576 | }, 577 | "language_info": { 578 | "codemirror_mode": { 579 | "name": "ipython", 580 | "version": 3 581 | }, 582 | "file_extension": ".py", 583 | "mimetype": "text/x-python", 584 | "name": "python", 585 | "nbconvert_exporter": "python", 586 | "pygments_lexer": "ipython3", 587 | "version": "3.6.9" 588 | } 589 | }, 590 | "nbformat": 4, 591 | "nbformat_minor": 2 592 | } 593 | -------------------------------------------------------------------------------- /notebooks/python/Files/How_to_merge_multiple_CSV_files_with_Python.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# How to merge multiple CSV files with Python\n", 8 | "Python convert normal JSON to JSON separated lines 3 examples\n", 9 | "\n", 10 | "* Steps to merge multiple CSV(identical) files with Python\n", 11 | "* Steps to merge multiple CSV(identical) files with Python with trace\n", 12 | "* Combine multiple CSV files when the columns are different\n", 13 | "* Bonus: Merge multiple files with Windows/Linux" 14 | ] 15 | }, 16 | { 17 | "cell_type": "code", 18 | "execution_count": 2, 19 | "metadata": {}, 20 | "outputs": [ 21 | { 22 | "data": { 23 | 
"text/plain": [ 24 | "['../../csv/data_202001.csv',\n", 25 | " '../../csv/data_202002.csv',\n", 26 | " '../../csv/data_201902.csv',\n", 27 | " '../../csv/data_201901.csv']" 28 | ] 29 | }, 30 | "metadata": {}, 31 | "output_type": "display_data" 32 | }, 33 | { 34 | "data": { 35 | "text/html": [ 36 | "
\n", 37 | "\n", 50 | "\n", 51 | " \n", 52 | " \n", 53 | " \n", 54 | " \n", 55 | " \n", 56 | " \n", 57 | " \n", 58 | " \n", 59 | " \n", 60 | " \n", 61 | " \n", 62 | " \n", 63 | " \n", 64 | " \n", 65 | " \n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | "
   col1  col2  col3  col4
0  E     F     5     e5
1  EE    FF    6     ee6
\n", 77 | "
" 78 | ], 79 | "text/plain": [ 80 | " col1 col2 col3 col4\n", 81 | "0 E F 5 e5\n", 82 | "1 EE FF 6 ee6" 83 | ] 84 | }, 85 | "metadata": {}, 86 | "output_type": "display_data" 87 | }, 88 | { 89 | "data": { 90 | "text/html": [ 91 | "
\n", 92 | "\n", 105 | "\n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | "
   col1  col2  col3  col5
0  H     J     7     77
1  HH    JJ    8     88
\n", 132 | "
" 133 | ], 134 | "text/plain": [ 135 | " col1 col2 col3 col5\n", 136 | "0 H J 7 77\n", 137 | "1 HH JJ 8 88" 138 | ] 139 | }, 140 | "metadata": {}, 141 | "output_type": "display_data" 142 | }, 143 | { 144 | "data": { 145 | "text/html": [ 146 | "
\n", 147 | "\n", 160 | "\n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | "
   col1  col2  col3
0  C     D     3
1  CC    DD    4
\n", 184 | "
" 185 | ], 186 | "text/plain": [ 187 | " col1 col2 col3\n", 188 | "0 C D 3\n", 189 | "1 CC DD 4" 190 | ] 191 | }, 192 | "metadata": {}, 193 | "output_type": "display_data" 194 | }, 195 | { 196 | "data": { 197 | "text/html": [ 198 | "
\n", 199 | "\n", 212 | "\n", 213 | " \n", 214 | " \n", 215 | " \n", 216 | " \n", 217 | " \n", 218 | " \n", 219 | " \n", 220 | " \n", 221 | " \n", 222 | " \n", 223 | " \n", 224 | " \n", 225 | " \n", 226 | " \n", 227 | " \n", 228 | " \n", 229 | " \n", 230 | " \n", 231 | " \n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | "
   col1  col2  col3
0  A     B     1
1  AA    BB    2
\n", 236 | "
" 237 | ], 238 | "text/plain": [ 239 | " col1 col2 col3\n", 240 | "0 A B 1\n", 241 | "1 AA BB 2" 242 | ] 243 | }, 244 | "metadata": {}, 245 | "output_type": "display_data" 246 | } 247 | ], 248 | "source": [ 249 | "all_files = glob.glob(os.path.join(path, \"data_*.csv\"))\n", 250 | "display(all_files)\n", 251 | "for f in all_files:\n", 252 | " display(pd.read_csv(f, sep=','))" 253 | ] 254 | }, 255 | { 256 | "cell_type": "markdown", 257 | "metadata": {}, 258 | "source": [ 259 | "## 1. Steps to merge multiple CSV(identical) files with Python" 260 | ] 261 | }, 262 | { 263 | "cell_type": "code", 264 | "execution_count": 3, 265 | "metadata": {}, 266 | "outputs": [], 267 | "source": [ 268 | "import os, glob\n", 269 | "import pandas as pd\n", 270 | "\n", 271 | "path = \"../../csv/\"\n", 272 | "#path = \"/home/user/data\"\n", 273 | "\n", 274 | "all_files = glob.glob(os.path.join(path, \"data_2019*.csv\"))\n", 275 | "\n", 276 | "all_csv = (pd.read_csv(f, sep=',') for f in all_files)\n", 277 | "df_merged = pd.concat(all_csv, ignore_index=True)\n", 278 | "df_merged.to_csv( \"merged.csv\")" 279 | ] 280 | }, 281 | { 282 | "cell_type": "code", 283 | "execution_count": 4, 284 | "metadata": {}, 285 | "outputs": [ 286 | { 287 | "data": { 288 | "text/html": [ 289 | "
\n", 290 | "\n", 303 | "\n", 304 | " \n", 305 | " \n", 306 | " \n", 307 | " \n", 308 | " \n", 309 | " \n", 310 | " \n", 311 | " \n", 312 | " \n", 313 | " \n", 314 | " \n", 315 | " \n", 316 | " \n", 317 | " \n", 318 | " \n", 319 | " \n", 320 | " \n", 321 | " \n", 322 | " \n", 323 | " \n", 324 | " \n", 325 | " \n", 326 | " \n", 327 | " \n", 328 | " \n", 329 | " \n", 330 | " \n", 331 | " \n", 332 | " \n", 333 | " \n", 334 | " \n", 335 | " \n", 336 | " \n", 337 | " \n", 338 | " \n", 339 | " \n", 340 | " \n", 341 | " \n", 342 | " \n", 343 | "
   Unnamed: 0  col1  col2  col3
0  0           C     D     3
1  1           CC    DD    4
2  2           A     B     1
3  3           AA    BB    2
\n", 344 | "
" 345 | ], 346 | "text/plain": [ 347 | " Unnamed: 0 col1 col2 col3\n", 348 | "0 0 C D 3\n", 349 | "1 1 CC DD 4\n", 350 | "2 2 A B 1\n", 351 | "3 3 AA BB 2" 352 | ] 353 | }, 354 | "execution_count": 4, 355 | "metadata": {}, 356 | "output_type": "execute_result" 357 | } 358 | ], 359 | "source": [ 360 | "pd.read_csv('merged.csv')" 361 | ] 362 | }, 363 | { 364 | "cell_type": "markdown", 365 | "metadata": {}, 366 | "source": [ 367 | "## 2. Steps to merge multiple CSV(identical) files with Python with trace" 368 | ] 369 | }, 370 | { 371 | "cell_type": "code", 372 | "execution_count": 5, 373 | "metadata": {}, 374 | "outputs": [ 375 | { 376 | "data": { 377 | "text/html": [ 378 | "
\n", 379 | "\n", 392 | "\n", 393 | " \n", 394 | " \n", 395 | " \n", 396 | " \n", 397 | " \n", 398 | " \n", 399 | " \n", 400 | " \n", 401 | " \n", 402 | " \n", 403 | " \n", 404 | " \n", 405 | " \n", 406 | " \n", 407 | " \n", 408 | " \n", 409 | " \n", 410 | " \n", 411 | " \n", 412 | " \n", 413 | " \n", 414 | " \n", 415 | " \n", 416 | " \n", 417 | " \n", 418 | " \n", 419 | " \n", 420 | " \n", 421 | " \n", 422 | " \n", 423 | " \n", 424 | " \n", 425 | " \n", 426 | " \n", 427 | " \n", 428 | " \n", 429 | " \n", 430 | " \n", 431 | " \n", 432 | "
   col1  col2  col3  file
0  C     D     3     data_201902.csv
1  CC    DD    4     data_201902.csv
2  A     B     1     data_201901.csv
3  AA    BB    2     data_201901.csv
\n", 433 | "
" 434 | ], 435 | "text/plain": [ 436 | " col1 col2 col3 file\n", 437 | "0 C D 3 data_201902.csv\n", 438 | "1 CC DD 4 data_201902.csv\n", 439 | "2 A B 1 data_201901.csv\n", 440 | "3 AA BB 2 data_201901.csv" 441 | ] 442 | }, 443 | "execution_count": 5, 444 | "metadata": {}, 445 | "output_type": "execute_result" 446 | } 447 | ], 448 | "source": [ 449 | "import os, glob\n", 450 | "import pandas as pd\n", 451 | "\n", 452 | "path = \"../../csv/\"\n", 453 | "\n", 454 | "all_files = glob.glob(os.path.join(path, \"data_2019*.csv\"))\n", 455 | "\n", 456 | "all_df = []\n", 457 | "for f in all_files:\n", 458 | " df = pd.read_csv(f, sep=',')\n", 459 | " df['file'] = f.split('/')[-1]\n", 460 | " all_df.append(df)\n", 461 | " \n", 462 | "merged_df = pd.concat(all_df, ignore_index=True)\n", 463 | "merged_df" 464 | ] 465 | }, 466 | { 467 | "cell_type": "markdown", 468 | "metadata": {}, 469 | "source": [ 470 | "## 3. Combine multiple CSV files when the columns are different" 471 | ] 472 | }, 473 | { 474 | "cell_type": "code", 475 | "execution_count": 6, 476 | "metadata": {}, 477 | "outputs": [ 478 | { 479 | "data": { 480 | "text/html": [ 481 | "
\n", 482 | "\n", 495 | "\n", 496 | " \n", 497 | " \n", 498 | " \n", 499 | " \n", 500 | " \n", 501 | " \n", 502 | " \n", 503 | " \n", 504 | " \n", 505 | " \n", 506 | " \n", 507 | " \n", 508 | " \n", 509 | " \n", 510 | " \n", 511 | " \n", 512 | " \n", 513 | " \n", 514 | " \n", 515 | " \n", 516 | " \n", 517 | " \n", 518 | " \n", 519 | " \n", 520 | " \n", 521 | " \n", 522 | " \n", 523 | " \n", 524 | " \n", 525 | " \n", 526 | " \n", 527 | " \n", 528 | " \n", 529 | " \n", 530 | " \n", 531 | " \n", 532 | " \n", 533 | " \n", 534 | " \n", 535 | " \n", 536 | " \n", 537 | " \n", 538 | " \n", 539 | " \n", 540 | " \n", 541 | " \n", 542 | " \n", 543 | " \n", 544 | " \n", 545 | " \n", 546 | " \n", 547 | " \n", 548 | " \n", 549 | " \n", 550 | " \n", 551 | " \n", 552 | " \n", 553 | " \n", 554 | " \n", 555 | " \n", 556 | " \n", 557 | " \n", 558 | " \n", 559 | " \n", 560 | " \n", 561 | " \n", 562 | " \n", 563 | " \n", 564 | " \n", 565 | " \n", 566 | " \n", 567 | " \n", 568 | " \n", 569 | " \n", 570 | " \n", 571 | " \n", 572 | " \n", 573 | " \n", 574 | " \n", 575 | " \n", 576 | " \n", 577 | " \n", 578 | " \n", 579 | " \n", 580 | " \n", 581 | "
   col1  col2  col3  col4  col5  file
0  E     F     5     e5    NaN   data_202001.csv
1  EE    FF    6     ee6   NaN   data_202001.csv
2  H     J     7     NaN   77.0  data_202002.csv
3  HH    JJ    8     NaN   88.0  data_202002.csv
4  C     D     3     NaN   NaN   data_201902.csv
5  CC    DD    4     NaN   NaN   data_201902.csv
6  A     B     1     NaN   NaN   data_201901.csv
7  AA    BB    2     NaN   NaN   data_201901.csv
\n", 582 | "
" 583 | ], 584 | "text/plain": [ 585 | " col1 col2 col3 col4 col5 file\n", 586 | "0 E F 5 e5 NaN data_202001.csv\n", 587 | "1 EE FF 6 ee6 NaN data_202001.csv\n", 588 | "2 H J 7 NaN 77.0 data_202002.csv\n", 589 | "3 HH JJ 8 NaN 88.0 data_202002.csv\n", 590 | "4 C D 3 NaN NaN data_201902.csv\n", 591 | "5 CC DD 4 NaN NaN data_201902.csv\n", 592 | "6 A B 1 NaN NaN data_201901.csv\n", 593 | "7 AA BB 2 NaN NaN data_201901.csv" 594 | ] 595 | }, 596 | "execution_count": 6, 597 | "metadata": {}, 598 | "output_type": "execute_result" 599 | } 600 | ], 601 | "source": [ 602 | "import os, glob\n", 603 | "import pandas as pd\n", 604 | "\n", 605 | "path = \"../../csv/\"\n", 606 | "\n", 607 | "all_files = glob.glob(os.path.join(path, \"data_*.csv\"))\n", 608 | "\n", 609 | "\n", 610 | "all_df = []\n", 611 | "for f in all_files:\n", 612 | " df = pd.read_csv(f, sep=',')\n", 613 | " df['file'] = f.split('/')[-1]\n", 614 | " all_df.append(df)\n", 615 | " \n", 616 | "merged_df = pd.concat(all_df, ignore_index=True, sort=True)\n", 617 | "merged_df" 618 | ] 619 | }, 620 | { 621 | "cell_type": "markdown", 622 | "metadata": {}, 623 | "source": [ 624 | "## 4. 
Bonus: Merge multiple files with Windows/Linux\n", 625 | "\n", 626 | "Linux (keeps the header row of the first file only)\n", 627 | "\n", 628 | "`awk 'FNR==1 && NR!=1 {next} {print}' data_*.csv > merged.csv`\n", 629 | "\n", 630 | "Windows\n", 631 | "\n", 632 | "`C:\\> copy data_*.csv merged.csv`" 633 | ] 634 | }, 635 | { 636 | "cell_type": "code", 637 | "execution_count": null, 638 | "metadata": {}, 639 | "outputs": [], 640 | "source": [] 641 | } 642 | ], 643 | "metadata": { 644 | "kernelspec": { 645 | "display_name": "Python 3", 646 | "language": "python", 647 | "name": "python3" 648 | }, 649 | "language_info": { 650 | "codemirror_mode": { 651 | "name": "ipython", 652 | "version": 3 653 | }, 654 | "file_extension": ".py", 655 | "mimetype": "text/x-python", 656 | "name": "python", 657 | "nbconvert_exporter": "python", 658 | "pygments_lexer": "ipython3", 659 | "version": "3.6.9" 660 | } 661 | }, 662 | "nbformat": 4, 663 | "nbformat_minor": 2 664 | } 665 | -------------------------------------------------------------------------------- /notebooks/python/JSON/41._Create_a_table_in_MySQL_Database_from_python_dictionary.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 41. 
Create a table in SQL(MySQL Database) from python dictionary\n", 8 | "\n", 9 | "\n", 10 | "[Python convert normal JSON to JSON separated lines 3 examples](https://blog.softhints.com/python-convert-json-to-json-lines/)\n", 11 | "\n", 12 | "* Pandas DataFrame to MySQL\n", 13 | "* Create table from Python Dict\n", 14 | "* connect MySQL database and Python\n", 15 | " * SQLAlchemy\n", 16 | " * PyMySQL" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": {}, 22 | "source": [ 23 | "Python dict which is converted to a Database Table\n", 24 | "\n", 25 | "```json\n", 26 | "{\"id\":1,\"label\":\"A\",\"size\":\"S\"}\n", 27 | "{\"id\":2,\"label\":\"B\",\"size\":\"XL\"}\n", 28 | "{\"id\":3,\"label\":\"C\",\"size\":\"XXl\"}\n", 29 | "```" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": {}, 35 | "source": [ 36 | "## Step 1: Read/Create a Python dict" 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": 1, 42 | "metadata": {}, 43 | "outputs": [ 44 | { 45 | "data": { 46 | "text/html": [ 47 | "
\n", 48 | "\n", 61 | "\n", 62 | " \n", 63 | " \n", 64 | " \n", 65 | " \n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | "
   id  label  size
0  1   A      S
1  2   B      XL
2  3   C      XXl
\n", 91 | "
" 92 | ], 93 | "text/plain": [ 94 | " id label size\n", 95 | "0 1 A S\n", 96 | "1 2 B XL\n", 97 | "2 3 C XXl" 98 | ] 99 | }, 100 | "execution_count": 1, 101 | "metadata": {}, 102 | "output_type": "execute_result" 103 | } 104 | ], 105 | "source": [ 106 | "import pandas as pd\n", 107 | "\n", 108 | "# read normal JSON with pandas\n", 109 | "df = pd.read_json('/home/vanx/Downloads/old/normal_json.json')\n", 110 | "\n", 111 | "df.head()" 112 | ] 113 | }, 114 | { 115 | "cell_type": "code", 116 | "execution_count": 4, 117 | "metadata": {}, 118 | "outputs": [ 119 | { 120 | "data": { 121 | "text/plain": [ 122 | "{'id': {0: 1, 1: 2, 2: 3},\n", 123 | " 'label': {0: 'A', 1: 'B', 2: 'C'},\n", 124 | " 'size': {0: 'S', 1: 'XL', 2: 'XXl'}}" 125 | ] 126 | }, 127 | "execution_count": 4, 128 | "metadata": {}, 129 | "output_type": "execute_result" 130 | } 131 | ], 132 | "source": [ 133 | "data_dict = df.to_dict()\n", 134 | "data_dict" 135 | ] 136 | }, 137 | { 138 | "cell_type": "code", 139 | "execution_count": 5, 140 | "metadata": {}, 141 | "outputs": [ 142 | { 143 | "data": { 144 | "text/html": [ 145 | "
\n", 146 | "\n", 159 | "\n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | "
   id  label  size
0  1   A      S
1  2   B      XL
2  3   C      XXl
\n", 189 | "
" 190 | ], 191 | "text/plain": [ 192 | " id label size\n", 193 | "0 1 A S\n", 194 | "1 2 B XL\n", 195 | "2 3 C XXl" 196 | ] 197 | }, 198 | "execution_count": 5, 199 | "metadata": {}, 200 | "output_type": "execute_result" 201 | } 202 | ], 203 | "source": [ 204 | "df2 = pd.DataFrame.from_dict(data_dict)\n", 205 | "df2.head()" 206 | ] 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "metadata": {}, 211 | "source": [ 212 | "## Step 2: Pandas DataFrame to MySQL table with SQLAlchemy" 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": 6, 218 | "metadata": {}, 219 | "outputs": [], 220 | "source": [ 221 | "# connect\n", 222 | "from sqlalchemy import create_engine\n", 223 | "cnx = create_engine('mysql+pymysql://test:pass@localhost/test') " 224 | ] 225 | }, 226 | { 227 | "cell_type": "code", 228 | "execution_count": 7, 229 | "metadata": {}, 230 | "outputs": [], 231 | "source": [ 232 | "# create table from DataFrame\n", 233 | "df.to_sql('test', cnx, if_exists='replace', index = False)" 234 | ] 235 | }, 236 | { 237 | "cell_type": "code", 238 | "execution_count": 8, 239 | "metadata": {}, 240 | "outputs": [ 241 | { 242 | "data": { 243 | "text/html": [ 244 | "
\n", 245 | "\n", 258 | "\n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | " \n", 272 | " \n", 273 | " \n", 274 | " \n", 275 | " \n", 276 | " \n", 277 | " \n", 278 | " \n", 279 | " \n", 280 | " \n", 281 | " \n", 282 | " \n", 283 | " \n", 284 | " \n", 285 | " \n", 286 | " \n", 287 | "
   id  label  size
0  1   A      S
1  2   B      XL
2  3   C      XXl
\n", 288 | "
" 289 | ], 290 | "text/plain": [ 291 | " id label size\n", 292 | "0 1 A S\n", 293 | "1 2 B XL\n", 294 | "2 3 C XXl" 295 | ] 296 | }, 297 | "execution_count": 8, 298 | "metadata": {}, 299 | "output_type": "execute_result" 300 | } 301 | ], 302 | "source": [ 303 | "# query table\n", 304 | "df = pd.read_sql('SELECT * FROM test', cnx)\n", 305 | "df.head()" 306 | ] 307 | }, 308 | { 309 | "cell_type": "markdown", 310 | "metadata": {}, 311 | "source": [ 312 | "## Step 3: Python Dict Insert Records Into a MySQL Database" 313 | ] 314 | }, 315 | { 316 | "cell_type": "code", 317 | "execution_count": 18, 318 | "metadata": {}, 319 | "outputs": [], 320 | "source": [ 321 | "# connect\n", 322 | "import pymysql\n", 323 | "\n", 324 | "connection = pymysql.connect(host='localhost',\n", 325 | " user='test',\n", 326 | " password='pass',\n", 327 | " db='test')\n", 328 | "cursor = connection.cursor()" 329 | ] 330 | }, 331 | { 332 | "cell_type": "code", 333 | "execution_count": 19, 334 | "metadata": {}, 335 | "outputs": [ 336 | { 337 | "data": { 338 | "text/plain": [ 339 | "0" 340 | ] 341 | }, 342 | "execution_count": 19, 343 | "metadata": {}, 344 | "output_type": "execute_result" 345 | } 346 | ], 347 | "source": [ 348 | "# Create table\n", 349 | "cols = df.columns\n", 350 | "table_name = 'test'\n", 351 | "ddl = \"\"\n", 352 | "for col in cols:\n", 353 | " ddl += \"`{}` text,\".format(col)\n", 354 | "\n", 355 | "sql_create = \"CREATE TABLE IF NOT EXISTS `{}` ({}) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;\".format(table_name, ddl[:-1])\n", 356 | "cursor.execute(sql_create)" 357 | ] 358 | }, 359 | { 360 | "cell_type": "code", 361 | "execution_count": 20, 362 | "metadata": {}, 363 | "outputs": [], 364 | "source": [ 365 | "# insert data\n", 366 | "cols = \"`,`\".join([str(i) for i in df.columns.tolist()])\n", 367 | "\n", 368 | "# insert dict records .\n", 369 | "for i,row in df.iterrows():\n", 370 | " sql = \"INSERT INTO `test` (`\" +cols + \"`) VALUES (\" + \"%s,\"*(len(row)-1) + 
\"%s)\"\n", 371 | " cursor.execute(sql, tuple(row))\n", 372 | " connection.commit()" 373 | ] 374 | }, 375 | { 376 | "cell_type": "code", 377 | "execution_count": 21, 378 | "metadata": {}, 379 | "outputs": [ 380 | { 381 | "name": "stdout", 382 | "output_type": "stream", 383 | "text": [ 384 | "('1', 'A', 'S')\n", 385 | "('2', 'B', 'XL')\n", 386 | "('3', 'C', 'XXl')\n" 387 | ] 388 | } 389 | ], 390 | "source": [ 391 | "# read\n", 392 | "sql = \"SELECT * FROM test\"\n", 393 | "cursor.execute(sql)\n", 394 | "result = cursor.fetchall()\n", 395 | "for i in result:\n", 396 | " print(i)" 397 | ] 398 | } 399 | ], 400 | "metadata": { 401 | "kernelspec": { 402 | "display_name": "Python 3", 403 | "language": "python", 404 | "name": "python3" 405 | }, 406 | "language_info": { 407 | "codemirror_mode": { 408 | "name": "ipython", 409 | "version": 3 410 | }, 411 | "file_extension": ".py", 412 | "mimetype": "text/x-python", 413 | "name": "python", 414 | "nbconvert_exporter": "python", 415 | "pygments_lexer": "ipython3", 416 | "version": "3.6.9" 417 | } 418 | }, 419 | "nbformat": 4, 420 | "nbformat_minor": 2 421 | } 422 | -------------------------------------------------------------------------------- /notebooks/python/JSON/42._Convert_MySQL_table_to_Pandas_DataFrame_Python_dictionary.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 42. 
Convert MySQL table to Pandas DataFrame (Python dictionary)\n", 8 | "\n", 9 | "\n", 10 | "[How to Convert MySQL Table to Pandas DataFrame / Python Dictionary](https://blog.softhints.com/convert-mysql-table-pandas-dataframe-python-dictionary/)\n", 11 | "\n", 12 | "* [PyMySQL](https://pypi.org/project/PyMySQL/) + [SQLAlchemy](https://pypi.org/project/SQLAlchemy/) - the shortest and easiest way to convert a MySQL table to a Python dict\n", 13 | "* [mysql.connector](https://pypi.org/project/mysql-connector-python/)\n", 14 | "* [pyodbc](https://pypi.org/project/pyodbc/) - used to connect to a MySQL database, read a table and convert it to a DataFrame or Python dict." 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "![](https://blog.softhints.com/content/images/2020/11/MySQL_table_to_Pandas_DataFrame_to_Python_dict.png)" 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": 7, 27 | "metadata": {}, 28 | "outputs": [], 29 | "source": [ 30 | "password = ''" 31 | ] 32 | }, 33 | { 34 | "cell_type": "markdown", 35 | "metadata": {}, 36 | "source": [ 37 | "## 1: Convert MySQL Table to DataFrame with PyMySQL + SQLAlchemy" 38 | ] 39 | }, 40 | { 41 | "cell_type": "code", 42 | "execution_count": 2, 43 | "metadata": {}, 44 | "outputs": [ 45 | { 46 | "data": { 47 | "text/plain": [ 48 | "{'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},\n", 49 | " 'name': {0: 'Emma', 1: 'Ann', 2: 'Kim', 3: 'Olivia', 4: 'Victoria'}}" 50 | ] 51 | }, 52 | "execution_count": 2, 53 | "metadata": {}, 54 | "output_type": "execute_result" 55 | } 56 | ], 57 | "source": [ 58 | "from sqlalchemy import create_engine\n", 59 | "import pymysql\n", 60 | "import pandas as pd\n", 61 | "\n", 62 | "db_connection_str = 'mysql+pymysql://root:' + password + '@localhost:3306/test'\n", 63 | "db_connection = create_engine(db_connection_str)\n", 64 | "\n", 65 | "df = pd.read_sql('SELECT * FROM girls', con=db_connection)\n", 66 | "df.to_dict()" 67 | ] 68 | }, 69 | { 70 | "cell_type": 
"code", 71 | "execution_count": 3, 72 | "metadata": {}, 73 | "outputs": [ 74 | { 75 | "data": { 76 | "text/plain": [ 77 | "[{'id': 1, 'name': 'Emma'},\n", 78 | " {'id': 2, 'name': 'Ann'},\n", 79 | " {'id': 3, 'name': 'Kim'},\n", 80 | " {'id': 4, 'name': 'Olivia'},\n", 81 | " {'id': 5, 'name': 'Victoria'}]" 82 | ] 83 | }, 84 | "execution_count": 3, 85 | "metadata": {}, 86 | "output_type": "execute_result" 87 | } 88 | ], 89 | "source": [ 90 | "df.to_dict('records')" 91 | ] 92 | }, 93 | { 94 | "cell_type": "code", 95 | "execution_count": 4, 96 | "metadata": {}, 97 | "outputs": [ 98 | { 99 | "data": { 100 | "text/plain": [ 101 | "{'id': [1, 2, 3, 4, 5], 'name': ['Emma', 'Ann', 'Kim', 'Olivia', 'Victoria']}" 102 | ] 103 | }, 104 | "execution_count": 4, 105 | "metadata": {}, 106 | "output_type": "execute_result" 107 | } 108 | ], 109 | "source": [ 110 | "df.to_dict('list')" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": 5, 116 | "metadata": {}, 117 | "outputs": [ 118 | { 119 | "data": { 120 | "text/plain": [ 121 | "{0: {'id': 1, 'name': 'Emma'},\n", 122 | " 1: {'id': 2, 'name': 'Ann'},\n", 123 | " 2: {'id': 3, 'name': 'Kim'},\n", 124 | " 3: {'id': 4, 'name': 'Olivia'},\n", 125 | " 4: {'id': 5, 'name': 'Victoria'}}" 126 | ] 127 | }, 128 | "execution_count": 5, 129 | "metadata": {}, 130 | "output_type": "execute_result" 131 | } 132 | ], 133 | "source": [ 134 | "df.to_dict('index')" 135 | ] 136 | }, 137 | { 138 | "cell_type": "markdown", 139 | "metadata": {}, 140 | "source": [ 141 | "## 2: Convert MySQL Table to DataFrame with mysql.connector" 142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "execution_count": 6, 147 | "metadata": {}, 148 | "outputs": [ 149 | { 150 | "data": { 151 | "text/plain": [ 152 | "{0: {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},\n", 153 | " 1: {0: bytearray(b'Emma'),\n", 154 | " 1: bytearray(b'Ann'),\n", 155 | " 2: bytearray(b'Kim'),\n", 156 | " 3: bytearray(b'Olivia'),\n", 157 | " 4: bytearray(b'Victoria')}}" 158 | ] 159 | 
}, 160 | "execution_count": 6, 161 | "metadata": {}, 162 | "output_type": "execute_result" 163 | } 164 | ], 165 | "source": [ 166 | "import pandas as pd\n", 167 | "import mysql.connector\n", 168 | "\n", 169 | "# Set up the MySQL connection\n", 170 | "db = mysql.connector.connect(\n", 171 | " host=\"localhost\", # your host, usually localhost\n", 172 | " user=\"root\", # your username\n", 173 | " password=password, # your password\n", 174 | " database=\"test\" # name of the database\n", 175 | ")\n", 176 | "\n", 177 | "# Create a cursor object; it lets you execute all the queries you need\n", 178 | "cur = db.cursor()\n", 179 | "\n", 180 | "# Run any SQL query you like\n", 181 | "cur.execute(\"SELECT * FROM girls\")\n", 182 | "\n", 183 | "# Load all fetched rows into a DataFrame\n", 184 | "df_sql_data = pd.DataFrame(cur.fetchall())\n", 185 | "\n", 186 | "# Close the connection\n", 187 | "db.close()\n", 188 | "\n", 189 | "# Show the data as a dict\n", 190 | "df_sql_data.to_dict()" 191 | ] 192 | }, 193 | { 194 | "cell_type": "code", 195 | "execution_count": null, 196 | "metadata": {}, 197 | "outputs": [], 198 | "source": [] 199 | } 200 | ], 201 | "metadata": { 202 | "kernelspec": { 203 | "display_name": "Python 3", 204 | "language": "python", 205 | "name": "python3" 206 | }, 207 | "language_info": { 208 | "codemirror_mode": { 209 | "name": "ipython", 210 | "version": 3 211 | }, 212 | "file_extension": ".py", 213 | "mimetype": "text/x-python", 214 | "name": "python", 215 | "nbconvert_exporter": "python", 216 | "pygments_lexer": "ipython3", 217 | "version": "3.8.4" 218 | } 219 | }, 220 | "nbformat": 4, 221 | "nbformat_minor": 2 222 | } 223 | -------------------------------------------------------------------------------- /notebooks/python_problems/Python_problems_for_beginners_1.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Python problems for 
beginners" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## Problem 1 Triangle\n", 15 | "\n", 16 | "Write a simple program that prints a star pattern in Python 3.x for any n:\n", 17 | "\n", 18 | "Example n=5\n", 19 | "\n", 20 | " * \n", 21 | " * * \n", 22 | " * * * \n", 23 | " * * * * \n", 24 | " * * * * * " 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": 20, 30 | "metadata": {}, 31 | "outputs": [ 32 | { 33 | "name": "stdout", 34 | "output_type": "stream", 35 | "text": [ 36 | "\n", 37 | "* \n", 38 | "* * \n", 39 | "* * * \n", 40 | "* * * * \n", 41 | "* * * * * \n" 42 | ] 43 | } 44 | ], 45 | "source": [ 46 | "n = 5\n", 47 | "\n", 48 | "for i in range(0, n+1):\n", 49 | " print('* ' * i)" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 22, 55 | "metadata": {}, 56 | "outputs": [ 57 | { 58 | "name": "stdout", 59 | "output_type": "stream", 60 | "text": [ 61 | "7\n" 62 | ] 63 | } 64 | ], 65 | "source": [ 66 | "n = n + 2\n", 67 | "print(n)" 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": 33, 73 | "metadata": {}, 74 | "outputs": [ 75 | { 76 | "name": "stdout", 77 | "output_type": "stream", 78 | "text": [ 79 | "ddd" 80 | ] 81 | } 82 | ], 83 | "source": [ 84 | "for x in ['a', 's', 'd']:\n", 85 | " print('d', end='')" 86 | ] 87 | }, 88 | { 89 | "cell_type": "code", 90 | "execution_count": 31, 91 | "metadata": {}, 92 | "outputs": [ 93 | { 94 | "data": { 95 | "text/plain": [ 96 | "[3, 6, 9]" 97 | ] 98 | }, 99 | "execution_count": 31, 100 | "metadata": {}, 101 | "output_type": "execute_result" 102 | } 103 | ], 104 | "source": [ 105 | "list(range(3,10,3))" 106 | ] 107 | }, 108 | { 109 | "cell_type": "markdown", 110 | "metadata": {}, 111 | "source": [ 112 | "## Problem 2 Triangle with numbers\n", 113 | "\n", 114 | "Write a simple program that prints a triangle (with the numbers 1..i on line i) in Python 3.x for any n:\n", 115 | "\n", 116 | "Example n=4\n", 117 | 
"\n", 118 | " 1 \n", 119 | " 1 2 \n", 120 | " 1 2 3 \n", 121 | " 1 2 3 4" 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": 38, 127 | "metadata": {}, 128 | "outputs": [ 129 | { 130 | "name": "stdout", 131 | "output_type": "stream", 132 | "text": [ 133 | "x\n", 134 | "\n", 135 | "x\n", 136 | "y\n", 137 | "1 \n", 138 | "x\n", 139 | "y\n", 140 | "1 y\n", 141 | "2 \n", 142 | "x\n", 143 | "y\n", 144 | "1 y\n", 145 | "2 y\n", 146 | "3 \n", 147 | "x\n", 148 | "y\n", 149 | "1 y\n", 150 | "2 y\n", 151 | "3 y\n", 152 | "4 \n" 153 | ] 154 | } 155 | ], 156 | "source": [ 157 | "n = 4\n", 158 | "\n", 159 | "for i in range(0, n+1):\n", 160 | " for j in range(1, i + 1):\n", 161 | " print(j, end=' ')\n", 162 | " print()" 163 | ] 164 | }, 165 | { 166 | "cell_type": "markdown", 167 | "metadata": {}, 168 | "source": [ 169 | "## Homework 1 Triangle with letters \n", 170 | "\n", 171 | "Write a simple program that prints a triangle (with consecutive letters) in Python 3.x for any n:\n", 172 | "\n", 173 | "Example n=5\n", 174 | "\n", 175 | " A \n", 176 | " B C \n", 177 | " D E F \n", 178 | " G H I J \n", 179 | " K L M N O " 180 | ] 181 | }, 182 | { 183 | "cell_type": "markdown", 184 | "metadata": {}, 185 | "source": [ 186 | "## Homework 2 Diagonal of numbers\n", 187 | "\n", 188 | "Write a simple program that prints a diagonal pattern in Python 3.x for any n:\n", 189 | "\n", 190 | "Example n=4\n", 191 | "\n", 192 | " 0\n", 193 | " 1\n", 194 | " 2\n", 195 | " 3\n", 196 | " 4" 197 | ] 198 | }, 199 | { 200 | "cell_type": "markdown", 201 | "metadata": {}, 202 | "source": [ 203 | "## Homework 3 Pyramid\n", 204 | "\n", 205 | "Write a simple program that prints a pyramid pattern in Python 3.x for any n:\n", 206 | "\n", 207 | "Example n=3\n", 208 | "\n", 209 | " * \n", 210 | " * * * \n", 211 | " * * * * * " 212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "execution_count": null, 217 | "metadata": {}, 218 | "outputs": [], 219 | "source": [ 220 | "0 - 
1\n", 221 | "1 - 3\n", 222 | "2 - 5\n", 223 | "\n", 224 | "2 * i + 1" 225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": 64, 230 | "metadata": {}, 231 | "outputs": [ 232 | { 233 | "name": "stdout", 234 | "output_type": "stream", 235 | "text": [ 236 | " * \n", 237 | " * * * \n", 238 | "* * * * * \n" 239 | ] 240 | } 241 | ], 242 | "source": [ 243 | "n = 3\n", 244 | "\n", 245 | "for i in range(n):\n", 246 | " row = '* ' * (2 * i + 1) # row i holds 2 * i + 1 stars\n", 247 | " print(row.center(n * 3))" 248 | ] 249 | }, 250 | { 251 | "cell_type": "code", 252 | "execution_count": 70, 253 | "metadata": {}, 254 | "outputs": [ 255 | { 256 | "name": "stdout", 257 | "output_type": "stream", 258 | "text": [ 259 | " *\n", 260 | " ***\n", 261 | " *****\n", 262 | " *******\n", 263 | "*********\n" 264 | ] 265 | } 266 | ], 267 | "source": [ 268 | "n = 5\n", 269 | "\n", 270 | "for i in range(n):\n", 271 | " print(' ' * (n - i - 1), end='')\n", 272 | " print('*' * (2 * i + 1))" 273 | ] 274 | } 275 | ], 276 | "metadata": { 277 | "kernelspec": { 278 | "display_name": "Python 3", 279 | "language": "python", 280 | "name": "python3" 281 | }, 282 | "language_info": { 283 | "codemirror_mode": { 284 | "name": "ipython", 285 | "version": 3 286 | }, 287 | "file_extension": ".py", 288 | "mimetype": "text/x-python", 289 | "name": "python", 290 | "nbconvert_exporter": "python", 291 | "pygments_lexer": "ipython3", 292 | "version": "3.6.7" 293 | } 294 | }, 295 | "nbformat": 4, 296 | "nbformat_minor": 2 297 | } 298 | -------------------------------------------------------------------------------- /scripts/1.python_wrap_lines.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | size = 80 4 | file = 'budo' 5 | folder = os.path.expanduser('~/Documents/Fortunes/') 6 | 7 | # Read and store the entire file line by line 8 | with open(f'{folder}{file}.txt') as reader: 9 | provers = reader.readlines() 10 | 11 | # 
wrap/collate lines by separators [",", " ", "."] 12 | def collate(text, size): 13 | new_text = [] 14 | split_char = 1 15 | while split_char > 0: 16 | comma = text.find(',', size) 17 | space = text.find(' ', size) 18 | dot = text.find('.', size) 19 | # split at the median of the three positions (-1 means "not found") 20 | split_char = min(max(comma, dot), max(comma, space), max(dot, space)) 21 | 22 | if text[:split_char]: 23 | new_text.append(text[:split_char]) 24 | text = text[split_char+1:].replace('\n', "") 25 | 26 | return new_text 27 | 28 | # write the collated lines back to the same file 29 | with open(f'{folder}{file}.txt', 'w') as writer: 30 | for wisdom in provers: 31 | if len(wisdom) > size: 32 | collated = collate(wisdom, size) 33 | for short in collated: 34 | writer.write(short) 35 | writer.write('\n') 36 | else: 37 | writer.write(wisdom) 38 | 39 | # Run the strfile shell command on the rewritten file 40 | # (os is already imported at the top of this script) 41 | myCmd = f'strfile -c % {folder}{file}.txt {folder}{file}.txt.dat' 42 | os.system(myCmd) -------------------------------------------------------------------------------- /scripts/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/softhints/python/a256a054d74ca397f41874b3e26f1c4b84214432/scripts/__init__.py -------------------------------------------------------------------------------- /test.py: -------------------------------------------------------------------------------- 1 | import urllib.parse 2 | 3 | f = '25 Pandas Create A Matplotlib Scatterplot From A Dataframe ' 4 | ff = urllib.parse.quote_plus(f) 5 | print(ff.replace('+', '_')) --------------------------------------------------------------------------------