├── 03_continuous_integration ├── iris-api │ ├── .travis.yml │ ├── tests │ │ ├── __init__.py │ │ └── resources │ │ │ └── prediction.py │ ├── resources │ │ ├── __init__.py │ │ ├── README.md │ │ └── IrisPredictorResource.py │ ├── service.yaml │ ├── models │ │ └── finalized_model.sav │ ├── bin │ │ └── docker_build_context.sh │ ├── LICENSE │ ├── Dockerfile │ ├── main.py │ ├── .gitignore │ └── README.md ├── Code_sharing_best_practices_workshop.pptx ├── 03_continuous_integration.py └── 03_continuous_integration.ipynb ├── img ├── ssh-remote.png ├── click_example_code.jpg ├── click_example_help.jpg └── jupyter_environments.png ├── Info Flyer ├── flyer.png ├── flyer-sketch.png ├── MUDS_Practical_Training_Flyer_2020.png ├── MUDS_Practical_Training_Flyer_2020_workshop.jpg └── ugly_code_numpy_linalg.py ├── MUDS resources ├── MUDS_Logofinal.png ├── MUDS_Logo_CMYK_final.eps ├── MUDS-Banner-Web-V1(1).jpg ├── presentation │ └── MUDS_Folienmaster.pptx └── Technical_University_of_Munich_emblem.svg ├── muds_practical_training_overview.pptx ├── 02_database_basics ├── photos │ ├── keyvalue_example.PNG │ └── cybernetics-1869205_1280.jpg └── 02_database_basics.py ├── 04_best_practices ├── Code_sharing_best_practices_workshop.pptx ├── 04_best_practices.py ├── slurm.ipynb └── 04_best_practices.ipynb ├── README.md ├── .gitignore ├── check_list_before_sharing.md ├── autogen_slidetype.py ├── 06_advanced_python ├── debugging.ipynb └── jupyter_addons.ipynb ├── 00_intro ├── 00_intro.py └── 00_intro.ipynb ├── 99_other_material ├── meme_treasury.ipynb └── complexity.ipynb └── 07_graphs └── 07_graphs.ipynb /03_continuous_integration/iris-api/.travis.yml: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /03_continuous_integration/iris-api/tests/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /03_continuous_integration/iris-api/resources/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /img/ssh-remote.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Mu-DS/practical_training/HEAD/img/ssh-remote.png -------------------------------------------------------------------------------- /Info Flyer/flyer.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Mu-DS/practical_training/HEAD/Info Flyer/flyer.png -------------------------------------------------------------------------------- /img/click_example_code.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Mu-DS/practical_training/HEAD/img/click_example_code.jpg -------------------------------------------------------------------------------- /img/click_example_help.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Mu-DS/practical_training/HEAD/img/click_example_help.jpg -------------------------------------------------------------------------------- /Info Flyer/flyer-sketch.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Mu-DS/practical_training/HEAD/Info 
Flyer/flyer-sketch.png -------------------------------------------------------------------------------- /img/jupyter_environments.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Mu-DS/practical_training/HEAD/img/jupyter_environments.png -------------------------------------------------------------------------------- /MUDS resources/MUDS_Logofinal.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Mu-DS/practical_training/HEAD/MUDS resources/MUDS_Logofinal.png -------------------------------------------------------------------------------- /MUDS resources/MUDS_Logo_CMYK_final.eps: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Mu-DS/practical_training/HEAD/MUDS resources/MUDS_Logo_CMYK_final.eps -------------------------------------------------------------------------------- /muds_practical_training_overview.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Mu-DS/practical_training/HEAD/muds_practical_training_overview.pptx -------------------------------------------------------------------------------- /MUDS resources/MUDS-Banner-Web-V1(1).jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Mu-DS/practical_training/HEAD/MUDS resources/MUDS-Banner-Web-V1(1).jpg -------------------------------------------------------------------------------- /02_database_basics/photos/keyvalue_example.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Mu-DS/practical_training/HEAD/02_database_basics/photos/keyvalue_example.PNG -------------------------------------------------------------------------------- /Info Flyer/MUDS_Practical_Training_Flyer_2020.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Mu-DS/practical_training/HEAD/Info Flyer/MUDS_Practical_Training_Flyer_2020.png -------------------------------------------------------------------------------- /MUDS resources/presentation/MUDS_Folienmaster.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Mu-DS/practical_training/HEAD/MUDS resources/presentation/MUDS_Folienmaster.pptx -------------------------------------------------------------------------------- /02_database_basics/photos/cybernetics-1869205_1280.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Mu-DS/practical_training/HEAD/02_database_basics/photos/cybernetics-1869205_1280.jpg -------------------------------------------------------------------------------- /03_continuous_integration/iris-api/service.yaml: -------------------------------------------------------------------------------- 1 | 2 | # The service configuration used in the Dockerfile. 
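# Note: main.py loads this file via the IRIS_API_CONFIG environment variable, which the Dockerfile sets to /app/service.yaml.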
3 | 4 | # knn model 5 | model_path: /app/models/finalized_model.sav -------------------------------------------------------------------------------- /04_best_practices/Code_sharing_best_practices_workshop.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Mu-DS/practical_training/HEAD/04_best_practices/Code_sharing_best_practices_workshop.pptx -------------------------------------------------------------------------------- /Info Flyer/MUDS_Practical_Training_Flyer_2020_workshop.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Mu-DS/practical_training/HEAD/Info Flyer/MUDS_Practical_Training_Flyer_2020_workshop.jpg -------------------------------------------------------------------------------- /03_continuous_integration/iris-api/models/finalized_model.sav: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Mu-DS/practical_training/HEAD/03_continuous_integration/iris-api/models/finalized_model.sav -------------------------------------------------------------------------------- /03_continuous_integration/Code_sharing_best_practices_workshop.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Mu-DS/practical_training/HEAD/03_continuous_integration/Code_sharing_best_practices_workshop.pptx -------------------------------------------------------------------------------- /03_continuous_integration/iris-api/tests/resources/prediction.py: -------------------------------------------------------------------------------- 1 | """Unit tests for iris-api""" 2 | 3 | from resources.IrisPredictorResource import predict_knn 4 | 5 | def test_predict_knn(): 6 | assert True # placeholder assertion: replace with a real check once predict_knn is implemented -------------------------------------------------------------------------------- /MUDS resources/Technical_University_of_Munich_emblem.svg: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /03_continuous_integration/iris-api/resources/README.md: -------------------------------------------------------------------------------- 1 | # Resources 2 | 3 | Here you can find the different resources that are exposed through the HTTP API. 4 | 5 | # API 6 | 7 | 8 | ## /iris_api 9 | 10 | Detects the iris class from a list of features. 11 | 12 | Input: a JSON payload with a `features` list of four numbers 13 | 14 | Example: 15 | ```bash 16 | curl -d '{"features":[1,2,3,4]}' \ 17 | -H "Content-Type: application/json" \ 18 | -X POST http://localhost:8000/iris_api 19 | ``` 20 | -------------------------------------------------------------------------------- /03_continuous_integration/iris-api/bin/docker_build_context.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"/..
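# The line above resolves the repository root: it cd's into the directory containing this script, prints the absolute path, and appends /.. to step one level up out of bin/.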
4 | 5 | rm -rf ${DIR}/build/docker 6 | 7 | mkdir -p ${DIR}/build/docker 8 | mkdir -p ${DIR}/build/docker/resources 9 | mkdir -p ${DIR}/build/docker/models 10 | 11 | 12 | cp ${DIR}/Dockerfile ${DIR}/build/docker 13 | 14 | cp ${DIR}/resources/*.py ${DIR}/build/docker/resources 15 | cp ${DIR}/*.py ${DIR}/build/docker 16 | cp ${DIR}/models/*.sav ${DIR}/build/docker/models 17 | cp ${DIR}/service.yaml ${DIR}/build/docker 18 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Practical Training for Researchers in Data Science and Software Development 2 | 3 | This repository contains the material of the 5-day lecture series developed for the Helmholtz graduate school for data science Munich (MUDS). For any questions or comments on the material covered here, please contact @the-rccg or @aliechoes. 4 | 5 | This repo uses [jupytext](https://github.com/mwouts/jupytext) to keep 6 | notebooks as text files. 7 | 8 | ## Official flyer for the (postponed) lecture series 9 | 10 | ![FlyerImage](https://github.com/Mu-DS/practical_training/blob/master/Info%20Flyer/MUDS_Practical_Training_Flyer_2020_workshop.jpg) 11 | -------------------------------------------------------------------------------- /03_continuous_integration/iris-api/LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2020 Ali Boushehri 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /03_continuous_integration/iris-api/Dockerfile: -------------------------------------------------------------------------------- 1 | 2 | # Dockerfile for the Iris Prediction Service 3 | 4 | # Use an official Python runtime as a parent image 5 | FROM continuumio/miniconda3 6 | 7 | # Set the working directory to /app 8 | WORKDIR /app 9 | 10 | # Update Linux package lists 11 | RUN apt-get update 12 | 13 | # Install build tools (gcc etc.)
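# (build-essential supplies gcc and make in case any of the Python packages below need to compile C extensions)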
14 | RUN apt-get install -y build-essential 15 | 16 | 17 | # Install ops tools 18 | RUN apt-get install -y procps vim 19 | 20 | 21 | # Install any needed packages specified in requirements.txt 22 | # TODO: add correct python libraries 23 | RUN conda install -c conda-forge gevent 24 | RUN conda install -c conda-forge "gunicorn>=19.0" # quoted so the shell does not treat ">" as a redirect 25 | RUN conda install -c conda-forge "falcon>=2.0" 26 | 27 | 28 | # Copy the current directory contents into the container at /app 29 | COPY . /app 30 | RUN pwd 31 | 32 | # Make port 80 available to the world outside this container 33 | EXPOSE 80 34 | 35 | # Define environment variable 36 | ENV PYTHONUNBUFFERED TRUE 37 | ENV IRIS_API_CONFIG /app/service.yaml 38 | ENV NUM_WORKER 1 39 | 40 | # Run Gunicorn when the container launches 41 | CMD ["sh", "-c", "gunicorn --workers ${NUM_WORKER} --worker-class gevent --bind 0.0.0.0:80 main:app"] -------------------------------------------------------------------------------- /03_continuous_integration/03_continuous_integration.py: -------------------------------------------------------------------------------- 1 | # --- 2 | # jupyter: 3 | # jupytext: 4 | # formats: ipynb,py 5 | # text_representation: 6 | # extension: .py 7 | # format_name: light 8 | # format_version: '1.5' 9 | # jupytext_version: 1.5.2 10 | # kernelspec: 11 | # display_name: Python 3 12 | # language: python 13 | # name: python3 14 | # --- 15 | 16 | # # The Continuous Integration Pipeline 17 | 18 | # ## Motivation 19 | 20 | # ## Overview 21 | 22 | # 1. Git (Ali) 23 | # 2. Unit Tests (Ali) 24 | # 3. Docker (Ali) 25 | # 4. APIs (Ali) 26 | 27 | # # Git 28 | 29 | # ## Motivation 30 | 31 | # 32 | 33 | # 34 | 35 | # # Unit Tests 36 | 37 | # 38 | 39 | # Testing in Production 40 | # 41 | # 42 | 43 | # + jupyter={"outputs_hidden": true} 44 | 45 | # - 46 | 47 | 48 | # # Docker 49 | 50 | # ## Motivation 51 | 52 | # 53 | 54 | # + jupyter={"outputs_hidden": true} 55 | 56 | # - 57 | 58 | 59 | # # APIs 60 | 61 | # + jupyter={"outputs_hidden": true} 62 | -------------------------------------------------------------------------------- /03_continuous_integration/iris-api/main.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import os 3 | import io 4 | import logging 5 | import falcon 6 | import yaml 7 | from resources.IrisPredictorResource import IrisPredictorResource 8 | 9 | def init_logging(): 10 | """Initialize logging to write to STDOUT.""" 11 | logger = logging.getLogger(__name__) 12 | logger.setLevel(logging.INFO) 13 | handler = logging.StreamHandler(sys.stdout) 14 | handler.setLevel(logging.INFO) 15 | formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s') 16 | handler.setFormatter(formatter) 17 | logger.addHandler(handler) 18 | return logger 19 | 20 | def ensure_if_path_exists(pth): 21 | # Create the target directory if it doesn't exist 22 | if not os.path.exists(pth): 23 | os.makedirs(pth) 24 | return None 25 | 26 | 27 | def load_yaml(file_path): 28 | with open(file_path, 'r') as stream: 29 | return yaml.safe_load(stream) # safe_load avoids executing arbitrary YAML tags 30 | 31 | 32 | 33 | """ 34 | In this part, the app is initialized to create the API.
There are multiple steps to be followed: 35 | 1) Initializing the API, loading the config file and loading the logger 36 | 2) Initializing the IrisPredictorResource 37 | 3) Adding the route 38 | """ 39 | 40 | 41 | ## Part 1 42 | app = falcon.API() 43 | 44 | config_path = os.environ.get('IRIS_API_CONFIG', None) 45 | if config_path is None: 46 | config_path = 'service.yaml' 47 | 48 | config = load_yaml(config_path) 49 | model_path = config['model_path'] 50 | 51 | # Start the logging 52 | logger = init_logging() 53 | logger.info('Service config: %s' % config) 54 | 55 | 56 | ## Part 2: Resources 57 | iris_api = IrisPredictorResource(model_path, logger) 58 | 59 | ## Part 3: Routes 60 | app.add_route("/iris_api", iris_api) -------------------------------------------------------------------------------- /03_continuous_integration/iris-api/.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | pip-wheel-metadata/ 24 | share/python-wheels/ 25 | *.egg-info/ 26 | .installed.cfg 27 | *.egg 28 | MANIFEST 29 | 30 | # PyInstaller 31 | # Usually these files are written by a python script from a template 32 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 33 | *.manifest 34 | *.spec 35 | 36 | # Installer logs 37 | pip-log.txt 38 | pip-delete-this-directory.txt 39 | 40 | # Unit test / coverage reports 41 | htmlcov/ 42 | .tox/ 43 | .nox/ 44 | .coverage 45 | .coverage.* 46 | .cache 47 | nosetests.xml 48 | coverage.xml 49 | *.cover 50 | *.py,cover 51 | .hypothesis/ 52 | .pytest_cache/ 53 | 54 | # Translations 55 | *.mo 56 | *.pot 57 | 58 | # Django stuff: 59 | *.log 60 | local_settings.py 61 | db.sqlite3 62 | db.sqlite3-journal 63 | 64 | # Flask stuff: 65 | instance/ 66 | .webassets-cache 67 | 68 | # Scrapy stuff: 69 | .scrapy 70 | 71 | # Sphinx documentation 72 | docs/_build/ 73 | 74 | # PyBuilder 75 | target/ 76 | 77 | # Jupyter Notebook 78 | .ipynb_checkpoints 79 | 80 | # IPython 81 | profile_default/ 82 | ipython_config.py 83 | 84 | # pyenv 85 | .python-version 86 | 87 | # pipenv 88 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 89 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 90 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 91 | # install all needed dependencies. 92 | #Pipfile.lock 93 | 94 | # PEP 582; used by e.g.
github.com/David-OConnor/pyflow 95 | __pypackages__/ 96 | 97 | # Celery stuff 98 | celerybeat-schedule 99 | celerybeat.pid 100 | 101 | # SageMath parsed files 102 | *.sage.py 103 | 104 | # Environments 105 | .env 106 | .venv 107 | env/ 108 | venv/ 109 | ENV/ 110 | env.bak/ 111 | venv.bak/ 112 | 113 | # Spyder project settings 114 | .spyderproject 115 | .spyproject 116 | 117 | # Rope project settings 118 | .ropeproject 119 | 120 | # mkdocs documentation 121 | /site 122 | 123 | # mypy 124 | .mypy_cache/ 125 | .dmypy.json 126 | dmypy.json 127 | 128 | # Pyre type checker 129 | .pyre/ 130 | -------------------------------------------------------------------------------- /03_continuous_integration/iris-api/resources/IrisPredictorResource.py: -------------------------------------------------------------------------------- 1 | import pickle 2 | import sklearn 3 | from sklearn.neighbors import KNeighborsClassifier 4 | import json 5 | import numpy as np 6 | import falcon 7 | 8 | 9 | 10 | def predict_knn(features, model): 11 | 12 | """ 13 | This function gets the features and the model and predicts the output 14 | Args: 15 | features(list): list of features. It must contain four floating-point numbers 16 | model(sklearn model): knn sklearn model, loaded from the models folder 17 | Returns: 18 | predicted_class(str) 19 | """ 20 | 21 | classes = ['setosa', 'versicolor', 'virginica'] 22 | """ 23 | TODO: 24 | predict the class! 25 | """ 26 | return predicted_class 27 | 28 | ### resource 29 | class IrisPredictorResource(): 30 | """ 31 | TODO: Documentation 32 | """ 33 | def __init__(self, model_path, logger): 34 | """ 35 | TODO: Documentation 36 | """ 37 | self.logger = logger 38 | self.model = pickle.load(open(model_path, 'rb')) 39 | self.logger.info("Starting: IrisPredictor") 40 | 41 | def on_post(self, req, resp): 42 | """ 43 | TODO: Documentation 44 | """ 45 | try: 46 | self.logger.info("IrisPredictor: reading file") 47 | request_bytes = req.stream.read() 48 | 49 | try: 50 | request = json.loads(request_bytes.decode("utf-8")) 51 | 52 | except Exception as e: 53 | self.logger.error(e, exc_info=True) 54 | resp.status = falcon.HTTP_400 55 | resp.body = "Invalid JSON\n" 56 | return 57 | """ 58 | @TODO: 59 | check the quality of the input file. 60 | In case the quality of the input is not valid, 61 | send back the correct resp.body and resp.status 62 | """ 63 | features = request["features"] # the parsed JSON request carries the feature list 64 | ## In this part, you consider the input to be correct and 65 | ## just need to return the result 66 | prediction = predict_knn(features, self.model) 67 | 68 | self.logger.info('IrisPredictor: the prediction is %s' % prediction) 69 | response = {"predicted_class": prediction} 70 | 71 | self.logger.info('IrisPredictor: Sending the results \n') 72 | 73 | """ 74 | TODO: use the correct HTTP status code 75 | """ 76 | resp.status = ## FILL HERE!
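# hint: falcon defines status constants such as falcon.HTTP_200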
## 77 | resp.body = json.dumps(response) + '\n' 78 | 79 | except Exception as e: 80 | self.logger.error(e, exc_info=True) 81 | resp.status = falcon.HTTP_500 -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | pip-wheel-metadata/ 24 | share/python-wheels/ 25 | *.egg-info/ 26 | .installed.cfg 27 | *.egg 28 | MANIFEST 29 | 30 | # PyInstaller 31 | # Usually these files are written by a python script from a template 32 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 33 | *.manifest 34 | *.spec 35 | 36 | # Installer logs 37 | pip-log.txt 38 | pip-delete-this-directory.txt 39 | 40 | # Unit test / coverage reports 41 | htmlcov/ 42 | .tox/ 43 | .nox/ 44 | .coverage 45 | .coverage.* 46 | .cache 47 | nosetests.xml 48 | coverage.xml 49 | *.cover 50 | *.py,cover 51 | .hypothesis/ 52 | .pytest_cache/ 53 | 54 | # Translations 55 | *.mo 56 | *.pot 57 | 58 | # Django stuff: 59 | *.log 60 | local_settings.py 61 | db.sqlite3 62 | db.sqlite3-journal 63 | 64 | # Flask stuff: 65 | instance/ 66 | .webassets-cache 67 | 68 | # Scrapy stuff: 69 | .scrapy 70 | 71 | # Sphinx documentation 72 | docs/_build/ 73 | 74 | # PyBuilder 75 | target/ 76 | 77 | # Jupyter Notebook 78 | .ipynb_checkpoints 79 | 80 | # IPython 81 | profile_default/ 82 | ipython_config.py 83 | 84 | # pyenv 85 | .python-version 86 | 87 | # pipenv 88 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 89 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 90 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 91 | # install all needed dependencies. 92 | #Pipfile.lock 93 | 94 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow 95 | __pypackages__/ 96 | 97 | # Celery stuff 98 | celerybeat-schedule 99 | celerybeat.pid 100 | 101 | # SageMath parsed files 102 | *.sage.py 103 | 104 | # Environments 105 | .env 106 | .venv 107 | env/ 108 | venv/ 109 | ENV/ 110 | env.bak/ 111 | venv.bak/ 112 | 113 | # Spyder project settings 114 | .spyderproject 115 | .spyproject 116 | 117 | # Rope project settings 118 | .ropeproject 119 | 120 | # mkdocs documentation 121 | /site 122 | 123 | # mypy 124 | .mypy_cache/ 125 | .dmypy.json 126 | dmypy.json 127 | 128 | # Pyre type checker 129 | .pyre/ 130 | 131 | .vscode/ 132 | 133 | data_loader/.Rhistory 134 | 135 | inputs/.Rhistory 136 | 137 | preprocessing/.Rhistory 138 | 139 | code.sh 140 | 141 | launch_code.sh 142 | 143 | configs/sample_config.json 144 | 145 | *.sh 146 | eval.sh 147 | launch_eval.sh 148 | 149 | 07_graphs/data -------------------------------------------------------------------------------- /03_continuous_integration/iris-api/README.md: -------------------------------------------------------------------------------- 1 | # Iris API 2 | 3 | This is an HTTP service that predicts the iris class from a list of features.
It includes: 4 | 5 | * [Iris Predictor](resources/IrisPredictorResource.py) 6 | 7 | To understand the APIs and resources, please refer to the folder [resources](resources) or the explanation [here](resources/README.md) 8 | 9 | ## Folders 10 | 11 | * [bin](bin): executable file creating the necessary folders and copying the models before building the Docker image 12 | * [models](models): sklearn KNN model 13 | * [resources](resources): resources for the different APIs 14 | 15 | 16 | ## Docker 17 | 18 | This part explains how to build and run the Docker image. 19 | 20 | ### Build a Docker container 21 | 22 | 23 | ```bash 24 | sudo bin/docker_build_context.sh 25 | sudo docker build --tag=iris_api:0.0.1 build/docker 26 | ``` 27 | 28 | 29 | ### Run Docker container 30 | 31 | Just CPU: 32 | 33 | ```bash 34 | sudo docker run -d -p 8000:80 iris_api:0.0.1 35 | ``` 36 | 37 | After starting the container, the service should listen on 127.0.0.1 port 8000. 38 | 39 | The number of Gunicorn workers can be configured by setting the `NUM_WORKER` environment variable when running the container, e.g. `-e NUM_WORKER=2`. 40 | 41 | ### Start service manually 42 | 43 | For debugging it can be helpful to start the service manually. Run the container but overwrite the entrypoint with a Bash shell (adjust the tag manually if your version differs from 0.0.1): 44 | 45 | ```bash 46 | docker run -it -p 8000:80 --entrypoint=/bin/bash iris_api:0.0.1 47 | ``` 48 | 49 | This starts the container and opens a shell but does not start the service. Start the service manually: 50 | 51 | ```bash 52 | cd /app 53 | gunicorn --workers 1 --worker-class gevent --bind 0.0.0.0:80 main:app 54 | ``` 55 | 56 | ### Login to the Docker container 57 | 58 | Look up the container ID: 59 | 60 | ```bash 61 | docker container ps 62 | CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 63 | 7932a3814453 friendlyhello "python app.py" 16 seconds ago Up 15 seconds 0.0.0.0:4000->80/tcp musing_robinson 64 | ``` 65 | 66 | Open a shell on the container: 67 | 68 | ```bash 69 | docker exec -it 7932a3814453 /bin/bash 70 | ``` 71 | 72 | ### Cleanup 73 | 74 | Remove all containers and images: 75 | 76 | ```bash 77 | sudo docker rm $(sudo docker ps -a -q) 78 | sudo docker rmi $(sudo docker images -q) 79 | ``` 80 | 81 | ## Gunicorn 82 | 83 | The Gunicorn configuration is described in [Gunicorn settings](http://docs.gunicorn.org/en/stable/settings.html). 84 | 85 | The most important Gunicorn configuration parameters are: 86 | 87 | * `--reload` - Restart workers when code changes. This should only be used during development 88 | * `--workers` - The number of worker processes for handling requests 89 | * `--worker-class` - The type of workers to use 90 | * `--bind` - The socket and port to bind 91 | * `--access-logfile` - Path of the access log file 92 | * `--error-logfile` - Path of the error log file 93 | * `--daemon` - Daemonize the Gunicorn process 94 | -------------------------------------------------------------------------------- /check_list_before_sharing.md: -------------------------------------------------------------------------------- 1 | # Check-List before sharing your code 2 | 3 | Thank you for wanting to share your code! 4 | And thank you even more for trying to make sure it is helpful rather than sending the person in circles! 5 | Research needs more people like you! 6 | 7 | ## Code is read way more frequently than it is written 8 | 9 | * [ ] Does your code abide by Python naming rules?
10 | 11 | - [ ] ALL functions are snake_case 12 | - [ ] ALL classes are UpperCamelCase 13 | - [ ] ALL variables are snake_case 14 | - [ ] ALL constants are UPPER_CASE 15 | - [ ] ALL packages are lowercase 16 | 17 | * [ ] No. There are no exceptions because you prefer them... 18 | * [ ] Are your custom data structures clearly described? 19 | 20 | - [ ] Shape? 21 | - [ ] Data Types? 22 | - [ ] Hierarchy? 23 | 24 | * [ ] Do ALL public functions have docstrings? 25 | 26 | - [ ] Does it include the inputs and their format? 27 | - [ ] Does it include the outputs and their format? 28 | - [ ] Does it include the exceptions that it throws? 29 | 30 | * [ ] Remember that function that took you ages to write? Read the code: if you can immediately understand it you are good - otherwise, rewrite it to be more explicit. 31 | * [ ] Do ANY functions have >2 bracket-pairs of any kind? Split them into more lines. 32 | * [ ] Have ALL your abbreviations been defined in the same file? 33 | * [ ] No, not everyone knows that abbreviation... 34 | * [ ] Have you set up an auto-formatter? 35 | * [ ] Has the formatter configuration been documented? 36 | * [ ] Okay. Now run the formatter again. 37 | 38 | ## Code is more often debugged than it is written 39 | 40 | * [ ] Do you have notebooks? 41 | 42 | - [ ] Restart Kernel 43 | - [ ] Run all 44 | - [ ] Repeat until no error shows up 45 | 46 | * [ ] Are ALL your private functions unit tested? 47 | 48 | - [ ] Does it guarantee ALL functionality the function allows? 49 | - [ ] Are your mathematical methods tested for convergence? 50 | - [ ] Have you included the edge cases in your tests? 51 | - [ ] Yes, even wrappers around standard libraries. 52 | 53 | * [ ] Have you set up a CI? 54 | 55 | - [ ] For the NEWEST versions of packages? 56 | - [ ] For the NEWEST versions of the language? 57 | - [ ] Have you checked that these are really the newest? 58 | - [ ] Does it run all notebooks? 59 | 60 | * [ ] Has it passed all tests for all versions? 61 | * [ ] Have you added a `requirements.txt` file to your repo? 62 | * [ ] Have you added the correct `.gitignore` file to your repo? 63 | * [ ] Does every folder contain a markdown with a correct and up-to-date explanation? 64 | * [ ] Does adding a Docker image help the reproducibility of your work? If so, have you implemented it? 65 | * [ ] Have you checked every box? Congratulations, you can now share the code :) 66 | 67 | ## Automating the boring stuff 68 | 69 | * [ ] Use **black** auto-formatting on save 70 | * [ ] Use **pylint** in CI to fail when things (e.g. docstrings) are missing 71 | * [ ] Use **codacy** to check the quality of the whole code 72 | * [ ] Use **TravisCI** to test against the newest versions from pip 73 | * [ ] Use **codecov** to ensure you are not missing unit tests 74 | * [ ] Use **mypy** for type hints and checking [link](http://mypy-lang.org/) 75 | 76 | ## Have more tips? 77 | 78 | Please do let us know or simply submit your own PR to this repo!
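As a minimal sketch of what the docstring boxes above ask for — the function, its argument names, and its behavior are invented purely for illustration:

```python
def normalize_features(features):
    """Scale a list of feature values to the [0, 1] range.

    Args:
        features (list of float): raw feature values; must be non-empty
            and must not be constant.

    Returns:
        list of float: the rescaled values, same length as the input.

    Raises:
        ValueError: if `features` is empty or all values are identical.
    """
    if not features:
        raise ValueError("features must not be empty")
    low, high = min(features), max(features)
    if low == high:
        raise ValueError("features must not be constant")
    return [(f - low) / (high - low) for f in features]
```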
79 | -------------------------------------------------------------------------------- /autogen_slidetype.py: -------------------------------------------------------------------------------- 1 | import json 2 | import re 3 | import os 4 | from tqdm import tqdm 5 | import click 6 | 7 | 8 | def get_num_hashtags(string): 9 | """count the number of hashtags at the beginning of a string 10 | 11 | :param string: string to count hashtags in 12 | :type string: str 13 | :return: number of consecutive hashtags followed by a white space 14 | :rtype: int 15 | """ 16 | num = 0 17 | match = re.match(r"^[#]{1,}[\s]", string) 18 | if match: 19 | num = len(match[0])-1 20 | return num 21 | 22 | 23 | def set_slide_type(metadata, celltype): 24 | """set the slideshow slide type in the given cell metadata and return it 25 | 26 | :param metadata: metadata dictionary of the cell to update 27 | :type metadata: dict 28 | :param celltype: type of slideshow type to designate this cell 29 | :type celltype: str 30 | """ 31 | if 'slideshow' not in metadata.keys(): 32 | metadata['slideshow'] = {} 33 | metadata['slideshow']['slide_type'] = celltype 34 | return metadata 35 | 36 | 37 | @click.command() 38 | @click.option("--in", "-i", "basename", default="", type=click.STRING, show_default=False, required=True, 39 | help="input file name to be loaded, autodetects jupytext") 40 | @click.option("--out", "-o", "outname", default="", type=click.STRING, show_default=False, required=True, 41 | help="output file name to be saved as, autodetects jupytext") 42 | @click.option("--order", "slide_order", default=2, type=click.IntRange(0,), show_default=True, required=False, 43 | help="Number of # above which all are done as subslides") 44 | @click.option("--INDENT", "indentation", default=1, type=click.IntRange(0,), show_default=True, required=False, 45 | help="Number of spaces to indent the json/ipynb output") 46 | def main(basename, outname, slide_order, indentation): 47 | """automatically generate slide_type metadata for ipynb files 48 | 49 | :param basename: input ipynb name without .ipynb 50 | :type basename: str 51 | :param outname: output ipynb name without .ipynb 52 | :type outname: str 53 | :param slide_order: number of #s above which sections are considered sub-slides 54 | :type slide_order: int > 0 55 | """ 56 | # Decoding jupyter notebooks as jsons 57 | with open(f"{basename}.ipynb", "r", encoding="utf-8") as infile: 58 | notebook = json.load(infile) 59 | 60 | # Adjusting metadata for each cell 61 | for cell_idx in tqdm(range(len(notebook["cells"]))): 62 | if len(notebook["cells"][cell_idx]["source"]): 63 | num_hashtags = max( 64 | map(get_num_hashtags, notebook["cells"][cell_idx]["source"])) 65 | metadata = notebook["cells"][cell_idx]['metadata'] 66 | if not isinstance(metadata, dict): 67 | print(metadata) 68 | metadata = {} 69 | if num_hashtags == 0: 70 | metadata = set_slide_type(metadata, "fragment") 71 | elif num_hashtags > slide_order: 72 | metadata = set_slide_type(metadata, "subslide") 73 | else: 74 | metadata = set_slide_type(metadata, "slide") 75 | else: 76 | metadata = set_slide_type(metadata, "skip") 77 | notebook["cells"][cell_idx]['metadata'] = metadata 78 | 79 | # Saving new file 80 | with open(f"{outname}.ipynb", "w", encoding="utf-8") as outfile: 81 | json.dump(notebook, fp=outfile, indent=indentation) 82 | # Keep the paired jupytext .py file in sync (it is plain text, so copy it verbatim) 83 | if f"{basename}.py" in os.listdir(): 84 | with open(f"{basename}.py", "r", encoding="utf-8") as infile: 85 | contents = infile.read() 86 | with open(f"{outname}.py", "w", encoding="utf-8") as
outfile: 87 | outfile.write(contents) 88 | 89 | 90 | if __name__ == "__main__": 91 | main() 92 | -------------------------------------------------------------------------------- /06_advanced_python/debugging.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "Collapsed": "false" 7 | }, 8 | "source": [ 9 | "# Debugging in Python" 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": { 15 | "Collapsed": "false" 16 | }, 17 | "source": [ 18 | "" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "Collapsed": "false" 25 | }, 26 | "source": [ 27 | "## pdb - The Python Debugger" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": { 33 | "Collapsed": "false" 34 | }, 35 | "source": [ 36 | "part of the core library\n", 37 | "\n", 38 | "allows interactive as well as whole script runs\n", 39 | "\n", 40 | "```python\n", 41 | "import pdb\n", 42 | "import mymodule\n", 43 | "pdb.run('mymodule.some_function()')\n", 44 | "```\n", 45 | "\n", 46 | "```bash\n", 47 | "python -m pdb myscript.py\n", 48 | "```" 49 | ] 50 | }, 51 | { 52 | "cell_type": "markdown", 53 | "metadata": { 54 | "Collapsed": "false" 55 | }, 56 | "source": [ 57 | "Python 3.2: `-c` option allows executing commands as if a `.pdbrc` config file was given\n", 58 | "\n", 59 | "Python 3.7: \n", 60 | "- built-in `breakpoint()` to set a trace instead of `import pdb; pdb.set_trace()`\n", 61 | "- `-m` option executes modules similar to the way `python -m` does" 62 | ] 63 | }, 64 | { 65 | "cell_type": "markdown", 66 | "metadata": { 67 | "Collapsed": "false" 68 | }, 69 | "source": [ 70 | "Upon `breakpoint()`, execution enters `debug mode`" 71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": { 76 | "Collapsed": "false" 77 | }, 78 | "source": [ 79 | "### Debugger Commands" 80 | ] 81 | }, 82 | { 83 | "cell_type": "markdown", 84 | "metadata": { 85 | "Collapsed": "false" 86 | }, 87 | "source": [] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": { 92 | "Collapsed": "false" 93 | }, 94 | "source": [ 95 | "## Callgraphs" 96 | ] 97 | }, 98 | { 99 | "cell_type": "markdown", 100 | "metadata": { 101 | "Collapsed": "false" 102 | }, 103 | "source": [ 104 | "https://github.com/osteele/callgraph" 105 | ] 106 | }, 107 | { 108 | "cell_type": "markdown", 109 | "metadata": { 110 | "Collapsed": "false" 111 | }, 112 | "source": [ 113 | "```\n", 114 | "pip install callgraph\n", 115 | "```" 116 | ] 117 | }, 118 | { 119 | "cell_type": "markdown", 120 | "metadata": { 121 | "Collapsed": "false" 122 | }, 123 | "source": [ 124 | "### In Code (Decorator)\n", 125 | "\n", 126 | "```python\n", 127 | "from functools import lru_cache\n", 128 | "import callgraph.decorator as callgraph\n", 129 | "\n", 130 | "@callgraph()\n", 131 | "@lru_cache()\n", 132 | "def nchoosek(n, k):\n", 133 | " if k == 0:\n", 134 | " return 1\n", 135 | " if n == k:\n", 136 | " return 1\n", 137 | " return nchoosek(n - 1, k - 1) + nchoosek(n - 1, k)\n", 138 | "\n", 139 | "nchoosek(5, 2)\n", 140 | "\n", 141 | "nchoosek.__callgraph__.view()\n", 142 | "```" 143 | ] 144 | }, 145 | { 146 | "cell_type": "markdown", 147 | "metadata": { 148 | "Collapsed": "false" 149 | }, 150 | "source": [ 151 | "### Callgraph Magic (Jupyter)\n", 152 | "\n", 153 | "```python\n", 154 | "from functools import lru_cache\n", 155 | "\n", 156 | "@lru_cache()\n", 157 | "def lev(a, b):\n", 158 | " if \"\" in (a, b):\n", 159 |
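"        # base case: the distance to an empty string is the length of the other string\n",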
" return len(a) + len(b)\n", 160 | "\n", 161 | " candidates = []\n", 162 | " if a[0] == b[0]:\n", 163 | " candidates.append(lev(a[1:], b[1:]))\n", 164 | " else:\n", 165 | " candidates.append(lev(a[1:], b[1:]) + 1)\n", 166 | " candidates.append(lev(a, b[1:]) + 1)\n", 167 | " candidates.append(lev(a[1:], b) + 1)\n", 168 | " return min(candidates)\n", 169 | "\n", 170 | "%callgraph -w10 lev(\"big\", \"dog\"); lev(\"dig\", \"dog\")\n", 171 | "```" 172 | ] 173 | }, 174 | { 175 | "cell_type": "markdown", 176 | "metadata": { 177 | "Collapsed": "false" 178 | }, 179 | "source": [] 180 | }, 181 | { 182 | "cell_type": "markdown", 183 | "metadata": { 184 | "Collapsed": "false" 185 | }, 186 | "source": [] 187 | }, 188 | { 189 | "cell_type": "markdown", 190 | "metadata": { 191 | "Collapsed": "false" 192 | }, 193 | "source": [] 194 | } 195 | ], 196 | "metadata": { 197 | "kernelspec": { 198 | "display_name": "Python 3", 199 | "language": "python", 200 | "name": "python3" 201 | }, 202 | "language_info": { 203 | "codemirror_mode": { 204 | "name": "ipython", 205 | "version": 3 206 | }, 207 | "file_extension": ".py", 208 | "mimetype": "text/x-python", 209 | "name": "python", 210 | "nbconvert_exporter": "python", 211 | "pygments_lexer": "ipython3", 212 | "version": "3.7.7" 213 | } 214 | }, 215 | "nbformat": 4, 216 | "nbformat_minor": 4 217 | } 218 | -------------------------------------------------------------------------------- /00_intro/00_intro.py: -------------------------------------------------------------------------------- 1 | # --- 2 | # jupyter: 3 | # jupytext: 4 | # formats: ipynb,py 5 | # text_representation: 6 | # extension: .py 7 | # format_name: light 8 | # format_version: '1.5' 9 | # jupytext_version: 1.5.2 10 | # kernelspec: 11 | # display_name: Python 3 12 | # language: python 13 | # name: python3 14 | # --- 15 | 16 | # # Why Practical Training is Crucial 17 | 18 | # ## Why bridging the gap is important 19 | 20 | # 21 | 22 | # 23 | 24 | # ### How to bridge the gap 25 | 26 | # There are plenty of expensive courses, thick books, and jaded postdocs telling you how to do things in theory - and that's great! 27 | 28 | # But... how do you get there? 29 | 30 | # Let's play a little game, can you tell me how to do each of these? 31 | 32 | # | Problem | Implementation | 33 | # |--------------------------------------------------------|----------------| 34 | # | Develop this new framework over the next 6 months | ? | 35 | # | Adding this feature will take a while | ? | 36 | # | The code you're writing is turnign into a monster | ? | 37 | # | Hmmm this Jupyter notebook is gettign too long | ? | 38 | # | Writing documentation is too troublesome | ? | 39 | # | The code of the other developer looks terrible | ? | 40 | # | "Why is there a super() in here???" | ? | 41 | # | "You know, you should make this script run with a CLI" | ? | 42 | # | "How are these objects related?" | ? | 43 | # | Chart the structure of your project | ? | 44 | # | Figure out which part is slowing down the code | ? | 45 | # | Speed up this NumPy code | ? | 46 | # | This loop is really slow... | | 47 | 48 | # Solution: 49 | 50 | # | Problem | Implementation | 51 | # |--------------------------------------------------------|----------------| 52 | # | Develop this new framework over the next 6 months | Agile, Sprint planning, etc. 
| 53 | # | Adding this feature will take a while | Sprint planning, review process | 54 | # | The code you're writing is turning into a monster | Code architecture, refactoring | 55 | # | Hmmm this Jupyter notebook is getting too long | module architecture | 56 | # | Writing documentation is too troublesome | AutoDoc, Docstring creator, etc | 57 | # | The code of the other developer looks terrible | code formatter, linting | 58 | # | "Why is there a super() in here???" | Java Developers | 59 | # | "What are these properties?" | Setters and Getters | 60 | # | "You know, you should make this script run with a CLI" | click | 61 | # | "How are these objects related?" | Code Analyzer | 62 | # | "Can you show me how this project is structured?" | UML, Code Analyzer | 63 | # | Figure out which part is slowing down the code | Dynamic Code Analyzer / Profiler | 64 | # | Speed up this NumPy code | Numba | 65 | # | This loop is really slow... | Map(), Numba, Dask | 66 | # | I should run this in parallel... | Multiprocessing, Dask | 67 | 68 | # # Is this Course for you? 69 | 70 | # 71 | 72 | # Have you ever gotten... 73 | 74 | # + [markdown] jupyter={"outputs_hidden": true} 75 | # - a shared project in your group, but couldn't figure out what 80\% of the functions or objects did? 76 | # - 77 | 78 | 79 | # - code from a previous student/PhD/postdoc and thought -- WHAT THE F- is this?! 80 | 81 | # - a bachelor student to use your code and only gotten stupid questions from them? 82 | 83 | # I hate to break it to you, but you also write bad code 84 | 85 | # We all write bad code, and the point is not to write perfect code, but to write less bad code. 86 | # 87 | # Just a world with less bad code. That's the dream. 88 | 89 | # # Exercise 90 | 91 | # - Pair up in groups of 2 or 3 92 | # - Show the other person the last Python code you wrote 93 | # - Spend 5 minutes trying to understand it 94 | # - Discuss the code 95 | 96 | # # Overview of the Course 97 | 98 | # 1. Fundamentals of Production Code 99 | # - Workflow Organization 100 | # - Environments 101 | # - Code Style and Formatters 102 | # - Design Patterns 103 | # - Thinking Functionally 104 | # - Module Architecture 105 | # - CLI Interfaces 106 | # 2. Data Management Fundamentals 107 | # - Pre-SQL 108 | # - SQL 109 | # - NoSQL 110 | # - Graph Databases 111 | # 3. Continuous Integration Pipeline 112 | # - Git 113 | # - Unit Tests 114 | # - Docker 115 | # - APIs 116 | 117 | # 4. Best Practices in Data Science 118 | # - Machine Learning 119 | # - Coding 120 | 121 | # 5.
Processing Data Efficiently 122 | # - TensorFlow 123 | # - Network Architectures & Applications 124 | # - Slurm 125 | # - Numba 126 | # - Dask 127 | -------------------------------------------------------------------------------- /99_other_material/meme_treasury.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Debugging" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": { 13 | "collapsed": "false" 14 | }, 15 | "source": [ 16 | "" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": null, 22 | "metadata": { 23 | "collapsed": true, 24 | "jupyter": { 25 | "outputs_hidden": true 26 | } 27 | }, 28 | "outputs": [], 29 | "source": [] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "# Commenting your code is important\n", 36 | "\n", 37 | "" 38 | ] 39 | }, 40 | { 41 | "cell_type": "code", 42 | "execution_count": null, 43 | "metadata": { 44 | "collapsed": true, 45 | "jupyter": { 46 | "outputs_hidden": true 47 | } 48 | }, 49 | "outputs": [], 50 | "source": [] 51 | }, 52 | { 53 | "cell_type": "markdown", 54 | "metadata": { 55 | "collapsed": "false" 56 | }, 57 | "source": [ 58 | "Why this is needed\n", 59 | "\n", 60 | "" 61 | ] 62 | }, 63 | { 64 | "cell_type": "code", 65 | "execution_count": null, 66 | "metadata": { 67 | "collapsed": true, 68 | "jupyter": { 69 | "outputs_hidden": true 70 | } 71 | }, 72 | "outputs": [], 73 | "source": [] 74 | }, 75 | { 76 | "cell_type": "markdown", 77 | "metadata": { 78 | "collapsed": "false" 79 | }, 80 | "source": [ 81 | "why git is important\n", 82 | "\n", 83 | "" 84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": null, 89 | "metadata": { 90 | "collapsed": true, 91 | "jupyter": { 92 | "outputs_hidden": true 93 | } 94 | }, 95 | "outputs": [], 96 | "source": [] 97 | }, 98 | { 99 | "cell_type": "markdown", 100 | "metadata": { 101 | "collapsed": "false" 102 | }, 103 | "source": [ 104 | "Why containers are important\n", 105 | "\n", 106 | "" 107 | ] 108 | }, 109 | { 110 | "cell_type": "markdown", 111 | "metadata": { 112 | "collapsed": "false" 113 | }, 114 | "source": [ 115 | "" 116 | ] 117 | }, 118 | { 119 | "cell_type": "markdown", 120 | "metadata": { 121 | "collapsed": "false" 122 | }, 123 | "source": [ 124 | "" 125 | ] 126 | }, 127 | { 128 | "cell_type": "markdown", 129 | "metadata": { 130 | "collapsed": "false" 131 | }, 132 | "source": [ 133 | "" 134 | ] 135 | }, 136 | { 137 | "cell_type": "markdown", 138 | "metadata": { 139 | "collapsed": "false" 140 | }, 141 | "source": [ 142 | "" 143 | ] 144 | }, 145 | { 146 | "cell_type": "markdown", 147 | "metadata": { 148 | "collapsed": "false" 149 | }, 150 | "source": [ 151 | "" 152 | ] 153 | }, 154 | { 155 | "cell_type": "markdown", 156 | "metadata": { 157 | "collapsed": "false" 158 | }, 159 | "source": [ 160 | "" 161 | ] 162 | }, 163 | { 164 | "cell_type": "markdown", 165 | "metadata": { 166 | "collapsed": "false" 167 | }, 168 | "source": [ 169 | "" 170 | ] 171 | }, 172 | { 173 | "cell_type": "markdown", 174 | "metadata": { 175 | "collapsed": "false" 176 | }, 177 | "source": [ 178 | "" 179 | ] 180 | }, 181 | { 182 | "cell_type": "markdown", 183 | "metadata": { 184 | "collapsed": "false" 185 | }, 186 | "source": [ 187 | "" 188 | ] 189 | } 190 | ], 191 | "metadata": { 192 | "kernelspec": { 193 | "display_name": "Python 3", 194 | "language": "python", 195 | "name": "python3" 196 | }, 197 |
"language_info": { 198 | "codemirror_mode": { 199 | "name": "ipython", 200 | "version": 3 201 | }, 202 | "file_extension": ".py", 203 | "mimetype": "text/x-python", 204 | "name": "python", 205 | "nbconvert_exporter": "python", 206 | "pygments_lexer": "ipython3", 207 | "version": "3.7.9" 208 | } 209 | }, 210 | "nbformat": 4, 211 | "nbformat_minor": 4 212 | } 213 | -------------------------------------------------------------------------------- /04_best_practices/04_best_practices.py: -------------------------------------------------------------------------------- 1 | # --- 2 | # jupyter: 3 | # jupytext: 4 | # formats: ipynb,py 5 | # text_representation: 6 | # extension: .py 7 | # format_name: light 8 | # format_version: '1.5' 9 | # jupytext_version: 1.6.0 10 | # kernelspec: 11 | # display_name: Python 3 12 | # language: python 13 | # name: python3 14 | # --- 15 | 16 | # + [markdown] slideshow={"slide_type": "slide"} 17 | # # Best Practices in Machine Learning and Code Organization 18 | 19 | # + [markdown] slideshow={"slide_type": "slide"} 20 | # ## Motivation 21 | 22 | # + [markdown] slideshow={"slide_type": "fragment"} 23 | # - What does best-practice even mean? 24 | # - How do I know something is a bad practice? 25 | 26 | # + [markdown] jupyter={"outputs_hidden": true} slideshow={"slide_type": "fragment"} 27 | # > It's not wrong, but it feels wrong. 28 | 29 | 30 | # + [markdown] slideshow={"slide_type": "slide"} 31 | # ## Overview 32 | 33 | # + [markdown] slideshow={"slide_type": "fragment"} 34 | # Best Pratices in: 35 | # - Machine Learning Code Bases and Versioning 36 | # - Code and Module organization and philosophies 37 | 38 | # + [markdown] slideshow={"slide_type": "slide"} 39 | # ## Bad vs. Best Practices in Python 40 | 41 | # + [markdown] slideshow={"slide_type": "subslide"} 42 | # ### Repetition 43 | 44 | # + [markdown] slideshow={"slide_type": "subslide"} 45 | # #### Python is not C - so do ***not*** copy-and-paste! 46 | 47 | # + [markdown] slideshow={"slide_type": "fragment"} 48 | # 49 | 50 | # + [markdown] slideshow={"slide_type": "subslide"} 51 | # #### Instead of copy & pasting: 52 | 53 | # + [markdown] slideshow={"slide_type": "fragment"} 54 | # - write functions! 55 | # - compose functions! 56 | # - create partial function! 57 | 58 | # + slideshow={"slide_type": "fragment"} 59 | def add(a, b): 60 | return a + b 61 | 62 | 63 | # + 64 | from functools import partial 65 | 66 | add2 = partial(add, 2) # Create a copy of add() with a=2 67 | 68 | add2(3) 69 | 70 | # + slideshow={"slide_type": "fragment"} 71 | add2 = lambda x: add(2, x) 72 | 73 | add2(3) 74 | 75 | 76 | # + 77 | def add2(x): 78 | return add(2, x) 79 | 80 | add2(3) 81 | 82 | 83 | # + [markdown] slideshow={"slide_type": "subslide"} 84 | # ### Switch Behavior 85 | 86 | # + [markdown] slideshow={"slide_type": "subslide"} 87 | # #### Python has no switch statements, but don't go around stacking if's: 88 | 89 | # + [markdown] slideshow={"slide_type": "fragment"} 90 | # 91 | 92 | # + [markdown] slideshow={"slide_type": "subslide"} 93 | # #### Instead of stacking if-else: 94 | 95 | # + [markdown] slideshow={"slide_type": "fragment"} 96 | # - map things with a dictionary! 97 | 98 | # + [markdown] slideshow={"slide_type": "fragment"} 99 | # Dictionaries are hashmaps, meaning the map a hash to an object. 100 | 101 | # + [markdown] slideshow={"slide_type": "fragment"} 102 | # Since Functions are first order objects in Python, they can be pointed to! 
103 | 104 | # + slideshow={"slide_type": "fragment"} 105 | def add(a, b): 106 | return a + b 107 | 108 | def add_sum(a, b): 109 | return sum([a, b]) 110 | 111 | math_functions = {'add': add_sum} 112 | 113 | math_functions['add'](2, 2) 114 | 115 | # + [markdown] slideshow={"slide_type": "subslide"} 116 | # ### Depth 117 | 118 | # + [markdown] slideshow={"slide_type": "subslide"} 119 | # #### Making too many layers - inheritance, nesting, etc. 120 | 121 | # + [markdown] slideshow={"slide_type": "fragment"} 122 | # 123 | 124 | # + [markdown] slideshow={"slide_type": "subslide"} 125 | # #### Instead keep things shallow 126 | 127 | # + [markdown] slideshow={"slide_type": "fragment"} 128 | # Ask yourself: 129 | # - Do I need this class? 130 | # - Will it be instantiated often? 131 | # - Are there many objects inheriting from it? 132 | # - Does it carry state? Otherwise it's a namespace! 133 | # - Does this need to be a submodule or a file? 134 | # - Are there many long functions? 135 | # - Are there a large number of private functions? 136 | 137 | # + [markdown] slideshow={"slide_type": "fragment"} 138 | # Singleton Pattern (Single global instance for an Object) 139 | # - If it does not carry state, it is a namespace 140 | # - In Python, any file is a namespace! No need for the Object or Instance! 141 | # - If it just carries state, you want a database 142 | # - Atomicity of operation can be guaranteed with a database 143 | # - Database outside of Global Interpreter Lock (GIL) 144 | # - Databases scale better! 145 | 146 | # + [markdown] slideshow={"slide_type": "subslide"} 147 | # ### Readability 148 | 149 | # + [markdown] slideshow={"slide_type": "subslide"} 150 | # #### Write code - but write it to be read! 151 | 152 | # + [markdown] slideshow={"slide_type": "fragment"} 153 | # 154 | 155 | # + [markdown] slideshow={"slide_type": "subslide"} 156 | # #### Code is written to be read 157 | 158 | # + [markdown] slideshow={"slide_type": "fragment"} 159 | # - Documentation 160 | # - Type Hinting 161 | # - Naming 162 | 163 | # + [markdown] slideshow={"slide_type": "subslide"} 164 | # ### Dependencies 165 | 166 | # + [markdown] slideshow={"slide_type": "subslide"} 167 | # #### Sometimes they're too tempting 168 | 169 | # + [markdown] slideshow={"slide_type": "fragment"} 170 | # 171 | 172 | # + [markdown] slideshow={"slide_type": "subslide"} 173 | # #### Why? 174 | 175 | # + [markdown] slideshow={"slide_type": "fragment"} 176 | # - Projects get abandoned 177 | # - Lack of security patches 178 | # - Forced to stay with old versions 179 | # - => Your project becomes ancient 180 | # - 181 | 182 | # Update regularly! 183 | # - Small bugs on a regular basis prevent abandonment 184 | # - Improved performance 185 | # - Additional functionality! 186 | 187 | # + [markdown] slideshow={"slide_type": "subslide"} 188 | # ### Keep things short 189 | 190 | # + [markdown] slideshow={"slide_type": "subslide"} 191 | # #### The first law of Software Quality 192 | 193 | # + [markdown] slideshow={"slide_type": "fragment"} 194 | # 195 | 196 | # + [markdown] slideshow={"slide_type": "subslide"} 197 | # #### Sometimes less functionality is more maintainability 198 | 199 | # + [markdown] slideshow={"slide_type": "fragment"} 200 | # > Each line of code is a credit you take on and interest is paid in time to maintain the base. Don't default on your code debt. 201 | 202 | # + [markdown] slideshow={"slide_type": "fragment"} 203 | # Finding non-critical code: 204 | # - Is this functionality used by many?
205 | # - Is this code still used or abandoned? 206 | # - Is it relevant to the larger goal? 207 | 208 | # + [markdown] slideshow={"slide_type": "fragment"} 209 | # Dealing with too much code: 210 | # - Spin out functionality into a different module 211 | # - Simplify the code 212 | # - Delete code 213 | # - No really, you should delete code 214 | 215 | # + [markdown] slideshow={"slide_type": "subslide"} 216 | # ### Use version control 217 | 218 | # + [markdown] slideshow={"slide_type": "fragment"} 219 | # 220 | -------------------------------------------------------------------------------- /06_advanced_python/jupyter_addons.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Useful tools for your workflow" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## Spell Check!" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "https://github.com/ijmbarr/jupyterlab_spellchecker" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "`jupyter labextension install @ijmbarr/jupyterlab_spellchecker`" 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "### Using LaTeX in PowerPoint" 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": {}, 41 | "source": [ 42 | "https://www.fast.ai/2019/06/17/latex-ppt/" 43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "metadata": {}, 48 | "source": [ 49 | "## Table of Contents\n", 50 | "\n", 51 | "https://github.com/ian-r-rose/jupyterlab-toc" 52 | ] 53 | }, 54 | { 55 | "cell_type": "markdown", 56 | "metadata": {}, 57 | "source": [ 58 | "```bash\n", 59 | "jupyter labextension install @jupyterlab/toc\n", 60 | "```" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "## Collapsible Headings\n", 68 | "\n", 69 | "https://github.com/aquirdTurtle/Collapsible_Headings" 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": {}, 75 | "source": [ 76 | "```bash\n", 77 | "jupyter labextension install @aquirdturtle/collapsible_headings\n", 78 | "```" 79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": {}, 84 | "source": [ 85 | "## Go-To-Definition\n", 86 | "\n", 87 | "https://github.com/krassowski/jupyterlab-go-to-definition\n", 88 | "\n", 89 | "```bash\n", 90 | "jupyter labextension install @krassowski/jupyterlab_go_to_definition\n", 91 | "```" 92 | ] 93 | }, 94 | { 95 | "cell_type": "markdown", 96 | "metadata": {}, 97 | "source": [ 98 | "## Notifications for the completion of long-running Jupyter code\n", 99 | "\n", 100 | "https://github.com/ShopRunner/jupyter-notify" 101 | ] 102 | }, 103 | { 104 | "cell_type": "markdown", 105 | "metadata": {}, 106 | "source": [ 107 | "### Install JupyterNotify\n", 108 | "```bash\n", 109 | "pip install jupyternotify\n", 110 | "```" 111 | ] 112 | }, 113 | { 114 | "cell_type": "markdown", 115 | "metadata": {}, 116 | "source": [ 117 | "### Enable Notification " 118 | ] 119 | }, 120 | { 121 | "cell_type": "markdown", 122 | "metadata": {}, 123 | "source": [ 124 | "#### Activate Javascript (only JupyterLab)\n", 125 | "```python\n", 126 | "%%javascript\n", 127 | "var jq = document.createElement('script');\n", 128 | "jq.src = \"https://ajax.googleapis.com/ajax/libs/jquery/2.1.4/jquery.min.js\";\n", 129 |
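"// the surrounding lines build a <script> tag and append it to <head>, loading jQuery for the notification widget\n",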
"document.getElementsByTagName('head')[0].appendChild(jq);\n", 130 | "```" 131 | ] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "metadata": {}, 136 | "source": [ 137 | "#### Enable extension\n", 138 | "```python\n", 139 | "%load_ext jupyternotify\n", 140 | "``` " 141 | ] 142 | }, 143 | { 144 | "cell_type": "markdown", 145 | "metadata": {}, 146 | "source": [ 147 | "### Using Notification\n", 148 | "```python\n", 149 | "%%notify\n", 150 | "print(\"hi, when this is written, you'll get a notification!\")\n", 151 | "```" 152 | ] 153 | }, 154 | { 155 | "cell_type": "markdown", 156 | "metadata": {}, 157 | "source": [ 158 | "#### Giving a message\n", 159 | "\n", 160 | "\n", 161 | "```python\n", 162 | "%%notify -m \"this is the notification message\"\n", 163 | "```\n", 164 | "\n", 165 | "```python\n", 166 | "%%notify -o\n", 167 | "time.sleep(4)\n", 168 | "'this is the notification messsage'\n", 169 | "```" 170 | ] 171 | }, 172 | { 173 | "cell_type": "markdown", 174 | "metadata": {}, 175 | "source": [ 176 | "## Bell Notification\n", 177 | "\n", 178 | "https://github.com/samwhitehall/ipython-bell" 179 | ] 180 | }, 181 | { 182 | "cell_type": "markdown", 183 | "metadata": {}, 184 | "source": [ 185 | "## Visualizations" 186 | ] 187 | }, 188 | { 189 | "cell_type": "markdown", 190 | "metadata": {}, 191 | "source": [ 192 | "### Dash Plugin\n", 193 | "https://github.com/plotly/jupyterlab-dash\n", 194 | "```bash\n", 195 | "jupyter labextension install jupyterlab-dash@0.1.0-alpha.3\n", 196 | "```" 197 | ] 198 | }, 199 | { 200 | "cell_type": "markdown", 201 | "metadata": {}, 202 | "source": [ 203 | "### Bokeh Plugin\n", 204 | "https://github.com/bokeh/jupyter_bokeh\n", 205 | "```bash\n", 206 | "conda install -c bokeh jupyter_bokeh\n", 207 | "jupyter labextension install @jupyter-widgets/jupyterlab-manager\n", 208 | "jupyter labextension install @bokeh/jupyter_bokeh\n", 209 | "```" 210 | ] 211 | }, 212 | { 213 | "cell_type": "markdown", 214 | "metadata": {}, 215 | "source": [ 216 | "## Enhanced Multiprocessing\n", 217 | "\n", 218 | "https://github.com/krassowski/enhanced-multiprocessing\n", 219 | "\n", 220 | "```bash\n", 221 | "pip install enhanced_multiprocessing\n", 222 | "```" 223 | ] 224 | }, 225 | { 226 | "cell_type": "markdown", 227 | "metadata": {}, 228 | "source": [ 229 | "## Helpers\n", 230 | "\n", 231 | "https://github.com/krassowski/jupyter-helpers\n", 232 | "\n", 233 | "```bash\n", 234 | "pip install jupyter_helpers\n", 235 | "```" 236 | ] 237 | }, 238 | { 239 | "cell_type": "markdown", 240 | "metadata": {}, 241 | "source": [ 242 | "## Jupytext for better git diffs" 243 | ] 244 | }, 245 | { 246 | "cell_type": "markdown", 247 | "metadata": {}, 248 | "source": [ 249 | "https://github.com/mwouts/jupytext" 250 | ] 251 | }, 252 | { 253 | "cell_type": "markdown", 254 | "metadata": {}, 255 | "source": [ 256 | "```bash\n", 257 | "pip install jupytext\n", 258 | "```" 259 | ] 260 | }, 261 | { 262 | "cell_type": "markdown", 263 | "metadata": {}, 264 | "source": [ 265 | "## Make presentations out of Notebooks!" 
261 | { 262 | "cell_type": "markdown", 263 | "metadata": {}, 264 | "source": [ 265 | "## Make presentations out of Notebooks!" 266 | ] 267 | }, 268 | { 269 | "cell_type": "markdown", 270 | "metadata": {}, 271 | "source": [ 272 | "https://github.com/damianavila/RISE" 273 | ] 274 | }, 275 | { 276 | "cell_type": "markdown", 277 | "metadata": {}, 278 | "source": [ 279 | "```bash\n", 280 | "pip install rise\n", 281 | "``` " 282 | ] 283 | }, 284 | { 285 | "cell_type": "markdown", 286 | "metadata": {}, 287 | "source": [ 288 | "For automatically designating slide_types based on markdown headers, consider: https://github.com/the-rccg/ipynb_slidetype_generator" 289 | ] 290 | } 291 | ], 292 | "metadata": { 293 | "kernelspec": { 294 | "display_name": "Python 3", 295 | "language": "python", 296 | "name": "python3" 297 | }, 298 | "language_info": { 299 | "codemirror_mode": { 300 | "name": "ipython", 301 | "version": 3 302 | }, 303 | "file_extension": ".py", 304 | "mimetype": "text/x-python", 305 | "name": "python", 306 | "nbconvert_exporter": "python", 307 | "pygments_lexer": "ipython3", 308 | "version": "3.7.9" 309 | } 310 | }, 311 | "nbformat": 4, 312 | "nbformat_minor": 4 313 | } 314 | -------------------------------------------------------------------------------- /03_continuous_integration/03_continuous_integration.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "Collapsed": "false", 7 | "slideshow": { 8 | "slide_type": "slide" 9 | } 10 | }, 11 | "source": [ 12 | "# The API & Docker course" 13 | ] 14 | }, 15 | { 16 | "cell_type": "markdown", 17 | "metadata": { 18 | "Collapsed": "false", 19 | "slideshow": { 20 | "slide_type": "slide" 21 | }, 22 | "tags": [] 23 | }, 24 | "source": [ 25 | "In this session, we try to implement an API using Docker and Python. The code shall be documented and stored in git. We will talk about testing scenarios and continuous integration.\n", 26 | "\n", 27 | "Project: The goal of the project is to create an API which receives inputs in the form of lists, predicts the target value, and sends back the results. For example, given the input\n", 28 | "\n", 29 | "```bash\n", 30 | " curl -d '{\"features\":[1,2,3,4]}' \\\n", 31 | " -H \"Content-Type: application/json\" \\\n", 32 | " -X POST http://localhost:8000/iris_api\n", 33 | "```" 34 | ] 35 | },
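{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The same request can be sent from Python (a sketch for illustration only; it assumes the service from this session is already running locally and that `requests` is installed):\n", "\n", "```python\n", "import requests\n", "\n", "response = requests.post(\n", "    'http://localhost:8000/iris_api', json={'features': [1, 2, 3, 4]}\n", ")\n", "print(response.status_code, response.text)\n", "```" ] },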
36 | { 37 | "cell_type": "markdown", 38 | "metadata": { 39 | "Collapsed": "false", 40 | "slideshow": { 41 | "slide_type": "slide" 42 | } 43 | }, 44 | "source": [ 45 | "\n", 46 | "For this part, you need to have a GitHub account. I highly recommend using GitKraken as your UI for git commands.\n", 47 | "\n", 48 | "## Implementation Part 1: repository setup\n", 49 | "\n", 50 | "1. Create a GitHub repository named `iris-api`\n", 51 | "2. Add a .gitignore for Python code\n", 52 | "3. Clone the repository on your computer\n", 53 | "4. Copy the content of the iris-api folder from https://github.com/Mu-DS/practical_training/tree/master/03_continuous_integration/iris-api into your repository\n", 54 | "5. Commit and push\n", 55 | "\n", 56 | "\n", 57 | "https://github.github.com/training-kit/downloads/github-git-cheat-sheet.pdf" 58 | ] 59 | }, 60 | { 61 | "cell_type": "code", 62 | "execution_count": null, 63 | "metadata": {}, 64 | "outputs": [], 65 | "source": [] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "metadata": { 70 | "slideshow": { 71 | "slide_type": "slide" 72 | } 73 | }, 74 | "source": [ 75 | "## HTTP Request Methods\n", 76 | "\n", 77 | "\n", 78 | "What is HTTP?\n", 79 | "The Hypertext Transfer Protocol (HTTP) is designed to enable communications between clients and servers.\n", 80 | "\n", 81 | "HTTP works as a request-response protocol between a client and server.\n", 82 | "\n", 83 | "Example: A client (browser) sends an HTTP request to the server; then the server returns a response to the client. The response contains status information about the request and may also contain the requested content.\n", 84 | "\n", 85 | "\n", 86 | "https://www.w3schools.com/tags/ref_httpmethods.asp\n", 87 | "\n", 88 | "## HTTP Status codes\n", 89 | "\n", 90 | "- 200 OK\n", 91 | "- 400 Bad request\n", 92 | "- 500 Internal Server Error\n", 93 | "\n", 94 | "https://en.wikipedia.org/wiki/List_of_HTTP_status_codes" 95 | ] 96 | }, 97 | { 98 | "cell_type": "markdown", 99 | "metadata": { 100 | "Collapsed": "false", 101 | "slideshow": { 102 | "slide_type": "slide" 103 | } 104 | }, 105 | "source": [ 106 | "## Web frameworks for open science access\n", 107 | "\n", 108 | "Frameworks:\n", 109 | "\n", 110 | "- Flask [https://flask.palletsprojects.com/en/1.1.x/tutorial/] [https://blog.miguelgrinberg.com/post/the-flask-mega-tutorial-part-i-hello-world]\n", 111 | "- Django\n", 112 | "- Bottle\n", 113 | "- Falcon (we use this one!) [https://falcon.readthedocs.io/en/stable/]" 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": { 119 | "slideshow": { 120 | "slide_type": "slide" 121 | } 122 | }, 123 | "source": [ 124 | "## The GET Method\n", 125 | "\n", 126 | "GET is used to request data from a specified resource.\n", 127 | "\n", 128 | "GET is one of the most common HTTP methods.\n", 129 | "\n", 130 | "Note that the query string (name/value pairs) is sent in the URL of a GET request:\n", 131 | "\n", 132 | "    /test/demo_form.php?name1=value1&name2=value2\n", 133 | "\nSome other notes on GET requests:\n", 134 | "\n", 135 | "- GET requests can be cached\n", 136 | "- GET requests remain in the browser history\n", 137 | "- GET requests can be bookmarked\n", 138 | "- GET requests should never be used when dealing with sensitive data\n", 139 | "- GET requests have length restrictions\n", 140 | "- GET requests are only used to request data (not modify)\n" 141 | ] 142 | }, 143 | { 144 | "cell_type": "markdown", 145 | "metadata": { 146 | "slideshow": { 147 | "slide_type": "slide" 148 | } 149 | }, 150 | "source": [ 151 | "## The POST Method\n", 152 | "\n", 153 | "POST is used to send data to a server to create/update a resource.\n", 154 | "\n", 155 | "The data sent to the server with POST is stored in the request body of the HTTP request:\n", 156 | "\n", 157 | "    POST /test/demo_form.php HTTP/1.1\n", 158 | "    Host: w3schools.com\n", 159 | "\n    name1=value1&name2=value2\n", 160 | "\nPOST is one of the most common HTTP methods.\n", 161 | "\n", 162 | "Some other notes on POST requests:\n", 163 | "\n", 164 | "- POST requests are never cached\n", 165 | "- POST requests do not remain in the browser history\n", 166 | "- POST requests cannot be bookmarked\n", 167 | "- POST requests have no restrictions on data length" 168 | ] 169 | },
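{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## A minimal Falcon resource (sketch)\n", "\n", "Before the exercise, here is a rough sketch of what a Falcon resource handling a POST request looks like (illustrative only -- the names below are made up and this is not the actual IrisPredictorResource):\n", "\n", "```python\n", "import falcon\n", "\n", "class EchoResource:\n", "    def on_post(self, req, resp):\n", "        payload = req.media  # parsed JSON request body\n", "        resp.media = {'received': payload}  # serialized back to JSON\n", "        resp.status = falcon.HTTP_200\n", "\n", "api = falcon.API()  # falcon.App() in falcon >= 3.0\n", "api.add_route('/echo', EchoResource())\n", "```" ] },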
170 | { 171 | "cell_type": "markdown", 172 | "metadata": { 173 | "Collapsed": "false", 174 | "slideshow": { 175 | "slide_type": "slide" 176 | } 177 | }, 178 | "source": [ 179 | "## Implementation 2: Coding part\n", 180 | "\n", 181 | "There are missing parts in '/resources/IrisPredictorResource.py'.\n", 182 | "\n", 183 | "Finish the code with your teammate." 184 | ] 185 | },
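{ "cell_type": "markdown", "metadata": { "Collapsed": "false", "slideshow": { "slide_type": "fragment" } }, "source": [ "Once the missing parts are filled in, a unit test for the API could look like the following sketch (anticipating Implementation 4 below; the module and attribute names here are assumptions, not the actual exercise code):\n", "\n", "```python\n", "import json\n", "\n", "import falcon\n", "from falcon import testing\n", "\n", "from main import app  # assumes main.py exposes the falcon API object as 'app'\n", "\n", "def test_iris_api_post():\n", "    client = testing.TestClient(app)\n", "    result = client.simulate_post(\n", "        '/iris_api', body=json.dumps({'features': [1, 2, 3, 4]})\n", "    )\n", "    assert result.status == falcon.HTTP_200\n", "```" ] },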
186 | { 187 | "cell_type": "markdown", 188 | "metadata": { 189 | "Collapsed": "false", 190 | "slideshow": { 191 | "slide_type": "slide" 192 | } 193 | }, 194 | "source": [ 195 | "### API structure\n", 196 | "\n", 197 | "\n", 198 | "The structure of the root folder should look like this:\n", 199 | "\n", 200 | "```\n", 201 | "root/\n", 202 | " ├── resources/ \n", 203 | " │\n", 204 | " ├── tests/\n", 205 | " │\n", 206 | " ├── bin/ \n", 207 | " │\n", 208 | " ├── models/ \n", 209 | " │\n", 210 | " ├── .gitignore \n", 211 | " │\n", 212 | " ├── Dockerfile \n", 213 | " │\n", 214 | " ├── main.py \n", 215 | " │\n", 216 | " ├── LICENSE \n", 217 | " │\n", 218 | " ├── service.yaml \n", 219 | " │ \n", 220 | " └── README.md \n", 221 | "```\n", 222 | "\n" 223 | ] 224 | }, 225 | { 226 | "cell_type": "markdown", 227 | "metadata": { 228 | "Collapsed": "false", 229 | "slideshow": { 230 | "slide_type": "slide" 231 | } 232 | }, 233 | "source": [ 234 | "## Docker\n", 235 | "\n", 236 | "Developing apps today requires so much more than writing code. Multiple languages, frameworks, architectures, and discontinuous interfaces between tools for each lifecycle stage creates enormous complexity. Docker simplifies and accelerates your workflow, while giving developers the freedom to innovate with their choice of tools, application stacks, and deployment environments for each project.\n", 237 | "\n", 238 | "https://www.docker.com/sites/default/files/d8/2019-09/docker-cheat-sheet.pdf" 239 | ] 240 | }, 241 | { 242 | "cell_type": "markdown", 243 | "metadata": { 244 | "Collapsed": "false", 245 | "slideshow": { 246 | "slide_type": "slide" 247 | } 248 | }, 249 | "source": [ 250 | "## Implementation 3: Docker part\n", 251 | "\n", 252 | "We use Docker as our container. We also use gunicorn for handling the calls.\n", 253 | "https://gunicorn.org/\n", 254 | "\n", 255 | "1. Finalize the docker file '/Dockerfile'.\n", 256 | "2. Install Docker using the README\n", 257 | "3. Run the Docker container\n", 258 | "4. Test the Docker container using a curl command\n" 259 | ] 260 | }, 261 | { 262 | "cell_type": "markdown", 263 | "metadata": { 264 | "slideshow": { 265 | "slide_type": "slide" 266 | } 267 | }, 268 | "source": [ 269 | "## GitHub Apps for CI\n", 270 | "\n", 271 | "You can use different GitHub apps to check your code quality\n", 272 | "\n", 273 | "Here we use Travis CI & Codacy\n", 274 | "\n", 275 | "Travis CI: https://travis-ci.org/\n", 276 | "\n", 277 | "Codacy: https://codacy.com" 278 | ] 279 | }, 280 | { 281 | "cell_type": "markdown", 282 | "metadata": { 283 | "slideshow": { 284 | "slide_type": "slide" 285 | } 286 | }, 287 | "source": [ 288 | "## Implementation 4: Add unit testing\n", 289 | "\n", 290 | "Add different test scenarios for your functions to check that everything works\n", 291 | "\n", 292 | "Add these unit tests in the '/tests/' folder" 293 | ] 294 | }, 295 | { 296 | "cell_type": "markdown", 297 | "metadata": { 298 | "slideshow": { 299 | "slide_type": "slide" 300 | } 301 | }, 302 | "source": [ 303 | "## Implementation 5: CI apps\n", 304 | "\n", 305 | "Log in to Travis and Codacy and connect them to your GitHub account" 306 | ] 307 | }, 308 | { 309 | "cell_type": "markdown", 310 | "metadata": { 311 | "slideshow": { 312 | "slide_type": "slide" 313 | } 314 | }, 315 | "source": [ 316 | "## Implementation 6: Travis\n", 317 | "\n", 318 | "Add this content to the '.travis.yml' file\n", 319 | "\n", 320 | "```yaml\n", 321 | "os:\n", 322 | " - linux\n", 323 | "\n", 324 | "language: python\n", 325 | "\n", 326 | "python:\n", 327 | " - \"3.6\"\n", 328 | " - \"3.7\"\n", 329 | "\n", 330 | "script:\n", 331 | " - pytest\n", 332 | "```" 333 | ] 334 | }, 335 | { 336 | "cell_type": "code", 337 | "execution_count": null, 338 | "metadata": {}, 339 | "outputs": [], 340 | "source": [] 341 | } 342 | ], 343 | "metadata": { 344 | "celltoolbar": "Slideshow", 345 | "jupytext": { 346 | "formats": "ipynb,py" 347 | }, 348 | "kernelspec": { 349 | "display_name": "Python 3", 350 | "language": "python", 351 | "name": "python3" 352 | }, 353 | "language_info": { 354 | "codemirror_mode": { 355 | "name": "ipython", 356 | "version": 3 357 | }, 358 | "file_extension": ".py", 359 | "mimetype": "text/x-python", 360 | "name": "python", 361 | "nbconvert_exporter": "python", 362 | "pygments_lexer": "ipython3", 363 | "version": "3.7.4" 364 | } 365 | }, 366 | "nbformat": 4, 367 | "nbformat_minor": 4 368 | } 369 | -------------------------------------------------------------------------------- /00_intro/00_intro.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "slideshow": { 7 | "slide_type": "slide" 8 | } 9 | }, 10 | "source": [ 11 | "# Why Practical Training is Crucial" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": { 17 | "slideshow": { 18 | "slide_type": "slide" 19 | } 20 | }, 21 | "source": [ 22 | "## Why bridging the gap is important" 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": { 28 | "slideshow": { 29 | "slide_type": "fragment" 30 | } 31 | }, 32 | "source": [ 33 | "" 34 | ] 35 | }, 36 | { 37 | "cell_type": "markdown", 38 | "metadata": { 39 | "slideshow": { 40 | "slide_type": "fragment" 41 | } 42 | }, 43 | "source": [ 44 | "" 45 | ] 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "metadata": { 50 | "slideshow": { 51 | "slide_type": "subslide" 52 | } 53 | }, 54 | "source": [ 55 | "### How to bridge the gap" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | 
"metadata": { 61 | "slideshow": { 62 | "slide_type": "fragment" 63 | } 64 | }, 65 | "source": [ 66 | "There are plenty of expensive courses, thick books, and jaded postdocs telling you how to do things in theory - and that's great! " 67 | ] 68 | }, 69 | { 70 | "cell_type": "markdown", 71 | "metadata": { 72 | "slideshow": { 73 | "slide_type": "fragment" 74 | } 75 | }, 76 | "source": [ 77 | "But... how do you get there?" 78 | ] 79 | }, 80 | { 81 | "cell_type": "markdown", 82 | "metadata": { 83 | "slideshow": { 84 | "slide_type": "fragment" 85 | } 86 | }, 87 | "source": [ 88 | "Let's play a little game, can you tell me how to do each of these?" 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "metadata": { 94 | "slideshow": { 95 | "slide_type": "fragment" 96 | } 97 | }, 98 | "source": [ 99 | "| Problem | Implementation |\n", 100 | "|--------------------------------------------------------|----------------|\n", 101 | "| Develop this new framework over the next 6 months | ? |\n", 102 | "| Adding this feature will take a while | ? |\n", 103 | "| The code you're writing is turnign into a monster | ? |\n", 104 | "| Hmmm this Jupyter notebook is gettign too long | ? |\n", 105 | "| Writing documentation is too troublesome | ? |\n", 106 | "| The code of the other developer looks terrible | ? |\n", 107 | "| \"Why is there a super() in here???\" | ? |\n", 108 | "| \"You know, you should make this script run with a CLI\" | ? |\n", 109 | "| \"How are these objects related?\" | ? |\n", 110 | "| Chart the structure of your project | ? |\n", 111 | "| Figure out which part is slowing down the code | ? |\n", 112 | "| Speed up this NumPy code | ? |\n", 113 | "| This loop is really slow... | |" 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": { 119 | "slideshow": { 120 | "slide_type": "fragment" 121 | } 122 | }, 123 | "source": [ 124 | "Solution:" 125 | ] 126 | }, 127 | { 128 | "cell_type": "markdown", 129 | "metadata": { 130 | "slideshow": { 131 | "slide_type": "fragment" 132 | } 133 | }, 134 | "source": [ 135 | "| Problem | Implementation |\n", 136 | "|--------------------------------------------------------|----------------|\n", 137 | "| Develop this new framework over the next 6 months | Agile, Sprint planning, etc. |\n", 138 | "| Adding this feature will take a while | Sprint planning, review process |\n", 139 | "| The code you're writing is turnign into a monster | Code architecture, refactoring |\n", 140 | "| Hmmm this Jupyter notebook is gettign too long | module architecture |\n", 141 | "| Writing documentation is too troublesome | AutoDoc, Docstring creator, etc |\n", 142 | "| The code of the other developer looks terrible | code formatter, linting |\n", 143 | "| \"Why is there a super() in here???\" | Java Developers |\n", 144 | "| \"What are these properties?\" | Setters and Getters |\n", 145 | "| \"You know, you should make this script run with a CLI\" | click |\n", 146 | "| \"How are these objects related?\" | Coda Analytzr |\n", 147 | "| \"Can you show me how this project is structured?\" | UML, Code Analyzer |\n", 148 | "| Figure out which part is slowing down the code | Dynamic Code Analyzer / Profiler |\n", 149 | "| Speed up this NumPy code | Numba |\n", 150 | "| This loop is really slow.... | Map(), Numba, Dask |\n", 151 | "| I should run this in parallel... 
| Multiprocessing, Dask |" 152 | ] 153 | }, 154 | { 155 | "cell_type": "markdown", 156 | "metadata": { 157 | "slideshow": { 158 | "slide_type": "slide" 159 | } 160 | }, 161 | "source": [ 162 | "# Is this Course for you?" 163 | ] 164 | }, 165 | { 166 | "cell_type": "markdown", 167 | "metadata": { 168 | "slideshow": { 169 | "slide_type": "fragment" 170 | } 171 | }, 172 | "source": [ 173 | "" 174 | ] 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "metadata": { 179 | "slideshow": { 180 | "slide_type": "fragment" 181 | } 182 | }, 183 | "source": [ 184 | "Have you ever gotten..." 185 | ] 186 | }, 187 | { 188 | "cell_type": "markdown", 189 | "metadata": { 190 | "jupyter": { 191 | "outputs_hidden": true 192 | }, 193 | "lines_to_next_cell": 2, 194 | "slideshow": { 195 | "slide_type": "fragment" 196 | } 197 | }, 198 | "source": [ 199 | "- a shared project in your group, but couldn't figure out what 80\\% of the functions or objects did?" 200 | ] 201 | }, 202 | { 203 | "cell_type": "markdown", 204 | "metadata": { 205 | "slideshow": { 206 | "slide_type": "fragment" 207 | } 208 | }, 209 | "source": [ 210 | "- code from a previous student/PhD/postdoc and thought -- WHAT THE F- is this?!" 211 | ] 212 | }, 213 | { 214 | "cell_type": "markdown", 215 | "metadata": { 216 | "slideshow": { 217 | "slide_type": "fragment" 218 | } 219 | }, 220 | "source": [ 221 | "- a bachelor student to use your code and only gotten stupid questions from them?" 222 | ] 223 | }, 224 | { 225 | "cell_type": "markdown", 226 | "metadata": { 227 | "slideshow": { 228 | "slide_type": "fragment" 229 | } 230 | }, 231 | "source": [ 232 | "I hate to break it to you, but you also write bad code" 233 | ] 234 | }, 235 | { 236 | "cell_type": "markdown", 237 | "metadata": { 238 | "slideshow": { 239 | "slide_type": "fragment" 240 | } 241 | }, 242 | "source": [ 243 | "We all write bad code, and the point is not to write perfect code, but to write less bad code.\n", 244 | "\n", 245 | "Just a world with less bad code. That's the dream." 246 | ] 247 | }, 248 | { 249 | "cell_type": "markdown", 250 | "metadata": { 251 | "slideshow": { 252 | "slide_type": "slide" 253 | } 254 | }, 255 | "source": [ 256 | "# Exercise" 257 | ] 258 | }, 259 | { 260 | "cell_type": "markdown", 261 | "metadata": { 262 | "slideshow": { 263 | "slide_type": "fragment" 264 | } 265 | }, 266 | "source": [ 267 | "- Pair up in groups of 2 or 3\n", 268 | "- Show the other person the last piece of Python code you wrote\n", 269 | "- Spend 5 minutes trying to understand it\n", 270 | "- Discuss the code" 271 | ] 272 | }, 273 | { 274 | "cell_type": "markdown", 275 | "metadata": { 276 | "slideshow": { 277 | "slide_type": "slide" 278 | } 279 | }, 280 | "source": [ 281 | "# Overview of the Course" 282 | ] 283 | }, 284 | { 285 | "cell_type": "markdown", 286 | "metadata": { 287 | "slideshow": { 288 | "slide_type": "fragment" 289 | } 290 | }, 291 | "source": [ 292 | "1. Fundamentals of Production Code\n", 293 | " - Workflow Organization\n", 294 | " - Environments\n", 295 | " - Code Style and Formatters\n", 296 | " - Design Patterns\n", 297 | " - Thinking Functionally\n", 298 | " - Module Architecture\n", 299 | " - CLI Interfaces" 300 | ] 301 | }, 302 | { 303 | "cell_type": "markdown", 304 | "metadata": { 305 | "slideshow": { 306 | "slide_type": "fragment" 307 | } 308 | }, 309 | "source": [ 310 | "2. 
Data Management Fundamentals\n", 311 | " - Pre-SQL\n", 312 | " - SQL\n", 313 | " - NoSQL\n", 314 | " - Graph Databases" 315 | ] 316 | }, 317 | { 318 | "cell_type": "markdown", 319 | "metadata": { 320 | "slideshow": { 321 | "slide_type": "fragment" 322 | } 323 | }, 324 | "source": [ 325 | "3. Continuous Integration Pipeline\n", 326 | " - Git\n", 327 | " - Unit Tests\n", 328 | " - Docker\n", 329 | " - APIs" 330 | ] 331 | }, 332 | { 333 | "cell_type": "markdown", 334 | "metadata": { 335 | "slideshow": { 336 | "slide_type": "fragment" 337 | } 338 | }, 339 | "source": [ 340 | "4. Best Practices in Data Science\n", 341 | " - Machine Learning \n", 342 | " - Coding" 343 | ] 344 | }, 345 | { 346 | "cell_type": "markdown", 347 | "metadata": { 348 | "slideshow": { 349 | "slide_type": "fragment" 350 | } 351 | }, 352 | "source": [ 353 | "5. Processing Data Efficiently\n", 354 | " - TensorFlow\n", 355 | " - Network Architectures & Applications\n", 356 | " - Slurm\n", 357 | " - Numba\n", 358 | " - Dask" 359 | ] 360 | } 361 | ], 362 | "metadata": { 363 | "jupytext": { 364 | "formats": "ipynb,py" 365 | }, 366 | "kernelspec": { 367 | "display_name": "Python 3", 368 | "language": "python", 369 | "name": "python3" 370 | }, 371 | "language_info": { 372 | "codemirror_mode": { 373 | "name": "ipython", 374 | "version": 3 375 | }, 376 | "file_extension": ".py", 377 | "mimetype": "text/x-python", 378 | "name": "python", 379 | "nbconvert_exporter": "python", 380 | "pygments_lexer": "ipython3", 381 | "version": "3.7.7" 382 | } 383 | }, 384 | "nbformat": 4, 385 | "nbformat_minor": 4 386 | } -------------------------------------------------------------------------------- /04_best_practices/slurm.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "slideshow": { 7 | "slide_type": "slide" 8 | } 9 | }, 10 | "source": [ 11 | "# Slurm\n", 12 | "Slurm is a widely used cluster manager and job scheduling system. It's used to submit jobs in an HPC system. " 13 | ] 14 | }, 15 | { 16 | "cell_type": "markdown", 17 | "metadata": { 18 | "slideshow": { 19 | "slide_type": "subslide" 20 | } 21 | }, 22 | "source": [ 23 | "### First contact with Slurm\n", 24 | "Basic commands to communicate with the cluster are:\n", 25 | "- `srun` to directly run a command on a computing node. This is usually used to have interactive sessions with slurm\n", 26 | "- `sinfo` to get info on the cluster's partitions and nodes\n- `squeue` to get info on specific jobs (selected by jobid, user, etc.). Useful to monitor your jobs\n", 27 | "- `sbatch` to submit jobs to the cluster. This is useful if you want to submit scripts etc. 
" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": { 33 | "slideshow": { 34 | "slide_type": "slide" 35 | } 36 | }, 37 | "source": [ 38 | "## How does the cluster look like\n", 39 | "We can use `sinfo` to get a glimpse of the cluster structure:\n", 40 | "```bash\n", 41 | "PARTITION AVAIL TIMELIMIT NODES STATE NODELIST\n", 42 | "icb_cpu* up 7-00:00:00 15 mix ibis216-010-[022-023,034-035,051,064,071],ibis216-224-[010-011],icb-neu-[001-003],icb-rsrv[05-06,08]\n", 43 | "icb_cpu* up 7-00:00:00 22 alloc ibis-ceph-[002-006,008-019],ibis216-010-[011-012,020-021,033]\n", 44 | "icb_cpu* up 7-00:00:00 19 idle ibis216-010-[001-004,007,024-032,036-037,068-070]\n", 45 | "icb_gpu up 7-00:00:00 9 mix icb-gpusrv[02-08],supergpu02pxe,supergpu03pxe\n", 46 | "icb_gpu up 7-00:00:00 1 idle icb-gpusrv01\n", 47 | "icb_interactive up 12:00:00 9 down* clara,fonsi,heidi,hias,icb-lisa,icb-mona,icb-sarah,sepp,wastl\n", 48 | "icb_interactive up 12:00:00 1 mix icb-iris\n", 49 | "icb_rstrct up 5-00:00:00 1 mix icb-neu-003\n", 50 | "bcf up 12-00:00:0 1 mix ibis216-010-005\n", 51 | "bcf up 12-00:00:0 1 idle ibis216-010-006\n", 52 | "```" 53 | ] 54 | }, 55 | { 56 | "cell_type": "markdown", 57 | "metadata": { 58 | "slideshow": { 59 | "slide_type": "slide" 60 | } 61 | }, 62 | "source": [ 63 | "## What are the running jobs?\n", 64 | "We can use `squeue` to get that info\n", 65 | "```bash\n", 66 | " 535882 icb_cpu nf-Veloc thomas.w R 1-00:59:00 1 ibis216-224-010\n", 67 | " 538003 icb_cpu rhapsody emilio.d R 22:16:26 1 ibis216-010-071\n", 68 | " 541083 icb_gpu EMBEDDIN leander. R 51:45 1 supergpu03pxe\n", 69 | " 541090 icb_gpu EMBEDDIN leander. R 42:29 1 supergpu03pxe\n", 70 | " 541091 icb_gpu EMBEDDIN leander. R 41:46 1 supergpu03pxe\n", 71 | "```" 72 | ] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "metadata": { 77 | "slideshow": { 78 | "slide_type": "slide" 79 | } 80 | }, 81 | "source": [ 82 | "## How about a specific job?\n", 83 | "We can look at specific jobs with `scontrol show jobid [JOBID]`\n", 84 | "```bash\n", 85 | "(base) [giovanni.palla@vicb-submit-02 cpu_interactive]$ sbatch submit_interactive.sh\n", 86 | "Submitted batch job 543650\n", 87 | "(base) [giovanni.palla@vicb-submit-02 cpu_interactive]$ sq\n", 88 | " 543650 icb_cpu interact giovanni R 0:00 1 ibis216-010-051\n", 89 | "(base) [giovanni.palla@vicb-submit-02 cpu_interactive]$ scontrol show jobid 543650\n", 90 | "JobId=543650 JobName=interactive\n", 91 | " UserId=giovanni.palla(138707) GroupId=OG-ICB-User(20000) MCS_label=N/A\n", 92 | " Priority=4294048901 Nice=1000 Account=icb-user QOS=icb_stndrd\n", 93 | " JobState=RUNNING Reason=None Dependency=(null)\n", 94 | " Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0\n", 95 | " RunTime=00:00:12 TimeLimit=10:00:00 TimeMin=N/A\n", 96 | " SubmitTime=2020-09-10T12:01:00 EligibleTime=2020-09-10T12:01:00\n", 97 | " AccrueTime=2020-09-10T12:01:01\n", 98 | " StartTime=2020-09-10T12:01:01 EndTime=2020-09-10T22:01:01 Deadline=N/A\n", 99 | " SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-09-10T12:01:01\n", 100 | " Partition=icb_cpu AllocNode:Sid=vicb-submit-02.scidom.de:24925\n", 101 | " ReqNodeList=(null) ExcNodeList=(null)\n", 102 | " NodeList=ibis216-010-051\n", 103 | " BatchHost=ibis216-010-051\n", 104 | " NumNodes=1 NumCPUs=8 NumTasks=1 CPUs/Task=8 ReqB:S:C:T=0:0:*:*\n", 105 | " TRES=cpu=8,mem=8G,node=1,billing=8\n", 106 | " Socks/Node=* NtasksPerN:B:S:C=0:0:*:1 CoreSpec=*\n", 107 | " MinCPUsNode=8 MinMemoryNode=8G MinTmpDiskNode=0\n", 108 | " 
74 | { 75 | "cell_type": "markdown", 76 | "metadata": { 77 | "slideshow": { 78 | "slide_type": "slide" 79 | } 80 | }, 81 | "source": [ 82 | "## How about a specific job?\n", 83 | "We can look at specific jobs with `scontrol show jobid [JOBID]`\n", 84 | "```bash\n", 85 | "(base) [giovanni.palla@vicb-submit-02 cpu_interactive]$ sbatch submit_interactive.sh\n", 86 | "Submitted batch job 543650\n", 87 | "(base) [giovanni.palla@vicb-submit-02 cpu_interactive]$ sq\n", 88 | " 543650 icb_cpu interact giovanni R 0:00 1 ibis216-010-051\n", 89 | "(base) [giovanni.palla@vicb-submit-02 cpu_interactive]$ scontrol show jobid 543650\n", 90 | "JobId=543650 JobName=interactive\n", 91 | " UserId=giovanni.palla(138707) GroupId=OG-ICB-User(20000) MCS_label=N/A\n", 92 | " Priority=4294048901 Nice=1000 Account=icb-user QOS=icb_stndrd\n", 93 | " JobState=RUNNING Reason=None Dependency=(null)\n", 94 | " Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0\n", 95 | " RunTime=00:00:12 TimeLimit=10:00:00 TimeMin=N/A\n", 96 | " SubmitTime=2020-09-10T12:01:00 EligibleTime=2020-09-10T12:01:00\n", 97 | " AccrueTime=2020-09-10T12:01:01\n", 98 | " StartTime=2020-09-10T12:01:01 EndTime=2020-09-10T22:01:01 Deadline=N/A\n", 99 | " SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-09-10T12:01:01\n", 100 | " Partition=icb_cpu AllocNode:Sid=vicb-submit-02.scidom.de:24925\n", 101 | " ReqNodeList=(null) ExcNodeList=(null)\n", 102 | " NodeList=ibis216-010-051\n", 103 | " BatchHost=ibis216-010-051\n", 104 | " NumNodes=1 NumCPUs=8 NumTasks=1 CPUs/Task=8 ReqB:S:C:T=0:0:*:*\n", 105 | " TRES=cpu=8,mem=8G,node=1,billing=8\n", 106 | " Socks/Node=* NtasksPerN:B:S:C=0:0:*:1 CoreSpec=*\n", 107 | " MinCPUsNode=8 MinMemoryNode=8G MinTmpDiskNode=0\n", 108 | " Features=xeon_6126|opteron_6234|opteron_6376|opteron_6378 DelayBoot=00:00:00\n", 109 | " OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)\n", 110 | " Command=/storage/groups/ml01/workspace/giovanni.palla/cpu_interactive/submit_interactive.sh\n", 111 | " WorkDir=/storage/groups/ml01/workspace/giovanni.palla/cpu_interactive\n", 112 | " StdErr=/storage/groups/ml01/workspace/giovanni.palla/cpu_interactive/interactive_543650.err\n", 113 | " StdIn=/dev/null\n", 114 | " StdOut=/storage/groups/ml01/workspace/giovanni.palla/cpu_interactive/interactive_543650.out\n", 115 | " Power=\n", 116 | " ```" 117 | ] 118 | }, 119 | { 120 | "cell_type": "markdown", 121 | "metadata": { 122 | "slideshow": { 123 | "slide_type": "slide" 124 | } 125 | }, 126 | "source": [ 127 | "## Establish an interactive slurm session\n", 128 | "```bash\n", 129 | "srun -p icb_interactive -w ibis216-010-022 -c 1 -t 00:15:00 --mem=200 --pty bash\n", 130 | "```\n", 131 | "\n", 132 | "The `--pty` is used to run the given command in a pseudo-terminal. In this case, we just want to get a bash terminal. " 133 | ] 134 | }, 135 | { 136 | "cell_type": "markdown", 137 | "metadata": { 138 | "slideshow": { 139 | "slide_type": "fragment" 140 | } 141 | }, 142 | "source": [ 143 | "One way I often use this is\n", 144 | "```bash\n", 145 | "(base) [giovanni.palla@vicb-submit-02 cpu_interactive]$ srun -p icb_gpu -w icb-gpusrv03 --pty nvidia-smi\n", 146 | "Thu Sep 10 13:11:07 2020\n", 147 | "+-----------------------------------------------------------------------------+\n", 148 | "| NVIDIA-SMI 440.31 Driver Version: 440.31 CUDA Version: 10.2 |\n", 149 | "|-------------------------------+----------------------+----------------------+\n", 150 | "| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n", 151 | "| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n", 152 | "|===============================+======================+======================|\n", 153 | "| 0 TITAN V Off | 00000000:65:00.0 Off | N/A |\n", 154 | "| 61% 83C P2 147W / 250W | 12005MiB / 12066MiB | 83% Default |\n", 155 | "+-------------------------------+----------------------+----------------------+\n", 156 | "| 1 TITAN V Off | 00000000:B3:00.0 Off | N/A |\n", 157 | "| 62% 83C P2 140W / 250W | 12005MiB / 12066MiB | 48% Default |\n", 158 | "+-------------------------------+----------------------+----------------------+\n", 159 | "\n", 160 | "+-----------------------------------------------------------------------------+\n", 161 | "| Processes: GPU Memory |\n", 162 | "| GPU PID Type Process name Usage |\n", 163 | "|=============================================================================|\n", 164 | "| 0 60099 C python 11993MiB |\n", 165 | "| 1 60098 C python 11993MiB |\n", 166 | "+-----------------------------------------------------------------------------+\n", 167 | "```\n", 168 | "Very useful if you want to get quick info on efficiency of gpu usage etc.\n", 169 | "\n", 170 | "In general, always be specific with the arguments, although it's true that slurm systems usually have sound default values." 171 | ] 172 | }, 173 | { 174 | "cell_type": "markdown", 175 | "metadata": { 176 | "slideshow": { 177 | "slide_type": "slide" 178 | } 179 | }, 180 | "source": [ 181 | "## More on sbatch\n", 182 | "`sbatch` is useful to submit scripts as jobs. Usually, you have interactive sessions with `srun` for prototyping, but then you want to use `sbatch` for the major computation. 
Arguments are exactly the same as `srun`, but specified differently.\n", 183 | "\n", 184 | "```bash\n", 185 | "#!/bin/bash\n", 186 | "\n", 187 | "#SBATCH -o slurm_output.txt\n", 188 | "#SBATCH -e slurm_error.txt\n", 189 | "#SBATCH -J MyFancyJobName\n", 190 | "#SBATCH -p icb_cpu\n", 191 | "#SBATCH --nodelist=ibis-ceph-002\n", 192 | "#SBATCH -c 1\n", 193 | "#SBATCH --mem=2G\n", 194 | "#SBATCH -t 00:15:00\n", 195 | "#SBATCH --nice=10000 \n", 196 | "\n", 197 | "echo \"Starting stuff at \`date\`\"\n", 198 | "# You can put arbitrary unix commands here, call other scripts, etc...\n", 199 | "sleep 10\n", 200 | "echo \"Computering...\"\n", 201 | "sleep 900\n", 202 | "echo \"Ending stuff at \`date\`\"\n", 203 | "```" 204 | ] 205 | }, 206 | { 207 | "cell_type": "markdown", 208 | "metadata": { 209 | "slideshow": { 210 | "slide_type": "slide" 211 | } 212 | }, 213 | "source": [ 214 | "## Interactive session with sbatch\n", 215 | "In a typical data science workflow, you might want to start your coding with a jupyter instance. This is a way to do it. " 216 | ] 217 | }, 218 | { 219 | "cell_type": "markdown", 220 | "metadata": { 221 | "slideshow": { 222 | "slide_type": "fragment" 223 | } 224 | }, 225 | "source": [ 226 | "Create a script `submit_interactive.sh` that looks like this:\n", 227 | "```bash\n", 228 | "#!/bin/bash\n", 229 | "\n", 230 | "#SBATCH -o \"interactive_%j.out\"\n", 231 | "#SBATCH -e \"interactive_%j.err\"\n", 232 | "#SBATCH -J interactive\n", 233 | "#SBATCH -c 8 # default value is 2\n", 234 | "#SBATCH --constraint=\"xeon_6126|opteron_6234|opteron_6376|opteron_6378\"\n", 235 | "#SBATCH --mem=8GB\n", 236 | "#SBATCH -t 10:00:00\n", 237 | "#SBATCH --nice=10000\n", 238 | "\n", 239 | "./run_jupyter.bash -e myenv\n", 240 | "``` " 241 | ] 242 | }, 243 | { 244 | "cell_type": "markdown", 245 | "metadata": { 246 | "slideshow": { 247 | "slide_type": "fragment" 248 | } 249 | }, 250 | "source": [ 251 | "and another script `run_jupyter.bash`, that looks like this:\n", 252 | "```bash\n", 253 | "#!/bin/bash\n", 254 | "\n", 255 | "source ~/.bashrc\n", 256 | "\n", 257 | "while getopts \":e:\" opt; do\n", 258 | " case $opt in\n", 259 | " e) env=\"$OPTARG\"\n", 260 | " ;;\n", 261 | " \\?) 
echo \"Invalid option -$OPTARG\" >&2\n", 262 | " ;;\n", 263 | " esac\n", 264 | "done\n", 265 | "\n", 266 | "conda activate $env\n", 267 | "cd /storage/groups/ml01/workspace/giovanni.palla\n", 268 | "jupyter lab --no-browser --ip=0.0.0.0\n", 269 | "```" 270 | ] 271 | }, 272 | { 273 | "cell_type": "markdown", 274 | "metadata": { 275 | "slideshow": { 276 | "slide_type": "slide" 277 | } 278 | }, 279 | "source": [ 280 | "## Interactive session with sbatch\n", 281 | "After ~30 seconds, you will read the link for the jupyter session in the `.err` file\n", 282 | "```bash\n", 283 | "(base) [giovanni.palla@vicb-submit-02 cpu_interactive]$ cat interactive_543650.err\n", 284 | "[I 12:01:26.392 LabApp] JupyterLab extension loaded from /home/icb/giovanni.palla/miniconda3/envs/sfaira/lib/python3.8/site-packages/jupyterlab\n", 285 | "[I 12:01:26.392 LabApp] JupyterLab application directory is /home/icb/giovanni.palla/miniconda3/envs/sfaira/share/jupyter/lab\n", 286 | "[I 12:01:26.401 LabApp] Serving notebooks from local directory: /storage/groups/ml01/workspace/giovanni.palla\n", 287 | "[I 12:01:26.401 LabApp] The Jupyter Notebook is running at:\n", 288 | "[I 12:01:26.401 LabApp] http://ibis216-010-051.scidom.de:8888/?token=ba33b814bc360beb21c803517adc53ade10da631ede21690\n", 289 | "[I 12:01:26.401 LabApp] or http://127.0.0.1:8888/?token=ba33b814bc360beb21c803517adc53ade10da631ede21690\n", 290 | "[I 12:01:26.401 LabApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).\n", 291 | "[C 12:01:26.424 LabApp]\n", 292 | "\n", 293 | " To access the notebook, open this file in a browser:\n", 294 | " file:///mnt/home/icb/giovanni.palla/.local/share/jupyter/runtime/nbserver-1565-open.html\n", 295 | " Or copy and paste one of these URLs:\n", 296 | " http://ibis216-010-051.scidom.de:8888/?token=ba33b814bc360beb21c803517adc53ade10da631ede21690\n", 297 | " or http://127.0.0.1:8888/?token=ba33b814bc360beb21c803517adc53ade10da631ede21690\n", 298 | "```" 299 | ] 300 | }, 301 | { 302 | "cell_type": "markdown", 303 | "metadata": { 304 | "slideshow": { 305 | "slide_type": "slide" 306 | } 307 | }, 308 | "source": [ 309 | "## Another command: sacct\n", 310 | "Useful to check all your recent jobs (finished, cancelled, etc)\n", 311 | "\n", 312 | "```bash\n", 313 | "(base) [giovanni.palla@vicb-submit-02 cpu_interactive]$ sacct\n", 314 | " JobID JobName Partition Account AllocCPUS State ExitCode\n", 315 | "------------ ---------- ---------- ---------- ---------- ---------- --------\n", 316 | "543650 interacti+ icb_cpu icb-user 8 RUNNING 0:0\n", 317 | "543650.batch batch icb-user 8 RUNNING 0:0\n", 318 | "543650.exte+ extern icb-user 8 RUNNING 0:0\n", 319 | "543818 nvidia-smi icb_gpu icb-user 2 COMPLETED 0:0\n", 320 | "543818.exte+ extern icb-user 2 COMPLETED 0:0\n", 321 | "543818.0 nvidia-smi icb-user 2 COMPLETED 0:0\n", 322 | "```" 323 | ] 324 | } 325 | ], 326 | "metadata": { 327 | "kernelspec": { 328 | "display_name": "Python 3", 329 | "language": "python", 330 | "name": "python3" 331 | }, 332 | "language_info": { 333 | "codemirror_mode": { 334 | "name": "ipython", 335 | "version": 3 336 | }, 337 | "file_extension": ".py", 338 | "mimetype": "text/x-python", 339 | "name": "python", 340 | "nbconvert_exporter": "python", 341 | "pygments_lexer": "ipython3", 342 | "version": "3.8.5" 343 | } 344 | }, 345 | "nbformat": 4, 346 | "nbformat_minor": 4 347 | } 348 | -------------------------------------------------------------------------------- /04_best_practices/04_best_practices.ipynb: 
-------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "slideshow": { 7 | "slide_type": "slide" 8 | } 9 | }, 10 | "source": [ 11 | "# Best Practices in Machine Learning and Code Organization" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": { 17 | "slideshow": { 18 | "slide_type": "slide" 19 | } 20 | }, 21 | "source": [ 22 | "## Motivation" 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": { 28 | "slideshow": { 29 | "slide_type": "fragment" 30 | } 31 | }, 32 | "source": [ 33 | "- What does best-practice even mean?\n", 34 | "- How do I know something is a bad practice?" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": { 40 | "jupyter": { 41 | "outputs_hidden": true 42 | }, 43 | "lines_to_next_cell": 2, 44 | "slideshow": { 45 | "slide_type": "fragment" 46 | } 47 | }, 48 | "source": [ 49 | "> It's not wrong, but it feels wrong." 50 | ] 51 | }, 52 | { 53 | "cell_type": "markdown", 54 | "metadata": { 55 | "slideshow": { 56 | "slide_type": "slide" 57 | } 58 | }, 59 | "source": [ 60 | "## Overview" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": { 66 | "slideshow": { 67 | "slide_type": "fragment" 68 | } 69 | }, 70 | "source": [ 71 | "Best Practices in:\n", 72 | "- Machine Learning Code Bases and Versioning\n", 73 | "- Code and Module organization and philosophies" 74 | ] 75 | }, 76 | { 77 | "cell_type": "markdown", 78 | "metadata": { 79 | "slideshow": { 80 | "slide_type": "slide" 81 | } 82 | }, 83 | "source": [ 84 | "## Bad vs. Best Practices in Python" 85 | ] 86 | }, 87 | { 88 | "cell_type": "markdown", 89 | "metadata": { 90 | "slideshow": { 91 | "slide_type": "subslide" 92 | } 93 | }, 94 | "source": [ 95 | "### Repetition" 96 | ] 97 | }, 98 | { 99 | "cell_type": "markdown", 100 | "metadata": { 101 | "slideshow": { 102 | "slide_type": "subslide" 103 | } 104 | }, 105 | "source": [ 106 | "#### Python is not C - so do ***not*** copy-and-paste!" 107 | ] 108 | }, 109 | { 110 | "cell_type": "markdown", 111 | "metadata": { 112 | "slideshow": { 113 | "slide_type": "fragment" 114 | } 115 | }, 116 | "source": [ 117 | "" 118 | ] 119 | }, 120 | { 121 | "cell_type": "markdown", 122 | "metadata": { 123 | "slideshow": { 124 | "slide_type": "subslide" 125 | } 126 | }, 127 | "source": [ 128 | "#### Instead of copy & pasting:" 129 | ] 130 | }, 131 | { 132 | "cell_type": "markdown", 133 | "metadata": { 134 | "slideshow": { 135 | "slide_type": "fragment" 136 | } 137 | }, 138 | "source": [ 139 | "- write functions!\n", 140 | "- compose functions!\n", 141 | "- create partial functions!"
142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "execution_count": 1, 147 | "metadata": { 148 | "slideshow": { 149 | "slide_type": "fragment" 150 | } 151 | }, 152 | "outputs": [], 153 | "source": [ 154 | "def add(a, b):\n", 155 | " return a + b" 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": 3, 161 | "metadata": {}, 162 | "outputs": [ 163 | { 164 | "data": { 165 | "text/plain": [ 166 | "5" 167 | ] 168 | }, 169 | "execution_count": 3, 170 | "metadata": {}, 171 | "output_type": "execute_result" 172 | } 173 | ], 174 | "source": [ 175 | "from functools import partial\n", 176 | "\n", 177 | "add2 = partial(add, 2) # Create a copy of add() with a=2\n", 178 | "\n", 179 | "add2(3)" 180 | ] 181 | }, 182 | { 183 | "cell_type": "code", 184 | "execution_count": 5, 185 | "metadata": { 186 | "slideshow": { 187 | "slide_type": "fragment" 188 | } 189 | }, 190 | "outputs": [ 191 | { 192 | "data": { 193 | "text/plain": [ 194 | "5" 195 | ] 196 | }, 197 | "execution_count": 5, 198 | "metadata": {}, 199 | "output_type": "execute_result" 200 | } 201 | ], 202 | "source": [ 203 | "add2 = lambda x: add(2, x)\n", 204 | "\n", 205 | "add2(3)" 206 | ] 207 | }, 208 | { 209 | "cell_type": "code", 210 | "execution_count": 7, 211 | "metadata": {}, 212 | "outputs": [ 213 | { 214 | "data": { 215 | "text/plain": [ 216 | "5" 217 | ] 218 | }, 219 | "execution_count": 7, 220 | "metadata": {}, 221 | "output_type": "execute_result" 222 | } 223 | ], 224 | "source": [ 225 | "def add2(x):\n", 226 | " return add(2, x)\n", 227 | "\n", 228 | "add2(3)" 229 | ] 230 | }, 231 | { 232 | "cell_type": "markdown", 233 | "metadata": { 234 | "slideshow": { 235 | "slide_type": "subslide" 236 | } 237 | }, 238 | "source": [ 239 | "### Switch Behavior" 240 | ] 241 | }, 242 | { 243 | "cell_type": "markdown", 244 | "metadata": { 245 | "slideshow": { 246 | "slide_type": "subslide" 247 | } 248 | }, 249 | "source": [ 250 | "#### Python has no switch statements, but don't go around stacking if's:" 251 | ] 252 | }, 253 | { 254 | "cell_type": "markdown", 255 | "metadata": { 256 | "slideshow": { 257 | "slide_type": "fragment" 258 | } 259 | }, 260 | "source": [ 261 | "" 262 | ] 263 | }, 264 | { 265 | "cell_type": "markdown", 266 | "metadata": { 267 | "slideshow": { 268 | "slide_type": "subslide" 269 | } 270 | }, 271 | "source": [ 272 | "#### Instead of stacking if-else:" 273 | ] 274 | }, 275 | { 276 | "cell_type": "markdown", 277 | "metadata": { 278 | "slideshow": { 279 | "slide_type": "fragment" 280 | } 281 | }, 282 | "source": [ 283 | "- map things with a dictionary!" 284 | ] 285 | }, 286 | { 287 | "cell_type": "markdown", 288 | "metadata": { 289 | "slideshow": { 290 | "slide_type": "fragment" 291 | } 292 | }, 293 | "source": [ 294 | "Dictionaries are hashmaps, meaning they map a hash to an object." 295 | ] 296 | }, 297 | { 298 | "cell_type": "markdown", 299 | "metadata": { 300 | "slideshow": { 301 | "slide_type": "fragment" 302 | } 303 | }, 304 | "source": [ 305 | "Since functions are first-class objects in Python, they can be pointed to!"
306 | ] 307 | }, 308 | { 309 | "cell_type": "code", 310 | "execution_count": 9, 311 | "metadata": { 312 | "slideshow": { 313 | "slide_type": "fragment" 314 | } 315 | }, 316 | "outputs": [ 317 | { 318 | "data": { 319 | "text/plain": [ 320 | "4" 321 | ] 322 | }, 323 | "execution_count": 9, 324 | "metadata": {}, 325 | "output_type": "execute_result" 326 | } 327 | ], 328 | "source": [ 329 | "def add(a, b):\n", 330 | " return a + b\n", 331 | "\n", 332 | "def add_sum(a, b):\n", 333 | " return sum([a, b])\n", 334 | "\n", 335 | "math_functions = {'add': add_sum}\n", 336 | "\n", 337 | "math_functions['add'](2, 2)" 338 | ] 339 | }, 340 | { 341 | "cell_type": "markdown", 342 | "metadata": { 343 | "slideshow": { 344 | "slide_type": "subslide" 345 | } 346 | }, 347 | "source": [ 348 | "### Depth" 349 | ] 350 | }, 351 | { 352 | "cell_type": "markdown", 353 | "metadata": { 354 | "slideshow": { 355 | "slide_type": "subslide" 356 | } 357 | }, 358 | "source": [ 359 | "#### Making too many layers - inheritance, nesting, etc." 360 | ] 361 | }, 362 | { 363 | "cell_type": "markdown", 364 | "metadata": { 365 | "slideshow": { 366 | "slide_type": "fragment" 367 | } 368 | }, 369 | "source": [ 370 | "" 371 | ] 372 | }, 373 | { 374 | "cell_type": "markdown", 375 | "metadata": { 376 | "slideshow": { 377 | "slide_type": "subslide" 378 | } 379 | }, 380 | "source": [ 381 | "#### Instead keep things shallow" 382 | ] 383 | }, 384 | { 385 | "cell_type": "markdown", 386 | "metadata": { 387 | "slideshow": { 388 | "slide_type": "fragment" 389 | } 390 | }, 391 | "source": [ 392 | "Ask yourself:\n", 393 | "- Do I need this class?\n", 394 | " - Will it be instantiated often?\n", 395 | " - Are there many objects inheriting from it?\n", 396 | " - Does it carry state? Otherwise it's a namespace!\n", 397 | "- Does this need to be a submodule or a file?\n", 398 | " - Are there many long functions?\n", 399 | " - Are there a large number of private functions?" 400 | ] 401 | }, 402 | { 403 | "cell_type": "markdown", 404 | "metadata": { 405 | "slideshow": { 406 | "slide_type": "fragment" 407 | } 408 | }, 409 | "source": [ 410 | "Singleton Pattern (a single global instance of an object)\n", 411 | "- If it does not carry state, it is a namespace\n", 412 | " - In Python, any file is a namespace! No need for the Object or Instance!\n", 413 | "- If it just carries state, you want a database\n", 414 | " - Atomicity of operations can be guaranteed with a database\n", 415 | " - The database lives outside of the Global Interpreter Lock (GIL)\n", 416 | " - Databases scale better!" 417 | ] 418 | },
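{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "A minimal sketch of the point above (illustrative only, not part of the original course code): the \"singleton\" below carries no state, so a plain module-level function does the same job with less machinery." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "# Singleton-style: a class that only groups stateless behavior\n", "class MathUtils:\n", "    @staticmethod\n", "    def add(a, b):\n", "        return a + b\n", "\n", "# Namespace-style: the module itself is the namespace, so a plain function is enough\n", "def add(a, b):\n", "    return a + b\n", "\n", "assert MathUtils.add(2, 3) == add(2, 3)" ] },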
419 | { 420 | "cell_type": "markdown", 421 | "metadata": { 422 | "slideshow": { 423 | "slide_type": "subslide" 424 | } 425 | }, 426 | "source": [ 427 | "### Readability" 428 | ] 429 | }, 430 | { 431 | "cell_type": "markdown", 432 | "metadata": { 433 | "slideshow": { 434 | "slide_type": "subslide" 435 | } 436 | }, 437 | "source": [ 438 | "#### Write code - but write it to be read!" 439 | ] 440 | }, 441 | { 442 | "cell_type": "markdown", 443 | "metadata": { 444 | "slideshow": { 445 | "slide_type": "fragment" 446 | } 447 | }, 448 | "source": [ 449 | "" 450 | ] 451 | }, 452 | { 453 | "cell_type": "markdown", 454 | "metadata": { 455 | "slideshow": { 456 | "slide_type": "subslide" 457 | } 458 | }, 459 | "source": [ 460 | "#### Code is written to be read" 461 | ] 462 | }, 463 | { 464 | "cell_type": "markdown", 465 | "metadata": { 466 | "slideshow": { 467 | "slide_type": "fragment" 468 | } 469 | }, 470 | "source": [ 471 | "- Documentation\n", 472 | "- Type Hinting\n", 473 | "- Naming" 474 | ] 475 | }, 476 | { 477 | "cell_type": "markdown", 478 | "metadata": { 479 | "slideshow": { 480 | "slide_type": "subslide" 481 | } 482 | }, 483 | "source": [ 484 | "### Dependencies" 485 | ] 486 | }, 487 | { 488 | "cell_type": "markdown", 489 | "metadata": { 490 | "slideshow": { 491 | "slide_type": "subslide" 492 | } 493 | }, 494 | "source": [ 495 | "#### Sometimes they're too tempting" 496 | ] 497 | }, 498 | { 499 | "cell_type": "markdown", 500 | "metadata": { 501 | "slideshow": { 502 | "slide_type": "fragment" 503 | } 504 | }, 505 | "source": [ 506 | "" 507 | ] 508 | }, 509 | { 510 | "cell_type": "markdown", 511 | "metadata": { 512 | "slideshow": { 513 | "slide_type": "subslide" 514 | } 515 | }, 516 | "source": [ 517 | "#### Why?" 518 | ] 519 | }, 520 | { 521 | "cell_type": "markdown", 522 | "metadata": { 523 | "slideshow": { 524 | "slide_type": "fragment" 525 | } 526 | }, 527 | "source": [ 528 | "- Projects get abandoned\n", 529 | " - Lack of security patches\n", 530 | " - Forced to stay with old versions\n", 531 | " - => Your project becomes ancient" 532 | ] 533 | }, 534 | { 535 | "cell_type": "markdown", 536 | "metadata": {}, 537 | "source": [ 538 | "Update regularly!\n", 539 | "- Fixing small bugs on a regular basis prevents abandonment\n", 540 | "- Improved performance\n", 541 | "- Additional functionality!" 542 | ] 543 | }, 544 | { 545 | "cell_type": "markdown", 546 | "metadata": { 547 | "slideshow": { 548 | "slide_type": "subslide" 549 | } 550 | }, 551 | "source": [ 552 | "### Keep things short" 553 | ] 554 | }, 555 | { 556 | "cell_type": "markdown", 557 | "metadata": { 558 | "slideshow": { 559 | "slide_type": "subslide" 560 | } 561 | }, 562 | "source": [ 563 | "#### The first law of Software Quality" 564 | ] 565 | }, 566 | { 567 | "cell_type": "markdown", 568 | "metadata": { 569 | "slideshow": { 570 | "slide_type": "fragment" 571 | } 572 | }, 573 | "source": [ 574 | "" 575 | ] 576 | }, 577 | { 578 | "cell_type": "markdown", 579 | "metadata": { 580 | "slideshow": { 581 | "slide_type": "subslide" 582 | } 583 | }, 584 | "source": [ 585 | "#### Sometimes less functionality is more maintainability" 586 | ] 587 | }, 588 | { 589 | "cell_type": "markdown", 590 | "metadata": { 591 | "slideshow": { 592 | "slide_type": "fragment" 593 | } 594 | }, 595 | "source": [ 596 | "> Each line of code is a credit you take on and interest is paid in time to maintain the base. Don't default on your code debt." 597 | ] 598 | }, 599 | { 600 | "cell_type": "markdown", 601 | "metadata": { 602 | "slideshow": { 603 | "slide_type": "fragment" 604 | } 605 | }, 606 | "source": [ 607 | "Finding non-critical code:\n", 608 | "- Is this functionality used by many?\n", 609 | "- Is this code still used or abandoned?\n", 610 | "- Is it relevant to the larger goal?"
611 | ] 612 | }, 613 | { 614 | "cell_type": "markdown", 615 | "metadata": { 616 | "slideshow": { 617 | "slide_type": "fragment" 618 | } 619 | }, 620 | "source": [ 621 | "Solving too much code:\n", 622 | "- Spin out functionality into a different module\n", 623 | "- Simplify the code\n", 624 | "- Delete code\n", 625 | "- No really, you should delete code" 626 | ] 627 | }, 628 | { 629 | "cell_type": "markdown", 630 | "metadata": { 631 | "slideshow": { 632 | "slide_type": "subslide" 633 | } 634 | }, 635 | "source": [ 636 | "### Use version control" 637 | ] 638 | }, 639 | { 640 | "cell_type": "markdown", 641 | "metadata": { 642 | "slideshow": { 643 | "slide_type": "fragment" 644 | } 645 | }, 646 | "source": [ 647 | "" 648 | ] 649 | } 650 | ], 651 | "metadata": { 652 | "jupytext": { 653 | "formats": "ipynb,py" 654 | }, 655 | "kernelspec": { 656 | "display_name": "Python 3", 657 | "language": "python", 658 | "name": "python3" 659 | }, 660 | "language_info": { 661 | "codemirror_mode": { 662 | "name": "ipython", 663 | "version": 3 664 | }, 665 | "file_extension": ".py", 666 | "mimetype": "text/x-python", 667 | "name": "python", 668 | "nbconvert_exporter": "python", 669 | "pygments_lexer": "ipython3", 670 | "version": "3.7.9" 671 | } 672 | }, 673 | "nbformat": 4, 674 | "nbformat_minor": 4 675 | } 676 | -------------------------------------------------------------------------------- /Info Flyer/ugly_code_numpy_linalg.py: -------------------------------------------------------------------------------- 1 | __all__ = ['matrix_power', 'solve', 'tensorsolve', 'tensorinv', 'inv', 2 | 'cholesky', 'eigvals', 'eigvalsh', 'pinv', 'slogdet', 'det', 3 | 'svd', 'eig', 'eigh', 'lstsq', 'norm', 'qr', 'cond', 'matrix_rank', 4 | 'LinAlgError', 'multi_dot']; import functools 5 | import operator 6 | import warnings; from numpy.core import ( 7 | array, asarray, zeros, empty, empty_like, intc, single, double, 8 | csingle, cdouble, inexact, complexfloating, newaxis, all, Inf, dot, 9 | add, multiply, sqrt, fastCopyAndTranspose, sum, isfinite, 10 | finfo, errstate, geterrobj, moveaxis, amin, amax, product, abs, 11 | atleast_2d, intp, asanyarray, object_, matmul, 12 | swapaxes, divide, count_nonzero, isnan, sign 13 | ); from numpy.core.multiarray import normalize_axis_index; from numpy.core.overrides import set_module; from numpy.core import overrides; from numpy.lib.twodim_base import triu, eye; from numpy.linalg import lapack_lite, _umath_linalg; array_function_dispatch = functools.partial( 14 | overrides.array_function_dispatch, module='numpy.linalg'); _N = b'N'; _V = b'V'; _A = b'A'; _S = b'S'; _L = b'L' fortran_int = intc @set_module('numpy.linalg') class LinAlgError(Exception): def _determine_error_states(): errobj = geterrobj() bufsize = errobj[0] with errstate(invalid='call', over='ignore', divide='ignore', under='ignore'): invalid_call_errmask = geterrobj()[1] return [bufsize, invalid_call_errmask, None]; _linalg_error_extobj = _determine_error_states(); del _determine_error_states; def _raise_linalgerror_singular(err, flag): raise LinAlgError("Singular matrix"); def _raise_linalgerror_nonposdef(err, flag): raise LinAlgError("Matrix is not positive definite"); def _raise_linalgerror_eigenvalues_nonconvergence(err, flag): raise LinAlgError("Eigenvalues did not converge"); def _raise_linalgerror_svd_nonconvergence(err, flag): raise LinAlgError("SVD did not converge"); def _raise_linalgerror_lstsq(err, flag): raise LinAlgError("SVD did not converge in Linear Least Squares"); def get_linalg_error_extobj(callback): 
extobj = list(_linalg_error_extobj); extobj[2] = callback; return extobj; def _makearray(a): new = asarray(a); wrap = getattr(a, "__array_prepare__", new.__array_wrap__); return new, wrap; def isComplexType(t): return issubclass(t, complexfloating); _real_types_map = {single: single,; double: double,; csingle: single,; cdouble: double}; _complex_types_map = {single: csingle,; double: cdouble,; csingle: csingle,; cdouble: cdouble}; def _realType(t, default=double): return _real_types_map.get(t, default); def _complexType(t, default=cdouble): return _complex_types_map.get(t, default); def _linalgRealType(t): """Cast the type t to either double or cdouble."""; return double; def _commonType(*arrays): result_type = single; is_complex = False; for a in arrays: if issubclass(a.dtype.type, inexact): if isComplexType(a.dtype.type): is_complex = True; rt = _realType(a.dtype.type, default=None); if rt is None: raise TypeError("array type %s is unsupported in linalg" %; (a.dtype.name,)); else: rt = double; if rt is double: result_type = double; if is_complex: t = cdouble; result_type = _complex_types_map[result_type]; else: t = double; return t, result_type; _fastCT = fastCopyAndTranspose; def _to_native_byte_order(*arrays): ret = []; for arr in arrays: if arr.dtype.byteorder not in ('=', '|'): ret.append(asarray(arr, dtype=arr.dtype.newbyteorder('='))); else: ret.append(arr); if len(ret) == 1: return ret[0]; else: return ret; def _fastCopyAndTranspose(type, *arrays): cast_arrays = (); for a in arrays: if a.dtype.type is type: cast_arrays = cast_arrays + (_fastCT(a),); else: cast_arrays = cast_arrays + (_fastCT(a.astype(type)),); if len(cast_arrays) == 1: return cast_arrays[0]; else: return cast_arrays; def _assert_2d(*arrays): for a in arrays: if a.ndim != 2: raise LinAlgError('%d-dimensional array given. Array must be '; 'two-dimensional' % a.ndim); def _assert_stacked_2d(*arrays): for a in arrays: if a.ndim < 2: raise LinAlgError('%d-dimensional array given. 
Array must be '; 'at least two-dimensional' % a.ndim); def _assert_stacked_square(*arrays): for a in arrays: m, n = a.shape[-2:]; if m != n: raise LinAlgError('Last 2 dimensions of the array must be square'); def _assert_finite(*arrays): for a in arrays: if not isfinite(a).all(): raise LinAlgError("Array must not contain infs or NaNs"); def _is_empty_2d(arr): return arr.size == 0 and product(arr.shape[-2:]) == 0; def transpose(a): return swapaxes(a, -1, -2); def tensorsolve(a, b, axes=None): a, wrap = _makearray(a); b = asarray(b); an = a.ndim; if axes is not None: allaxes = list(range(0, an)); for k in axes: allaxes.remove(k); allaxes.insert(an, k); a = a.transpose(allaxes); oldshape = a.shape[-(an-b.ndim):]; prod = 1; for k in oldshape: prod *= k; a = a.reshape(-1, prod); b = b.ravel(); res = wrap(solve(a, b)); res.shape = oldshape; return res; def _solve_dispatcher(a, b): return (a, b); @array_function_dispatch(_solve_dispatcher); def solve(a, b): a, _ = _makearray(a); _assert_stacked_2d(a); _assert_stacked_square(a); b, wrap = _makearray(b); t, result_t = _commonType(a, b); if b.ndim == a.ndim - 1: gufunc = _umath_linalg.solve1; else: gufunc = _umath_linalg.solve; signature = 'DD->D' if isComplexType(t) else 'dd->d'; extobj = get_linalg_error_extobj(_raise_linalgerror_singular); r = gufunc(a, b, signature=signature, extobj=extobj); return wrap(r.astype(result_t, copy=False)); def _tensorinv_dispatcher(a, ind=None): return (a,); @array_function_dispatch(_tensorinv_dispatcher); def tensorinv(a, ind=2): a = asarray(a); oldshape = a.shape; prod = 1; if ind > 0: invshape = oldshape[ind:] + oldshape[:ind]; for k in oldshape[ind:]: prod *= k; else: raise ValueError("Invalid ind argument."); a = a.reshape(prod, -1); ia = inv(a); return ia.reshape(*invshape); def _unary_dispatcher(a): return (a,); @array_function_dispatch(_unary_dispatcher); def inv(a): a, wrap = _makearray(a); _assert_stacked_2d(a); _assert_stacked_square(a); t, result_t = _commonType(a); signature = 'D->D' if isComplexType(t) else 'd->d'; extobj = get_linalg_error_extobj(_raise_linalgerror_singular); ainv = _umath_linalg.inv(a, signature=signature, extobj=extobj); return wrap(ainv.astype(result_t, copy=False)); def _matrix_power_dispatcher(a, n): return (a,); @array_function_dispatch(_matrix_power_dispatcher); def matrix_power(a, n): a = asanyarray(a); _assert_stacked_2d(a); _assert_stacked_square(a); try: n = operator.index(n); except TypeError: raise TypeError("exponent must be an integer"); if a.dtype != object: fmatmul = matmul; elif a.ndim == 2: fmatmul = dot; else: raise NotImplementedError(; "matrix_power not supported for stacks of object arrays"); if n == 0: a = empty_like(a); a[...] 
= eye(a.shape[-2], dtype=a.dtype); return a; elif n < 0: a = inv(a); n = abs(n); if n == 1: return a; elif n == 2: return fmatmul(a, a); elif n == 3: return fmatmul(fmatmul(a, a), a); z = result = None; while n > 0: z = a if z is None else fmatmul(z, z); n, bit = divmod(n, 2); if bit: result = z if result is None else fmatmul(result, z); return result; @array_function_dispatch(_unary_dispatcher); def cholesky(a): extobj = get_linalg_error_extobj(_raise_linalgerror_nonposdef); gufunc = _umath_linalg.cholesky_lo; a, wrap = _makearray(a); _assert_stacked_2d(a); _assert_stacked_square(a); t, result_t = _commonType(a); signature = 'D->D' if isComplexType(t) else 'd->d'; r = gufunc(a, signature=signature, extobj=extobj); return wrap(r.astype(result_t, copy=False)); def _qr_dispatcher(a, mode=None): return (a,); @array_function_dispatch(_qr_dispatcher); def qr(a, mode='reduced'): if mode not in ('reduced', 'complete', 'r', 'raw'): if mode in ('f', 'full'): msg = "".join((; "The 'full' option is deprecated in favor of 'reduced'.\n",; "For backward compatibility let mode default.")); warnings.warn(msg, DeprecationWarning, stacklevel=3); mode = 'reduced'; elif mode in ('e', 'economic'): msg = "The 'economic' option is deprecated."; warnings.warn(msg, DeprecationWarning, stacklevel=3); mode = 'economic'; else: raise ValueError("Unrecognized mode '%s'" % mode); a, wrap = _makearray(a); _assert_2d(a); m, n = a.shape; t, result_t = _commonType(a); a = _fastCopyAndTranspose(t, a); a = _to_native_byte_order(a); mn = min(m, n); tau = zeros((mn,), t); if isComplexType(t): lapack_routine = lapack_lite.zgeqrf; routine_name = 'zgeqrf'; else: lapack_routine = lapack_lite.dgeqrf; routine_name = 'dgeqrf'; lwork = 1; work = zeros((lwork,), t); results = lapack_routine(m, n, a, max(1, m), tau, work, -1, 0); if results['info'] != 0: raise LinAlgError('%s returns %d' % (routine_name, results['info'])); lwork = max(1, n, int(abs(work[0]))); work = zeros((lwork,), t); results = lapack_routine(m, n, a, max(1, m), tau, work, lwork, 0); if results['info'] != 0: raise LinAlgError('%s returns %d' % (routine_name, results['info'])); if mode == 'r': r = _fastCopyAndTranspose(result_t, a[:, :mn]); return wrap(triu(r)); if mode == 'raw': return a, tau; if mode == 'economic': if t != result_t : a = a.astype(result_t, copy=False); return wrap(a.T); if mode == 'complete' and m > n: mc = m; q = empty((m, m), t); else: mc = mn; q = empty((n, m), t); q[:n] = a; if isComplexType(t): lapack_routine = lapack_lite.zungqr; routine_name = 'zungqr'; else: lapack_routine = lapack_lite.dorgqr; routine_name = 'dorgqr'; lwork = 1; work = zeros((lwork,), t); results = lapack_routine(m, mc, mn, q, max(1, m), tau, work, -1, 0); if results['info'] != 0: raise LinAlgError('%s returns %d' % (routine_name, results['info'])); lwork = max(1, n, int(abs(work[0]))); work = zeros((lwork,), t); results = lapack_routine(m, mc, mn, q, max(1, m), tau, work, lwork, 0); if results['info'] != 0: raise LinAlgError('%s returns %d' % (routine_name, results['info'])); q = _fastCopyAndTranspose(result_t, q[:mc]); r = _fastCopyAndTranspose(result_t, a[:, :mc]); return wrap(q), wrap(triu(r)); @array_function_dispatch(_unary_dispatcher); def eigvals(a): a, wrap = _makearray(a); _assert_stacked_2d(a); _assert_stacked_square(a); _assert_finite(a); t, result_t = _commonType(a); extobj = get_linalg_error_extobj(; _raise_linalgerror_eigenvalues_nonconvergence); signature = 'D->D' if isComplexType(t) else 'd->D'; w = _umath_linalg.eigvals(a, signature=signature, 
extobj=extobj); if not isComplexType(t): if all(w.imag == 0): w = w.real; result_t = _realType(result_t); else: result_t = _complexType(result_t); return w.astype(result_t, copy=False); def _eigvalsh_dispatcher(a, UPLO=None): return (a,); @array_function_dispatch(_eigvalsh_dispatcher); def eigvalsh(a, UPLO='L'): UPLO = UPLO.upper(); if UPLO not in ('L', 'U'): raise ValueError("UPLO argument must be 'L' or 'U'"); extobj = get_linalg_error_extobj(; _raise_linalgerror_eigenvalues_nonconvergence); if UPLO == 'L': gufunc = _umath_linalg.eigvalsh_lo; else: gufunc = _umath_linalg.eigvalsh_up; a, wrap = _makearray(a); _assert_stacked_2d(a); _assert_stacked_square(a); t, result_t = _commonType(a); signature = 'D->d' if isComplexType(t) else 'd->d'; w = gufunc(a, signature=signature, extobj=extobj); return w.astype(_realType(result_t), copy=False); def _convertarray(a): t, result_t = _commonType(a); a = _fastCT(a.astype(t)); return a, t, result_t; def eig(a): a, wrap = _makearray(a); _assert_stacked_2d(a); _assert_stacked_square(a); _assert_finite(a); t, result_t = _commonType(a); extobj = get_linalg_error_extobj(; _raise_linalgerror_eigenvalues_nonconvergence); signature = 'D->DD' if isComplexType(t) else 'd->DD'; w, vt = _umath_linalg.eig(a, signature=signature, extobj=extobj); if not isComplexType(t) and all(w.imag == 0.0): w = w.real; vt = vt.real; result_t = _realType(result_t); else: result_t = _complexType(result_t); vt = vt.astype(result_t, copy=False); return w.astype(result_t, copy=False), wrap(vt); @array_function_dispatch(_eigvalsh_dispatcher); def eigh(a, UPLO='L'): UPLO = UPLO.upper(); if UPLO not in ('L', 'U'): raise ValueError("UPLO argument must be 'L' or 'U'"); a, wrap = _makearray(a); _assert_stacked_2d(a); _assert_stacked_square(a); t, result_t = _commonType(a); extobj = get_linalg_error_extobj(; _raise_linalgerror_eigenvalues_nonconvergence); if UPLO == 'L': gufunc = _umath_linalg.eigh_lo; else: gufunc = _umath_linalg.eigh_up; signature = 'D->dD' if isComplexType(t) else 'd->dd'; w, vt = gufunc(a, signature=signature, extobj=extobj); w = w.astype(_realType(result_t), copy=False); vt = vt.astype(result_t, copy=False); return w, wrap(vt); def _svd_dispatcher(a, full_matrices=None, compute_uv=None, hermitian=None): return (a,); @array_function_dispatch(_svd_dispatcher); def svd(a, full_matrices=True, compute_uv=True, hermitian=False): a, wrap = _makearray(a); if hermitian: if compute_uv: s, u = eigh(a); s = s[..., ::-1]; u = u[..., ::-1]; vt = transpose(u * sign(s)[..., None, :]).conjugate(); s = abs(s); return wrap(u), s, wrap(vt); else: s = eigvalsh(a); s = s[..., ::-1]; s = abs(s); return s; _assert_stacked_2d(a); t, result_t = _commonType(a); extobj = get_linalg_error_extobj(_raise_linalgerror_svd_nonconvergence); m, n = a.shape[-2:]; if compute_uv: if full_matrices: if m < n: gufunc = _umath_linalg.svd_m_f; else: gufunc = _umath_linalg.svd_n_f; else: if m < n: gufunc = _umath_linalg.svd_m_s; else: gufunc = _umath_linalg.svd_n_s; signature = 'D->DdD' if isComplexType(t) else 'd->ddd'; u, s, vh = gufunc(a, signature=signature, extobj=extobj); u = u.astype(result_t, copy=False); s = s.astype(_realType(result_t), copy=False); vh = vh.astype(result_t, copy=False); return wrap(u), s, wrap(vh); else: if m < n: gufunc = _umath_linalg.svd_m; else: gufunc = _umath_linalg.svd_n; signature = 'D->d' if isComplexType(t) else 'd->d'; s = gufunc(a, signature=signature, extobj=extobj); s = s.astype(_realType(result_t), copy=False); return s; def _cond_dispatcher(x, p=None): return (x,); 
@array_function_dispatch(_cond_dispatcher); def cond(x, p=None): x = asarray(x); if _is_empty_2d(x): raise LinAlgError("cond is not defined on empty arrays"); if p is None or p == 2 or p == -2: s = svd(x, compute_uv=False); with errstate(all='ignore'): if p == -2: r = s[..., -1] / s[..., 0]; else: r = s[..., 0] / s[..., -1]; else: _assert_stacked_2d(x); _assert_stacked_square(x); t, result_t = _commonType(x); signature = 'D->D' if isComplexType(t) else 'd->d'; with errstate(all='ignore'): invx = _umath_linalg.inv(x, signature=signature); r = norm(x, p, axis=(-2, -1)) * norm(invx, p, axis=(-2, -1)); r = r.astype(result_t, copy=False); r = asarray(r); nan_mask = isnan(r); if nan_mask.any(): nan_mask &= ~isnan(x).any(axis=(-2, -1)); if r.ndim > 0: r[nan_mask] = Inf; elif nan_mask: r[()] = Inf; if r.ndim == 0: r = r[()]; return r; def _matrix_rank_dispatcher(M, tol=None, hermitian=None): return (M,); @array_function_dispatch(_matrix_rank_dispatcher); def matrix_rank(M, tol=None, hermitian=False): M = asarray(M); if M.ndim < 2: return int(not all(M==0)); S = svd(M, compute_uv=False, hermitian=hermitian); if tol is None: tol = S.max(axis=-1, keepdims=True) * max(M.shape[-2:]) * finfo(S.dtype).eps; else: tol = asarray(tol)[..., newaxis]; return count_nonzero(S > tol, axis=-1); def pinv(a, rcond=1e-15, hermitian=False): a, wrap = _makearray(a); rcond = asarray(rcond); if _is_empty_2d(a): m, n = a.shape[-2:]; res = empty(a.shape[:-2] + (n, m), dtype=a.dtype); return wrap(res); a = a.conjugate(); u, s, vt = svd(a, full_matrices=False, hermitian=hermitian); cutoff = rcond[..., newaxis] * amax(s, axis=-1, keepdims=True); large = s > cutoff; s = divide(1, s, where=large, out=s); s[~large] = 0; res = matmul(transpose(vt), multiply(s[..., newaxis], transpose(u))); return wrap(res); def slogdet(a): a = asarray(a); _assert_stacked_2d(a); _assert_stacked_square(a); t, result_t = _commonType(a); real_t = _realType(result_t); signature = 'D->Dd' if isComplexType(t) else 'd->dd'; sign, logdet = _umath_linalg.slogdet(a, signature=signature); sign = sign.astype(result_t, copy=False); logdet = logdet.astype(real_t, copy=False); return sign, logdet; @array_function_dispatch(_unary_dispatcher); def det(a): a = asarray(a); _assert_stacked_2d(a); _assert_stacked_square(a); t, result_t = _commonType(a); signature = 'D->D' if isComplexType(t) else 'd->d'; r = _umath_linalg.det(a, signature=signature); r = r.astype(result_t, copy=False); return r; def lstsq(a, b, rcond="warn"): a, _ = _makearray(a); b, wrap = _makearray(b); is_1d = b.ndim == 1; if is_1d: b = b[:, newaxis]; _assert_2d(a, b); m, n = a.shape[-2:]; m2, n_rhs = b.shape[-2:]; if m != m2: raise LinAlgError('Incompatible dimensions'); t, result_t = _commonType(a, b); real_t = _linalgRealType(t); result_real_t = _realType(result_t); if rcond == "warn": warnings.warn("`rcond` parameter will change to the default of "; "machine precision times ``max(M, N)`` where M and N "; "are the input matrix dimensions.\n"; "To use the future default and silence this warning "; "we advise to pass `rcond=None`, to keep using the old, "; "explicitly pass `rcond=-1`.",; FutureWarning, stacklevel=3); rcond = -1; if rcond is None: rcond = finfo(t).eps * max(n, m); if m <= n: gufunc = _umath_linalg.lstsq_m; else: gufunc = _umath_linalg.lstsq_n; signature = 'DDd->Ddid' if isComplexType(t) else 'ddd->ddid'; extobj = get_linalg_error_extobj(_raise_linalgerror_lstsq); if n_rhs == 0: b = zeros(b.shape[:-2] + (m, n_rhs + 1), dtype=b.dtype); x, resids, rank, s = gufunc(a, b, rcond, 
signature=signature, extobj=extobj); if m == 0: x[...] = 0; if n_rhs == 0: x = x[..., :n_rhs]; resids = resids[..., :n_rhs]; if is_1d: x = x.squeeze(axis=-1); if rank != n or m <= n: resids = array([], result_real_t); s = s.astype(result_real_t, copy=False); resids = resids.astype(result_real_t, copy=False); x = x.astype(result_t, copy=True); return wrap(x), wrap(resids), rank, s; def _multi_svd_norm(x, row_axis, col_axis, op): y = moveaxis(x, (row_axis, col_axis), (-2, -1)); result = op(svd(y, compute_uv=False), axis=-1); return result; def _norm_dispatcher(x, ord=None, axis=None, keepdims=None): return (x,); @array_function_dispatch(_norm_dispatcher); def norm(x, ord=None, axis=None, keepdims=False): x = asarray(x); if not issubclass(x.dtype.type, (inexact, object_)): x = x.astype(float); if axis is None: ndim = x.ndim; if ((ord is None) or; (ord in ('f', 'fro') and ndim == 2) or; (ord == 2 and ndim == 1)): x = x.ravel(order='K'); if isComplexType(x.dtype.type): sqnorm = dot(x.real, x.real) + dot(x.imag, x.imag); else: sqnorm = dot(x, x); ret = sqrt(sqnorm); if keepdims: ret = ret.reshape(ndim*[1]); return ret; nd = x.ndim; if axis is None: axis = tuple(range(nd)); elif not isinstance(axis, tuple): try: axis = int(axis); except Exception: raise TypeError("'axis' must be None, an integer or a tuple of integers"); axis = (axis,); if len(axis) == 1: if ord == Inf: return abs(x).max(axis=axis, keepdims=keepdims); elif ord == -Inf: return abs(x).min(axis=axis, keepdims=keepdims); elif ord == 0: return (x != 0).astype(x.real.dtype).sum(axis=axis, keepdims=keepdims); elif ord == 1: return add.reduce(abs(x), axis=axis, keepdims=keepdims); elif ord is None or ord == 2: s = (x.conj() * x).real; return sqrt(add.reduce(s, axis=axis, keepdims=keepdims)); else: try: ord + 1; except TypeError: raise ValueError("Invalid norm order for vectors."); absx = abs(x); absx **= ord; ret = add.reduce(absx, axis=axis, keepdims=keepdims); ret **= (1 / ord); return ret; elif len(axis) == 2: row_axis, col_axis = axis; row_axis = normalize_axis_index(row_axis, nd); col_axis = normalize_axis_index(col_axis, nd); if row_axis == col_axis: raise ValueError('Duplicate axes given.'); if ord == 2: ret =_multi_svd_norm(x, row_axis, col_axis, amax); elif ord == -2: ret = _multi_svd_norm(x, row_axis, col_axis, amin); elif ord == 1: if col_axis > row_axis: col_axis -= 1; ret = add.reduce(abs(x), axis=row_axis).max(axis=col_axis); elif ord == Inf: if row_axis > col_axis: row_axis -= 1; ret = add.reduce(abs(x), axis=col_axis).max(axis=row_axis); elif ord == -1: if col_axis > row_axis: col_axis -= 1; ret = add.reduce(abs(x), axis=row_axis).min(axis=col_axis); elif ord == -Inf: if row_axis > col_axis: row_axis -= 1; ret = add.reduce(abs(x), axis=col_axis).min(axis=row_axis); elif ord in [None, 'fro', 'f']: ret = sqrt(add.reduce((x.conj() * x).real, axis=axis)); elif ord == 'nuc': ret = _multi_svd_norm(x, row_axis, col_axis, sum); else: raise ValueError("Invalid norm order for matrices."); if keepdims: ret_shape = list(x.shape); ret_shape[axis[0]] = 1; ret_shape[axis[1]] = 1; ret = ret.reshape(ret_shape); return ret; else: raise ValueError("Improper number of dimensions to norm."); def multi_dot(arrays): n = len(arrays); if n < 2: raise ValueError("Expecting at least two arrays."); elif n == 2: return dot(arrays[0], arrays[1]); arrays = [asanyarray(a) for a in arrays]; ndim_first, ndim_last = arrays[0].ndim, arrays[-1].ndim; if arrays[0].ndim == 1: arrays[0] = atleast_2d(arrays[0]); if arrays[-1].ndim == 1: arrays[-1] = 
atleast_2d(arrays[-1]).T; _assert_2d(*arrays) 15 | if n == 3: result = _multi_dot_three(arrays[0], arrays[1], arrays[2]) 16 | else: order = _multi_dot_matrix_chain_order(arrays) 17 | result = _multi_dot(arrays, order, 0, n - 1) 18 | if ndim_first == 1 and ndim_last == 1: return result[0, 0] 19 | elif ndim_first == 1 or ndim_last == 1: return result.ravel() 20 | else: return result; def _multi_dot_three(A, B, C): a0, a1b0 = A.shape 21 | b1c0, c1 = C.shape; cost1 = a0 * b1c0 * (a1b0 + c1); cost2 = a1b0 * c1 * (a0 + b1c0); if cost1 < cost2: return dot(dot(A, B), C); else: return dot(A, dot(B, C)); def _multi_dot_matrix_chain_order(arrays, return_costs=False): n = len(arrays); p = [a.shape[0] for a in arrays] + [arrays[-1].shape[1]]; m = zeros((n, n), dtype=double); s = empty((n, n), dtype=intp); for l in range(1, n): for i in range(n - l): j = i + l; m[i, j] = Inf; for k in range(i, j): q = m[i, k] + m[k+1, j] + p[i]*p[k+1]*p[j+1]; if q < m[i, j]: m[i, j] = q; s[i, j] = k; return (s, m) if return_costs else s; def _multi_dot(arrays, order, i, j): if i == j: return arrays[i]; else: return dot(_multi_dot(arrays, order, i, order[i, j]),; _multi_dot(arrays, order, order[i, j] + 1, j)) 22 | -------------------------------------------------------------------------------- /99_other_material/complexity.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [] 7 | }, 8 | { 9 | "cell_type": "markdown", 10 | "metadata": {}, 11 | "source": [] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": 3, 16 | "metadata": {}, 17 | "outputs": [ 18 | { 19 | "data": { 20 | "text/plain": [ 21 | "['OFFSETTEXTPAD',\n", 22 | " '__class__',\n", 23 | " '__delattr__',\n", 24 | " '__dict__',\n", 25 | " '__dir__',\n", 26 | " '__doc__',\n", 27 | " '__eq__',\n", 28 | " '__format__',\n", 29 | " '__ge__',\n", 30 | " '__getattribute__',\n", 31 | " '__getstate__',\n", 32 | " '__gt__',\n", 33 | " '__hash__',\n", 34 | " '__init__',\n", 35 | " '__init_subclass__',\n", 36 | " '__le__',\n", 37 | " '__lt__',\n", 38 | " '__module__',\n", 39 | " '__name__',\n", 40 | " '__ne__',\n", 41 | " '__new__',\n", 42 | " '__reduce__',\n", 43 | " '__reduce_ex__',\n", 44 | " '__repr__',\n", 45 | " '__setattr__',\n", 46 | " '__sizeof__',\n", 47 | " '__str__',\n", 48 | " '__subclasshook__',\n", 49 | " '__weakref__',\n", 50 | " '_agg_filter',\n", 51 | " '_alpha',\n", 52 | " '_animated',\n", 53 | " '_autolabelpos',\n", 54 | " '_axes',\n", 55 | " '_clipon',\n", 56 | " '_clippath',\n", 57 | " '_contains',\n", 58 | " '_copy_tick_props',\n", 59 | " '_get_clipping_extent_bbox',\n", 60 | " '_get_label',\n", 61 | " '_get_offset_text',\n", 62 | " '_get_tick',\n", 63 | " '_get_tick_bboxes',\n", 64 | " '_get_tick_boxes_siblings',\n", 65 | " '_get_ticks_position',\n", 66 | " '_gid',\n", 67 | " '_gridOnMajor',\n", 68 | " '_gridOnMinor',\n", 69 | " '_in_layout',\n", 70 | " '_label',\n", 71 | " '_major_tick_kw',\n", 72 | " '_minor_tick_kw',\n", 73 | " '_mouseover',\n", 74 | " '_oid',\n", 75 | " '_path_effects',\n", 76 | " '_picker',\n", 77 | " '_prop_order',\n", 78 | " '_propobservers',\n", 79 | " '_rasterized',\n", 80 | " '_remove_method',\n", 81 | " '_remove_overlapping_locs',\n", 82 | " '_scale',\n", 83 | " '_set_artist_props',\n", 84 | " '_set_gc_clip',\n", 85 | " '_set_scale',\n", 86 | " '_sketch',\n", 87 | " '_smart_bounds',\n", 88 | " '_snap',\n", 89 | " '_stale',\n", 90 | " '_sticky_edges',\n", 91 | " '_transform',\n", 92 | " '_transformSet',\n", 93 | " '_translate_tick_kw',\n", 94 | " 
'_update_axisinfo',\n", 95 | " '_update_label_position',\n", 96 | " '_update_offset_text_position',\n", 97 | " '_update_ticks',\n", 98 | " '_url',\n", 99 | " '_visible',\n", 100 | " 'add_callback',\n", 101 | " 'aname',\n", 102 | " 'axes',\n", 103 | " 'axis_date',\n", 104 | " 'axis_name',\n", 105 | " 'callbacks',\n", 106 | " 'cla',\n", 107 | " 'clipbox',\n", 108 | " 'contains',\n", 109 | " 'convert_units',\n", 110 | " 'convert_xunits',\n", 111 | " 'convert_yunits',\n", 112 | " 'converter',\n", 113 | " 'draw',\n", 114 | " 'eventson',\n", 115 | " 'figure',\n", 116 | " 'findobj',\n", 117 | " 'format_cursor_data',\n", 118 | " 'get_agg_filter',\n", 119 | " 'get_alpha',\n", 120 | " 'get_animated',\n", 121 | " 'get_children',\n", 122 | " 'get_clip_box',\n", 123 | " 'get_clip_on',\n", 124 | " 'get_clip_path',\n", 125 | " 'get_contains',\n", 126 | " 'get_cursor_data',\n", 127 | " 'get_data_interval',\n", 128 | " 'get_figure',\n", 129 | " 'get_gid',\n", 130 | " 'get_gridlines',\n", 131 | " 'get_in_layout',\n", 132 | " 'get_inverted',\n", 133 | " 'get_label',\n", 134 | " 'get_label_position',\n", 135 | " 'get_label_text',\n", 136 | " 'get_major_formatter',\n", 137 | " 'get_major_locator',\n", 138 | " 'get_major_ticks',\n", 139 | " 'get_majorticklabels',\n", 140 | " 'get_majorticklines',\n", 141 | " 'get_majorticklocs',\n", 142 | " 'get_minor_formatter',\n", 143 | " 'get_minor_locator',\n", 144 | " 'get_minor_ticks',\n", 145 | " 'get_minorticklabels',\n", 146 | " 'get_minorticklines',\n", 147 | " 'get_minorticklocs',\n", 148 | " 'get_minpos',\n", 149 | " 'get_offset_text',\n", 150 | " 'get_path_effects',\n", 151 | " 'get_picker',\n", 152 | " 'get_pickradius',\n", 153 | " 'get_rasterized',\n", 154 | " 'get_remove_overlapping_locs',\n", 155 | " 'get_scale',\n", 156 | " 'get_sketch_params',\n", 157 | " 'get_smart_bounds',\n", 158 | " 'get_snap',\n", 159 | " 'get_text_heights',\n", 160 | " 'get_tick_padding',\n", 161 | " 'get_tick_space',\n", 162 | " 'get_ticklabel_extents',\n", 163 | " 'get_ticklabels',\n", 164 | " 'get_ticklines',\n", 165 | " 'get_ticklocs',\n", 166 | " 'get_ticks_direction',\n", 167 | " 'get_ticks_position',\n", 168 | " 'get_tightbbox',\n", 169 | " 'get_transform',\n", 170 | " 'get_transformed_clip_path_and_affine',\n", 171 | " 'get_units',\n", 172 | " 'get_url',\n", 173 | " 'get_view_interval',\n", 174 | " 'get_visible',\n", 175 | " 'get_window_extent',\n", 176 | " 'get_zorder',\n", 177 | " 'grid',\n", 178 | " 'have_units',\n", 179 | " 'isDefault_label',\n", 180 | " 'isDefault_majfmt',\n", 181 | " 'isDefault_majloc',\n", 182 | " 'isDefault_minfmt',\n", 183 | " 'isDefault_minloc',\n", 184 | " 'is_transform_set',\n", 185 | " 'iter_ticks',\n", 186 | " 'label',\n", 187 | " 'label_position',\n", 188 | " 'labelpad',\n", 189 | " 'limit_range_for_scale',\n", 190 | " 'major',\n", 191 | " 'majorTicks',\n", 192 | " 'minor',\n", 193 | " 'minorTicks',\n", 194 | " 'mouseover',\n", 195 | " 'offsetText',\n", 196 | " 'offset_text_position',\n", 197 | " 'pan',\n", 198 | " 'pchanged',\n", 199 | " 'pick',\n", 200 | " 'pickable',\n", 201 | " 'pickradius',\n", 202 | " 'properties',\n", 203 | " 'remove',\n", 204 | " 'remove_callback',\n", 205 | " 'remove_overlapping_locs',\n", 206 | " 'reset_ticks',\n", 207 | " 'set',\n", 208 | " 'set_agg_filter',\n", 209 | " 'set_alpha',\n", 210 | " 'set_animated',\n", 211 | " 'set_clip_box',\n", 212 | " 'set_clip_on',\n", 213 | " 'set_clip_path',\n", 214 | " 'set_contains',\n", 215 | " 'set_data_interval',\n", 216 | " 'set_default_intervals',\n", 217 | " 'set_figure',\n", 
218 | " 'set_gid',\n", 219 | " 'set_in_layout',\n", 220 | " 'set_inverted',\n", 221 | " 'set_label',\n", 222 | " 'set_label_coords',\n", 223 | " 'set_label_position',\n", 224 | " 'set_label_text',\n", 225 | " 'set_major_formatter',\n", 226 | " 'set_major_locator',\n", 227 | " 'set_minor_formatter',\n", 228 | " 'set_minor_locator',\n", 229 | " 'set_path_effects',\n", 230 | " 'set_picker',\n", 231 | " 'set_pickradius',\n", 232 | " 'set_rasterized',\n", 233 | " 'set_remove_overlapping_locs',\n", 234 | " 'set_sketch_params',\n", 235 | " 'set_smart_bounds',\n", 236 | " 'set_snap',\n", 237 | " 'set_tick_params',\n", 238 | " 'set_ticklabels',\n", 239 | " 'set_ticks',\n", 240 | " 'set_ticks_position',\n", 241 | " 'set_transform',\n", 242 | " 'set_units',\n", 243 | " 'set_url',\n", 244 | " 'set_view_interval',\n", 245 | " 'set_visible',\n", 246 | " 'set_zorder',\n", 247 | " 'stale',\n", 248 | " 'stale_callback',\n", 249 | " 'sticky_edges',\n", 250 | " 'tick_bottom',\n", 251 | " 'tick_top',\n", 252 | " 'units',\n", 253 | " 'update',\n", 254 | " 'update_from',\n", 255 | " 'update_units',\n", 256 | " 'zoom',\n", 257 | " 'zorder']" 258 | ] 259 | }, 260 | "execution_count": 3, 261 | "metadata": {}, 262 | "output_type": "execute_result" 263 | } 264 | ], 265 | "source": [ 266 | "dir(ax.xaxis)" 267 | ] 268 | }, 269 | { 270 | "cell_type": "code", 271 | "execution_count": 14, 272 | "metadata": {}, 273 | "outputs": [ 274 | { 275 | "data": { 276 | "text/plain": [ 277 | "[Text(0, 0, '1 person (you)'),\n", 278 | " Text(0, 0, '5 people (team)'),\n", 279 | " Text(0, 0, '100 people (group)'),\n", 280 | " Text(0, 0, '10,000 people (division)'),\n", 281 | " Text(0, 0, '1,000,000 people (world)')]" 282 | ] 283 | }, 284 | "execution_count": 14, 285 | "metadata": {}, 286 | "output_type": "execute_result" 287 | }, 288 | { 289 | "data": { 290 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAwIAAAGqCAYAAAC4bPMSAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAgAElEQVR4nO3df9xtVV0n8M+Xi78CFZWrIT8CEyRUULmShk3YDwPrNUxKKTmKjsWgYk1Ok5hNVs40OdlkjgKSEWollWERkehoioQKl8QLKOgNSG5YgiL+SIfQ1R97P9xzD+f5de+53Mtd7/fr9byec/ZZe+/1nL3OOvuz1977qdZaAACAvuy2oysAAADc8wQBAADokCAAAAAdEgQAAKBDggAAAHRIEAAAgA4tGwSq6uyq+nxVXb3I61VVb6yqjVW1oaqeNP9qAgAA87SSEYFzkhy7xOvHJTl4/Dk5yRnbXi0AAGB7WjYItNYuTvLFJYocn+TtbfDRJHtV1T7zqiAAADB/u89hGfsmuWni+aZx2uemC1bVyRlGDbLHHnsceeihh85h9QAAsLgrrrji1tba2h1dj53NPIJAzZjWZhVsrZ2V5KwkWbduXVu/fv0cVg8AAIurqn/Y0XXYGc3jrkGbkuw/8Xy/JDfPYbkAAMB2Mo8gcH6SF4x3D3pKkttba3c7LQgAANh5LHtqUFW9M8kxSfauqk1JXpPkPknSWjszyYVJnplkY5J/SfKi7VVZAABgPpYNAq21E5d5vSV52dxqBAAAbHf+szAAAHRIEAAAgA4JAgAA0CFBAAAAOiQIAABAhwQBAADokCAAAAAdEgQAAKBDggAAAHRIEAAAgA4JAgAA0CFBAAAAOiQIAABAhwQBAADokCAAAAAdEgQAAKBDggAAAHRIEAAAgA4JAgAA0CFBAAAAOiQIAABAhwQBAADokCAAAAAdEgQAAKBDggAAAHRIEAAAgA4JAgAA0CFBAAAAOiQIAABAhwQBAADokCAAAAAdEgQAAKBDggAAAHRIEAAAgA4JAgAA0CFBAAAAOiQIAABAhwQBAADokCAAAAAdEgQAAKBDggAAAHRIEAAAgA4JAgAA0CFBAAAAOiQIAABAhwQBAADokCAAAAAdEgQAAKBDggAAAHRIEAAAgA4JAgAA0CFBAAAAOiQIAABAhwQBAADokCAAAAAdEgQAAKBDggAAAHRIEAAAgA4JAgAA0CFBAAAAOiQIAABAhwQBAADokCAAAAAdEgQAAKBDggAAAHRIEAAAgA4JAgAA0CFBAAAAOiQIAABAh1YUBKrq2Kq6rqo2VtVpM15/cFX9ZVV9oqquqaoXzb+qAADAvCwbBKpqTZI3JzkuyWFJTqyqw6aKvSzJJ1trRyQ5JslvVdV951xXAABgTlYyInBUko2ttetba3ckOTfJ8VNlWpIHVlUl2TPJF5PcOdeaAgAAc7OSILBvkpsmnm8ap016U5LvSnJzkquS/Gxr7VtzqSEAADB3KwkCNWNam3r+w0muTPLIJE9I8qaqetDdFlR1clWtr6r1t9xyy6orCwAAzMdKgsCmJPtPPN8vw5H/SS9Kcl4bbExyQ5JDpxfUWjurtbautbZu7dq1W1tnAABgG60kCFye5OCqOmi8APi5Sc6fKvPZJD+QJFX1iCSPSXL9PCsKAADMz+7LFWit3VlVpya5KMmaJGe31q6pqlPG189M8tok51TVVRlOJXpla+3W7VhvAABgGywbBJKktXZhkgunpp058fjmJM+Yb9UAAIDtxX8WBgCADgkCAADQIUEAAAA6JAgAAECHBAEAAOiQIAAAAB0SBAAAoEOCAAAAdEgQAACADgkCAADQIUEAAAA6JAgAAECHBAEAAOiQIAAAAB0SBAAAoEOCAAAAdEgQAACADgkCAADQIUEAAAA6JAgAAECHBAEAAOiQIAAAAB0SBAAAoEOCAAAAdEgQAACADgkCAADQIUEAAAA6JAgAAECHBAEAAOiQIAAAAB0SBAAAoEOCAAAAdEgQAACADgkCAADQIUEAAAA6JAgAAECHBAEAAOiQIAAAAB0SBAAAoEOCAAAAdEgQAACADgkCAADQIUEAAAA6JAgAAECHBAEAAOiQIAAAAB0SBAAAoEOCAAAAdEgQAACADgkCAADQIUEAAAA6JAgAAECHBAEAAOiQIAAAAB0SBAAAoEOCAAAAdEgQAACADgkCAADQIUEAAAA6JAgAAECHBAEAAOiQIAAAAB0SBAAAoEOCAAAAdEgQAACADgkCAADQIUEAAAA6JAgAAECHBAEAAOiQIAAAAB1aURCoqmOr6rqq2lhVpy1S5piqurKqrqmqD823mgAAwDztvlyBqlqT5M1JfijJpiSXV9X5rbVPTpTZK8npSY5trX22qh6+vSoMAABsu5WMCByVZGNr7frW2h1Jzk1y/FSZn0xyXmvts0nSWvv8fKsJAADM00qCwL5Jbpp4vmmcNumQJA+pqg9W1RVV9YJZC6qqk6tqfVWtv+WWW7auxgAAwDZbSRCoGdPa1PPdkxyZ5EeS/HCS/15Vh9xtptbOaq2ta62tW7t27aorCwAAzMey1whkGAHYf+L5fklunlHm1tba15J8raouTnJEkk/PpZYAAMBcrWRE4PIkB1fVQVV13yTPTXL+VJm/SPK9VbV7VX1bku9O8qn5VhUAAJiXZUcEWmt3VtWpSS5KsibJ2a21a6rqlPH1M1trn6qq9yTZkORbSd7aWrt6e1YcAADYetXa9On+94x169a19evX75B1AwDQj6q6orW2bkfXY2fjPwsDAECHBAEAAOiQIAAAAB0SBAAAoEOCAAAAdEgQAACADgkCAADQIUEAAAA6JAgAAECHBAEAAOiQIAAAAB0SBAAAoEOCAAAAdEgQAACADgkCAADQIUEAAAA6JAgAAECHBAEAAOiQIAAAAB0SBAAAoEOCAAAAdEgQAACADgkCAADQIUEAAAA6JAgAAECHBAEAAOiQIAAAAB0SBAAAoEOCAAAAdEgQAACADgkCAADQIUEAAAA6JAgAAECHBAEAAOiQIAAAAB0SBAAAoEOCAAAAdEgQAACADgkCAADQIUEAAAA6JAgAAECHBAEAAOiQIAAAAB0SBAAAoEOCAAAAdEgQAACADgkCAADQIUEAAAA6JAgAAECHBAEAAOiQIAAAAB0SBAAAoEOCAAAAdEgQAACADgkCAADQIUEAAAA6JAgAAECHBAEAAOiQIAAAAB0SBAAAoEOCAAAAdEgQAACADgkCAADQIUEAAAA6JAgAAECHBAEAAOiQIAAAAB0SBAAAoEOCAAAAdGhFQaCqjq2q66pqY1WdtkS5J1fVN6vqhPlVEQAAmLdlg0BVrUny5iTHJTksyYlVddgi5V6X5KJ5VxIAAJivlYwIHJVkY2vt+tbaHUnOTXL8jHIvT/JnST4/x/oBAADbwUqCwL5Jbpp4vmmcdpeq2jfJjyU5c6kFVdXJVbW+qtbfcsstq60rAAAwJysJAjVjWpt6/oYkr2ytfXOpBbXWzmqtrWutrVu7du1K6wgAAMzZ7i
sosynJ/hPP90ty81SZdUnOraok2TvJM6vqztban8+llgAAwFytJAhcnuTgqjooyT8meW6Sn5ws0Fo7aOFxVZ2T5AIhAAAAdl7LBoHW2p1VdWqGuwGtSXJ2a+2aqjplfH3J6wIAAICdz0pGBNJauzDJhVPTZgaA1toLt71aAADA9uQ/CwMAQIcEAQAA6JAgAAAAHRIEAACgQ4IAAAB0SBAAAIAOCQIAANAhQQAAADokCAAAQIcEAQAA6JAgAAAAHRIEAACgQ4IAAAB0SBAAAIAOCQIAANAhQQAAADokCAAAQIcEAQAA6JAgAAAAHRIEAACgQ4IAAAB0SBAAAIAOCQIAANAhQQAAADokCAAAQIcEAQAA6JAgAAAAHRIEAACgQ4IAAAB0SBAAAIAOCQIAANAhQQAAADokCAAAQIcEAQAA6JAgAAAAHRIEAACgQ4IAAAB0SBAAAIAOCQIAANAhQQAAADokCAAAQIcEAQAA6JAgAAAAHRIEAACgQ4IAAAB0SBAAAIAOCQIAANAhQQAAADokCAAAQIcEAQAA6JAgAAAAHRIEAACgQ4IAAAB0SBAAAIAOCQIAANAhQQAAADokCAAAQIcEAQAA6JAgAAAAHRIEAACgQ4IAAAB0SBAAAIAOCQIAANAhQQAAADokCAAAQIcEAQAA6JAgAAAAHRIEAACgQ4IAAAB0aEVBoKqOrarrqmpjVZ024/XnVdWG8efSqjpi/lUFAADmZdkgUFVrkrw5yXFJDktyYlUdNlXshiTf11o7PMlrk5w174oCAADzs5IRgaOSbGytXd9auyPJuUmOnyzQWru0tXbb+PSjSfabbzUBAIB5WkkQ2DfJTRPPN43TFvPiJH+9LZUCAAC2r91XUKZmTGszC1Y9PUMQeNoir5+c5OQkOeCAA1ZYRQAAYN5WMiKwKcn+E8/3S3LzdKGqOjzJW5Mc31r7wqwFtdbOaq2ta62tW7t27dbUFwAAmIOVBIHLkxxcVQdV1X2TPDfJ+ZMFquqAJOcleX5r7dPzryYAADBPy54a1Fq7s6pOTXJRkjVJzm6tXVNVp4yvn5nkl5M8LMnpVZUkd7bW1m2/agMAANuiWpt5uv92t27durZ+/fodsm4AAPpRVVc4SH13/rMwAAB0SBAAAIAOCQIAANAhQQAAADokCAAAQIcEAQAA6JAgAAAAHRIEAACgQ4IAAAB0SBAAAIAOCQIAANAhQQAAADokCAAAQIcEAQAA6JAgAAAAHRIEAACgQ4IAAAB0SBAAAIAOCQIAANAhQQAAADokCAAAQIcEAQAA6JAgAAAAHRIEAACgQ4IAAAB0SBAAAIAOCQIAANAhQQAAADokCAAAQIcEAQAA6JAgAAAAHRIEAACgQ4IAAAB0SBAAAIAOCQIAANAhQQAAADokCAAAQIcEAQAA6JAgAAAAHRIEAACgQ4IAAAB0SBAAAIAOCQIAANAhQQAAADokCAAAQIcEAQAA6JAgAAAAHRIEAACgQ4IAAAB0SBAAAIAOCQIAANAhQQAAADokCAAAQIcEAQAA6JAgAAAAHRIEAACgQ4IAAAB0SBAAAIAOCQIAANAhQQAAADokCAAAQIcEAQAA6JAgAAAAHRIEAACgQ4IAAAB0SBAAAIAOCQIAANAhQQAAADokCAAAQIcEAQAA6NCKgkBVHVtV11XVxqo6bcbrVVVvHF/fUFVPmn9VAQCAeVk2CFTVmiRvTnJcksOSnFhVh00VOy7JwePPyUnOmHM9AQCAOVrJiMBRSTa21q5vrd2R5Nwkx0+VOT7J29vgo0n2qqp95lxXAABgTnZfQZl9k9w08XxTku9eQZl9k3xuslBVnZxhxCBJvlpV162qtvRg7yS37uhKsNPRLphFu2AW7YJZHrOjK7AzWkkQqBnT2laUSWvtrCRnrWCddKqq1rfW1u3oerBz0S6YRbtgFu2CWapq/Y6uw85oJacGbUqy/8Tz/ZLcvBVlAACAncRKgsDlSQ6uqoOq6r5Jnpvk/Kky5yd5wXj3oKckub219rnpBQEAADuHZU8Naq3dWVWnJrkoyZokZ7fWrqmqU8bXz0xyYZJnJtmY5F+SvGj7VZldnFPHmEW7YBbtglm0C2bRLmao1u52Kj8AALCL85+FAQCgQ4IAAAB0SBAgVXV2VX2+qq7e0XVZrap6YlW9dc7LXFtV75nnMu8NqurGqrqqqq7ckbdZG+ux9yrneVdVPWp8/Ivbp2ZbrG+XaSOLff6r6qFV9b6q+sz4+yETr72qqjZW1XVV9cP3fK2TqjpwtX1WVT2gqj5UVWu2V72WWf99q+riqlrJrbvnud5Vb+OpcieNZT5TVSdNTD+oqj42Tv/j8YYiGW8c8saxjWyoqidt379wtqo6pqouWOU8+yw2T1V9sKrWjY8vrKq9lljOI6vqXcus69LV1G1q3nOr6uCtnHfZ7/yqut+4TTeO2/jAiddW1R5mLHtm/1FVR47fQRvH9lPL1eWeVFUvrKo3rXKe7bGPsmjfN9VG/99in+lJggBJck6SY7fXwrfzl+4vJvm/81xga+2WJJ+rqqPnudx7iae31p5wb7oHd1U9Nsma1tr146TtHgR2sTZyTmZ//k9L8v7W2sFJ3j8+T1UdluHucY8d5zt9R+1Yb4X/lOS81to3VzrDPP+21todGd7L58xrmSt0TlaxjSdV1UOTvCbDPxI9KslrJnYuXpfkt8f5b0vy4nH6cUkOHn9OTnLG3P6S7e8VSX53uUKttWe21r60xOs3t9ZOWGYZ37MV9VtwRpJf2Mp5z8ny3/kvTnJba+3RSX47w7be2vZwl2X6jzMytJeFtrNQx5l1uZeY6z7KKg8ivCPJS5ct1Vrz4ydJDkxy9RKvn5PkzCQfTvLpJD86Tl+T5Dcz3GZ2Q5L/PE4/JsnfJPmjJJ9MskeSv0ryiSRXJ3nOWO4Hknw8yVVJzk5yv3H6jUl+Ncnfja8dOqNOD0xy3fh4tySfSbJ24vnGDP9h8jsyfMltGH8fMPE3nTCxvK9OPD4+yek7ervcw23gxiR7L1Nmte2gxulXj9txYbsfk+TiJO8e28eZSXabrkeS/5jksiRXJnlLhh3+6Tr9epIXjo9/I8k3x/J/uNQyMnzprE9yTZJfnXoffj3JR8bXn5Thrml/n+SUXbGNZMbnP8l1SfYZH+8z8Vl7VZJXTZS7KMlTF2lPrxvf+8uSPHqcvjbJn41t5fIkR4/TH5rkz8f289Ekh4/TfyXDF9oHMnzGf3q6zou1vxl1ujTJgePj3ZKcPm7/CzLc/e6Eibr/cpJLMuy0PGGs04axzT5kLPfBJOvGx3snuXF8/MIkf5HkPeP7+JqJOhyR5MKdeRtPlTkxyVsmnr9lnFYZ/nvv7uP0pya5aLLMrPVMLfurSX4rQz///mzuv79zfO+uyNDXHDpOX6ovn9UvHZPkgvHxHhm+Yy7P8J1z/CLv0/XZ/D30gCTnjuv74yQfm9jeN47b/HVJXjox/68k+a/Zsn0+Npv7oA1JDl74+8ffS/WTH0zyriTXJvnDbL7Jy25Jblh4/+fRHqZev+tzneEOk7eO9Vx1e
5ha7sz+I0P7u3ZWu1usLjP+nmuTvG18j9+V5NvG145M8qGxPV2UzW1+qc/1GzL0F1cnOWric/2mpfqxqTrdtY8yPr8qyV7je/WFJC8Yp78jyQ8muX+S3x/LfTzDgbmF9f5pkr/M0A/ete2ydBt9yFLbeOHHiACrcWCS70vyI0nOrKr7Z0jqt7fWnpzkyUl+uqoOGssfleTVrbXDMiT7m1trR7TWHpfkPeP852To9B6f4QP+kon13dpae1KGHbafn1GfdRk+pGmtfSvJHyR53vjaDyb5RGvt1iRvSvL21trhGTrSN67gb12f5HtXUG5X0pK8t6quqKqTlyh3YFbeDp6VobM9IsM2+c2q2mdczlEZvjAfn+GL/1mTK6mq78pw5PTo1toTMuzgPy93d3SGDj6ttdOSfL0NoxrPW2YZr27DyMfhSb6vqg6fWOZNrbWnZtixOCfJCUmekuTXJsrs6m3kEW38fzDj74eP0/dNctNEuU3jtFm+3Fo7KsNn8A3jtN/JcNTwyUmenWRh2PxXk3x8/Jz+YpK3Tyzn8Azt7alJfrmqHjm1nqX6oSTDaTlJHtVau3Gc9KwMbfnxSX5qXPakb7TWntZaO3esyyvHul2V4Yjoco7K0NaekOTHF4brM/RZT17B/PeExbbxpMW298OSfKm1dufU9KXmmbZHkr8b+/kPZfP7elaSl7fWjszQ958+Tl+qLz8wd++XJr06yQfGNvL0DH3RHpMFxjZzW2vt/4+TXpLkX8b1/c8MO5TTzs2WIzw/kWGnbdIpSX5n7IPWZXg/Ji3VTz4xyX9JcliSR2Xo7xa+8zaO82wPd23DcRvfnmGbb017mLncqXL7Zsv3ZWZ7mqrLtMckOWvcXl9O8tKquk+GI/InjO3p7AzbMln6c71HG0ZsXjrOM22xfmzSXfsoo7/NsP0emyFwLnx/PCVDIHnZ+Dc+PkMQettEO35qkpNaa98/tY5F22hr7bYk96uqWe/VXe7R8xS51/uTsfP5TFVdn+TQJM9IcnhVLQyBPjjDkN4dSS5rrd0wTr8qyeur6nUZjtB8uKqOSHJDa+3TY5m3ZfggLOwwnDf+viJTO4mjfZLcMvH87AxH4d6Q4RSA3x+nP3Vi/nck+d8r+Fs/n2R6Z2NXd3Rr7eaqeniS91XVta21i2eUW007eFqSd7bhVIx/rqoPZdgJ+nKG9nF9klTVO8eyk+fU/kCGTu3y8VTRB2TYLtOm28GkpZbxE2Pg2X1cxmEZjqokm/9p4lVJ9mytfSXJV6rqG1W1VxtOCeixjSTD0axpi92H+p0Tv397fPyDSQ4bt0eSPKiqHphh+z87SVprH6iqh1XVg8cyf9Fa+3qSr1fV32TYyb5yYj2Ltb8bJsrsnWTyVI6nJfnTsS3/07jcSX+cJGMd9mqtfWic/rbcfUdvlve11r4wLuO8cX3rW2vfrKo7quqBY7va2S22vZdqByttI9/K+D5nOJBzXlXtmeR7kvzpRBu53/h7qb58Vr806RlJ/n1VLRxUun+SA5J8aqLMdF/y7zKGjdbahqrakCmttY9X1cPHcLo2Q5D47NR57B9J8uqq2i/DqWmfmVrMcv3kpiSpqiszBJ5LxvkW+qArpus1B6vd7ivd5tuzPd3UWvvb8fEfJPmZDCNLj8vwnZYMo4efW8Hn+p1J0lq7uKoeNON6kJn92NRnero9fThDm/qHjKdBVdW+Sb7YWvtqVT0t42lErbVrq+ofkhwyzvu+1toXZ/zNy7XRhTbyhRnzJhEEWJ3pD97Ch/flrbWLJl+oqmOSfO2ugq19uqqOzPCP5/5XVb03d/8P1dMWjsp8M7Pb6tczdOYL67ipqv65qr4/w/mLs44eT/4dd2a8Tma8KGnywqb7j8vvRmvt5vH356vq3Rl2tmYFgdW0g2cutcplnleSt7XWXrVM1bdoBytZxnjk7+eTPLm1dltVnTO1jIW2962JxwvPF9rirt5G/rmq9mmtfW48OrkQoDYl2X+i3H5Jbl5kGW3G490yDPNv8d7VxDfqjHlW0lbu1v6mTLeTWeub9LVlXk8m+pDcvQ0uVef7JfnGCpa/vS22jSdtynCKyoL9Mpw6cWuSvapq9/Eo7WQ7WE0bmdQyvJ9fGo+er6T8rMeznleSZ7fWrltiebP6ksVC7qR3ZRg1/PYMIwRbLqC1P6qqj2UYrbioqn6qtfaBqbotZrL/mf4u3J590MI23DSel/7gJF/M1rWHWcudnP/mcfp+M6YvVZdpi303XTOO8N5l4iDDYpZrTzP7sSnT7eniDAc7D8gwQvVjGdrNhxeqtcSyluqPlmqjy7YRpwaxGj9eVbtV1XdmGKK8LsP5di8Zh99SVYdMD7eO0x+ZYfjqD5K8PsN519cmObCqHj0We36G4eGV+lSSR09Ne2uGIwF/0jZfEHhphvN8kyEcLBxNuTGbh9GOT3KfieUcki2H9HZpVbXHeFQ24/Z7Rhb/+1fTDi5O8pyqWlNVazMcvbhsXM5RNdxlYrcMQ+uXTK3n/UlOGEcoFu5w8h0z6jPdDv51oR5LLONBGTrW26vqERkublytXb2NnJ/kpPHxSRlG2xamP7eGO3kclOHI+2Uz5k82nzLxnAxHRZPkvUlOXShQVQs7fBdnDO/jgYRbW2tfHl87vqruPw5xH5PhnNxJy/ZD4zD5momh9kuSPHtsy4/Iljs3k/PdnuS2qloYxp/sp27M5j5k+sLQHxrb2wOS/IcMpwVk/Btuaa3966z13cNmbuOq2req3j9OvyjJM6rqITVcFPqMDOd+twzXgZ0wPf+43BfU4CkZTtv63Iz17zYx/08muWTc5jdU1Y+Pdalx9DhZvC9PZvdLky5K8vKFwFlVT5xRn09nOOK+YLJNPi7DKWqznDvW64RsOaqZcd5HJbm+tfbGDO/N9HKW6ieXckiGa1zmoqpOraqFz+Zk2zghw2lVLVvRHqrqqKp6+8Ry79Z/jO3jK1X1lHEbvSBbtqdZdZl2QFUt7PCfmKF9XJdk7cL0qrpPVT12mc91MvZd41H628fykxbrxyZt8d3UWrspw8jkweNo+CUZDkgtBIHJ9nZIhsCwVHCdnmeLNjq+j9+eoZ9alCDAwmkZH0nymKraVFV3u9J/dF2GD8pfZ7ho8hsZdrw/meTvarid1Vsy++j945NcVsPQ5quT/I9x/hdlGAK+KsPR1jNXWu/W2rVJHrywAzs6P8me2XxaUDIMD75oHDJ7fpKfHaf/boZzwy/LMIIwmbifnuHi5l48IsklVfWJDF9Af9VaW+z2mKtpB+/OcLrNJzJc5PQLrbV/GpfzkQwX916d4RSOd0+upLX2ySS/lOG6hQ1J3pdhqHXaX2XLnbizkmyoqj9cbBmttU9kuBjrmgynlP1tVm+XaCNLfP5/I8PO7GeS/ND4PK21a5L8SYbt/Z4kL2uL34XnfuOR0J9N8nPjtJ9Jsq6G20p+MsP508lwkeW6cTv9RjZ/8Sdjm8xwHu1rF0avJqy0H3pvhtMwkuFCv00Z2t9bMlxkN/1lv+CkDOdtb8hwLvfCtSKvzxBALs3wBT/p
kgynr1yZ5M9aawu35H16hguT7zGr3cYZPmd3Jsl4OsJrs/miyF+bOEXhlUleUVUbM5yz/Xvj9AsznAO9MUM/u9idS76W5LFVdUWS78/m9/V5SV489kfXZDhQkyzelyez+6VJr81wsGfD2EZeO12Z1trXkvz9xMGpM5LsOa7vF7LIzvn4mXhgkn9cJPA8J8nV4/ffodny+pdk6X5ypjG8fn2R9S1pifZwaDafQvJ7SR42bttXZLyj1Fa2hwMyHpVepv94SYbP8sYMN2f466XqMsOnkpw0bq+HJjmjDXfqOiHJ68b2dGWGU8+SxT/XyRASLs2wTzJrn2ixfuwui+yjfCxD4EyGALBvNgfa0zMcrLgqwylzL5y4XmUxS7XRI5N8dOK6jZkWrj6HJdVw6sQFrbUl74t8T6uqn0vyldbaW8fn6zJcwLNNF3FW1cUZ7ipx2xyquV/Ryj0AAAFpSURBVMuYVzsYj/j+fGvtR+dQpwdkOBJ19BI7pHOnjSytqm7McPeKW7dxOb+S4e4qr59DnZ6Y5BWtteePz/ccz819WIYv0KOX2wFb4XpemOFvP3XGa+dluGvKckf6dpjxqPBnW2vLnb65rev5amttzzks55zM6fupqn4syZGttV/a1mVtT+N335dba7+3bOGVL/OCJM8ad57npqp+M8k7Wmt3u8Zijus4MEMbeNwclvXBDN9P2/z/dKb3Ue5JVfU7Sc5vrb1/qXKuEeDe7owkC0PIp2U4orDYtQErMg7N/h87ePcOrbWvV9VrMhxZ+ew9sU5t5N5pvLDzb6pqzRgaL6jhIsD7Zhhp2OYQsJQa7lz05ztzCEiS1tqq/mnSrqS19u5a5i4rO4kvZRhxmpt5HJhZZLn/bXss917irn2UHeDq5UJAYkQAAAC65BoBAADokCAAAAAdEgQAAKBDggAAAHRIEAAAgA79GwU6zSeTW8wdAAAAAElFTkSuQmCC\n", 291 | "text/plain": [ 292 | "
" 293 | ] 294 | }, 295 | "metadata": { 296 | "needs_background": "light" 297 | }, 298 | "output_type": "display_data" 299 | } 300 | ], 301 | "source": [ 302 | "import matplotlib.pyplot as plt\n", 303 | "import numpy as np\n", 304 | "\n", 305 | "vals = [1, 5, 100, 10000, 1000000]\n", 306 | "vals = np.log10(vals)\n", 307 | "vals = [1, 2, 3, 4, 5]\n", 308 | "\n", 309 | "fig, ax = plt.subplots(1, 1, figsize=(12, 7.2))\n", 310 | "ax.xaxis.set_ticks(vals)\n", 311 | "ax.xaxis.set_ticklabels([\n", 312 | " \"1 person (you)\",\n", 313 | " \"5 people (team)\",\n", 314 | " \"100 people (group)\",\n", 315 | " \"10,000 people (division)\",\n", 316 | " \"1,000,000 people (world)\"])\n", 317 | "ax.yaxis.set_ticks(vals)\n", 318 | "ax.yaxis.set_ticklabels([\n", 319 | " \"1 day\",\n", 320 | " \"1 week\",\n", 321 | " \"3 months\",\n", 322 | " \"1 year\",\n", 323 | " \"10 years\"])" 324 | ] 325 | }, 326 | { 327 | "cell_type": "markdown", 328 | "metadata": {}, 329 | "source": [] 330 | }, 331 | { 332 | "cell_type": "markdown", 333 | "metadata": {}, 334 | "source": [] 335 | }, 336 | { 337 | "cell_type": "markdown", 338 | "metadata": {}, 339 | "source": [] 340 | }, 341 | { 342 | "cell_type": "markdown", 343 | "metadata": {}, 344 | "source": [] 345 | }, 346 | { 347 | "cell_type": "markdown", 348 | "metadata": {}, 349 | "source": [] 350 | }, 351 | { 352 | "cell_type": "markdown", 353 | "metadata": {}, 354 | "source": [] 355 | }, 356 | { 357 | "cell_type": "markdown", 358 | "metadata": {}, 359 | "source": [] 360 | } 361 | ], 362 | "metadata": { 363 | "kernelspec": { 364 | "display_name": "Python 3", 365 | "language": "python", 366 | "name": "python3" 367 | }, 368 | "language_info": { 369 | "codemirror_mode": { 370 | "name": "ipython", 371 | "version": 3 372 | }, 373 | "file_extension": ".py", 374 | "mimetype": "text/x-python", 375 | "name": "python", 376 | "nbconvert_exporter": "python", 377 | "pygments_lexer": "ipython3", 378 | "version": "3.7.7" 379 | } 380 | }, 381 | "nbformat": 4, 382 | "nbformat_minor": 4 383 | } 384 | -------------------------------------------------------------------------------- /07_graphs/07_graphs.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "Collapsed": "false" 7 | }, 8 | "source": [ 9 | "# Machine Learning for Graphs\n", 10 | "This notebook contains:\n", 11 | "1. A very brief introduction for the library [`NetworkX`](https://networkx.github.io/).\n", 12 | "1. An introduction to Graph Neural Networks (GNNs) with [`PyTorch`](https://pytorch.org/).\n", 13 | "1. Some basic queries for the proprietary graph database [`Neo4j`](https://neo4j.com/)." 14 | ] 15 | }, 16 | { 17 | "cell_type": "markdown", 18 | "metadata": {}, 19 | "source": [ 20 | "## Introduction to NetworkX\n", 21 | "How to handle graph data in python?" 
22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": null, 27 | "metadata": {}, 28 | "outputs": [], 29 | "source": [ 30 | "import networkx as nx\n", 31 | "import pandas as pd\n", 32 | "import matplotlib.pyplot as plt\n", 33 | "\n", 34 | "print(f'NetworkX version used: {nx.__version__}')" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | "### Load dataset 📞 ☎️" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": null, 47 | "metadata": {}, 48 | "outputs": [], 49 | "source": [ 50 | "df = pd.read_csv('./log_of_calls.csv')\n", 51 | "df" 52 | ] 53 | }, 54 | { 55 | "cell_type": "markdown", 56 | "metadata": {}, 57 | "source": [ 58 | "### Convert to NetworkX graph\n", 59 | "It sometimes takes some work to get all the edge and node attributes into the desired format 🤬!" 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": null, 65 | "metadata": {}, 66 | "outputs": [], 67 | "source": [ 68 | "from_df = df[[c for c in df.columns if c.startswith('from_')]]\n", 69 | "from_df.columns = [c[5:] for c in from_df.columns]\n", 70 | "to_df = df[[c for c in df.columns if c.startswith('to_')]]\n", 71 | "to_df.columns = [c[3:] for c in to_df.columns]\n", 72 | "df_nodes = pd.concat((from_df, to_df), ignore_index=True)\n", 73 | "df_nodes = df_nodes.drop(columns='dt')\n", 74 | "df_nodes = df_nodes.drop_duplicates(subset='number')\n", 75 | "df_nodes = df_nodes.reset_index(drop=True)\n", 76 | "df_nodes" 77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": null, 82 | "metadata": {}, 83 | "outputs": [], 84 | "source": [ 85 | "G = nx.MultiDiGraph()\n", 86 | "\n", 87 | "G.add_nodes_from(zip(\n", 88 | " df_nodes.number,\n", 89 | " df_nodes.drop(columns='number').to_dict('records')\n", 90 | "))\n", 91 | "G.nodes(data=True)['403-726-6587']" 92 | ] 93 | }, 94 | { 95 | "cell_type": "markdown", 96 | "metadata": {}, 97 | "source": [ 98 | "The addition of nodes preserves the insertion order:" 99 | ] 100 | }, 101 | { 102 | "cell_type": "code", 103 | "execution_count": null, 104 | "metadata": {}, 105 | "outputs": [], 106 | "source": [ 107 | "list(G.nodes)[0]" 108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "execution_count": null, 113 | "metadata": {}, 114 | "outputs": [], 115 | "source": [ 116 | "G.add_edges_from(zip(\n", 117 | " df.from_number,\n", 118 | " df.to_number,\n", 119 | " df[['from_dt', 'to_dt']].to_dict('records')\n", 120 | "))\n", 121 | "list(G.edges(nbunch='403-726-6587', data=True))" 122 | ] 123 | }, 124 | { 125 | "cell_type": "markdown", 126 | "metadata": {}, 127 | "source": [ 128 | "### What does the Graph look like? 📊 📈 📉\n", 129 | "As mentioned, the answer strongly depends on the spatial layout of the nodes..."
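The next cell places the nodes on a circle; as a sketch of how strongly the layout choice matters, one could draw the same graph `G` with a force-directed layout next to it (the `seed` below is only an assumption to make the layout reproducible):

```python
import matplotlib.pyplot as plt
import networkx as nx

# the same graph under two different layouts can look like two different graphs
fig, axes = plt.subplots(1, 2, figsize=(16, 8))
nx.draw(G, pos=nx.circular_layout(G), ax=axes[0], width=0.1, node_size=10)
nx.draw(G, pos=nx.spring_layout(G, seed=42), ax=axes[1], width=0.1, node_size=10)
```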
130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": null, 135 | "metadata": {}, 136 | "outputs": [], 137 | "source": [ 138 | "plt.figure(figsize=(16,16))\n", 139 | "nx.draw_circular(G, width=0.1, node_size=10)" 140 | ] 141 | }, 142 | { 143 | "cell_type": "markdown", 144 | "metadata": {}, 145 | "source": [ 146 | "### Some Summary Statistics 📂\n", 147 | "Plenty of algorithms are available:" 148 | ] 149 | }, 150 | { 151 | "cell_type": "code", 152 | "execution_count": null, 153 | "metadata": { 154 | "scrolled": false 155 | }, 156 | "outputs": [], 157 | "source": [ 158 | "from IPython.display import IFrame \n", 159 | "IFrame('https://networkx.github.io/documentation/networkx-2.4/reference', width=1000, height=650)" 160 | ] 161 | }, 162 | { 163 | "cell_type": "code", 164 | "execution_count": null, 165 | "metadata": {}, 166 | "outputs": [], 167 | "source": [ 168 | "nx.average_shortest_path_length(G)" 169 | ] 170 | }, 171 | { 172 | "cell_type": "markdown", 173 | "metadata": {}, 174 | "source": [ 175 | "## Graph Neural Networks (GNNs) with PyTorch (aka Message Passing 📥 📤)\n", 176 | "Graph Neural Networks (GNNs) unraveled!" 177 | ] 178 | }, 179 | { 180 | "cell_type": "code", 181 | "execution_count": null, 182 | "metadata": {}, 183 | "outputs": [], 184 | "source": [ 185 | "from collections import OrderedDict\n", 186 | "from typing import Tuple, List\n", 187 | "\n", 188 | "import numpy as np\n", 189 | "import scipy.sparse as sp\n", 190 | "from sklearn.model_selection import train_test_split\n", 191 | "from sklearn.preprocessing import OneHotEncoder, LabelEncoder\n", 192 | "import torch\n", 193 | "from torch.nn import functional as F\n", 194 | "from torch import nn\n", 195 | "from tqdm.notebook import tqdm " 196 | ] 197 | }, 198 | { 199 | "cell_type": "markdown", 200 | "metadata": {}, 201 | "source": [ 202 | "### Loading the labels 🤷‍♂️ 🤷‍♀️\n", 203 | "_(This task is probably not politically correct, but the result is quite surprising!)_" 204 | ] 205 | }, 206 | { 207 | "cell_type": "code", 208 | "execution_count": null, 209 | "metadata": {}, 210 | "outputs": [], 211 | "source": [ 212 | "le = LabelEncoder()\n", 213 | "y = le.fit_transform(df_nodes['gender'].values)\n", 214 | "y = torch.from_numpy(y)\n", 215 | "y" 216 | ] 217 | }, 218 | { 219 | "cell_type": "markdown", 220 | "metadata": {}, 221 | "source": [ 222 | "### Loading the features\n", 223 | "I.e. 
the one-hot encoding of the names:\n", 224 | "\n", 225 | "**Observation:** \n", 226 | "🔎_The name is basically a unique identifier, so how should we be able to learn something (with partially labelled data)?_ 🔎" 227 | ] 228 | }, 229 | { 230 | "cell_type": "code", 231 | "execution_count": null, 232 | "metadata": {}, 233 | "outputs": [], 234 | "source": [ 235 | "X = OneHotEncoder().fit_transform(df_nodes['name'].values[:, None]).toarray()\n", 236 | "X = torch.from_numpy(X).float()\n", 237 | "X.shape" 238 | ] 239 | }, 240 | { 241 | "cell_type": "markdown", 242 | "metadata": {}, 243 | "source": [ 244 | "### Convert the NetworkX graph into a sparse adjacency matrix\n", 245 | "Otherwise the space requirement is $O(n^2)$ in the number of nodes $n$." 246 | ] 247 | }, 248 | { 249 | "cell_type": "code", 250 | "execution_count": null, 251 | "metadata": {}, 252 | "outputs": [], 253 | "source": [ 254 | "def from_networkx_to_sparse_tensor(G: nx.Graph) -> torch.Tensor:\n", 255 | " if hasattr(G, 'to_undirected'):\n", 256 | " G = G.to_undirected()\n", 257 | " adjacency_matrix = nx.convert_matrix.to_scipy_sparse_matrix(G)\n", 258 | " adjacency_matrix += sp.diags(np.ones(len(G.nodes())))\n", 259 | " adjacency_matrix = adjacency_matrix.tocoo()\n", 260 | " row_index = torch.from_numpy(adjacency_matrix.row).to(torch.long)\n", 261 | " col_index = torch.from_numpy(adjacency_matrix.col).to(torch.long)\n", 262 | " A = torch.sparse.FloatTensor(\n", 263 | " torch.stack([row_index, col_index], dim=0),\n", 264 | " torch.ones_like(row_index, dtype=torch.float)\n", 265 | " ).coalesce()\n", 266 | " return A" 267 | ] 268 | }, 269 | { 270 | "cell_type": "code", 271 | "execution_count": null, 272 | "metadata": {}, 273 | "outputs": [], 274 | "source": [ 275 | "A = from_networkx_to_sparse_tensor(G)\n", 276 | "A" 277 | ] 278 | }, 279 | { 280 | "cell_type": "markdown", 281 | "metadata": {}, 282 | "source": [ 283 | "### Implementation of a Graph Convolutional Network (GCN)\n", 284 | "For the graph convolutional layer we are going to use the following update scheme:\n", 285 | "\n", 286 | "$$H^{(l+1)}=\sigma\left(D^{-\frac{1}{2}} A D^{-\frac{1}{2}} H^{(l)} W^{(l)}\right)=\sigma\left(\hat{A} H^{(l)} W^{(l)}\right)$$\n", 287 | "\n", 288 | "We use the ReLU as the activation function, except in the last layer, where we directly output the raw logits (i.e. no activation at all). With $H^{(0)}$ we denote the node features." 289 | ] 290 | }, 291 | { 292 | "cell_type": "code", 293 | "execution_count": null, 294 | "metadata": {}, 295 | "outputs": [], 296 | "source": [ 297 | "class GraphConvolution(nn.Module):\n", 298 | " \"\"\"\n", 299 | " Graph Convolution Layer: as proposed in [Kipf et al. 
2017](https://arxiv.org/abs/1609.02907).\n", 300 | " \n", 301 | " Parameters\n", 302 | " ----------\n", 303 | " in_channels: int\n", 304 | " Dimensionality of input channels/features.\n", 305 | " out_channels: int\n", 306 | " Dimensionality of output channels/features.\n", 307 | " \"\"\"\n", 308 | "\n", 309 | " def __init__(self, in_channels: int, out_channels: int):\n", 310 | " super().__init__()\n", 311 | " self.linear = nn.Linear(in_channels, out_channels, bias=False)\n", 312 | "\n", 313 | " def forward(self, arguments: Tuple[torch.tensor, torch.sparse.FloatTensor]) -> torch.tensor:\n", 314 | " \"\"\"\n", 315 | " Forward method.\n", 316 | " \n", 317 | " Parameters\n", 318 | " ----------\n", 319 | " arguments: Tuple[torch.tensor, torch.sparse.FloatTensor]\n", 320 | " Tuple of feature matrix `X` and normalized adjacency matrix `A_hat`\n", 321 | " \n", 322 | " Returns\n", 323 | " ---------\n", 324 | " X: torch.tensor\n", 325 | " The result of the message passing step\n", 326 | " \"\"\"\n", 327 | " X, A_hat = arguments\n", 328 | " X = A_hat @ self.linear(X)\n", 329 | " return X" 330 | ] 331 | }, 332 | { 333 | "cell_type": "markdown", 334 | "metadata": {}, 335 | "source": [ 336 | "In the following we stack multiple such layers (with ReLU activation functions and dropout in between). Before we pass the adjacency matrix to the GCN, we calculate the normalized adjacency matrix: \n", 337 | "$$\hat{A} = D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$$" 338 | ] 339 | }, 340 | { 341 | "cell_type": "code", 342 | "execution_count": null, 343 | "metadata": {}, 344 | "outputs": [], 345 | "source": [ 346 | "class GCN(nn.Module):\n", 347 | " \"\"\"\n", 348 | " Graph Convolution Network: as proposed in [Kipf et al. 2017](https://arxiv.org/abs/1609.02907).\n", 349 | " \n", 350 | " Parameters\n", 351 | " ----------\n", 352 | " n_features: int\n", 353 | " Dimensionality of input features.\n", 354 | " n_classes: int\n", 355 | " Number of classes for the semi-supervised node classification.\n", 356 | " hidden_dimensions: List[int]\n", 357 | " Internal number of features. 
`len(hidden_dimensions)` defines the number of hidden representations.\n", 358 | " activation: nn.Module\n", 359 | " The activation for each layer but the last.\n", 360 | " dropout: float\n", 361 | " The dropout probability.\n", 362 | " \"\"\"\n", 363 | " \n", 364 | " def __init__(self,\n", 365 | " n_features: int,\n", 366 | " n_classes: int,\n", 367 | " hidden_dimensions: List[int] = [80],\n", 368 | " activation: nn.Module = nn.ReLU(),\n", 369 | " dropout: float = 0.5):\n", 370 | " super().__init__()\n", 371 | " self.n_features = n_features\n", 372 | " self.n_classes = n_classes\n", 373 | " self.hidden_dimensions = hidden_dimensions\n", 374 | " self.layers = nn.ModuleList()\n", 375 | " self.layers.extend([\n", 376 | " nn.Sequential(OrderedDict([\n", 377 | " (f'gcn_{idx}', GraphConvolution(in_channels=in_channels,\n", 378 | " out_channels=out_channels)),\n", 379 | " (f'activation_{idx}', activation),\n", 380 | " (f'dropout_{idx}', nn.Dropout(p=dropout))\n", 381 | " ]))\n", 382 | " for idx, (in_channels, out_channels)\n", 383 | " in enumerate(zip([n_features] + hidden_dimensions[:-1], hidden_dimensions))\n", 384 | " ])\n", 385 | " self.layers.append(\n", 386 | " nn.Sequential(OrderedDict([\n", 387 | " (f'gcn_{len(hidden_dimensions)}', GraphConvolution(in_channels=hidden_dimensions[-1],\n", 388 | " out_channels=n_classes))\n", 389 | " ]))\n", 390 | " )\n", 391 | " \n", 392 | " def normalize(self, A: torch.sparse.FloatTensor) -> torch.tensor:\n", 393 | " \"\"\"\n", 394 | " For calculating $\\hat{A} = 𝐷^{−\\frac{1}{2}} 𝐴 𝐷^{−\\frac{1}{2}}$.\n", 395 | " \n", 396 | " Parameters\n", 397 | " ----------\n", 398 | " A: torch.sparse.FloatTensor\n", 399 | " Sparse adjacency matrix with added self-loops.\n", 400 | " \n", 401 | " Returns\n", 402 | " -------\n", 403 | " A_hat: torch.sparse.FloatTensor\n", 404 | " Normalized message passing matrix\n", 405 | " \"\"\"\n", 406 | " row, col = A._indices()\n", 407 | " edge_weight = A._values()\n", 408 | " deg = (A @ torch.ones(A.shape[0], 1, device=A.device)).squeeze()\n", 409 | " deg_inv_sqrt = deg.pow(-0.5)\n", 410 | " normalized_edge_weight = deg_inv_sqrt[row] * edge_weight * deg_inv_sqrt[col]\n", 411 | " A_hat = torch.sparse.FloatTensor(A._indices(), normalized_edge_weight, A.shape)\n", 412 | " return A_hat\n", 413 | "\n", 414 | " def forward(self, X: torch.Tensor, A: torch.sparse.FloatTensor) -> torch.tensor:\n", 415 | " \"\"\"\n", 416 | " Forward method.\n", 417 | " \n", 418 | " Parameters\n", 419 | " ----------\n", 420 | " X: torch.tensor\n", 421 | " Feature matrix `X`\n", 422 | " A: torch.tensor\n", 423 | " adjacency matrix `A` (with self-loops)\n", 424 | " \n", 425 | " Returns\n", 426 | " ---------\n", 427 | " X: torch.tensor\n", 428 | " The result of the last message passing step (i.e. 
the logits)\n", 429 | " \"\"\"\n", 430 | " A_hat = self.normalize(A)\n", 431 | " for layer in self.layers:\n", 432 | " X = layer((X, A_hat))\n", 433 | " return X" 434 | ] 435 | }, 436 | { 437 | "cell_type": "markdown", 438 | "metadata": {}, 439 | "source": [ 440 | "### Train/Validation/Test split 🎛" 441 | ] 442 | }, 443 | { 444 | "cell_type": "code", 445 | "execution_count": null, 446 | "metadata": {}, 447 | "outputs": [], 448 | "source": [ 449 | "def split(labels: np.ndarray,\n", 450 | " train_size: float = 0.1,\n", 451 | " val_size: float = 0.1,\n", 452 | " test_size: float = 0.8,\n", 453 | " random_state: int = 42) -> List[np.ndarray]:\n", 454 | " \"\"\"Split the arrays or matrices into random train, validation and test subsets.\n", 455 | "\n", 456 | " Parameters\n", 457 | " ----------\n", 458 | " labels: np.ndarray [n_nodes]\n", 459 | " The class labels\n", 460 | " train_size: float\n", 461 | " Proportion of the dataset included in the train split.\n", 462 | " val_size: float\n", 463 | " Proportion of the dataset included in the validation split.\n", 464 | " test_size: float\n", 465 | " Proportion of the dataset included in the test split.\n", 466 | " random_state: int\n", 467 | " Random_state is the seed used by the random number generator;\n", 468 | "\n", 469 | " Returns\n", 470 | " -------\n", 471 | " split_train: array-like\n", 472 | " The indices of the training nodes\n", 473 | " split_val: array-like\n", 474 | " The indices of the validation nodes\n", 475 | " split_test array-like\n", 476 | " The indices of the test nodes\n", 477 | "\n", 478 | " \"\"\"\n", 479 | " idx = np.arange(labels.shape[0])\n", 480 | " idx_train_and_val, idx_test = train_test_split(idx,\n", 481 | " random_state=random_state,\n", 482 | " train_size=(train_size + val_size),\n", 483 | " test_size=test_size,\n", 484 | " stratify=labels)\n", 485 | "\n", 486 | " idx_train, idx_val = train_test_split(idx_train_and_val,\n", 487 | " random_state=random_state,\n", 488 | " train_size=(train_size / (train_size + val_size)),\n", 489 | " test_size=(val_size / (train_size + val_size)),\n", 490 | " stratify=labels[idx_train_and_val])\n", 491 | " \n", 492 | " return idx_train, idx_val, idx_test" 493 | ] 494 | }, 495 | { 496 | "cell_type": "markdown", 497 | "metadata": {}, 498 | "source": [ 499 | "### The training code... 
🎓" 500 | ] 501 | }, 502 | { 503 | "cell_type": "code", 504 | "execution_count": null, 505 | "metadata": {}, 506 | "outputs": [], 507 | "source": [ 508 | "def train(model: nn.Module, \n", 509 | " X: torch.Tensor, \n", 510 | " A: torch.sparse.FloatTensor, \n", 511 | " labels: torch.Tensor, \n", 512 | " idx_train: np.ndarray, \n", 513 | " idx_val: np.ndarray,\n", 514 | " lr: float = 1e-2,\n", 515 | " weight_decay: float = 5e-4, \n", 516 | " patience: int = 50, \n", 517 | " max_epochs: int = 500, \n", 518 | " display_step: int = 10):\n", 519 | " \"\"\"\n", 520 | " Train a model using either standard or adversarial training.\n", 521 | " \n", 522 | " Parameters\n", 523 | " ----------\n", 524 | " model: nn.Module\n", 525 | " Model which we want to train.\n", 526 | " X: torch.Tensor [n, d]\n", 527 | " Dense attribute matrix.\n", 528 | " A: torch.sparse.FloatTensor [n, n]\n", 529 | " Sparse adjacency matrix.\n", 530 | " labels: torch.Tensor [n]\n", 531 | " Ground-truth labels of all nodes,\n", 532 | " idx_train: np.ndarray [?]\n", 533 | " Indices of the training nodes.\n", 534 | " idx_val: np.ndarray [?]\n", 535 | " Indices of the validation nodes.\n", 536 | " lr: float\n", 537 | " Learning rate.\n", 538 | " weight_decay : float\n", 539 | " Weight decay.\n", 540 | " patience: int\n", 541 | " The number of epochs to wait for the validation loss to improve before stopping early.\n", 542 | " max_epochs: int\n", 543 | " Maximum number of epochs for training.\n", 544 | " display_step : int\n", 545 | " How often to print information.\n", 546 | " seed: int\n", 547 | " Seed\n", 548 | " \n", 549 | " Returns\n", 550 | " -------\n", 551 | " trace_train: list\n", 552 | " A list of values of the train loss during training.\n", 553 | " trace_val: list\n", 554 | " A list of values of the validation loss during training.\n", 555 | " \"\"\"\n", 556 | " trace_train = []\n", 557 | " trace_val = []\n", 558 | " optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)\n", 559 | "\n", 560 | " best_loss = np.inf\n", 561 | " for it in tqdm(range(max_epochs), desc='Training...'):\n", 562 | " logits = model(X, A) \n", 563 | " loss_train = F.cross_entropy(logits[idx_train], labels[idx_train])\n", 564 | " loss_val = F.cross_entropy(logits[idx_val], labels[idx_val])\n", 565 | "\n", 566 | " optimizer.zero_grad()\n", 567 | " loss_train.backward()\n", 568 | " optimizer.step()\n", 569 | " \n", 570 | " trace_train.append(loss_train.detach().item())\n", 571 | " trace_val.append(loss_val.detach().item())\n", 572 | "\n", 573 | " if loss_val < best_loss:\n", 574 | " best_loss = loss_val\n", 575 | " best_epoch = it\n", 576 | " best_state = {key: value.cpu() for key, value in model.state_dict().items()}\n", 577 | " else:\n", 578 | " if it >= best_epoch + patience:\n", 579 | " break\n", 580 | "\n", 581 | " if display_step > 0 and it % display_step == 0:\n", 582 | " print(f'Epoch {it:4}: loss_train: {loss_train.item():.5f}, loss_val: {loss_val.item():.5f} ')\n", 583 | "\n", 584 | " # restore the best validation state\n", 585 | " model.load_state_dict(best_state)\n", 586 | " return trace_train, trace_val" 587 | ] 588 | }, 589 | { 590 | "cell_type": "markdown", 591 | "metadata": {}, 592 | "source": [ 593 | "### 🚧 Putting it all together 🚧" 594 | ] 595 | }, 596 | { 597 | "cell_type": "code", 598 | "execution_count": null, 599 | "metadata": {}, 600 | "outputs": [], 601 | "source": [ 602 | "D, C = X.shape[1], y.max() + 1\n", 603 | "\n", 604 | "gcn = GCN(n_features=D, n_classes=C, hidden_dimensions=[64])\n", 605 | "\n", 
606 | "gcn" 607 | ] 608 | }, 609 | { 610 | "cell_type": "code", 611 | "execution_count": null, 612 | "metadata": {}, 613 | "outputs": [], 614 | "source": [ 615 | "idx_train, idx_val, idx_test = split(y.numpy(), train_size=0.1, val_size=0.1, test_size=0.8)" 616 | ] 617 | }, 618 | { 619 | "cell_type": "code", 620 | "execution_count": null, 621 | "metadata": { 622 | "scrolled": false 623 | }, 624 | "outputs": [], 625 | "source": [ 626 | "trace_train, trace_val = train(gcn, X, A, y, idx_train, idx_val)\n", 627 | "\n", 628 | "plt.plot(trace_train, label='train')\n", 629 | "plt.plot(trace_val, label='validation')\n", 630 | "plt.xlabel('Epochs')\n", 631 | "plt.ylabel('Loss')\n", 632 | "plt.legend()\n", 633 | "plt.grid(True)" 634 | ] 635 | }, 636 | { 637 | "cell_type": "code", 638 | "execution_count": null, 639 | "metadata": {}, 640 | "outputs": [], 641 | "source": [ 642 | "gcn.eval()\n", 643 | "logits = gcn(X, A)\n", 644 | "accuracy = (torch.argmax(logits, dim=-1) == y)[idx_test].float().mean()\n", 645 | "print(f'We can predict the name with an accuracy of {100*accuracy:.2f} % ' \n", 646 | " f'based on non-informative features due to the graph stucture!!!')" 647 | ] 648 | }, 649 | { 650 | "cell_type": "markdown", 651 | "metadata": { 652 | "Collapsed": "false" 653 | }, 654 | "source": [ 655 | "## 🗺 Graph Databases (Neo4j)\n", 656 | "Handle your graph data professionally!" 657 | ] 658 | }, 659 | { 660 | "cell_type": "markdown", 661 | "metadata": { 662 | "Collapsed": "false" 663 | }, 664 | "source": [ 665 | "Switch to this folder: `cd 07_graphs`\n", 666 | "\n", 667 | "Start Neo4j server e.g. via docker (initial user: `neo4j`, pw: `neo4j`):\n", 668 | "```bash\n", 669 | "docker run \\\n", 670 | " --publish=7474:7474 --publish=7687:7687 \\\n", 671 | " --volume=$PWD/data:/data \\\n", 672 | " --volume=$PWD/import:/import \\\n", 673 | " --env 'NEO4JLABS_PLUGINS=[\"graph-data-science\"]' \\\n", 674 | " neo4j:4.1.1\n", 675 | "```\n", 676 | "\n", 677 | "Then connect to the Neo4j bowser via `http://localhost:7474/` (default user and pw is typically `neo4j`).\n", 678 | "\n", 679 | "### Load Graph\n", 680 | "\n", 681 | "```sql\n", 682 | "MATCH (n) DETACH DELETE n;\n", 683 | "LOAD CSV WITH HEADERS FROM 'file:///log_of_calls.csv' AS line\n", 684 | "MERGE (c1:City { name: line.from_city })\n", 685 | "MERGE (p1:Person { name: line.from_name, number: line.from_number, gender: line.from_gender })\n", 686 | "MERGE (p1)-[:FROM]->(c1)\n", 687 | "MERGE (c2:City { name: line.to_city })\n", 688 | "MERGE (p2:Person { name: line.to_name, number: line.to_number, gender: line.to_gender })\n", 689 | "MERGE (p2)-[:FROM]->(c2)\n", 690 | "CREATE (p1)-[c:Calls { \n", 691 | "\t\tfrom: datetime(line.from_dt),\n", 692 | "\t\tto: datetime(line.to_dt),\n", 693 | " duration: duration.between(datetime(line.from_dt), datetime(line.to_dt)).minutes\n", 694 | "\t}]->(p2)\n", 695 | "```\n", 696 | "\n", 697 | "### Visualize Graph\n", 698 | "\n", 699 | "For example we want to have a look at all persons from `Pattaya`:\n", 700 | "```sql\n", 701 | "MATCH p=()-[r:FROM]->({ name: 'Pattaya' }) \n", 702 | "RETURN p\n", 703 | "```\n", 704 | "or equivalently:\n", 705 | "```sql\n", 706 | "MATCH p=()-[r:FROM]->(c)\n", 707 | "WHERE c.name='Pattaya'\n", 708 | "RETURN p\n", 709 | "```\n", 710 | "\n", 711 | "### Explain\n", 712 | "Similarily to SQL, we can execute an `EXPLAIN` query for analyis of the execution plan:\n", 713 | "```\n", 714 | "EXPLAIN MATCH p=()-[r:FROM]->(c)\n", 715 | "WHERE c.name='Pattaya'\n", 716 | "RETURN p\n", 717 | "```\n", 718 | "\n", 
719 | "### Closeness centrality\n", 720 | "```\n", 721 | "CALL gds.alpha.closeness.stream({\n", 722 | " nodeProjection: 'Person',\n", 723 | " relationshipProjection: 'Calls'\n", 724 | "})\n", 725 | "YIELD nodeId, centrality\n", 726 | "RETURN gds.util.asNode(nodeId).name AS user, centrality\n", 727 | "ORDER BY centrality DESC\n", 728 | "```" 729 | ] 730 | }, 731 | { 732 | "cell_type": "code", 733 | "execution_count": null, 734 | "metadata": {}, 735 | "outputs": [], 736 | "source": [] 737 | } 738 | ], 739 | "metadata": { 740 | "jupytext": { 741 | "formats": "ipynb,py" 742 | }, 743 | "kernelspec": { 744 | "display_name": "Python 3", 745 | "language": "python", 746 | "name": "python3" 747 | }, 748 | "language_info": { 749 | "codemirror_mode": { 750 | "name": "ipython", 751 | "version": 3 752 | }, 753 | "file_extension": ".py", 754 | "mimetype": "text/x-python", 755 | "name": "python", 756 | "nbconvert_exporter": "python", 757 | "pygments_lexer": "ipython3", 758 | "version": "3.7.6" 759 | } 760 | }, 761 | "nbformat": 4, 762 | "nbformat_minor": 4 763 | } 764 | -------------------------------------------------------------------------------- /02_database_basics/02_database_basics.py: -------------------------------------------------------------------------------- 1 | # --- 2 | # jupyter: 3 | # jupytext: 4 | # formats: ipynb,py 5 | # text_representation: 6 | # extension: .py 7 | # format_name: light 8 | # format_version: '1.5' 9 | # jupytext_version: 1.4.2 10 | # kernelspec: 11 | # display_name: Python 3 12 | # language: python 13 | # name: python3 14 | # --- 15 | 16 | # + [markdown] Collapsed="false" 17 | # # Data Management and Database Basics 18 | 19 | # + [markdown] Collapsed="false" 20 | # ## Motivation 21 | 22 | # + [markdown] Collapsed="false" 23 | # 24 | 25 | # + [markdown] Collapsed="false" 26 | # ## Overview 27 | 28 | # + [markdown] Collapsed="false" 29 | # 1. Pre-SQL (Robin) 30 | # 2. SQL databases (Ali/Emilio) 31 | # 3. Non-SQL databases (Ali) 32 | # 4. Simple graph database introduction (Robin?) 33 | 34 | # + [markdown] Collapsed="false" 35 | # # Pre-SQL 36 | 37 | # + [markdown] Collapsed="false" 38 | # - You kind of have data, but not really that much. 39 | # - You want to organize it better, but keep things lightweight to share. 

# + [markdown] Collapsed="false"
# ### Efficiently reading last lines

# + [markdown] Collapsed="false" slideshow={"slide_type": "slide"}
# # SQL

# + [markdown] slideshow={"slide_type": "slide"}
# ## Introduction
# - SQL is a declarative programming language to manipulate tables
#   - no functions or loops, just _declare_ what you need and the runtime will figure out how to compute it
# - SQL queries can be used to
#   - Insert new rows into a table
#   - Delete rows from a table
#   - Update one or more attributes of one or more rows in a table
#   - Retrieve and possibly transform rows coming from one or more tables
# - Relational Database Management System (RDBMS)
#   - Manages data in the tables
#   - Executes queries, returns results
# - This section will mostly focus on reading data (last point)

# + [markdown] slideshow={"slide_type": "slide"}
# ## Main abstraction: Tables
# - A table is a _set_ of tuples (rows)
#   - No two rows are the same
# - Rows are distinguished by _primary keys_
#   - Primary key: smallest set of attributes that uniquely identifies a row
#   - Cannot have two rows with the same primary key
#   - Examples:
#     - Student ID (one attribute)
#     - First name, last name, birth date, place of birth (four attributes)
# - The primary key is a property of each table
#   - All rows in a table use the same attributes as primary key
#   - But different tables can have different primary keys

# + [markdown] slideshow={"slide_type": "slide"}
# ## Domain
# - Good database design has
#   - One table for each entity in the domain
#   - Relationships between two or more entities
# - _Foreign keys_ are used to refer to rows of other tables
#   - e.g. a table with grades will have foreign keys that point to the student and the course

# + [markdown] slideshow={"slide_type": "slide"}
# ### Example: University
# - Entities
#   - Students (ID, Name, Degree)
#   - Courses (ID, Title, Faculty, Semester)
#   - Professors (ID, Name, Chair)
# - Relationships
#   - One student can *Mentor* another student
#   - A student *Attends* several courses and obtains a grade for each of them
#   - Professors *Teach* courses

# + [markdown] slideshow={"slide_type": "slide"}
# ### ER diagram
# - Graphical form to represent entities and relationships
#   - Box: entity
#   - Diamond: relationship
#   - Circle: attribute
#
# ![](../img/sql_er_diagram.png)

# + [markdown] slideshow={"slide_type": "slide"}
# ### Which tables to create?
# - Until now, we separated entities from relationships
# - But in practice everything must be stored in tables
# - How to do this?
#   - One table per entity (students, courses, professors)
# - What about the relationships?
#   - Mentor: 1 to 1, three possibilities
#     1. Have a column "mentor"
#     2. Have a column "mentee"
#        (having both is not ideal: more work to ensure consistency)
#     3. Have a new table (mentor, mentee)
#   - Attends: M to N
#     - Requires a table (student, course)
#   - Teaches: 1 to N
#     - Store professor in course table or create separate table
# - General rule:
#   - Using a separate table is always possible, or:
#     - 1 to 1: can store in either entity
#     - 1 to N: store in entity with cardinality N
#     - M to N: must use separate table

# + [markdown] slideshow={"slide_type": "slide"}
# ### Final list of tables
# - Students(ID, Name, Degree, Mentor)
# - Professors(ID, Name, Chair)
# - Courses(ID, Title, Faculty, Semester, Professor)
# - Attends(Student, Course, Grade)
# - Which attributes are primary and foreign keys? (One possible answer below.)
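
# + [markdown] slideshow={"slide_type": "slide"}
# One way to spell out the answer is to write the schema down as DDL. A minimal
# sketch in SQLite (the column types are assumptions for illustration; the key
# placement follows the rules from the previous slide):

# + Collapsed="false"
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE Students (
        ID     INTEGER PRIMARY KEY,
        Name   TEXT,
        Degree TEXT,
        Mentor INTEGER REFERENCES Students(ID)      -- 1 to 1: stored in the entity
    );
    CREATE TABLE Professors (
        ID    INTEGER PRIMARY KEY,
        Name  TEXT,
        Chair TEXT
    );
    CREATE TABLE Courses (
        ID        INTEGER PRIMARY KEY,
        Title     TEXT,
        Faculty   TEXT,
        Semester  TEXT,
        Professor INTEGER REFERENCES Professors(ID) -- 1 to N: stored on the N side
    );
    CREATE TABLE Attends (
        Student INTEGER REFERENCES Students(ID),
        Course  INTEGER REFERENCES Courses(ID),
        Grade   REAL,
        PRIMARY KEY (Student, Course)               -- M to N: separate table
    );
''')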

# + [markdown] slideshow={"slide_type": "slide"}
# ## Purpose of SQL
# - SQL shines when "navigating" across relationships, for example:
#   - For each student, find the professor that gave them the highest grade
#   - For each professor, find courses taught last semester
# - Also used to modify data, tables, databases, etc.
#   - Not discussed in this course

# + [markdown] slideshow={"slide_type": "slide"}
# ## Anatomy of a SELECT query
# - SELECT queries are used to retrieve data from the database
# - The result is itself a table (not saved unless specified)
#
# ```
# SELECT <columns>
# FROM <tables>
# [WHERE <condition>]
# [GROUP BY <columns>
#   [HAVING <condition>]]
# [ORDER BY <columns> [ASC|DESC]];
# ```
#
# - Must have SELECT and FROM
# - WHERE and GROUP BY are optional
# - HAVING is optional, and must be used with GROUP BY
# - GROUP BY: in the end, each group must be collapsed into a single output row

# + [markdown] slideshow={"slide_type": "slide"}
# ## Example
#
# Find all courses held in the Winter semester 2019/2020:
#
# ```sql
# SELECT *
# FROM Courses
# WHERE Semester = 'WiSE 19/20';
# ```
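
# + [markdown] slideshow={"slide_type": "slide"}
# The same example, runnable end-to-end with Python's built-in `sqlite3`
# module (the rows are made up for illustration):

# + Collapsed="false"
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE Courses (ID INTEGER PRIMARY KEY, Title TEXT, Faculty TEXT, Semester TEXT, Professor INTEGER)')
conn.executemany('INSERT INTO Courses VALUES (?, ?, ?, ?, ?)', [
    (1, 'Databases',        'Informatics', 'WiSE 19/20', 1),
    (2, 'Machine Learning', 'Informatics', 'SoSE 20',    2),
])

for row in conn.execute("SELECT * FROM Courses WHERE Semester = 'WiSE 19/20'"):
    print(row)  # (1, 'Databases', 'Informatics', 'WiSE 19/20', 1)
# -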

# + [markdown] slideshow={"slide_type": "slide"}
# ## Select query untangled
# - Confusingly, the execution order is different from the writing order:
#   1. FROM: first, gather all input rows from all tables
#   2. WHERE: next, remove all rows not matching the predicate
#   3. GROUP BY: now, if needed, create groups of rows
#   4. HAVING: then, remove all groups that do not match the predicate
#   5. ORDER BY: sort the tuples by the value of a certain column
#   6. SELECT: finally, produce output columns

# + [markdown] slideshow={"slide_type": "slide"}
# ## Interactive SQL console
#
# An interactive SQL console with a few tables can be accessed at [w3schools.com](https://www.w3schools.com/sql/trysql.asp?filename=trysql_select_all)
#
# - Go to w3schools.com
# - Scroll down to SQL; on the left side there will be a query and a button "Try it Yourself"
# - I encourage you to fiddle around while I am explaining
# - They also have a (superficial) command reference
#
# ![](../img/w3trysql.png)

# + [markdown] slideshow={"slide_type": "slide"}
# ## Interactive SQL console
#
# ![](../img/w3sqled.png)

# + [markdown] slideshow={"slide_type": "slide"}
# ## FROM: source tables
# - You can specify one or more tables in the FROM clause
# - FROM will do a cross-product of all tuples of all tables
#   - In almost all cases, you only want a small subset of the cross-product
#   - Use WHERE to remove tuples that do not make sense
# - Possible to give aliases to tables and use that alias in the rest of the query
#   - Useful to keep the query short and to disambiguate when the same table is used several times in the same query

# + [markdown] slideshow={"slide_type": "slide"}
# ## WHERE: tuple filter
# - Specify a boolean condition that is evaluated for each row produced by the FROM
# - All rows where this evaluates to false are discarded
# - Example: associate to each student all their grades (one per row)
#
# ```sql
# SELECT *
# FROM
#     Students AS s,
#     Attends AS a,
#     Courses AS c
# WHERE
#     s.ID = a.Student
#     AND a.Course = c.ID;
# ```

# + [markdown] slideshow={"slide_type": "slide"}
# ## WHERE: handling of NULL values
#
# - NULL is used for "undefined" values
# - Nothing is equal to NULL (not even NULL)
#   - `x = NULL` always evaluates to NULL, which is treated as false
#   - Use instead `x IS NULL` or `x IS NOT NULL`
# - Nasty example: `SELECT * FROM table WHERE x = 10 OR NOT x = 10` (see the check below)
#   - When `x` contains NULLs this equals `WHERE x IS NOT NULL`
#   - Dumb fix: `WHERE x = 10 OR NOT x = 10 OR x IS NULL`
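
# + [markdown] slideshow={"slide_type": "slide"}
# The nasty example is easy to reproduce. A quick check in `sqlite3` with a
# made-up single-column table:

# + Collapsed="false"
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t (x INTEGER)')
conn.executemany('INSERT INTO t VALUES (?)', [(10,), (7,), (None,)])

# the NULL row satisfies neither x = 10 nor NOT x = 10, so it is dropped
print(conn.execute('SELECT x FROM t WHERE x = 10 OR NOT x = 10').fetchall())
# [(10,), (7,)]

print(conn.execute('SELECT x FROM t WHERE x IS NULL').fetchall())
# [(None,)]
# -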

# + [markdown] slideshow={"slide_type": "slide"}
# ## JOIN: a special case of FROM+WHERE
# - In most cases, we are not interested in the cross-product
#   - We actually want tuples that match primary/foreign keys
# - This operation is so common that it has a special name to distinguish it from the general case
#   - Other than the name, the two are completely equivalent
#   - Join makes your intentions clearer
# - The previous query becomes:
#
# ```sql
# SELECT *
# FROM
#     Students AS s
#     JOIN Attends AS a
#         ON s.ID = a.Student
#     JOIN Courses AS c
#         ON c.ID = a.Course;
# ```

# + [markdown] slideshow={"slide_type": "slide"}
# ## Non-matching rows in JOINs
# - Options to handle non-matches:
#   - Inner join: only keep matches
#   - Left join: keep matches and un-matched records from the _left_ table
#   - Right join: keep matches and un-matched records from the _right_ table
#   - Outer join: keep matches and un-matched records from _both_ tables
# - Other possibilities:
#   - Natural join (`ON` is missing): match all columns with the same name
#   - Self-join: a table with itself (e.g. to find a student's mentor)

# + [markdown] slideshow={"slide_type": "slide"}
# ### INNER JOIN
#
# ```sql
# FROM Students [INNER] JOIN Attends
#     ON Students.ID = Attends.Student
# ```
#
# ![](../img/sql_join_inner.svg)

# + [markdown] slideshow={"slide_type": "slide"}
# ### LEFT JOIN
#
# ```sql
# FROM Students LEFT JOIN Attends
#     ON Students.ID = Attends.Student
# ```
#
# ![](../img/sql_join_left.svg)

# + [markdown] slideshow={"slide_type": "slide"}
# ### RIGHT JOIN
#
# ```sql
# FROM Students RIGHT JOIN Attends
#     ON Students.ID = Attends.Student
# ```
#
# ![](../img/sql_join_right.svg)

# + [markdown] slideshow={"slide_type": "slide"}
# ### OUTER JOIN
#
# ```sql
# FROM Students OUTER JOIN Attends
#     ON Students.ID = Attends.Student
# ```
#
# ![](../img/sql_join_outer.svg)
#
# Warning: un-matched rows from both tables are kept, padded with NULLs!

# + [markdown] slideshow={"slide_type": "slide"}
# ### Retrieving un-matched rows only
#
# - Example: find all students who have not attended any course (see the check below)
#
# ```sql
# SELECT Students.ID
# FROM Students LEFT JOIN Attends
#     ON Students.ID = Attends.Student
# WHERE
#     Attends.Student IS NULL
# ```
#
# ![](../img/sql_join_unmatched_only.svg)
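
# + [markdown] slideshow={"slide_type": "slide"}
# The un-matched-rows trick, checked in `sqlite3` with made-up rows (student 3
# never attended anything):

# + Collapsed="false"
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE Students (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Attends (Student INTEGER, Course INTEGER, Grade REAL);
    INSERT INTO Students VALUES (1, 'Ada'), (2, 'Bob'), (3, 'Cleo');
    INSERT INTO Attends VALUES (1, 10, 1.3), (2, 10, 2.0);
''')

query = '''
    SELECT Students.ID
    FROM Students LEFT JOIN Attends
        ON Students.ID = Attends.Student
    WHERE Attends.Student IS NULL
'''
print(conn.execute(query).fetchall())  # [(3,)]
# -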

# + [markdown] slideshow={"slide_type": "slide"}
# ## GROUP BY: create groups of rows
# - must specify one or more columns, possibly with transformation
# - all rows that have the same values for all (transformed) column(s) end up in the same group

# + [markdown] slideshow={"slide_type": "slide"}
# ## HAVING: filter groups
# - A boolean condition applied to each group
# - Example: filter by group size, or by the min/max/average of something
# - Common case: counting
#   - `COUNT(*)`: number of rows in the group
#   - `COUNT(expr)`: number of rows where `expr` is not NULL
#   - `COUNT(DISTINCT expr)`: number of unique values of `expr` (excluding NULLs)

# + [markdown] slideshow={"slide_type": "slide"}
# ## ORDER BY: order tuples
#
# - Sort the tuples produced by the query
# - Sort by the value of one or more columns, possibly transformed
# - Possible to order by aggregations (count/min/max/sum/avg)

# + [markdown] slideshow={"slide_type": "slide"}
# ## SELECT: produce output columns
# - All the surviving groups/rows are transformed
# - Select only a subset of attributes, or transform values
# - Careful: each group must be collapsed into a row

# + [markdown] slideshow={"slide_type": "slide"}
# # Examples

# + [markdown] slideshow={"slide_type": "slide"}
# ## Example 1
#
# Find the ID of all students who failed at least one exam.
#
# ```sql
# SELECT ...
# ```
#
# Tables:
# - Students(ID, Name, Degree, Mentor)
# - Professors(ID, Name, Chair)
# - Courses(ID, Title, Faculty, Semester, Professor)
# - Attends(Student, Course, Grade)

# + [markdown] slideshow={"slide_type": "slide"}
# ## Example 1
#
# Find the ID of all students who failed at least one exam.
#
# ```sql
# SELECT Student
# FROM Attends
# WHERE Grade > 5
# ```
#
# Tables:
# - Students(ID, Name, Degree, Mentor)
# - Professors(ID, Name, Chair)
# - Courses(ID, Title, Faculty, Semester, Professor)
# - Attends(Student, Course, Grade)

# + [markdown] slideshow={"slide_type": "slide"}
# ## Example 2
#
# Find how many exams each student failed.
#
# ```sql
# SELECT ...
# ```
#
# Tables:
# - Students(ID, Name, Degree, Mentor)
# - Professors(ID, Name, Chair)
# - Courses(ID, Title, Faculty, Semester, Professor)
# - Attends(Student, Course, Grade)

# + [markdown] slideshow={"slide_type": "slide"}
# ## Example 2
#
# Find how many exams each student failed.
#
# ```sql
# SELECT Student, COUNT(*)
# FROM Attends
# WHERE Grade > 5
# GROUP BY Student
# ```
#
# Tables:
# - Students(ID, Name, Degree, Mentor)
# - Professors(ID, Name, Chair)
# - Courses(ID, Title, Faculty, Semester, Professor)
# - Attends(Student, Course, Grade)
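
# + [markdown] slideshow={"slide_type": "slide"}
# Example 2 end-to-end in `sqlite3`, with made-up grades (following the
# convention of these slides that a grade above 5 counts as failed):

# + Collapsed="false"
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE Attends (Student INTEGER, Course INTEGER, Grade REAL)')
conn.executemany('INSERT INTO Attends VALUES (?, ?, ?)',
                 [(1, 10, 2.0), (1, 11, 6.0), (2, 10, 6.0), (2, 11, 6.0)])

query = '''
    SELECT Student, COUNT(*)
    FROM Attends
    WHERE Grade > 5
    GROUP BY Student
'''
print(conn.execute(query).fetchall())  # [(1, 1), (2, 2)]
# -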

# + [markdown] slideshow={"slide_type": "slide"}
# ## Example 3
#
# Find how many exams each student failed, only for the students who failed at least 2.
#
# ```sql
# SELECT ...
# ```
#
# Tables:
# - Students(ID, Name, Degree, Mentor)
# - Professors(ID, Name, Chair)
# - Courses(ID, Title, Faculty, Semester, Professor)
# - Attends(Student, Course, Grade)

# + [markdown] slideshow={"slide_type": "slide"}
# ## Example 3
#
# Find how many exams each student failed, only for the students who failed at least 2.
#
# ```sql
# SELECT Student, COUNT(*)
# FROM Attends
# WHERE Grade > 5
# GROUP BY Student
# HAVING COUNT(*) > 1
# ```
#
# Tables:
# - Students(ID, Name, Degree, Mentor)
# - Professors(ID, Name, Chair)
# - Courses(ID, Title, Faculty, Semester, Professor)
# - Attends(Student, Course, Grade)

# + [markdown] slideshow={"slide_type": "slide"}
# ## Example 4
#
# Find how many courses each student failed, only for the students who failed at least 2 exams.
#
# ```sql
# SELECT ...
# ```
#
# Tables:
# - Students(ID, Name, Degree, Mentor)
# - Professors(ID, Name, Chair)
# - Courses(ID, Title, Faculty, Semester, Professor)
# - Attends(Student, Course, Grade)

# + [markdown] slideshow={"slide_type": "slide"}
# ## Example 4
#
# Find how many courses each student failed, only for the students who failed at least 2 exams.
#
# ```sql
# SELECT Student, COUNT(DISTINCT Course)
# FROM Attends
# WHERE Grade > 5
# GROUP BY Student
# HAVING COUNT(*) > 1
# ```
#
# Tables:
# - Students(ID, Name, Degree, Mentor)
# - Professors(ID, Name, Chair)
# - Courses(ID, Title, Faculty, Semester, Professor)
# - Attends(Student, Course, Grade)

# + [markdown] slideshow={"slide_type": "slide"}
# # Transactions and ACID properties
#
# - When the data is read and modified by several clients at the same time, care must be taken
#   - Read/modify/write workflows are especially vulnerable
# - Transaction: a set of queries (reads and/or writes)
# - Atomicity: the sequence of operations appears as a single operation on the data
#   - Either all operations succeed, or all modifications are undone
# - Consistency: database invariants are always satisfied regardless of the outcome
#   - Invariants: uniqueness, non-empty values, primary/foreign keys, etc.
# - Isolation: different transactions cannot "see" each other
#   - The order of concurrent transactions does not matter
# - Durability: once completed, the modifications are permanent
#   - Useful in case of crashes
# - All of this is handled automatically by the DBMS
#   - Users only need to declare the start/end and outcome of the transaction (see the sketch below)
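
# + [markdown] slideshow={"slide_type": "slide"}
# From Python's `sqlite3`, declaring the outcome boils down to `commit()` or
# `rollback()`. A sketch with a made-up accounts table:

# + Collapsed="false"
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)')
conn.executemany('INSERT INTO accounts VALUES (?, ?)', [(1, 100.0), (2, 0.0)])
conn.commit()

try:
    # transfer 30 from account 1 to account 2, atomically
    conn.execute('UPDATE accounts SET balance = balance - 30 WHERE id = 1')
    conn.execute('UPDATE accounts SET balance = balance + 30 WHERE id = 2')
    conn.commit()       # success: make both updates permanent
except Exception:
    conn.rollback()     # failure: undo every step of the transaction

print(conn.execute('SELECT * FROM accounts').fetchall())  # [(1, 70.0), (2, 30.0)]
# -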

# + [markdown] slideshow={"slide_type": "slide"}
# # Interfacing to an RDBMS
#
# Three types of clients:
#
# 1. Command line clients
# 2. Graphical clients
# 3. Programmatic access

# + [markdown] slideshow={"slide_type": "slide"}
# ## Command line clients
#
# Enter SQL queries and administrative commands directly from the command line:
#
# ```
# $ sqlite3
# SQLite version 3.32.3 2020-06-18 14:00:33
# Enter ".help" for usage hints.
# Connected to a transient in-memory database.
# Use ".open FILENAME" to reopen on a persistent database.
# sqlite>
# ```
#
# ```
# $ psql -U user -h 10.0.6.12 -p 21334 -d database
# psql (11.1, server 11.0)
# Type "help" for help.
#
# postgres=#
# ```

# + [markdown] slideshow={"slide_type": "slide"}
# ## Graphical clients
#
# Database-specific:
# - pgAdmin (PostgreSQL)
# - SQLite Browser (SQLite)
# - MySQL Workbench (MySQL)
#
# General purpose:
# - [SQuirreL](http://squirrel-sql.sourceforge.net)
# - [SQLAdmin](http://sqladmin.sourceforge.net/)

# + [markdown] slideshow={"slide_type": "slide"}
# ### SQuirreL example: querying
# ![](http://squirrel-sql.sourceforge.net/screenshots/15_edit_result.png)

# + [markdown] slideshow={"slide_type": "slide"}
# ### SQuirreL example: visualizing tables
#
# ![](http://squirrel-sql.sourceforge.net/screenshots/7_graph.png)

# + [markdown] slideshow={"slide_type": "slide"}
# ## Programmatic Access
#
# Two types of APIs:
#
# 1. High-level: Object-relational mapping (ORM)
#     - Each table has a corresponding class in the code
#     - Operations on objects are automatically translated into queries
#     - These libraries can work with many SQL databases
# 2. Low-level: Directly write SQL queries as strings
#     - Usually tied to a specific type of SQL database

# + [markdown] slideshow={"slide_type": "slide"}
# ### SQLAlchemy: ORM in Python
#
# Example from [pythoncentral.io](https://www.pythoncentral.io/overview-sqlalchemys-expression-language-orm-queries/).
#
# Tables:
#
# ```python
# class Department(Base):
#     __tablename__ = 'department'
#     id = Column(Integer, primary_key=True)
#     name = Column(String)
#     employees = relationship('Employee', secondary='department_employee')
#
# class Employee(Base):
#     __tablename__ = 'employee'
#     id = Column(Integer, primary_key=True)
#     name = Column(String)
#     departments = relationship('Department', secondary='department_employee')
#
# class DepartmentEmployee(Base):
#     __tablename__ = 'department_employee'
#     department_id = Column(Integer, ForeignKey('department.id'), primary_key=True)
#     employee_id = Column(Integer, ForeignKey('employee.id'), primary_key=True)
# ```

# + [markdown] slideshow={"slide_type": "slide"}
# ### Inserting data in SQLAlchemy
#
# ```python
# from sqlalchemy import create_engine
# engine = create_engine('sqlite:///')
#
# from sqlalchemy.orm import sessionmaker
# session = sessionmaker()
# session.configure(bind=engine)
# Base.metadata.create_all(engine)
#
# s = session()
# john = Employee(name='john')
# s.add(john)
# it_department = Department(name='IT')
# it_department.employees.append(john)
# s.add(it_department)
# s.commit()
# ```

# + [markdown] slideshow={"slide_type": "slide"}
# ### Querying in SQLAlchemy
#
# Find all employees who belong to more than one department:
#
# ```python
# find_marry = select([
#     Employee.id
# ]).select_from(
#     Employee.__table__.join(DepartmentEmployee)
# ).group_by(
#     Employee.id
# ).having(func.count(
#     DepartmentEmployee.department_id
# ) > 1)
#
# rs = s.execute(find_marry)
# rs.fetchall()  # result: [(2,)]
# ```
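
# + [markdown] slideshow={"slide_type": "slide"}
# ### Querying in SQLAlchemy (ORM style)
#
# The same query can also be phrased against the session directly. A sketch
# reusing `s`, `Employee` and `DepartmentEmployee` from the slides above
# (SQLAlchemy 1.x style; not the only way to write it):
#
# ```python
# from sqlalchemy import func
#
# result = (
#     s.query(Employee.id)
#      .join(DepartmentEmployee,
#            Employee.id == DepartmentEmployee.employee_id)
#      .group_by(Employee.id)
#      .having(func.count(DepartmentEmployee.department_id) > 1)
#      .all()
# )  # result: [(2,)], as before
# ```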

# + [markdown] slideshow={"slide_type": "slide"}
# ### Accessing SQLite in Python
#
# Example from [docs.python.org](https://docs.python.org/3/library/sqlite3.html)
#
# ```python
# import sqlite3
# conn = sqlite3.connect('example.db')
#
# c = conn.cursor()
#
# c.execute('''CREATE TABLE stocks
#              (date text, trans text, symbol text, qty real, price real)''')
#
# c.execute("INSERT INTO stocks VALUES ('2006-01-05','BUY','RHAT',100,35.14)")
#
# conn.commit()
#
# conn.close()
# ```

# + [markdown] slideshow={"slide_type": "slide"}
# ### Querying SQLite in Python
#
# ```python
# purchases = [('2006-03-28', 'BUY', 'IBM', 1000, 45.00),
#              ('2006-04-05', 'BUY', 'MSFT', 1000, 72.00),
#              ('2006-04-06', 'SELL', 'IBM', 500, 53.00)]
#
# c.executemany('INSERT INTO stocks VALUES (?,?,?,?,?)', purchases)
#
# for row in c.execute('SELECT * FROM stocks WHERE price < 50'):
#     print(row)
# ```

# + [markdown] slideshow={"slide_type": "slide"}
# ## Indices
# - create an index with `CREATE INDEX <name> ON <table> (<columns>)`
# - a table can have many indices
# - an index is always created automatically for primary keys
#   - all other unique keys must also have an index
#   - indices on foreign keys _might_ be useful
# - WHERE/JOIN are much faster when there is an index on one of the columns
# - if a query is slow and/or executed very frequently, consider adding an index on columns used in the WHERE/JOIN

# + [markdown] slideshow={"slide_type": "slide"}
# ## Main types of index
# - Tree-based: O(log N) access, can be used to quickly answer queries like `WHERE L < column < U`
#   - Branching factor in the order of 1000s
# - Hash-based: O(1) access, cannot answer range queries
# - Clustered index: table is physically sorted by the columns

# + [markdown] slideshow={"slide_type": "slide"}
# ## Query plans
# - understanding why a query is slow is not trivial
# - the query plan is produced by the optimizer and shows exactly what is done, and how, to execute the query
# - it contains an estimated cost and can be augmented with the actual cost measured when executing the query
# - estimated cost:
#   - computed from statistics about rows/values that the DBMS maintains internally
#   - these statistics can become inaccurate after lots of operations
#   - useful to periodically recompute these statistics
#   - also useful to periodically clear the space allocated to deleted rows and defragment table data
# - (show example of plans before/after adding an index)

# + [markdown] slideshow={"slide_type": "slide"}
# ### Example
#
# ![](../img/qplan.png)
#
# Image from [dba.stackexchange.com](https://dba.stackexchange.com/q/9234)

# + [markdown] Collapsed="false" slideshow={"slide_type": "slide"}
# # Non-SQL

# + Collapsed="false"


# + [markdown] Collapsed="false"
# # Graph Databases

# + [markdown] Collapsed="false"
# ## Graph Theory

# + Collapsed="false"


# + [markdown] Collapsed="false"
# ## Neo4j

# + Collapsed="false"

--------------------------------------------------------------------------------