├── LICENSE
├── README.md
├── _config.yml
├── apachespark
│   └── README.md
├── conferences
│   ├── README.md
│   ├── images
│   │   └── mlflow_pydata_miami.png
│   └── pydata
│       ├── pydata_miami
│       │   ├── PyDataMiami.dbc
│       │   └── slides
│       │       └── PySparkStructuredStreaming_FINAL.pdf
│       └── pydata_sf
│           └── PyBay2019.dbc
├── databricks
│   └── README.md
├── delta-lake
│   └── README.md
├── images
│   ├── tutorial_welcome_page.jpg
│   └── tutorial_welcome_page.png
├── koalas
│   └── README.md
└── mlflow
    ├── README.md
    ├── extras
    │   ├── boston_housing_tensorflow.py
    │   ├── keras_minst.py
    │   ├── keras_mnist_lab_5.py
    │   ├── lab_classes.py
    │   ├── lab_utils.py
    │   ├── load_predict_deploy_model_ans_lab_6.py
    │   ├── mlflow_example_wine.py
    │   ├── mlflow_object.py
    │   ├── mlruns
    │   │   └── 0
    │   │       └── meta.yaml
    │   └── plot_confusion_matrix.py
    ├── images
    │   ├── bank_note.png
    │   ├── fake_note.jpeg
    │   ├── intro_slide.png
    │   ├── mlflow_project.png
    │   ├── mlproject_file.png
    │   ├── mnist_1layer.png
    │   ├── nn_linear_regression.png
    │   └── pyfunc_models.png
    ├── labs
    │   ├── 00_get_started.py
    │   ├── 01_petrol_regression_lab.py
    │   ├── 02_banknote_classification_lab.py
    │   ├── 03_airbnb_base_lab.py
    │   ├── 04_airbnb_exp_lab.py
    │   ├── 05_tf_keras_mnist_lab.py
    │   ├── 06_load_predict_model_lab.py
    │   ├── 07_tensorflow_keras_petrol_regression.py
    │   ├── 09_register_model_apis.py
    │   ├── __init__.py
    │   ├── data
    │   │   ├── airbnb-cleaned-mlflow.csv
    │   │   ├── bill_authentication.csv
    │   │   ├── petrol_consumption.csv
    │   │   ├── test_bill_authentication.csv
    │   │   ├── test_petrol_consumption.csv
    │   │   ├── windfarm_data.csv
    │   │   └── wine-quality.csv
    │   ├── experiment_ce.py
    │   └── lab_cls
    │       ├── __init__.py
    │       ├── keras_model.py
    │       ├── lab_utils.py
    │       ├── rfc_model.py
    │       ├── rfr_base_exp_model.py
    │       ├── rfr_model.py
    │       └── tf_keras_model.py
    ├── notebooks
    │   └── databricks
    │       └── MLflowConferenceTutorials.dbc
    ├── req.txt
    ├── slides
    │   ├── mlflow-ODSC_FINAL.pdf
    │   └── mlflow-strataNY_FINAL.pdf
    └── solutions
        ├── 01_lab.py
        ├── 02_lab.py
        ├── 03_lab.py
        ├── 04_lab.py
        ├── 05_lab.py
        ├── 06_lab.py
        ├── 08_lab.py
        ├── __init__.py
        ├── data
        │   ├── airbnb-cleaned-mlflow.csv
        │   ├── bill_authentication.csv
        │   ├── petrol_consumption.csv
        │   ├── test_bill_authentication.csv
        │   ├── test_petrol_consumption.csv
        │   └── wine-quality.csv
        └── sol_cls
            ├── __init__.py
            ├── keras_model.py
            ├── lab_utils.py
            ├── rfc_model.py
            ├── rfr_base_exp_model.py
            └── rfr_model.py

/LICENSE:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 |
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 |
7 | 1. Definitions.
8 |
9 | "License" shall mean the terms and conditions for use, reproduction,
10 | and distribution as defined by Sections 1 through 9 of this document.
11 |
12 | "Licensor" shall mean the copyright owner or entity authorized by
13 | the copyright owner that is granting the License.
14 |
15 | "Legal Entity" shall mean the union of the acting entity and all
16 | other entities that control, are controlled by, or are under common
17 | control with that entity. For the purposes of this definition,
18 | "control" means (i) the power, direct or indirect, to cause the
19 | direction or management of such entity, whether by contract or
20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
21 | outstanding shares, or (iii) beneficial ownership of such entity.
22 |
23 | "You" (or "Your") shall mean an individual or Legal Entity
24 | exercising permissions granted by this License.
25 |
26 | "Source" form shall mean the preferred form for making modifications,
27 | including but not limited to software source code, documentation
28 | source, and configuration files.
29 |
30 | "Object" form shall mean any form resulting from mechanical
31 | transformation or translation of a Source form, including but
32 | not limited to compiled object code, generated documentation,
33 | and conversions to other media types.
34 |
35 | "Work" shall mean the work of authorship, whether in Source or
36 | Object form, made available under the License, as indicated by a
37 | copyright notice that is included in or attached to the work
38 | (an example is provided in the Appendix below).
39 |
40 | "Derivative Works" shall mean any work, whether in Source or Object
41 | form, that is based on (or derived from) the Work and for which the
42 | editorial revisions, annotations, elaborations, or other modifications
43 | represent, as a whole, an original work of authorship. For the purposes
44 | of this License, Derivative Works shall not include works that remain
45 | separable from, or merely link (or bind by name) to the interfaces of,
46 | the Work and Derivative Works thereof.
47 |
48 | "Contribution" shall mean any work of authorship, including
49 | the original version of the Work and any modifications or additions
50 | to that Work or Derivative Works thereof, that is intentionally
51 | submitted to Licensor for inclusion in the Work by the copyright owner
52 | or by an individual or Legal Entity authorized to submit on behalf of
53 | the copyright owner. For the purposes of this definition, "submitted"
54 | means any form of electronic, verbal, or written communication sent
55 | to the Licensor or its representatives, including but not limited to
56 | communication on electronic mailing lists, source code control systems,
57 | and issue tracking systems that are managed by, or on behalf of, the
58 | Licensor for the purpose of discussing and improving the Work, but
59 | excluding communication that is conspicuously marked or otherwise
60 | designated in writing by the copyright owner as "Not a Contribution."
61 |
62 | "Contributor" shall mean Licensor and any individual or Legal Entity
63 | on behalf of whom a Contribution has been received by Licensor and
64 | subsequently incorporated within the Work.
65 |
66 | 2. Grant of Copyright License. Subject to the terms and conditions of
67 | this License, each Contributor hereby grants to You a perpetual,
68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69 | copyright license to reproduce, prepare Derivative Works of,
70 | publicly display, publicly perform, sublicense, and distribute the
71 | Work and such Derivative Works in Source or Object form.
72 |
73 | 3. Grant of Patent License. Subject to the terms and conditions of
74 | this License, each Contributor hereby grants to You a perpetual,
75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76 | (except as stated in this section) patent license to make, have made,
77 | use, offer to sell, sell, import, and otherwise transfer the Work,
78 | where such license applies only to those patent claims licensable
79 | by such Contributor that are necessarily infringed by their
80 | Contribution(s) alone or by combination of their Contribution(s)
81 | with the Work to which such Contribution(s) was submitted. If You
82 | institute patent litigation against any entity (including a
83 | cross-claim or counterclaim in a lawsuit) alleging that the Work
84 | or a Contribution incorporated within the Work constitutes direct
85 | or contributory patent infringement, then any patent licenses
86 | granted to You under this License for that Work shall terminate
87 | as of the date such litigation is filed.
88 |
89 | 4. Redistribution. You may reproduce and distribute copies of the
90 | Work or Derivative Works thereof in any medium, with or without
91 | modifications, and in Source or Object form, provided that You
92 | meet the following conditions:
93 |
94 | (a) You must give any other recipients of the Work or
95 | Derivative Works a copy of this License; and
96 |
97 | (b) You must cause any modified files to carry prominent notices
98 | stating that You changed the files; and
99 |
100 | (c) You must retain, in the Source form of any Derivative Works
101 | that You distribute, all copyright, patent, trademark, and
102 | attribution notices from the Source form of the Work,
103 | excluding those notices that do not pertain to any part of
104 | the Derivative Works; and
105 |
106 | (d) If the Work includes a "NOTICE" text file as part of its
107 | distribution, then any Derivative Works that You distribute must
108 | include a readable copy of the attribution notices contained
109 | within such NOTICE file, excluding those notices that do not
110 | pertain to any part of the Derivative Works, in at least one
111 | of the following places: within a NOTICE text file distributed
112 | as part of the Derivative Works; within the Source form or
113 | documentation, if provided along with the Derivative Works; or,
114 | within a display generated by the Derivative Works, if and
115 | wherever such third-party notices normally appear. The contents
116 | of the NOTICE file are for informational purposes only and
117 | do not modify the License. You may add Your own attribution
118 | notices within Derivative Works that You distribute, alongside
119 | or as an addendum to the NOTICE text from the Work, provided
120 | that such additional attribution notices cannot be construed
121 | as modifying the License.
122 |
123 | You may add Your own copyright statement to Your modifications and
124 | may provide additional or different license terms and conditions
125 | for use, reproduction, or distribution of Your modifications, or
126 | for any such Derivative Works as a whole, provided Your use,
127 | reproduction, and distribution of the Work otherwise complies with
128 | the conditions stated in this License.
129 |
130 | 5. Submission of Contributions. Unless You explicitly state otherwise,
131 | any Contribution intentionally submitted for inclusion in the Work
132 | by You to the Licensor shall be under the terms and conditions of
133 | this License, without any additional terms or conditions.
134 | Notwithstanding the above, nothing herein shall supersede or modify
135 | the terms of any separate license agreement you may have executed
136 | with Licensor regarding such Contributions.
137 |
138 | 6. Trademarks. This License does not grant permission to use the trade
139 | names, trademarks, service marks, or product names of the Licensor,
140 | except as required for reasonable and customary use in describing the
141 | origin of the Work and reproducing the content of the NOTICE file.
142 |
143 | 7. Disclaimer of Warranty. Unless required by applicable law or
144 | agreed to in writing, Licensor provides the Work (and each
145 | Contributor provides its Contributions) on an "AS IS" BASIS,
146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 | implied, including, without limitation, any warranties or conditions
148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 | PARTICULAR PURPOSE. You are solely responsible for determining the
150 | appropriateness of using or redistributing the Work and assume any
151 | risks associated with Your exercise of permissions under this License.
152 |
153 | 8. Limitation of Liability. In no event and under no legal theory,
154 | whether in tort (including negligence), contract, or otherwise,
155 | unless required by applicable law (such as deliberate and grossly
156 | negligent acts) or agreed to in writing, shall any Contributor be
157 | liable to You for damages, including any direct, indirect, special,
158 | incidental, or consequential damages of any character arising as a
159 | result of this License or out of the use or inability to use the
160 | Work (including but not limited to damages for loss of goodwill,
161 | work stoppage, computer failure or malfunction, or any and all
162 | other commercial damages or losses), even if such Contributor
163 | has been advised of the possibility of such damages.
164 |
165 | 9. Accepting Warranty or Additional Liability. While redistributing
166 | the Work or Derivative Works thereof, You may choose to offer,
167 | and charge a fee for, acceptance of support, warranty, indemnity,
168 | or other liability obligations and/or rights consistent with this
169 | License. However, in accepting such obligations, You may act only
170 | on Your own behalf and on Your sole responsibility, not on behalf
171 | of any other Contributor, and only if You agree to indemnify,
172 | defend, and hold each Contributor harmless for any liability
173 | incurred by, or claims asserted against, such Contributor by reason
174 | of your accepting any such warranty or additional liability.
175 |
176 | END OF TERMS AND CONDITIONS
177 |
178 | APPENDIX: How to apply the Apache License to your work.
179 |
180 | To apply the Apache License to your work, attach the following
181 | boilerplate notice, with the fields enclosed by brackets "[]"
182 | replaced with your own identifying information. (Don't include
183 | the brackets!) The text should be enclosed in the appropriate
184 | comment syntax for the file format. We also recommend that a
185 | file or class name and description of purpose be included on the
186 | same "printed page" as the copyright notice for easier
187 | identification within third-party archives.
188 |
189 | Copyright Jules S. Damji
190 |
191 | Licensed under the Apache License, Version 2.0 (the "License");
192 | you may not use this file except in compliance with the License.
193 | You may obtain a copy of the License at
194 |
195 | http://www.apache.org/licenses/LICENSE-2.0
196 |
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ![](images/tutorial_welcome_page.png)
2 |
3 | This repository houses learning resources: tutorials, workshops, conference talks, and code examples from various open-source projects
4 | like Apache Spark, Delta Lake, Koalas, [MLflow](mlflow/README.md), and others.
5 |
6 | Use it for teaching or sharing, with appropriate attribution, and contribute back by filing an issue or a pull request.
7 |
8 | More to come, so stay tuned ... :)
9 |
10 | Have fun!
11 |
12 | Jules
13 |
14 |
--------------------------------------------------------------------------------
/_config.yml:
--------------------------------------------------------------------------------
1 | theme: jekyll-theme-cayman
--------------------------------------------------------------------------------
/apachespark/README.md:
--------------------------------------------------------------------------------
1 | # Tutorials
2 | This repository contains tutorials for Apache Spark.
3 |
--------------------------------------------------------------------------------
/conferences/README.md:
--------------------------------------------------------------------------------
1 | # Tutorials, Talks and Workshops
2 |
3 | This repository contains some talks, videos, and slides from conferences. In each directory, you will
4 | find a _filename.dbc_, which is a collection of Databricks notebooks used as part of the tutorial and presentation. You can
5 | download these _.dbc_ files and upload them to your [Databricks Community Edition](https://databricks.com/try):
6 |
7 |
8 | * [PyData Miami](./pydata/pydata_miami)
9 | * [![MLflow Talk](./images/mlflow_pydata_miami.png)](https://youtu.be/w-x0fYFGmJY?list=PLGVZCDnMOq0qtkoXglrDC6pS8NvY94QQw)
10 | * [PyData SF](./pydata/pydata_sf)
11 |
12 |
--------------------------------------------------------------------------------
/conferences/images/mlflow_pydata_miami.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dmatrix/tutorials/bcf1a827f8f46b477576095b82db15424d38afd2/conferences/images/mlflow_pydata_miami.png
--------------------------------------------------------------------------------
/conferences/pydata/pydata_miami/PyDataMiami.dbc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dmatrix/tutorials/bcf1a827f8f46b477576095b82db15424d38afd2/conferences/pydata/pydata_miami/PyDataMiami.dbc
--------------------------------------------------------------------------------
/conferences/pydata/pydata_miami/slides/PySparkStructuredStreaming_FINAL.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dmatrix/tutorials/bcf1a827f8f46b477576095b82db15424d38afd2/conferences/pydata/pydata_miami/slides/PySparkStructuredStreaming_FINAL.pdf
--------------------------------------------------------------------------------
/conferences/pydata/pydata_sf/PyBay2019.dbc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dmatrix/tutorials/bcf1a827f8f46b477576095b82db15424d38afd2/conferences/pydata/pydata_sf/PyBay2019.dbc
--------------------------------------------------------------------------------
/databricks/README.md:
--------------------------------------------------------------------------------
1 | # Tutorials
2 | This repository contains tutorials for Databricks.
3 |
--------------------------------------------------------------------------------
/delta-lake/README.md:
--------------------------------------------------------------------------------
1 | # Tutorials
2 | This repository contains all tutorials for Delta Lake.
3 |
--------------------------------------------------------------------------------
/images/tutorial_welcome_page.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dmatrix/tutorials/bcf1a827f8f46b477576095b82db15424d38afd2/images/tutorial_welcome_page.jpg
--------------------------------------------------------------------------------
/images/tutorial_welcome_page.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dmatrix/tutorials/bcf1a827f8f46b477576095b82db15424d38afd2/images/tutorial_welcome_page.png
--------------------------------------------------------------------------------
/koalas/README.md:
--------------------------------------------------------------------------------
1 | # Tutorials
2 | This repository contains all tutorials for Koalas.
3 |
--------------------------------------------------------------------------------
/mlflow/README.md:
--------------------------------------------------------------------------------
1 |
2 | ![](images/intro_slide.png)
3 | # MLflow Tutorial Presented at Conferences
4 | ## Agenda
5 | * Overview of ML development challenges (40 mins)
6 | * How MLflow tackles these
7 | * MLflow Components
8 |   * MLflow Tracking
9 |   * MLflow Projects
10 |   * MLflow Models
11 |   * MLflow Registry
12 | * Managed MLflow Registry Demo
13 | * Q & A
14 | * (Break?)
15 | * Set up Environment (10 mins)
16 | * Hands-on Tutorial (rest of the class)
17 | ### Prerequisites
18 | 1. Knowledge of Python 3 and programming in general
19 | 2. Preferably a UNIX-based, fully-charged laptop with 8-16 GB of RAM and a Chrome or Firefox browser
20 | 3. Familiarity with GitHub and git, and an account on GitHub
21 | 4. Some knowledge of machine learning concepts, libraries, and frameworks
22 |    * scikit-learn
23 |    * pandas and NumPy
24 |    * matplotlib
25 |    * TensorFlow/Keras
26 | 5. PyCharm/IntelliJ or a syntax-aware Python editor of your choice
27 | 6. pip/pip3 or conda and Python 3 installed
28 | 7. Loads of laughter, curiosity, and a sense of humor ... :-)
29 |
30 | ### Installation and Environment Setup
31 |
32 | 1. Open MLflow [docs](https://mlflow.org) and scikit-learn [docs](https://scikit-learn.org/stable/index.html) in your browser. Keep these tabs open.
33 | 2. `git clone git@github.com:dmatrix/tutorials.git` or `git clone https://github.com/dmatrix/tutorials.git`
34 | 3. `cd tutorials/mlflow/`
35 | 4. Install MLflow and the required Python modules
36 |    * `pip install -r req.txt` or `pip3 install -r req.txt`
37 | 5. `cd labs`
38 | 6. If using PyCharm or IntelliJ, create a project and load the source files into it
39 | 7. **Optional**: Pre-register for [Databricks Community Edition](https://databricks.com/try-databricks)
40 |
41 | ### **Optional**: Configuring local host with MLflow Credentials for Community Edition (CE)
42 |
43 |
44 | **Note**: This step is **only** required if you're going to use CE to track experiment runs
45 |
46 | Good [Resource Blog](https://databricks.com/blog/2019/10/17/managed-mlflow-now-available-on-databricks-community-edition.html)
47 |
48 | 1. Run from your shell `databricks configure`
49 | 2. Answer the prompts
50 | 3. **Databricks Host (should begin with https://)**: _https://community.cloud.databricks.com_
51 | 4. **Username**: _enter your community edition login credentials_
52 | 5. **Password**: _enter password for community edition_
53 | 6. Configure MLflow to communicate with the Community Edition server: `export MLFLOW_TRACKING_URI=databricks`
54 | 7. Test out your configuration by creating an experiment via the CLI: `mlflow experiments create -n /Users/username@email_addr/my-experiment`
55 |
56 | ### Documentation Resources
57 |
58 | 1. [MLflow](https://mlflow.org/docs/latest/index.html)
59 | 2. [NumPy](https://numpy.org/devdocs/user/quickstart.html)
60 | 3. [Pandas](https://pandas.pydata.org/pandas-docs/stable/reference/index.html)
61 | 4. [Scikit-Learn](https://scikit-learn.org/stable/index.html)
62 | 5. [Keras](https://keras.io/optimizers/)
63 | 6. [TensorFlow](https://tensorflow.org)
64 | 7. [Matplotlib](https://matplotlib.org/3.2.0/tutorials/introductory/pyplot.html)
65 |
66 | ## Labs
67 | The general objective of the labs is to familiarize you with the MLflow APIs and how these
68 | APIs facilitate different stages of the machine learning cycle: creating a baseline or a benchmark model;
69 | creating many experimental models by tuning parameters to produce the best outcome; understanding
70 | how to package an MLflow project as a unit of execution; and learning about MLflow model flavors
71 | and their flexibility.
72 |
73 | All this is achieved by experimenting with different models, developed with different ML
74 | algorithms, and tracking their results using MLflow APIs. In simple terms, the typical
75 | ML management cycle is:
76 |
77 | 1. Train a baseline model with initial parameters
78 | 2. Record the relevant metrics and parameters with MLflow APIs
79 | 3. Observe the results via MLflow UI
80 | 4. Change or tweak relevant parameters in your model code
81 | 5. Train again
82 | 6. Test or evaluate model
83 | 7. Repeat 2-6 until satisfied
84 |
85 | This iterative process recurs in each of the labs, as part of the model management life cycle.
86 | Well, let's get started, as these labs are going to be hands-on and you'll
87 | be writing code!
88 |
89 | ### Lab-00: Get Started with MLflow
90 |
91 | [00_get_started.py](./labs/00_get_started.py)
92 |
93 | ### Problem
94 | How to get started with MLflow and how to peruse the documentation
95 |
96 | ### Solution
97 | MLflow Documentation:
98 | * [MLflow General](https://mlflow.org/docs/latest/index.html)
99 | * [MLflow Models APIs](https://mlflow.org/docs/latest/python_api/index.html)
100 | * [MLflow Tracking Client API](https://mlflow.org/docs/latest/python_api/mlflow.tracking.html)
101 |
102 | Let's run this lab in class together
103 | * `cd labs`
104 | * In a separate shell, `cd labs && mlflow ui --backend-store-uri sqlite:///mlruns.db`
105 | * In a separate shell, `cd labs && python 00_get_started.py` or run it from your IDE
106 | * Go to the MLflow UI at http://127.0.0.1:5000
107 | * Let's examine the MLflow UI
108 |
109 | ### Lab-01: Scikit-Learn Regression with RandomForestRegressor
110 |
111 | [_01_petrol_regression_lab.py_](./labs/01_petrol_regression_lab.py)
112 | ### Problem
113 | Part 1: We want to predict the gas consumption in millions of gallons in 48 of the US states
114 | based on some key features. These features are petrol tax (in cents), per capita income (in US dollars),
115 | paved highways (in miles), and the share of the population with a driving licence.
116 | ### Solution
117 | Since this is a regression problem where the predicted value is a continuous number, we can use the
118 | common Random Forest algorithm in Scikit-Learn. Most regression models are evaluated with
119 | four standard evaluation metrics: Mean Absolute Error (MAE), Mean Squared Error (MSE),
120 | Root Mean Squared Error (RMSE), and the R2 score.
121 |
122 | ### Sample Data
123 | | Petrol_tax | Average_income | Paved_Highways | Population_Driver_license(%) | Petrol_Consumption |
124 | |------------|----------------|----------------|------------------------------|--------------------|
125 | | 9.0 | 3571 | 1976 | 0.525 | 541 |
126 | | 9.0 | 4092 | 1250 | 0.572 | 524 |
127 | | 9.0 | 3865 | 1586 | 0.580 | 561 |
128 | | 7.5 | 4870 | 2351 | 0.529 | 414 |
129 | | 8.0 | 4399 | 431 | 0.544 | 410 |
130 |
131 | Objectives of this Lab:
132 |
133 | * Use _RandomForestRegressor_ Model
134 | * How to use the MLflow API
135 | * Use the MLflow API to experiment with several runs
136 | * Interpret and observe runs via the MLflow UI
137 |
138 | #### Lab-01 Exercise:
139 |
140 | 1. Consult [RandomForestRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html) documentation
141 | 2. Change or alter the range of runs and increments of n_estimators, random_state, etc.
142 | 3. Change or alter the range of runs and increments of n_estimators
143 | 4. Check in the MLflow UI if the metrics are affected
144 |
145 | *_challenge-1_:* Create mean squared error or R2 plots and save them as artifacts for each run
146 |
147 | Refresh on [Regression Metrics](https://www.dataquest.io/blog/understanding-regression-error-metrics/)
148 |
149 | Refresh on [RandomForest](https://towardsdatascience.com/understanding-random-forest-58381e0602d2)
150 |
151 | Source for [Lab 1 & 2](https://stackabuse.com/random-forest-algorithm-with-python-and-scikit-learn/)
152 |
153 | Data source for [lab 1 & 2](https://archive.ics.uci.edu/ml/datasets/banknote+authentication)
154 |
155 | ### Lab-02: Scikit-Learn Classification with RandomForestClassifier
156 | * [_02_banknote_classification_lab.py_](./labs/02_banknote_classification_lab.py)
157 |
158 | ![](images/bank_note.png)
159 |
160 | ### Problem
161 | Part 2: Given a set of features or attributes of a bank note, can we predict whether it's authentic or fake?
162 | Four attributes, extracted from wavelet transformed images, contribute as independent variables to this classification; the fifth column is the label:
163 |
164 | 1. Image.Var (Variance of Wavelet Transformed image (WTI))
165 | 2. Image.Skew (Skewness of WTI)
166 | 3. Image.Curt (Curtosis of WTI)
167 | 4. Entropy (Entropy of image)
168 | 5. Class (Whether or not the banknote was authentic; zero=fake; one=authentic)
169 | ### Solution
170 | We are going to use Random Forest Classification to make the prediction, and measure its accuracy.
171 | The closer the accuracy is to 1.0, the greater our confidence in its predictions.
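
As a minimal sketch of what "accuracy" means here, the snippet below derives accuracy, precision, recall, and F1 from binary labels in plain Python. The label values, `y_true`, `y_pred`, and both helper names are made up for illustration and are not taken from the lab script, which would more likely rely on `sklearn.metrics`.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["tp" if actual == positive else "fp"] += 1
        else:
            counts["fn" if actual == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("at least one label is required")
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels using the README's encoding: 1 = authentic, 0 = fake.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Each value in the returned dictionary is a plain float, so it could be recorded as a run metric in the same way as any other number.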
172 | ### Sample Data
173 | | Variance | Skewness | Curtosis | Entropy | Class |
174 | |----------|----------|----------|---------|-------|
175 | | 3.62160 | 8.6661 | -2.807 | -0.44699 | 0 |
176 | | 4.54590 | 8.1674 | -2.4586 | -1.46210 | 0 |
177 | | 3.86600 | -2.6383 | 1.9242 | 0.10645 | 0 |
178 | | 3.45660 | 9.5228 | 4.0112 | -3.59440 | 0 |
179 |
180 | Objectives of this lab:
181 | * Use a RandomForestClassifier model
182 | * How to use the MLflow Tracking API
183 | * Use the MLflow API to experiment with several runs
184 | * Interpret and observe runs via the MLflow UI
185 |
186 | #### Lab-02 Exercise:
187 | * Consult [RandomForestClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html) documentation
188 | * Change or add parameters, such as depth of the tree or random_state, etc.
189 | * Change or alter the range of runs and increments of n_estimators
190 | * Check in MLflow UI if the metrics are affected
191 | * Log confusion matrix, recall and F1-score as metrics
192 |
193 | *_challenge-1_:* Use a logistic regression or SVM algorithm and see if it makes a difference in the evaluation metrics. This is a classic scenario where
194 | MLflow allows you to experiment with, record, and evaluate three different algorithms
195 | and then pick the best one.
196 | **Hint**: Read the blog below on using three algorithms
197 |
198 | Nice blog on [RF, SVM, & LR](https://www.vshsolutions.com/blogs/banknote-authentication-using-machine-learning-algorithms/) on detecting fake notes
199 |
200 | Refresh on [Classification Metrics](https://joshlawman.com/metrics-classification-report-breakdown-precision-recall-f1/)
201 |
202 | Refresh on [Confusion Matrix](https://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/)
203 |
204 | Refresh on [RandomForest](https://towardsdatascience.com/understanding-random-forest-58381e0602d2)
205 |
206 | Source for [Lab 1 & 2](https://stackabuse.com/random-forest-algorithm-with-python-and-scikit-learn/)
207 |
208 | Data source for [lab 1 & 2](https://archive.ics.uci.edu/ml/datasets/banknote+authentication)
209 |
210 | ### Lab-03: Scikit-Learn Regression Base with RandomForestRegressor
211 | * [_03_airbnb_base_lab.py_](./labs/03_airbnb_base_lab.py)
212 | ### Problem
213 | Take a cleansed, featurized dataset of Airbnb listings and develop a baseline model to predict prices.
214 | ### Solution
215 | Use RandomForestRegressor and baseline parameters to predict the price, given all the features.
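
A baseline is only useful once its error has been measured. The sketch below computes the regression metrics named in Lab-01 (MAE, MSE, RMSE, and R2) in plain Python. The function name and all price values are hypothetical, chosen only to keep the arithmetic easy to follow; the lab code itself would more likely call `sklearn.metrics`.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R2 for two equally long sequences."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [actual - predicted for actual, predicted in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    # R2 is undefined when every actual value is identical.
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse), "r2": r2}


# Hypothetical actual and predicted prices, for illustration only.
actual = [170.0, 235.0, 65.0, 120.0]
predicted = [160.0, 245.0, 75.0, 110.0]

baseline = regression_metrics(actual, predicted)
print(baseline)
```

Keeping the baseline's metrics in one dictionary makes it straightforward to compare them against later experimental runs.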
216 | ### Sample Data
217 | |host_total_listings_count|neighbourhood_cleansed|zipcode|latitude|longitude|property_type|room_type|accommodates|bathrooms|bedrooms|beds|bed_type|minimum_nights|number_of_reviews|review_scores_rating|review_scores_accuracy|review_scores_cleanliness|review_scores_checkin|review_scores_communication|review_scores_location|review_scores_value|price|
218 | |-------------------------|----------------------|-------|--------|---------|-------------|---------|------------|---------|--------|----|--------|--------------|-----------------|--------------------|----------------------|-------------------------|---------------------|---------------------------|----------------------|-------------------|-----|
219 | |1.0|0|0|37.769310377340766|-122.43385634488999|0|0|3.0|1.0|1.0|2.0|0|1.0|127.0|97.0|10.0|10.0|10.0|10.0|10.0|10.0|170.0|
220 | |2.0|1|1|37.745112331410034|-122.42101788836888|0|0|5.0|1.0|2.0|3.0|0|30.0|112.0|98.0|10.0|10.0|10.0|10.0|10.0|9.0|235.0|
221 | |10.0|2|0|37.766689597862175|-122.45250461761628|0|1|2.0|4.0|1.0|1.0|0|32.0|17.0|85.0|8.0|8.0|9.0|9.0|9.0|8.0|65.0|
222 |
223 | Objectives of this lab:
224 | * Create a benchmark or base model
225 | * How to use the MLflow Tracking API
226 | * Interpret and observe runs via the MLflow UI
227 |
228 | #### Lab-03 Exercise:
229 | * Run the script and create a simple baseline model
230 | * Observe the parameters and metrics in the MLflow UI
231 |
232 | [Related code](https://github.com/MangoTheCat/Modelling-Airbnb-Prices) for this model.
233 |
234 | ### Lab-04: Scikit-Learn Regression Experimental with RandomForestRegressor
235 | * [_04_airbnb_exp_lab.py_](./labs/04_airbnb_exp_lab.py)
236 | ### Problem
237 | Can you extend the baseline model built in lab 3 to build several experimental models?
238 | ### Solution
239 | Use the existing model and make changes to the code to experiment with model parameters.
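
One way to organize such experiments is to enumerate parameter combinations up front and log one run per combination. The following is a rough sketch under stated assumptions: `parameter_grid`, `log_run`, and the search-space values are hypothetical and not taken from the lab script. The parameter names match real `RandomForestRegressor` arguments, and `log_run` uses the MLflow calls `start_run`, `log_params`, and `log_metric`, but it is only defined here, not executed.

```python
from itertools import product


def parameter_grid(grid):
    """Yield one dict per combination of the candidate values in `grid`."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def log_run(params, rmse):
    """Record one experimental run (requires MLflow to be installed)."""
    import mlflow  # imported lazily so the grid helper works without MLflow

    with mlflow.start_run():
        mlflow.log_params(params)
        mlflow.log_metric("rmse", rmse)


# Hypothetical search space; the values are illustrative, not the lab's.
search_space = {
    "n_estimators": [50, 100, 200],
    "max_depth": [3, 6],
    "random_state": [42],
}

experiments = list(parameter_grid(search_space))
print(f"{len(experiments)} experimental runs planned")
for params in experiments:
    # In the lab, each dict would be passed to RandomForestRegressor(**params),
    # the model trained and evaluated, and the result handed to log_run().
    print(params)
```

Enumerating the runs before training keeps the experiment reproducible: the same search space always yields the same ordered list of parameter sets.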
240 |
241 | Objectives of this lab:
242 | * Create experiments and log metrics and parameters
243 | * Interpret and observe runs via the MLflow UI
244 | * How to use the _MlflowClient()_ API to peruse experiment details
245 |
246 | Nice read on [Feature Importance](https://towardsdatascience.com/explaining-feature-importance-by-example-of-a-random-forest-d9166011959e) in Random Forest model.
247 |
248 | #### Lab-04 Exercise:
249 | * Modify or extend the parameters
250 | * Compare the results between baseline and experimental runs
251 | * Did the experimental runs produce better metrics?
252 | * Did the RMSE decrease over the experiments?
253 |
254 | [Related code](https://github.com/MangoTheCat/Modelling-Airbnb-Prices) for this model.
255 |
256 | Nice read on [feature importance](https://towardsdatascience.com/explaining-feature-importance-by-example-of-a-random-forest-d9166011959e)
257 |
258 | Nice read on [residual plots](http://docs.statwing.com/interpreting-residual-plots-to-improve-your-regression/)
259 |
260 | ### Lab-05: Deep Learning Neural Networks for Classification
261 | * [_05_tf_keras_mnist_lab.py_](./labs/05_tf_keras_mnist_lab.py)
262 | Modified from [MLflow example](https://github.com/dbczumar/mlflow-keras-ffnn-mnist/blob/master/train.py)
263 |
264 | ### MNIST Neural Network with Layers
265 |
266 | ![](images/mnist_1layer.png)
267 |
268 | ### Problem
269 | Build a Keras/TensorFlow Neural Network to classify handwritten digits from 0 to 9
270 | ### Solution
271 | Use Keras Sequential Layers to build input, hidden, and output layers.
272 | ### Sample Data 273 | Use the built-in MNIST dataset, available via the dataset module `keras.datasets.mnist` 274 | 275 | Objectives of this lab: 276 | * Introduce Keras NN Model 277 | * Create your own experiment name and log runs under it 278 | * Use various optimization techniques to get the best outcome 279 | 280 | #### Lab Exercise: 281 | * Consult [Keras Sequential Model](https://keras.io/getting-started/sequential-model-guide/) documentation 282 | * Change or modify Neural Network and regularization parameters 283 | * Add hidden layers 284 | * Make hidden units larger 285 | * Try different [Keras optimizers](https://keras.io/optimizers/): RMSprop, Adadelta, etc. 286 | * Train for more epochs 287 | * Log parameters, metrics, and the model 288 | * Check MLflow UI and compare metrics among different runs 289 | 290 | ### Lab-06: Loading and predicting an existing model 291 | * [_06_load_predict_model_lab.py_](./labs/06_load_predict_model_lab.py) 292 | 293 | ![](images/pyfunc_models.png) 294 | ### Problem 295 | Having experimented with several runs in the labs above, can you reuse a model to predict? 296 | ### Solution 297 | Load an existing model by extending or modifying code to reload the saved model and 298 | use test data on its _model_.predict(test_data) method. 299 | 300 | Objectives of this lab: 301 | * Load an existing model and predict with test data 302 | * Load a model as a pyfunc function 303 | * Understand MLflow model flavors that can be deployed and loaded in different deployment 304 | environments. 305 | 306 | #### Lab Exercise: 307 | * Extend the _MLflowOps_ class private instance dictionary of 308 | function mappers to include [pyfunc](https://mlflow.org/docs/latest/python_api/mlflow.pyfunc.html#mlflow.pyfunc.load_model) model 309 | * Use a couple of the run_uids from your Lab-1 runs.
310 | * Check your MLflow UI for run_uids 311 | * Use _load_model_type.predict(test_data)_ to predict the outcome 312 | 313 | ### Lab-07 : Revisit Lab 1 Problem of Linear Regression using TensorFlow/Keras 314 | 315 | #### Lab Exercise: 316 | * Run this script 317 | 318 | ### Lab-08 : Using the MLflow Register UI 319 | 320 | #### Lab Exercise: 321 | 322 | ### Lab-09 : Using the MLflow Register APIs 323 | 324 | #### Lab Exercise: 325 | 326 | ### Lab-10: Deploying and Serving a model 327 | 328 | 329 | #### Lab Exercise: 330 | 331 | ### Lab-11 (optional): Executing MLproject from GitHub 332 | 333 | ![](images/mlproject_file.png) 334 | 335 | Objectives of this lab: 336 | * Understand MLflow Project files 337 | * Run an MLflow Project as a unit of execution 338 | 339 | #### Lab Exercise: 340 | * Execute an existing MLproject on git 341 | * Consult [docs](https://mlflow.org/docs/latest/quickstart.html#running-mlflow-projects) for running MLprojects 342 | * Can you execute it with different parameters?
343 | * `mlflow run https://github.com/mlflow/mlflow-example.git -P alpha=.4` 344 | * With no-conda, use `mlflow run --no-conda https://github.com/mlflow/mlflow-example.git -P alpha=5` 345 | * Execute the MLproject using the [mlflow.run(...)](https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.run) API 346 | 347 | *_challenge-1_:* 348 | * Create an MLproject on your GitHub for one of the above labs 349 | * Use `https://github.com/mlflow/mlflow-example.git` as an example 350 | * Execute the MLproject with `mlflow run https://github.com/ [-P args...]` 351 | * Execute your new MLproject using the [mlflow.run(...)](https://mlflow.org/docs/latest/python_api/mlflow.html#mlflow.run) API 352 | 353 | ### Lab-12 (Capstone): Create, experiment with, register, and manage the lifecycle of your model of choice 354 | 355 | Objectives of this lab: 356 | * Use and build on whatever you have learned above 357 | 358 | #### Lab Exercise: 359 | * Create a Python script for your model example 360 | * Consult [MLflow](https://mlflow.org/docs/latest/python_api/mlflow.html) and [Tracking APIs](https://mlflow.org/docs/latest/python_api/mlflow.tracking.html): 361 | * compute relevant metrics 362 | * log individual or bulk parameters and metrics 363 | * add tags or notes for your runs 364 | * create a distinct experiment name 365 | * experiment with different parameters in each run under this experiment 366 | * can you create appropriate artifacts (using matplotlib) and save them? 367 | * consult MLflow UI to pick the best model 368 | * can you load the best model using native model or pyfunc? 369 | * can you predict with test data? 370 | * create an MLproject on GitHub 371 | * can you execute it using `mlflow run https://github.com/