├── .gitignore
└── README.md

/.gitignore:
--------------------------------------------------------------------------------
# Course specific files
instructor/
*instructor*

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

Introduction to Machine Learning in scikit-learn

> "It's tough to make predictions, especially about the future."
> – Yogi Berra

----
Course Description
----

In this course, you'll create end-to-end solutions to machine learning problems. The course covers popular applied techniques in both supervised and unsupervised machine learning, such as regression, classification, and clustering. You'll learn how to properly engineer features, apply algorithms, and evaluate model performance. The focus of the course is Python's [scikit-learn](https://scikit-learn.org) library.

----
Logistics
----

__Instructor:__ Brian Spiering

Prerequisites
----

- Working knowledge of probability and statistics.
- Introductory knowledge of linear algebra (e.g., determinants and singular value decomposition).
- Intermediate-level Python (e.g., ability to create classes).
- No previous knowledge of machine learning required.

Learning Outcomes
-----

By the end of the course, you should be able to:

1. Build end-to-end machine learning systems to answer meaningful data science questions.
1. Write idiomatic code in Python's scikit-learn package to model data.
1. Recognize when to and _when not to_ apply machine learning techniques.
1. Complete data science take-home challenges that you might encounter during job interviews.

-----
Course Topics
-----

1. Welcome
1. Machine learning workflow
1. Scikit-learn API Overview (Estimators, Transformers, Pipelines)
1. Build your first ML model
1. Preprocessing
1. Feature extraction
1. Feature selection
1. Principal Component Analysis (PCA)
1. Model Selection
1. Classifiers (binary classification and multiclass classification)
1. Handling class imbalance with SMOTE resampling
1. Classification Metrics
1. Ensembling
1. Feature Importance
1. Creating custom classes in scikit-learn
1. Clustering

Assignments
-----

There are five hands-on assignments to practice applying course concepts to real-world data.

Final Project
------

There is a final project where you choose a dataset and complete an end-to-end machine learning project.

--------------------------------------------------------------------------------