├── .gitignore ├── LICENSE ├── README.md ├── data ├── auto_mpg.csv └── casting_images.npz ├── images ├── descent.png └── residuals.png ├── notebooks_en ├── 1_Linear_Regression.ipynb ├── 2_Logistic_Regression.ipynb ├── 3_Multiple_Linear_Regression.ipynb ├── 4_Polynomial_Regression.ipynb └── 5_Multiple_Logistic_Regression.ipynb ├── scripts └── plot_helpers.py └── style └── custom.css /.gitignore: -------------------------------------------------------------------------------- 1 | #Ignore notebook checkpoints 2 | .ipynb_checkpoints 3 | .DS_Store 4 | *.pyc 5 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | BSD 3-Clause License 2 | 3 | Copyright (c) 2019, engineersCode 4 | All rights reserved. 5 | 6 | Redistribution and use in source and binary forms, with or without 7 | modification, are permitted provided that the following conditions are met: 8 | 9 | 1. Redistributions of source code must retain the above copyright notice, this 10 | list of conditions and the following disclaimer. 11 | 12 | 2. Redistributions in binary form must reproduce the above copyright notice, 13 | this list of conditions and the following disclaimer in the documentation 14 | and/or other materials provided with the distribution. 15 | 16 | 3. Neither the name of the copyright holder nor the names of its 17 | contributors may be used to endorse or promote products derived from 18 | this software without specific prior written permission. 19 | 20 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 21 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 22 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 23 | DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 24 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 25 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 26 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 27 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 28 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 29 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 30 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Engineering Computations Module 6 2 | 3 | _Engineering Computations_ is a collection of stackable learning modules, flexible for adoption in different situations. 4 | It aims to develop computational skills for students in engineering, but it can also be used by students in other science majors. 5 | The modules use the Python programming language and the Jupyter open-source tools for interactive computing. 6 | 7 | > Rather than "learning to code," our vision is "coding to learn." 8 | 9 | ## Module 6: deep learning 10 | 11 | *A step-by-step introduction to deep learning (a.k.a. 
neural network) models, aimed at scientists and engineers having a background in calculus and linear algebra.* 12 | 13 | [![Tweet](https://img.shields.io/twitter/url/http/shields.io.svg?style=social)](https://twitter.com/intent/tweet?text=Step-by-step%20Deep%20Learning%20tutorial%20with%20Python%20for%20engineering%20students%20%23EngineersCode%20&url=https://github.com/engineersCode/EngComp6_deeplearning) 14 | 15 | **Pre-requisite: learning modules [*EngComp 1*](https://github.com/engineersCode/EngComp1_offtheground) and [*EngComp 4*](https://github.com/engineersCode/EngComp4_landlinear) of our collection.** Recommended: [*EngComp 2*](https://github.com/engineersCode/EngComp2_takeoff), or basic use of `pandas` for data manipulation. 16 | 17 | ### [Lesson 1](http://go.gwu.edu/engcomp6lesson1): Linear regression by gradient descent 18 | 19 | Find the minimum of a function by gradient descent. Play with SymPy. Key ingredients of building a linear model from data with a single independent variable. Optimize a loss function to find the model parameters. 20 | 21 | ### [Lesson 2](http://go.gwu.edu/engcomp6lesson2): Logistic regression 22 | 23 | Composition of a linear model with the logistic function. Construct the logistic loss function by integration. Find the model parameters with `autograd`. Combine with a decision boundary to do classification. 24 | 25 | ### [Lesson 3](http://go.gwu.edu/engcomp6lesson3): Multiple linear regression 26 | 27 | Use multiple independent variables to build a linear model. Express multiple linear regression in matrix form. Find the weights by gradient descent. Scale (normalize) the features to ensure convergence. Get acquainted with `scikit-learn`. Model accuracy. Linear regression with `scikit-learn` and with pseudo-inverse. 28 | 29 | ### [Lesson 4](http://go.gwu.edu/engcomp6lesson4): Polynomial regression 30 | 31 | Fitting a polynomial to data is a special case of multiple linear regression. 
Build polynomial features, scale the data, and train the model as in Lesson 3. When predicting with the model, apply the scaling computed from the training data to the new data. 32 | Observe underfitting and overfitting. Use regularization to avoid overfitting; linear regression with an L2 penalty on the weights is also called _ridge regression_. 33 | Do it with scikit-learn's `Ridge()`. 34 | 35 | ### [Lesson 5](http://go.gwu.edu/engcomp6lesson5): Multiple logistic regression 36 | 37 | A taste of more practical machine learning applications: _multiple logistic regression_ for the problem of identifying defective metal-casting parts. 38 | Turn an image into a vector of grayscale values to use it as input data, and set up a classification problem from multi-dimensional feature vectors. 39 | Split data into training, validation, and test datasets to assess model performance. 40 | Normalize the data using the z-score. 41 | Evaluate the performance of a classification model using the F-score. 42 | 43 | ### [Lesson 6]() Multivariate regression (coming soon) 44 | 45 | ### [Lesson 7]() Neural network model (coming soon) 46 | 47 | ## Copyright and License 48 | 49 | (c) 2021 Lorena A. Barba, Pi-Yueh Chuang, Tingyu Wang. 50 | All content is under Creative Commons Attribution [CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/legalcode.txt), and all [code is under BSD-3 clause](https://github.com/engineersCode/EngComp/blob/master/LICENSE). We are happy if you re-use the content in any way! 
51 | 52 | [![License](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause) [![License: CC BY 4.0](https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/) 53 | -------------------------------------------------------------------------------- /data/auto_mpg.csv: -------------------------------------------------------------------------------- 1 | mpg,cylinders,displacement,horsepower,weight,acceleration,model year,origin,car name 2 | 18.0,8,307.0,130.0,3504.0,12.0,70,1,chevrolet chevelle malibu 3 | 15.0,8,350.0,165.0,3693.0,11.5,70,1,buick skylark 320 4 | 18.0,8,318.0,150.0,3436.0,11.0,70,1,plymouth satellite 5 | 16.0,8,304.0,150.0,3433.0,12.0,70,1,amc rebel sst 6 | 17.0,8,302.0,140.0,3449.0,10.5,70,1,ford torino 7 | 15.0,8,429.0,198.0,4341.0,10.0,70,1,ford galaxie 500 8 | 14.0,8,454.0,220.0,4354.0,9.0,70,1,chevrolet impala 9 | 14.0,8,440.0,215.0,4312.0,8.5,70,1,plymouth fury iii 10 | 14.0,8,455.0,225.0,4425.0,10.0,70,1,pontiac catalina 11 | 15.0,8,390.0,190.0,3850.0,8.5,70,1,amc ambassador dpl 12 | 15.0,8,383.0,170.0,3563.0,10.0,70,1,dodge challenger se 13 | 14.0,8,340.0,160.0,3609.0,8.0,70,1,plymouth 'cuda 340 14 | 15.0,8,400.0,150.0,3761.0,9.5,70,1,chevrolet monte carlo 15 | 14.0,8,455.0,225.0,3086.0,10.0,70,1,buick estate wagon (sw) 16 | 24.0,4,113.0,95.0,2372.0,15.0,70,3,toyota corona mark ii 17 | 22.0,6,198.0,95.0,2833.0,15.5,70,1,plymouth duster 18 | 18.0,6,199.0,97.0,2774.0,15.5,70,1,amc hornet 19 | 21.0,6,200.0,85.0,2587.0,16.0,70,1,ford maverick 20 | 27.0,4,97.0,88.0,2130.0,14.5,70,3,datsun pl510 21 | 26.0,4,97.0,46.0,1835.0,20.5,70,2,volkswagen 1131 deluxe sedan 22 | 25.0,4,110.0,87.0,2672.0,17.5,70,2,peugeot 504 23 | 24.0,4,107.0,90.0,2430.0,14.5,70,2,audi 100 ls 24 | 25.0,4,104.0,95.0,2375.0,17.5,70,2,saab 99e 25 | 26.0,4,121.0,113.0,2234.0,12.5,70,2,bmw 2002 26 | 21.0,6,199.0,90.0,2648.0,15.0,70,1,amc gremlin 27 | 
10.0,8,360.0,215.0,4615.0,14.0,70,1,ford f250 28 | 10.0,8,307.0,200.0,4376.0,15.0,70,1,chevy c20 29 | 11.0,8,318.0,210.0,4382.0,13.5,70,1,dodge d200 30 | 9.0,8,304.0,193.0,4732.0,18.5,70,1,hi 1200d 31 | 27.0,4,97.0,88.0,2130.0,14.5,71,3,datsun pl510 32 | 28.0,4,140.0,90.0,2264.0,15.5,71,1,chevrolet vega 2300 33 | 25.0,4,113.0,95.0,2228.0,14.0,71,3,toyota corona 34 | 19.0,6,232.0,100.0,2634.0,13.0,71,1,amc gremlin 35 | 16.0,6,225.0,105.0,3439.0,15.5,71,1,plymouth satellite custom 36 | 17.0,6,250.0,100.0,3329.0,15.5,71,1,chevrolet chevelle malibu 37 | 19.0,6,250.0,88.0,3302.0,15.5,71,1,ford torino 500 38 | 18.0,6,232.0,100.0,3288.0,15.5,71,1,amc matador 39 | 14.0,8,350.0,165.0,4209.0,12.0,71,1,chevrolet impala 40 | 14.0,8,400.0,175.0,4464.0,11.5,71,1,pontiac catalina brougham 41 | 14.0,8,351.0,153.0,4154.0,13.5,71,1,ford galaxie 500 42 | 14.0,8,318.0,150.0,4096.0,13.0,71,1,plymouth fury iii 43 | 12.0,8,383.0,180.0,4955.0,11.5,71,1,dodge monaco (sw) 44 | 13.0,8,400.0,170.0,4746.0,12.0,71,1,ford country squire (sw) 45 | 13.0,8,400.0,175.0,5140.0,12.0,71,1,pontiac safari (sw) 46 | 18.0,6,258.0,110.0,2962.0,13.5,71,1,amc hornet sportabout (sw) 47 | 22.0,4,140.0,72.0,2408.0,19.0,71,1,chevrolet vega (sw) 48 | 19.0,6,250.0,100.0,3282.0,15.0,71,1,pontiac firebird 49 | 18.0,6,250.0,88.0,3139.0,14.5,71,1,ford mustang 50 | 23.0,4,122.0,86.0,2220.0,14.0,71,1,mercury capri 2000 51 | 28.0,4,116.0,90.0,2123.0,14.0,71,2,opel 1900 52 | 30.0,4,79.0,70.0,2074.0,19.5,71,2,peugeot 304 53 | 30.0,4,88.0,76.0,2065.0,14.5,71,2,fiat 124b 54 | 31.0,4,71.0,65.0,1773.0,19.0,71,3,toyota corolla 1200 55 | 35.0,4,72.0,69.0,1613.0,18.0,71,3,datsun 1200 56 | 27.0,4,97.0,60.0,1834.0,19.0,71,2,volkswagen model 111 57 | 26.0,4,91.0,70.0,1955.0,20.5,71,1,plymouth cricket 58 | 24.0,4,113.0,95.0,2278.0,15.5,72,3,toyota corona hardtop 59 | 25.0,4,97.5,80.0,2126.0,17.0,72,1,dodge colt hardtop 60 | 23.0,4,97.0,54.0,2254.0,23.5,72,2,volkswagen type 3 61 | 20.0,4,140.0,90.0,2408.0,19.5,72,1,chevrolet vega 62 | 
21.0,4,122.0,86.0,2226.0,16.5,72,1,ford pinto runabout 63 | 13.0,8,350.0,165.0,4274.0,12.0,72,1,chevrolet impala 64 | 14.0,8,400.0,175.0,4385.0,12.0,72,1,pontiac catalina 65 | 15.0,8,318.0,150.0,4135.0,13.5,72,1,plymouth fury iii 66 | 14.0,8,351.0,153.0,4129.0,13.0,72,1,ford galaxie 500 67 | 17.0,8,304.0,150.0,3672.0,11.5,72,1,amc ambassador sst 68 | 11.0,8,429.0,208.0,4633.0,11.0,72,1,mercury marquis 69 | 13.0,8,350.0,155.0,4502.0,13.5,72,1,buick lesabre custom 70 | 12.0,8,350.0,160.0,4456.0,13.5,72,1,oldsmobile delta 88 royale 71 | 13.0,8,400.0,190.0,4422.0,12.5,72,1,chrysler newport royal 72 | 19.0,3,70.0,97.0,2330.0,13.5,72,3,mazda rx2 coupe 73 | 15.0,8,304.0,150.0,3892.0,12.5,72,1,amc matador (sw) 74 | 13.0,8,307.0,130.0,4098.0,14.0,72,1,chevrolet chevelle concours (sw) 75 | 13.0,8,302.0,140.0,4294.0,16.0,72,1,ford gran torino (sw) 76 | 14.0,8,318.0,150.0,4077.0,14.0,72,1,plymouth satellite custom (sw) 77 | 18.0,4,121.0,112.0,2933.0,14.5,72,2,volvo 145e (sw) 78 | 22.0,4,121.0,76.0,2511.0,18.0,72,2,volkswagen 411 (sw) 79 | 21.0,4,120.0,87.0,2979.0,19.5,72,2,peugeot 504 (sw) 80 | 26.0,4,96.0,69.0,2189.0,18.0,72,2,renault 12 (sw) 81 | 22.0,4,122.0,86.0,2395.0,16.0,72,1,ford pinto (sw) 82 | 28.0,4,97.0,92.0,2288.0,17.0,72,3,datsun 510 (sw) 83 | 23.0,4,120.0,97.0,2506.0,14.5,72,3,toyouta corona mark ii (sw) 84 | 28.0,4,98.0,80.0,2164.0,15.0,72,1,dodge colt (sw) 85 | 27.0,4,97.0,88.0,2100.0,16.5,72,3,toyota corolla 1600 (sw) 86 | 13.0,8,350.0,175.0,4100.0,13.0,73,1,buick century 350 87 | 14.0,8,304.0,150.0,3672.0,11.5,73,1,amc matador 88 | 13.0,8,350.0,145.0,3988.0,13.0,73,1,chevrolet malibu 89 | 14.0,8,302.0,137.0,4042.0,14.5,73,1,ford gran torino 90 | 15.0,8,318.0,150.0,3777.0,12.5,73,1,dodge coronet custom 91 | 12.0,8,429.0,198.0,4952.0,11.5,73,1,mercury marquis brougham 92 | 13.0,8,400.0,150.0,4464.0,12.0,73,1,chevrolet caprice classic 93 | 13.0,8,351.0,158.0,4363.0,13.0,73,1,ford ltd 94 | 14.0,8,318.0,150.0,4237.0,14.5,73,1,plymouth fury gran sedan 95 | 
13.0,8,440.0,215.0,4735.0,11.0,73,1,chrysler new yorker brougham 96 | 12.0,8,455.0,225.0,4951.0,11.0,73,1,buick electra 225 custom 97 | 13.0,8,360.0,175.0,3821.0,11.0,73,1,amc ambassador brougham 98 | 18.0,6,225.0,105.0,3121.0,16.5,73,1,plymouth valiant 99 | 16.0,6,250.0,100.0,3278.0,18.0,73,1,chevrolet nova custom 100 | 18.0,6,232.0,100.0,2945.0,16.0,73,1,amc hornet 101 | 18.0,6,250.0,88.0,3021.0,16.5,73,1,ford maverick 102 | 23.0,6,198.0,95.0,2904.0,16.0,73,1,plymouth duster 103 | 26.0,4,97.0,46.0,1950.0,21.0,73,2,volkswagen super beetle 104 | 11.0,8,400.0,150.0,4997.0,14.0,73,1,chevrolet impala 105 | 12.0,8,400.0,167.0,4906.0,12.5,73,1,ford country 106 | 13.0,8,360.0,170.0,4654.0,13.0,73,1,plymouth custom suburb 107 | 12.0,8,350.0,180.0,4499.0,12.5,73,1,oldsmobile vista cruiser 108 | 18.0,6,232.0,100.0,2789.0,15.0,73,1,amc gremlin 109 | 20.0,4,97.0,88.0,2279.0,19.0,73,3,toyota carina 110 | 21.0,4,140.0,72.0,2401.0,19.5,73,1,chevrolet vega 111 | 22.0,4,108.0,94.0,2379.0,16.5,73,3,datsun 610 112 | 18.0,3,70.0,90.0,2124.0,13.5,73,3,maxda rx3 113 | 19.0,4,122.0,85.0,2310.0,18.5,73,1,ford pinto 114 | 21.0,6,155.0,107.0,2472.0,14.0,73,1,mercury capri v6 115 | 26.0,4,98.0,90.0,2265.0,15.5,73,2,fiat 124 sport coupe 116 | 15.0,8,350.0,145.0,4082.0,13.0,73,1,chevrolet monte carlo s 117 | 16.0,8,400.0,230.0,4278.0,9.5,73,1,pontiac grand prix 118 | 29.0,4,68.0,49.0,1867.0,19.5,73,2,fiat 128 119 | 24.0,4,116.0,75.0,2158.0,15.5,73,2,opel manta 120 | 20.0,4,114.0,91.0,2582.0,14.0,73,2,audi 100ls 121 | 19.0,4,121.0,112.0,2868.0,15.5,73,2,volvo 144ea 122 | 15.0,8,318.0,150.0,3399.0,11.0,73,1,dodge dart custom 123 | 24.0,4,121.0,110.0,2660.0,14.0,73,2,saab 99le 124 | 20.0,6,156.0,122.0,2807.0,13.5,73,3,toyota mark ii 125 | 11.0,8,350.0,180.0,3664.0,11.0,73,1,oldsmobile omega 126 | 20.0,6,198.0,95.0,3102.0,16.5,74,1,plymouth duster 127 | 19.0,6,232.0,100.0,2901.0,16.0,74,1,amc hornet 128 | 15.0,6,250.0,100.0,3336.0,17.0,74,1,chevrolet nova 129 | 
31.0,4,79.0,67.0,1950.0,19.0,74,3,datsun b210 130 | 26.0,4,122.0,80.0,2451.0,16.5,74,1,ford pinto 131 | 32.0,4,71.0,65.0,1836.0,21.0,74,3,toyota corolla 1200 132 | 25.0,4,140.0,75.0,2542.0,17.0,74,1,chevrolet vega 133 | 16.0,6,250.0,100.0,3781.0,17.0,74,1,chevrolet chevelle malibu classic 134 | 16.0,6,258.0,110.0,3632.0,18.0,74,1,amc matador 135 | 18.0,6,225.0,105.0,3613.0,16.5,74,1,plymouth satellite sebring 136 | 16.0,8,302.0,140.0,4141.0,14.0,74,1,ford gran torino 137 | 13.0,8,350.0,150.0,4699.0,14.5,74,1,buick century luxus (sw) 138 | 14.0,8,318.0,150.0,4457.0,13.5,74,1,dodge coronet custom (sw) 139 | 14.0,8,302.0,140.0,4638.0,16.0,74,1,ford gran torino (sw) 140 | 14.0,8,304.0,150.0,4257.0,15.5,74,1,amc matador (sw) 141 | 29.0,4,98.0,83.0,2219.0,16.5,74,2,audi fox 142 | 26.0,4,79.0,67.0,1963.0,15.5,74,2,volkswagen dasher 143 | 26.0,4,97.0,78.0,2300.0,14.5,74,2,opel manta 144 | 31.0,4,76.0,52.0,1649.0,16.5,74,3,toyota corona 145 | 32.0,4,83.0,61.0,2003.0,19.0,74,3,datsun 710 146 | 28.0,4,90.0,75.0,2125.0,14.5,74,1,dodge colt 147 | 24.0,4,90.0,75.0,2108.0,15.5,74,2,fiat 128 148 | 26.0,4,116.0,75.0,2246.0,14.0,74,2,fiat 124 tc 149 | 24.0,4,120.0,97.0,2489.0,15.0,74,3,honda civic 150 | 26.0,4,108.0,93.0,2391.0,15.5,74,3,subaru 151 | 31.0,4,79.0,67.0,2000.0,16.0,74,2,fiat x1.9 152 | 19.0,6,225.0,95.0,3264.0,16.0,75,1,plymouth valiant custom 153 | 18.0,6,250.0,105.0,3459.0,16.0,75,1,chevrolet nova 154 | 15.0,6,250.0,72.0,3432.0,21.0,75,1,mercury monarch 155 | 15.0,6,250.0,72.0,3158.0,19.5,75,1,ford maverick 156 | 16.0,8,400.0,170.0,4668.0,11.5,75,1,pontiac catalina 157 | 15.0,8,350.0,145.0,4440.0,14.0,75,1,chevrolet bel air 158 | 16.0,8,318.0,150.0,4498.0,14.5,75,1,plymouth grand fury 159 | 14.0,8,351.0,148.0,4657.0,13.5,75,1,ford ltd 160 | 17.0,6,231.0,110.0,3907.0,21.0,75,1,buick century 161 | 16.0,6,250.0,105.0,3897.0,18.5,75,1,chevroelt chevelle malibu 162 | 15.0,6,258.0,110.0,3730.0,19.0,75,1,amc matador 163 | 18.0,6,225.0,95.0,3785.0,19.0,75,1,plymouth fury 164 
| 21.0,6,231.0,110.0,3039.0,15.0,75,1,buick skyhawk 165 | 20.0,8,262.0,110.0,3221.0,13.5,75,1,chevrolet monza 2+2 166 | 13.0,8,302.0,129.0,3169.0,12.0,75,1,ford mustang ii 167 | 29.0,4,97.0,75.0,2171.0,16.0,75,3,toyota corolla 168 | 23.0,4,140.0,83.0,2639.0,17.0,75,1,ford pinto 169 | 20.0,6,232.0,100.0,2914.0,16.0,75,1,amc gremlin 170 | 23.0,4,140.0,78.0,2592.0,18.5,75,1,pontiac astro 171 | 24.0,4,134.0,96.0,2702.0,13.5,75,3,toyota corona 172 | 25.0,4,90.0,71.0,2223.0,16.5,75,2,volkswagen dasher 173 | 24.0,4,119.0,97.0,2545.0,17.0,75,3,datsun 710 174 | 18.0,6,171.0,97.0,2984.0,14.5,75,1,ford pinto 175 | 29.0,4,90.0,70.0,1937.0,14.0,75,2,volkswagen rabbit 176 | 19.0,6,232.0,90.0,3211.0,17.0,75,1,amc pacer 177 | 23.0,4,115.0,95.0,2694.0,15.0,75,2,audi 100ls 178 | 23.0,4,120.0,88.0,2957.0,17.0,75,2,peugeot 504 179 | 22.0,4,121.0,98.0,2945.0,14.5,75,2,volvo 244dl 180 | 25.0,4,121.0,115.0,2671.0,13.5,75,2,saab 99le 181 | 33.0,4,91.0,53.0,1795.0,17.5,75,3,honda civic cvcc 182 | 28.0,4,107.0,86.0,2464.0,15.5,76,2,fiat 131 183 | 25.0,4,116.0,81.0,2220.0,16.9,76,2,opel 1900 184 | 25.0,4,140.0,92.0,2572.0,14.9,76,1,capri ii 185 | 26.0,4,98.0,79.0,2255.0,17.7,76,1,dodge colt 186 | 27.0,4,101.0,83.0,2202.0,15.3,76,2,renault 12tl 187 | 17.5,8,305.0,140.0,4215.0,13.0,76,1,chevrolet chevelle malibu classic 188 | 16.0,8,318.0,150.0,4190.0,13.0,76,1,dodge coronet brougham 189 | 15.5,8,304.0,120.0,3962.0,13.9,76,1,amc matador 190 | 14.5,8,351.0,152.0,4215.0,12.8,76,1,ford gran torino 191 | 22.0,6,225.0,100.0,3233.0,15.4,76,1,plymouth valiant 192 | 22.0,6,250.0,105.0,3353.0,14.5,76,1,chevrolet nova 193 | 24.0,6,200.0,81.0,3012.0,17.6,76,1,ford maverick 194 | 22.5,6,232.0,90.0,3085.0,17.6,76,1,amc hornet 195 | 29.0,4,85.0,52.0,2035.0,22.2,76,1,chevrolet chevette 196 | 24.5,4,98.0,60.0,2164.0,22.1,76,1,chevrolet woody 197 | 29.0,4,90.0,70.0,1937.0,14.2,76,2,vw rabbit 198 | 33.0,4,91.0,53.0,1795.0,17.4,76,3,honda civic 199 | 20.0,6,225.0,100.0,3651.0,17.7,76,1,dodge aspen se 200 | 
18.0,6,250.0,78.0,3574.0,21.0,76,1,ford granada ghia 201 | 18.5,6,250.0,110.0,3645.0,16.2,76,1,pontiac ventura sj 202 | 17.5,6,258.0,95.0,3193.0,17.8,76,1,amc pacer d/l 203 | 29.5,4,97.0,71.0,1825.0,12.2,76,2,volkswagen rabbit 204 | 32.0,4,85.0,70.0,1990.0,17.0,76,3,datsun b-210 205 | 28.0,4,97.0,75.0,2155.0,16.4,76,3,toyota corolla 206 | 26.5,4,140.0,72.0,2565.0,13.6,76,1,ford pinto 207 | 20.0,4,130.0,102.0,3150.0,15.7,76,2,volvo 245 208 | 13.0,8,318.0,150.0,3940.0,13.2,76,1,plymouth volare premier v8 209 | 19.0,4,120.0,88.0,3270.0,21.9,76,2,peugeot 504 210 | 19.0,6,156.0,108.0,2930.0,15.5,76,3,toyota mark ii 211 | 16.5,6,168.0,120.0,3820.0,16.7,76,2,mercedes-benz 280s 212 | 16.5,8,350.0,180.0,4380.0,12.1,76,1,cadillac seville 213 | 13.0,8,350.0,145.0,4055.0,12.0,76,1,chevy c10 214 | 13.0,8,302.0,130.0,3870.0,15.0,76,1,ford f108 215 | 13.0,8,318.0,150.0,3755.0,14.0,76,1,dodge d100 216 | 31.5,4,98.0,68.0,2045.0,18.5,77,3,honda accord cvcc 217 | 30.0,4,111.0,80.0,2155.0,14.8,77,1,buick opel isuzu deluxe 218 | 36.0,4,79.0,58.0,1825.0,18.6,77,2,renault 5 gtl 219 | 25.5,4,122.0,96.0,2300.0,15.5,77,1,plymouth arrow gs 220 | 33.5,4,85.0,70.0,1945.0,16.8,77,3,datsun f-10 hatchback 221 | 17.5,8,305.0,145.0,3880.0,12.5,77,1,chevrolet caprice classic 222 | 17.0,8,260.0,110.0,4060.0,19.0,77,1,oldsmobile cutlass supreme 223 | 15.5,8,318.0,145.0,4140.0,13.7,77,1,dodge monaco brougham 224 | 15.0,8,302.0,130.0,4295.0,14.9,77,1,mercury cougar brougham 225 | 17.5,6,250.0,110.0,3520.0,16.4,77,1,chevrolet concours 226 | 20.5,6,231.0,105.0,3425.0,16.9,77,1,buick skylark 227 | 19.0,6,225.0,100.0,3630.0,17.7,77,1,plymouth volare custom 228 | 18.5,6,250.0,98.0,3525.0,19.0,77,1,ford granada 229 | 16.0,8,400.0,180.0,4220.0,11.1,77,1,pontiac grand prix lj 230 | 15.5,8,350.0,170.0,4165.0,11.4,77,1,chevrolet monte carlo landau 231 | 15.5,8,400.0,190.0,4325.0,12.2,77,1,chrysler cordoba 232 | 16.0,8,351.0,149.0,4335.0,14.5,77,1,ford thunderbird 233 | 29.0,4,97.0,78.0,1940.0,14.5,77,2,volkswagen 
rabbit custom 234 | 24.5,4,151.0,88.0,2740.0,16.0,77,1,pontiac sunbird coupe 235 | 26.0,4,97.0,75.0,2265.0,18.2,77,3,toyota corolla liftback 236 | 25.5,4,140.0,89.0,2755.0,15.8,77,1,ford mustang ii 2+2 237 | 30.5,4,98.0,63.0,2051.0,17.0,77,1,chevrolet chevette 238 | 33.5,4,98.0,83.0,2075.0,15.9,77,1,dodge colt m/m 239 | 30.0,4,97.0,67.0,1985.0,16.4,77,3,subaru dl 240 | 30.5,4,97.0,78.0,2190.0,14.1,77,2,volkswagen dasher 241 | 22.0,6,146.0,97.0,2815.0,14.5,77,3,datsun 810 242 | 21.5,4,121.0,110.0,2600.0,12.8,77,2,bmw 320i 243 | 21.5,3,80.0,110.0,2720.0,13.5,77,3,mazda rx-4 244 | 43.1,4,90.0,48.0,1985.0,21.5,78,2,volkswagen rabbit custom diesel 245 | 36.1,4,98.0,66.0,1800.0,14.4,78,1,ford fiesta 246 | 32.8,4,78.0,52.0,1985.0,19.4,78,3,mazda glc deluxe 247 | 39.4,4,85.0,70.0,2070.0,18.6,78,3,datsun b210 gx 248 | 36.1,4,91.0,60.0,1800.0,16.4,78,3,honda civic cvcc 249 | 19.9,8,260.0,110.0,3365.0,15.5,78,1,oldsmobile cutlass salon brougham 250 | 19.4,8,318.0,140.0,3735.0,13.2,78,1,dodge diplomat 251 | 20.2,8,302.0,139.0,3570.0,12.8,78,1,mercury monarch ghia 252 | 19.2,6,231.0,105.0,3535.0,19.2,78,1,pontiac phoenix lj 253 | 20.5,6,200.0,95.0,3155.0,18.2,78,1,chevrolet malibu 254 | 20.2,6,200.0,85.0,2965.0,15.8,78,1,ford fairmont (auto) 255 | 25.1,4,140.0,88.0,2720.0,15.4,78,1,ford fairmont (man) 256 | 20.5,6,225.0,100.0,3430.0,17.2,78,1,plymouth volare 257 | 19.4,6,232.0,90.0,3210.0,17.2,78,1,amc concord 258 | 20.6,6,231.0,105.0,3380.0,15.8,78,1,buick century special 259 | 20.8,6,200.0,85.0,3070.0,16.7,78,1,mercury zephyr 260 | 18.6,6,225.0,110.0,3620.0,18.7,78,1,dodge aspen 261 | 18.1,6,258.0,120.0,3410.0,15.1,78,1,amc concord d/l 262 | 19.2,8,305.0,145.0,3425.0,13.2,78,1,chevrolet monte carlo landau 263 | 17.7,6,231.0,165.0,3445.0,13.4,78,1,buick regal sport coupe (turbo) 264 | 18.1,8,302.0,139.0,3205.0,11.2,78,1,ford futura 265 | 17.5,8,318.0,140.0,4080.0,13.7,78,1,dodge magnum xe 266 | 30.0,4,98.0,68.0,2155.0,16.5,78,1,chevrolet chevette 267 | 
27.5,4,134.0,95.0,2560.0,14.2,78,3,toyota corona 268 | 27.2,4,119.0,97.0,2300.0,14.7,78,3,datsun 510 269 | 30.9,4,105.0,75.0,2230.0,14.5,78,1,dodge omni 270 | 21.1,4,134.0,95.0,2515.0,14.8,78,3,toyota celica gt liftback 271 | 23.2,4,156.0,105.0,2745.0,16.7,78,1,plymouth sapporo 272 | 23.8,4,151.0,85.0,2855.0,17.6,78,1,oldsmobile starfire sx 273 | 23.9,4,119.0,97.0,2405.0,14.9,78,3,datsun 200-sx 274 | 20.3,5,131.0,103.0,2830.0,15.9,78,2,audi 5000 275 | 17.0,6,163.0,125.0,3140.0,13.6,78,2,volvo 264gl 276 | 21.6,4,121.0,115.0,2795.0,15.7,78,2,saab 99gle 277 | 16.2,6,163.0,133.0,3410.0,15.8,78,2,peugeot 604sl 278 | 31.5,4,89.0,71.0,1990.0,14.9,78,2,volkswagen scirocco 279 | 29.5,4,98.0,68.0,2135.0,16.6,78,3,honda accord lx 280 | 21.5,6,231.0,115.0,3245.0,15.4,79,1,pontiac lemans v6 281 | 19.8,6,200.0,85.0,2990.0,18.2,79,1,mercury zephyr 6 282 | 22.3,4,140.0,88.0,2890.0,17.3,79,1,ford fairmont 4 283 | 20.2,6,232.0,90.0,3265.0,18.2,79,1,amc concord dl 6 284 | 20.6,6,225.0,110.0,3360.0,16.6,79,1,dodge aspen 6 285 | 17.0,8,305.0,130.0,3840.0,15.4,79,1,chevrolet caprice classic 286 | 17.6,8,302.0,129.0,3725.0,13.4,79,1,ford ltd landau 287 | 16.5,8,351.0,138.0,3955.0,13.2,79,1,mercury grand marquis 288 | 18.2,8,318.0,135.0,3830.0,15.2,79,1,dodge st. 
regis 289 | 16.9,8,350.0,155.0,4360.0,14.9,79,1,buick estate wagon (sw) 290 | 15.5,8,351.0,142.0,4054.0,14.3,79,1,ford country squire (sw) 291 | 19.2,8,267.0,125.0,3605.0,15.0,79,1,chevrolet malibu classic (sw) 292 | 18.5,8,360.0,150.0,3940.0,13.0,79,1,chrysler lebaron town @ country (sw) 293 | 31.9,4,89.0,71.0,1925.0,14.0,79,2,vw rabbit custom 294 | 34.1,4,86.0,65.0,1975.0,15.2,79,3,maxda glc deluxe 295 | 35.7,4,98.0,80.0,1915.0,14.4,79,1,dodge colt hatchback custom 296 | 27.4,4,121.0,80.0,2670.0,15.0,79,1,amc spirit dl 297 | 25.4,5,183.0,77.0,3530.0,20.1,79,2,mercedes benz 300d 298 | 23.0,8,350.0,125.0,3900.0,17.4,79,1,cadillac eldorado 299 | 27.2,4,141.0,71.0,3190.0,24.8,79,2,peugeot 504 300 | 23.9,8,260.0,90.0,3420.0,22.2,79,1,oldsmobile cutlass salon brougham 301 | 34.2,4,105.0,70.0,2200.0,13.2,79,1,plymouth horizon 302 | 34.5,4,105.0,70.0,2150.0,14.9,79,1,plymouth horizon tc3 303 | 31.8,4,85.0,65.0,2020.0,19.2,79,3,datsun 210 304 | 37.3,4,91.0,69.0,2130.0,14.7,79,2,fiat strada custom 305 | 28.4,4,151.0,90.0,2670.0,16.0,79,1,buick skylark limited 306 | 28.8,6,173.0,115.0,2595.0,11.3,79,1,chevrolet citation 307 | 26.8,6,173.0,115.0,2700.0,12.9,79,1,oldsmobile omega brougham 308 | 33.5,4,151.0,90.0,2556.0,13.2,79,1,pontiac phoenix 309 | 41.5,4,98.0,76.0,2144.0,14.7,80,2,vw rabbit 310 | 38.1,4,89.0,60.0,1968.0,18.8,80,3,toyota corolla tercel 311 | 32.1,4,98.0,70.0,2120.0,15.5,80,1,chevrolet chevette 312 | 37.2,4,86.0,65.0,2019.0,16.4,80,3,datsun 310 313 | 28.0,4,151.0,90.0,2678.0,16.5,80,1,chevrolet citation 314 | 26.4,4,140.0,88.0,2870.0,18.1,80,1,ford fairmont 315 | 24.3,4,151.0,90.0,3003.0,20.1,80,1,amc concord 316 | 19.1,6,225.0,90.0,3381.0,18.7,80,1,dodge aspen 317 | 34.3,4,97.0,78.0,2188.0,15.8,80,2,audi 4000 318 | 29.8,4,134.0,90.0,2711.0,15.5,80,3,toyota corona liftback 319 | 31.3,4,120.0,75.0,2542.0,17.5,80,3,mazda 626 320 | 37.0,4,119.0,92.0,2434.0,15.0,80,3,datsun 510 hatchback 321 | 32.2,4,108.0,75.0,2265.0,15.2,80,3,toyota corolla 322 | 
46.6,4,86.0,65.0,2110.0,17.9,80,3,mazda glc 323 | 27.9,4,156.0,105.0,2800.0,14.4,80,1,dodge colt 324 | 40.8,4,85.0,65.0,2110.0,19.2,80,3,datsun 210 325 | 44.3,4,90.0,48.0,2085.0,21.7,80,2,vw rabbit c (diesel) 326 | 43.4,4,90.0,48.0,2335.0,23.7,80,2,vw dasher (diesel) 327 | 36.4,5,121.0,67.0,2950.0,19.9,80,2,audi 5000s (diesel) 328 | 30.0,4,146.0,67.0,3250.0,21.8,80,2,mercedes-benz 240d 329 | 44.6,4,91.0,67.0,1850.0,13.8,80,3,honda civic 1500 gl 330 | 33.8,4,97.0,67.0,2145.0,18.0,80,3,subaru dl 331 | 29.8,4,89.0,62.0,1845.0,15.3,80,2,vokswagen rabbit 332 | 32.7,6,168.0,132.0,2910.0,11.4,80,3,datsun 280-zx 333 | 23.7,3,70.0,100.0,2420.0,12.5,80,3,mazda rx-7 gs 334 | 35.0,4,122.0,88.0,2500.0,15.1,80,2,triumph tr7 coupe 335 | 32.4,4,107.0,72.0,2290.0,17.0,80,3,honda accord 336 | 27.2,4,135.0,84.0,2490.0,15.7,81,1,plymouth reliant 337 | 26.6,4,151.0,84.0,2635.0,16.4,81,1,buick skylark 338 | 25.8,4,156.0,92.0,2620.0,14.4,81,1,dodge aries wagon (sw) 339 | 23.5,6,173.0,110.0,2725.0,12.6,81,1,chevrolet citation 340 | 30.0,4,135.0,84.0,2385.0,12.9,81,1,plymouth reliant 341 | 39.1,4,79.0,58.0,1755.0,16.9,81,3,toyota starlet 342 | 39.0,4,86.0,64.0,1875.0,16.4,81,1,plymouth champ 343 | 35.1,4,81.0,60.0,1760.0,16.1,81,3,honda civic 1300 344 | 32.3,4,97.0,67.0,2065.0,17.8,81,3,subaru 345 | 37.0,4,85.0,65.0,1975.0,19.4,81,3,datsun 210 mpg 346 | 37.7,4,89.0,62.0,2050.0,17.3,81,3,toyota tercel 347 | 34.1,4,91.0,68.0,1985.0,16.0,81,3,mazda glc 4 348 | 34.7,4,105.0,63.0,2215.0,14.9,81,1,plymouth horizon 4 349 | 34.4,4,98.0,65.0,2045.0,16.2,81,1,ford escort 4w 350 | 29.9,4,98.0,65.0,2380.0,20.7,81,1,ford escort 2h 351 | 33.0,4,105.0,74.0,2190.0,14.2,81,2,volkswagen jetta 352 | 33.7,4,107.0,75.0,2210.0,14.4,81,3,honda prelude 353 | 32.4,4,108.0,75.0,2350.0,16.8,81,3,toyota corolla 354 | 32.9,4,119.0,100.0,2615.0,14.8,81,3,datsun 200sx 355 | 31.6,4,120.0,74.0,2635.0,18.3,81,3,mazda 626 356 | 28.1,4,141.0,80.0,3230.0,20.4,81,2,peugeot 505s turbo diesel 357 | 
30.7,6,145.0,76.0,3160.0,19.6,81,2,volvo diesel 358 | 25.4,6,168.0,116.0,2900.0,12.6,81,3,toyota cressida 359 | 24.2,6,146.0,120.0,2930.0,13.8,81,3,datsun 810 maxima 360 | 22.4,6,231.0,110.0,3415.0,15.8,81,1,buick century 361 | 26.6,8,350.0,105.0,3725.0,19.0,81,1,oldsmobile cutlass ls 362 | 20.2,6,200.0,88.0,3060.0,17.1,81,1,ford granada gl 363 | 17.6,6,225.0,85.0,3465.0,16.6,81,1,chrysler lebaron salon 364 | 28.0,4,112.0,88.0,2605.0,19.6,82,1,chevrolet cavalier 365 | 27.0,4,112.0,88.0,2640.0,18.6,82,1,chevrolet cavalier wagon 366 | 34.0,4,112.0,88.0,2395.0,18.0,82,1,chevrolet cavalier 2-door 367 | 31.0,4,112.0,85.0,2575.0,16.2,82,1,pontiac j2000 se hatchback 368 | 29.0,4,135.0,84.0,2525.0,16.0,82,1,dodge aries se 369 | 27.0,4,151.0,90.0,2735.0,18.0,82,1,pontiac phoenix 370 | 24.0,4,140.0,92.0,2865.0,16.4,82,1,ford fairmont futura 371 | 36.0,4,105.0,74.0,1980.0,15.3,82,2,volkswagen rabbit l 372 | 37.0,4,91.0,68.0,2025.0,18.2,82,3,mazda glc custom l 373 | 31.0,4,91.0,68.0,1970.0,17.6,82,3,mazda glc custom 374 | 38.0,4,105.0,63.0,2125.0,14.7,82,1,plymouth horizon miser 375 | 36.0,4,98.0,70.0,2125.0,17.3,82,1,mercury lynx l 376 | 36.0,4,120.0,88.0,2160.0,14.5,82,3,nissan stanza xe 377 | 36.0,4,107.0,75.0,2205.0,14.5,82,3,honda accord 378 | 34.0,4,108.0,70.0,2245.0,16.9,82,3,toyota corolla 379 | 38.0,4,91.0,67.0,1965.0,15.0,82,3,honda civic 380 | 32.0,4,91.0,67.0,1965.0,15.7,82,3,honda civic (auto) 381 | 38.0,4,91.0,67.0,1995.0,16.2,82,3,datsun 310 gx 382 | 25.0,6,181.0,110.0,2945.0,16.4,82,1,buick century limited 383 | 38.0,6,262.0,85.0,3015.0,17.0,82,1,oldsmobile cutlass ciera (diesel) 384 | 26.0,4,156.0,92.0,2585.0,14.5,82,1,chrysler lebaron medallion 385 | 22.0,6,232.0,112.0,2835.0,14.7,82,1,ford granada l 386 | 32.0,4,144.0,96.0,2665.0,13.9,82,3,toyota celica gt 387 | 36.0,4,135.0,84.0,2370.0,13.0,82,1,dodge charger 2.2 388 | 27.0,4,151.0,90.0,2950.0,17.3,82,1,chevrolet camaro 389 | 27.0,4,140.0,86.0,2790.0,15.6,82,1,ford mustang gl 390 | 
44.0,4,97.0,52.0,2130.0,24.6,82,2,vw pickup 391 | 32.0,4,135.0,84.0,2295.0,11.6,82,1,dodge rampage 392 | 28.0,4,120.0,79.0,2625.0,18.6,82,1,ford ranger 393 | 31.0,4,119.0,82.0,2720.0,19.4,82,1,chevy s-10 394 | -------------------------------------------------------------------------------- /data/casting_images.npz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/engineersCode/EngComp6_deeplearning/8f76ef44050625abe041583ae1989b86ff5a717b/data/casting_images.npz -------------------------------------------------------------------------------- /images/descent.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/engineersCode/EngComp6_deeplearning/8f76ef44050625abe041583ae1989b86ff5a717b/images/descent.png -------------------------------------------------------------------------------- /images/residuals.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/engineersCode/EngComp6_deeplearning/8f76ef44050625abe041583ae1989b86ff5a717b/images/residuals.png -------------------------------------------------------------------------------- /notebooks_en/1_Linear_Regression.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "###### Content under Creative Commons Attribution license CC-BY 4.0, code under BSD 3-Clause License © 2021 Lorena A. Barba" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Linear regression by gradient descent\n", 15 | "\n", 16 | "This module of _Engineering Computations_ takes a step-by-step approach to introduce you to the essential ideas of deep learning, an algorithmic technology that is taking the world by storm. 
\n", 17 | "It is at the core of the artificial intelligence boom, and we think every scientist and engineer should understand the basics, at least. \n", 18 | "\n", 19 | "Another term for deep learning is deep neural networks. \n", 20 | "In this module, you will learn how neural-network models are built, computationally. \n", 21 | "The inspiration for deep learning may have been how the brain works, but in practice what we have is a method to build models, using mostly linear algebra and a little bit of calculus. \n", 22 | "These models are not magical, or even \"intelligent\"—they are just about _optimization_, which every engineer knows about!\n", 23 | "\n", 24 | "In this lesson, we take the first step of model-building: linear regression. \n", 25 | "The very first module of the _Engineering Computations_ series discusses [linear regression with real data](http://go.gwu.edu/engcomp1lesson5), and there we found the model parameters (slope and $y$-intercept) analytically. \n", 26 | "Let's forget about that for this lesson. \n", 27 | "The key concept we introduce here will be _gradient descent_. Start your ride here!" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": {}, 33 | "source": [ 34 | "## Gradient descent\n", 35 | "\n", 36 | "This lesson is partly based on a tutorial at the 2019 SciPy Conference by Eric Ma [1]. He begins his tutorial by presenting the idea of _gradient descent_ with a simple quadratic function: the question is how do we find this function's minimum?\n", 37 | "\n", 38 | "$$f(w) = w^2 +3w -5$$\n", 39 | "\n", 40 | "We know from calculus that at the minimum, the derivative of the function is zero (the tangent to the function curve is horizontal), and the second derivative is positive (the curve slants _up_ on each side of the minimum). \n", 41 | "The analytical derivative of the function above is $f^\\prime(w) = 2w + 3$ and the second derivative is $f^{\\prime\\prime}(w)=2>0$. 
Thus, we set $2w+3=0$ to find the minimum.\n", 42 | "\n", 43 | "Let's play with this function using SymPy. We'll later use NumPy, and make plots with Matplotlib, so we load all the libraries in one place." 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": 1, 49 | "metadata": {}, 50 | "outputs": [], 51 | "source": [ 52 | "import sympy\n", 53 | "import numpy\n", 54 | "\n", 55 | "from matplotlib import pyplot\n", 56 | "%matplotlib inline" 57 | ] 58 | }, 59 | { 60 | "cell_type": "markdown", 61 | "metadata": {}, 62 | "source": [ 63 | "We run this SymPy function to get beautiful typeset symbols and equations (in the Jupyter notebook, it will use [MathJax](https://en.wikipedia.org/wiki/MathJax) by default): " 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": 2, 69 | "metadata": {}, 70 | "outputs": [], 71 | "source": [ 72 | "sympy.init_printing()" 73 | ] 74 | }, 75 | { 76 | "cell_type": "markdown", 77 | "metadata": {}, 78 | "source": [ 79 | "Now we'll define the Python variable `w` to be a SymPy symbol, and create the expression `f` to match the mathematical function above, and plot it." 
80 | ] 81 | }, 82 | { 83 | "cell_type": "code", 84 | "execution_count": 3, 85 | "metadata": {}, 86 | "outputs": [ 87 | { 88 | "data": { 89 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAG8AAAAVCAYAAABIfLDHAAAACXBIWXMAAA7EAAAOxAGVKw4bAAADoUlEQVRoBe2Z61HbQBCAhYcCCOnA6QBSAnQQkgoCHYThF/7HOB0AFSShA0IFAXdgOgjjDsj3nXUaSSNsZMuyTbwz691b3WNft3eSt56fn5N5oNfr7TD+LJ2jm9KvyEfzzLsZO90D29O7TO3RJ1AnsRf8JfwD+CHKNnQxHug0MO0xATvIzdOH7yLby8k27AI80ETw3HX3C9BtM+U0D3jmNYnn5+d9cNjknP/rXPhxr2w7sh2wq7yJMy/Lj7RUfkKwnwlbZtDBS1M8g71M2fZc/t2yKk0sd4fe2jBIJ5MXgn8bC17qNM+7ffil3DRTQ0+hMXgJvMl0Cz0Cb7R8jeAp1dX7wyOo/hfYEfzbSPCYzOzWaYfQJG1LXbAWMMbLjxeeq1oDx52PIV6gDFYMVNxxvs5E2QxTL2XIADuOXlp57gsLkxs4Xw8u4fdE+FMwZg1sLbA0xPJQayCdLS9mZchMB6NPxtt+S9DEzvOdTmdLM8BpWenKhAtmWNNd9i6/DDLLpmCCvSkIwcNAnW/JeQ8OaWclC96ddAINwYDqjOxso11wFs9WBtDNEqyu6p+3aWZ72zYOvY2L+hobq5xnXrjAxLJ5huA7D/6AGpsHzwoHRRjB2P4YBatGscXy/Q29TDgNvS/puC72GrSfxgb0KBIf4E3KpAPjzjJogheO8lllx1sfCvS3NOmQ2pcRx7cB6OhBr8Ee9j9ADQ7lE7o29qLrIehmCQCvz/V/OAIsm48IwzaE/wxegAGQa6jRjze28QMyOZ0otmtTxqtAyKDS4F3bPK86MyfevkrzhCbz3IA64BfUEr8Qe5lbP92B0teCry/R968dYwAPGNfd5kfDEqiZ6cLZ2QCvc0c8Ky8wRD4XMGdVcKIevipYxmsBY0w25yjra9nUFo0OrwvQRu1lPv3Y2McJ5rPa7UJfmnOnQ4cIlkwzOwQzFSor7Dqe66Cyc9LuSyfeeC2RJuE0WHV7vVNU2REr0yAfvC6dy+dYlewLzikEdJqXWnxu4sUymV9WRwh5vatsq5Ity94r/Fz1t5oVJNjRCSaNf9xNWaQZaCejrEEBUll2eYnyFaLexgr6oXMsj74uGNwIq26vHz0K76a0vUEL4avLll+nI+Q6a+Rf0PPvGnwClRXeAWk3Duigs2c681SG8SZd/pOSyVf5YXoV7FXnlwD91N2EFNxIxsHPkCEJC8Gzx7IBxeYK3rL1b3P9fNlsc91Ja5lVIbMmddo8S5J/Uya9Zk7UnmQAAAAASUVORK5CYII=\n", 90 | "text/latex": [ 91 | "$\\displaystyle w^{2} + 3 w - 5$" 92 | ], 93 | "text/plain": [ 94 | " 2 \n", 95 | "w + 3⋅w - 5" 96 | ] 97 | }, 98 | "execution_count": 3, 99 | "metadata": {}, 100 | "output_type": "execute_result" 101 | } 102 | ], 103 | "source": [ 104 | "w = sympy.Symbol('w', real=True)\n", 105 | "\n", 106 | "f = w**2 + 3*w - 5\n", 107 | "f" 108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "execution_count": 4, 113 | "metadata": {}, 114 | "outputs": [ 115 | { 116 | 
"data": { 117 | "image/png": "\n", 118 | "text/plain": [ 119 | "
" 120 | ] 121 | }, 122 | "metadata": { 123 | "needs_background": "light" 124 | }, 125 | "output_type": "display_data" 126 | } 127 | ], 128 | "source": [ 129 | "sympy.plotting.plot(f);" 130 | ] 131 | }, 132 | { 133 | "cell_type": "markdown", 134 | "metadata": {}, 135 | "source": [ 136 | "A neat parabola. We can see from the plot that the minimum of $f(w)$ is reached somewhere between $w=-2.5$ and $w=0$. SymPy can tell us the derivative, and the value where it is zero:" 137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": 5, 142 | "metadata": {}, 143 | "outputs": [ 144 | { 145 | "data": { 146 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAADwAAAAQCAYAAABKkhw/AAAACXBIWXMAAA7EAAAOxAGVKw4bAAACn0lEQVRIDdWX61EbQQyAD8YF8OgAOgCnA9MBhA6ggzD8sv9loAOSChjoIO4gQAfQQTzuwPk+ze6xdzmG4IwPohkhraRdPVbaw2uLxaKaTCY7VVWdgcIQnIFnyB8U/E+QcjlNMW9Aze0C+VTZ2ng8VnCF4ECBAH8B+QIeZEPlHx2I1QRNLidcwR8iuwGP4G/XNQBrA3iNvO05qGGvgO8ReLKkU/edsN8kM8TNsjhXYMIj8BEjq1OChhvI7YA+wTjasfytf0fQixIDiL/mFQxAE9trK1QmWNZ53t8bJQdz2SwdIsu3faV8gOCoNCj4PXn08XBBTdyW2QbtiG/QAHhtT6ExGlCdOCr78I0Kx4ae/uDb7o2RhY94bek/AKUJ2Mr55dbmHPkl9CfoISU4H2Xrm6TrYWnUF2/8oI+uF+CF3WXfnQmj9LG6TQlWHsDaRAVf81lwz3+s5I+8xN7W0tFTlvVJ8f8AXoJ27zV4Dx+t7Qw3AIW9/pSMs851tDaCz+DXrEBuMWx3kyzhDt2LCaPTj4Vqw5YC9I0vRzIykZdGsH1OrLH34uy4G+hmI2EEzugWtP4mu4t1zCHUKplcPb/wBj1HlwvCMuAxM10U+66EquRjB+r4vAnYY/E9ox2LLW2co7qlMTKZXWhdQXgdl7NpIaxyFABeUNa4XfQ6bjvVdtVwjwPb10vphEg4BfgJWj5SbrAIs2KnybfbtEt2zFmNIhRnrJL1InILl36GaTEdEJgB+0hN4eNbVViOkJWt5a1F22iDzjZx5jwjIMnqByzLe6LtCzPGPIZ+Nuf+L+2s1QG3ArN990sZ61wUq/kLdJ6/g3aCssY3mvWbIAW41AzriP1eQj2W8Ob2/OPBX0sfCf414ddyWX/N4B30dom4EvgNCwYAZwDIeMAAAAAASUVORK5CYII=\n", 147 | "text/latex": [ 148 | "$\\displaystyle 2 w + 3$" 149 | ], 150 | "text/plain": [ 151 | "2⋅w + 3" 152 | ] 153 | }, 154 | "execution_count": 5, 155 | "metadata": {}, 156 | "output_type": "execute_result" 157 | } 158 | ], 159 | "source": [ 160 | "fprime = f.diff(w)\n", 161 | "fprime" 162 | ] 163 | }, 164 | { 165 | "cell_type": "code", 166 | 
"execution_count": 6, 167 | "metadata": {}, 168 | "outputs": [ 169 | { 170 | "data": { 171 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAACsAAAAzCAYAAAAO2PE2AAAACXBIWXMAAA7EAAAOxAGVKw4bAAACw0lEQVRoBe2a31HjMBCHk5srIFwJoQP+dBA6IC1ACcw9Ja/QAbQQOgA6IHQAJcB1cPf9fBYjyZYszTBrHrwzGknrtfV5vVopmsw3m81i1iPb7fZPj9pExdi9TD8YfU/5iMq1CVV6kF3EI77dHM++0rjkbR7T9457BbYLCNY/LTEYdMl4l+2Y+tTqX5c6ygwWIMFdUTvYGe1zdA/Ua8o97awoZq1En/KiBXRjutD77RS52hL2BRBlGJVGAP9sO12utgwDefHAh/G8fOvrU21LzwYMgK5QKEUqE90FFxMdM8+68QE7oi3QU4pC45lSJGPAClBlBriywZ7622UD8QUCpNKVJtmO9iK42NMxi1lgjlR6GFwYKDSyYgYLhfYg+uSDHkwRW8asPvcjsKp9OWk7boHwrwVtS9irYGQ6gGuCydNKX/FLxOYzM1hg7igrir8ALCE6QzfoVZGbwWqwFqoITPaxWE6weOzq/gRb7bLCGybPFjqq2mzybLXLCm/I5lnyolaXJ4rqUtF2r9kCuhvo/3XtoRrbecpmCFZL4HHq5lJ9DqD0GbKbYrbGWzW2k2drvFVjm51gNQ8qsWWiLbFz+1ptut/VRx9kj9SzzGBb0FvqMwdDW+cG+qlTtKe1jNnmQMOBqgZSXlZ61HnsoFjCrqB5BTBeYLQZX6BXiGTFElZQb0DJk30Sv0THxjJm153R/yuaswReYnCSWXq2wwqgQPX5XYbo2PiKUWEB0cS6B/rGh0q1R4MFUD/JFcOp8OgwjwILoI7sf1F/5twOWY/CHBZAncIc+h6lvVTp4QtUprAAaUKdUscTSi+gpTcrlqlLntOE0uGcf4QkQB0rDU4yM1iAHigCVrzGMphjdYMZLJ47jAlr+6YxWwsX20+wsUe+qu9iVnku2PXQT+2Ovmrs5HNiFgwbNhcGSiX+HyS+258jGp5/R7zM+TmH0KYAAAAASUVORK5CYII=\n", 172 | "text/latex": [ 173 | "$\\displaystyle \\left[ - \\frac{3}{2}\\right]$" 174 | ], 175 | "text/plain": [ 176 | "[-3/2]" 177 | ] 178 | }, 179 | "execution_count": 6, 180 | "metadata": {}, 181 | "output_type": "execute_result" 182 | } 183 | ], 184 | "source": [ 185 | "sympy.solve(fprime, w)" 186 | ] 187 | }, 188 | { 189 | "cell_type": "markdown", 190 | "metadata": {}, 191 | "source": [ 192 | "That looks about right: $-3/2$ or $-1.5$. \n", 193 | "We could have also solved this by hand, because it's a simple function. \n", 194 | "But for more complicated functions, finding the minimum analytically could be more difficult. \n", 195 | "Instead, we can use the iterative method of gradient descent. 
\n", 196 | "\n", 197 | "The idea in gradient descent is to find the value of $w$ at the function minimum by starting with an initial guess, then iteratively taking small steps down the slope of the function, i.e., in the negative gradient direction. \n", 198 | "To illustrate the process, we turn the symbolic expression `fprime` into a Python function that we can call, and use it in a simple loop taking small steps:" 199 | ] 200 | }, 201 | { 202 | "cell_type": "code", 203 | "execution_count": 7, 204 | "metadata": {}, 205 | "outputs": [ 206 | { 207 | "data": { 208 | "text/plain": [ 209 | "function" 210 | ] 211 | }, 212 | "execution_count": 7, 213 | "metadata": {}, 214 | "output_type": "execute_result" 215 | } 216 | ], 217 | "source": [ 218 | "fpnum = sympy.lambdify(w, fprime)\n", 219 | "type(fpnum)" 220 | ] 221 | }, 222 | { 223 | "cell_type": "markdown", 224 | "metadata": {}, 225 | "source": [ 226 | "Yep. We got a Python function with the [`sympy.lambdify()`](https://docs.sympy.org/latest/modules/utilities/lambdify.html) method, whose return value is of type `function`. Now, you can pick any starting guess, say $w=10$, and advance in a loop taking steps of size $0.01$ (a choice we make; more on this later):" 227 | ] 228 | }, 229 | { 230 | "cell_type": "code", 231 | "execution_count": 8, 232 | "metadata": {}, 233 | "outputs": [ 234 | { 235 | "name": "stdout", 236 | "output_type": "stream", 237 | "text": [ 238 | "-1.4999999806458753\n" 239 | ] 240 | } 241 | ], 242 | "source": [ 243 | "w = 10.0 # starting guess for the min\n", 244 | "\n", 245 | "for i in range(1000):\n", 246 | " w = w - fpnum(w)*0.01 # with 0.01 the step size\n", 247 | "\n", 248 | "print(w)" 249 | ] 250 | }, 251 | { 252 | "cell_type": "markdown", 253 | "metadata": {}, 254 | "source": [ 255 | "That gave a result very close to the true value $-1.5$, and all we needed was a function for the derivative of $f(w)$. This is how you find the argument of the minimum of a function iteratively. 
\n", 256 | "\n", 257 | "##### Note\n", 258 | "\n", 259 | "> Implied in this method is that the function is differentiable, and that we can step *down* the slope, meaning its second derivative is positive, or the function is _convex_.\n", 260 | "\n", 261 | " \n", 262 | "\n", 263 | "#### Gradient descent steps in the direction of the negative slope to approach the minimum." 264 | ] 265 | }, 266 | { 267 | "cell_type": "markdown", 268 | "metadata": {}, 269 | "source": [ 270 | "## Linear regression\n", 271 | "\n", 272 | "Suppose you have data consisting of one independent variable and one dependent variable, and when you plot the data it seems to noisily follow a trend line. \n", 273 | "To build a model with this data, you assume the relationship is _linear_, and seek to find the line's slope and $y$-intercept (the model parameters) that best fit the data. \n", 274 | "\n", 275 | "Though this sounds straightforward, some key ideas of machine learning are contained:\n", 276 | "\n", 277 | "- we don't _know_ the true relationship between the variables, we _assume_ it is linear (and go for it!)\n", 278 | "- the model we chose (linear) has some parameters (slope, intercept) that are unknown\n", 279 | "- we will need some data (observational, experimental) of the dependent and independent variables\n", 280 | "- we find the model parameters by fitting the \"best\" line to the data\n", 281 | "- the model with its parameters can then be used to make _predictions_\n", 282 | "\n", 283 | "Let's make some synthetic data to play with, following the example in Eric Ma's tutorial [1]." 284 | ] 285 | }, 286 | { 287 | "cell_type": "code", 288 | "execution_count": 9, 289 | "metadata": {}, 290 | "outputs": [ 291 | { 292 | "data": { 293 | "image/png": "\n", 294 | "text/plain": [ 295 | "
" 296 | ] 297 | }, 298 | "metadata": { 299 | "needs_background": "light" 300 | }, 301 | "output_type": "display_data" 302 | } 303 | ], 304 | "source": [ 305 | "# make sythetic data (from Eric's example)\n", 306 | "x_data = numpy.linspace(-5, 5, 100)\n", 307 | "w_true = 2\n", 308 | "b_true = 20\n", 309 | "\n", 310 | "y_data = w_true*x_data + b_true + numpy.random.normal(size=len(x_data))\n", 311 | "\n", 312 | "pyplot.scatter(x_data,y_data);" 313 | ] 314 | }, 315 | { 316 | "cell_type": "markdown", 317 | "metadata": {}, 318 | "source": [ 319 | "This situation arises often. In **Module 1** of _Engineering Computations_, we used a real data set of Earth temperature over time and we fit an ominously sloped line. \n", 320 | "We derived analytical formulas for the model coefficients and wrote our own custom functions, and we also learned that NumPy has a built-in function that will do it for us: `numpy.polyfit(x, y, 1)` will return the two parameters $w, b$ for the line\n", 321 | "\n", 322 | "$$y = w x + b $$\n", 323 | "\n", 324 | "Here, we will instead use gradient descent to get the parameters of the linear model. \n", 325 | "The first step is to define a function that represents the _deviation_ of the data from the model. \n", 326 | "For linear regression, we use the sum (or the mean) of the square _errors_: the differences between each data point and the predicted value from the linear model (also called _residuals_).\n", 327 | "\n", 328 | " \n", 329 | "\n", 330 | "#### Each data point deviates from the linear regression: we aim to minimize the sum of squares of the residuals.\n", 331 | "\n", 332 | "\n", 333 | "Let's review our ingredients:\n", 334 | "\n", 335 | "1. observational data, in the form of two arrays: $x, y$\n", 336 | "2. our linear model: $y = wx + b$\n", 337 | "3. 
a function that measures the discrepancy between the data and the fitting line: $\\frac{1}{N}\\sum (y_i - f(x_i))^2$\n", 338 | "\n", 339 | "The last item is called a \"loss function\" (also sometimes \"cost function\"). Our method will be to step down the slope of the loss function, to find its minimum.\n", 340 | "\n", 341 | "As a first approach, let's again use SymPy, which can compute derivatives for us. Below, we define the loss function for a single data point, and make Python functions with its derivatives with respect to the model parameters. \n", 342 | "We will call these functions in a sequence of steps that start at an initial guess for the parameters (we choose zero), and step in the negative gradient multiplied by a step size (we choose $0.01$). \n", 343 | "After $1000$ steps, you see that the values of $w$ and $b$ are quite close to the true values from our synthetic data." 344 | ] 345 | }, 346 | { 347 | "cell_type": "code", 348 | "execution_count": 10, 349 | "metadata": {}, 350 | "outputs": [ 351 | { 352 | "data": { 353 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAHgAAAAaCAYAAAB8WJiDAAAACXBIWXMAAA7EAAAOxAGVKw4bAAAE+UlEQVRoBe2a7VHcMBCGjxsKIKSCQAcQKgjpIEAFQAcw/OMfAx0AFRDoAKiAjw5IByHXAXkfI+1Isuz4jvMRJ94ZnaTVStoP7WplmHt5eRn00G0NHB4eLkiCAyfFkqu3hR/Nd1u0nnungWMZc9drQ+1TtR9Uloce2ded1sCOjLoeSHCs9pJwK72BA610uIn33uf4n+vv4Jxauo2T5+LB31SXQ7SQ/pLutpT/EPfj2ES0KxhXZRUVRCFag3vCQdDD36UB7lNsUwvuIOC9q2qPILYQLQRWX1O9zwCgNun3pcpnlWf1l1X38A4akO4x8Ej1WW574Ym8+6qLbNr1Xw2sDoa8VV24dbqA8KTc96otFU9p+n77GnB2+KK68E6/o/oYl6eROafa2Grfh2jcGoIqIGxfVw02xYuRdZWdpvQ9XUkD2AhbpYAD8kyi9oWnk33o2FQn650Yxa124+q3VEQKSg8TaEC2OFP5pUIoNi9W+0PVckMNcvf+qCIQ/ivj4YI1tP1Q+xrAVptNt+FTJQas8048+FEG9lncmvoX6l+pngloL7ye0P5R5Ul9SzTU5vrYVe2TCw5slEmqP9C4X4MuMmyrcHdtqQB3onkPmUhcH7R3KBO8ngu3AWMJYCtsZvTJeNTlDiZDfoqwruOUggJRBOHhRDWKuVQb3KzgwO19pw3TO+hAuJAXQhd95AqB77UnwTrnGiQnIDFBoem64dw22l4mcpt0bzyUg5oDbBXKm6Mx3FAthHs2TNzw969lblIICqRUMRCv8Mae9uOAYViAk5vyCo+WAIqeE/6oYteOcHh/qETPv/eCxWRc3faggUzIWRVVkb+xgQnRCIfAOWAjwnM6viA84TILoifb84cjpGGvgcZzCR375EIS9z8GAzjZR0VLP8JjfHhJlcGTzgys8bTPc9DkEm1uX5G8gsbZ41aFuilsaJ7nO51TKZMjRHcmZzIZuRrzgYHrgI2ie0lMe8N5ryrNF03OgAPh8Xq+yhDqG4FoRxC6uQjmvQ40vPD4TxUZXTmZ8eigsFAdaD48ZL8R1M2rGnPrDVTDfySTcFWH1i9X55CexmpCNC7PJjkgFKSG9Kc99Zrc/GnictGkFMqcglKDGx+BUu3gCrdAMaLZNdClRRK3bdWh9VzBJzZrBPOiwuXrYrqFOqcETj9Z66jRDtMjgkfjxS0LzgzlcFvizb7oOJ4vNUaSxaEkuuD14VokPDbHrTOLKidT6dAmjODBIe/JcNwdqstpX4vR1kN5YTaKor5LGWGYNOKWG/BpXiYeOOkIi5IKcDhLuBwaOsqzxkun381Jo5Sb2noVGcrxAq+pDCEjRf4QIura8xq8UMFwOdgWkvcYbzWUw/PoPYw70L58vTmliA+ix0+VTyrw53HRG1ljAF4LzyhuIFqiz7Wbw2c9DJ9GAUhnAUQN+CfDRx7/x5y66w85sjmO8CUo/pqkDUhK6rK+0sRJENpn7CRrkn26OscZel11NqETnmjF4fQH4Y+iEqIBTlDjU1HMmOwHz6P894AxVYggBahNhOS9fvSKyf7i8eF7PksUIgsDa3FCGM8Xu89Comm1tf6N22taS3Z5HaIZ16OHczX4Wpi9LpxtsNFYVyR3sAdSdu5isrge2tcA3rgog+2pJuSSX9TdveQZY0dZ+48OTR5oAzyYf9Zq/CGCeT20qwHZg0NwpTrKupvs+ht4KM5n5o25AwAAAABJRU5ErkJggg==\n", 354 | "text/latex": [ 355 | "$\\displaystyle \\left(b + w x - y\\right)^{2}$" 356 | ], 357 | "text/plain": [ 358 | " 2\n", 359 | "(b + w⋅x - y) " 360 | ] 361 | }, 362 | "execution_count": 
10, 363 | "metadata": {}, 364 | "output_type": "execute_result" 365 | } 366 | ], 367 | "source": [ 368 | "w, b, x, y = sympy.symbols('w b x y')\n", 369 | "\n", 370 | "loss = (w*x + b - y)**2\n", 371 | "loss" 372 | ] 373 | }, 374 | { 375 | "cell_type": "code", 376 | "execution_count": 11, 377 | "metadata": {}, 378 | "outputs": [], 379 | "source": [ 380 | "grad_b = sympy.lambdify([w,b,x,y], loss.diff(b), 'numpy')\n", 381 | "grad_w = sympy.lambdify([w,b,x,y], loss.diff(w), 'numpy')" 382 | ] 383 | }, 384 | { 385 | "cell_type": "markdown", 386 | "metadata": {}, 387 | "source": [ 388 | "Be sure to read the documentation for [`sympy.lambdify()`](https://docs.sympy.org/latest/modules/utilities/lambdify.html), which explains the argument list.\n", 389 | "Now, we step down the slope. \n", 390 | "Note that we first compute the derivatives with respect to both parameters _at all the data points_ (thanks to NumPy array operations), and we take the average. \n", 391 | "Then we step both parameters (starting from an initial guess of zero)." 392 | ] 393 | }, 394 | { 395 | "cell_type": "code", 396 | "execution_count": 12, 397 | "metadata": {}, 398 | "outputs": [ 399 | { 400 | "name": "stdout", 401 | "output_type": "stream", 402 | "text": [ 403 | "2.03136593978268\n", 404 | "20.030956250406764\n" 405 | ] 406 | } 407 | ], 408 | "source": [ 409 | "w = 0\n", 410 | "b = 0\n", 411 | "\n", 412 | "for i in range(1000):\n", 413 | " descent_b = numpy.sum(grad_b(w,b,x_data,y_data))/len(x_data)\n", 414 | " descent_w = numpy.sum(grad_w(w,b,x_data,y_data))/len(x_data)\n", 415 | " w = w - descent_w*0.01 # with 0.01 the step size\n", 416 | " b = b - descent_b*0.01 \n", 417 | "\n", 418 | "print(w)\n", 419 | "print(b)" 420 | ] 421 | }, 422 | { 423 | "cell_type": "code", 424 | "execution_count": 13, 425 | "metadata": {}, 426 | "outputs": [ 427 | { 428 | "data": { 429 | "image/png": "\n", 430 | "text/plain": [ 431 | "
" 432 | ] 433 | }, 434 | "metadata": { 435 | "needs_background": "light" 436 | }, 437 | "output_type": "display_data" 438 | } 439 | ], 440 | "source": [ 441 | "pyplot.scatter(x_data,y_data)\n", 442 | "pyplot.plot(x_data, w*x_data + b, '-r');" 443 | ] 444 | }, 445 | { 446 | "cell_type": "markdown", 447 | "metadata": {}, 448 | "source": [ 449 | "It works! That line looks to be fitting the data pretty well. Now we have a \"best fit\" line that represents the data, and that we can use to estimate the value of the dependent variable for any value of the independent variable, even if not present in the data. That is, _to make predictions_.\n", 450 | "\n", 451 | "##### Key idea\n", 452 | "\n", 453 | "> \"Learning\" means building a model by finding the parameters that best fit the data. We do it by minimizing a loss function (a.k.a. cost function), which involves computing derivatives with respect to the parameters in the model. \n", 454 | "\n", 455 | "Here, we used SymPy to help us out with the derivatives, but for more complex models (which may have many parameters), this could be a cumbersome approach. \n", 456 | "Instead, we will make use of the technique of _automatic differentiation_, which evaluates the derivative of a function written in code. \n", 457 | "You'll learn more about it in the next lesson, on **logistic regression**." 
458 | ] 459 | }, 460 | { 461 | "cell_type": "markdown", 462 | "metadata": {}, 463 | "source": [ 464 | "## What we've learned\n", 465 | "\n", 466 | "- Gradient descent can find a minimum of a function.\n", 467 | "- Linear regression starts by assuming a linear relationship between two variables.\n", 468 | "- A model includes the assumed relationship in the data and model parameters.\n", 469 | "- Observational data allows finding the parameters in the model (slope, intercept).\n", 470 | "- A loss function captures the deviation between the observed and the predicted values of the dependent variable.\n", 471 | "- We find the parameters by minimizing the loss function via gradient descent.\n", 472 | "- SymPy computes derivatives with `sympy.diff()` and returns numeric functions with `sympy.lambdify()`." 473 | ] 474 | }, 475 | { 476 | "cell_type": "markdown", 477 | "metadata": {}, 478 | "source": [ 479 | "## References\n", 480 | "\n", 481 | "1. Eric Ma, \"Deep Learning Fundamentals: Forward Model, Differentiable Loss Function & Optimization,\" SciPy 2019 tutorial. [video on YouTube](https://youtu.be/JPBz7-UCqRo) and [archive on GitHub](https://github.com/ericmjl/dl-workshop/releases/tag/scipy2019)." 
482 | ] 483 | }, 484 | { 485 | "cell_type": "code", 486 | "execution_count": 14, 487 | "metadata": {}, 488 | "outputs": [ 489 | { 490 | "data": { 491 | "text/html": [ 492 | "\n", 493 | "\n", 494 | "\n", 495 | "\n", 625 | "\n" 641 | ], 642 | "text/plain": [ 643 | "" 644 | ] 645 | }, 646 | "execution_count": 14, 647 | "metadata": {}, 648 | "output_type": "execute_result" 649 | } 650 | ], 651 | "source": [ 652 | "# Execute this cell to load the notebook's style sheet, then ignore it\n", 653 | "from IPython.core.display import HTML\n", 654 | "css_file = '../style/custom.css'\n", 655 | "HTML(open(css_file, \"r\").read())" 656 | ] 657 | } 658 | ], 659 | "metadata": { 660 | "kernelspec": { 661 | "display_name": "Python 3", 662 | "language": "python", 663 | "name": "python3" 664 | }, 665 | "language_info": { 666 | "codemirror_mode": { 667 | "name": "ipython", 668 | "version": 3 669 | }, 670 | "file_extension": ".py", 671 | "mimetype": "text/x-python", 672 | "name": "python", 673 | "nbconvert_exporter": "python", 674 | "pygments_lexer": "ipython3", 675 | "version": "3.8.5" 676 | } 677 | }, 678 | "nbformat": 4, 679 | "nbformat_minor": 2 680 | } 681 | -------------------------------------------------------------------------------- /notebooks_en/4_Polynomial_Regression.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "###### Content under Creative Commons Attribution license CC-BY 4.0, code under BSD 3-Clause License © 2021 Lorena A. Barba, Tingyu Wang" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Polynomial Regression\n", 15 | "\n", 16 | "In this fourth lesson of the _Engineering Computations_ module on deep learning, we play with a different model-building method: polynomial regression. 
\n", 17 | "We already saw one case in which the observational data could not be fit by a line, and that was the classification problem that we tackled with logistic regression (Lesson 2). \n", 18 | "Now imagine that your data wiggles about on the $x, y$ plane, where $x$ is the independent variable or feature, and $y$ is the dependent variable. \n", 19 | "You look at it and think: a curvilinear relationship might work. Good idea. \n", 20 | "\n", 21 | "It may surprise you to learn that fitting a polynomial to the data is a special case of multiple linear regression. Yes: _linear_. \n", 22 | "When we talk about linear models, we mean linear with respect to the parameters!" 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "## A special case of multiple linear regression\n", 30 | "\n", 31 | "Let's generate some synthetic data using a polynomial function of fourth order, $y = x^4 + x^3 - 4x^2 $, with a bit of added noise.\n", 32 | "\n", 33 | "As usual, we start by loading some needed Python libraries and functions, including the useful tools from `autograd` that you've learned about already." 
34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": 1, 39 | "metadata": {}, 40 | "outputs": [], 41 | "source": [ 42 | "from matplotlib import pyplot\n", 43 | "from autograd import grad\n", 44 | "from autograd import numpy" 45 | ] 46 | }, 47 | { 48 | "cell_type": "code", 49 | "execution_count": 2, 50 | "metadata": {}, 51 | "outputs": [ 52 | { 53 | "data": { 54 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXkAAAD4CAYAAAAJmJb0AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAATNUlEQVR4nO3df2xd513H8fcXE+BqA7lV3ZK4LRkoGAZlC7KqoSI01g1XgBZTqWj8UgSVokn8GBKYJUxiAlS1yBICIYSItkEQ+0G0pWnEYF6XMvFDbKu7lKVdalqNrY0TGjOw2MDa0uzLHz4uTuo0uT7n5vg89/2SrHvPc38836Pkfu45z/PceyMzkSSV6evaLkCSNDiGvCQVzJCXpIIZ8pJUMENekgr29W0XsN4NN9yQO3fubLsMSeqUxx577D8yc2yj27ZUyO/cuZP5+fm2y5CkTomIL1zuNodrJKlghrwkFcyQl6SCGfKSVDBDXpIKtqVW10jSsDl6YpHZuQXOLK+wY7THzNQE07vHG3t+Q16SWnL0xCIHjpxk5fwFABaXVzhw5CRAY0HvcI0ktWR2buHFgF+zcv4Cs3MLjfVhyEtSS84sr/TVvhmGvCS1ZMdor6/2zTDkJaklM1MT9LaNXNTW2zbCzNREY3048SpJLVmbXHV1jSQVanr3eKOhfimHaySpYIa8JBXMkJekghnyklQwQ16SCtZIyEfEaER8MCKeiohTEfEDEXF9RDwcEU9Xl9c10Zck6eo1dST/h8BHMvO7gNcAp4D9wPHM3AUcr7YlSddQ7ZCPiG8Bfgh4N0BmfjUzl4E9wKHqboeA6bp9SZL608SR/LcDS8CfRcSJiHhXRLwCuCkzzwJUlzdu9OCI2BcR8xExv7S01EA5kqQ1TYT81wPfD/xJZu4G/oc+hmYy82BmTmbm5NjYWAPlSJLWNBHyp4HTmfnJavuDrIb+8xGxHaC6PNdAX5KkPtQO+cz8d+C5iFj72rQ7gc8Cx4C9Vdte4KG6fUmS+tPUF5T9MvDeiPgG4HPAz7P6BnI4Iu4FngXuaagvSdJVaiTkM/NxYHKDm+5s4vklSZvjJ14lqWCGvCQVzJCXpIIZ8pJUMENekgpmyEtSwQx5SSqYIS9JBTPkJalghrwkFcyQl6SCGfKSVDBDXpIKZshLUsEMeUkqmCEvSQVr5EdDIuLzwJeAC8ALmTkZEdcDfwXsBD4P/GRm/lcT/UmSrk6TR/I/nJmvzcy1X4jaDxzPzF3A8WpbknQNDXK4Zg9wqLp+CJgeYF+SpA00FfIJfDQiHouIfVXbTZl5FqC6vHGjB0bEvoiYj4j5paWlhsqRJEFDY/LAHZl5JiJuBB6OiKeu9oGZeRA4CDA5OZkN1SNJoqEj+cw8U12eAx4Ebgeej4jtANXluSb6kiRdvdohHxGviIhvXrsO/AjwBHAM2FvdbS/wUN2+JEn9aWK45ibgwYhYe773ZeZHIuJR4HBE3As8C9zTQF+SpD7UDvnM/Bzwmg3avwjcWff5JUmb5ydeJalghrwkFcyQl6SCNbVOXpKG0tETi8zOLXBmeYUdoz1mpiaY3j3edlkvMuQlaZOOnljkwJGTrJy/AM
Di8goHjpwE2DJB73CNJG3S7NzCiwG/ZuX8BWbnFlqq6KUMeUnapDPLK321t8GQl6RN2jHa66u9DYa8JG3SzNQEvW0jF7X1to0wMzXRUkUv5cSrJG3S2uSqq2skqVDTu8e3VKhfyuEaSSqYIS9JBTPkJalghrwkFayxkI+IkYg4ERF/XW1fHxEPR8TT1eV1TfUlSbo6TR7Jvw04tW57P3A8M3cBx6ttSdI11EjIR8TNwI8B71rXvAc4VF0/BEw30Zck6eo1dST/B8BvAF9b13ZTZp4FqC5vbKgvSdJVqh3yEfHjwLnMfGyTj98XEfMRMb+0tFS3HEnSOk0cyd8BvDkiPg98AHhDRPwl8HxEbAeoLs9t9ODMPJiZk5k5OTY21kA5kqQ1tUM+Mw9k5s2ZuRN4C/BIZv4scAzYW91tL/BQ3b4kSf0Z5Dr5B4A3RcTTwJuqbUnSNdToF5Rl5seBj1fXvwjc2eTzS5L64ydeJalghrwkFcyQl6SCGfKSVDBDXpIKZshLUsEMeUkqmCEvSQUz5CWpYIa8JBXMkJekgjX63TVtOXpikdm5Bc4sr7BjtMfM1ATTu8fbLkuSWtf5kD96YpEDR06ycv4CAIvLKxw4chLAoJc09Do/XDM7t/BiwK9ZOX+B2bmFliqSpK2j8yF/Znmlr3ZJGiadD/kdo72+2iVpmDTxQ97fFBGfioh/iYgnI+K3q/brI+LhiHi6uryufrkvNTM1QW/byEVtvW0jzExNDKI7SeqUJo7kvwK8ITNfA7wWuCsiXgfsB45n5i7geLXduOnd49x/922Mj/YIYHy0x/133+akqyTRwOqazEzgy9XmtuovgT3A66v2Q6z+LODb6/a3kend44a6JG2gkTH5iBiJiMeBc8DDmflJ4KbMPAtQXd54mcfui4j5iJhfWlpqohxJUqWRkM/MC5n5WuBm4PaI+N4+HnswMyczc3JsbKyJciRJlUZX12TmMqvDMncBz0fEdoDq8lyTfUmSrqyJ1TVjETFaXe8BbwSeAo4Be6u77QUeqtuXJKk/TXytwXbgUESMsPqmcTgz/zoi/hk4HBH3As8C9zTQlySpD02srvkMsHuD9i8Cd9Z9fknS5nX+E6+SpMsz5CWpYIa8JBXMkJekghnyklQwQ16SCmbIS1LBDHlJKpghL0kFM+QlqWCGvCQVzJCXpII18S2UktSaoycWmZ1b4MzyCjtGe8xMTfhzoOsY8pI66+iJRQ4cOcnK+QsALC6vcODISQCDvmLIS+qs2bmFFwN+zcr5C8zOLVx1yJd+JmDIS+qsM8srfbVfahjOBJr4+b9bIuLvIuJURDwZEW+r2q+PiIcj4unq8rr65UrS/9sx2uur/VIvdyZQiiZW17wA/FpmfjfwOuAXI+LVwH7geGbuAo5X25LUmJmpCXrbRi5q620bYWZq4qoeX/dMoAtqh3xmns3MT1fXvwScAsaBPcCh6m6HgOm6fUnSetO7x7n/7tsYH+0RwPhoj/vvvu2qh1rqngl0QaNj8hGxk9Xfe/0kcFNmnoXVN4KIuPEyj9kH7AO49dZbmyxH0hCY3j2+6fHzmamJi8bkob8zgS5o7MNQEfFK4EPAr2bmf1/t4zLzYGZOZubk2NhYU+VI0hXVPRPogkaO5CNiG6sB/97MPFI1Px8R26uj+O3AuSb6kqQm1TkT6IImVtcE8G7gVGb+/rqbjgF7q+t7gYfq9iVJ6k8TR/J3AD8HnIyIx6u23wQeAA5HxL3As8A9DfQlSepD7ZDPzH8E4jI331n3+SVJm+e3UEpSwQx5SSqYIS9JBTPkJalghrwkFcyQl6SCGfKSVDBDXpIKZshLUsEMeUkqmCEvSQUz5CWpYIa8JBXMkJekgjX6G6+S+nf0xCKzcwucWV5hx2iPmamJon+pSNeWIb8F+CIfXkdPLF70Q9KLyyscOHISwP8DakQjwzUR8Z6IOBcRT6xruz4iHo6Ip6vL65roqzRrL/LF5RWS/3+RHz2x2H
ZpugZm5xZeDPg1K+cvMDu30FJFKk1TY/J/Dtx1Sdt+4Hhm7gKOV9u6hC/y4XZmeaWvdqlfjYR8Zv498J+XNO8BDlXXDwHTTfRVGl/kw23HaK+vdqlfg1xdc1NmngWoLm/c6E4RsS8i5iNifmlpaYDlbE2+yIfbzNQEvW0jF7X1to0wMzXRUkUqTetLKDPzYGZOZubk2NhY2+Vcc77Ih9v07nHuv/s2xkd7BDA+2uP+u29z0lWNGeTqmucjYntmno2I7cC5AfbVWWsvZlfXDK/p3eP+e2tgBhnyx4C9wAPV5UMD7KvTfJFLGpRGQj4i3g+8HrghIk4D72Q13A9HxL3As8A9TfQ1CK5Tl1SqRkI+M3/qMjfd2cTzD1IJH0bxTUrS5bQ+8dq2rq9T98NUkl7O0Id819epd/1NStJgDX3Id32detffpCQN1tCHfNfXqXf9TUo6emKROx54hFft/zB3PPCIQ40NG/qQ7/qHUbr+JqX6uhySzikNnl81TLfXqfthquHW9dVhLzen1IX6u8CQL0CX36RUT9dD0jmlwTPkG+A6dbWl6yG5Y7TH4ga1OqfUHEO+pq6fLjeh629yXa6/6yE5MzVx0esHnFNq2tBPvNY17OvUuz5x1vX6uz7x3vWFD13gkXxNXT9drqvrY8Jdr38rTLzXPRNyTmmwDPmaun66XFfX3+S6Xj+0G5IOV259DtfU1PXT5bq6/mGsrtfftmEfruwCQ76mYR9T7PqbXBP1d/nDSHWVcCZUOodrGjDMY4pbYUy4jrr1D/twxbAPV3ZBZOZgO4i4C/hDYAR4V2Y+cLn7Tk5O5vz8/EDrkZp0xwOPbBhy46M9/mn/G1qo6Nq69E0OVs+EhulsdiuIiMcyc3Kj2wZ6JB8RI8AfA28CTgOPRsSxzPzsIPuVrpVhH67o+pncMBj0cM3twDOZ+TmAiPgAsAcw5FUEhyuGe7iyCwY98ToOPLdu+3TVJhWh6xPPMNwTx8Ng0EfysUHbRZMAEbEP2Adw6623DrgcqVldH64Y9onjYTDokD8N3LJu+2bgzPo7ZOZB4CCsTrwOuB4VqO3vnunycEXXP/GrKxt0yD8K7IqIVwGLwFuAnx5wnxoiHonWM+wTx8NgoGPymfkC8EvAHHAKOJyZTw6yTw0XP3FZj5/4Ld/AP/GamX+Tmd+Zmd+RmfcNuj8NF49E6ylh4lgvz681UKd5JFrPsH8txzDwaw3UujoTp/7oRH1dnjjWlRnyalXdidOuL2GUBs2QV6uaWMLnkah0eY7Jq1VOnEqDZcirVU6cSoNlyKtVLuGTBssxebXKiVNpsAx5tc6JU2lwHK6RpIIZ8pJUMENekgpmyEtSwQx5SSqYIS9JBTPkJalgtUI+Iu6JiCcj4msRMXnJbQci4pmIWIiIqXplSpI2o+6HoZ4A7gb+dH1jRLya1d9z/R5gB/CxiPjOzLzw0qeQJA1KrSP5zDyVmRv9mOYe4AOZ+ZXM/DfgGeD2On1Jkvo3qDH5ceC5ddunq7aXiIh9ETEfEfNLS0sDKkeShtMVh2si4mPAt25w0zsy86HLPWyDttzojpl5EDgIMDk5ueF9JEmbc8WQz8w3buJ5TwO3rNu+GTizieeRJNUwqOGaY8BbIuIbI+JVwC7gUwPqS5J0GXWXUP5ERJwGfgD4cETMAWTmk8Bh4LPAR4BfdGWNJF17tZZQZuaDwIOXue0+4L46zy9JqsdPvEpSwQx5SSqYIS9JBfM3XsXRE4v+kLZUKEN+yB09sciBIydZOb+6+GlxeYUDR04CGPRSARyuGXKzcwsvBvyalfMXmJ3b6CuJJHWNIT/kziyv9NUuqVsM+SG3Y7TXV7ukbjHkh9zM1AS9bSMXtfW2jTAzNdFSRZKa5MTrkFubXHV1jVQmQ15M7x431KVCOVwjSQUz5CWpYIa8JBXMkJekghnyklSwyNw6v50dEUvAF2o8xQ3AfzRUTptK2Q9wX7aiUvYD3Jc135aZYxvdsKVCvq
6ImM/MybbrqKuU/QD3ZSsqZT/AfbkaDtdIUsEMeUkqWGkhf7DtAhpSyn6A+7IVlbIf4L5cUVFj8pKki5V2JC9JWseQl6SCFRXyEfG7EfGZiHg8Ij4aETvarmmzImI2Ip6q9ufBiBhtu6bNioh7IuLJiPhaRHRuuVtE3BURCxHxTETsb7uezYqI90TEuYh4ou1a6oqIWyLi7yLiVPV/621t17QZEfFNEfGpiPiXaj9+u/E+ShqTj4hvycz/rq7/CvDqzHxry2VtSkT8CPBIZr4QEb8HkJlvb7msTYmI7wa+Bvwp8OuZOd9ySVctIkaAfwXeBJwGHgV+KjM/22phmxARPwR8GfiLzPzetuupIyK2A9sz89MR8c3AY8B01/5dIiKAV2TmlyNiG/CPwNsy8xNN9VHUkfxawFdeAXT2HSwzP5qZL1SbnwBubrOeOjLzVGZ29ZfBbweeyczPZeZXgQ8Ae1quaVMy8++B/2y7jiZk5tnM/HR1/UvAKaBzP4qQq75cbW6r/hrNraJCHiAi7ouI54CfAX6r7Xoa8gvA37ZdxJAaB55bt32aDoZJySJiJ7Ab+GTLpWxKRIxExOPAOeDhzGx0PzoX8hHxsYh4YoO/PQCZ+Y7MvAV4L/BL7Vb78q60L9V93gG8wOr+bFlXsy8dFRu0dfYMsTQR8UrgQ8CvXnIm3xmZeSEzX8vq2frtEdHoUFrnfv4vM994lXd9H/Bh4J0DLKeWK+1LROwFfhy4M7f45Ekf/y5dcxq4Zd32zcCZlmrROtUY9oeA92bmkbbrqSszlyPi48BdQGOT4507kn85EbFr3eabgafaqqWuiLgLeDvw5sz837brGWKPArsi4lUR8Q3AW4BjLdc09KoJy3cDpzLz99uuZ7MiYmxt5VxE9IA30nBulba65kPABKsrOb4AvDUzF9utanMi4hngG4EvVk2f6PBKoZ8A/ggYA5aBxzNzqtWi+hARPwr8ATACvCcz72u3os2JiPcDr2f1K22fB96Zme9utahNiogfBP4BOMnq6x3gNzPzb9qrqn8R8X3AIVb/b30dcDgzf6fRPkoKeUnSxYoarpEkXcyQl6SCGfKSVDBDXpIKZshLUsEMeUkqmCEvSQX7PxmPnhXp37F8AAAAAElFTkSuQmCC\n", 55 | "text/plain": [ 56 | "
" 57 | ] 58 | }, 59 | "metadata": { 60 | "needs_background": "light" 61 | }, 62 | "output_type": "display_data" 63 | } 64 | ], 65 | "source": [ 66 | "numpy.random.seed(0) # fix seed for reproducibility\n", 67 | "x = numpy.linspace(-3, 3, 20)\n", 68 | "y = x**4 + x**3 - 4*x**2 + 8*numpy.random.normal(size=len(x))\n", 69 | "pyplot.scatter(x, y);" 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": {}, 75 | "source": [ 76 | "Suppose we were only given the data (not the function that generated them), and our goal is to fit a curve to these data. The nonlinear relationship between $x$ and $y$ suggests that a linear regression will fail. Intuitively, using polynomial functions may first come to your mind.\n", 77 | "\n", 78 | "Let's write the model as a $d$th-order polynomial on $x$, the only feature:\n", 79 | "\n", 80 | "$$\n", 81 | "\\hat{y} = w_0 + w_1 x + w_2 x^2 + \\cdots + w_d x^d, \n", 82 | "$$\n", 83 | "\n", 84 | "where $w$ denotes the weights. Keep in mind that in the model-fitting context, the objective is always to find the optimal values of these weights given $x$ and $y$. When viewed from a different perspective, the model above is just a linear combination of the weights. In fact, by creating polynomial features of $x$, namely, letting $x_i = x^i$, the model becomes:\n", 85 | "\n", 86 | "$$\n", 87 | "\\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + \\ldots + w_d x_d.\n", 88 | "$$\n", 89 | "\n", 90 | "As you can see, the polynomial regression model is identical to multiple linear regression, with the matrix form being also $\\hat{\\mathbf{y}} = X\\mathbf{w}$, and the only gap is forming the matrix $X$ using the powers of $x$.\n", 91 | "\n", 92 | "Suppose we want to fit our data with a 3rd degree polynomial function; let's write a function to create these polynomial features first." 
93 | ] 94 | }, 95 | { 96 | "cell_type": "code", 97 | "execution_count": 3, 98 | "metadata": {}, 99 | "outputs": [ 100 | { 101 | "name": "stdout", 102 | "output_type": "stream", 103 | "text": [ 104 | "(20, 4)\n" 105 | ] 106 | } 107 | ], 108 | "source": [ 109 | "degree = 3\n", 110 | "\n", 111 | "def polynomial_features(x, degree):\n", 112 | " \"\"\" Generate polynomial features for x.\"\"\"\n", 113 | " \n", 114 | " X = numpy.empty((len(x), degree+1))\n", 115 | " for i in range(degree+1):\n", 116 | " X[:,i] = x**i\n", 117 | " return X\n", 118 | "\n", 119 | "X = polynomial_features(x, degree)\n", 120 | "print(X.shape)" 121 | ] 122 | }, 123 | { 124 | "cell_type": "markdown", 125 | "metadata": {}, 126 | "source": [ 127 | "##### Note\n", 128 | "\n", 129 | "> Unsurprisingly, **scikit-learn** offers a counterpart: [`PolynomialFeatures()`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html).\n", 130 | "When your original data come with multiple features (e.g., $x_1$ and $x_2$), the polynomial features for regression will involve interaction terms (e.g., $x_1 x_2$ for 2nd-order polynomials).\n", 131 | "In that case, it is handy to use this function to generate all terms." 132 | ] 133 | }, 134 | { 135 | "cell_type": "markdown", 136 | "metadata": {}, 137 | "source": [ 138 | "## Scale the data, train the model\n", 139 | "\n", 140 | "Recall that for multiple linear regression, we should normalize each feature to the same scale.\n", 141 | "As in the previous lesson, let's use `MinMaxScaler()` to scale all features to $[0,1]$, except for the first column:\n", 142 | "$x_0$ is set to 1 for all entries, since $w_0$ represents the intercept." 
143 | ] 144 | }, 145 | { 146 | "cell_type": "code", 147 | "execution_count": 4, 148 | "metadata": {}, 149 | "outputs": [], 150 | "source": [ 151 | "from sklearn.preprocessing import MinMaxScaler\n", 152 | "\n", 153 | "min_max_scaler = MinMaxScaler()\n", 154 | "X_scaled = min_max_scaler.fit_transform(X)\n", 155 | "X_scaled[:,0] = 1 # the column for intercept" 156 | ] 157 | }, 158 | { 159 | "cell_type": "markdown", 160 | "metadata": {}, 161 | "source": [ 162 | "When scaling the matrix of polynomial features $X$ above, we used:\n", 163 | "```python\n", 164 | "X_scaled = min_max_scaler.fit_transform(X)\n", 165 | "```\n", 166 | "The method `fit_transform()` is actually a combination of two steps:\n", 167 | "- [`fit()`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html#sklearn.preprocessing.MinMaxScaler.fit): compute the $x_{\\min}$ and $x_{\\max}$ values of each feature and store them in the `min_max_scaler` object. You can access them via `min_max_scaler.data_min_` and `min_max_scaler.data_max_`.\n", 168 | "- [`transform()`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html#sklearn.preprocessing.MinMaxScaler.transform): use the values stored in `min_max_scaler` to scale `X`.\n", 169 | "\n", 170 | "This will be helpful to remember a bit later."
171 | ] 172 | }, 173 | { 174 | "cell_type": "markdown", 175 | "metadata": {}, 176 | "source": [ 177 | "We can reuse the same model and loss function from Lesson 3:" 178 | ] 179 | }, 180 | { 181 | "cell_type": "code", 182 | "execution_count": 5, 183 | "metadata": {}, 184 | "outputs": [], 185 | "source": [ 186 | "def linear_regression(params, X):\n", 187 | " '''\n", 188 | " The linear regression model in matrix form.\n", 189 | " Arguments:\n", 190 | " params: 1D array of weights for the linear model\n", 191 | " X : 2D array of input values\n", 192 | " Returns:\n", 193 | " 1D array of predicted values\n", 194 | " '''\n", 195 | " return numpy.dot(X, params)\n", 196 | "\n", 197 | "def mse_loss(params, model, X, y):\n", 198 | " '''\n", 199 | " The mean squared error loss function.\n", 200 | " Arguments:\n", 201 | " params: 1D array of weights for the linear model\n", 202 | " model : function for the linear regression model\n", 203 | " X : 2D array of input values\n", 204 | " y : 1D array of observed (target) values\n", 205 | " Returns:\n", 206 | " float, mean squared error\n", 207 | " '''\n", 208 | " y_pred = model(params, X)\n", 209 | " return numpy.mean( numpy.sum((y-y_pred)**2) ) # numpy.sum returns a scalar, so numpy.mean leaves it unchanged\n", 210 | "\n", 211 | "gradient = grad(mse_loss)" 212 | ] 213 | }, 214 | { 215 | "cell_type": "markdown", 216 | "metadata": {}, 217 | "source": [ 218 | "Remember, \"training\" a model simply means finding the best parameters by minimizing a loss function. \n", 219 | "We'll choose both a maximum number of iterations in the optimization loop, and an exit criterion based on the norm of the gradient at the current iteration.
\n", 220 | "(If this is quite small, when multiplied by the also-small learning rate, the parameters will change very little.)" 221 | ] 222 | }, 223 | { 224 | "cell_type": "code", 225 | "execution_count": 6, 226 | "metadata": {}, 227 | "outputs": [ 228 | { 229 | "name": "stdout", 230 | "output_type": "stream", 231 | "text": [ 232 | "iteration 0, loss = 5434.768, mae = 11.057\n", 233 | "iteration 100, loss = 1300.477, mae = 6.885\n", 234 | "iteration 200, loss = 1281.309, mae = 6.864\n", 235 | "iteration 300, loss = 1272.990, mae = 6.808\n", 236 | "iteration 400, loss = 1267.448, mae = 6.760\n", 237 | "iteration 500, loss = 1263.750, mae = 6.722\n", 238 | "iteration 600, loss = 1261.282, mae = 6.690\n", 239 | "iteration 700, loss = 1259.636, mae = 6.664\n", 240 | "iteration 800, loss = 1258.537, mae = 6.643\n", 241 | "iteration 900, loss = 1257.804, mae = 6.625\n", 242 | "iteration 1000, loss = 1257.314, mae = 6.611\n", 243 | "iteration 1100, loss = 1256.988, mae = 6.600\n", 244 | "iteration 1200, loss = 1256.770, mae = 6.590\n", 245 | "iteration 1300, loss = 1256.625, mae = 6.583\n", 246 | "iteration 1400, loss = 1256.528, mae = 6.576\n", 247 | "iteration 1500, loss = 1256.463, mae = 6.571\n", 248 | "iteration 1600, loss = 1256.420, mae = 6.567\n", 249 | "iteration 1700, loss = 1256.391, mae = 6.564\n", 250 | "iteration 1800, loss = 1256.372, mae = 6.561\n", 251 | "iteration 1900, loss = 1256.359, mae = 6.558\n", 252 | "iteration 2000, loss = 1256.350, mae = 6.557\n", 253 | "iteration 2100, loss = 1256.345, mae = 6.555\n", 254 | "iteration 2200, loss = 1256.341, mae = 6.554\n", 255 | "iteration 2300, loss = 1256.338, mae = 6.553\n", 256 | "iteration 2400, loss = 1256.337, mae = 6.552\n", 257 | "iteration 2500, loss = 1256.336, mae = 6.551\n", 258 | "iteration 2600, loss = 1256.335, mae = 6.551\n", 259 | "iteration 2700, loss = 1256.334, mae = 6.550\n", 260 | "iteration 2800, loss = 1256.334, mae = 6.550\n", 261 | "iteration 2900, loss = 1256.334, mae = 6.550\n" 262 
| ] 263 | } 264 | ], 265 | "source": [ 266 | "max_iter = 3000\n", 267 | "alpha = 0.01\n", 268 | "params = numpy.zeros(X_scaled.shape[1])\n", 269 | "descent = numpy.ones(X_scaled.shape[1])\n", 270 | "i = 0\n", 271 | "\n", 272 | "from sklearn.metrics import mean_absolute_error\n", 273 | "\n", 274 | "while numpy.linalg.norm(descent) > 0.01 and i < max_iter:\n", 275 | " descent = gradient(params, linear_regression, X_scaled, y)\n", 276 | " params = params - descent * alpha\n", 277 | " loss = mse_loss(params, linear_regression, X_scaled, y)\n", 278 | " mae = mean_absolute_error(y, X_scaled@params)\n", 279 | " if i%100 == 0:\n", 280 | " print(f\"iteration {i:4}, {loss = :.3f}, {mae = :.3f}\")\n", 281 | " i += 1" 282 | ] 283 | }, 284 | { 285 | "cell_type": "markdown", 286 | "metadata": {}, 287 | "source": [ 288 | "Let's print out the weights." 289 | ] 290 | }, 291 | { 292 | "cell_type": "code", 293 | "execution_count": 7, 294 | "metadata": {}, 295 | "outputs": [ 296 | { 297 | "data": { 298 | "text/plain": [ 299 | "array([-22.51572398, 6.75930601, 41.30788709, 30.0105898 ])" 300 | ] 301 | }, 302 | "execution_count": 7, 303 | "metadata": {}, 304 | "output_type": "execute_result" 305 | } 306 | ], 307 | "source": [ 308 | "params" 309 | ] 310 | }, 311 | { 312 | "cell_type": "markdown", 313 | "metadata": {}, 314 | "source": [ 315 | "The first term is the intercept, and the rest are the weights of each **scaled** polynomial feature.\n", 316 | "\n", 317 | "Although our model has multiple weights, there's only one feature in our original data. \n", 318 | "Therefore, if we are given new values of $x$ to predict $y$, we need to first create the polynomial features, scale them to $[0,1]$ using the **same scaling function**, and then multiply by the weights. Since we have used the min-max scaling, it is important to use the same $x_{\\min}$ and $x_{\\max}$ of the training data to scale new data.\n", 319 | "\n", 320 | "Recall the explanation above for the scaling step. 
We now only need to call `min_max_scaler.transform()` to scale new data. This fit-once, transform-many design is very common in scikit-learn.\n", 321 | "\n", 322 | "With that in mind, let's plot the fitted curve together with the data. We generate some new values of $x$, `xgrid`, and predict $y$ at these locations. Don't forget to repeat the procedure of creating polynomial features and scaling them. Study this code for a moment to wrap your head around it." 323 | ] 324 | }, 325 | { 326 | "cell_type": "code", 327 | "execution_count": 8, 328 | "metadata": {}, 329 | "outputs": [ 330 | { 331 | "data": { 332 | "image/png": "\n", 333 | "text/plain": [ 334 | "
" 335 | ] 336 | }, 337 | "metadata": { 338 | "needs_background": "light" 339 | }, 340 | "output_type": "display_data" 341 | } 342 | ], 343 | "source": [ 344 | "xgrid = numpy.linspace(x.min(), x.max(), 30)\n", 345 | "Xgrid_poly_feat = polynomial_features(xgrid, degree)\n", 346 | "Xgrid_scaled = min_max_scaler.transform(Xgrid_poly_feat)\n", 347 | "Xgrid_scaled[:,0] = 1 \n", 348 | "pyplot.scatter(x, y, c='r', label='true')\n", 349 | "pyplot.plot(xgrid, Xgrid_scaled@params, label='predicted')\n", 350 | "pyplot.legend();" 351 | ] 352 | }, 353 | { 354 | "cell_type": "markdown", 355 | "metadata": {}, 356 | "source": [ 357 | "## Observe underfitting & overfitting\n", 358 | "\n", 359 | "In the model above, we just randomly picked a polynomial degree of $3$ for the fitted curve. Is it good enough to model our dataset? Should we try higher-order polynomials?\n", 360 | "\n", 361 | "We can repeat our study with different polynomial degrees varying from $1$ to $15$, and see what happens.\n", 362 | "To faciliate this task, we provide you with a script to train the model and plot these fitted curves interactively using `ipywidget`.\n", 363 | "\n", 364 | "Run the cell below and drag the slider to see how the curve changes." 
365 | ] 366 | }, 367 | { 368 | "cell_type": "code", 369 | "execution_count": 9, 370 | "metadata": {}, 371 | "outputs": [ 372 | { 373 | "data": { 374 | "application/vnd.jupyter.widget-view+json": { 375 | "model_id": "71f0d094cc19484483274e0b52147109", 376 | "version_major": 2, 377 | "version_minor": 0 378 | }, 379 | "text/plain": [ 380 | "interactive(children=(IntSlider(value=8, description='degree', max=15, min=1), Output()), _dom_classes=('widge…" 381 | ] 382 | }, 383 | "metadata": {}, 384 | "output_type": "display_data" 385 | } 386 | ], 387 | "source": [ 388 | "import sys\n", 389 | "sys.path.append('../scripts/')\n", 390 | "from plot_helpers import interact_polyreg\n", 391 | "\n", 392 | "max_degree = 15\n", 393 | "interact_polyreg(max_degree, x, y)" 394 | ] 395 | }, 396 | { 397 | "cell_type": "markdown", 398 | "metadata": {}, 399 | "source": [ 400 | "### Underfitting\n", 401 | "\n", 402 | "When `degree` is $1$, the straight line clearly fails to capture the underlying relationship between $x$ and $y$. Specifically, a line is too simple to explain how far the data are spread out, namely, the **variance** in the data. We say that the linear model **underfits** the data in this case.\n", 403 | "\n", 404 | "Underfitting happens when the model is too naive or the weights need to be trained with more iterations. It is often easy to detect, since a large training error is a good indicator.\n", 405 | "\n", 406 | "##### Challenge question\n", 407 | "\n", 408 | "> Would having more training data help resolve underfitting?\n", 409 | "\n", 410 | "### Overfitting\n", 411 | "\n", 412 | "As we increase the polynomial degree, the training error (MAE from the figure title) keeps decreasing. 
\n", 413 | "But does that mean that the 15th-order polynomial gives the best fit?\n", 414 | "Probably not.\n", 415 | "\n", 416 | "Drag the slider to the right, and you will find that the curve passes exactly through many points and looks very odd.\n", 417 | "If we are given new data, this model may not predict well because it fits the training data too closely.\n", 418 | "Real-world data tend to be noisy (due to missing and erroneous values), and our synthetic data likewise have noise added.\n", 419 | "Models with high polynomial degrees are so flexible that they fit the noise rather than the true relationship! \n", 420 | "In this case, these models are **overfitting**, or have a **high variance**.\n", 421 | "Overfitting usually happens when the model is overly complicated.\n", 422 | "The high-order polynomials in our example have too many degrees of freedom (weights) for our data.\n", 423 | "\n", 424 | "##### Challenge question\n", 425 | "\n", 426 | "> Would having more training data help resolve overfitting?\n", 427 | "\n", 428 | "Compared to underfitting, overfitting is generally harder to detect because the training error tends to be small. We will discuss how to identify overfitting later in this module.\n", 429 | "Now let's focus on how to prevent overfitting using regularization." 430 | ] 431 | }, 432 | { 433 | "cell_type": "markdown", 434 | "metadata": {}, 435 | "source": [ 436 | "## Regularization\n", 437 | "\n", 438 | "Regularization is used to reduce or avoid overfitting.\n", 439 | "The idea is to introduce a term in the loss function that penalizes complicated models.
In our polynomial regression model:\n", 440 | "\n", 441 | "$$\n", 442 | "\\hat{y} = w_0 + w_1 x + w_2 x^2 + \\cdots + w_d x^d,\n", 443 | "$$\n", 444 | "\n", 445 | "every polynomial term contributes to the overall complexity of the model.\n", 446 | "One idea is to constrain the magnitudes of these weights.\n", 447 | "\n", 448 | "A common approach is to add a regularization term $\\lambda\\sum_{j=1}^d w_j^2$ to the mean-squared error loss:\n", 449 | "\n", 450 | "$$\n", 451 | "L(\\mathbf{w})=\\frac{1}{N} {\\lVert \\mathbf{y} - X\\mathbf{w} \\rVert}^2 + \\lambda \\sum_{j=1}^d w_j^2\n", 452 | "$$\n", 453 | "\n", 454 | "This new loss function favors smaller weights, because larger weights will increase the term on the right.\n", 455 | "With smaller weights, the model is less likely to overfit through large-amplitude higher-order polynomial terms.\n", 456 | "\n", 457 | "Notice that we don't penalize the intercept $w_0$ in the regularization term. This can be justified mathematically, but one way to think about it is that a constant term won't affect the overall model complexity.\n", 458 | "\n", 459 | "Above, $\\lambda$ denotes the regularization parameter, or the strength of the penalty. It controls the tradeoff between fitting the data well (minimizing the first term) and keeping the model simple to avoid overfitting (minimizing the second term). When $\\lambda\\rightarrow 0$, the loss function reduces to the standard mean-squared error loss. When $\\lambda$ is large, all penalized weights will be close to 0 after training and our model will approach the constant value $w_0$.\n", 460 | "\n", 461 | "##### Note\n", 462 | "\n", 463 | "> This regularization is also called the $L^2$ penalty, or Tikhonov regularization; the resulting model-building method is known as _ridge regression_." 464 | ] 465 | }, 466 | { 467 | "cell_type": "markdown", 468 | "metadata": {}, 469 | "source": [ 470 | "Let's code the regularized mean-squared error loss and set $\\lambda = 1$."
471 | ] 472 | }, 473 | { 474 | "cell_type": "code", 475 | "execution_count": 10, 476 | "metadata": {}, 477 | "outputs": [], 478 | "source": [ 479 | "def regularized_loss(params, model, X, y, _lambda=1.0):\n", 480 | " '''\n", 481 | " The mean squared error loss function with an L2 penalty.\n", 482 | " Arguments:\n", 483 | " params: 1D array of weights for the linear model\n", 484 | " model : function for the linear regression model\n", 485 | " X : 2D array of input values\n", 486 | " y : 1D array of observed (target) values\n", 487 | " _lambda: regularization parameter, default 1.0\n", 488 | " Returns:\n", 489 | " float, regularized mean squared error\n", 490 | " '''\n", 491 | " y_pred = model(params, X)\n", 492 | " return numpy.mean( numpy.sum((y-y_pred)**2) ) + _lambda * numpy.sum( params[1:]**2 ) # the intercept params[0] is not penalized\n", 493 | "\n", 494 | "gradient = grad(regularized_loss) " 495 | ] 496 | }, 497 | { 498 | "cell_type": "code", 499 | "execution_count": 11, 500 | "metadata": {}, 501 | "outputs": [], 502 | "source": [ 503 | "no_regularization_params = params.copy()" 504 | ] 505 | }, 506 | { 507 | "cell_type": "markdown", 508 | "metadata": {}, 509 | "source": [ 510 | "Now train the 3rd-degree polynomial model with the regularized loss, using gradient descent."
511 | ] 512 | }, 513 | { 514 | "cell_type": "code", 515 | "execution_count": 12, 516 | "metadata": {}, 517 | "outputs": [ 518 | { 519 | "name": "stdout", 520 | "output_type": "stream", 521 | "text": [ 522 | "iteration 0, loss = 5434.768, mae = 11.05718775676392\n", 523 | "iteration 100, loss = 1785.985, mae = 6.983886996350374\n", 524 | "iteration 200, loss = 1764.260, mae = 6.966518785947395\n", 525 | "iteration 300, loss = 1763.570, mae = 6.965728405272657\n" 526 | ] 527 | } 528 | ], 529 | "source": [ 530 | "max_iter = 3000\n", 531 | "alpha = 0.01\n", 532 | "params = numpy.zeros(X_scaled.shape[1])\n", 533 | "descent = numpy.ones(X_scaled.shape[1])\n", 534 | "i = 0\n", 535 | "\n", 536 | "from sklearn.metrics import mean_absolute_error\n", 537 | "\n", 538 | "while numpy.linalg.norm(descent) > 0.01 and i < max_iter:\n", 539 | " descent = gradient(params, linear_regression, X_scaled, y)\n", 540 | " params = params - descent * alpha\n", 541 | " loss = mse_loss(params, linear_regression, X_scaled, y)\n", 542 | " mae = mean_absolute_error(y, X_scaled@params)\n", 543 | " if i%100 == 0:\n", 544 | " print(f\"iteration {i:4}, {loss = :.3f}, {mae = }\")\n", 545 | " i += 1" 546 | ] 547 | }, 548 | { 549 | "cell_type": "markdown", 550 | "metadata": {}, 551 | "source": [ 552 | "Let's compare the optimal weights before and after regularization. We can reuse the `xgrid` to plot both curves." 553 | ] 554 | }, 555 | { 556 | "cell_type": "code", 557 | "execution_count": 13, 558 | "metadata": {}, 559 | "outputs": [ 560 | { 561 | "name": "stdout", 562 | "output_type": "stream", 563 | "text": [ 564 | "weights without regularization\n", 565 | "[-22.51572398 6.75930601 41.30788709 30.0105898 ]\n", 566 | "weights with regularization\n", 567 | "[-11.13750882 12.48522096 28.26626633 11.09211867]\n" 568 | ] 569 | }, 570 | { 571 | "data": { 572 | "image/png": "\n", 573 | "text/plain": [ 574 | "
" 575 | ] 576 | }, 577 | "metadata": { 578 | "needs_background": "light" 579 | }, 580 | "output_type": "display_data" 581 | } 582 | ], 583 | "source": [ 584 | "print(\"weights without regularization\")\n", 585 | "print(no_regularization_params)\n", 586 | "print(\"weights with regularization\")\n", 587 | "print(params)\n", 588 | "\n", 589 | "pyplot.scatter(x, y, c='r', label='true')\n", 590 | "pyplot.plot(xgrid, Xgrid_scaled@no_regularization_params, label='w/o regularization')\n", 591 | "pyplot.plot(xgrid, Xgrid_scaled@params, label='with regularization')\n", 592 | "pyplot.legend();" 593 | ] 594 | }, 595 | { 596 | "cell_type": "markdown", 597 | "metadata": {}, 598 | "source": [ 599 | "Using our helper script again to display both models with varying polynomial degree in an `ipywidget`, interact with the slider to explore the results. We set `regularized=True` this time." 600 | ] 601 | }, 602 | { 603 | "cell_type": "code", 604 | "execution_count": 14, 605 | "metadata": {}, 606 | "outputs": [ 607 | { 608 | "data": { 609 | "application/vnd.jupyter.widget-view+json": { 610 | "model_id": "e0c720ccb75d4994b1bb51d650dd8ab5", 611 | "version_major": 2, 612 | "version_minor": 0 613 | }, 614 | "text/plain": [ 615 | "interactive(children=(IntSlider(value=8, description='degree', max=15, min=1), Output()), _dom_classes=('widge…" 616 | ] 617 | }, 618 | "metadata": {}, 619 | "output_type": "display_data" 620 | } 621 | ], 622 | "source": [ 623 | "interact_polyreg(max_degree, x, y, regularized=True)" 624 | ] 625 | }, 626 | { 627 | "cell_type": "markdown", 628 | "metadata": {}, 629 | "source": [ 630 | "Thanks to the regularization term, you won't see the wiggling curves even with a high degree polynomial, since the magnitude of the weights (the coefficients before each polynomial term) is much smaller." 
631 | ] 632 | }, 633 | { 634 | "cell_type": "markdown", 635 | "metadata": {}, 636 | "source": [ 637 | "## Ridge regression with scikit-learn\n", 638 | "\n", 639 | "You won't be surprised by now that **scikit-learn** offers a way to obtain a linear model with regularization: [`Ridge()`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html#sklearn.linear_model.Ridge). \n", 640 | "Study the code cell below, where we call `Ridge()` and its `.fit()` method to fit the ridge regression model with our scaled matrix of features (also called the _design matrix_) and the $y$ data.\n", 641 | "The default value of the penalization parameter is $1.0$, so we don't specify it; to try a different value, pass it as `alpha=value` in the argument list. Do explore the documentation page." 642 | ] 643 | }, 644 | { 645 | "cell_type": "code", 646 | "execution_count": 15, 647 | "metadata": {}, 648 | "outputs": [ 649 | { 650 | "name": "stdout", 651 | "output_type": "stream", 652 | "text": [ 653 | "[12.48291164 28.26642615 11.09583654]\n", 654 | "-11.138315887456029\n" 655 | ] 656 | }, 657 | { 658 | "data": { 659 | "image/png": "\n", 660 | "text/plain": [ 661 | "
" 662 | ] 663 | }, 664 | "metadata": { 665 | "needs_background": "light" 666 | }, 667 | "output_type": "display_data" 668 | } 669 | ], 670 | "source": [ 671 | "from sklearn.linear_model import Ridge\n", 672 | "\n", 673 | "model = Ridge().fit(X_scaled[:, 1:], y) # Ridge() by default fits an intercept\n", 674 | "y_pred_sklearn = model.predict(Xgrid_scaled[:,1:])\n", 675 | "\n", 676 | "print(model.coef_)\n", 677 | "print(model.intercept_)\n", 678 | "\n", 679 | "pyplot.scatter(x, y, c='r', label='true')\n", 680 | "pyplot.plot(xgrid, y_pred_sklearn, label='sklearn ridge regression')\n", 681 | "pyplot.legend();\n" 682 | ] 683 | }, 684 | { 685 | "cell_type": "markdown", 686 | "metadata": {}, 687 | "source": [ 688 | "The `model` variable we created above using `Ridge()` stores the model weights and the intercept in the `.coef_` and `intercept_` attributes. \n", 689 | "Compare the values with those we obtained above using the regularization and 3rd-order polynomial. Pretty close!" 690 | ] 691 | }, 692 | { 693 | "cell_type": "markdown", 694 | "metadata": {}, 695 | "source": [ 696 | "## What we've learned\n", 697 | "\n", 698 | "- Polynomial regression is a special case of multiple linear regression.\n", 699 | "- Pick too low-order a polynomial (e.g., a line) and the model could _underfit_ the data, giving large residuals (training error).\n", 700 | "- Pick too high a polynomial order, and you will get _overfitting_, as the model tries to fit the noise in the data resulting in wiggles.\n", 701 | "- Regularization is used to reduce or avoid overfitting.\n", 702 | "- Adding the sum-of-squares of the weights to the loss function _penalizes_ large weights and controls overfitting. \n", 703 | "- Linear regression with this so-called Tikhonov regularization is called _ridge regression_.\n", 704 | "- With **scikit-learn**, you can use `Ridge()` to fit a linear model with regularization. 
" 705 | ] 706 | }, 707 | { 708 | "cell_type": "code", 709 | "execution_count": 16, 710 | "metadata": {}, 711 | "outputs": [ 712 | { 713 | "data": { 714 | "text/html": [ 715 | "\n", 716 | "\n", 717 | "\n", 718 | "\n", 848 | "\n" 864 | ], 865 | "text/plain": [ 866 | "" 867 | ] 868 | }, 869 | "execution_count": 16, 870 | "metadata": {}, 871 | "output_type": "execute_result" 872 | } 873 | ], 874 | "source": [ 875 | "# Execute this cell to load the notebook's style sheet, then ignore it\n", 876 | "from IPython.core.display import HTML\n", 877 | "css_file = '../style/custom.css'\n", 878 | "HTML(open(css_file, \"r\").read())" 879 | ] 880 | } 881 | ], 882 | "metadata": { 883 | "kernelspec": { 884 | "display_name": "Python 3", 885 | "language": "python", 886 | "name": "python3" 887 | }, 888 | "language_info": { 889 | "codemirror_mode": { 890 | "name": "ipython", 891 | "version": 3 892 | }, 893 | "file_extension": ".py", 894 | "mimetype": "text/x-python", 895 | "name": "python", 896 | "nbconvert_exporter": "python", 897 | "pygments_lexer": "ipython3", 898 | "version": "3.8.5" 899 | } 900 | }, 901 | "nbformat": 4, 902 | "nbformat_minor": 5 903 | } 904 | -------------------------------------------------------------------------------- /scripts/plot_helpers.py: -------------------------------------------------------------------------------- 1 | import numpy as _np 2 | from sklearn.pipeline import make_pipeline 3 | from sklearn.linear_model import LinearRegression, Ridge 4 | from sklearn.preprocessing import PolynomialFeatures, MinMaxScaler 5 | from sklearn.metrics import mean_absolute_error 6 | from matplotlib import pyplot as plt 7 | from ipywidgets import interact 8 | 9 | def interact_polyreg(max_degree, x, y, regularized=False, verbose=True): 10 | """ 11 | The function to plot polynomial linear regression. 12 | 13 | Args: 14 | max_degree: int 15 | Max polynomial degree. 16 | x,y: numpy.ndarray 17 | 1D Training data. 
18 | regularized: bool 19 | Whether to add l2-norm regularization term in loss. Default to False. 20 | verbose: bool 21 | Whether to print trained weights. Default to True. 22 | """ 23 | x_plot = _np.linspace(x.min(), x.max(), 30).reshape(-1,1) 24 | 25 | def polyreg_helper(degree): 26 | plt.figure(figsize=(10,6)) 27 | plt.scatter(x, y, c='r', label='true') 28 | linear = make_pipeline(PolynomialFeatures(degree, include_bias=False), 29 | MinMaxScaler(), 30 | LinearRegression(fit_intercept=True)) 31 | linear.fit(x.reshape(-1,1), y) 32 | mae_linear = mean_absolute_error(y, linear.predict(x.reshape(-1,1))) 33 | 34 | if regularized: 35 | ridge = make_pipeline(PolynomialFeatures(degree, include_bias=False), 36 | MinMaxScaler(), 37 | Ridge(alpha=1.0)) 38 | ridge.fit(x.reshape(-1,1), y) 39 | mae_ridge = mean_absolute_error(y, ridge.predict(x.reshape(-1,1))) 40 | plt.plot(x_plot, linear.predict(x_plot), label='predicted, w/o regularization') 41 | plt.plot(x_plot, ridge.predict(x_plot), label='predicted, with regularization') 42 | plt.title(f"Poly degree = {degree:2}, MAE_no_reg = {mae_linear:.3f}, MAE_reg = {mae_ridge:.3f}", fontsize=16) 43 | if verbose: 44 | print('weights without regularization') 45 | print(linear.named_steps['linearregression'].coef_) 46 | print('weights with regularization') 47 | print(ridge.named_steps['ridge'].coef_) 48 | else: 49 | plt.plot(x_plot, linear.predict(x_plot), label='predicted') 50 | plt.title(f"Polynomial degree = {degree:2}, MAE = {mae_linear:.3f}", fontsize=16) 51 | plt.legend(fontsize=16) 52 | plt.show() 53 | 54 | interact(polyreg_helper, degree=(1, max_degree)) 55 | -------------------------------------------------------------------------------- /style/custom.css: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 134 | 150 | --------------------------------------------------------------------------------