├── .DS_Store
├── Coursera AL28VT8TSGUS.pdf
├── README.md
├── course1_neural_networks_and_deep_learning
│   ├── .DS_Store
│   ├── Building+your+Deep+Neural+Network+-+Step+by+Step+v3.ipynb
│   ├── Coursera NEST8656HYZJ.pdf
│   ├── Deep+Neural+Network+-+Application+v3.ipynb
│   ├── Logistic+Regression+with+a+Neural+Network+mindset+v3.ipynb
│   ├── Planar+data+classification+with+one+hidden+layer+v3.ipynb
│   └── Python+Basics+With+Numpy+v3.ipynb
├── course2_improving_deep_neural_networks
│   ├── .DS_Store
│   ├── Coursera D2ZZE6WAGLGD.pdf
│   ├── Gradient+Checking.ipynb
│   ├── Initialization.ipynb
│   ├── Optimization+methods.ipynb
│   ├── Regularization.ipynb
│   └── Tensorflow+Tutorial.ipynb
├── course3_structuring_machine_learning_projects
│   ├── .DS_Store
│   ├── Coursera AGVR9FXTTJAB.pdf
│   ├── Week 1 Quiz - Bird recognition in the city of Peacetopia (case study).md
│   └── Week 2 Quiz - Autonomous driving (case study).md
├── course4_convolutional_neural_networks
│   ├── Art+Generation+with+Neural+Style+Transfer+-+v2.ipynb
│   ├── Autonomous+driving+application+-+Car+detection+-+v1.ipynb
│   ├── Convolution+model+-+Application+-+v1.ipynb
│   ├── Convolution+model+-+Step+by+Step+-+v2.ipynb
│   ├── Coursera 22RV83VPMX63.pdf
│   ├── Face+Recognition+for+the+Happy+House+-+v3.ipynb
│   ├── Keras+-+Tutorial+-+Happy+House+v2.ipynb
│   └── Residual+Networks+-+v2.ipynb
└── course5_sequential_models
    ├── .DS_Store
    ├── Building+a+Recurrent+Neural+Network+-+Step+by+Step+-+v1.ipynb
    ├── Coursera 32LEZVGFV8FD.pdf
    ├── Dinosaurus+Island+--+Character+level+language+model+final+-+v2.ipynb
    ├── Emojify+-+v2.ipynb
    ├── Improvise+a+Jazz+Solo+with+an+LSTM+Network+-+v1.ipynb
    ├── Neural+machine+translation+with+attention+-+v1.ipynb
    ├── Operations+on+word+vectors+-+v1.ipynb
    └── Trigger+word+detection+-+v1.ipynb

/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangruinju/Deep-Learning/e4c2b9d04898bd6113ef32377732915d4282b838/.DS_Store
--------------------------------------------------------------------------------
/Coursera AL28VT8TSGUS.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangruinju/Deep-Learning/e4c2b9d04898bd6113ef32377732915d4282b838/Coursera AL28VT8TSGUS.pdf
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Deep-Learning
2 | 
3 | In the early morning of Super Bowl Sunday, I finally finished the Deep Learning Specialization taught by Andrew Ng.
4 | 
5 | This specialization includes 5 courses:
6 | 
7 | ## Course 1: [Neural Networks and Deep Learning](https://www.coursera.org/learn/neural-networks-deep-learning/home/welcome)
8 | 
9 | - Understand the major technology trends driving Deep Learning.
10 | - Be able to build, train and apply fully connected deep neural networks.
11 | - Know how to implement efficient (vectorized) neural networks (see the short sketch after this list).
12 | - Understand the key parameters in a neural network's architecture.
13 | 
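14 | A minimal illustration of why vectorization matters, in the spirit of the numpy warm-up assignment (my own sketch, not part of the course materials; the array size is arbitrary):
15 | 
16 | ```python
17 | import time
18 | import numpy as np
19 | 
20 | x1, x2 = np.random.rand(10**6), np.random.rand(10**6)
21 | 
22 | tic = time.process_time()
23 | dot_loop = sum(x1[i] * x2[i] for i in range(len(x1)))  # explicit Python-level loop
24 | loop_time = time.process_time() - tic
25 | 
26 | tic = time.process_time()
27 | dot_vec = np.dot(x1, x2)  # one vectorized call, typically orders of magnitude faster
28 | vec_time = time.process_time() - tic
29 | 
30 | assert np.isclose(dot_loop, dot_vec)
31 | ```
32 | 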
33 | * Week 1: Be able to explain the major trends driving the rise of deep learning, and understand where and how it is applied today.
34 | * [Python basics with numpy](https://github.com/wangruinju/Deep-Learning/blob/master/course1_neural_networks_and_deep_learning/Python%2BBasics%2BWith%2BNumpy%2Bv3.ipynb)
35 | 
36 | * Week 2: Learn to set up a machine learning problem with a neural network mindset. Learn to use vectorization to speed up your models.
37 | * [Logistic Regression with a Neural Network mindset](https://github.com/wangruinju/Deep-Learning/blob/master/course1_neural_networks_and_deep_learning/Logistic%2BRegression%2Bwith%2Ba%2BNeural%2BNetwork%2Bmindset%2Bv3.ipynb)
38 | 
39 | * Week 3: Learn to build a neural network with one hidden layer, using forward propagation and backpropagation.
40 | * [Planar data classification with a hidden layer](https://github.com/wangruinju/Deep-Learning/blob/master/course1_neural_networks_and_deep_learning/Planar%2Bdata%2Bclassification%2Bwith%2Bone%2Bhidden%2Blayer%2Bv3.ipynb)
41 | 
42 | * Week 4: Understand the key computations underlying deep learning, use them to build and train deep neural networks, and apply them to computer vision.
43 | * [Build your deep neural network: Step by Step](https://github.com/wangruinju/Deep-Learning/blob/master/course1_neural_networks_and_deep_learning/Building%2Byour%2BDeep%2BNeural%2BNetwork%2B-%2BStep%2Bby%2BStep%2Bv3.ipynb)
44 | * [Deep Neural Network Application](https://github.com/wangruinju/Deep-Learning/blob/master/course1_neural_networks_and_deep_learning/Deep%2BNeural%2BNetwork%2B-%2BApplication%2Bv3.ipynb)
45 | 
46 | ## Course 2: [Improving Deep Neural Networks](https://www.coursera.org/learn/deep-neural-network/home/welcome)
47 | 
48 | - Understand industry best-practices for building deep learning applications.
49 | - Be able to effectively use the common neural network "tricks", including initialization, L2 and dropout regularization, Batch normalization, and gradient checking (a numerical sketch follows this course's week list).
50 | - Be able to implement and apply a variety of optimization algorithms, such as mini-batch gradient descent, Momentum, RMSprop and Adam, and check for their convergence.
51 | - Understand new best-practices for the deep learning era of how to set up train/dev/test sets and analyze bias/variance.
52 | - Be able to implement a neural network in TensorFlow.
53 | 
54 | * Week 1: Practical aspects of Deep Learning
55 | * [Initialization](https://github.com/wangruinju/Deep-Learning/blob/master/course2_improving_deep_neural_networks/Initialization.ipynb)
56 | * [Regularization](https://github.com/wangruinju/Deep-Learning/blob/master/course2_improving_deep_neural_networks/Regularization.ipynb)
57 | * [Gradient Checking](https://github.com/wangruinju/Deep-Learning/blob/master/course2_improving_deep_neural_networks/Gradient%2BChecking.ipynb)
58 | 
59 | * Week 2: Optimization algorithms
60 | * [Optimization](https://github.com/wangruinju/Deep-Learning/blob/master/course2_improving_deep_neural_networks/Optimization%2Bmethods.ipynb)
61 | 
62 | * Week 3: Hyperparameter tuning, Batch Normalization and Programming Frameworks
63 | * [Tensorflow Tutorial](https://github.com/wangruinju/Deep-Learning/blob/master/course2_improving_deep_neural_networks/Tensorflow%2BTutorial.ipynb)
64 | 
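65 | The Gradient Checking assignment above compresses to one idea: approximate each derivative numerically with a centered difference and compare it against what backprop computed. A minimal one-parameter sketch (mine, not the graded assignment code):
66 | 
67 | ```python
68 | def J(theta):
69 |     return theta ** 2  # stand-in for any differentiable scalar cost
70 | 
71 | theta, eps = 3.0, 1e-7
72 | grad_backprop = 2 * theta  # what an analytic backward pass returns for this J
73 | grad_approx = (J(theta + eps) - J(theta - eps)) / (2 * eps)  # centered difference
74 | diff = abs(grad_backprop - grad_approx) / (abs(grad_backprop) + abs(grad_approx))
75 | print("backprop looks correct" if diff < 1e-7 else "there is probably a bug")
76 | ```
77 | 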
78 | ## Course 3: [Structuring Machine Learning Projects](https://www.coursera.org/learn/machine-learning-projects/home/welcome)
79 | 
80 | - Understand how to diagnose errors in a machine learning system.
81 | - Be able to prioritize the most promising directions for reducing error.
82 | - Understand complex ML settings, such as mismatched training/test sets, and comparing to and/or surpassing human-level performance.
83 | - Know how to apply end-to-end learning, transfer learning, and multi-task learning.
84 | 
85 | * Week 1: ML Strategy (1)
86 | * [Bird recognition](https://github.com/wangruinju/Deep-Learning/blob/master/course3_structuring_machine_learning_projects/Week%201%20Quiz%20-%20Bird%20recognition%20in%20the%20city%20of%20Peacetopia%20(case%20study).md)
87 | 
88 | * Week 2: ML Strategy (2)
89 | * [Autonomous driving](https://github.com/wangruinju/Deep-Learning/blob/master/course3_structuring_machine_learning_projects/Week%202%20Quiz%20-%20Autonomous%20driving%20(case%20study).md)
90 | 
91 | ## Course 4: [Convolutional Neural Networks](https://www.coursera.org/learn/convolutional-neural-networks/home/welcome)
92 | 
93 | - Understand how to build a convolutional neural network, including recent variations such as residual networks (the single-filter sketch after this course's week list shows the core operation).
94 | - Know how to apply convolutional networks to visual detection and recognition tasks.
95 | - Know how to use neural style transfer to generate art.
96 | - Be able to apply these algorithms to a variety of image, video, and other 2D or 3D data.
97 | 
98 | * Week 1: Foundations of Convolutional Neural Networks
99 | * [Convolutional Model: step by step](https://github.com/wangruinju/Deep-Learning/blob/master/course4_convolutional_neural_networks/Convolution%2Bmodel%2B-%2BStep%2Bby%2BStep%2B-%2Bv2.ipynb)
100 | * [Convolutional model: application](https://github.com/wangruinju/Deep-Learning/blob/master/course4_convolutional_neural_networks/Convolution%2Bmodel%2B-%2BApplication%2B-%2Bv1.ipynb)
101 | 
102 | * Week 2: Deep convolutional models: case studies
103 | * [Residual Networks](https://github.com/wangruinju/Deep-Learning/blob/master/course4_convolutional_neural_networks/Residual%2BNetworks%2B-%2Bv2.ipynb)
104 | 
105 | * Week 3: Object detection
106 | * [Car detection with YOLOv2](https://github.com/wangruinju/Deep-Learning/blob/master/course4_convolutional_neural_networks/Autonomous%2Bdriving%2Bapplication%2B-%2BCar%2Bdetection%2B-%2Bv1.ipynb)
107 | 
108 | * Week 4: Special applications: Face recognition & Neural style transfer
109 | * [Art generation with Neural Style Transfer](https://github.com/wangruinju/Deep-Learning/blob/master/course4_convolutional_neural_networks/Art%2BGeneration%2Bwith%2BNeural%2BStyle%2BTransfer%2B-%2Bv2.ipynb)
110 | * [Face Recognition for the Happy House](https://github.com/wangruinju/Deep-Learning/blob/master/course4_convolutional_neural_networks/Face%2BRecognition%2Bfor%2Bthe%2BHappy%2BHouse%2B-%2Bv3.ipynb)
111 | 
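112 | The whole convolutional stack in the "step by step" assignment is built from one tiny operation: take a slice of the input, multiply it elementwise by a filter, and sum. Roughly (my own paraphrase, not the graded function):
113 | 
114 | ```python
115 | import numpy as np
116 | 
117 | def conv_single_step(a_slice, W, b):
118 |     """Apply one filter W with bias b to one slice of the input volume."""
119 |     return np.sum(a_slice * W) + b  # elementwise product, then sum, then bias
120 | 
121 | a_slice = np.random.randn(3, 3, 3)     # one 3x3 patch with 3 channels
122 | W, b = np.random.randn(3, 3, 3), 0.1   # a matching filter and its (scalar) bias
123 | z = conv_single_step(a_slice, W, b)    # one scalar of the output feature map
124 | ```
125 | 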
126 | ## Course 5: [Sequence Models](https://www.coursera.org/learn/nlp-sequence-models/home/welcome)
127 | 
128 | - Understand how to build and train Recurrent Neural Networks (RNNs), and commonly-used variants such as GRUs and LSTMs (a single-step sketch follows this course's week list).
129 | - Be able to apply sequence models to natural language problems, including text synthesis.
130 | - Be able to apply sequence models to audio applications, including speech recognition and music synthesis.
131 | 
132 | * Week 1: Recurrent Neural Networks
133 | * [Build RNN: step by step](https://github.com/wangruinju/Deep-Learning/blob/master/course5_sequential_models/Building%2Ba%2BRecurrent%2BNeural%2BNetwork%2B-%2BStep%2Bby%2BStep%2B-%2Bv1.ipynb)
134 | * [Dinosaur Island: Character-level language modeling](https://github.com/wangruinju/Deep-Learning/blob/master/course5_sequential_models/Dinosaurus%2BIsland%2B--%2BCharacter%2Blevel%2Blanguage%2Bmodel%2Bfinal%2B-%2Bv2.ipynb)
135 | * [Jazz improvisation with LSTM](https://github.com/wangruinju/Deep-Learning/blob/master/course5_sequential_models/Improvise%2Ba%2BJazz%2BSolo%2Bwith%2Ban%2BLSTM%2BNetwork%2B-%2Bv1.ipynb)
136 | 
137 | * Week 2: Natural Language Processing & Word Embeddings
138 | * [Operations on word vectors - Debiasing](https://github.com/wangruinju/Deep-Learning/blob/master/course5_sequential_models/Operations%2Bon%2Bword%2Bvectors%2B-%2Bv1.ipynb)
139 | * [Emojify: Sentiment analysis](https://github.com/wangruinju/Deep-Learning/blob/master/course5_sequential_models/Emojify%2B-%2Bv2.ipynb)
140 | 
141 | * Week 3: Sequence models & Attention mechanism
142 | * [Neural Machine Translation with Attention](https://github.com/wangruinju/Deep-Learning/blob/master/course5_sequential_models/Neural%2Bmachine%2Btranslation%2Bwith%2Battention%2B-%2Bv1.ipynb)
143 | * [Trigger word detection](https://github.com/wangruinju/Deep-Learning/blob/master/course5_sequential_models/Trigger%2Bword%2Bdetection%2B-%2Bv1.ipynb)
144 | 
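145 | The basic RNN cell that Week 1 builds step by step performs one state update per time step. A compact sketch with made-up dimensions (mine, not the assignment's exact code):
146 | 
147 | ```python
148 | import numpy as np
149 | 
150 | def rnn_cell_forward(xt, a_prev, Wax, Waa, Wya, ba, by):
151 |     """One time step: update the hidden state from xt, then emit a prediction."""
152 |     a_next = np.tanh(Wax @ xt + Waa @ a_prev + ba)  # hidden-state update
153 |     z = Wya @ a_next + by
154 |     yt = np.exp(z) / np.sum(np.exp(z), axis=0)      # softmax over output classes
155 |     return a_next, yt
156 | 
157 | n_x, n_a, n_y, m = 3, 5, 2, 10  # input size, hidden size, output size, batch size
158 | xt, a_prev = np.random.randn(n_x, m), np.random.randn(n_a, m)
159 | Wax, Waa, Wya = np.random.randn(n_a, n_x), np.random.randn(n_a, n_a), np.random.randn(n_y, n_a)
160 | ba, by = np.zeros((n_a, 1)), np.zeros((n_y, 1))
161 | a_next, yt = rnn_cell_forward(xt, a_prev, Wax, Waa, Wya, ba, by)
162 | ```
163 | 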
164 | Please see my [GitHub](https://github.com/wangruinju/Deep-Learning) for details.
165 | 
166 | I have also reviewed two amazing courses offered by Stanford University:
167 | 
168 | * [CS231n: Convolutional Neural Networks for Visual Recognition](http://cs231n.stanford.edu/index.html)
169 | 
170 | * [CS224n: Natural Language Processing with Deep Learning](http://web.stanford.edu/class/cs224n/)
171 | 
172 | For the basics of machine learning, please refer to Andrew's [Machine Learning on Coursera](https://www.coursera.org/learn/machine-learning/home/welcome) and [CS229: Machine Learning](http://cs229.stanford.edu/). Here is [my GitHub repo](https://github.com/wangruinju/Machine-Learning-Coursera) for Andrew's Machine Learning course as guidance if needed.
173 | 
174 | # Other Resources
175 | 
176 | [Deep Learning textbook by Ian Goodfellow, Yoshua Bengio, and Aaron Courville](http://www.deeplearningbook.org/)
177 | 
178 | [Cheat Sheets for Deep Learning](https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463)
179 | 
180 | [Deep Learning Projects](http://www.samyzaf.com/ML/)
181 | 
182 | [TensorFlow and Deep Learning without a PhD (LOL)](https://www.youtube.com/watch?v=u4alGiomYP4)
183 | 
184 | * CMU machine learning
185 | 
186 | [Introduction to Machine Learning](http://www.cs.cmu.edu/~epxing/Class/10701/lecture.html)
187 | 
188 | [Advanced Introduction to Machine Learning](http://www.cs.cmu.edu/~epxing/Class/10715/lecture.html)
189 | 
190 | * UBC machine learning
191 | 
192 | [Machine Learning and Data Mining](https://www.cs.ubc.ca/~schmidtm/Courses/340-F17/)
193 | 
194 | [Machine Learning](https://www.cs.ubc.ca/~schmidtm/Courses/540-W17/)
195 | 
196 | * Hung-yi Lee videos
197 | 
198 | [Machine Learning and Deep Learning resources](http://speech.ee.ntu.edu.tw/~tlkagk/talk.html)
199 | 
--------------------------------------------------------------------------------
/course1_neural_networks_and_deep_learning/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangruinju/Deep-Learning/e4c2b9d04898bd6113ef32377732915d4282b838/course1_neural_networks_and_deep_learning/.DS_Store
--------------------------------------------------------------------------------
/course1_neural_networks_and_deep_learning/Coursera NEST8656HYZJ.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wangruinju/Deep-Learning/e4c2b9d04898bd6113ef32377732915d4282b838/course1_neural_networks_and_deep_learning/Coursera NEST8656HYZJ.pdf
--------------------------------------------------------------------------------
/course1_neural_networks_and_deep_learning/Python+Basics+With+Numpy+v3.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Python Basics with Numpy (optional assignment)\n",
8 | "\n",
9 | "Welcome to your first assignment. This exercise gives you a brief introduction to Python. Even if you've used Python before, this will help familiarize you with functions we'll need. \n",
10 | "\n",
11 | "**Instructions:**\n",
12 | "- You will be using Python 3.\n",
13 | "- Avoid using for-loops and while-loops, unless you are explicitly told to do so.\n",
14 | "- Do not modify the (# GRADED FUNCTION [function name]) comment in some cells. Your work would not be graded if you change this. Each cell containing that comment should only contain one function.\n",
15 | "- After coding your function, run the cell right below it to check if your result is correct.\n",
16 | "\n",
17 | "**After this assignment you will:**\n",
18 | "- Be able to use iPython Notebooks\n",
19 | "- Be able to use numpy functions and numpy matrix/vector operations\n",
20 | "- Understand the concept of \"broadcasting\"\n",
21 | "- Be able to vectorize code\n",
22 | "\n",
23 | "Let's get started!"
24 | ]
25 | },
26 | {
27 | "cell_type": "markdown",
28 | "metadata": {},
29 | "source": [
30 | "## About iPython Notebooks ##\n",
31 | "\n",
32 | "iPython Notebooks are interactive coding environments embedded in a webpage. You will be using iPython notebooks in this class. 
You only need to write code between the ### START CODE HERE ### and ### END CODE HERE ### comments. After writing your code, you can run the cell by either pressing \"SHIFT\"+\"ENTER\" or by clicking on \"Run Cell\" (denoted by a play symbol) in the upper bar of the notebook. \n", 33 | "\n", 34 | "We will often specify \"(≈ X lines of code)\" in the comments to tell you about how much code you need to write. It is just a rough estimate, so don't feel bad if your code is longer or shorter.\n", 35 | "\n", 36 | "**Exercise**: Set test to `\"Hello World\"` in the cell below to print \"Hello World\" and run the two cells below." 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": 5, 42 | "metadata": { 43 | "collapsed": false 44 | }, 45 | "outputs": [], 46 | "source": [ 47 | "### START CODE HERE ### (≈ 1 line of code)\n", 48 | "test = 'Hello World'\n", 49 | "### END CODE HERE ###" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 6, 55 | "metadata": { 56 | "collapsed": false 57 | }, 58 | "outputs": [ 59 | { 60 | "name": "stdout", 61 | "output_type": "stream", 62 | "text": [ 63 | "test: Hello World\n" 64 | ] 65 | } 66 | ], 67 | "source": [ 68 | "print (\"test: \" + test)" 69 | ] 70 | }, 71 | { 72 | "cell_type": "markdown", 73 | "metadata": {}, 74 | "source": [ 75 | "**Expected output**:\n", 76 | "test: Hello World" 77 | ] 78 | }, 79 | { 80 | "cell_type": "markdown", 81 | "metadata": {}, 82 | "source": [ 83 | "\n", 84 | "**What you need to remember**:\n", 85 | "- Run your cells using SHIFT+ENTER (or \"Run cell\")\n", 86 | "- Write code in the designated areas using Python 3 only\n", 87 | "- Do not modify the code outside of the designated areas" 88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": {}, 93 | "source": [ 94 | "## 1 - Building basic functions with numpy ##\n", 95 | "\n", 96 | "Numpy is the main package for scientific computing in Python. It is maintained by a large community (www.numpy.org). In this exercise you will learn several key numpy functions such as np.exp, np.log, and np.reshape. You will need to know how to use these functions for future assignments.\n", 97 | "\n", 98 | "### 1.1 - sigmoid function, np.exp() ###\n", 99 | "\n", 100 | "Before using np.exp(), you will use math.exp() to implement the sigmoid function. You will then see why np.exp() is preferable to math.exp().\n", 101 | "\n", 102 | "**Exercise**: Build a function that returns the sigmoid of a real number x. Use math.exp(x) for the exponential function.\n", 103 | "\n", 104 | "**Reminder**:\n", 105 | "$sigmoid(x) = \\frac{1}{1+e^{-x}}$ is sometimes also known as the logistic function. It is a non-linear function used not only in Machine Learning (Logistic Regression), but also in Deep Learning.\n", 106 | "\n", 107 | "\n", 108 | "\n", 109 | "To refer to a function belonging to a specific package you could call it using package_name.function(). Run the code below to see an example with math.exp()." 
110 | ]
111 | },
112 | {
113 | "cell_type": "code",
114 | "execution_count": 14,
115 | "metadata": {
116 | "collapsed": true
117 | },
118 | "outputs": [],
119 | "source": [
120 | "# GRADED FUNCTION: basic_sigmoid\n",
121 | "\n",
122 | "import math\n",
123 | "import numpy as np\n",
124 | "\n",
125 | "def basic_sigmoid(x):\n",
126 | " \"\"\"\n",
127 | " Compute sigmoid of x.\n",
128 | "\n",
129 | " Arguments:\n",
130 | " x -- A scalar\n",
131 | "\n",
132 | " Return:\n",
133 | " s -- sigmoid(x)\n",
134 | " \"\"\"\n",
135 | " \n",
136 | " ### START CODE HERE ### (≈ 1 line of code)\n",
137 | " s = 1/(1+math.exp(-x))\n",
138 | " ### END CODE HERE ###\n",
139 | " \n",
140 | " return s"
141 | ]
142 | },
143 | {
144 | "cell_type": "code",
145 | "execution_count": 15,
146 | "metadata": {
147 | "collapsed": false
148 | },
149 | "outputs": [
150 | {
151 | "data": {
152 | "text/plain": [
153 | "0.9525741268224334"
154 | ]
155 | },
156 | "execution_count": 15,
157 | "metadata": {},
158 | "output_type": "execute_result"
159 | }
160 | ],
161 | "source": [
162 | "basic_sigmoid(3)"
163 | ]
164 | },
165 | {
166 | "cell_type": "markdown",
167 | "metadata": {},
168 | "source": [
169 | "**Expected Output**: \n",
170 | "\n",
171 | "<table>\n",
172 | " <tr>\n",
173 | " <td>** basic_sigmoid(3) **</td>\n",
174 | " <td>0.9525741268224334</td>\n",
175 | " </tr>\n",
176 | "</table>"
177 | ]
178 | },
179 | {
180 | "cell_type": "markdown",
181 | "metadata": {},
182 | "source": [
183 | "Actually, we rarely use the \"math\" library in deep learning because the inputs of the functions are real numbers. In deep learning we mostly use matrices and vectors. This is why numpy is more useful. "
184 | ]
185 | },
186 | {
187 | "cell_type": "code",
188 | "execution_count": 16,
189 | "metadata": {
190 | "collapsed": false
191 | },
192 | "outputs": [
193 | {
194 | "ename": "TypeError",
195 | "evalue": "bad operand type for unary -: 'list'",
196 | "output_type": "error",
197 | "traceback": [
198 | "Traceback (most recent call last):",
199 | " in basic_sigmoid(x): s = 1/(1+math.exp(-x))",
200 | "TypeError: bad operand type for unary -: 'list'"
201 | ]
202 | }
203 | ],
204 | "source": [
205 | "### One reason why we use \"numpy\" instead of \"math\" in Deep Learning ###\n",
206 | "x = [1, 2, 3]\n",
207 | "basic_sigmoid(x) # you will see this give an error when you run it, because x is a vector."
208 | ]
209 | },
210 | {
211 | "cell_type": "markdown",
212 | "metadata": {},
213 | "source": [
214 | "In fact, if $ x = (x_1, x_2, ..., x_n)$ is a row vector then $np.exp(x)$ will apply the exponential function to every element of x. The output will thus be: $np.exp(x) = (e^{x_1}, e^{x_2}, ..., e^{x_n})$"
215 | ]
216 | },
217 | {
218 | "cell_type": "code",
219 | "execution_count": 17,
220 | "metadata": {
221 | "collapsed": false
222 | },
223 | "outputs": [
224 | {
225 | "name": "stdout",
226 | "output_type": "stream",
227 | "text": [
228 | "[ 2.71828183 7.3890561 20.08553692]\n"
229 | ]
230 | }
231 | ],
232 | "source": [
233 | "import numpy as np\n",
234 | "\n",
235 | "# example of np.exp\n",
236 | "x = np.array([1, 2, 3])\n",
237 | "print(np.exp(x)) # result is (exp(1), exp(2), exp(3))"
238 | ]
239 | },
240 | {
241 | "cell_type": "markdown",
242 | "metadata": {},
243 | "source": [
244 | "Furthermore, if x is a vector, then a Python operation such as $s = x + 3$ or $s = \\frac{1}{x}$ will output s as a vector of the same size as x."
245 | ]
246 | },
247 | {
248 | "cell_type": "code",
249 | "execution_count": 18,
250 | "metadata": {
251 | "collapsed": false
252 | },
253 | "outputs": [
254 | {
255 | "name": "stdout",
256 | "output_type": "stream",
257 | "text": [
258 | "[4 5 6]\n"
259 | ]
260 | }
261 | ],
262 | "source": [
263 | "# example of vector operation\n",
264 | "x = np.array([1, 2, 3])\n",
265 | "print (x + 3)"
266 | ]
267 | },
268 | {
269 | "cell_type": "markdown",
270 | "metadata": {},
271 | "source": [
272 | "Any time you need more info on a numpy function, we encourage you to look at [the official documentation](https://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.exp.html). \n",
273 | "\n",
274 | "You can also create a new cell in the notebook and write `np.exp?` (for example) to get quick access to the documentation.\n",
275 | "\n",
276 | "**Exercise**: Implement the sigmoid function using numpy. \n",
277 | "\n",
278 | "**Instructions**: x could now be either a real number, a vector, or a matrix. The data structures we use in numpy to represent these shapes (vectors, matrices...) are called numpy arrays. You don't need to know more for now.\n",
279 | "$$ \\text{For } x \\in \\mathbb{R}^n \\text{, } sigmoid(x) = sigmoid\\begin{pmatrix}\n",
280 | " x_1 \\\\\n",
281 | " x_2 \\\\\n",
282 | " ... \\\\\n",
283 | " x_n \\\\\n",
284 | "\\end{pmatrix} = \\begin{pmatrix}\n",
285 | " \\frac{1}{1+e^{-x_1}} \\\\\n",
286 | " \\frac{1}{1+e^{-x_2}} \\\\\n",
287 | " ... 
\\\\\n", 288 | " \\frac{1}{1+e^{-x_n}} \\\\\n", 289 | "\\end{pmatrix}\\tag{1} $$" 290 | ] 291 | }, 292 | { 293 | "cell_type": "code", 294 | "execution_count": 19, 295 | "metadata": { 296 | "collapsed": false 297 | }, 298 | "outputs": [], 299 | "source": [ 300 | "# GRADED FUNCTION: sigmoid\n", 301 | "\n", 302 | "import numpy as np # this means you can access numpy functions by writing np.function() instead of numpy.function()\n", 303 | "\n", 304 | "def sigmoid(x):\n", 305 | " \"\"\"\n", 306 | " Compute the sigmoid of x\n", 307 | "\n", 308 | " Arguments:\n", 309 | " x -- A scalar or numpy array of any size\n", 310 | "\n", 311 | " Return:\n", 312 | " s -- sigmoid(x)\n", 313 | " \"\"\"\n", 314 | " \n", 315 | " ### START CODE HERE ### (≈ 1 line of code)\n", 316 | " s = 1/(1+1/np.exp(x))\n", 317 | " ### END CODE HERE ###\n", 318 | " \n", 319 | " return s" 320 | ] 321 | }, 322 | { 323 | "cell_type": "code", 324 | "execution_count": 20, 325 | "metadata": { 326 | "collapsed": false 327 | }, 328 | "outputs": [ 329 | { 330 | "data": { 331 | "text/plain": [ 332 | "array([ 0.73105858, 0.88079708, 0.95257413])" 333 | ] 334 | }, 335 | "execution_count": 20, 336 | "metadata": {}, 337 | "output_type": "execute_result" 338 | } 339 | ], 340 | "source": [ 341 | "x = np.array([1, 2, 3])\n", 342 | "sigmoid(x)" 343 | ] 344 | }, 345 | { 346 | "cell_type": "markdown", 347 | "metadata": {}, 348 | "source": [ 349 | "**Expected Output**: \n", 350 | "\n", 351 | " \n", 352 | " \n", 353 | " \n", 354 | " \n", 355 | "
**sigmoid([1,2,3])** array([ 0.73105858, 0.88079708, 0.95257413])
\n" 356 | ] 357 | }, 358 | { 359 | "cell_type": "markdown", 360 | "metadata": {}, 361 | "source": [ 362 | "### 1.2 - Sigmoid gradient\n", 363 | "\n", 364 | "As you've seen in lecture, you will need to compute gradients to optimize loss functions using backpropagation. Let's code your first gradient function.\n", 365 | "\n", 366 | "**Exercise**: Implement the function sigmoid_grad() to compute the gradient of the sigmoid function with respect to its input x. The formula is: $$sigmoid\\_derivative(x) = \\sigma'(x) = \\sigma(x) (1 - \\sigma(x))\\tag{2}$$\n", 367 | "You often code this function in two steps:\n", 368 | "1. Set s to be the sigmoid of x. You might find your sigmoid(x) function useful.\n", 369 | "2. Compute $\\sigma'(x) = s(1-s)$" 370 | ] 371 | }, 372 | { 373 | "cell_type": "code", 374 | "execution_count": 21, 375 | "metadata": { 376 | "collapsed": false 377 | }, 378 | "outputs": [], 379 | "source": [ 380 | "# GRADED FUNCTION: sigmoid_derivative\n", 381 | "\n", 382 | "def sigmoid_derivative(x):\n", 383 | " \"\"\"\n", 384 | " Compute the gradient (also called the slope or derivative) of the sigmoid function with respect to its input x.\n", 385 | " You can store the output of the sigmoid function into variables and then use it to calculate the gradient.\n", 386 | " \n", 387 | " Arguments:\n", 388 | " x -- A scalar or numpy array\n", 389 | "\n", 390 | " Return:\n", 391 | " ds -- Your computed gradient.\n", 392 | " \"\"\"\n", 393 | " \n", 394 | " ### START CODE HERE ### (≈ 2 lines of code)\n", 395 | " s = 1/(1+1/np.exp(x))\n", 396 | " ds = s*(1-s)\n", 397 | " ### END CODE HERE ###\n", 398 | " \n", 399 | " return ds" 400 | ] 401 | }, 402 | { 403 | "cell_type": "code", 404 | "execution_count": 22, 405 | "metadata": { 406 | "collapsed": false 407 | }, 408 | "outputs": [ 409 | { 410 | "name": "stdout", 411 | "output_type": "stream", 412 | "text": [ 413 | "sigmoid_derivative(x) = [ 0.19661193 0.10499359 0.04517666]\n" 414 | ] 415 | } 416 | ], 417 | "source": [ 418 | "x = np.array([1, 2, 3])\n", 419 | "print (\"sigmoid_derivative(x) = \" + str(sigmoid_derivative(x)))" 420 | ] 421 | }, 422 | { 423 | "cell_type": "markdown", 424 | "metadata": {}, 425 | "source": [ 426 | "**Expected Output**: \n", 427 | "\n", 428 | "\n", 429 | "\n", 430 | " \n", 431 | " \n", 432 | " \n", 433 | " \n", 434 | "
**sigmoid_derivative([1,2,3])** [ 0.19661193 0.10499359 0.04517666]
\n", 435 | "\n" 436 | ] 437 | }, 438 | { 439 | "cell_type": "markdown", 440 | "metadata": {}, 441 | "source": [ 442 | "### 1.3 - Reshaping arrays ###\n", 443 | "\n", 444 | "Two common numpy functions used in deep learning are [np.shape](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.shape.html) and [np.reshape()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html). \n", 445 | "- X.shape is used to get the shape (dimension) of a matrix/vector X. \n", 446 | "- X.reshape(...) is used to reshape X into some other dimension. \n", 447 | "\n", 448 | "For example, in computer science, an image is represented by a 3D array of shape $(length, height, depth = 3)$. However, when you read an image as the input of an algorithm you convert it to a vector of shape $(length*height*3, 1)$. In other words, you \"unroll\", or reshape, the 3D array into a 1D vector.\n", 449 | "\n", 450 | "\n", 451 | "\n", 452 | "**Exercise**: Implement `image2vector()` that takes an input of shape (length, height, 3) and returns a vector of shape (length\\*height\\*3, 1). For example, if you would like to reshape an array v of shape (a, b, c) into a vector of shape (a*b,c) you would do:\n", 453 | "``` python\n", 454 | "v = v.reshape((v.shape[0]*v.shape[1], v.shape[2])) # v.shape[0] = a ; v.shape[1] = b ; v.shape[2] = c\n", 455 | "```\n", 456 | "- Please don't hardcode the dimensions of image as a constant. Instead look up the quantities you need with `image.shape[0]`, etc. " 457 | ] 458 | }, 459 | { 460 | "cell_type": "code", 461 | "execution_count": 24, 462 | "metadata": { 463 | "collapsed": false 464 | }, 465 | "outputs": [], 466 | "source": [ 467 | "# GRADED FUNCTION: image2vector\n", 468 | "def image2vector(image):\n", 469 | " \"\"\"\n", 470 | " Argument:\n", 471 | " image -- a numpy array of shape (length, height, depth)\n", 472 | " \n", 473 | " Returns:\n", 474 | " v -- a vector of shape (length*height*depth, 1)\n", 475 | " \"\"\"\n", 476 | " \n", 477 | " ### START CODE HERE ### (≈ 1 line of code)\n", 478 | " v = image.reshape(image.shape[0]*image.shape[1]*image.shape[2], 1)\n", 479 | " ### END CODE HERE ###\n", 480 | " \n", 481 | " return v" 482 | ] 483 | }, 484 | { 485 | "cell_type": "code", 486 | "execution_count": 25, 487 | "metadata": { 488 | "collapsed": false 489 | }, 490 | "outputs": [ 491 | { 492 | "name": "stdout", 493 | "output_type": "stream", 494 | "text": [ 495 | "image2vector(image) = [[ 0.67826139]\n", 496 | " [ 0.29380381]\n", 497 | " [ 0.90714982]\n", 498 | " [ 0.52835647]\n", 499 | " [ 0.4215251 ]\n", 500 | " [ 0.45017551]\n", 501 | " [ 0.92814219]\n", 502 | " [ 0.96677647]\n", 503 | " [ 0.85304703]\n", 504 | " [ 0.52351845]\n", 505 | " [ 0.19981397]\n", 506 | " [ 0.27417313]\n", 507 | " [ 0.60659855]\n", 508 | " [ 0.00533165]\n", 509 | " [ 0.10820313]\n", 510 | " [ 0.49978937]\n", 511 | " [ 0.34144279]\n", 512 | " [ 0.94630077]]\n" 513 | ] 514 | } 515 | ], 516 | "source": [ 517 | "# This is a 3 by 3 by 2 array, typically images will be (num_px_x, num_px_y,3) where 3 represents the RGB values\n", 518 | "image = np.array([[[ 0.67826139, 0.29380381],\n", 519 | " [ 0.90714982, 0.52835647],\n", 520 | " [ 0.4215251 , 0.45017551]],\n", 521 | "\n", 522 | " [[ 0.92814219, 0.96677647],\n", 523 | " [ 0.85304703, 0.52351845],\n", 524 | " [ 0.19981397, 0.27417313]],\n", 525 | "\n", 526 | " [[ 0.60659855, 0.00533165],\n", 527 | " [ 0.10820313, 0.49978937],\n", 528 | " [ 0.34144279, 0.94630077]]])\n", 529 | "\n", 530 | "print (\"image2vector(image) = \" + 
str(image2vector(image)))" 531 | ] 532 | }, 533 | { 534 | "cell_type": "markdown", 535 | "metadata": {}, 536 | "source": [ 537 | "**Expected Output**: \n", 538 | "\n", 539 | "\n", 540 | "\n", 541 | " \n", 542 | " \n", 543 | " \n", 561 | " \n", 562 | " \n", 563 | " \n", 564 | "
**image2vector(image)** [[ 0.67826139]\n", 544 | " [ 0.29380381]\n", 545 | " [ 0.90714982]\n", 546 | " [ 0.52835647]\n", 547 | " [ 0.4215251 ]\n", 548 | " [ 0.45017551]\n", 549 | " [ 0.92814219]\n", 550 | " [ 0.96677647]\n", 551 | " [ 0.85304703]\n", 552 | " [ 0.52351845]\n", 553 | " [ 0.19981397]\n", 554 | " [ 0.27417313]\n", 555 | " [ 0.60659855]\n", 556 | " [ 0.00533165]\n", 557 | " [ 0.10820313]\n", 558 | " [ 0.49978937]\n", 559 | " [ 0.34144279]\n", 560 | " [ 0.94630077]]
" 565 | ] 566 | }, 567 | { 568 | "cell_type": "markdown", 569 | "metadata": {}, 570 | "source": [ 571 | "### 1.4 - Normalizing rows\n", 572 | "\n", 573 | "Another common technique we use in Machine Learning and Deep Learning is to normalize our data. It often leads to a better performance because gradient descent converges faster after normalization. Here, by normalization we mean changing x to $ \\frac{x}{\\| x\\|} $ (dividing each row vector of x by its norm).\n", 574 | "\n", 575 | "For example, if $$x = \n", 576 | "\\begin{bmatrix}\n", 577 | " 0 & 3 & 4 \\\\\n", 578 | " 2 & 6 & 4 \\\\\n", 579 | "\\end{bmatrix}\\tag{3}$$ then $$\\| x\\| = np.linalg.norm(x, axis = 1, keepdims = True) = \\begin{bmatrix}\n", 580 | " 5 \\\\\n", 581 | " \\sqrt{56} \\\\\n", 582 | "\\end{bmatrix}\\tag{4} $$and $$ x\\_normalized = \\frac{x}{\\| x\\|} = \\begin{bmatrix}\n", 583 | " 0 & \\frac{3}{5} & \\frac{4}{5} \\\\\n", 584 | " \\frac{2}{\\sqrt{56}} & \\frac{6}{\\sqrt{56}} & \\frac{4}{\\sqrt{56}} \\\\\n", 585 | "\\end{bmatrix}\\tag{5}$$ Note that you can divide matrices of different sizes and it works fine: this is called broadcasting and you're going to learn about it in part 5.\n", 586 | "\n", 587 | "\n", 588 | "**Exercise**: Implement normalizeRows() to normalize the rows of a matrix. After applying this function to an input matrix x, each row of x should be a vector of unit length (meaning length 1)." 589 | ] 590 | }, 591 | { 592 | "cell_type": "code", 593 | "execution_count": 28, 594 | "metadata": { 595 | "collapsed": false 596 | }, 597 | "outputs": [], 598 | "source": [ 599 | "# GRADED FUNCTION: normalizeRows\n", 600 | "\n", 601 | "def normalizeRows(x):\n", 602 | " \"\"\"\n", 603 | " Implement a function that normalizes each row of the matrix x (to have unit length).\n", 604 | " \n", 605 | " Argument:\n", 606 | " x -- A numpy matrix of shape (n, m)\n", 607 | " \n", 608 | " Returns:\n", 609 | " x -- The normalized (by row) numpy matrix. You are allowed to modify x.\n", 610 | " \"\"\"\n", 611 | " \n", 612 | " ### START CODE HERE ### (≈ 2 lines of code)\n", 613 | " # Compute x_norm as the norm 2 of x. Use np.linalg.norm(..., ord = 2, axis = ..., keepdims = True)\n", 614 | " x_norm = np.linalg.norm(x, axis = 1, keepdims = True)\n", 615 | " \n", 616 | " # Divide x by its norm.\n", 617 | " x = x / x_norm\n", 618 | " ### END CODE HERE ###\n", 619 | "\n", 620 | " return x" 621 | ] 622 | }, 623 | { 624 | "cell_type": "code", 625 | "execution_count": 29, 626 | "metadata": { 627 | "collapsed": false 628 | }, 629 | "outputs": [ 630 | { 631 | "name": "stdout", 632 | "output_type": "stream", 633 | "text": [ 634 | "normalizeRows(x) = [[ 0. 0.6 0.8 ]\n", 635 | " [ 0.13736056 0.82416338 0.54944226]]\n" 636 | ] 637 | } 638 | ], 639 | "source": [ 640 | "x = np.array([\n", 641 | " [0, 3, 4],\n", 642 | " [1, 6, 4]])\n", 643 | "print(\"normalizeRows(x) = \" + str(normalizeRows(x)))" 644 | ] 645 | }, 646 | { 647 | "cell_type": "markdown", 648 | "metadata": {}, 649 | "source": [ 650 | "**Expected Output**: \n", 651 | "\n", 652 | "\n", 653 | "\n", 654 | " \n", 655 | " \n", 656 | " \n", 658 | " \n", 659 | " \n", 660 | " \n", 661 | "
**normalizeRows(x)** [[ 0. 0.6 0.8 ]\n", 657 | " [ 0.13736056 0.82416338 0.54944226]]
" 662 | ] 663 | }, 664 | { 665 | "cell_type": "markdown", 666 | "metadata": {}, 667 | "source": [ 668 | "**Note**:\n", 669 | "In normalizeRows(), you can try to print the shapes of x_norm and x, and then rerun the assessment. You'll find out that they have different shapes. This is normal given that x_norm takes the norm of each row of x. So x_norm has the same number of rows but only 1 column. So how did it work when you divided x by x_norm? This is called broadcasting and we'll talk about it now! " 670 | ] 671 | }, 672 | { 673 | "cell_type": "markdown", 674 | "metadata": {}, 675 | "source": [ 676 | "### 1.5 - Broadcasting and the softmax function ####\n", 677 | "A very important concept to understand in numpy is \"broadcasting\". It is very useful for performing mathematical operations between arrays of different shapes. For the full details on broadcasting, you can read the official [broadcasting documentation](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)." 678 | ] 679 | }, 680 | { 681 | "cell_type": "markdown", 682 | "metadata": {}, 683 | "source": [ 684 | "**Exercise**: Implement a softmax function using numpy. You can think of softmax as a normalizing function used when your algorithm needs to classify two or more classes. You will learn more about softmax in the second course of this specialization.\n", 685 | "\n", 686 | "**Instructions**:\n", 687 | "- $ \\text{for } x \\in \\mathbb{R}^{1\\times n} \\text{, } softmax(x) = softmax(\\begin{bmatrix}\n", 688 | " x_1 &&\n", 689 | " x_2 &&\n", 690 | " ... &&\n", 691 | " x_n \n", 692 | "\\end{bmatrix}) = \\begin{bmatrix}\n", 693 | " \\frac{e^{x_1}}{\\sum_{j}e^{x_j}} &&\n", 694 | " \\frac{e^{x_2}}{\\sum_{j}e^{x_j}} &&\n", 695 | " ... &&\n", 696 | " \\frac{e^{x_n}}{\\sum_{j}e^{x_j}} \n", 697 | "\\end{bmatrix} $ \n", 698 | "\n", 699 | "- $\\text{for a matrix } x \\in \\mathbb{R}^{m \\times n} \\text{, $x_{ij}$ maps to the element in the $i^{th}$ row and $j^{th}$ column of $x$, thus we have: }$ $$softmax(x) = softmax\\begin{bmatrix}\n", 700 | " x_{11} & x_{12} & x_{13} & \\dots & x_{1n} \\\\\n", 701 | " x_{21} & x_{22} & x_{23} & \\dots & x_{2n} \\\\\n", 702 | " \\vdots & \\vdots & \\vdots & \\ddots & \\vdots \\\\\n", 703 | " x_{m1} & x_{m2} & x_{m3} & \\dots & x_{mn}\n", 704 | "\\end{bmatrix} = \\begin{bmatrix}\n", 705 | " \\frac{e^{x_{11}}}{\\sum_{j}e^{x_{1j}}} & \\frac{e^{x_{12}}}{\\sum_{j}e^{x_{1j}}} & \\frac{e^{x_{13}}}{\\sum_{j}e^{x_{1j}}} & \\dots & \\frac{e^{x_{1n}}}{\\sum_{j}e^{x_{1j}}} \\\\\n", 706 | " \\frac{e^{x_{21}}}{\\sum_{j}e^{x_{2j}}} & \\frac{e^{x_{22}}}{\\sum_{j}e^{x_{2j}}} & \\frac{e^{x_{23}}}{\\sum_{j}e^{x_{2j}}} & \\dots & \\frac{e^{x_{2n}}}{\\sum_{j}e^{x_{2j}}} \\\\\n", 707 | " \\vdots & \\vdots & \\vdots & \\ddots & \\vdots \\\\\n", 708 | " \\frac{e^{x_{m1}}}{\\sum_{j}e^{x_{mj}}} & \\frac{e^{x_{m2}}}{\\sum_{j}e^{x_{mj}}} & \\frac{e^{x_{m3}}}{\\sum_{j}e^{x_{mj}}} & \\dots & \\frac{e^{x_{mn}}}{\\sum_{j}e^{x_{mj}}}\n", 709 | "\\end{bmatrix} = \\begin{pmatrix}\n", 710 | " softmax\\text{(first row of x)} \\\\\n", 711 | " softmax\\text{(second row of x)} \\\\\n", 712 | " ... 
\\\\\n", 713 | " softmax\\text{(last row of x)} \\\\\n", 714 | "\\end{pmatrix} $$" 715 | ] 716 | }, 717 | { 718 | "cell_type": "code", 719 | "execution_count": 30, 720 | "metadata": { 721 | "collapsed": false 722 | }, 723 | "outputs": [], 724 | "source": [ 725 | "# GRADED FUNCTION: softmax\n", 726 | "\n", 727 | "def softmax(x):\n", 728 | " \"\"\"Calculates the softmax for each row of the input x.\n", 729 | "\n", 730 | " Your code should work for a row vector and also for matrices of shape (n, m).\n", 731 | "\n", 732 | " Argument:\n", 733 | " x -- A numpy matrix of shape (n,m)\n", 734 | "\n", 735 | " Returns:\n", 736 | " s -- A numpy matrix equal to the softmax of x, of shape (n,m)\n", 737 | " \"\"\"\n", 738 | " \n", 739 | " ### START CODE HERE ### (≈ 3 lines of code)\n", 740 | " # Apply exp() element-wise to x. Use np.exp(...).\n", 741 | " x_exp = np.exp(x)\n", 742 | "\n", 743 | " # Create a vector x_sum that sums each row of x_exp. Use np.sum(..., axis = 1, keepdims = True).\n", 744 | " x_sum = np.sum(x_exp, axis = 1, keepdims = True)\n", 745 | " \n", 746 | " # Compute softmax(x) by dividing x_exp by x_sum. It should automatically use numpy broadcasting.\n", 747 | " s = x_exp / x_sum\n", 748 | "\n", 749 | " ### END CODE HERE ###\n", 750 | " \n", 751 | " return s" 752 | ] 753 | }, 754 | { 755 | "cell_type": "code", 756 | "execution_count": 31, 757 | "metadata": { 758 | "collapsed": false 759 | }, 760 | "outputs": [ 761 | { 762 | "name": "stdout", 763 | "output_type": "stream", 764 | "text": [ 765 | "softmax(x) = [[ 9.80897665e-01 8.94462891e-04 1.79657674e-02 1.21052389e-04\n", 766 | " 1.21052389e-04]\n", 767 | " [ 8.78679856e-01 1.18916387e-01 8.01252314e-04 8.01252314e-04\n", 768 | " 8.01252314e-04]]\n" 769 | ] 770 | } 771 | ], 772 | "source": [ 773 | "x = np.array([\n", 774 | " [9, 2, 5, 0, 0],\n", 775 | " [7, 5, 0, 0 ,0]])\n", 776 | "print(\"softmax(x) = \" + str(softmax(x)))" 777 | ] 778 | }, 779 | { 780 | "cell_type": "markdown", 781 | "metadata": {}, 782 | "source": [ 783 | "**Expected Output**:\n", 784 | "\n", 785 | "\n", 786 | "\n", 787 | " \n", 788 | " \n", 789 | " \n", 793 | " \n", 794 | "
**softmax(x)** [[ 9.80897665e-01 8.94462891e-04 1.79657674e-02 1.21052389e-04\n", 790 | " 1.21052389e-04]\n", 791 | " [ 8.78679856e-01 1.18916387e-01 8.01252314e-04 8.01252314e-04\n", 792 | " 8.01252314e-04]]
\n" 795 | ] 796 | }, 797 | { 798 | "cell_type": "markdown", 799 | "metadata": {}, 800 | "source": [ 801 | "**Note**:\n", 802 | "- If you print the shapes of x_exp, x_sum and s above and rerun the assessment cell, you will see that x_sum is of shape (2,1) while x_exp and s are of shape (2,5). **x_exp/x_sum** works due to python broadcasting.\n", 803 | "\n", 804 | "Congratulations! You now have a pretty good understanding of python numpy and have implemented a few useful functions that you will be using in deep learning." 805 | ] 806 | }, 807 | { 808 | "cell_type": "markdown", 809 | "metadata": {}, 810 | "source": [ 811 | "\n", 812 | "**What you need to remember:**\n", 813 | "- np.exp(x) works for any np.array x and applies the exponential function to every coordinate\n", 814 | "- the sigmoid function and its gradient\n", 815 | "- image2vector is commonly used in deep learning\n", 816 | "- np.reshape is widely used. In the future, you'll see that keeping your matrix/vector dimensions straight will go toward eliminating a lot of bugs. \n", 817 | "- numpy has efficient built-in functions\n", 818 | "- broadcasting is extremely useful" 819 | ] 820 | }, 821 | { 822 | "cell_type": "markdown", 823 | "metadata": { 824 | "collapsed": true 825 | }, 826 | "source": [ 827 | "## 2) Vectorization" 828 | ] 829 | }, 830 | { 831 | "cell_type": "markdown", 832 | "metadata": {}, 833 | "source": [ 834 | "\n", 835 | "In deep learning, you deal with very large datasets. Hence, a non-computationally-optimal function can become a huge bottleneck in your algorithm and can result in a model that takes ages to run. To make sure that your code is computationally efficient, you will use vectorization. For example, try to tell the difference between the following implementations of the dot/outer/elementwise product." 836 | ] 837 | }, 838 | { 839 | "cell_type": "code", 840 | "execution_count": 37, 841 | "metadata": { 842 | "collapsed": false 843 | }, 844 | "outputs": [ 845 | { 846 | "name": "stdout", 847 | "output_type": "stream", 848 | "text": [ 849 | "dot = 278\n", 850 | " ----- Computation time = 0.15392999999996881ms\n", 851 | "outer = [[ 81. 18. 18. 81. 0. 81. 18. 45. 0. 0. 81. 18. 45. 0.\n", 852 | " 0.]\n", 853 | " [ 18. 4. 4. 18. 0. 18. 4. 10. 0. 0. 18. 4. 10. 0.\n", 854 | " 0.]\n", 855 | " [ 45. 10. 10. 45. 0. 45. 10. 25. 0. 0. 45. 10. 25. 0.\n", 856 | " 0.]\n", 857 | " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", 858 | " 0.]\n", 859 | " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", 860 | " 0.]\n", 861 | " [ 63. 14. 14. 63. 0. 63. 14. 35. 0. 0. 63. 14. 35. 0.\n", 862 | " 0.]\n", 863 | " [ 45. 10. 10. 45. 0. 45. 10. 25. 0. 0. 45. 10. 25. 0.\n", 864 | " 0.]\n", 865 | " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", 866 | " 0.]\n", 867 | " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", 868 | " 0.]\n", 869 | " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", 870 | " 0.]\n", 871 | " [ 81. 18. 18. 81. 0. 81. 18. 45. 0. 0. 81. 18. 45. 0.\n", 872 | " 0.]\n", 873 | " [ 18. 4. 4. 18. 0. 18. 4. 10. 0. 0. 18. 4. 10. 0.\n", 874 | " 0.]\n", 875 | " [ 45. 10. 10. 45. 0. 45. 10. 25. 0. 0. 45. 10. 25. 0.\n", 876 | " 0.]\n", 877 | " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", 878 | " 0.]\n", 879 | " [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n", 880 | " 0.]]\n", 881 | " ----- Computation time = 0.3455210000000708ms\n", 882 | "elementwise multiplication = [ 81. 4. 10. 0. 0. 63. 10. 0. 0. 0. 81. 4. 25. 0. 
0.]\n", 883 | " ----- Computation time = 0.1942270000001578ms\n", 884 | "gdot = [ 12.26563817 25.54968779 21.60902077]\n", 885 | " ----- Computation time = 0.24369900000009714ms\n" 886 | ] 887 | } 888 | ], 889 | "source": [ 890 | "import time\n", 891 | "\n", 892 | "x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]\n", 893 | "x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]\n", 894 | "\n", 895 | "### CLASSIC DOT PRODUCT OF VECTORS IMPLEMENTATION ###\n", 896 | "tic = time.process_time()\n", 897 | "dot = 0\n", 898 | "for i in range(len(x1)):\n", 899 | " dot+= x1[i]*x2[i]\n", 900 | "toc = time.process_time()\n", 901 | "print (\"dot = \" + str(dot) + \"\\n ----- Computation time = \" + str(1000*(toc - tic)) + \"ms\")\n", 902 | "\n", 903 | "### CLASSIC OUTER PRODUCT IMPLEMENTATION ###\n", 904 | "tic = time.process_time()\n", 905 | "outer = np.zeros((len(x1),len(x2))) # we create a len(x1)*len(x2) matrix with only zeros\n", 906 | "for i in range(len(x1)):\n", 907 | " for j in range(len(x2)):\n", 908 | " outer[i,j] = x1[i]*x2[j]\n", 909 | "toc = time.process_time()\n", 910 | "print (\"outer = \" + str(outer) + \"\\n ----- Computation time = \" + str(1000*(toc - tic)) + \"ms\")\n", 911 | "\n", 912 | "### CLASSIC ELEMENTWISE IMPLEMENTATION ###\n", 913 | "tic = time.process_time()\n", 914 | "mul = np.zeros(len(x1))\n", 915 | "for i in range(len(x1)):\n", 916 | " mul[i] = x1[i]*x2[i]\n", 917 | "toc = time.process_time()\n", 918 | "print (\"elementwise multiplication = \" + str(mul) + \"\\n ----- Computation time = \" + str(1000*(toc - tic)) + \"ms\")\n", 919 | "\n", 920 | "### CLASSIC GENERAL DOT PRODUCT IMPLEMENTATION ###\n", 921 | "W = np.random.rand(3,len(x1)) # Random 3*len(x1) numpy array\n", 922 | "tic = time.process_time()\n", 923 | "gdot = np.zeros(W.shape[0])\n", 924 | "for i in range(W.shape[0]):\n", 925 | " for j in range(len(x1)):\n", 926 | " gdot[i] += W[i,j]*x1[j]\n", 927 | "toc = time.process_time()\n", 928 | "print (\"gdot = \" + str(gdot) + \"\\n ----- Computation time = \" + str(1000*(toc - tic)) + \"ms\")" 929 | ] 930 | }, 931 | { 932 | "cell_type": "code", 933 | "execution_count": 38, 934 | "metadata": { 935 | "collapsed": false 936 | }, 937 | "outputs": [ 938 | { 939 | "name": "stdout", 940 | "output_type": "stream", 941 | "text": [ 942 | "dot = 278\n", 943 | " ----- Computation time = 0.18921100000013347ms\n", 944 | "outer = [[81 18 18 81 0 81 18 45 0 0 81 18 45 0 0]\n", 945 | " [18 4 4 18 0 18 4 10 0 0 18 4 10 0 0]\n", 946 | " [45 10 10 45 0 45 10 25 0 0 45 10 25 0 0]\n", 947 | " [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]\n", 948 | " [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]\n", 949 | " [63 14 14 63 0 63 14 35 0 0 63 14 35 0 0]\n", 950 | " [45 10 10 45 0 45 10 25 0 0 45 10 25 0 0]\n", 951 | " [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]\n", 952 | " [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]\n", 953 | " [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]\n", 954 | " [81 18 18 81 0 81 18 45 0 0 81 18 45 0 0]\n", 955 | " [18 4 4 18 0 18 4 10 0 0 18 4 10 0 0]\n", 956 | " [45 10 10 45 0 45 10 25 0 0 45 10 25 0 0]\n", 957 | " [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]\n", 958 | " [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]\n", 959 | " ----- Computation time = 0.1437939999999749ms\n", 960 | "elementwise multiplication = [81 4 10 0 0 63 10 0 0 0 81 4 25 0 0]\n", 961 | " ----- Computation time = 0.11161900000011826ms\n", 962 | "gdot = [ 12.26563817 25.54968779 21.60902077]\n", 963 | " ----- Computation time = 0.3983100000000128ms\n" 964 | ] 965 | } 966 | ], 967 | "source": [ 968 | "x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]\n", 969 | "x2 = [9, 2, 2, 9, 0, 
9, 2, 5, 0, 0, 9, 2, 5, 0, 0]\n", 970 | "\n", 971 | "### VECTORIZED DOT PRODUCT OF VECTORS ###\n", 972 | "tic = time.process_time()\n", 973 | "dot = np.dot(x1,x2)\n", 974 | "toc = time.process_time()\n", 975 | "print (\"dot = \" + str(dot) + \"\\n ----- Computation time = \" + str(1000*(toc - tic)) + \"ms\")\n", 976 | "\n", 977 | "### VECTORIZED OUTER PRODUCT ###\n", 978 | "tic = time.process_time()\n", 979 | "outer = np.outer(x1,x2)\n", 980 | "toc = time.process_time()\n", 981 | "print (\"outer = \" + str(outer) + \"\\n ----- Computation time = \" + str(1000*(toc - tic)) + \"ms\")\n", 982 | "\n", 983 | "### VECTORIZED ELEMENTWISE MULTIPLICATION ###\n", 984 | "tic = time.process_time()\n", 985 | "mul = np.multiply(x1,x2)\n", 986 | "toc = time.process_time()\n", 987 | "print (\"elementwise multiplication = \" + str(mul) + \"\\n ----- Computation time = \" + str(1000*(toc - tic)) + \"ms\")\n", 988 | "\n", 989 | "### VECTORIZED GENERAL DOT PRODUCT ###\n", 990 | "tic = time.process_time()\n", 991 | "dot = np.dot(W,x1)\n", 992 | "toc = time.process_time()\n", 993 | "print (\"gdot = \" + str(dot) + \"\\n ----- Computation time = \" + str(1000*(toc - tic)) + \"ms\")" 994 | ] 995 | }, 996 | { 997 | "cell_type": "markdown", 998 | "metadata": {}, 999 | "source": [ 1000 | "As you may have noticed, the vectorized implementation is much cleaner and more efficient. For bigger vectors/matrices, the differences in running time become even bigger. \n", 1001 | "\n", 1002 | "**Note** that `np.dot()` performs a matrix-matrix or matrix-vector multiplication. This is different from `np.multiply()` and the `*` operator (which is equivalent to `.*` in Matlab/Octave), which performs an element-wise multiplication." 1003 | ] 1004 | }, 1005 | { 1006 | "cell_type": "markdown", 1007 | "metadata": {}, 1008 | "source": [ 1009 | "### 2.1 Implement the L1 and L2 loss functions\n", 1010 | "\n", 1011 | "**Exercise**: Implement the numpy vectorized version of the L1 loss. You may find the function abs(x) (absolute value of x) useful.\n", 1012 | "\n", 1013 | "**Reminder**:\n", 1014 | "- The loss is used to evaluate the performance of your model. The bigger your loss is, the more different your predictions ($ \\hat{y} $) are from the true values ($y$). 
In deep learning, you use optimization algorithms like Gradient Descent to train your model and to minimize the cost.\n",
1015 | "- L1 loss is defined as:\n",
1016 | "$$\\begin{align*} & L_1(\\hat{y}, y) = \\sum_{i=0}^m|y^{(i)} - \\hat{y}^{(i)}| \\end{align*}\\tag{6}$$"
1017 | ]
1018 | },
1019 | {
1020 | "cell_type": "code",
1021 | "execution_count": 33,
1022 | "metadata": {
1023 | "collapsed": false
1024 | },
1025 | "outputs": [],
1026 | "source": [
1027 | "# GRADED FUNCTION: L1\n",
1028 | "\n",
1029 | "def L1(yhat, y):\n",
1030 | " \"\"\"\n",
1031 | " Arguments:\n",
1032 | " yhat -- vector of size m (predicted labels)\n",
1033 | " y -- vector of size m (true labels)\n",
1034 | " \n",
1035 | " Returns:\n",
1036 | " loss -- the value of the L1 loss function defined above\n",
1037 | " \"\"\"\n",
1038 | " \n",
1039 | " ### START CODE HERE ### (≈ 1 line of code)\n",
1040 | " loss = np.sum(np.abs(yhat - y))\n",
1041 | " ### END CODE HERE ###\n",
1042 | " \n",
1043 | " return loss"
1044 | ]
1045 | },
1046 | {
1047 | "cell_type": "code",
1048 | "execution_count": 34,
1049 | "metadata": {
1050 | "collapsed": false
1051 | },
1052 | "outputs": [
1053 | {
1054 | "name": "stdout",
1055 | "output_type": "stream",
1056 | "text": [
1057 | "L1 = 1.1\n"
1058 | ]
1059 | }
1060 | ],
1061 | "source": [
1062 | "yhat = np.array([.9, 0.2, 0.1, .4, .9])\n",
1063 | "y = np.array([1, 0, 0, 1, 1])\n",
1064 | "print(\"L1 = \" + str(L1(yhat,y)))"
1065 | ]
1066 | },
1067 | {
1068 | "cell_type": "markdown",
1069 | "metadata": {},
1070 | "source": [
1071 | "**Expected Output**:\n",
1072 | "\n",
1073 | "\n",
1074 | "<table>\n",
1075 | " <tr>\n",
1076 | " <td>**L1**</td>\n",
1077 | " <td>1.1</td>\n",
1078 | " </tr>\n",
1079 | "</table>"
1080 | ]
1081 | },
1082 | {
1083 | "cell_type": "markdown",
1084 | "metadata": {},
1085 | "source": [
1086 | "**Exercise**: Implement the numpy vectorized version of the L2 loss. There are several ways of implementing the L2 loss but you may find the function np.dot() useful. As a reminder, if $x = [x_1, x_2, ..., x_n]$, then `np.dot(x,x)` = $\\sum_{j=0}^n x_j^{2}$. \n",
1087 | "\n",
1088 | "- L2 loss is defined as $$\\begin{align*} & L_2(\\hat{y},y) = \\sum_{i=0}^m(y^{(i)} - \\hat{y}^{(i)})^2 \\end{align*}\\tag{7}$$"
1089 | ]
1090 | },
1091 | {
1092 | "cell_type": "code",
1093 | "execution_count": 35,
1094 | "metadata": {
1095 | "collapsed": false
1096 | },
1097 | "outputs": [],
1098 | "source": [
1099 | "# GRADED FUNCTION: L2\n",
1100 | "\n",
1101 | "def L2(yhat, y):\n",
1102 | " \"\"\"\n",
1103 | " Arguments:\n",
1104 | " yhat -- vector of size m (predicted labels)\n",
1105 | " y -- vector of size m (true labels)\n",
1106 | " \n",
1107 | " Returns:\n",
1108 | " loss -- the value of the L2 loss function defined above\n",
1109 | " \"\"\"\n",
1110 | " \n",
1111 | " ### START CODE HERE ### (≈ 1 line of code)\n",
1112 | " loss = np.sum(np.square(yhat - y))\n",
1113 | " ### END CODE HERE ###\n",
1114 | " \n",
1115 | " return loss"
1116 | ]
1117 | },
1118 | {
1119 | "cell_type": "code",
1120 | "execution_count": 36,
1121 | "metadata": {
1122 | "collapsed": false
1123 | },
1124 | "outputs": [
1125 | {
1126 | "name": "stdout",
1127 | "output_type": "stream",
1128 | "text": [
1129 | "L2 = 0.43\n"
1130 | ]
1131 | }
1132 | ],
1133 | "source": [
1134 | "yhat = np.array([.9, 0.2, 0.1, .4, .9])\n",
1135 | "y = np.array([1, 0, 0, 1, 1])\n",
1136 | "print(\"L2 = \" + str(L2(yhat,y)))"
1137 | ]
1138 | },
1139 | {
1140 | "cell_type": "markdown",
1141 | "metadata": {},
1142 | "source": [
1143 | "**Expected Output**: \n",
1144 | "\n",
1145 | "<table>\n",
1146 | " <tr>\n",
1147 | " <td>**L2**</td>\n",
1148 | " <td>0.43</td>\n",
1149 | " </tr></table>
" 1150 | ] 1151 | }, 1152 | { 1153 | "cell_type": "markdown", 1154 | "metadata": {}, 1155 | "source": [ 1156 | "Congratulations on completing this assignment. We hope that this little warm-up exercise helps you in the future assignments, which will be more exciting and interesting!" 1157 | ] 1158 | }, 1159 | { 1160 | "cell_type": "markdown", 1161 | "metadata": {}, 1162 | "source": [ 1163 | "\n", 1164 | "**What to remember:**\n", 1165 | "- Vectorization is very important in deep learning. It provides computational efficiency and clarity.\n", 1166 | "- You have reviewed the L1 and L2 loss.\n", 1167 | "- You are familiar with many numpy functions such as np.sum, np.dot, np.multiply, np.maximum, etc..." 1168 | ] 1169 | } 1170 | ], 1171 | "metadata": { 1172 | "coursera": { 1173 | "course_slug": "neural-networks-deep-learning", 1174 | "graded_item_id": "XHpfv", 1175 | "launcher_item_id": "Zh0CU" 1176 | }, 1177 | "kernelspec": { 1178 | "display_name": "Python 3", 1179 | "language": "python", 1180 | "name": "python3" 1181 | }, 1182 | "language_info": { 1183 | "codemirror_mode": { 1184 | "name": "ipython", 1185 | "version": 3 1186 | }, 1187 | "file_extension": ".py", 1188 | "mimetype": "text/x-python", 1189 | "name": "python", 1190 | "nbconvert_exporter": "python", 1191 | "pygments_lexer": "ipython3", 1192 | "version": "3.5.2" 1193 | } 1194 | }, 1195 | "nbformat": 4, 1196 | "nbformat_minor": 2 1197 | } 1198 | -------------------------------------------------------------------------------- /course2_improving_deep_neural_networks/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wangruinju/Deep-Learning/e4c2b9d04898bd6113ef32377732915d4282b838/course2_improving_deep_neural_networks/.DS_Store -------------------------------------------------------------------------------- /course2_improving_deep_neural_networks/Coursera D2ZZE6WAGLGD.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wangruinju/Deep-Learning/e4c2b9d04898bd6113ef32377732915d4282b838/course2_improving_deep_neural_networks/Coursera D2ZZE6WAGLGD.pdf -------------------------------------------------------------------------------- /course2_improving_deep_neural_networks/Gradient+Checking.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Gradient Checking\n", 8 | "\n", 9 | "Welcome to the final assignment for this week! In this assignment you will learn to implement and use gradient checking. \n", 10 | "\n", 11 | "You are part of a team working to make mobile payments available globally, and are asked to build a deep learning model to detect fraud--whenever someone makes a payment, you want to see if the payment might be fraudulent, such as if the user's account has been taken over by a hacker. \n", 12 | "\n", 13 | "But backpropagation is quite challenging to implement, and sometimes has bugs. Because this is a mission-critical application, your company's CEO wants to be really certain that your implementation of backpropagation is correct. Your CEO says, \"Give me a proof that your backpropagation is actually working!\" To give this reassurance, you are going to use \"gradient checking\".\n", 14 | "\n", 15 | "Let's do it!" 
16 | ] 17 | }, 18 | { 19 | "cell_type": "code", 20 | "execution_count": 7, 21 | "metadata": { 22 | "collapsed": true 23 | }, 24 | "outputs": [], 25 | "source": [ 26 | "# Packages\n", 27 | "import numpy as np\n", 28 | "from testCases import *\n", 29 | "from gc_utils import sigmoid, relu, dictionary_to_vector, vector_to_dictionary, gradients_to_vector" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": {}, 35 | "source": [ 36 | "## 1) How does gradient checking work?\n", 37 | "\n", 38 | "Backpropagation computes the gradients $\\frac{\\partial J}{\\partial \\theta}$, where $\\theta$ denotes the parameters of the model. $J$ is computed using forward propagation and your loss function.\n", 39 | "\n", 40 | "Because forward propagation is relatively easy to implement, you're confident you got that right, and so you're almost 100% sure that you're computing the cost $J$ correctly. Thus, you can use your code for computing $J$ to verify the code for computing $\\frac{\\partial J}{\\partial \\theta}$. \n", 41 | "\n", 42 | "Let's look back at the definition of a derivative (or gradient):\n", 43 | "$$ \\frac{\\partial J}{\\partial \\theta} = \\lim_{\\varepsilon \\to 0} \\frac{J(\\theta + \\varepsilon) - J(\\theta - \\varepsilon)}{2 \\varepsilon} \\tag{1}$$\n", 44 | "\n", 45 | "If you're not familiar with the \"$\\displaystyle \\lim_{\\varepsilon \\to 0}$\" notation, it's just a way of saying \"when $\\varepsilon$ is really really small.\"\n", 46 | "\n", 47 | "We know the following:\n", 48 | "\n", 49 | "- $\\frac{\\partial J}{\\partial \\theta}$ is what you want to make sure you're computing correctly. \n", 50 | "- You can compute $J(\\theta + \\varepsilon)$ and $J(\\theta - \\varepsilon)$ (in the case that $\\theta$ is a real number), since you're confident your implementation for $J$ is correct. \n", 51 | "\n", 52 | "Lets use equation (1) and a small value for $\\varepsilon$ to convince your CEO that your code for computing $\\frac{\\partial J}{\\partial \\theta}$ is correct!" 53 | ] 54 | }, 55 | { 56 | "cell_type": "markdown", 57 | "metadata": {}, 58 | "source": [ 59 | "## 2) 1-dimensional gradient checking\n", 60 | "\n", 61 | "Consider a 1D linear function $J(\\theta) = \\theta x$. The model contains only a single real-valued parameter $\\theta$, and takes $x$ as input.\n", 62 | "\n", 63 | "You will implement code to compute $J(.)$ and its derivative $\\frac{\\partial J}{\\partial \\theta}$. You will then use gradient checking to make sure your derivative computation for $J$ is correct. \n", 64 | "\n", 65 | "\n", 66 | "
**Figure 1** : **1D linear model**
\n", 67 | "\n", 68 | "The diagram above shows the key computation steps: First start with $x$, then evaluate the function $J(x)$ (\"forward propagation\"). Then compute the derivative $\\frac{\\partial J}{\\partial \\theta}$ (\"backward propagation\"). \n", 69 | "\n", 70 | "**Exercise**: implement \"forward propagation\" and \"backward propagation\" for this simple function. I.e., compute both $J(.)$ (\"forward propagation\") and its derivative with respect to $\\theta$ (\"backward propagation\"), in two separate functions. " 71 | ] 72 | }, 73 | { 74 | "cell_type": "code", 75 | "execution_count": 1, 76 | "metadata": { 77 | "collapsed": true 78 | }, 79 | "outputs": [], 80 | "source": [ 81 | "# GRADED FUNCTION: forward_propagation\n", 82 | "\n", 83 | "def forward_propagation(x, theta):\n", 84 | " \"\"\"\n", 85 | " Implement the linear forward propagation (compute J) presented in Figure 1 (J(theta) = theta * x)\n", 86 | " \n", 87 | " Arguments:\n", 88 | " x -- a real-valued input\n", 89 | " theta -- our parameter, a real number as well\n", 90 | " \n", 91 | " Returns:\n", 92 | " J -- the value of function J, computed using the formula J(theta) = theta * x\n", 93 | " \"\"\"\n", 94 | " \n", 95 | " ### START CODE HERE ### (approx. 1 line)\n", 96 | " J = theta*x\n", 97 | " ### END CODE HERE ###\n", 98 | " \n", 99 | " return J" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": 2, 105 | "metadata": {}, 106 | "outputs": [ 107 | { 108 | "name": "stdout", 109 | "output_type": "stream", 110 | "text": [ 111 | "J = 8\n" 112 | ] 113 | } 114 | ], 115 | "source": [ 116 | "x, theta = 2, 4\n", 117 | "J = forward_propagation(x, theta)\n", 118 | "print (\"J = \" + str(J))" 119 | ] 120 | }, 121 | { 122 | "cell_type": "markdown", 123 | "metadata": {}, 124 | "source": [ 125 | "**Expected Output**:\n", 126 | "\n", 127 | "\n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | "
** J ** 8
" 133 | ] 134 | }, 135 | { 136 | "cell_type": "markdown", 137 | "metadata": {}, 138 | "source": [ 139 | "**Exercise**: Now, implement the backward propagation step (derivative computation) of Figure 1. That is, compute the derivative of $J(\\theta) = \\theta x$ with respect to $\\theta$. To save you from doing the calculus, you should get $dtheta = \\frac { \\partial J }{ \\partial \\theta} = x$." 140 | ] 141 | }, 142 | { 143 | "cell_type": "code", 144 | "execution_count": 3, 145 | "metadata": { 146 | "collapsed": true 147 | }, 148 | "outputs": [], 149 | "source": [ 150 | "# GRADED FUNCTION: backward_propagation\n", 151 | "\n", 152 | "def backward_propagation(x, theta):\n", 153 | " \"\"\"\n", 154 | " Computes the derivative of J with respect to theta (see Figure 1).\n", 155 | " \n", 156 | " Arguments:\n", 157 | " x -- a real-valued input\n", 158 | " theta -- our parameter, a real number as well\n", 159 | " \n", 160 | " Returns:\n", 161 | " dtheta -- the gradient of the cost with respect to theta\n", 162 | " \"\"\"\n", 163 | " \n", 164 | " ### START CODE HERE ### (approx. 1 line)\n", 165 | " dtheta = x\n", 166 | " ### END CODE HERE ###\n", 167 | " \n", 168 | " return dtheta" 169 | ] 170 | }, 171 | { 172 | "cell_type": "code", 173 | "execution_count": 4, 174 | "metadata": { 175 | "scrolled": true 176 | }, 177 | "outputs": [ 178 | { 179 | "name": "stdout", 180 | "output_type": "stream", 181 | "text": [ 182 | "dtheta = 2\n" 183 | ] 184 | } 185 | ], 186 | "source": [ 187 | "x, theta = 2, 4\n", 188 | "dtheta = backward_propagation(x, theta)\n", 189 | "print (\"dtheta = \" + str(dtheta))" 190 | ] 191 | }, 192 | { 193 | "cell_type": "markdown", 194 | "metadata": {}, 195 | "source": [ 196 | "**Expected Output**:\n", 197 | "\n", 198 | "\n", 199 | " \n", 200 | " \n", 201 | " \n", 202 | " \n", 203 | "
** dtheta ** 2
" 204 | ] 205 | }, 206 | { 207 | "cell_type": "markdown", 208 | "metadata": {}, 209 | "source": [ 210 | "**Exercise**: To show that the `backward_propagation()` function is correctly computing the gradient $\\frac{\\partial J}{\\partial \\theta}$, let's implement gradient checking.\n", 211 | "\n", 212 | "**Instructions**:\n", 213 | "- First compute \"gradapprox\" using the formula above (1) and a small value of $\\varepsilon$. Here are the Steps to follow:\n", 214 | " 1. $\\theta^{+} = \\theta + \\varepsilon$\n", 215 | " 2. $\\theta^{-} = \\theta - \\varepsilon$\n", 216 | " 3. $J^{+} = J(\\theta^{+})$\n", 217 | " 4. $J^{-} = J(\\theta^{-})$\n", 218 | " 5. $gradapprox = \\frac{J^{+} - J^{-}}{2 \\varepsilon}$\n", 219 | "- Then compute the gradient using backward propagation, and store the result in a variable \"grad\"\n", 220 | "- Finally, compute the relative difference between \"gradapprox\" and the \"grad\" using the following formula:\n", 221 | "$$ difference = \\frac {\\mid\\mid grad - gradapprox \\mid\\mid_2}{\\mid\\mid grad \\mid\\mid_2 + \\mid\\mid gradapprox \\mid\\mid_2} \\tag{2}$$\n", 222 | "You will need 3 Steps to compute this formula:\n", 223 | " - 1'. compute the numerator using np.linalg.norm(...)\n", 224 | " - 2'. compute the denominator. You will need to call np.linalg.norm(...) twice.\n", 225 | " - 3'. divide them.\n", 226 | "- If this difference is small (say less than $10^{-7}$), you can be quite confident that you have computed your gradient correctly. Otherwise, there may be a mistake in the gradient computation. \n" 227 | ] 228 | }, 229 | { 230 | "cell_type": "code", 231 | "execution_count": 10, 232 | "metadata": { 233 | "collapsed": true 234 | }, 235 | "outputs": [], 236 | "source": [ 237 | "# GRADED FUNCTION: gradient_check\n", 238 | "\n", 239 | "def gradient_check(x, theta, epsilon = 1e-7):\n", 240 | " \"\"\"\n", 241 | " Implement the backward propagation presented in Figure 1.\n", 242 | " \n", 243 | " Arguments:\n", 244 | " x -- a real-valued input\n", 245 | " theta -- our parameter, a real number as well\n", 246 | " epsilon -- tiny shift to the input to compute approximated gradient with formula(1)\n", 247 | " \n", 248 | " Returns:\n", 249 | " difference -- difference (2) between the approximated gradient and the backward propagation gradient\n", 250 | " \"\"\"\n", 251 | " \n", 252 | " # Compute gradapprox using left side of formula (1). epsilon is small enough, you don't need to worry about the limit.\n", 253 | " ### START CODE HERE ### (approx. 5 lines)\n", 254 | " thetaplus = theta + epsilon # Step 1\n", 255 | " thetaminus = theta - epsilon # Step 2\n", 256 | " J_plus = forward_propagation(x, thetaplus) # Step 3\n", 257 | " J_minus = forward_propagation(x, thetaminus) # Step 4\n", 258 | " gradapprox = 0.5*(J_plus - J_minus)/epsilon # Step 5\n", 259 | " ### END CODE HERE ###\n", 260 | " \n", 261 | " # Check if gradapprox is close enough to the output of backward_propagation()\n", 262 | " ### START CODE HERE ### (approx. 1 line)\n", 263 | " grad = backward_propagation(x, theta)\n", 264 | " ### END CODE HERE ###\n", 265 | " \n", 266 | " ### START CODE HERE ### (approx. 
1 line)\n", 267 | " numerator = np.linalg.norm(grad - gradapprox) # Step 1'\n", 268 | " denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox) # Step 2'\n", 269 | " difference = numerator/denominator # Step 3'\n", 270 | " ### END CODE HERE ###\n", 271 | " \n", 272 | " if difference < 1e-7:\n", 273 | " print (\"The gradient is correct!\")\n", 274 | " else:\n", 275 | " print (\"The gradient is wrong!\")\n", 276 | " \n", 277 | " return difference" 278 | ] 279 | }, 280 | { 281 | "cell_type": "code", 282 | "execution_count": 11, 283 | "metadata": { 284 | "scrolled": true 285 | }, 286 | "outputs": [ 287 | { 288 | "name": "stdout", 289 | "output_type": "stream", 290 | "text": [ 291 | "The gradient is correct!\n", 292 | "difference = 2.91933588329e-10\n" 293 | ] 294 | } 295 | ], 296 | "source": [ 297 | "x, theta = 2, 4\n", 298 | "difference = gradient_check(x, theta)\n", 299 | "print(\"difference = \" + str(difference))" 300 | ] 301 | }, 302 | { 303 | "cell_type": "markdown", 304 | "metadata": {}, 305 | "source": [ 306 | "**Expected Output**:\n", 307 | "The gradient is correct!\n", 308 | "\n", 309 | " \n", 310 | " \n", 311 | " \n", 312 | " \n", 313 | "
** difference ** 2.91933588329e-10
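As a quick cross-check of where a difference of this magnitude comes from, here is a sketch reusing the same values x = 2, theta = 4, epsilon = 1e-7 from the cells above. For this linear $J$ the two-sided quotient is exact in real arithmetic (in general its truncation error is $O(\varepsilon^2)$, versus $O(\varepsilon)$ for a one-sided difference), so the tiny residual is pure floating-point rounding:

```python
import numpy as np

x, theta, eps = 2.0, 4.0, 1e-7
gradapprox = ((theta + eps) * x - (theta - eps) * x) / (2 * eps)  # two-sided quotient
grad = x                                                          # exact derivative of theta*x
difference = abs(grad - gradapprox) / (abs(grad) + abs(gradapprox))
print(difference)  # a tiny value on the order of 1e-10, like the result above
```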
" 314 | ] 315 | }, 316 | { 317 | "cell_type": "markdown", 318 | "metadata": {}, 319 | "source": [ 320 | "Congrats, the difference is smaller than the $10^{-7}$ threshold. So you can have high confidence that you've correctly computed the gradient in `backward_propagation()`. \n", 321 | "\n", 322 | "Now, in the more general case, your cost function $J$ has more than a single 1D input. When you are training a neural network, $\\theta$ actually consists of multiple matrices $W^{[l]}$ and biases $b^{[l]}$! It is important to know how to do a gradient check with higher-dimensional inputs. Let's do it!" 323 | ] 324 | }, 325 | { 326 | "cell_type": "markdown", 327 | "metadata": {}, 328 | "source": [ 329 | "## 3) N-dimensional gradient checking" 330 | ] 331 | }, 332 | { 333 | "cell_type": "markdown", 334 | "metadata": { 335 | "collapsed": true 336 | }, 337 | "source": [ 338 | "The following figure describes the forward and backward propagation of your fraud detection model.\n", 339 | "\n", 340 | "\n", 341 | "
**Figure 2** : **deep neural network**
*LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID*
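For reference, here is how tensor shapes flow through this architecture, in a runnable sketch that assumes the (5, 4)/(3, 5)/(1, 3) parameter shapes documented in `forward_propagation_n` below:

```python
import numpy as np

m = 7                                          # any batch size
X = np.random.randn(4, m)                      # 4 input features per example
W1, b1 = np.random.randn(5, 4), np.zeros((5, 1))
W2, b2 = np.random.randn(3, 5), np.zeros((3, 1))
W3, b3 = np.random.randn(1, 3), np.zeros((1, 1))

A1 = np.maximum(W1 @ X + b1, 0)                # LINEAR -> RELU, shape (5, m)
A2 = np.maximum(W2 @ A1 + b2, 0)               # LINEAR -> RELU, shape (3, m)
A3 = 1 / (1 + np.exp(-(W3 @ A2 + b3)))         # LINEAR -> SIGMOID, shape (1, m)
print(A1.shape, A2.shape, A3.shape)            # (5, 7) (3, 7) (1, 7)
```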
\n", 342 | "\n", 343 | "Let's look at your implementations for forward propagation and backward propagation. " 344 | ] 345 | }, 346 | { 347 | "cell_type": "code", 348 | "execution_count": 12, 349 | "metadata": { 350 | "collapsed": true 351 | }, 352 | "outputs": [], 353 | "source": [ 354 | "def forward_propagation_n(X, Y, parameters):\n", 355 | " \"\"\"\n", 356 | " Implements the forward propagation (and computes the cost) presented in Figure 3.\n", 357 | " \n", 358 | " Arguments:\n", 359 | " X -- training set for m examples\n", 360 | " Y -- labels for m examples \n", 361 | " parameters -- python dictionary containing your parameters \"W1\", \"b1\", \"W2\", \"b2\", \"W3\", \"b3\":\n", 362 | " W1 -- weight matrix of shape (5, 4)\n", 363 | " b1 -- bias vector of shape (5, 1)\n", 364 | " W2 -- weight matrix of shape (3, 5)\n", 365 | " b2 -- bias vector of shape (3, 1)\n", 366 | " W3 -- weight matrix of shape (1, 3)\n", 367 | " b3 -- bias vector of shape (1, 1)\n", 368 | " \n", 369 | " Returns:\n", 370 | " cost -- the cost function (logistic cost for one example)\n", 371 | " \"\"\"\n", 372 | " \n", 373 | " # retrieve parameters\n", 374 | " m = X.shape[1]\n", 375 | " W1 = parameters[\"W1\"]\n", 376 | " b1 = parameters[\"b1\"]\n", 377 | " W2 = parameters[\"W2\"]\n", 378 | " b2 = parameters[\"b2\"]\n", 379 | " W3 = parameters[\"W3\"]\n", 380 | " b3 = parameters[\"b3\"]\n", 381 | "\n", 382 | " # LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID\n", 383 | " Z1 = np.dot(W1, X) + b1\n", 384 | " A1 = relu(Z1)\n", 385 | " Z2 = np.dot(W2, A1) + b2\n", 386 | " A2 = relu(Z2)\n", 387 | " Z3 = np.dot(W3, A2) + b3\n", 388 | " A3 = sigmoid(Z3)\n", 389 | "\n", 390 | " # Cost\n", 391 | " logprobs = np.multiply(-np.log(A3),Y) + np.multiply(-np.log(1 - A3), 1 - Y)\n", 392 | " cost = 1./m * np.sum(logprobs)\n", 393 | " \n", 394 | " cache = (Z1, A1, W1, b1, Z2, A2, W2, b2, Z3, A3, W3, b3)\n", 395 | " \n", 396 | " return cost, cache" 397 | ] 398 | }, 399 | { 400 | "cell_type": "markdown", 401 | "metadata": {}, 402 | "source": [ 403 | "Now, run backward propagation." 
404 | ] 405 | }, 406 | { 407 | "cell_type": "code", 408 | "execution_count": 22, 409 | "metadata": { 410 | "collapsed": true 411 | }, 412 | "outputs": [], 413 | "source": [ 414 | "def backward_propagation_n(X, Y, cache):\n", 415 | " \"\"\"\n", 416 | " Implement the backward propagation presented in Figure 2.\n", 417 | " \n", 418 | " Arguments:\n", 419 | " X -- training set for m examples, of shape (input size, m)\n", 420 | " Y -- true labels for m examples\n", 421 | " cache -- cache output from forward_propagation_n()\n", 422 | " \n", 423 | " Returns:\n", 424 | " gradients -- A dictionary with the gradients of the cost with respect to each parameter, activation and pre-activation variables.\n", 425 | " \"\"\"\n", 426 | " \n", 427 | " m = X.shape[1]\n", 428 | " (Z1, A1, W1, b1, Z2, A2, W2, b2, Z3, A3, W3, b3) = cache\n", 429 | " \n", 430 | " dZ3 = A3 - Y\n", 431 | " dW3 = 1./m * np.dot(dZ3, A2.T)\n", 432 | " db3 = 1./m * np.sum(dZ3, axis=1, keepdims = True)\n", 433 | " \n", 434 | " dA2 = np.dot(W3.T, dZ3)\n", 435 | " dZ2 = np.multiply(dA2, np.int64(A2 > 0))\n", 436 | " dW2 = 1./m * np.dot(dZ2, A1.T)\n", 437 | " db2 = 1./m * np.sum(dZ2, axis=1, keepdims = True)\n", 438 | " \n", 439 | " dA1 = np.dot(W2.T, dZ2)\n", 440 | " dZ1 = np.multiply(dA1, np.int64(A1 > 0))\n", 441 | " dW1 = 1./m * np.dot(dZ1, X.T)\n", 442 | " db1 = 1./m * np.sum(dZ1, axis=1, keepdims = True)\n", 443 | " \n", 444 | " gradients = {\"dZ3\": dZ3, \"dW3\": dW3, \"db3\": db3,\n", 445 | " \"dA2\": dA2, \"dZ2\": dZ2, \"dW2\": dW2, \"db2\": db2,\n", 446 | " \"dA1\": dA1, \"dZ1\": dZ1, \"dW1\": dW1, \"db1\": db1}\n", 447 | " \n", 448 | " return gradients" 449 | ] 450 | }, 451 | { 452 | "cell_type": "markdown", 453 | "metadata": { 454 | "collapsed": true 455 | }, 456 | "source": [ 457 | "You obtained some results on the fraud detection test set but you are not 100% sure of your model. Nobody's perfect! Let's implement gradient checking to verify if your gradients are correct." 458 | ] 459 | }, 460 | { 461 | "cell_type": "markdown", 462 | "metadata": {}, 463 | "source": [ 464 | "**How does gradient checking work?**\n", 465 | "\n", 466 | "As in 1) and 2), you want to compare \"gradapprox\" to the gradient computed by backpropagation. The formula is still:\n", 467 | "\n", 468 | "$$ \\frac{\\partial J}{\\partial \\theta} = \\lim_{\\varepsilon \\to 0} \\frac{J(\\theta + \\varepsilon) - J(\\theta - \\varepsilon)}{2 \\varepsilon} \\tag{1}$$\n", 469 | "\n", 470 | "However, $\\theta$ is not a scalar anymore. It is a dictionary called \"parameters\". We implemented a function \"`dictionary_to_vector()`\" for you. It converts the \"parameters\" dictionary into a vector called \"values\", obtained by reshaping all parameters (W1, b1, W2, b2, W3, b3) into vectors and concatenating them.\n", 471 | "\n", 472 | "The inverse function is \"`vector_to_dictionary`\" which outputs back the \"parameters\" dictionary.\n", 473 | "\n", 474 | "\n", 475 | "
**Figure 3** : **dictionary_to_vector() and vector_to_dictionary()**
You will need these functions in gradient_check_n()
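If you are curious what these helpers do, here is a minimal sketch (an illustration only; the provided `gc_utils` implementation also returns a list of keys, which is why the code below unpacks `parameters_values, _`). It assumes the parameter shapes used in this notebook:

```python
import numpy as np

keys = ["W1", "b1", "W2", "b2", "W3", "b3"]
shapes = {"W1": (5, 4), "b1": (5, 1), "W2": (3, 5),
          "b2": (3, 1), "W3": (1, 3), "b3": (1, 1)}

def dictionary_to_vector_sketch(parameters):
    # Reshape every parameter into a column and stack them into one long vector.
    return np.concatenate([parameters[k].reshape(-1, 1) for k in keys], axis=0)

def vector_to_dictionary_sketch(theta):
    # Invert the flattening by slicing the vector back into the original shapes.
    parameters, i = {}, 0
    for k in keys:
        size = int(np.prod(shapes[k]))
        parameters[k] = theta[i:i + size].reshape(shapes[k])
        i += size
    return parameters
```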
\n", 476 | "\n", 477 | "We have also converted the \"gradients\" dictionary into a vector \"grad\" using gradients_to_vector(). You don't need to worry about that.\n", 478 | "\n", 479 | "**Exercise**: Implement gradient_check_n().\n", 480 | "\n", 481 | "**Instructions**: Here is pseudo-code that will help you implement the gradient check.\n", 482 | "\n", 483 | "For each i in num_parameters:\n", 484 | "- To compute `J_plus[i]`:\n", 485 | " 1. Set $\\theta^{+}$ to `np.copy(parameters_values)`\n", 486 | " 2. Set $\\theta^{+}_i$ to $\\theta^{+}_i + \\varepsilon$\n", 487 | " 3. Calculate $J^{+}_i$ using to `forward_propagation_n(x, y, vector_to_dictionary(`$\\theta^{+}$ `))`. \n", 488 | "- To compute `J_minus[i]`: do the same thing with $\\theta^{-}$\n", 489 | "- Compute $gradapprox[i] = \\frac{J^{+}_i - J^{-}_i}{2 \\varepsilon}$\n", 490 | "\n", 491 | "Thus, you get a vector gradapprox, where gradapprox[i] is an approximation of the gradient with respect to `parameter_values[i]`. You can now compare this gradapprox vector to the gradients vector from backpropagation. Just like for the 1D case (Steps 1', 2', 3'), compute: \n", 492 | "$$ difference = \\frac {\\| grad - gradapprox \\|_2}{\\| grad \\|_2 + \\| gradapprox \\|_2 } \\tag{3}$$" 493 | ] 494 | }, 495 | { 496 | "cell_type": "code", 497 | "execution_count": 23, 498 | "metadata": { 499 | "collapsed": true 500 | }, 501 | "outputs": [], 502 | "source": [ 503 | "# GRADED FUNCTION: gradient_check_n\n", 504 | "\n", 505 | "def gradient_check_n(parameters, gradients, X, Y, epsilon = 1e-7):\n", 506 | " \"\"\"\n", 507 | " Checks if backward_propagation_n computes correctly the gradient of the cost output by forward_propagation_n\n", 508 | " \n", 509 | " Arguments:\n", 510 | " parameters -- python dictionary containing your parameters \"W1\", \"b1\", \"W2\", \"b2\", \"W3\", \"b3\":\n", 511 | " grad -- output of backward_propagation_n, contains gradients of the cost with respect to the parameters. \n", 512 | " x -- input datapoint, of shape (input size, 1)\n", 513 | " y -- true \"label\"\n", 514 | " epsilon -- tiny shift to the input to compute approximated gradient with formula(1)\n", 515 | " \n", 516 | " Returns:\n", 517 | " difference -- difference (2) between the approximated gradient and the backward propagation gradient\n", 518 | " \"\"\"\n", 519 | " \n", 520 | " # Set-up variables\n", 521 | " parameters_values, _ = dictionary_to_vector(parameters)\n", 522 | " grad = gradients_to_vector(gradients)\n", 523 | " num_parameters = parameters_values.shape[0]\n", 524 | " J_plus = np.zeros((num_parameters, 1))\n", 525 | " J_minus = np.zeros((num_parameters, 1))\n", 526 | " gradapprox = np.zeros((num_parameters, 1))\n", 527 | " \n", 528 | " # Compute gradapprox\n", 529 | " for i in range(num_parameters):\n", 530 | " \n", 531 | " # Compute J_plus[i]. Inputs: \"parameters_values, epsilon\". Output = \"J_plus[i]\".\n", 532 | " # \"_\" is used because the function you have to outputs two parameters but we only care about the first one\n", 533 | " ### START CODE HERE ### (approx. 3 lines)\n", 534 | " thetaplus = np.copy(parameters_values) # Step 1\n", 535 | " thetaplus[i][0] += epsilon # Step 2\n", 536 | " J_plus[i], _ = forward_propagation_n(X, Y, vector_to_dictionary(thetaplus)) # Step 3\n", 537 | " ### END CODE HERE ###\n", 538 | " \n", 539 | " # Compute J_minus[i]. Inputs: \"parameters_values, epsilon\". Output = \"J_minus[i]\".\n", 540 | " ### START CODE HERE ### (approx. 
3 lines)\n", 541 | " thetaminus = np.copy(parameters_values) # Step 1\n", 542 | " thetaminus[i][0] -= epsilon # Step 2 \n", 543 | " J_minus[i], _ = forward_propagation_n(X, Y, vector_to_dictionary(thetaminus)) # Step 3\n", 544 | " ### END CODE HERE ###\n", 545 | " \n", 546 | " # Compute gradapprox[i]\n", 547 | " ### START CODE HERE ### (approx. 1 line)\n", 548 | " gradapprox[i] = 0.5*(J_plus[i] - J_minus[i])/epsilon\n", 549 | " ### END CODE HERE ###\n", 550 | " \n", 551 | " # Compare gradapprox to backward propagation gradients by computing difference.\n", 552 | " ### START CODE HERE ### (approx. 1 line)\n", 553 | " numerator = np.linalg.norm(grad - gradapprox) # Step 1'\n", 554 | " denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox) # Step 2'\n", 555 | " difference = numerator/denominator # Step 3'\n", 556 | " ### END CODE HERE ###\n", 557 | "\n", 558 | " if difference > 1e-7:\n", 559 | " print (\"\\033[93m\" + \"There is a mistake in the backward propagation! difference = \" + str(difference) + \"\\033[0m\")\n", 560 | " else:\n", 561 | " print (\"\\033[92m\" + \"Your backward propagation works perfectly fine! difference = \" + str(difference) + \"\\033[0m\")\n", 562 | " \n", 563 | " return difference" 564 | ] 565 | }, 566 | { 567 | "cell_type": "code", 568 | "execution_count": 24, 569 | "metadata": { 570 | "scrolled": false 571 | }, 572 | "outputs": [ 573 | { 574 | "name": "stdout", 575 | "output_type": "stream", 576 | "text": [ 577 | "\u001b[93mThere is a mistake in the backward propagation! difference = 1.18904178788e-07\u001b[0m\n" 578 | ] 579 | } 580 | ], 581 | "source": [ 582 | "X, Y, parameters = gradient_check_n_test_case()\n", 583 | "\n", 584 | "cost, cache = forward_propagation_n(X, Y, parameters)\n", 585 | "gradients = backward_propagation_n(X, Y, cache)\n", 586 | "difference = gradient_check_n(parameters, gradients, X, Y)" 587 | ] 588 | }, 589 | { 590 | "cell_type": "markdown", 591 | "metadata": {}, 592 | "source": [ 593 | "**Expected output**:\n", 594 | "\n", 595 | "\n", 596 | " \n", 597 | " \n", 598 | " \n", 599 | " \n", 600 | "
** There is a mistake in the backward propagation!** difference = 0.285093156781
" 601 | ] 602 | }, 603 | { 604 | "cell_type": "markdown", 605 | "metadata": {}, 606 | "source": [ 607 | "It seems that there were errors in the `backward_propagation_n` code we gave you! Good that you've implemented the gradient check. Go back to `backward_propagation` and try to find/correct the errors *(Hint: check dW2 and db1)*. Rerun the gradient check when you think you've fixed it. Remember you'll need to re-execute the cell defining `backward_propagation_n()` if you modify the code. \n", 608 | "\n", 609 | "Can you get gradient check to declare your derivative computation correct? Even though this part of the assignment isn't graded, we strongly urge you to try to find the bug and re-run gradient check until you're convinced backprop is now correctly implemented. \n", 610 | "\n", 611 | "**Note** \n", 612 | "- Gradient Checking is slow! Approximating the gradient with $\\frac{\\partial J}{\\partial \\theta} \\approx \\frac{J(\\theta + \\varepsilon) - J(\\theta - \\varepsilon)}{2 \\varepsilon}$ is computationally costly. For this reason, we don't run gradient checking at every iteration during training. Just a few times to check if the gradient is correct. \n", 613 | "- Gradient Checking, at least as we've presented it, doesn't work with dropout. You would usually run the gradient check algorithm without dropout to make sure your backprop is correct, then add dropout. \n", 614 | "\n", 615 | "Congrats, you can be confident that your deep learning model for fraud detection is working correctly! You can even use this to convince your CEO. :) \n", 616 | "\n", 617 | "\n", 618 | "**What you should remember from this notebook**:\n", 619 | "- Gradient checking verifies closeness between the gradients from backpropagation and the numerical approximation of the gradient (computed using forward propagation).\n", 620 | "- Gradient checking is slow, so we don't run it in every iteration of training. You would usually run it only to make sure your code is correct, then turn it off and use backprop for the actual learning process. 
" 621 | ] 622 | }, 623 | { 624 | "cell_type": "code", 625 | "execution_count": null, 626 | "metadata": { 627 | "collapsed": true 628 | }, 629 | "outputs": [], 630 | "source": [] 631 | } 632 | ], 633 | "metadata": { 634 | "coursera": { 635 | "course_slug": "deep-neural-network", 636 | "graded_item_id": "n6NBD", 637 | "launcher_item_id": "yfOsE" 638 | }, 639 | "kernelspec": { 640 | "display_name": "Python 3", 641 | "language": "python", 642 | "name": "python3" 643 | }, 644 | "language_info": { 645 | "codemirror_mode": { 646 | "name": "ipython", 647 | "version": 3 648 | }, 649 | "file_extension": ".py", 650 | "mimetype": "text/x-python", 651 | "name": "python", 652 | "nbconvert_exporter": "python", 653 | "pygments_lexer": "ipython3", 654 | "version": "3.6.0" 655 | } 656 | }, 657 | "nbformat": 4, 658 | "nbformat_minor": 1 659 | } 660 | -------------------------------------------------------------------------------- /course3_structuring_machine_learning_projects/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wangruinju/Deep-Learning/e4c2b9d04898bd6113ef32377732915d4282b838/course3_structuring_machine_learning_projects/.DS_Store -------------------------------------------------------------------------------- /course3_structuring_machine_learning_projects/Coursera AGVR9FXTTJAB.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wangruinju/Deep-Learning/e4c2b9d04898bd6113ef32377732915d4282b838/course3_structuring_machine_learning_projects/Coursera AGVR9FXTTJAB.pdf -------------------------------------------------------------------------------- /course3_structuring_machine_learning_projects/Week 1 Quiz - Bird recognition in the city of Peacetopia (case study).md: -------------------------------------------------------------------------------- 1 | ## Week 1 Quiz - Bird recognition in the city of Peacetopia (case study) 2 | 3 | 1. Having three evaluation metrics makes it harder for you to quickly choose between two different algorithms, and will slow down the speed with which your team can iterate. True/False? 4 | 5 | - [x] True 6 | - [ ] False 7 | 8 | 2. If you had the three following models, which one would you choose? 9 | 10 | - Test Accuracy 98% 11 | - Runtime 9 sec 12 | - Memory size 9MB 13 | 14 | 3. Based on the city’s requests, which of the following would you say is true? 15 | 16 | - [x] Accuracy is an optimizing metric; running time and memory size are a satisficing metrics. 17 | - [ ] Accuracy is a satisficing metric; running time and memory size are an optimizing metric. 18 | - [ ] Accuracy, running time and memory size are all optimizing metrics because you want to do well on all three. 19 | - [ ] Accuracy, running time and memory size are all satisficing metrics because you have to do sufficiently well on all three for your system to be acceptable. 20 | 21 | 4. Before implementing your algorithm, you need to split your data into train/dev/test sets. Which of these do you think is the best choice? 22 | 23 | - Train 9,500,000 24 | - Dev 250,000 25 | - Test 250,000 26 | 27 | 5. After setting up your train/dev/test sets, the City Council comes across another 1,000,000 images, called the “citizens’ data”. Apparently the citizens of Peacetopia are so scared of birds that they volunteered to take pictures of the sky and label them, thus contributing these additional 1,000,000 images. 
These images are different from the distribution of images the City Council had originally given you, but you think it could help your algorithm. 28 | 29 | You should not add the citizens’ data to the training set, because this will cause the training and dev/test set distributions to become different, thus hurting dev and test set performance. True/False? 30 | 31 | - [ ] True 32 | - [x] False 33 | 34 | 6. One member of the City Council knows a little about machine learning, and thinks you should add the 1,000,000 citizens’ data images to the test set. You object because: 35 | 36 | - The test set no longer reflects the distribution of data (security cameras) you most care about. 37 | - This would cause the dev and test set distributions to become different. This is a bad idea because you’re not aiming where you want to hit. 38 | 39 | 7. You train a system, and its errors are as follows (error = 100%-Accuracy): 40 | 41 | - Training set error 4.0% 42 | - Dev set error 4.5% 43 | 44 | This suggests that one good avenue for improving performance is to train a bigger network so as to drive down the 4.0% training error. Do you agree? 45 | 46 | - No, because there is insufficient information to tell. 47 | 48 | 8. You ask a few people to label the dataset so as to find out what is human-level performance. You find the following levels of accuracy: 49 | 50 | - Bird watching expert #1 0.3% error 51 | - Bird watching expert #2 0.5% error 52 | - Normal person #1 (not a bird watching expert) 1.0% error 53 | - Normal person #2 (not a bird watching expert) 1.2% error 54 | 55 | If your goal is to have “human-level performance” be a proxy (or estimate) for Bayes error, how would you define “human-level performance”? 56 | 57 | - 0.3% (error of expert #1) 58 | 59 | 9. Which of the following statements do you agree with? 60 | 61 | - A learning algorithm’s performance can be better than human-level performance but it can never be better than Bayes error. 62 | 63 | 10. You find that a team of ornithologists debating and discussing an image gets an even better 0.1% performance, so you define that as “human-level performance.” After working further on your algorithm, you end up with the following: 64 | 65 | - Human-level performance 0.1% 66 | - Training set error 2.0% 67 | - Dev set error 2.1% 68 | 69 | Based on the evidence you have, which two of the following four options seem the most promising to try? (Check two options.) 70 | 71 | - Try decreasing regularization. 72 | - Train a bigger model to try to do better on the training set. 73 | 74 | 11. You also evaluate your model on the test set, and find the following: 75 | 76 | - Human-level performance 0.1% 77 | - Training set error 2.0% 78 | - Dev set error 2.1% 79 | - Test set error 7.0% 80 | 81 | What does this mean? (Check the two best options.) 82 | 83 | - You should try to get a bigger dev set. 84 | - You have overfit to the dev set. 85 | 86 | 12. After working on this project for a year, you finally achieve: 87 | 88 | - Human-level performance 0.10% 89 | - Training set error 0.05% 90 | - Dev set error 0.05% 91 | 92 | What can you conclude? (Check all that apply.) 93 | 94 | - It is now harder to measure avoidable bias, thus progress will be slower going forward. 95 | - If the test set is big enough for the 0.05% error estimate to be accurate, this implies Bayes error is ≤ 0.05%. 96 | 97 | 13. It turns out Peacetopia has hired one of your competitors to build a system as well. 
Your system and your competitor both deliver systems with about the same running time and memory size. However, your system has higher accuracy! Yet when Peacetopia tries out your and your competitor’s systems, they conclude they actually like your competitor’s system better, because even though you have higher overall accuracy, you have more false negatives (failing to raise an alarm when a bird is in the air). What should you do? 98 | 99 | - Rethink the appropriate metric for this task, and ask your team to tune to the new metric. 100 | 101 | 14. You’ve handily beaten your competitor, and your system is now deployed in Peacetopia and is protecting the citizens from birds! But over the last few months, a new species of bird has been slowly migrating into the area, so the performance of your system slowly degrades because your model is being tested on a new type of data. 102 | 103 | - Use the data you have to define a new evaluation metric (using a new dev/test set) taking into account the new species, and use that to drive further progress for your team. 104 | 105 | 15. The City Council thinks that having more Cats in the city would help scare off birds. They are so happy with your work on the Bird detector that they also hire you to build a Cat detector. (Wow Cat detectors are just incredibly useful aren’t they.) Because of years of working on Cat detectors, you have such a huge dataset of 100,000,000 cat images that training on this data takes about two weeks. Which of the statements do you agree with? (Check all that apply.) 106 | 107 | - If 100,000,000 examples is enough to build a good enough Cat detector, you might be better off training with just 10,000,000 examples to gain a ≈10x improvement in how quickly you can run experiments, even if each model performs a bit worse because it’s trained on less data. 108 | - Buying faster computers could speed up your team’s iteration speed and thus your team’s productivity. 109 | - Needing two weeks to train will limit the speed at which you can iterate. 110 | -------------------------------------------------------------------------------- /course3_structuring_machine_learning_projects/Week 2 Quiz - Autonomous driving (case study).md: -------------------------------------------------------------------------------- 1 | ## Week 2 Quiz - Autonomous driving (case study) 2 | 3 | 1. You are just getting started on this project. What is the first thing you do? Assume each of the steps below would take about an equal amount of time (a few days). 4 | 5 | - Spend a few days training a basic model and see what mistakes it makes. 6 | 7 | > As discussed in lecture, applied ML is a highly iterative process. If you train a basic model and carry out error analysis (see what mistakes it makes) it will help point you in more promising directions. 8 | 9 | 2. Your goal is to detect road signs (stop sign, pedestrian crossing sign, construction ahead sign) and traffic signals (red and green lights) in images. The goal is to recognize which of these objects appear in each image. You plan to use a deep neural network with ReLU units in the hidden layers. 10 | 11 | For the output layer, a softmax activation would be a good choice because this is a multi-task learning problem. True/False? 12 | 13 | - [ ] True 14 | - [x] False 15 | 16 | > Softmax would be a good choice if one and only one of the possibilities (stop sign, speed bump, pedestrian crossing, green light and red light) was present in each image. 17 | 18 | 3. 
You are carrying out error analysis and counting up what errors the algorithm makes. Which of these datasets do you think you should manually go through and carefully examine, one image at a time? 19 | 20 | - [ ] 10,000 randomly chosen images 21 | - [ ] 500 randomly chosen images 22 | - [x] 500 images on which the algorithm made a mistake 23 | - [ ] 10,000 images on which the algorithm made a mistake 24 | 25 | 4. After working on the data for several weeks, your team ends up with the following data: 26 | 27 | - 100,000 labeled images taken using the front-facing camera of your car. 28 | - 900,000 labeled images of roads downloaded from the internet. 29 | 30 | Each image’s labels precisely indicate the presence of any specific road signs and traffic signals or combinations of them. For example, y(i) = [1 0 0 1 0] means the image contains a stop sign and a red traffic light. 31 | Because this is a multi-task learning problem, you need to have all your y(i) vectors fully labeled. If one example is equal to [0 ? 1 1 ?] then the learning algorithm will not be able to use that example. True/False? 32 | 33 | - [ ] True 34 | - [x] False 35 | 36 | > As seen in the lecture on multi-task learning, you can compute the cost such that it is not influenced by the fact that some entries haven’t been labeled (see the short masked-loss sketch at the end of this quiz). 37 | 38 | 5. The distribution of data you care about contains images from your car’s front-facing camera, which comes from a different distribution than the images you were able to find and download off the internet. How should you split the dataset into train/dev/test sets? 39 | 40 | - [ ] Mix all the 100,000 images with the 900,000 images you found online. Shuffle everything. Split the 1,000,000 images dataset into 600,000 for the training set, 200,000 for the dev set and 200,000 for the test set. 41 | - [ ] Mix all the 100,000 images with the 900,000 images you found online. Shuffle everything. Split the 1,000,000 images dataset into 980,000 for the training set, 10,000 for the dev set and 10,000 for the test set. 42 | - [x] Choose the training set to be the 900,000 images from the internet along with 80,000 images from your car’s front-facing camera. The 20,000 remaining images will be split equally in dev and test sets. 43 | - [ ] Choose the training set to be the 900,000 images from the internet along with 20,000 images from your car’s front-facing camera. The 80,000 remaining images will be split equally in dev and test sets. 44 | > As seen in lecture, it is important that your dev and test set have the closest possible distribution to “real”-data. It is also important for the training set to contain enough “real”-data to avoid having a data-mismatch problem. 45 | 46 | 6. Assume you’ve finally chosen the following split of the data: 47 | 48 | - Training: 940,000 images randomly picked from (900,000 internet images + 60,000 car’s front-facing camera images); error 8.8% 49 | - Training-Dev: 20,000 images randomly picked from (900,000 internet images + 60,000 car’s front-facing camera images); error 9.1% 50 | - Dev: 20,000 images from your car’s front-facing camera; error 14.3% 51 | - Test: 20,000 images from the car’s front-facing camera; error 14.8% 52 | 53 | You also know that human-level error on the road sign and traffic signals classification task is around 0.5%. Which of the following are True? (Check all that apply). 54 | 55 | - You have a large avoidable-bias problem because your training error is quite a bit higher than the human-level error. 
56 | - You have a large data-mismatch problem because your model does a lot better on the training-dev set than on the dev set. 57 | 58 | 7. Based on the table from the previous question, a friend thinks that the training data distribution is much easier than the dev/test distribution. What do you think? 59 | 60 | - There’s insufficient information to tell if your friend is right or wrong. 61 | 62 | > The algorithm does better on the distribution of data it trained on. But you don’t know if it’s because it trained on that distribution or if it really is easier. To get a better sense, measure human-level error separately on both distributions. 63 | 64 | 8. You decide to focus on the dev set and check by hand what the errors are due to. Here is a table summarizing your discoveries: 65 | 66 | - Overall dev set error 14.3% 67 | - Errors due to incorrectly labeled data 4.1% 68 | - Errors due to foggy pictures 8.0% 69 | - Errors due to rain drops stuck on your car’s front-facing camera 2.2% 70 | - Errors due to other causes 1.0% 71 | 72 | In this table, 4.1%, 8.0%, etc. are a fraction of the total dev set (not just examples your algorithm mislabeled). I.e., about 8.0/14.3 = 56% of your errors are due to foggy pictures. 73 | 74 | The results from this analysis imply that the team’s highest priority should be to bring more foggy pictures into the training set so as to address the 8.0% of errors in that category. True/False? 75 | 76 | - [x] False because this would depend on how easy it is to add this data and how much you think your team thinks it’ll help. 77 | - [ ] True because it is the largest category of errors. As discussed in lecture, we should prioritize the largest category of error to avoid wasting the team’s time. 78 | - [ ] True because it is greater than the other error categories added together (8.0 > 4.1+2.2+1.0). 79 | - [ ] False because data augmentation (synthesizing foggy images by clean/non-foggy images) is more efficient. 80 | 81 | 9. You can buy a specially designed windshield wiper that helps wipe off some of the raindrops on the front-facing camera. Based on the table from the previous question, which of the following statements do you agree with? 82 | 83 | - 2.2% would be a reasonable estimate of the maximum amount this windshield wiper could improve performance. 84 | 85 | > You will probably not improve performance by more than 2.2% by solving the raindrops problem. If your dataset were infinitely big, 2.2% would be a perfect estimate of the improvement you can achieve by purchasing a specially designed windshield wiper that removes the raindrops. 86 | 87 | 10. You decide to use data augmentation to address foggy images. You find 1,000 pictures of fog off the internet, and “add” them to clean images to synthesize foggy days, like this: 88 | 89 | Which of the following statements do you agree with? (Check all that apply.) 90 | 91 | - So long as the synthesized fog looks realistic to the human eye, you can be confident that the synthesized data is accurately capturing the distribution of real foggy images, since human vision is very accurate for the problem you’re solving. 92 | 93 | > If the synthesized images look realistic, then the model will just see them as if you had added useful data to identify road signs and traffic signals in foggy weather. It will very likely help. 94 | 95 | 11. After working further on the problem, you’ve decided to correct the incorrectly labeled data on the dev set. Which of these statements do you agree with? (Check all that apply). 
96 | 97 | - You should not correct incorrectly labeled data in the training set as well so as to avoid your training set now being even more different from your dev set. 98 | 99 | > Deep learning algorithms are quite robust to having slightly different train and dev distributions. 100 | 101 | - You should also correct the incorrectly labeled data in the test set, so that the dev and test sets continue to come from the same distribution 102 | 103 | > Because you want to make sure that your dev and test data come from the same distribution, so that your team’s iterative development process stays efficient. 104 | 105 | 12. So far your algorithm only recognizes red and green traffic lights. One of your colleagues in the startup is starting to work on recognizing a yellow traffic light. (Some countries call it an orange light rather than a yellow light; we’ll use the US convention of calling it yellow.) Images containing yellow lights are quite rare, and she doesn’t have enough data to build a good model. She hopes you can help her out using transfer learning. 106 | 107 | What do you tell your colleague? 108 | 109 | - She should try using weights pre-trained on your dataset, and fine-tuning further with the yellow-light dataset. 110 | 111 | > You have trained your model on a huge dataset, and she has a small dataset. Although your labels are different, the parameters of your model have been trained to recognize many characteristics of road and traffic images which will be useful for her problem. This is a perfect case for transfer learning, she can start with a model with the same architecture as yours, change what is after the last hidden layer and initialize it with your trained parameters. 112 | 113 | 13. Another colleague wants to use microphones placed outside the car to better hear if there are other vehicles around you. For example, if there is a police vehicle behind you, you would be able to hear their siren. However, they don’t have much data to train this audio system. How can you help? 114 | 115 | - Neither transfer learning nor multi-task learning seems promising. 116 | 117 | > The problem he is trying to solve is quite different from yours. The different dataset structures make it probably impossible to use transfer learning or multi-task learning. 118 | 119 | 14. To recognize red and green lights, you have been using this approach: 120 | 121 | - (A) Input an image (x) to a neural network and have it directly learn a mapping to make a prediction as to whether there’s a red light and/or green light (y). 122 | 123 | A teammate proposes a different, two-step approach: 124 | 125 | - (B) In this two-step approach, you would first (i) detect the traffic light in the image (if any), then (ii) determine the color of the illuminated lamp in the traffic light. 126 | Between these two, Approach B is more of an end-to-end approach because it has distinct steps for the input end and the output end. True/False? 127 | 128 | 129 | - [ ] True 130 | - [x] False 131 | 132 | > (A) is an end-to-end approach as it directly maps the input (x) to the output (y). 133 | 134 | 15. Approach A (in the question above) tends to be more promising than approach B if you have a ________ (fill in the blank). 135 | 136 | - [x] Large training set 137 | - [ ] Multi-task learning problem. 138 | - [ ] Large bias problem. 139 | - [ ] Problem with a high Bayes error. 140 | 141 | > In many fields, it has been observed that end-to-end learning works better in practice, but requires a large amount of data. 
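As promised in the explanation to question 4 above, here is a short sketch of a label-masked multi-task cost (illustrative only, not the course's provided code): unknown labels are encoded as NaN and simply excluded from the sum, so an example like y = [0 ? 1 1 ?] still contributes its three known entries.

```python
import numpy as np

def masked_logistic_cost(Y_hat, Y):
    mask = ~np.isnan(Y)                                  # keep only labeled entries
    y = Y[mask]
    p = np.clip(Y_hat[mask], 1e-8, 1 - 1e-8)             # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

Y = np.array([[0.0, np.nan, 1.0, 1.0, np.nan]])          # y = [0 ? 1 1 ?]
Y_hat = np.array([[0.1, 0.9, 0.8, 0.7, 0.2]])
print(masked_logistic_cost(Y_hat, Y))                    # uses only the 3 known labels
```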
142 | -------------------------------------------------------------------------------- /course4_convolutional_neural_networks/Coursera 22RV83VPMX63.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wangruinju/Deep-Learning/e4c2b9d04898bd6113ef32377732915d4282b838/course4_convolutional_neural_networks/Coursera 22RV83VPMX63.pdf -------------------------------------------------------------------------------- /course4_convolutional_neural_networks/Face+Recognition+for+the+Happy+House+-+v3.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Face Recognition for the Happy House\n", 8 | "\n", 9 | "Welcome to the first assignment of week 4! Here you will build a face recognition system. Many of the ideas presented here are from [FaceNet](https://arxiv.org/pdf/1503.03832.pdf). In lecture, we also talked about [DeepFace](https://research.fb.com/wp-content/uploads/2016/11/deepface-closing-the-gap-to-human-level-performance-in-face-verification.pdf). \n", 10 | "\n", 11 | "Face recognition problems commonly fall into two categories: \n", 12 | "\n", 13 | "- **Face Verification** - \"is this the claimed person?\". For example, at some airports, you can pass through customs by letting a system scan your passport and then verifying that you (the person carrying the passport) are the correct person. A mobile phone that unlocks using your face is also using face verification. This is a 1:1 matching problem. \n", 14 | "- **Face Recognition** - \"who is this person?\". For example, the video lecture showed a face recognition video (https://www.youtube.com/watch?v=wr4rx0Spihs) of Baidu employees entering the office without needing to otherwise identify themselves. This is a 1:K matching problem. \n", 15 | "\n", 16 | "FaceNet learns a neural network that encodes a face image into a vector of 128 numbers. By comparing two such vectors, you can then determine if two pictures are of the same person.\n", 17 | " \n", 18 | "**In this assignment, you will:**\n", 19 | "- Implement the triplet loss function\n", 20 | "- Use a pretrained model to map face images into 128-dimensional encodings\n", 21 | "- Use these encodings to perform face verification and face recognition\n", 22 | "\n", 23 | "In this exercise, we will be using a pre-trained model which represents ConvNet activations using a \"channels first\" convention, as opposed to the \"channels last\" convention used in lecture and previous programming assignments. In other words, a batch of images will be of shape $(m, n_C, n_H, n_W)$ instead of $(m, n_H, n_W, n_C)$. Both of these conventions have a reasonable amount of traction among open-source implementations; there isn't a uniform standard yet within the deep learning community. \n", 24 | "\n", 25 | "Let's load the required packages. \n" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": 14, 31 | "metadata": {}, 32 | "outputs": [ 33 | { 34 | "name": "stdout", 35 | "output_type": "stream", 36 | "text": [ 37 | "The autoreload extension is already loaded. 
To reload it, use:\n", 38 | " %reload_ext autoreload\n" 39 | ] 40 | } 41 | ], 42 | "source": [ 43 | "from keras.models import Sequential\n", 44 | "from keras.layers import Conv2D, ZeroPadding2D, Activation, Input, concatenate\n", 45 | "from keras.models import Model\n", 46 | "from keras.layers.normalization import BatchNormalization\n", 47 | "from keras.layers.pooling import MaxPooling2D, AveragePooling2D\n", 48 | "from keras.layers.merge import Concatenate\n", 49 | "from keras.layers.core import Lambda, Flatten, Dense\n", 50 | "from keras.initializers import glorot_uniform\n", 51 | "from keras.engine.topology import Layer\n", 52 | "from keras import backend as K\n", 53 | "K.set_image_data_format('channels_first')\n", 54 | "import cv2\n", 55 | "import os\n", 56 | "import numpy as np\n", 57 | "from numpy import genfromtxt\n", 58 | "import pandas as pd\n", 59 | "import tensorflow as tf\n", 60 | "from fr_utils import *\n", 61 | "from inception_blocks_v2 import *\n", 62 | "\n", 63 | "%matplotlib inline\n", 64 | "%load_ext autoreload\n", 65 | "%autoreload 2\n", 66 | "\n", 67 | "np.set_printoptions(threshold=np.nan)" 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": {}, 73 | "source": [ 74 | "## 0 - Naive Face Verification\n", 75 | "\n", 76 | "In Face Verification, you're given two images and you have to tell if they are of the same person. The simplest way to do this is to compare the two images pixel-by-pixel. If the distance between the raw images is less than a chosen threshold, it may be the same person! \n", 77 | "\n", 78 | "\n", 79 | "
**Figure 1**
" 80 | ] 81 | }, 82 | { 83 | "cell_type": "markdown", 84 | "metadata": { 85 | "collapsed": true 86 | }, 87 | "source": [ 88 | "Of course, this algorithm performs really poorly, since the pixel values change dramatically due to variations in lighting, orientation of the person's face, even minor changes in head position, and so on. \n", 89 | "\n", 90 | "You'll see that rather than using the raw image, you can learn an encoding $f(img)$ so that element-wise comparisons of this encoding gives more accurate judgements as to whether two pictures are of the same person." 91 | ] 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "metadata": {}, 96 | "source": [ 97 | "## 1 - Encoding face images into a 128-dimensional vector \n", 98 | "\n", 99 | "### 1.1 - Using an ConvNet to compute encodings\n", 100 | "\n", 101 | "The FaceNet model takes a lot of data and a long time to train. So following common practice in applied deep learning settings, let's just load weights that someone else has already trained. The network architecture follows the Inception model from [Szegedy *et al.*](https://arxiv.org/abs/1409.4842). We have provided an inception network implementation. You can look in the file `inception_blocks.py` to see how it is implemented (do so by going to \"File->Open...\" at the top of the Jupyter notebook). \n" 102 | ] 103 | }, 104 | { 105 | "cell_type": "markdown", 106 | "metadata": {}, 107 | "source": [ 108 | "The key things you need to know are:\n", 109 | "\n", 110 | "- This network uses 96x96 dimensional RGB images as its input. Specifically, inputs a face image (or batch of $m$ face images) as a tensor of shape $(m, n_C, n_H, n_W) = (m, 3, 96, 96)$ \n", 111 | "- It outputs a matrix of shape $(m, 128)$ that encodes each input face image into a 128-dimensional vector\n", 112 | "\n", 113 | "Run the cell below to create the model for face images." 114 | ] 115 | }, 116 | { 117 | "cell_type": "code", 118 | "execution_count": 15, 119 | "metadata": { 120 | "collapsed": true 121 | }, 122 | "outputs": [], 123 | "source": [ 124 | "FRmodel = faceRecoModel(input_shape=(3, 96, 96))" 125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": 16, 130 | "metadata": {}, 131 | "outputs": [ 132 | { 133 | "name": "stdout", 134 | "output_type": "stream", 135 | "text": [ 136 | "Total Params: 3743280\n" 137 | ] 138 | } 139 | ], 140 | "source": [ 141 | "print(\"Total Params:\", FRmodel.count_params())" 142 | ] 143 | }, 144 | { 145 | "cell_type": "markdown", 146 | "metadata": {}, 147 | "source": [ 148 | "** Expected Output **\n", 149 | "\n", 150 | "
\n", 151 | "Total Params: 3743280\n", 152 | "
\n", 153 | "
\n" 154 | ] 155 | }, 156 | { 157 | "cell_type": "markdown", 158 | "metadata": {}, 159 | "source": [ 160 | "By using a 128-neuron fully connected layer as its last layer, the model ensures that the output is an encoding vector of size 128. You then use the encodings the compare two face images as follows:\n", 161 | "\n", 162 | "\n", 163 | "
**Figure 2**:
By computing a distance between two encodings and thresholding, you can determine if the two pictures represent the same person
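A sketch of this verification rule (assuming an `img_to_encoding(image_path, model)` helper like the one provided in `fr_utils`, which is imported above; the threshold value is illustrative and would be tuned on a validation set):

```python
import numpy as np

def is_same_person(path_a, path_b, model, threshold=0.7):
    enc_a = img_to_encoding(path_a, model)     # (1, 128) encoding of first face
    enc_b = img_to_encoding(path_b, model)     # (1, 128) encoding of second face
    dist = np.linalg.norm(enc_a - enc_b)       # L2 distance between encodings
    return dist < threshold, dist
```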
\n", 164 | "\n", 165 | "So, an encoding is a good one if: \n", 166 | "- The encodings of two images of the same person are quite similar to each other \n", 167 | "- The encodings of two images of different persons are very different\n", 168 | "\n", 169 | "The triplet loss function formalizes this, and tries to \"push\" the encodings of two images of the same person (Anchor and Positive) closer together, while \"pulling\" the encodings of two images of different persons (Anchor, Negative) further apart. \n", 170 | "\n", 171 | "\n", 172 | "
\n", 173 | "
**Figure 3**:
In the next part, we will call the pictures from left to right: Anchor (A), Positive (P), Negative (N)
" 174 | ] 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "metadata": {}, 179 | "source": [ 180 | "\n", 181 | "\n", 182 | "### 1.2 - The Triplet Loss\n", 183 | "\n", 184 | "For an image $x$, we denote its encoding $f(x)$, where $f$ is the function computed by the neural network.\n", 185 | "\n", 186 | "\n", 187 | "\n", 188 | "\n", 191 | "\n", 192 | "Training will use triplets of images $(A, P, N)$: \n", 193 | "\n", 194 | "- A is an \"Anchor\" image--a picture of a person. \n", 195 | "- P is a \"Positive\" image--a picture of the same person as the Anchor image.\n", 196 | "- N is a \"Negative\" image--a picture of a different person than the Anchor image.\n", 197 | "\n", 198 | "These triplets are picked from our training dataset. We will write $(A^{(i)}, P^{(i)}, N^{(i)})$ to denote the $i$-th training example. \n", 199 | "\n", 200 | "You'd like to make sure that an image $A^{(i)}$ of an individual is closer to the Positive $P^{(i)}$ than to the Negative image $N^{(i)}$) by at least a margin $\\alpha$:\n", 201 | "\n", 202 | "$$\\mid \\mid f(A^{(i)}) - f(P^{(i)}) \\mid \\mid_2^2 + \\alpha < \\mid \\mid f(A^{(i)}) - f(N^{(i)}) \\mid \\mid_2^2$$\n", 203 | "\n", 204 | "You would thus like to minimize the following \"triplet cost\":\n", 205 | "\n", 206 | "$$\\mathcal{J} = \\sum^{m}_{i=1} \\large[ \\small \\underbrace{\\mid \\mid f(A^{(i)}) - f(P^{(i)}) \\mid \\mid_2^2}_\\text{(1)} - \\underbrace{\\mid \\mid f(A^{(i)}) - f(N^{(i)}) \\mid \\mid_2^2}_\\text{(2)} + \\alpha \\large ] \\small_+ \\tag{3}$$\n", 207 | "\n", 208 | "Here, we are using the notation \"$[z]_+$\" to denote $max(z,0)$. \n", 209 | "\n", 210 | "Notes:\n", 211 | "- The term (1) is the squared distance between the anchor \"A\" and the positive \"P\" for a given triplet; you want this to be small. \n", 212 | "- The term (2) is the squared distance between the anchor \"A\" and the negative \"N\" for a given triplet, you want this to be relatively large, so it thus makes sense to have a minus sign preceding it. \n", 213 | "- $\\alpha$ is called the margin. It is a hyperparameter that you should pick manually. We will use $\\alpha = 0.2$. \n", 214 | "\n", 215 | "Most implementations also normalize the encoding vectors to have norm equal one (i.e., $\\mid \\mid f(img)\\mid \\mid_2$=1); you won't have to worry about that here.\n", 216 | "\n", 217 | "**Exercise**: Implement the triplet loss as defined by formula (3). Here are the 4 steps:\n", 218 | "1. Compute the distance between the encodings of \"anchor\" and \"positive\": $\\mid \\mid f(A^{(i)}) - f(P^{(i)}) \\mid \\mid_2^2$\n", 219 | "2. Compute the distance between the encodings of \"anchor\" and \"negative\": $\\mid \\mid f(A^{(i)}) - f(N^{(i)}) \\mid \\mid_2^2$\n", 220 | "3. Compute the formula per training example: $ \\mid \\mid f(A^{(i)}) - f(P^{(i)}) \\mid - \\mid \\mid f(A^{(i)}) - f(N^{(i)}) \\mid \\mid_2^2 + \\alpha$\n", 221 | "3. 
Compute the full formula by taking the max with zero and summing over the training examples:\n", 222 | "$$\\mathcal{J} = \\sum^{m}_{i=1} \\large[ \\small \\mid \\mid f(A^{(i)}) - f(P^{(i)}) \\mid \\mid_2^2 - \\mid \\mid f(A^{(i)}) - f(N^{(i)}) \\mid \\mid_2^2 + \\alpha \\large ] \\small_+ \\tag{3}$$\n", 223 | "\n", 224 | "Useful functions: `tf.reduce_sum()`, `tf.square()`, `tf.subtract()`, `tf.add()`, `tf.maximum()`.\n", 225 | "For steps 1 and 2, you will need to sum over the entries of $\\mid \\mid f(A^{(i)}) - f(P^{(i)}) \\mid \\mid_2^2$ and $\\mid \\mid f(A^{(i)}) - f(N^{(i)}) \\mid \\mid_2^2$ while for step 4 you will need to sum over the training examples." 226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": 17, 231 | "metadata": { 232 | "collapsed": true 233 | }, 234 | "outputs": [], 235 | "source": [ 236 | "# GRADED FUNCTION: triplet_loss\n", 237 | "\n", 238 | "def triplet_loss(y_true, y_pred, alpha = 0.2):\n", 239 | " \"\"\"\n", 240 | " Implementation of the triplet loss as defined by formula (3)\n", 241 | " \n", 242 | " Arguments:\n", 243 | " y_true -- true labels, required when you define a loss in Keras, you don't need it in this function.\n", 244 | " y_pred -- python list containing three objects:\n", 245 | " anchor -- the encodings for the anchor images, of shape (None, 128)\n", 246 | " positive -- the encodings for the positive images, of shape (None, 128)\n", 247 | " negative -- the encodings for the negative images, of shape (None, 128)\n", 248 | " \n", 249 | " Returns:\n", 250 | " loss -- real number, value of the loss\n", 251 | " \"\"\"\n", 252 | " \n", 253 | " anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]\n", 254 | " \n", 255 | " ### START CODE HERE ### (≈ 4 lines)\n", 256 | " # Step 1: Compute the (encoding) distance between the anchor and the positive, you will need to sum over axis=-1\n", 257 | " pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)), axis = -1)\n", 258 | " # Step 2: Compute the (encoding) distance between the anchor and the negative, you will need to sum over axis=-1\n", 259 | " neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, negative)), axis = -1)\n", 260 | " # Step 3: subtract the two previous distances and add alpha.\n", 261 | " basic_loss = tf.add(tf.subtract(pos_dist, neg_dist), alpha)\n", 262 | " # Step 4: Take the maximum of basic_loss and 0.0. Sum over the training examples.\n", 263 | " loss = tf.reduce_sum(tf.maximum(basic_loss, 0.0))\n", 264 | " ### END CODE HERE ###\n", 265 | " \n", 266 | " return loss" 267 | ] 268 | }, 269 | { 270 | "cell_type": "code", 271 | "execution_count": 18, 272 | "metadata": {}, 273 | "outputs": [ 274 | { 275 | "name": "stdout", 276 | "output_type": "stream", 277 | "text": [ 278 | "loss = 528.143\n" 279 | ] 280 | } 281 | ], 282 | "source": [ 283 | "with tf.Session() as test:\n", 284 | " tf.set_random_seed(1)\n", 285 | " y_true = (None, None, None)\n", 286 | " y_pred = (tf.random_normal([3, 128], mean=6, stddev=0.1, seed = 1),\n", 287 | " tf.random_normal([3, 128], mean=1, stddev=1, seed = 1),\n", 288 | " tf.random_normal([3, 128], mean=3, stddev=4, seed = 1))\n", 289 | " loss = triplet_loss(y_true, y_pred)\n", 290 | " \n", 291 | " print(\"loss = \" + str(loss.eval()))" 292 | ] 293 | }, 294 | { 295 | "cell_type": "markdown", 296 | "metadata": {}, 297 | "source": [ 298 | "**Expected Output**:\n", 299 | "\n", 300 | "\n", 301 | " \n", 302 | 
\n", 303 | " **loss**\n", 304 | " \n", 306 | " 528.143\n", 307 | "
" 311 | ] 312 | }, 313 | { 314 | "cell_type": "markdown", 315 | "metadata": {}, 316 | "source": [ 317 | "## 2 - Loading the trained model\n", 318 | "\n", 319 | "FaceNet is trained by minimizing the triplet loss. But since training requires a lot of data and a lot of computation, we won't train it from scratch here. Instead, we load a previously trained model. Load a model using the following cell; this might take a couple of minutes to run. " 320 | ] 321 | }, 322 | { 323 | "cell_type": "code", 324 | "execution_count": 19, 325 | "metadata": {}, 326 | "outputs": [ 327 | { 328 | "ename": "KeyError", 329 | "evalue": "'conv1_w'", 330 | "output_type": "error", 331 | "traceback": [ 332 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 333 | "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", 334 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mFRmodel\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcompile\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0moptimizer\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'adam'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mloss\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtriplet_loss\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmetrics\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m'accuracy'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mload_weights_from_FaceNet\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mFRmodel\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 335 | "\u001b[0;32m/home/jovyan/work/week4/Face Recognition/fr_utils.py\u001b[0m in \u001b[0;36mload_weights_from_FaceNet\u001b[0;34m(FRmodel)\u001b[0m\n\u001b[1;32m 131\u001b[0m \u001b[0;31m# Load weights from csv files (which was exported from Openface torch model)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 132\u001b[0m \u001b[0mweights\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mWEIGHTS\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 133\u001b[0;31m \u001b[0mweights_dict\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mload_weights\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 134\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 135\u001b[0m \u001b[0;31m# Set layer weights of the model\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 336 | "\u001b[0;32m/home/jovyan/work/week4/Face Recognition/fr_utils.py\u001b[0m in \u001b[0;36mload_weights\u001b[0;34m()\u001b[0m\n\u001b[1;32m 152\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mname\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mWEIGHTS\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 153\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;34m'conv'\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 154\u001b[0;31m \u001b[0mconv_w\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mgenfromtxt\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpaths\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mname\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0;34m'_w'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdelimiter\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m','\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdtype\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 155\u001b[0m \u001b[0mconv_w\u001b[0m \u001b[0;34m=\u001b[0m 
\u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreshape\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mconv_w\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mconv_shape\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 156\u001b[0m \u001b[0mconv_w\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtranspose\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mconv_w\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m3\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 337 | "\u001b[0;31mKeyError\u001b[0m: 'conv1_w'" 338 | ] 339 | } 340 | ], 341 | "source": [ 342 | "FRmodel.compile(optimizer = 'adam', loss = triplet_loss, metrics = ['accuracy'])\n", 343 | "load_weights_from_FaceNet(FRmodel)" 344 | ] 345 | }, 346 | { 347 | "cell_type": "markdown", 348 | "metadata": {}, 349 | "source": [ 350 | "Here're some examples of distances between the encodings between three individuals:\n", 351 | "\n", 352 | "\n", 353 | "
\n", 354 | "
**Figure 4**:
Example of distance outputs between three individuals' encodings
\n", 355 | "\n", 356 | "Let's now use this model to perform face verification and face recognition! " 357 | ] 358 | }, 359 | { 360 | "cell_type": "markdown", 361 | "metadata": {}, 362 | "source": [ 363 | "## 3 - Applying the model" 364 | ] 365 | }, 366 | { 367 | "cell_type": "markdown", 368 | "metadata": {}, 369 | "source": [ 370 | "Back to the Happy House! Residents are living blissfully since you implemented happiness recognition for the house in an earlier assignment. \n", 371 | "\n", 372 | "However, several issues keep coming up: The Happy House became so happy that every happy person in the neighborhood is coming to hang out in your living room. It is getting really crowded, which is having a negative impact on the residents of the house. All these random happy people are also eating all your food. \n", 373 | "\n", 374 | "So, you decide to change the door entry policy, and not just let random happy people enter anymore, even if they are happy! Instead, you'd like to build a **Face verification** system so as to only let people from a specified list come in. To get admitted, each person has to swipe an ID card (identification card) to identify themselves at the door. The face recognition system then checks that they are who they claim to be. " 375 | ] 376 | }, 377 | { 378 | "cell_type": "markdown", 379 | "metadata": {}, 380 | "source": [ 381 | "### 3.1 - Face Verification\n", 382 | "\n", 383 | "Let's build a database containing one encoding vector for each person allowed to enter the happy house. To generate the encoding we use `img_to_encoding(image_path, model)` which basically runs the forward propagation of the model on the specified image. \n", 384 | "\n", 385 | "Run the following code to build the database (represented as a python dictionary). This database maps each person's name to a 128-dimensional encoding of their face." 
386 | ] 387 | }, 388 | { 389 | "cell_type": "code", 390 | "execution_count": null, 391 | "metadata": { 392 | "collapsed": true 393 | }, 394 | "outputs": [], 395 | "source": [ 396 | "database = {}\n", 397 | "database[\"danielle\"] = img_to_encoding(\"images/danielle.png\", FRmodel)\n", 398 | "database[\"younes\"] = img_to_encoding(\"images/younes.jpg\", FRmodel)\n", 399 | "database[\"tian\"] = img_to_encoding(\"images/tian.jpg\", FRmodel)\n", 400 | "database[\"andrew\"] = img_to_encoding(\"images/andrew.jpg\", FRmodel)\n", 401 | "database[\"kian\"] = img_to_encoding(\"images/kian.jpg\", FRmodel)\n", 402 | "database[\"dan\"] = img_to_encoding(\"images/dan.jpg\", FRmodel)\n", 403 | "database[\"sebastiano\"] = img_to_encoding(\"images/sebastiano.jpg\", FRmodel)\n", 404 | "database[\"bertrand\"] = img_to_encoding(\"images/bertrand.jpg\", FRmodel)\n", 405 | "database[\"kevin\"] = img_to_encoding(\"images/kevin.jpg\", FRmodel)\n", 406 | "database[\"felix\"] = img_to_encoding(\"images/felix.jpg\", FRmodel)\n", 407 | "database[\"benoit\"] = img_to_encoding(\"images/benoit.jpg\", FRmodel)\n", 408 | "database[\"arnaud\"] = img_to_encoding(\"images/arnaud.jpg\", FRmodel)" 409 | ] 410 | }, 411 | { 412 | "cell_type": "markdown", 413 | "metadata": {}, 414 | "source": [ 415 | "Now, when someone shows up at your front door and swipes their ID card (thus giving you their name), you can look up their encoding in the database, and use it to check if the person standing at the front door matches the name on the ID.\n", 416 | "\n", 417 | "**Exercise**: Implement the verify() function, which checks if the front-door camera picture (`image_path`) is actually the person called \"identity\". You will have to go through the following steps:\n", 418 | "1. Compute the encoding of the image from image_path\n", 419 | "2. Compute the distance between this encoding and the encoding of the identity image stored in the database\n", 420 | "3. Open the door if the distance is less than 0.7, else do not open.\n", 421 | "\n", 422 | "As presented above, you should use the L2 distance (np.linalg.norm). (Note: In this implementation, compare the L2 distance, not the square of the L2 distance, to the threshold 0.7.) " 423 | ] 424 | }, 425 | { 426 | "cell_type": "code", 427 | "execution_count": null, 428 | "metadata": { 429 | "collapsed": true 430 | }, 431 | "outputs": [], 432 | "source": [ 433 | "# GRADED FUNCTION: verify\n", 434 | "\n", 435 | "def verify(image_path, identity, database, model):\n", 436 | "    \"\"\"\n", 437 | "    Function that verifies if the person on the \"image_path\" image is \"identity\".\n", 438 | "    \n", 439 | "    Arguments:\n", 440 | "    image_path -- path to an image\n", 441 | "    identity -- string, name of the person whose identity you'd like to verify. Has to be a resident of the Happy House.\n", 442 | "    database -- python dictionary mapping allowed people's names (strings) to their encodings (vectors).\n", 443 | "    model -- your Inception model instance in Keras\n", 444 | "    \n", 445 | "    Returns:\n", 446 | "    dist -- distance between the image_path encoding and the encoding of \"identity\" in the database.\n", 447 | "    door_open -- True, if the door should open. False otherwise.\n", 448 | "    \"\"\"\n", 449 | "    \n", 450 | "    ### START CODE HERE ###\n", 451 | "    \n", 452 | "    # Step 1: Compute the encoding for the image. Use img_to_encoding(); see example above. 
(≈ 1 line)\n", 453 | " encoding = img_to_encoding(image_path, model)\n", 454 | " \n", 455 | " # Step 2: Compute distance with identity's image (≈ 1 line)\n", 456 | " dist = np.linalg.norm(encoding - database[identity])\n", 457 | " \n", 458 | " # Step 3: Open the door if dist < 0.7, else don't open (≈ 3 lines)\n", 459 | " if dist < 0.7:\n", 460 | " print(\"It's \" + str(identity) + \", welcome home!\")\n", 461 | " door_open = True\n", 462 | " else:\n", 463 | " print(\"It's not \" + str(identity) + \", please go away\")\n", 464 | " door_open = False\n", 465 | " \n", 466 | " ### END CODE HERE ###\n", 467 | " \n", 468 | " return dist, door_open" 469 | ] 470 | }, 471 | { 472 | "cell_type": "markdown", 473 | "metadata": {}, 474 | "source": [ 475 | "Younes is trying to enter the Happy House and the camera takes a picture of him (\"images/camera_0.jpg\"). Let's run your verification algorithm on this picture:\n", 476 | "\n", 477 | "" 478 | ] 479 | }, 480 | { 481 | "cell_type": "code", 482 | "execution_count": null, 483 | "metadata": { 484 | "collapsed": true 485 | }, 486 | "outputs": [], 487 | "source": [ 488 | "verify(\"images/camera_0.jpg\", \"younes\", database, FRmodel)" 489 | ] 490 | }, 491 | { 492 | "cell_type": "markdown", 493 | "metadata": { 494 | "collapsed": true 495 | }, 496 | "source": [ 497 | "**Expected Output**:\n", 498 | "\n", 499 | "\n", 500 | " \n", 501 | " \n", 504 | " \n", 507 | " \n", 508 | "\n", 509 | "
\n", 502 | " **It's younes, welcome home!**\n", 503 | " \n", 505 | " (0.65939283, True)\n", 506 | "
" 510 | ] 511 | }, 512 | { 513 | "cell_type": "markdown", 514 | "metadata": { 515 | "collapsed": true 516 | }, 517 | "source": [ 518 | "Benoit, who broke the aquarium last weekend, has been banned from the house and removed from the database. He stole Kian's ID card and came back to the house to try to present himself as Kian. The front-door camera took a picture of Benoit (\"images/camera_2.jpg). Let's run the verification algorithm to check if benoit can enter.\n", 519 | "" 520 | ] 521 | }, 522 | { 523 | "cell_type": "code", 524 | "execution_count": null, 525 | "metadata": { 526 | "collapsed": true 527 | }, 528 | "outputs": [], 529 | "source": [ 530 | "verify(\"images/camera_2.jpg\", \"kian\", database, FRmodel)" 531 | ] 532 | }, 533 | { 534 | "cell_type": "markdown", 535 | "metadata": {}, 536 | "source": [ 537 | "**Expected Output**:\n", 538 | "\n", 539 | "\n", 540 | " \n", 541 | " \n", 544 | " \n", 547 | " \n", 548 | "\n", 549 | "
\n", 542 | " **It's not kian, please go away**\n", 543 | " \n", 545 | " (0.86224014, False)\n", 546 | "
" 550 | ] 551 | }, 552 | { 553 | "cell_type": "markdown", 554 | "metadata": {}, 555 | "source": [ 556 | "### 3.2 - Face Recognition\n", 557 | "\n", 558 | "Your face verification system is mostly working well. But since Kian got his ID card stolen, when he came back to the house that evening he couldn't get in! \n", 559 | "\n", 560 | "To reduce such shenanigans, you'd like to change your face verification system to a face recognition system. This way, no one has to carry an ID card anymore. An authorized person can just walk up to the house, and the front door will unlock for them! \n", 561 | "\n", 562 | "You'll implement a face recognition system that takes as input an image, and figures out if it is one of the authorized persons (and if so, who). Unlike the previous face verification system, we will no longer get a person's name as another input. \n", 563 | "\n", 564 | "**Exercise**: Implement `who_is_it()`. You will have to go through the following steps:\n", 565 | "1. Compute the target encoding of the image from image_path\n", 566 | "2. Find the encoding from the database that has smallest distance with the target encoding. \n", 567 | " - Initialize the `min_dist` variable to a large enough number (100). It will help you keep track of what is the closest encoding to the input's encoding.\n", 568 | " - Loop over the database dictionary's names and encodings. To loop use `for (name, db_enc) in database.items()`.\n", 569 | " - Compute L2 distance between the target \"encoding\" and the current \"encoding\" from the database.\n", 570 | " - If this distance is less than the min_dist, then set min_dist to dist, and identity to name." 571 | ] 572 | }, 573 | { 574 | "cell_type": "code", 575 | "execution_count": null, 576 | "metadata": { 577 | "collapsed": true 578 | }, 579 | "outputs": [], 580 | "source": [ 581 | "# GRADED FUNCTION: who_is_it\n", 582 | "\n", 583 | "def who_is_it(image_path, database, model):\n", 584 | " \"\"\"\n", 585 | " Implements face recognition for the happy house by finding who is the person on the image_path image.\n", 586 | " \n", 587 | " Arguments:\n", 588 | " image_path -- path to an image\n", 589 | " database -- database containing image encodings along with the name of the person on the image\n", 590 | " model -- your Inception model instance in Keras\n", 591 | " \n", 592 | " Returns:\n", 593 | " min_dist -- the minimum distance between image_path encoding and the encodings from the database\n", 594 | " identity -- string, the name prediction for the person on image_path\n", 595 | " \"\"\"\n", 596 | " \n", 597 | " ### START CODE HERE ### \n", 598 | " \n", 599 | " ## Step 1: Compute the target \"encoding\" for the image. Use img_to_encoding() see example above. ## (≈ 1 line)\n", 600 | " encoding = img_to_encoding(image_path, model)\n", 601 | " \n", 602 | " ## Step 2: Find the closest encoding ##\n", 603 | " \n", 604 | " # Initialize \"min_dist\" to a large value, say 100 (≈1 line)\n", 605 | " min_dist = 100\n", 606 | " \n", 607 | " # Loop over the database dictionary's names and encodings.\n", 608 | " for (name, db_enc) in database.items():\n", 609 | " \n", 610 | " # Compute L2 distance between the target \"encoding\" and the current \"emb\" from the database. (≈ 1 line)\n", 611 | " dist = np.linalg.norm(encoding-db_enc)\n", 612 | "\n", 613 | " # If this distance is less than the min_dist, then set min_dist to dist, and identity to name. 
(≈ 3 lines)\n", 614 | "        if dist < min_dist:\n", 615 | "            min_dist = dist\n", 616 | "            identity = name\n", 617 | "\n", 618 | "    ### END CODE HERE ###\n", 619 | "    \n", 620 | "    if min_dist > 0.7:\n", 621 | "        print(\"Not in the database.\")\n", 622 | "    else:\n", 623 | "        print(\"it's \" + str(identity) + \", the distance is \" + str(min_dist))\n", 624 | "    \n", 625 | "    return min_dist, identity" 626 | ] 627 | }, 628 | { 629 | "cell_type": "markdown", 630 | "metadata": {}, 631 | "source": [ 632 | "Younes is at the front door and the camera takes a picture of him (\"images/camera_0.jpg\"). Let's see if your who_is_it() algorithm identifies Younes. " 633 | ] 634 | }, 635 | { 636 | "cell_type": "code", 637 | "execution_count": null, 638 | "metadata": { 639 | "collapsed": true, 640 | "scrolled": false 641 | }, 642 | "outputs": [], 643 | "source": [ 644 | "who_is_it(\"images/camera_0.jpg\", database, FRmodel)" 645 | ] 646 | }, 647 | { 648 | "cell_type": "markdown", 649 | "metadata": {}, 650 | "source": [ 651 | "**Expected Output**:\n", 652 | "\n", 653 | "<table>\n", 654 | "    <tr>\n", 655 | "        <td>
\n", 656 | " **it's younes, the distance is 0.659393**\n", 657 | " \n", 659 | " (0.65939283, 'younes')\n", 660 | "
" 664 | ] 665 | }, 666 | { 667 | "cell_type": "markdown", 668 | "metadata": {}, 669 | "source": [ 670 | "You can change \"`camera_0.jpg`\" (picture of younes) to \"`camera_1.jpg`\" (picture of bertrand) and see the result." 671 | ] 672 | }, 673 | { 674 | "cell_type": "markdown", 675 | "metadata": {}, 676 | "source": [ 677 | "Your Happy House is running well. It only lets in authorized persons, and people don't need to carry an ID card around anymore! \n", 678 | "\n", 679 | "You've now seen how a state-of-the-art face recognition system works.\n", 680 | "\n", 681 | "Although we won't implement it here, here're some ways to further improve the algorithm:\n", 682 | "- Put more images of each person (under different lighting conditions, taken on different days, etc.) into the database. Then given a new image, compare the new face to multiple pictures of the person. This would increae accuracy.\n", 683 | "- Crop the images to just contain the face, and less of the \"border\" region around the face. This preprocessing removes some of the irrelevant pixels around the face, and also makes the algorithm more robust.\n" 684 | ] 685 | }, 686 | { 687 | "cell_type": "markdown", 688 | "metadata": {}, 689 | "source": [ 690 | "\n", 691 | "**What you should remember**:\n", 692 | "- Face verification solves an easier 1:1 matching problem; face recognition addresses a harder 1:K matching problem. \n", 693 | "- The triplet loss is an effective loss function for training a neural network to learn an encoding of a face image.\n", 694 | "- The same encoding can be used for verification and recognition. Measuring distances between two images' encodings allows you to determine whether they are pictures of the same person. " 695 | ] 696 | }, 697 | { 698 | "cell_type": "markdown", 699 | "metadata": {}, 700 | "source": [ 701 | "Congrats on finishing this assignment! \n" 702 | ] 703 | }, 704 | { 705 | "cell_type": "markdown", 706 | "metadata": {}, 707 | "source": [ 708 | "### References:\n", 709 | "\n", 710 | "- Florian Schroff, Dmitry Kalenichenko, James Philbin (2015). [FaceNet: A Unified Embedding for Face Recognition and Clustering](https://arxiv.org/pdf/1503.03832.pdf)\n", 711 | "- Yaniv Taigman, Ming Yang, Marc'Aurelio Ranzato, Lior Wolf (2014). 
[DeepFace: Closing the gap to human-level performance in face verification](https://research.fb.com/wp-content/uploads/2016/11/deepface-closing-the-gap-to-human-level-performance-in-face-verification.pdf) \n", 712 | "- The pretrained model we use is inspired by Victor Sy Wang's implementation and was loaded using his code: https://github.com/iwantooxxoox/Keras-OpenFace.\n", 713 | "- Our implementation also took a lot of inspiration from the official FaceNet github repository: https://github.com/davidsandberg/facenet \n" 714 | ] 715 | } 716 | ], 717 | "metadata": { 718 | "coursera": { 719 | "course_slug": "convolutional-neural-networks", 720 | "graded_item_id": "IaknP", 721 | "launcher_item_id": "5UMr4" 722 | }, 723 | "kernelspec": { 724 | "display_name": "Python 3", 725 | "language": "python", 726 | "name": "python3" 727 | }, 728 | "language_info": { 729 | "codemirror_mode": { 730 | "name": "ipython", 731 | "version": 3 732 | }, 733 | "file_extension": ".py", 734 | "mimetype": "text/x-python", 735 | "name": "python", 736 | "nbconvert_exporter": "python", 737 | "pygments_lexer": "ipython3", 738 | "version": "3.6.0" 739 | } 740 | }, 741 | "nbformat": 4, 742 | "nbformat_minor": 2 743 | } 744 | -------------------------------------------------------------------------------- /course5_sequential_models/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wangruinju/Deep-Learning/e4c2b9d04898bd6113ef32377732915d4282b838/course5_sequential_models/.DS_Store -------------------------------------------------------------------------------- /course5_sequential_models/Coursera 32LEZVGFV8FD.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wangruinju/Deep-Learning/e4c2b9d04898bd6113ef32377732915d4282b838/course5_sequential_models/Coursera 32LEZVGFV8FD.pdf -------------------------------------------------------------------------------- /course5_sequential_models/Dinosaurus+Island+--+Character+level+language+model+final+-+v2.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Character level language model - Dinosaurus land\n", 8 | "\n", 9 | "Welcome to Dinosaurus Island! 65 million years ago, dinosaurs existed, and in this assignment they are back. You are in charge of a special task. Leading biology researchers are creating new breeds of dinosaurs and bringing them to life on earth, and your job is to give names to these dinosaurs. If a dinosaur does not like its name, it might go beserk, so choose wisely! \n", 10 | "\n", 11 | "\n", 12 | "\n", 16 | "\n", 17 | "
\n", 13 | "\n", 14 | "\n", 15 | "
\n", 18 | "\n", 19 | "Luckily you have learned some deep learning and you will use it to save the day. Your assistant has collected a list of all the dinosaur names they could find, and compiled them into this [dataset](dinos.txt). (Feel free to take a look by clicking the previous link.) To create new dinosaur names, you will build a character level language model to generate new names. Your algorithm will learn the different name patterns, and randomly generate new names. Hopefully this algorithm will keep you and your team safe from the dinosaurs' wrath! \n", 20 | "\n", 21 | "By completing this assignment you will learn:\n", 22 | "\n", 23 | "- How to store text data for processing using an RNN \n", 24 | "- How to synthesize data, by sampling predictions at each time step and passing it to the next RNN-cell unit\n", 25 | "- How to build a character-level text generation recurrent neural network\n", 26 | "- Why clipping the gradients is important\n", 27 | "\n", 28 | "We will begin by loading in some functions that we have provided for you in `rnn_utils`. Specifically, you have access to functions such as `rnn_forward` and `rnn_backward` which are equivalent to those you've implemented in the previous assignment. " 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": 54, 34 | "metadata": { 35 | "collapsed": true 36 | }, 37 | "outputs": [], 38 | "source": [ 39 | "import numpy as np\n", 40 | "from utils import *\n", 41 | "import random\n", 42 | "from random import shuffle" 43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "metadata": { 48 | "collapsed": true 49 | }, 50 | "source": [ 51 | "## 1 - Problem Statement\n", 52 | "\n", 53 | "### 1.1 - Dataset and Preprocessing\n", 54 | "\n", 55 | "Run the following cell to read the dataset of dinosaur names, create a list of unique characters (such as a-z), and compute the dataset and vocabulary size. " 56 | ] 57 | }, 58 | { 59 | "cell_type": "code", 60 | "execution_count": 55, 61 | "metadata": {}, 62 | "outputs": [ 63 | { 64 | "name": "stdout", 65 | "output_type": "stream", 66 | "text": [ 67 | "There are 19909 total characters and 27 unique characters in your data.\n" 68 | ] 69 | } 70 | ], 71 | "source": [ 72 | "data = open('dinos.txt', 'r').read()\n", 73 | "data= data.lower()\n", 74 | "chars = list(set(data))\n", 75 | "data_size, vocab_size = len(data), len(chars)\n", 76 | "print('There are %d total characters and %d unique characters in your data.' % (data_size, vocab_size))" 77 | ] 78 | }, 79 | { 80 | "cell_type": "markdown", 81 | "metadata": {}, 82 | "source": [ 83 | "The characters are a-z (26 characters) plus the \"\\n\" (or newline character), which in this assignment plays a role similar to the `` (or \"End of sentence\") token we had discussed in lecture, only here it indicates the end of the dinosaur name rather than the end of a sentence. In the cell below, we create a python dictionary (i.e., a hash table) to map each character to an index from 0-26. We also create a second python dictionary that maps each index back to the corresponding character character. This will help you figure out what index corresponds to what character in the probability distribution output of the softmax layer. Below, `char_to_ix` and `ix_to_char` are the python dictionaries. 
" 84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": 56, 89 | "metadata": {}, 90 | "outputs": [ 91 | { 92 | "name": "stdout", 93 | "output_type": "stream", 94 | "text": [ 95 | "{0: '\\n', 1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e', 6: 'f', 7: 'g', 8: 'h', 9: 'i', 10: 'j', 11: 'k', 12: 'l', 13: 'm', 14: 'n', 15: 'o', 16: 'p', 17: 'q', 18: 'r', 19: 's', 20: 't', 21: 'u', 22: 'v', 23: 'w', 24: 'x', 25: 'y', 26: 'z'}\n" 96 | ] 97 | } 98 | ], 99 | "source": [ 100 | "char_to_ix = { ch:i for i,ch in enumerate(sorted(chars)) }\n", 101 | "ix_to_char = { i:ch for i,ch in enumerate(sorted(chars)) }\n", 102 | "print(ix_to_char)" 103 | ] 104 | }, 105 | { 106 | "cell_type": "markdown", 107 | "metadata": {}, 108 | "source": [ 109 | "### 1.2 - Overview of the model\n", 110 | "\n", 111 | "Your model will have the following structure: \n", 112 | "\n", 113 | "- Initialize parameters \n", 114 | "- Run the optimization loop\n", 115 | " - Forward propagation to compute the loss function\n", 116 | " - Backward propagation to compute the gradients with respect to the loss function\n", 117 | " - Clip the gradients to avoid exploding gradients\n", 118 | " - Using the gradients, update your parameter with the gradient descent update rule.\n", 119 | "- Return the learned parameters \n", 120 | " \n", 121 | "\n", 122 | "
**Figure 1**: Recurrent Neural Network, similar to what you had built in the previous notebook \"Building a RNN - Step by Step\".
\n", 123 | "\n", 124 | "At each time-step, the RNN tries to predict what is the next character given the previous characters. The dataset $X = (x^{\\langle 1 \\rangle}, x^{\\langle 2 \\rangle}, ..., x^{\\langle T_x \\rangle})$ is a list of characters in the training set, while $Y = (y^{\\langle 1 \\rangle}, y^{\\langle 2 \\rangle}, ..., y^{\\langle T_x \\rangle})$ is such that at every time-step $t$, we have $y^{\\langle t \\rangle} = x^{\\langle t+1 \\rangle}$. " 125 | ] 126 | }, 127 | { 128 | "cell_type": "markdown", 129 | "metadata": {}, 130 | "source": [ 131 | "## 2 - Building blocks of the model\n", 132 | "\n", 133 | "In this part, you will build two important blocks of the overall model:\n", 134 | "- Gradient clipping: to avoid exploding gradients\n", 135 | "- Sampling: a technique used to generate characters\n", 136 | "\n", 137 | "You will then apply these two functions to build the model." 138 | ] 139 | }, 140 | { 141 | "cell_type": "markdown", 142 | "metadata": {}, 143 | "source": [ 144 | "### 2.1 - Clipping the gradients in the optimization loop\n", 145 | "\n", 146 | "In this section you will implement the `clip` function that you will call inside of your optimization loop. Recall that your overall loop structure usually consists of a forward pass, a cost computation, a backward pass, and a parameter update. Before updating the parameters, you will perform gradient clipping when needed to make sure that your gradients are not \"exploding,\" meaning taking on overly large values. \n", 147 | "\n", 148 | "In the exercise below, you will implement a function `clip` that takes in a dictionary of gradients and returns a clipped version of gradients if needed. There are different ways to clip gradients; we will use a simple element-wise clipping procedure, in which every element of the gradient vector is clipped to lie between some range [-N, N]. More generally, you will provide a `maxValue` (say 10). In this example, if any component of the gradient vector is greater than 10, it would be set to 10; and if any component of the gradient vector is less than -10, it would be set to -10. If it is between -10 and 10, it is left alone. \n", 149 | "\n", 150 | "\n", 151 | "
**Figure 2**: Visualization of gradient descent with and without gradient clipping, in a case where the network is running into slight \"exploding gradient\" problems.
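Before writing `clip`, it may help to see the core NumPy operation on a single array. The sketch below (with an arbitrary `maxValue` of 5) clips element-wise to [-5, 5]; passing `out=` writes the result back in place, which is why the hint below mentions that argument:

```python
import numpy as np

# A minimal sketch of element-wise clipping to [-N, N] with N = 5.
# out=gradient makes the operation in-place on the same array.
gradient = np.array([-12.3, 0.5, 7.8])
np.clip(gradient, -5, 5, out=gradient)
print(gradient)  # [-5.   0.5  5. ]
```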
\n", 152 | "\n", 153 | "**Exercise**: Implement the function below to return the clipped gradients of your dictionary `gradients`. Your function takes in a maximum threshold and returns the clipped versions of your gradients. You can check out this [hint](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.clip.html) for examples of how to clip in numpy. You will need to use the argument `out = ...`." 154 | ] 155 | }, 156 | { 157 | "cell_type": "code", 158 | "execution_count": 57, 159 | "metadata": { 160 | "collapsed": true 161 | }, 162 | "outputs": [], 163 | "source": [ 164 | "### GRADED FUNCTION: clip\n", 165 | "\n", 166 | "def clip(gradients, maxValue):\n", 167 | " '''\n", 168 | " Clips the gradients' values between minimum and maximum.\n", 169 | " \n", 170 | " Arguments:\n", 171 | " gradients -- a dictionary containing the gradients \"dWaa\", \"dWax\", \"dWya\", \"db\", \"dby\"\n", 172 | " maxValue -- everything above this number is set to this number, and everything less than -maxValue is set to -maxValue\n", 173 | " \n", 174 | " Returns: \n", 175 | " gradients -- a dictionary with the clipped gradients.\n", 176 | " '''\n", 177 | " \n", 178 | " dWaa, dWax, dWya, db, dby = gradients['dWaa'], gradients['dWax'], gradients['dWya'], gradients['db'], gradients['dby']\n", 179 | " \n", 180 | " ### START CODE HERE ###\n", 181 | " # clip to mitigate exploding gradients, loop over [dWax, dWaa, dWya, db, dby]. (≈2 lines)\n", 182 | " for v in gradients.values():\n", 183 | " v.clip(-maxValue, maxValue, out = v)\n", 184 | " ### END CODE HERE ###\n", 185 | " \n", 186 | " gradients = {\"dWaa\": dWaa, \"dWax\": dWax, \"dWya\": dWya, \"db\": db, \"dby\": dby}\n", 187 | " \n", 188 | " return gradients" 189 | ] 190 | }, 191 | { 192 | "cell_type": "code", 193 | "execution_count": 58, 194 | "metadata": {}, 195 | "outputs": [ 196 | { 197 | "name": "stdout", 198 | "output_type": "stream", 199 | "text": [ 200 | "gradients[\"dWaa\"][1][2] = 10.0\n", 201 | "gradients[\"dWax\"][3][1] = -10.0\n", 202 | "gradients[\"dWya\"][1][2] = 0.29713815361\n", 203 | "gradients[\"db\"][4] = [ 10.]\n", 204 | "gradients[\"dby\"][1] = [ 8.45833407]\n" 205 | ] 206 | } 207 | ], 208 | "source": [ 209 | "np.random.seed(3)\n", 210 | "dWax = np.random.randn(5,3)*10\n", 211 | "dWaa = np.random.randn(5,5)*10\n", 212 | "dWya = np.random.randn(2,5)*10\n", 213 | "db = np.random.randn(5,1)*10\n", 214 | "dby = np.random.randn(2,1)*10\n", 215 | "gradients = {\"dWax\": dWax, \"dWaa\": dWaa, \"dWya\": dWya, \"db\": db, \"dby\": dby}\n", 216 | "gradients = clip(gradients, 10)\n", 217 | "print(\"gradients[\\\"dWaa\\\"][1][2] =\", gradients[\"dWaa\"][1][2])\n", 218 | "print(\"gradients[\\\"dWax\\\"][3][1] =\", gradients[\"dWax\"][3][1])\n", 219 | "print(\"gradients[\\\"dWya\\\"][1][2] =\", gradients[\"dWya\"][1][2])\n", 220 | "print(\"gradients[\\\"db\\\"][4] =\", gradients[\"db\"][4])\n", 221 | "print(\"gradients[\\\"dby\\\"][1] =\", gradients[\"dby\"][1])" 222 | ] 223 | }, 224 | { 225 | "cell_type": "markdown", 226 | "metadata": {}, 227 | "source": [ 228 | "** Expected output:**\n", 229 | "\n", 230 | "\n", 231 | "\n", 232 | " \n", 235 | " \n", 238 | "\n", 239 | "\n", 240 | "\n", 241 | " \n", 244 | " \n", 247 | " \n", 248 | "\n", 249 | "\n", 250 | " \n", 253 | " \n", 256 | "\n", 257 | "\n", 258 | " \n", 261 | " \n", 264 | "\n", 265 | "\n", 266 | " \n", 269 | " \n", 272 | "\n", 273 | "\n", 274 | "
\n", 233 | " **gradients[\"dWaa\"][1][2] **\n", 234 | " \n", 236 | " 10.0\n", 237 | "
\n", 242 | " **gradients[\"dWax\"][3][1]**\n", 243 | " \n", 245 | " -10.0\n", 246 | "
\n", 251 | " **gradients[\"dWya\"][1][2]**\n", 252 | " \n", 254 | "0.29713815361\n", 255 | "
\n", 259 | " **gradients[\"db\"][4]**\n", 260 | " \n", 262 | "[ 10.]\n", 263 | "
\n", 267 | " **gradients[\"dby\"][1]**\n", 268 | " \n", 270 | "[ 8.45833407]\n", 271 | "
" 275 | ] 276 | }, 277 | { 278 | "cell_type": "markdown", 279 | "metadata": {}, 280 | "source": [ 281 | "### 2.2 - Sampling\n", 282 | "\n", 283 | "Now assume that your model is trained. You would like to generate new text (characters). The process of generation is explained in the picture below:\n", 284 | "\n", 285 | "\n", 286 | "
**Figure 3**: In this picture, we assume the model is already trained. We pass in $x^{\\langle 1\\rangle} = \\vec{0}$ at the first time step, and have the network then sample one character at a time.
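The exercise below relies on a `softmax()` helper that is provided in `utils` rather than shown in this notebook. A standard, numerically stable version looks like this sketch (an assumption about the helper's behavior, not its actual source):

```python
import numpy as np

def softmax(z):
    # Subtracting the max before exponentiating avoids overflow
    # without changing the resulting probabilities.
    e = np.exp(z - np.max(z))
    return e / e.sum(axis=0)
```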
\n", 287 | "\n", 288 | "**Exercise**: Implement the `sample` function below to sample characters. You need to carry out 4 steps:\n", 289 | "\n", 290 | "- **Step 1**: Pass the network the first \"dummy\" input $x^{\\langle 1 \\rangle} = \\vec{0}$ (the vector of zeros). This is the default input before we've generated any characters. We also set $a^{\\langle 0 \\rangle} = \\vec{0}$\n", 291 | "\n", 292 | "- **Step 2**: Run one step of forward propagation to get $a^{\\langle 1 \\rangle}$ and $\\hat{y}^{\\langle 1 \\rangle}$. Here are the equations:\n", 293 | "\n", 294 | "$$ a^{\\langle t+1 \\rangle} = \\tanh(W_{ax} x^{\\langle t \\rangle } + W_{aa} a^{\\langle t \\rangle } + b)\\tag{1}$$\n", 295 | "\n", 296 | "$$ z^{\\langle t + 1 \\rangle } = W_{ya} a^{\\langle t + 1 \\rangle } + b_y \\tag{2}$$\n", 297 | "\n", 298 | "$$ \\hat{y}^{\\langle t+1 \\rangle } = softmax(z^{\\langle t + 1 \\rangle })\\tag{3}$$\n", 299 | "\n", 300 | "Note that $\\hat{y}^{\\langle t+1 \\rangle }$ is a (softmax) probability vector (its entries are between 0 and 1 and sum to 1). $\\hat{y}^{\\langle t+1 \\rangle}_i$ represents the probability that the character indexed by \"i\" is the next character. We have provided a `softmax()` function that you can use.\n", 301 | "\n", 302 | "- **Step 3**: Carry out sampling: Pick the next character's index according to the probability distribution specified by $\\hat{y}^{\\langle t+1 \\rangle }$. This means that if $\\hat{y}^{\\langle t+1 \\rangle }_i = 0.16$, you will pick the index \"i\" with 16% probability. To implement it, you can use [`np.random.choice`](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.random.choice.html).\n", 303 | "\n", 304 | "Here is an example of how to use `np.random.choice()`:\n", 305 | "```python\n", 306 | "np.random.seed(0)\n", 307 | "p = np.array([0.1, 0.0, 0.7, 0.2])\n", 308 | "index = np.random.choice([0, 1, 2, 3], p = p.ravel())\n", 309 | "```\n", 310 | "This means that you will pick the `index` according to the distribution: \n", 311 | "$P(index = 0) = 0.1, P(index = 1) = 0.0, P(index = 2) = 0.7, P(index = 3) = 0.2$.\n", 312 | "\n", 313 | "- **Step 4**: The last step to implement in `sample()` is to overwrite the variable `x`, which currently stores $x^{\\langle t \\rangle }$, with the value of $x^{\\langle t + 1 \\rangle }$. You will represent $x^{\\langle t + 1 \\rangle }$ by creating a one-hot vector corresponding to the character you've chosen as your prediction. You will then forward propagate $x^{\\langle t + 1 \\rangle }$ in Step 1 and keep repeating the process until you get a \"\\n\" character, indicating you've reached the end of the dinosaur name. " 314 | ] 315 | }, 316 | { 317 | "cell_type": "code", 318 | "execution_count": 59, 319 | "metadata": { 320 | "collapsed": true 321 | }, 322 | "outputs": [], 323 | "source": [ 324 | "# GRADED FUNCTION: sample\n", 325 | "\n", 326 | "def sample(parameters, char_to_ix, seed):\n", 327 | " \"\"\"\n", 328 | " Sample a sequence of characters according to a sequence of probability distributions output of the RNN\n", 329 | "\n", 330 | " Arguments:\n", 331 | " parameters -- python dictionary containing the parameters Waa, Wax, Wya, by, and b. \n", 332 | " char_to_ix -- python dictionary mapping each character to an index.\n", 333 | " seed -- used for grading purposes. 
Do not worry about it.\n", 334 | "\n", 335 | " Returns:\n", 336 | " indices -- a list of length n containing the indices of the sampled characters.\n", 337 | " \"\"\"\n", 338 | " \n", 339 | " # Retrieve parameters and relevant shapes from \"parameters\" dictionary\n", 340 | " Waa, Wax, Wya, by, b = parameters['Waa'], parameters['Wax'], parameters['Wya'], parameters['by'], parameters['b']\n", 341 | " vocab_size = by.shape[0]\n", 342 | " n_a = Waa.shape[1]\n", 343 | " \n", 344 | " ### START CODE HERE ###\n", 345 | " # Step 1: Create the one-hot vector x for the first character (initializing the sequence generation). (≈1 line)\n", 346 | " x = np.zeros((vocab_size, 1))\n", 347 | " # Step 1': Initialize a_prev as zeros (≈1 line)\n", 348 | " a_prev = np.zeros((n_a, 1))\n", 349 | " \n", 350 | " # Create an empty list of indices, this is the list which will contain the list of indices of the characters to generate (≈1 line)\n", 351 | " indices = []\n", 352 | " \n", 353 | " # Idx is a flag to detect a newline character, we initialize it to -1\n", 354 | " idx = -1 \n", 355 | " \n", 356 | " # Loop over time-steps t. At each time-step, sample a character from a probability distribution and append \n", 357 | " # its index to \"indices\". We'll stop if we reach 50 characters (which should be very unlikely with a well \n", 358 | " # trained model), which helps debugging and prevents entering an infinite loop. \n", 359 | " counter = 0\n", 360 | " newline_character = char_to_ix['\\n']\n", 361 | " \n", 362 | " while (idx != newline_character and counter != 50):\n", 363 | " \n", 364 | " # Step 2: Forward propagate x using the equations (1), (2) and (3)\n", 365 | " a = np.tanh(np.dot(Wax, x) + np.dot(Waa, a_prev) + b)\n", 366 | " z = np.dot(Wya, a) + by\n", 367 | " y = softmax(z)\n", 368 | " \n", 369 | " # for grading purposes\n", 370 | " np.random.seed(counter+seed) \n", 371 | " \n", 372 | " # Step 3: Sample the index of a character within the vocabulary from the probability distribution y\n", 373 | " idx = np.random.choice(list(range(vocab_size)), p = y.ravel())\n", 374 | "\n", 375 | " # Append the index to \"indices\"\n", 376 | " indices.append(idx)\n", 377 | " \n", 378 | " # Step 4: Overwrite the input character as the one corresponding to the sampled index.\n", 379 | " x = np.zeros((vocab_size, 1))\n", 380 | " x[idx] = 1\n", 381 | " \n", 382 | " # Update \"a_prev\" to be \"a\"\n", 383 | " a_prev = a\n", 384 | " \n", 385 | " # for grading purposes\n", 386 | " seed += 1\n", 387 | " counter +=1\n", 388 | " \n", 389 | " ### END CODE HERE ###\n", 390 | "\n", 391 | " if (counter == 50):\n", 392 | " indices.append(char_to_ix['\\n'])\n", 393 | " \n", 394 | " return indices" 395 | ] 396 | }, 397 | { 398 | "cell_type": "code", 399 | "execution_count": 60, 400 | "metadata": {}, 401 | "outputs": [ 402 | { 403 | "name": "stdout", 404 | "output_type": "stream", 405 | "text": [ 406 | "Sampling:\n", 407 | "list of sampled indices: [18, 2, 26, 0]\n", 408 | "list of sampled characters: ['r', 'b', 'z', '\\n']\n" 409 | ] 410 | } 411 | ], 412 | "source": [ 413 | "np.random.seed(2)\n", 414 | "_, n_a = 20, 100\n", 415 | "a0 = np.random.randn(n_a, 1)\n", 416 | "i0 = 1 # first character is ix_to_char[i0]\n", 417 | "Wax, Waa, Wya = np.random.randn(n_a, vocab_size), np.random.randn(n_a, n_a), np.random.randn(vocab_size, n_a)\n", 418 | "b, by = np.random.randn(n_a, 1), np.random.randn(vocab_size, 1)\n", 419 | "parameters = {\"Wax\": Wax, \"Waa\": Waa, \"Wya\": Wya, \"b\": b, \"by\": by}\n", 420 | "\n", 421 | "\n", 422 | "indices = 
sample(parameters, char_to_ix, 0)\n", 423 | "print(\"Sampling:\")\n", 424 | "print(\"list of sampled indices:\", indices)\n", 425 | "print(\"list of sampled characters:\", [ix_to_char[i] for i in indices])" 426 | ] 427 | }, 428 | { 429 | "cell_type": "markdown", 430 | "metadata": {}, 431 | "source": [ 432 | "** Expected output:**\n", 433 | "\n", 434 | "\n", 435 | " \n", 438 | " \n", 441 | " \n", 442 | " \n", 445 | " \n", 448 | " \n", 449 | " \n", 450 | " \n", 451 | "\n", 452 | "
\n", 436 | " **list of sampled indices:**\n", 437 | " \n", 439 | " [18, 2, 26, 0]\n", 440 | "
\n", 443 | " **list of sampled characters:**\n", 444 | " \n", 446 | " ['r', 'b', 'z', '\\n']\n", 447 | "
" 453 | ] 454 | }, 455 | { 456 | "cell_type": "markdown", 457 | "metadata": {}, 458 | "source": [ 459 | "## 3 - Building the language model \n", 460 | "\n", 461 | "It is time to build the character-level language model for text generation. \n", 462 | "\n", 463 | "\n", 464 | "### 3.1 - Gradient descent \n", 465 | "\n", 466 | "In this section you will implement a function performing one step of stochastic gradient descent (with clipped gradients). You will go through the training examples one at a time, so the optimization algorithm will be stochastic gradient descent. As a reminder, here are the steps of a common optimization loop for an RNN:\n", 467 | "\n", 468 | "- Forward propagate through the RNN to compute the loss\n", 469 | "- Backward propagate through time to compute the gradients of the loss with respect to the parameters\n", 470 | "- Clip the gradients if necessary \n", 471 | "- Update your parameters using gradient descent \n", 472 | "\n", 473 | "**Exercise**: Implement this optimization process (one step of stochastic gradient descent). \n", 474 | "\n", 475 | "We provide you with the following functions: \n", 476 | "\n", 477 | "```python\n", 478 | "def rnn_forward(X, Y, a_prev, parameters):\n", 479 | " \"\"\" Performs the forward propagation through the RNN and computes the cross-entropy loss.\n", 480 | " It returns the loss' value as well as a \"cache\" storing values to be used in the backpropagation.\"\"\"\n", 481 | " ....\n", 482 | " return loss, cache\n", 483 | " \n", 484 | "def rnn_backward(X, Y, parameters, cache):\n", 485 | " \"\"\" Performs the backward propagation through time to compute the gradients of the loss with respect\n", 486 | " to the parameters. It returns also all the hidden states.\"\"\"\n", 487 | " ...\n", 488 | " return gradients, a\n", 489 | "\n", 490 | "def update_parameters(parameters, gradients, learning_rate):\n", 491 | " \"\"\" Updates parameters using the Gradient Descent Update Rule.\"\"\"\n", 492 | " ...\n", 493 | " return parameters\n", 494 | "```" 495 | ] 496 | }, 497 | { 498 | "cell_type": "code", 499 | "execution_count": 61, 500 | "metadata": { 501 | "collapsed": true 502 | }, 503 | "outputs": [], 504 | "source": [ 505 | "# GRADED FUNCTION: optimize\n", 506 | "\n", 507 | "def optimize(X, Y, a_prev, parameters, learning_rate = 0.01):\n", 508 | " \"\"\"\n", 509 | " Execute one step of the optimization to train the model.\n", 510 | " \n", 511 | " Arguments:\n", 512 | " X -- list of integers, where each integer is a number that maps to a character in the vocabulary.\n", 513 | " Y -- list of integers, exactly the same as X but shifted one index to the left.\n", 514 | " a_prev -- previous hidden state.\n", 515 | " parameters -- python dictionary containing:\n", 516 | " Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)\n", 517 | " Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)\n", 518 | " Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)\n", 519 | " b -- Bias, numpy array of shape (n_a, 1)\n", 520 | " by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)\n", 521 | " learning_rate -- learning rate for the model.\n", 522 | " \n", 523 | " Returns:\n", 524 | " loss -- value of the loss function (cross-entropy)\n", 525 | " gradients -- python dictionary containing:\n", 526 | " dWax -- Gradients of input-to-hidden weights, of shape (n_a, n_x)\n", 527 | " dWaa -- Gradients of hidden-to-hidden weights, of shape (n_a, 
n_a)\n", 528 | " dWya -- Gradients of hidden-to-output weights, of shape (n_y, n_a)\n", 529 | " db -- Gradients of bias vector, of shape (n_a, 1)\n", 530 | " dby -- Gradients of output bias vector, of shape (n_y, 1)\n", 531 | " a[len(X)-1] -- the last hidden state, of shape (n_a, 1)\n", 532 | " \"\"\"\n", 533 | " \n", 534 | " ### START CODE HERE ###\n", 535 | " \n", 536 | " # Forward propagate through time (≈1 line)\n", 537 | " loss, cache = rnn_forward(X, Y, a_prev, parameters)\n", 538 | " \n", 539 | " # Backpropagate through time (≈1 line)\n", 540 | " gradients, a = rnn_backward(X, Y, parameters, cache)\n", 541 | " \n", 542 | " # Clip your gradients between -5 (min) and 5 (max) (≈1 line)\n", 543 | " gradients = clip(gradients, 5)\n", 544 | " \n", 545 | " # Update parameters (≈1 line)\n", 546 | " parameters = update_parameters(parameters, gradients, learning_rate)\n", 547 | " \n", 548 | " ### END CODE HERE ###\n", 549 | " \n", 550 | " return loss, gradients, a[len(X)-1]" 551 | ] 552 | }, 553 | { 554 | "cell_type": "code", 555 | "execution_count": 62, 556 | "metadata": {}, 557 | "outputs": [ 558 | { 559 | "name": "stdout", 560 | "output_type": "stream", 561 | "text": [ 562 | "Loss = 126.503975722\n", 563 | "gradients[\"dWaa\"][1][2] = 0.194709315347\n", 564 | "np.argmax(gradients[\"dWax\"]) = 93\n", 565 | "gradients[\"dWya\"][1][2] = -0.007773876032\n", 566 | "gradients[\"db\"][4] = [-0.06809825]\n", 567 | "gradients[\"dby\"][1] = [ 0.01538192]\n", 568 | "a_last[4] = [-1.]\n" 569 | ] 570 | } 571 | ], 572 | "source": [ 573 | "np.random.seed(1)\n", 574 | "vocab_size, n_a = 27, 100\n", 575 | "a_prev = np.random.randn(n_a, 1)\n", 576 | "Wax, Waa, Wya = np.random.randn(n_a, vocab_size), np.random.randn(n_a, n_a), np.random.randn(vocab_size, n_a)\n", 577 | "b, by = np.random.randn(n_a, 1), np.random.randn(vocab_size, 1)\n", 578 | "parameters = {\"Wax\": Wax, \"Waa\": Waa, \"Wya\": Wya, \"b\": b, \"by\": by}\n", 579 | "X = [12,3,5,11,22,3]\n", 580 | "Y = [4,14,11,22,25, 26]\n", 581 | "\n", 582 | "loss, gradients, a_last = optimize(X, Y, a_prev, parameters, learning_rate = 0.01)\n", 583 | "print(\"Loss =\", loss)\n", 584 | "print(\"gradients[\\\"dWaa\\\"][1][2] =\", gradients[\"dWaa\"][1][2])\n", 585 | "print(\"np.argmax(gradients[\\\"dWax\\\"]) =\", np.argmax(gradients[\"dWax\"]))\n", 586 | "print(\"gradients[\\\"dWya\\\"][1][2] =\", gradients[\"dWya\"][1][2])\n", 587 | "print(\"gradients[\\\"db\\\"][4] =\", gradients[\"db\"][4])\n", 588 | "print(\"gradients[\\\"dby\\\"][1] =\", gradients[\"dby\"][1])\n", 589 | "print(\"a_last[4] =\", a_last[4])" 590 | ] 591 | }, 592 | { 593 | "cell_type": "markdown", 594 | "metadata": {}, 595 | "source": [ 596 | "** Expected output:**\n", 597 | "\n", 598 | "\n", 599 | "\n", 600 | "\n", 601 | "\n", 602 | " \n", 605 | " \n", 608 | "\n", 609 | "\n", 610 | " \n", 613 | " \n", 616 | "\n", 617 | " \n", 620 | " \n", 622 | "\n", 623 | "\n", 624 | " \n", 627 | " \n", 629 | "\n", 630 | "\n", 631 | " \n", 634 | " \n", 636 | "\n", 637 | "\n", 638 | " \n", 641 | " \n", 643 | "\n", 644 | "\n", 645 | " \n", 648 | " \n", 650 | "\n", 651 | "\n", 652 | "
\n", 603 | " **Loss **\n", 604 | " \n", 606 | " 126.503975722\n", 607 | "
\n", 611 | " **gradients[\"dWaa\"][1][2]**\n", 612 | " \n", 614 | " 0.194709315347\n", 615 | "
\n", 618 | " **np.argmax(gradients[\"dWax\"])**\n", 619 | " 93\n", 621 | "
\n", 625 | " **gradients[\"dWya\"][1][2]**\n", 626 | " -0.007773876032\n", 628 | "
\n", 632 | " **gradients[\"db\"][4]**\n", 633 | " [-0.06809825]\n", 635 | "
\n", 639 | " **gradients[\"dby\"][1]**\n", 640 | " [ 0.01538192]\n", 642 | "
\n", 646 | " **a_last[4]**\n", 647 | " [-1.]\n", 649 | "
" 653 | ] 654 | }, 655 | { 656 | "cell_type": "markdown", 657 | "metadata": { 658 | "collapsed": true 659 | }, 660 | "source": [ 661 | "### 3.2 - Training the model " 662 | ] 663 | }, 664 | { 665 | "cell_type": "markdown", 666 | "metadata": {}, 667 | "source": [ 668 | "Given the dataset of dinosaur names, we use each line of the dataset (one name) as one training example. Every 100 steps of stochastic gradient descent, you will sample 10 randomly chosen names to see how the algorithm is doing. Remember to shuffle the dataset, so that stochastic gradient descent visits the examples in random order. \n", 669 | "\n", 670 | "**Exercise**: Follow the instructions and implement `model()`. When `examples[index]` contains one dinosaur name (string), to create an example (X, Y), you can use this:\n", 671 | "```python\n", 672 | " index = j % len(examples)\n", 673 | " X = [None] + [char_to_ix[ch] for ch in examples[index]] \n", 674 | " Y = X[1:] + [char_to_ix[\"\\n\"]]\n", 675 | "```\n", 676 | "Note that we use: `index= j % len(examples)`, where `j = 1....num_iterations`, to make sure that `examples[index]` is always a valid statement (`index` is smaller than `len(examples)`).\n", 677 | "The first entry of `X` being `None` will be interpreted by `rnn_forward()` as setting $x^{\\langle 0 \\rangle} = \\vec{0}$. Further, this ensures that `Y` is equal to `X` but shifted one step to the left, and with an additional \"\\n\" appended to signify the end of the dinosaur name. " 678 | ] 679 | }, 680 | { 681 | "cell_type": "code", 682 | "execution_count": 65, 683 | "metadata": { 684 | "collapsed": true 685 | }, 686 | "outputs": [], 687 | "source": [ 688 | "# GRADED FUNCTION: model\n", 689 | "\n", 690 | "def model(data, ix_to_char, char_to_ix, num_iterations = 35000, n_a = 50, dino_names = 7, vocab_size = 27):\n", 691 | " \"\"\"\n", 692 | " Trains the model and generates dinosaur names. \n", 693 | " \n", 694 | " Arguments:\n", 695 | " data -- text corpus\n", 696 | " ix_to_char -- dictionary that maps the index to a character\n", 697 | " char_to_ix -- dictionary that maps a character to an index\n", 698 | " num_iterations -- number of iterations to train the model for\n", 699 | " n_a -- number of units of the RNN cell\n", 700 | " dino_names -- number of dinosaur names you want to sample at each iteration. 
\n", 701 | " vocab_size -- number of unique characters found in the text, size of the vocabulary\n", 702 | " \n", 703 | " Returns:\n", 704 | " parameters -- learned parameters\n", 705 | " \"\"\"\n", 706 | " \n", 707 | " # Retrieve n_x and n_y from vocab_size\n", 708 | " n_x, n_y = vocab_size, vocab_size\n", 709 | " \n", 710 | " # Initialize parameters\n", 711 | " parameters = initialize_parameters(n_a, n_x, n_y)\n", 712 | " \n", 713 | " # Initialize loss (this is required because we want to smooth our loss, don't worry about it)\n", 714 | " loss = get_initial_loss(vocab_size, dino_names)\n", 715 | " \n", 716 | " # Build list of all dinosaur names (training examples).\n", 717 | " with open(\"dinos.txt\") as f:\n", 718 | " examples = f.readlines()\n", 719 | " examples = [x.lower().strip() for x in examples]\n", 720 | " \n", 721 | " # Shuffle list of all dinosaur names\n", 722 | " # shuffle(examples)\n", 723 | " \n", 724 | " # Initialize the hidden state of your LSTM\n", 725 | " a_prev = np.zeros((n_a, 1))\n", 726 | " \n", 727 | " # Optimization loop\n", 728 | " for j in range(num_iterations):\n", 729 | " \n", 730 | " ### START CODE HERE ###\n", 731 | " \n", 732 | " # Use the hint above to define one training example (X,Y) (≈ 2 lines)\n", 733 | " index = j % len(examples)\n", 734 | " X = [None] + [char_to_ix[ch] for ch in examples[index]] \n", 735 | " Y = X[1:] + [char_to_ix[\"\\n\"]]\n", 736 | " \n", 737 | " # Perform one optimization step: Forward-prop -> Backward-prop -> Clip -> Update parameters\n", 738 | " # Choose a learning rate of 0.01\n", 739 | " curr_loss, gradients, a_prev = optimize(X, Y, a_prev, parameters, learning_rate = 0.01)\n", 740 | " \n", 741 | " ### END CODE HERE ###\n", 742 | " \n", 743 | " # Use a latency trick to keep the loss smooth. It happens here to accelerate the training.\n", 744 | " loss = smooth(loss, curr_loss)\n", 745 | "\n", 746 | " # Every 2000 Iteration, generate \"n\" characters thanks to sample() to check if the model is learning properly\n", 747 | " if j % 2000 == 0:\n", 748 | " \n", 749 | " print('Iteration: %d, Loss: %f' % (j, loss) + '\\n')\n", 750 | " \n", 751 | " # The number of dinosaur names to print\n", 752 | " seed = 0\n", 753 | " for name in range(dino_names):\n", 754 | " \n", 755 | " # Sample indices and print them\n", 756 | " sampled_indices = sample(parameters, char_to_ix, seed)\n", 757 | " print_sample(sampled_indices, ix_to_char)\n", 758 | " \n", 759 | " seed += 1 # To get the same result for grading purposed, increment the seed by one. \n", 760 | " \n", 761 | " print('\\n')\n", 762 | " \n", 763 | " return parameters" 764 | ] 765 | }, 766 | { 767 | "cell_type": "markdown", 768 | "metadata": {}, 769 | "source": [ 770 | "Run the following cell, you should observe your model outputting random-looking characters at the first iteration. After a few thousand iterations, your model should learn to generate reasonable-looking names. 
" 771 | ] 772 | }, 773 | { 774 | "cell_type": "code", 775 | "execution_count": 66, 776 | "metadata": { 777 | "scrolled": true 778 | }, 779 | "outputs": [ 780 | { 781 | "name": "stdout", 782 | "output_type": "stream", 783 | "text": [ 784 | "Iteration: 0, Loss: 23.093932\n", 785 | "\n", 786 | "Nkzxwtdmeqoeyhsqwasjjjvu\n", 787 | "Kneb\n", 788 | "Kzxwtdmeqoeyhsqwasjjjvu\n", 789 | "Neb\n", 790 | "Zxwtdmeqoeyhsqwasjjjvu\n", 791 | "Eb\n", 792 | "Xwtdmeqoeyhsqwasjjjvu\n", 793 | "\n", 794 | "\n", 795 | "Iteration: 2000, Loss: 26.681733\n", 796 | "\n", 797 | "Iiytronheoravhoss\n", 798 | "Eola\n", 799 | "Eytosaurus\n", 800 | "Idaahosalaus\n", 801 | "Xuskeonoraveros\n", 802 | "Cdalosan\n", 803 | "Tosaurus\n", 804 | "\n", 805 | "\n", 806 | "Iteration: 4000, Loss: 24.336871\n", 807 | "\n", 808 | "Mevtosaurus\n", 809 | "Lnecacosaurus\n", 810 | "Lustonacor\n", 811 | "Mecacoteeaosaurus\n", 812 | "Vusmenatontosaurus\n", 813 | "Gaacosaurus\n", 814 | "Tosaurus\n", 815 | "\n", 816 | "\n", 817 | "Iteration: 6000, Loss: 23.075217\n", 818 | "\n", 819 | "Titotomanosaurus\n", 820 | "Sidachssalchunthenus\n", 821 | "Sutotcmenonus\n", 822 | "Tha\n", 823 | "Utoterchomunmtocongitochusus\n", 824 | "Laaiton\n", 825 | "Totaresaurus\n", 826 | "\n", 827 | "\n", 828 | "Iteration: 8000, Loss: 22.248987\n", 829 | "\n", 830 | "Liusus\n", 831 | "Huecaepsamaruosaurus\n", 832 | "Hususaurus\n", 833 | "Leaaerur\n", 834 | "Xusocheopeurus\n", 835 | "Caaeroleanus\n", 836 | "Usicheopeurus\n", 837 | "\n", 838 | "\n", 839 | "Iteration: 10000, Loss: 21.864302\n", 840 | "\n", 841 | "Liytrongisaurus\n", 842 | "Lidbamrong\n", 843 | "Lytronigosaurus\n", 844 | "Lebalrrgbansaurus\n", 845 | "Xussanisaurus\n", 846 | "Haagronechusilenxongyenntantosaurus\n", 847 | "Tosaurus\n", 848 | "\n", 849 | "\n", 850 | "Iteration: 12000, Loss: 21.570180\n", 851 | "\n", 852 | "Sivusigjirigusauros\n", 853 | "Pkibaisilecitithelus\n", 854 | "Pustrgomindylusaproplos\n", 855 | "Segadosaurus\n", 856 | "Wusohigosaurus\n", 857 | "Keadrola\n", 858 | "Sprimenmevisaurus\n", 859 | "\n", 860 | "\n", 861 | "Iteration: 14000, Loss: 21.092251\n", 862 | "\n", 863 | "Levrosaurus\n", 864 | "Hedaa\n", 865 | "Hustranipheverataps\n", 866 | "Labaisaurus\n", 867 | "Yusiangosaurus\n", 868 | "Abaespacarmitan\n", 869 | "Vus\n", 870 | "\n", 871 | "\n", 872 | "Iteration: 16000, Loss: 21.031843\n", 873 | "\n", 874 | "Hovryphongontasaurus\n", 875 | "Hulbalosaurus\n", 876 | "Hyptokonihhylosfarifhxuchyoten\n", 877 | "Hicajpsaurus\n", 878 | "Xysterhosaurus\n", 879 | "Heaesoneanthyleptophyfispegus\n", 880 | "Sypengosaurus\n", 881 | "\n", 882 | "\n", 883 | "Iteration: 18000, Loss: 20.855732\n", 884 | "\n", 885 | "Prutotancoraxbprobodonopantowalaphotan\n", 886 | "Oracaetopachynphopuin\n", 887 | "Ottroceratous\n", 888 | "Pri\n", 889 | "Wrroceisaurus\n", 890 | "Leagotacaroptor\n", 891 | "Rrochenodratos\n", 892 | "\n", 893 | "\n", 894 | "Iteration: 20000, Loss: 20.912079\n", 895 | "\n", 896 | "Onyushangosaurus\n", 897 | "Licaaaurus\n", 898 | "Lustogonlontashuaprigsialtous\n", 899 | "Ogaadrosaurus\n", 900 | "Yusteratops\n", 901 | "Gaadpsaachus\n", 902 | "Xungongontasaurus\n", 903 | "\n", 904 | "\n", 905 | "Iteration: 22000, Loss: 20.591554\n", 906 | "\n", 907 | "Krystongeus\n", 908 | "Euca\n", 909 | "Eustriongonus\n", 910 | "Kocalsjeicesaurus\n", 911 | "Yusterasodusheucopinto\n", 912 | "Ejacrsat\n", 913 | "Uspanhosaurus\n", 914 | "\n", 915 | "\n", 916 | "Iteration: 24000, Loss: 20.594021\n", 917 | "\n", 918 | "Miutosaurus\n", 919 | "Logdants\n", 920 | "Lytosaurus\n", 921 | "Medagrosaurus\n", 922 | 
"Yustosaurus\n", 923 | "Haaersas\n", 924 | "Trodorgonus\n", 925 | "\n", 926 | "\n", 927 | "Iteration: 26000, Loss: 20.403874\n", 928 | "\n", 929 | "Rivusaurus\n", 930 | "Praaalosaurus\n", 931 | "Pustrasaurus\n", 932 | "Racalosaurus\n", 933 | "Xutranatiasaurus\n", 934 | "Kaagosaurus\n", 935 | "Usthalosaurus\n", 936 | "\n", 937 | "\n", 938 | "Iteration: 28000, Loss: 20.293634\n", 939 | "\n", 940 | "Liustrimarnathosaurus\n", 941 | "Eracakrona\n", 942 | "Gytrolomprnynosaurus\n", 943 | "Lacalosaurus\n", 944 | "Yurofonnonychus\n", 945 | "Caagronachyosaurus\n", 946 | "Vurangosaurus\n", 947 | "\n", 948 | "\n", 949 | "Iteration: 30000, Loss: 20.301057\n", 950 | "\n", 951 | "Ljyusalongorus\n", 952 | "Loia\n", 953 | "Lyutia\n", 954 | "Lacalosaurus\n", 955 | "Yusocheosaurus\n", 956 | "Haagusanchuschanus\n", 957 | "Uusangosaurus\n", 958 | "\n", 959 | "\n", 960 | "Iteration: 32000, Loss: 20.107862\n", 961 | "\n", 962 | "Rixspterischvisps\n", 963 | "Pikeadps\n", 964 | "Quspriphosaurus\n", 965 | "Ricalosaurus\n", 966 | "Xuspeolosaurus\n", 967 | "Keafrondelus\n", 968 | "Spogolosaurus\n", 969 | "\n", 970 | "\n", 971 | "Iteration: 34000, Loss: 19.956776\n", 972 | "\n", 973 | "Lixtosaurus\n", 974 | "Hicabersar\n", 975 | "Hytssaurus\n", 976 | "Leebasiacantisaurus\n", 977 | "Ytosaurus\n", 978 | "Babbosaurus\n", 979 | "Wuscihingxerosaurus\n", 980 | "\n", 981 | "\n" 982 | ] 983 | } 984 | ], 985 | "source": [ 986 | "parameters = model(data, ix_to_char, char_to_ix)" 987 | ] 988 | }, 989 | { 990 | "cell_type": "markdown", 991 | "metadata": { 992 | "collapsed": true 993 | }, 994 | "source": [ 995 | "## Conclusion\n", 996 | "\n", 997 | "You can see that your algorithm has started to generate plausible dinosaur names towards the end of the training. At first, it was generating random characters, but towards the end you could see dinosaur names with cool endings. Feel free to run the algorithm even longer and play with hyperparameters to see if you can get even better results. Our implemetation generated some really cool names like `maconucon`, `marloralus` and `macingsersaurus`. Your model hopefully also learned that dinosaur names tend to end in `saurus`, `don`, `aura`, `tor`, etc.\n", 998 | "\n", 999 | "If your model generates some non-cool names, don't blame the model entirely--not all actual dinosaur names sound cool. (For example, `dromaeosauroides` is an actual dinosaur name and is in the training set.) But this model should give you a set of candidates from which you can pick the coolest! \n", 1000 | "\n", 1001 | "This assignment had used a relatively small dataset, so that you could train an RNN quickly on a CPU. Training a model of the english language requires a much bigger dataset, and usually needs much more computation, and could run for many hours on GPUs. We ran our dinosaur name for quite some time, and so far our favoriate name is the great, undefeatable, and fierce: Mangosaurus!\n", 1002 | "\n", 1003 | "" 1004 | ] 1005 | }, 1006 | { 1007 | "cell_type": "markdown", 1008 | "metadata": {}, 1009 | "source": [ 1010 | "## 4 - Writing like Shakespeare\n", 1011 | "\n", 1012 | "The rest of this notebook is optional and is not graded, but we hope you'll do it anyway since it's quite fun and informative. \n", 1013 | "\n", 1014 | "A similar (but more complicated) task is to generate Shakespeare poems. Instead of learning from a dataset of Dinosaur names you can use a collection of Shakespearian poems. 
Using LSTM cells, you can learn longer-term dependencies that span many characters in the text--e.g., a character appearing somewhere in a sequence can influence what should be a different character much later in the sequence. These long-term dependencies were less important with dinosaur names, since the names were quite short. \n", 1015 | "\n", 1016 | "\n", 1017 | "\n", 1018 | "
Let's become poets!
\n", 1019 | "\n", 1020 | "We have implemented a Shakespeare poem generator with Keras. Run the following cell to load the required packages and models. This may take a few minutes. " 1021 | ] 1022 | }, 1023 | { 1024 | "cell_type": "code", 1025 | "execution_count": 51, 1026 | "metadata": {}, 1027 | "outputs": [ 1028 | { 1029 | "name": "stderr", 1030 | "output_type": "stream", 1031 | "text": [ 1032 | "Using TensorFlow backend.\n" 1033 | ] 1034 | }, 1035 | { 1036 | "name": "stdout", 1037 | "output_type": "stream", 1038 | "text": [ 1039 | "Loading text data...\n", 1040 | "Creating training set...\n", 1041 | "number of training examples: 31412\n", 1042 | "Vectorizing training set...\n", 1043 | "Loading model...\n" 1044 | ] 1045 | } 1046 | ], 1047 | "source": [ 1048 | "from __future__ import print_function\n", 1049 | "from keras.callbacks import LambdaCallback\n", 1050 | "from keras.models import Model, load_model, Sequential\n", 1051 | "from keras.layers import Dense, Activation, Dropout, Input, Masking\n", 1052 | "from keras.layers import LSTM\n", 1053 | "from keras.utils.data_utils import get_file\n", 1054 | "from keras.preprocessing.sequence import pad_sequences\n", 1055 | "from shakespeare_utils import *\n", 1056 | "import sys\n", 1057 | "import io" 1058 | ] 1059 | }, 1060 | { 1061 | "cell_type": "markdown", 1062 | "metadata": {}, 1063 | "source": [ 1064 | "To save you some time, we have already trained a model for ~1000 epochs on a collection of Shakespearian poems called [*\"The Sonnets\"*](shakespeare.txt). " 1065 | ] 1066 | }, 1067 | { 1068 | "cell_type": "markdown", 1069 | "metadata": {}, 1070 | "source": [ 1071 | "Let's train the model for one more epoch. When it finishes training for an epoch---this will also take a few minutes---you can run `generate_output`, which will prompt asking you for an input (`<`40 characters). The poem will start with your sentence, and our RNN-Shakespeare will complete the rest of the poem for you! For example, try \"Forsooth this maketh no sense \" (don't enter the quotation marks). Depending on whether you include the space at the end, your results might also differ--try it both ways, and try other inputs as well. \n" 1072 | ] 1073 | }, 1074 | { 1075 | "cell_type": "code", 1076 | "execution_count": 52, 1077 | "metadata": { 1078 | "scrolled": true 1079 | }, 1080 | "outputs": [ 1081 | { 1082 | "name": "stdout", 1083 | "output_type": "stream", 1084 | "text": [ 1085 | "Epoch 1/1\n", 1086 | "31412/31412 [==============================] - 249s - loss: 2.5725 \n" 1087 | ] 1088 | }, 1089 | { 1090 | "data": { 1091 | "text/plain": [ 1092 | "" 1093 | ] 1094 | }, 1095 | "execution_count": 52, 1096 | "metadata": {}, 1097 | "output_type": "execute_result" 1098 | } 1099 | ], 1100 | "source": [ 1101 | "print_callback = LambdaCallback(on_epoch_end=on_epoch_end)\n", 1102 | "\n", 1103 | "model.fit(x, y, batch_size=128, epochs=1, callbacks=[print_callback])" 1104 | ] 1105 | }, 1106 | { 1107 | "cell_type": "code", 1108 | "execution_count": 53, 1109 | "metadata": {}, 1110 | "outputs": [ 1111 | { 1112 | "name": "stdout", 1113 | "output_type": "stream", 1114 | "text": [ 1115 | "Write the beginning of your poem, the Shakespeare machine will complete it. 
Your input is: Good things take time!\n", 1116 | "\n", 1117 | "\n", 1118 | "Here is your poem: \n", 1119 | "\n", 1120 | "Good things take time!\n", 1121 | "which the woess ar am eller of mole, for not is tride,\n", 1122 | "that ouns nagul ot new, though is seemen every.\n", 1123 | "\n", 1124 | "juter non mone to other pen a ame\n", 1125 | "even whose a from bese hearth on the etherisand livily,\n", 1126 | "my here for eis mayt no bosar my prome.\n", 1127 | " for what os erch uteters hadt deled:,\n", 1128 | "your more the dereng is dellens all kries!\n", 1129 | "why must bes to give werefanch dolne dame.\n", 1130 | "but haent matans the camtlefome mest ape" 1131 | ] 1132 | } 1133 | ], 1134 | "source": [ 1135 | "# Run this cell to try with different inputs without having to re-train the model \n", 1136 | "generate_output()" 1137 | ] 1138 | }, 1139 | { 1140 | "cell_type": "markdown", 1141 | "metadata": {}, 1142 | "source": [
1143 | "The RNN-Shakespeare model is very similar to the one you have built for dinosaur names. The only major differences are:\n",
1144 | "- LSTMs instead of the basic RNN to capture longer-range dependencies\n",
1145 | "- The model is a deeper, stacked LSTM model (2 layers)\n",
1146 | "- Using Keras instead of hand-written numpy code to simplify the implementation \n", 1147 | "\n",
1148 | "If you want to learn more, you can also check out the Keras Team's text generation implementation on GitHub: https://github.com/keras-team/keras/blob/master/examples/lstm_text_generation.py.\n", 1149 | "\n",
1150 | "Congratulations on finishing this notebook! " 1151 | ] 1152 | }, 1153 | { 1154 | "cell_type": "markdown", 1155 | "metadata": {}, 1156 | "source": [
1157 | "**References**:\n",
1158 | "- This exercise took inspiration from Andrej Karpathy's implementation: https://gist.github.com/karpathy/d4dee566867f8291f086. To learn more about text generation, also check out Karpathy's [blog post](http://karpathy.github.io/2015/05/21/rnn-effectiveness/).\n",
1159 | "- For the Shakespearian poem generator, our implementation was based on the Keras team's LSTM text generator: https://github.com/keras-team/keras/blob/master/examples/lstm_text_generation.py " 1160 | ] 1161 | }, 1162 | { 1163 | "cell_type": "code", 1164 | "execution_count": null, 1165 | "metadata": { 1166 | "collapsed": true 1167 | }, 1168 | "outputs": [], 1169 | "source": [] 1170 | } 1171 | ], 1172 | "metadata": { 1173 | "coursera": { 1174 | "course_slug": "nlp-sequence-models", 1175 | "graded_item_id": "1dYg0", 1176 | "launcher_item_id": "MLhxP" 1177 | }, 1178 | "kernelspec": { 1179 | "display_name": "Python 3", 1180 | "language": "python", 1181 | "name": "python3" 1182 | }, 1183 | "language_info": { 1184 | "codemirror_mode": { 1185 | "name": "ipython", 1186 | "version": 3 1187 | }, 1188 | "file_extension": ".py", 1189 | "mimetype": "text/x-python", 1190 | "name": "python", 1191 | "nbconvert_exporter": "python", 1192 | "pygments_lexer": "ipython3", 1193 | "version": "3.6.0" 1194 | } 1195 | }, 1196 | "nbformat": 4, 1197 | "nbformat_minor": 2 1198 | } 1199 | -------------------------------------------------------------------------------- /course5_sequential_models/Operations+on+word+vectors+-+v1.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Operations on word vectors\n", 8 | "\n", 9 | "Welcome to your first assignment of this week! 
\n", 10 | "\n", 11 | "Because word embeddings are very computionally expensive to train, most ML practitioners will load a pre-trained set of embeddings. \n", 12 | "\n", 13 | "**After this assignment you will be able to:**\n", 14 | "\n", 15 | "- Load pre-trained word vectors, and measure similarity using cosine similarity\n", 16 | "- Use word embeddings to solve word analogy problems such as Man is to Woman as King is to ______. \n", 17 | "- Modify word embeddings to reduce their gender bias \n", 18 | "\n", 19 | "Let's get started! Run the following cell to load the packages you will need." 20 | ] 21 | }, 22 | { 23 | "cell_type": "code", 24 | "execution_count": 1, 25 | "metadata": {}, 26 | "outputs": [ 27 | { 28 | "name": "stderr", 29 | "output_type": "stream", 30 | "text": [ 31 | "Using TensorFlow backend.\n" 32 | ] 33 | } 34 | ], 35 | "source": [ 36 | "import numpy as np\n", 37 | "from w2v_utils import *" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": {}, 43 | "source": [ 44 | "Next, lets load the word vectors. For this assignment, we will use 50-dimensional GloVe vectors to represent words. Run the following cell to load the `word_to_vec_map`. " 45 | ] 46 | }, 47 | { 48 | "cell_type": "code", 49 | "execution_count": 2, 50 | "metadata": { 51 | "collapsed": true 52 | }, 53 | "outputs": [], 54 | "source": [ 55 | "words, word_to_vec_map = read_glove_vecs('data/glove.6B.50d.txt')" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": {}, 61 | "source": [ 62 | "You've loaded:\n", 63 | "- `words`: set of words in the vocabulary.\n", 64 | "- `word_to_vec_map`: dictionary mapping words to their GloVe vector representation.\n", 65 | "\n", 66 | "You've seen that one-hot vectors do not do a good job cpaturing what words are similar. GloVe vectors provide much more useful information about the meaning of individual words. Lets now see how you can use GloVe vectors to decide how similar two words are. \n", 67 | "\n" 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": {}, 73 | "source": [ 74 | "# 1 - Cosine similarity\n", 75 | "\n", 76 | "To measure how similar two words are, we need a way to measure the degree of similarity between two embedding vectors for the two words. Given two vectors $u$ and $v$, cosine similarity is defined as follows: \n", 77 | "\n", 78 | "$$\\text{CosineSimilarity(u, v)} = \\frac {u . v} {||u||_2 ||v||_2} = cos(\\theta) \\tag{1}$$\n", 79 | "\n", 80 | "where $u.v$ is the dot product (or inner product) of two vectors, $||u||_2$ is the norm (or length) of the vector $u$, and $\\theta$ is the angle between $u$ and $v$. This similarity depends on the angle between $u$ and $v$. If $u$ and $v$ are very similar, their cosine similarity will be close to 1; if they are dissimilar, the cosine similarity will take a smaller value. \n", 81 | "\n", 82 | "\n", 83 | "
**Figure 1**: The cosine of the angle between two vectors is a measure of how similar they are
\n", 84 | "\n", 85 | "**Exercise**: Implement the function `cosine_similarity()` to evaluate similarity between word vectors.\n", 86 | "\n", 87 | "**Reminder**: The norm of $u$ is defined as $ ||u||_2 = \\sqrt{\\sum_{i=1}^{n} u_i^2}$" 88 | ] 89 | }, 90 | { 91 | "cell_type": "code", 92 | "execution_count": 3, 93 | "metadata": { 94 | "collapsed": true 95 | }, 96 | "outputs": [], 97 | "source": [ 98 | "# GRADED FUNCTION: cosine_similarity\n", 99 | "\n", 100 | "def cosine_similarity(u, v):\n", 101 | " \"\"\"\n", 102 | " Cosine similarity reflects the degree of similariy between u and v\n", 103 | " \n", 104 | " Arguments:\n", 105 | " u -- a word vector of shape (n,) \n", 106 | " v -- a word vector of shape (n,)\n", 107 | "\n", 108 | " Returns:\n", 109 | " cosine_similarity -- the cosine similarity between u and v defined by the formula above.\n", 110 | " \"\"\"\n", 111 | " \n", 112 | " distance = 0.0\n", 113 | " \n", 114 | " ### START CODE HERE ###\n", 115 | " # Compute the dot product between u and v (≈1 line)\n", 116 | " dot = np.dot(u, v)\n", 117 | " # Compute the L2 norm of u (≈1 line)\n", 118 | " norm_u = np.sqrt(np.sum(u**2))\n", 119 | " \n", 120 | " # Compute the L2 norm of v (≈1 line)\n", 121 | " norm_v = np.sqrt(np.sum(v**2))\n", 122 | " # Compute the cosine similarity defined by formula (1) (≈1 line)\n", 123 | " cosine_similarity = dot/norm_u/norm_v\n", 124 | " ### END CODE HERE ###\n", 125 | " \n", 126 | " return cosine_similarity" 127 | ] 128 | }, 129 | { 130 | "cell_type": "code", 131 | "execution_count": 4, 132 | "metadata": {}, 133 | "outputs": [ 134 | { 135 | "name": "stdout", 136 | "output_type": "stream", 137 | "text": [ 138 | "cosine_similarity(father, mother) = 0.890903844289\n", 139 | "cosine_similarity(ball, crocodile) = 0.274392462614\n", 140 | "cosine_similarity(france - paris, rome - italy) = -0.675147930817\n" 141 | ] 142 | } 143 | ], 144 | "source": [ 145 | "father = word_to_vec_map[\"father\"]\n", 146 | "mother = word_to_vec_map[\"mother\"]\n", 147 | "ball = word_to_vec_map[\"ball\"]\n", 148 | "crocodile = word_to_vec_map[\"crocodile\"]\n", 149 | "france = word_to_vec_map[\"france\"]\n", 150 | "italy = word_to_vec_map[\"italy\"]\n", 151 | "paris = word_to_vec_map[\"paris\"]\n", 152 | "rome = word_to_vec_map[\"rome\"]\n", 153 | "\n", 154 | "print(\"cosine_similarity(father, mother) = \", cosine_similarity(father, mother))\n", 155 | "print(\"cosine_similarity(ball, crocodile) = \",cosine_similarity(ball, crocodile))\n", 156 | "print(\"cosine_similarity(france - paris, rome - italy) = \",cosine_similarity(france - paris, rome - italy))" 157 | ] 158 | }, 159 | { 160 | "cell_type": "markdown", 161 | "metadata": {}, 162 | "source": [ 163 | "**Expected Output**:\n", 164 | "\n", 165 | "\n", 166 | " \n", 167 | " \n", 170 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 178 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 186 | " \n", 189 | " \n", 190 | "
\n", 168 | " **cosine_similarity(father, mother)** =\n", 169 | " \n", 171 | " 0.890903844289\n", 172 | "
\n", 176 | " **cosine_similarity(ball, crocodile)** =\n", 177 | " \n", 179 | " 0.274392462614\n", 180 | "
\n", 184 | " **cosine_similarity(france - paris, rome - italy)** =\n", 185 | " \n", 187 | " -0.675147930817\n", 188 | "
" 191 | ] 192 | }, 193 | { 194 | "cell_type": "markdown", 195 | "metadata": {}, 196 | "source": [ 197 | "After you get the correct expected output, please feel free to modify the inputs and measure the cosine similarity between other pairs of words! Playing around the cosine similarity of other inputs will give you a better sense of how word vectors behave. " 198 | ] 199 | }, 200 | { 201 | "cell_type": "markdown", 202 | "metadata": {}, 203 | "source": [ 204 | "## 2 - Word analogy task\n", 205 | "\n", 206 | "In the word analogy task, we complete the sentence \"*a* is to *b* as *c* is to **____**\". An example is '*man* is to *woman* as *king* is to *queen*' . In detail, we are trying to find a word *d*, such that the associated word vectors $e_a, e_b, e_c, e_d$ are related in the following manner: $e_b - e_a \\approx e_d - e_c$. We will measure the similarity between $e_b - e_a$ and $e_d - e_c$ using cosine similarity. \n", 207 | "\n", 208 | "**Exercise**: Complete the code below to be able to perform word analogies!" 209 | ] 210 | }, 211 | { 212 | "cell_type": "code", 213 | "execution_count": 54, 214 | "metadata": { 215 | "collapsed": true 216 | }, 217 | "outputs": [], 218 | "source": [ 219 | "# GRADED FUNCTION: complete_analogy\n", 220 | "\n", 221 | "def complete_analogy(word_a, word_b, word_c, word_to_vec_map):\n", 222 | " \"\"\"\n", 223 | " Performs the word analogy task as explained above: a is to b as c is to ____. \n", 224 | " \n", 225 | " Arguments:\n", 226 | " word_a -- a word, string\n", 227 | " word_b -- a word, string\n", 228 | " word_c -- a word, string\n", 229 | " word_to_vec_map -- dictionary that maps words to their corresponding vectors. \n", 230 | " \n", 231 | " Returns:\n", 232 | " best_word -- the word such that v_b - v_a is close to v_best_word - v_c, as measured by cosine similarity\n", 233 | " \"\"\"\n", 234 | " \n", 235 | " # convert words to lower case\n", 236 | " word_a, word_b, word_c = word_a.lower(), word_b.lower(), word_c.lower()\n", 237 | " \n", 238 | " ### START CODE HERE ###\n", 239 | " # Get the word embeddings v_a, v_b and v_c (≈1-3 lines)\n", 240 | " e_a, e_b, e_c = word_to_vec_map[word_a], word_to_vec_map[word_b], word_to_vec_map[word_c]\n", 241 | " ### END CODE HERE ###\n", 242 | " \n", 243 | " words = word_to_vec_map.keys()\n", 244 | " max_cosine_sim = -100 # Initialize max_cosine_sim to a large negative number\n", 245 | " best_word = None # Initialize best_word with None, it will help keep track of the word to output\n", 246 | "\n", 247 | " # loop over the whole word vector set\n", 248 | " for w in words: \n", 249 | " # to avoid best_word being one of the input words, pass on them.\n", 250 | " if w in [word_a, word_b, word_c] :\n", 251 | " continue\n", 252 | " \n", 253 | " ### START CODE HERE ###\n", 254 | " # Compute cosine similarity between the combined_vector and the current word (≈1 line)\n", 255 | " cosine_sim = cosine_similarity(e_b-e_a, word_to_vec_map[w]-e_c)\n", 256 | " \n", 257 | " # If the cosine_sim is more than the max_cosine_sim seen so far,\n", 258 | " # then: set the new max_cosine_sim to the current cosine_sim and the best_word to the current word (≈3 lines)\n", 259 | " if cosine_sim > max_cosine_sim:\n", 260 | " max_cosine_sim = cosine_sim\n", 261 | " best_word = w\n", 262 | " ### END CODE HERE ###\n", 263 | " \n", 264 | " return best_word" 265 | ] 266 | }, 267 | { 268 | "cell_type": "markdown", 269 | "metadata": {}, 270 | "source": [ 271 | "Run the cell below to test your code, this may take 1-2 minutes." 
272 | ] 273 | }, 274 | { 275 | "cell_type": "code", 276 | "execution_count": 55, 277 | "metadata": {}, 278 | "outputs": [ 279 | { 280 | "name": "stdout", 281 | "output_type": "stream", 282 | "text": [ 283 | "italy -> italian :: spain -> spanish\n", 284 | "india -> delhi :: japan -> tokyo\n", 285 | "man -> woman :: boy -> girl\n", 286 | "small -> smaller :: large -> larger\n" 287 | ] 288 | } 289 | ], 290 | "source": [ 291 | "triads_to_try = [('italy', 'italian', 'spain'), ('india', 'delhi', 'japan'), ('man', 'woman', 'boy'), ('small', 'smaller', 'large')]\n", 292 | "for triad in triads_to_try:\n", 293 | " print('{} -> {} :: {} -> {}'.format(*triad, complete_analogy(*triad, word_to_vec_map)))" 294 | ] 295 | }, 296 | { 297 | "cell_type": "markdown", 298 | "metadata": {}, 299 | "source": [ 300 | "**Expected Output**:\n", 301 | "\n", 302 | "\n", 303 | " \n", 304 | " 
\n", 305 | " **italy -> italian** ::\n", 306 | " \n", 308 | " spain -> spanish\n", 309 | "
\n", 313 | " **india -> delhi** ::\n", 314 | " \n", 316 | " japan -> tokyo\n", 317 | "
\n", 321 | " **man -> woman ** ::\n", 322 | " \n", 324 | " boy -> girl\n", 325 | "
\n", 329 | " **small -> smaller ** ::\n", 330 | " \n", 332 | " large -> larger\n", 333 | "
" 336 | ] 337 | }, 338 | { 339 | "cell_type": "markdown", 340 | "metadata": {}, 341 | "source": [ 342 | "Once you get the correct expected output, please feel free to modify the input cells above to test your own analogies. Try to find some other analogy pairs that do work, but also find some where the algorithm doesn't give the right answer: For example, you can try small->smaller as big->?. " 343 | ] 344 | }, 345 | { 346 | "cell_type": "markdown", 347 | "metadata": {}, 348 | "source": [ 349 | "### Congratulations!\n", 350 | "\n", 351 | "You've come to the end of this assignment. Here are the main points you should remember:\n", 352 | "\n", 353 | "- Cosine similarity a good way to compare similarity between pairs of word vectors. (Though L2 distance works too.) \n", 354 | "- For NLP applications, using a pre-trained set of word vectors from the internet is often a good way to get started. \n", 355 | "\n", 356 | "Even though you have finished the graded portions, we recommend you take a look too at the rest of this notebook. \n", 357 | "\n", 358 | "Congratulations on finishing the graded portions of this notebook! \n" 359 | ] 360 | }, 361 | { 362 | "cell_type": "markdown", 363 | "metadata": {}, 364 | "source": [ 365 | "## 3 - Debiasing word vectors (OPTIONAL/UNGRADED) " 366 | ] 367 | }, 368 | { 369 | "cell_type": "markdown", 370 | "metadata": {}, 371 | "source": [ 372 | "In the following exercise, you will examine gender biases that can be reflected in a word embedding, and explore algorithms for reducing the bias. In addition to learning about the topic of debiasing, this exercise will also help hone your intuition about what word vectors are doing. This section involves a bit of linear algebra, though you can probably complete it even without being expert in linear algebra, and we encourage you to give it a shot. This portion of the notebook is optional and is not graded. \n", 373 | "\n", 374 | "Lets first see how the GloVe word embeddings relate to gender. You will first compute a vector $g = e_{woman}-e_{man}$, where $e_{woman}$ represents the word vector corresponding to the word *woman*, and $e_{man}$ corresponds to the word vector corresponding to the word *man*. The resulting vector $g$ roughly encodes the concept of \"gender\". (You might get a more accurate representation if you compute $g_1 = e_{mother}-e_{father}$, $g_2 = e_{girl}-e_{boy}$, etc. and average over them. But just using $e_{woman}-e_{man}$ will give good enough results for now.) \n" 375 | ] 376 | }, 377 | { 378 | "cell_type": "code", 379 | "execution_count": 9, 380 | "metadata": {}, 381 | "outputs": [ 382 | { 383 | "name": "stdout", 384 | "output_type": "stream", 385 | "text": [ 386 | "[-0.087144 0.2182 -0.40986 -0.03922 -0.1032 0.94165\n", 387 | " -0.06042 0.32988 0.46144 -0.35962 0.31102 -0.86824\n", 388 | " 0.96006 0.01073 0.24337 0.08193 -1.02722 -0.21122\n", 389 | " 0.695044 -0.00222 0.29106 0.5053 -0.099454 0.40445\n", 390 | " 0.30181 0.1355 -0.0606 -0.07131 -0.19245 -0.06115\n", 391 | " -0.3204 0.07165 -0.13337 -0.25068714 -0.14293 -0.224957\n", 392 | " -0.149 0.048882 0.12191 -0.27362 -0.165476 -0.20426\n", 393 | " 0.54376 -0.271425 -0.10245 -0.32108 0.2516 -0.33455\n", 394 | " -0.04371 0.01258 ]\n" 395 | ] 396 | } 397 | ], 398 | "source": [ 399 | "g = word_to_vec_map['woman'] - word_to_vec_map['man']\n", 400 | "print(g)" 401 | ] 402 | }, 403 | { 404 | "cell_type": "markdown", 405 | "metadata": {}, 406 | "source": [ 407 | "Now, you will consider the cosine similarity of different words with $g$. 
Consider what a positive cosine similarity means, versus a negative one. " 408 | ] 409 | }, 410 | { 411 | "cell_type": "code", 412 | "execution_count": 10, 413 | "metadata": { 414 | "scrolled": false 415 | }, 416 | "outputs": [ 417 | { 418 | "name": "stdout", 419 | "output_type": "stream", 420 | "text": [ 421 | "List of names and their similarities with constructed vector:\n", 422 | "john -0.23163356146\n", 423 | "marie 0.315597935396\n", 424 | "sophie 0.318687898594\n", 425 | "ronaldo -0.312447968503\n", 426 | "priya 0.17632041839\n", 427 | "rahul -0.169154710392\n", 428 | "danielle 0.243932992163\n", 429 | "reza -0.079304296722\n", 430 | "katy 0.283106865957\n", 431 | "yasmin 0.233138577679\n" 432 | ] 433 | } 434 | ], 435 | "source": [ 436 | "print('List of names and their similarities with constructed vector:')\n", 437 | "\n", 438 | "# girls' and boys' names\n", 439 | "name_list = ['john', 'marie', 'sophie', 'ronaldo', 'priya', 'rahul', 'danielle', 'reza', 'katy', 'yasmin']\n", 440 | "\n", 441 | "for w in name_list:\n", 442 | " print(w, cosine_similarity(word_to_vec_map[w], g))" 443 | ] 444 | }, 445 | { 446 | "cell_type": "markdown", 447 | "metadata": {}, 448 | "source": [ 449 | "As you can see, female first names tend to have a positive cosine similarity with our constructed vector $g$, while male first names tend to have a negative cosine similarity. This is not surprising, and the result seems acceptable. \n", 450 | "\n", 451 | "But let's try with some other words." 452 | ] 453 | }, 454 | { 455 | "cell_type": "code", 456 | "execution_count": 11, 457 | "metadata": {}, 458 | "outputs": [ 459 | { 460 | "name": "stdout", 461 | "output_type": "stream", 462 | "text": [ 463 | "Other words and their similarities:\n", 464 | "lipstick 0.276919162564\n", 465 | "guns -0.18884855679\n", 466 | "science -0.0608290654093\n", 467 | "arts 0.00818931238588\n", 468 | "literature 0.0647250443346\n", 469 | "warrior -0.209201646411\n", 470 | "doctor 0.118952894109\n", 471 | "tree -0.0708939917548\n", 472 | "receptionist 0.330779417506\n", 473 | "technology -0.131937324476\n", 474 | "fashion 0.0356389462577\n", 475 | "teacher 0.179209234318\n", 476 | "engineer -0.0803928049452\n", 477 | "pilot 0.00107644989919\n", 478 | "computer -0.103303588739\n", 479 | "singer 0.185005181365\n" 480 | ] 481 | } 482 | ], 483 | "source": [ 484 | "print('Other words and their similarities:')\n", 485 | "word_list = ['lipstick', 'guns', 'science', 'arts', 'literature', 'warrior', 'doctor', 'tree', 'receptionist', \n", 486 | " 'technology', 'fashion', 'teacher', 'engineer', 'pilot', 'computer', 'singer']\n", 487 | "for w in word_list:\n", 488 | " print(w, cosine_similarity(word_to_vec_map[w], g))" 489 | ] 490 | }, 491 | { 492 | "cell_type": "markdown", 493 | "metadata": {}, 494 | "source": [ 495 | "Do you notice anything surprising? It is astonishing how these results reflect certain unhealthy gender stereotypes. For example, \"computer\" is closer to \"man\" while \"literature\" is closer to \"woman\". Ouch! \n", 496 | "\n", 497 | "We'll see below how to reduce the bias of these vectors, using an algorithm due to [Bolukbasi et al., 2016](https://arxiv.org/abs/1607.06520). Note that some word pairs such as \"actor\"/\"actress\" or \"grandmother\"/\"grandfather\" should remain gender specific, while other words such as \"receptionist\" or \"technology\" should be neutralized, i.e. not be gender-related. 
You will have to treat these two types of words differently when debiasing.\n", 498 | "\n", 499 | "### 3.1 - Neutralize bias for non-gender specific words \n", 500 | "\n", 501 | "The figure below should help you visualize what neutralizing does. If you're using a 50-dimensional word embedding, the 50 dimensional space can be split into two parts: The bias-direction $g$, and the remaining 49 dimensions, which we'll call $g_{\perp}$. In linear algebra, we say that the 49 dimensional $g_{\perp}$ is perpendicular (or \"orthogonal\") to $g$, meaning it is at 90 degrees to $g$. The neutralization step takes a vector such as $e_{receptionist}$ and zeros out the component in the direction of $g$, giving us $e_{receptionist}^{debiased}$. \n", 502 | "\n", 503 | "Even though $g_{\perp}$ is 49 dimensional, given the limitations of what we can draw on a screen, we illustrate it using a 1 dimensional axis below. \n", 504 | "\n", 505 | "\n", 506 | "
**Figure 2**: The word vector for \"receptionist\" represented before and after applying the neutralize operation.
\n", 507 | "\n", 508 | "**Exercise**: Implement `neutralize()` to remove the bias of words such as \"receptionist\" or \"scientist\". Given an input embedding $e$, you can use the following formulas to compute $e^{debiased}$: \n", 509 | "\n", 510 | "$$e^{bias\\_component} = \\frac{e*g}{||g||_2^2} * g\\tag{2}$$\n", 511 | "$$e^{debiased} = e - e^{bias\\_component}\\tag{3}$$\n", 512 | "\n", 513 | "If you are an expert in linear algebra, you may recognize $e^{bias\\_component}$ as the projection of $e$ onto the direction $g$. If you're not an expert in linear algebra, don't worry about this.\n", 514 | "\n", 515 | " " 520 | ] 521 | }, 522 | { 523 | "cell_type": "code", 524 | "execution_count": 16, 525 | "metadata": { 526 | "collapsed": true 527 | }, 528 | "outputs": [], 529 | "source": [ 530 | "def neutralize(word, g, word_to_vec_map):\n", 531 | " \"\"\"\n", 532 | " Removes the bias of \"word\" by projecting it on the space orthogonal to the bias axis. \n", 533 | " This function ensures that gender neutral words are zero in the gender subspace.\n", 534 | " \n", 535 | " Arguments:\n", 536 | " word -- string indicating the word to debias\n", 537 | " g -- numpy-array of shape (50,), corresponding to the bias axis (such as gender)\n", 538 | " word_to_vec_map -- dictionary mapping words to their corresponding vectors.\n", 539 | " \n", 540 | " Returns:\n", 541 | " e_debiased -- neutralized word vector representation of the input \"word\"\n", 542 | " \"\"\"\n", 543 | " \n", 544 | " ### START CODE HERE ###\n", 545 | " # Select word vector representation of \"word\". Use word_to_vec_map. (≈ 1 line)\n", 546 | " e = word_to_vec_map[word]\n", 547 | " \n", 548 | " # Compute e_biascomponent using the formula give above. (≈ 1 line)\n", 549 | " e_biascomponent = np.dot(e, g)/np.linalg.norm(g)**2*g\n", 550 | " \n", 551 | " # Neutralize e by substracting e_biascomponent from it \n", 552 | " # e_debiased should be equal to its orthogonal projection. (≈ 1 line)\n", 553 | " e_debiased = e - e_biascomponent\n", 554 | " ### END CODE HERE ###\n", 555 | " \n", 556 | " return e_debiased" 557 | ] 558 | }, 559 | { 560 | "cell_type": "code", 561 | "execution_count": 17, 562 | "metadata": {}, 563 | "outputs": [ 564 | { 565 | "name": "stdout", 566 | "output_type": "stream", 567 | "text": [ 568 | "cosine similarity between receptionist and g, before neutralizing: 0.330779417506\n", 569 | "cosine similarity between receptionist and g, after neutralizing: -3.26732746085e-17\n" 570 | ] 571 | } 572 | ], 573 | "source": [ 574 | "e = \"receptionist\"\n", 575 | "print(\"cosine similarity between \" + e + \" and g, before neutralizing: \", cosine_similarity(word_to_vec_map[\"receptionist\"], g))\n", 576 | "\n", 577 | "e_debiased = neutralize(\"receptionist\", g, word_to_vec_map)\n", 578 | "print(\"cosine similarity between \" + e + \" and g, after neutralizing: \", cosine_similarity(e_debiased, g))" 579 | ] 580 | }, 581 | { 582 | "cell_type": "markdown", 583 | "metadata": {}, 584 | "source": [ 585 | "**Expected Output**: The second result is essentially 0, up to numerical roundof (on the order of $10^{-17}$).\n", 586 | "\n", 587 | "\n", 588 | "\n", 589 | " \n", 590 | " \n", 593 | " \n", 596 | " \n", 597 | " \n", 598 | " \n", 601 | " \n", 604 | "
\n", 591 | " **cosine similarity between receptionist and g, before neutralizing:** :\n", 592 | " \n", 594 | " 0.330779417506\n", 595 | "
\n", 599 | " **cosine similarity between receptionist and g, after neutralizing:** :\n", 600 | " \n", 602 | " -3.26732746085e-17\n", 603 | "
" 605 | ] 606 | }, 607 | { 608 | "cell_type": "markdown", 609 | "metadata": {}, 610 | "source": [ 611 | "### 3.2 - Equalization algorithm for gender-specific words\n", 612 | "\n", 613 | "Next, lets see how debiasing can also be applied to word pairs such as \"actress\" and \"actor.\" Equalization is applied to pairs of words that you might want to have differ only through the gender property. As a concrete example, suppose that \"actress\" is closer to \"babysit\" than \"actor.\" By applying neutralizing to \"babysit\" we can reduce the gender-stereotype associated with babysitting. But this still does not guarantee that \"actor\" and \"actress\" are equidistant from \"babysit.\" The equalization algorithm takes care of this. \n", 614 | "\n", 615 | "The key idea behind equalization is to make sure that a particular pair of words are equi-distant from the 49-dimensional $g_\\perp$. The equalization step also ensures that the two equalized steps are now the same distance from $e_{receptionist}^{debiased}$, or from any other work that has been neutralized. In pictures, this is how equalization works: \n", 616 | "\n", 617 | "\n", 618 | "\n", 619 | "\n", 620 | "The derivation of the linear algebra to do this is a bit more complex. (See Bolukbasi et al., 2016 for details.) But the key equations are: \n", 621 | "\n", 622 | "$$ \\mu = \\frac{e_{w1} + e_{w2}}{2}\\tag{4}$$ \n", 623 | "\n", 624 | "$$ \\mu_{B} = \\frac {\\mu * \\text{bias_axis}}{||\\text{bias_axis}||_2} + ||\\text{bias_axis}||_2 *\\text{bias_axis}\n", 625 | "\\tag{5}$$ \n", 626 | "\n", 627 | "$$\\mu_{\\perp} = \\mu - \\mu_{B} \\tag{6}$$\n", 628 | "\n", 629 | "\n", 630 | "$$e_{w1B} = \\sqrt{ |{1 - ||\\mu_{\\perp} ||^2_2} |} * \\frac{(e_{\\text{w1}} - \\mu_{\\perp}) - \\mu_B} {|(e_{w1} - \\mu_{\\perp}) - \\mu_B)|} \\tag{7}$$\n", 631 | "\n", 632 | "\n", 633 | "$$e_{w2B} = \\sqrt{ |{1 - ||\\mu_{\\perp} ||^2_2} |} * \\frac{(e_{\\text{w2}} - \\mu_{\\perp}) - \\mu_B} {|(e_{w2} - \\mu_{\\perp}) - \\mu_B)|} \\tag{8}$$\n", 634 | "\n", 635 | "$$e_1 = e_{w1B} + \\mu_{\\perp} \\tag{9}$$\n", 636 | "$$e_2 = e_{w2B} + \\mu_{\\perp} \\tag{10}$$\n", 637 | "\n", 638 | "\n", 639 | "**Exercise**: Implement the function below. Use the equations above to get the final equalized version of the pair of words. Good luck!" 640 | ] 641 | }, 642 | { 643 | "cell_type": "code", 644 | "execution_count": 52, 645 | "metadata": { 646 | "collapsed": true 647 | }, 648 | "outputs": [], 649 | "source": [ 650 | "def equalize(pair, bias_axis, word_to_vec_map):\n", 651 | " \"\"\"\n", 652 | " Debias gender specific words by following the equalize method described in the figure above.\n", 653 | " \n", 654 | " Arguments:\n", 655 | " pair -- pair of strings of gender specific words to debias, e.g. (\"actress\", \"actor\") \n", 656 | " bias_axis -- numpy-array of shape (50,), vector corresponding to the bias axis, e.g. gender\n", 657 | " word_to_vec_map -- dictionary mapping words to their corresponding vectors\n", 658 | " \n", 659 | " Returns\n", 660 | " e_1 -- word vector corresponding to the first word\n", 661 | " e_2 -- word vector corresponding to the second word\n", 662 | " \"\"\"\n", 663 | " \n", 664 | " ### START CODE HERE ###\n", 665 | " # Step 1: Select word vector representation of \"word\". Use word_to_vec_map. 
(≈ 2 lines)\n", 666 | " w1, w2 = pair\n", 667 | " e_w1, e_w2 = word_to_vec_map[w1], word_to_vec_map[w2]\n", 668 | " \n", 669 | " # Step 2: Compute the mean of e_w1 and e_w2 (≈ 1 line)\n", 670 | " mu = (e_w1 + e_w2)*0.5\n", 671 | "\n", 672 | " # Step 3: Compute the projections of mu over the bias axis and the orthogonal axis (≈ 2 lines)\n", 673 | " mu_B = mu*bias_axis/np.linalg.norm(bias_axis) + np.linalg.norm(bias_axis)*bias_axis\n", 674 | " mu_orth = mu - mu_B\n", 675 | "\n", 676 | " # Step 4: Set e1_orth and e2_orth to be equal to mu_orth (≈2 lines)\n", 677 | " e1_orth = (e_w1-mu_orth-mu_B)/np.abs(e_w1-mu_orth-mu_B)\n", 678 | " e2_orth = (e_w2-mu_orth-mu_B)/np.abs(e_w2-mu_orth-mu_B)\n", 679 | " \n", 680 | " # Step 5: Adjust the Bias part of u1 and u2 using the formulas given in the figure above (≈2 lines)\n", 681 | " e_w1B = np.sqrt(np.abs(1-np.linalg.norm(mu_orth)**2))*e1_orth\n", 682 | " e_w2B = np.sqrt(np.abs(1-np.linalg.norm(mu_orth)**2))*e2_orth\n", 683 | " \n", 684 | " # Step 6: Debias by equalizing u1 and u2 to the sum of their projections (≈2 lines)\n", 685 | " e1 = e_w1B + mu_orth\n", 686 | " e2 = e_w2B + mu_orth\n", 687 | " ### END CODE HERE ###\n", 688 | " \n", 689 | " return e1, e2" 690 | ] 691 | }, 692 | { 693 | "cell_type": "code", 694 | "execution_count": 53, 695 | "metadata": { 696 | "scrolled": true 697 | }, 698 | "outputs": [ 699 | { 700 | "name": "stdout", 701 | "output_type": "stream", 702 | "text": [ 703 | "cosine similarities before equalizing:\n", 704 | "cosine_similarity(word_to_vec_map[\"man\"], gender) = -0.117110957653\n", 705 | "cosine_similarity(word_to_vec_map[\"woman\"], gender) = 0.356666188463\n", 706 | "\n", 707 | "cosine similarities after equalizing:\n", 708 | "cosine_similarity(e1, gender) = -0.769973866789\n", 709 | "cosine_similarity(e2, gender) = 0.684163506927\n" 710 | ] 711 | } 712 | ], 713 | "source": [ 714 | "print(\"cosine similarities before equalizing:\")\n", 715 | "print(\"cosine_similarity(word_to_vec_map[\\\"man\\\"], gender) = \", cosine_similarity(word_to_vec_map[\"man\"], g))\n", 716 | "print(\"cosine_similarity(word_to_vec_map[\\\"woman\\\"], gender) = \", cosine_similarity(word_to_vec_map[\"woman\"], g))\n", 717 | "print()\n", 718 | "e1, e2 = equalize((\"man\", \"woman\"), g, word_to_vec_map)\n", 719 | "print(\"cosine similarities after equalizing:\")\n", 720 | "print(\"cosine_similarity(e1, gender) = \", cosine_similarity(e1, g))\n", 721 | "print(\"cosine_similarity(e2, gender) = \", cosine_similarity(e2, g))" 722 | ] 723 | }, 724 | { 725 | "cell_type": "markdown", 726 | "metadata": {}, 727 | "source": [ 728 | "**Expected Output**:\n", 729 | "\n", 730 | "cosine similarities before equalizing:\n", 731 | "\n", 732 | " \n", 733 | " \n", 736 | " \n", 739 | " \n", 740 | " \n", 741 | " \n", 744 | " \n", 747 | " \n", 748 | "
\n", 734 | " **cosine_similarity(word_to_vec_map[\"man\"], gender)** =\n", 735 | " \n", 737 | " -0.117110957653\n", 738 | "
\n", 742 | " **cosine_similarity(word_to_vec_map[\"woman\"], gender)** =\n", 743 | " \n", 745 | " 0.356666188463\n", 746 | "
\n", 749 | "\n", 750 | "cosine similarities after equalizing:\n", 751 | "\n", 752 | " \n", 753 | " \n", 756 | " \n", 759 | " \n", 760 | " \n", 761 | " \n", 764 | " \n", 767 | " \n", 768 | "
\n", 754 | " **cosine_similarity(u1, gender)** =\n", 755 | " \n", 757 | " -0.700436428931\n", 758 | "
\n", 762 | " **cosine_similarity(u2, gender)** =\n", 763 | " \n", 765 | " 0.700436428931\n", 766 | "
" 769 | ] 770 | }, 771 | { 772 | "cell_type": "markdown", 773 | "metadata": { 774 | "collapsed": true 775 | }, 776 | "source": [ 777 | "Please feel free to play with the input words in the cell above, to apply equalization to other pairs of words. \n", 778 | "\n", 779 | "These debiasing algorithms are very helpful for reducing bias, but are not perfect and do not eliminate all traces of bias. For example, one weakness of this implementation was that the bias direction $g$ was defined using only the pair of words _woman_ and _man_. As discussed earlier, if $g$ were defined by computing $g_1 = e_{woman} - e_{man}$; $g_2 = e_{mother} - e_{father}$; $g_3 = e_{girl} - e_{boy}$; and so on and averaging over them, you would obtain a better estimate of the \"gender\" dimension in the 50 dimensional word embedding space. Feel free to play with such variants as well. \n", 780 | " " 781 | ] 782 | }, 783 | { 784 | "cell_type": "markdown", 785 | "metadata": {}, 786 | "source": [ 787 | "### Congratulations\n", 788 | "\n", 789 | "You have come to the end of this notebook, and have seen a lot of the ways that word vectors can be used as well as modified. \n", 790 | "\n", 791 | "Congratulations on finishing this notebook! \n" 792 | ] 793 | }, 794 | { 795 | "cell_type": "markdown", 796 | "metadata": {}, 797 | "source": [ 798 | "**References**:\n", 799 | "- The debiasing algorithm is from Bolukbasi et al., 2016, [Man is to Computer Programmer as Woman is to\n", 800 | "Homemaker? Debiasing Word Embeddings](https://papers.nips.cc/paper/6228-man-is-to-computer-programmer-as-woman-is-to-homemaker-debiasing-word-embeddings.pdf)\n", 801 | "- The GloVe word embeddings were due to Jeffrey Pennington, Richard Socher, and Christopher D. Manning. (https://nlp.stanford.edu/projects/glove/)\n" 802 | ] 803 | } 804 | ], 805 | "metadata": { 806 | "coursera": { 807 | "course_slug": "nlp-sequence-models", 808 | "graded_item_id": "8hb5s", 809 | "launcher_item_id": "5NrJ6" 810 | }, 811 | "kernelspec": { 812 | "display_name": "Python 3", 813 | "language": "python", 814 | "name": "python3" 815 | }, 816 | "language_info": { 817 | "codemirror_mode": { 818 | "name": "ipython", 819 | "version": 3 820 | }, 821 | "file_extension": ".py", 822 | "mimetype": "text/x-python", 823 | "name": "python", 824 | "nbconvert_exporter": "python", 825 | "pygments_lexer": "ipython3", 826 | "version": "3.6.0" 827 | } 828 | }, 829 | "nbformat": 4, 830 | "nbformat_minor": 2 831 | } 832 | --------------------------------------------------------------------------------