├── README.md
├── course-note.pdf
├── notebook
│   ├── Art Generation with Neural Style Transfer.ipynb
│   ├── Autonomous driving application Car detection.ipynb
│   ├── Building a Recurrent Neural Network Step by Step v3.ipynb
│   ├── Building your Deep Neural Network Step by Step v3.ipynb
│   ├── Convolution model Step by Stepv1.ipynb
│   ├── Convolution model Application v1.ipynb
│   ├── Deep Neural Network Application v3.ipynb
│   ├── Dinosaurus Island Character level language model final v3.ipynb
│   ├── Emojify.ipynb
│   ├── Face Recognition for the Happy House.ipynb
│   ├── Gradient Checking.ipynb
│   ├── Initialization.ipynb
│   ├── Keras Tutorial Happy House v2.ipynb
│   ├── Logistic Regression with a Neural Network mindset v3.ipynb
│   ├── Neural machine translation with attention.ipynb
│   ├── Operations on word vectors.ipynb
│   ├── Optimization methods.ipynb
│   ├── Planar data classification with one hidden layer v3.ipynb
│   ├── Python Basics With Numpy v3.ipynb
│   ├── Regularization.ipynb
│   ├── Residual Networks .ipynb
│   ├── Tensorflow Tutorial.ipynb
│   └── Trigger word detection.ipynb
└── py
    ├── Art Generation with Neural Style Transfer.py.html
    ├── Autonomous driving application Car detection.py.html
    ├── Building a Recurrent Neural Network Step by Step v3.py
    ├── Building your Deep Neural Network Step by Step v3.py
    ├── Convolution model Application v1.py.html
    ├── Convolution model Step by Step v1.py.html
    ├── Deep Neural Network Application v3.py
    ├── Dinosaurus Island Character level language model final v3.py
    ├── Emojify.py
    ├── Face Recognition for the Happy House.py.html
    ├── Gradient Checking.py
    ├── Initialization.py
    ├── Keras Tutorial Happy House v2.py.html
    ├── Logistic Regression with a Neural Network mindset v3.py
    ├── Neural machine translation with attention.py
    ├── Operations on word vectors.py
    ├── Optimization methods.py
    ├── Planar data classification with one hidden layer v3.py
    ├── Python Basics With Numpy v3.py
    ├── Regularization.py
    ├── Residual Networks .py.html
    ├── Tensorflow Tutorial.py
    └── Trigger word detection.py
/README.md:
--------------------------------------------------------------------------------
1 | # Neural-Networks-and-Deep-Learning
2 | * These are my assignment solutions for Andrew Ng's "[***Deep Learning Specialization***](https://www.coursera.org/specializations/deep-learning)" on Coursera. The specialization consists of five courses:
3 | * [***Neural Networks and Deep Learning***](https://www.coursera.org/learn/neural-networks-deep-learning/home/welcome)
4 | * [***Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization***](https://www.coursera.org/learn/deep-neural-network/home/welcome)
5 | * [***Structuring Machine Learning Projects***](https://www.coursera.org/learn/machine-learning-projects/home/welcome)
6 | * [***Convolutional Neural Networks***](https://www.coursera.org/learn/convolutional-neural-networks)
7 | * [***Sequence Models***](https://www.coursera.org/learn/nlp-sequence-models)
8 |
9 | * [***Neural Networks and Deep Learning***](https://www.coursera.org/learn/neural-networks-deep-learning/home/welcome)
10 | * In this course, you will learn the foundations of deep learning. When you finish this class, you will:
11 | * Understand the major technology trends driving Deep Learning
12 | * Be able to build, train and apply fully connected deep neural networks
13 | * Know how to implement efficient (vectorized) neural networks
14 | * Understand the key parameters in a neural network's architecture
15 |
16 | * This course also teaches you how Deep Learning actually works, rather than presenting only a cursory or surface-level description. So after completing it, you will be able to apply deep learning to your own applications. If you are looking for a job in AI, after this course you will also be able to answer basic interview questions.
17 |
18 | * Programming Assignments:
19 | * Python Basics with Numpy [[notebook]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/notebook/Python%20Basics%20With%20Numpy%20v3.ipynb) [[py]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/py/Python%20Basics%20With%20Numpy%20v3.py)
20 | * Logistic Regression with a Neural Network mindset v3 [[notebook]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/notebook/Logistic%20Regression%20with%20a%20Neural%20Network%20mindset%20v3.ipynb) [[py]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/py/Logistic%20Regression%20with%20a%20Neural%20Network%20mindset%20v3.py)
21 | * Planar data classification with one hidden layer v3 [[notebook]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/notebook/Planar%20data%20classification%20with%20one%20hidden%20layer%20v3.ipynb) [[py]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/py/Planar%20data%20classification%20with%20one%20hidden%20layer%20v3.py)
22 | * Building your Deep Neural Network Step by Step v3 [[notebook]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/notebook/Building%20your%20Deep%20Neural%20Network%20Step%20by%20Step%20v3.ipynb) [[py]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/py/Building%20your%20Deep%20Neural%20Network%20Step%20by%20Step%20v3.py)
23 | * Deep Neural Network Application v3 [[notebook]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/notebook/Deep%20Neural%20Network%20Application%20v3.ipynb) [[py]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/py/Deep%20Neural%20Network%20Application%20v3.py)
24 |
25 | * [***Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization***](https://www.coursera.org/learn/deep-neural-network/home/welcome)
26 | * When you finish this class, you will:
27 | * Understand industry best-practices for building deep learning applications.
28 | * Be able to effectively use the common neural network "tricks", including initialization, L2 and dropout regularization, batch normalization, and gradient checking.
29 | * Be able to implement and apply a variety of optimization algorithms, such as mini-batch gradient descent, Momentum, RMSprop and Adam, and check for their convergence.
30 | * Understand new best-practices for the deep learning era of how to set up train/dev/test sets and analyze bias/variance
31 | * Be able to implement a neural network in TensorFlow.
32 | * Programming Assignments:
33 | * Initialization [[notebook]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/notebook/Initialization.ipynb) [[py]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/py/Initialization.py)
34 | * Regularization [[notebook]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/notebook/Regularization.ipynb) [[py]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/py/Regularization.py)
35 | * Gradient Checking [[notebook]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/notebook/Gradient%20Checking.ipynb) [[py]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/py/Gradient%20Checking.py)
36 | * Optimization [[notebook]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/notebook/Optimization%20methods.ipynb) [[py]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/py/Optimization%20methods.py)
37 | * Tensorflow Tutorial [[notebook]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/notebook/Tensorflow%20Tutorial.ipynb) [[py]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/py/Tensorflow%20Tutorial.py)
38 |
39 | * [***Structuring Machine Learning Projects***](https://www.coursera.org/learn/machine-learning-projects/home/welcome)
40 | * You will learn how to build a successful machine learning project. If you aspire to be a technical leader in AI, and know how to set direction for your team's work, this course will show you how. Much of this content has never been taught elsewhere, and is drawn from my experience building and shipping many deep learning products. This course also has two "flight simulators" that let you practice decision-making as a machine learning project leader. This provides "industry experience" that you might otherwise get only after years of ML work experience.
41 | * When you finish this class, you will:
42 | * Understand how to diagnose errors in a machine learning system, and be able to prioritize the most promising directions for reducing error.
43 | * Understand complex ML settings, such as mismatched training/test sets, and comparing to and/or surpassing human-level performance
44 | * Know how to apply end-to-end learning, transfer learning, and multi-task learning
45 | * [***Convolutional Neural Networks***](https://www.coursera.org/learn/convolutional-neural-networks/home/welcome)
46 | * This course will teach you how to build convolutional neural networks and apply them to image data. Thanks to deep learning, computer vision is working far better than just two years ago, and this is enabling numerous exciting applications ranging from safe autonomous driving, to accurate face recognition, to automatic reading of radiology images.
47 | * When you finish this class, you will:
48 | * Understand how to build a convolutional neural network, including recent variations such as residual networks.
49 | * Know how to apply convolutional networks to visual detection and recognition tasks.
50 | * Know how to use neural style transfer to generate art.
51 | * Be able to apply these algorithms to a variety of image, video, and other 2D or 3D data.
52 | * Programming Assignments:
53 | * Convolutional Model: step by step [[notebook]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/notebook/ConvolutionmodeStepbyStepv1.ipynb) [[py]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/py/Convolution%20model%20Step%20by%20Step%20v1.py.html)
54 | * Convolution model Application [[notebook]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/notebook/Convolution%20model%20Application%20v1.ipynb) [[py]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/py/Convolution%20model%20Application%20v1.py.html)
55 | * Keras Tutorial Happy House [[notebook]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/notebook/Keras%20Tutorial%20Happy%20House%20v2.ipynb) [[py]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/py/Keras%20Tutorial%20Happy%20House%20v2.py.html)
56 | * Residual Networks [[notebook]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/notebook/Residual%20Networks%20.ipynb) [[py]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/py/Residual%20Networks%20.py.html)
57 | * Autonomous driving application Car detection [[notebook]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/notebook/Autonomous%20driving%20application%20Car%20detection.ipynb) [[py]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/py/Autonomous%20driving%20application%20Car%20detection.py.html)
58 | * Art Generation with Neural Style Transfer [[notebook]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/notebook/Art%20Generation%20with%20Neural%20Style%20Transfer.ipynb) [[py]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/py/Art%20Generation%20with%20Neural%20Style%20Transfer.py.html)
59 | * Face Recognition for the Happy House [[notebook]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/notebook/Face%20Recognition%20for%20the%20Happy%20House.ipynb) [[py]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/py/Face%20Recognition%20for%20the%20Happy%20House.py.html)
60 | * [***Sequence Models***](https://www.coursera.org/learn/nlp-sequence-models/home/welcome)
61 | * This course will teach you how to build models for natural language, audio, and other sequence data. Thanks to deep learning, sequence algorithms are working far better than just two years ago, and this is enabling numerous exciting applications in speech recognition, music synthesis, chatbots, machine translation, natural language understanding, and many others.
62 | * When you finish this class, you will:
63 | * Understand how to build and train Recurrent Neural Networks (RNNs), and commonly-used variants such as GRUs and LSTMs.
64 | * Be able to apply sequence models to natural language problems, including text synthesis.
65 | * Be able to apply sequence models to audio applications, including speech recognition and music synthesis.
66 | * Programming Assignments:
67 | * Building a Recurrent Neural Network Step by Step [[notebook]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/notebook/Building%20a%20Recurrent%20Neural%20Network%20Step%20by%20Step%20v3.ipynb) [[py]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/py/Building%20a%20Recurrent%20Neural%20Network%20Step%20by%20Step%20v3.py)
68 | * Dinosaurus Island Character level language model [[notebook]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/notebook/Dinosaurus%20Island%20Character%20level%20language%20model%20final%20v3.ipynb) [[py]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/py/Dinosaurus%20Island%20Character%20level%20language%20model%20final%20v3.py)
69 | * Operations on word vectors [[notebook]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/notebook/Operations%20on%20word%20vectors.ipynb) [[py]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/py/Operations%20on%20word%20vectors.py)
70 | * Emojify [[notebook]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/notebook/Emojify.ipynb) [[py]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/py/Emojify.py)
71 | * Neural machine translation with attention [[notebook]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/notebook/Neural%20machine%20translation%20with%20attention.ipynb) [[py]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/py/Neural%20machine%20translation%20with%20attention.py)
72 | * Trigger word detection [[notebook]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/notebook/Trigger%20word%20detection.ipynb) [[py]](https://github.com/fanghao6666/neural-networks-and-deep-learning/blob/master/py/Trigger%20word%20detection.py)
73 |
74 |
75 |
76 |
77 |
78 |
--------------------------------------------------------------------------------
/course-note.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/fanghao6666/neural-networks-and-deep-learning/d85e68e1fcb6e3051cb73f32f1432d1a068701be/course-note.pdf
--------------------------------------------------------------------------------
/py/Convolution model Application v1.py.html:
--------------------------------------------------------------------------------
1 |
2 | # coding: utf-8
3 |
4 | # # Convolutional Neural Networks: Application
5 | #
6 | # Welcome to Course 4's second assignment! In this notebook, you will:
7 | #
8 | # - Implement helper functions that you will use when implementing a TensorFlow model
9 | # - Implement a fully functioning ConvNet using TensorFlow
10 | #
11 | # **After this assignment you will be able to:**
12 | #
13 | # - Build and train a ConvNet in TensorFlow for a classification problem
14 | #
15 | # We assume here that you are already familiar with TensorFlow. If you are not, please refer to the *TensorFlow Tutorial* of the third week of Course 2 ("*Improving deep neural networks*").
16 |
17 | # ## 1.0 - TensorFlow model
18 | #
19 | # In the previous assignment, you built helper functions using numpy to understand the mechanics behind convolutional neural networks. Most practical applications of deep learning today are built using programming frameworks, which have many built-in functions you can simply call.
20 | #
21 | # As usual, we will start by loading in the packages.
22 |
23 | # In[1]:
24 |
25 | import math
26 | import numpy as np
27 | import h5py
28 | import matplotlib.pyplot as plt
29 | import scipy
30 | from PIL import Image
31 | from scipy import ndimage
32 | import tensorflow as tf
33 | from tensorflow.python.framework import ops
34 | from cnn_utils import *
35 |
36 | get_ipython().magic('matplotlib inline')
37 | np.random.seed(1)
38 |
39 |
40 | # Run the next cell to load the "SIGNS" dataset you are going to use.
41 |
42 | # In[2]:
43 |
44 | # Loading the data (signs)
45 | X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()
46 |
47 |
48 | # As a reminder, the SIGNS dataset is a collection of 6 signs representing numbers from 0 to 5.
49 | #
50 | #
51 | #
52 | # The next cell will show you an example of a labelled image in the dataset. Feel free to change the value of `index` below and re-run to see different examples.
53 |
54 | # In[5]:
55 |
56 | # Example of a picture
57 | index = 10
58 | plt.imshow(X_train_orig[index])
59 | print ("y = " + str(np.squeeze(Y_train_orig[:, index])))
60 |
61 |
62 | # In Course 2, you had built a fully-connected network for this dataset. But since this is an image dataset, it is more natural to apply a ConvNet to it.
63 | #
64 | # To get started, let's examine the shapes of your data.
65 |
66 | # In[6]:
67 |
68 | X_train = X_train_orig/255.
69 | X_test = X_test_orig/255.
70 | Y_train = convert_to_one_hot(Y_train_orig, 6).T
71 | Y_test = convert_to_one_hot(Y_test_orig, 6).T
72 | print ("number of training examples = " + str(X_train.shape[0]))
73 | print ("number of test examples = " + str(X_test.shape[0]))
74 | print ("X_train shape: " + str(X_train.shape))
75 | print ("Y_train shape: " + str(Y_train.shape))
76 | print ("X_test shape: " + str(X_test.shape))
77 | print ("Y_test shape: " + str(Y_test.shape))
78 | conv_layers = {}
79 |
80 |
81 | # ### 1.1 - Create placeholders
82 | #
83 | # TensorFlow requires that you create placeholders for the input data that will be fed into the model when running the session.
84 | #
85 | # **Exercise**: Implement the function below to create placeholders for the input image X and the output Y. You should not define the number of training examples for the moment. To do so, you could use "None" as the batch size; this will give you the flexibility to choose it later. Hence X should be of dimension **[None, n_H0, n_W0, n_C0]** and Y should be of dimension **[None, n_y]**. [Hint](https://www.tensorflow.org/api_docs/python/tf/placeholder).
86 |
87 | # In[19]:
88 |
89 | # GRADED FUNCTION: create_placeholders
90 |
91 | def create_placeholders(n_H0, n_W0, n_C0, n_y):
92 | """
93 | Creates the placeholders for the tensorflow session.
94 |
95 | Arguments:
96 | n_H0 -- scalar, height of an input image
97 | n_W0 -- scalar, width of an input image
98 | n_C0 -- scalar, number of channels of the input
99 | n_y -- scalar, number of classes
100 |
101 | Returns:
102 | X -- placeholder for the data input, of shape [None, n_H0, n_W0, n_C0] and dtype "float"
103 | Y -- placeholder for the input labels, of shape [None, n_y] and dtype "float"
104 | """
105 |
106 | ### START CODE HERE ### (≈2 lines)
107 | X = tf.placeholder(tf.float32,[None, n_H0, n_W0, n_C0])
108 | Y = tf.placeholder(tf.float32,
109 | [None, n_y])
110 | ### END CODE HERE ###
111 |
112 | return X, Y
113 |
114 |
115 | # In[20]:
116 |
117 | X, Y = create_placeholders(64, 64, 3, 6)
118 | print ("X = " + str(X))
119 | print ("Y = " + str(Y))
120 |
121 |
122 | # **Expected Output**
123 | #
127 | #     X = Tensor("Placeholder:0", shape=(?, 64, 64, 3), dtype=float32)
133 | #     Y = Tensor("Placeholder_1:0", shape=(?, 6), dtype=float32)
137 | #
138 |
139 | # ### 1.2 - Initialize parameters
140 | #
141 | # You will initialize weights/filters $W1$ and $W2$ using `tf.contrib.layers.xavier_initializer(seed = 0)`. You don't need to worry about bias variables as you will soon see that TensorFlow functions take care of the bias. Note also that you will only initialize the weights/filters for the conv2d functions. TensorFlow initializes the layers for the fully connected part automatically. We will talk more about that later in this assignment.
142 | #
143 | # **Exercise:** Implement initialize_parameters(). The dimensions for each group of filters are provided below. Reminder - to initialize a parameter $W$ of shape [1,2,3,4] in Tensorflow, use:
144 | # ```python
145 | # W = tf.get_variable("W", [1,2,3,4], initializer = ...)
146 | # ```
147 | # [More Info](https://www.tensorflow.org/api_docs/python/tf/get_variable).
148 |
149 | # In[25]:
150 |
151 | # GRADED FUNCTION: initialize_parameters
152 |
153 | def initialize_parameters():
154 | """
155 | Initializes weight parameters to build a neural network with tensorflow. The shapes are:
156 | W1 : [4, 4, 3, 8]
157 | W2 : [2, 2, 8, 16]
158 | Returns:
159 | parameters -- a dictionary of tensors containing W1, W2
160 | """
161 |
162 | tf.set_random_seed(1) # so that your "random" numbers match ours
163 |
164 | ### START CODE HERE ### (approx. 2 lines of code)
165 | W1 = tf.get_variable("W1",[4,4,3,8],initializer = tf.contrib.layers.xavier_initializer(seed = 0) )
166 | W2 = tf.get_variable("W2",[2,2,8,16],initializer = tf.contrib.layers.xavier_initializer(seed = 0))
167 | ### END CODE HERE ###
168 |
169 | parameters = {"W1": W1,
170 | "W2": W2}
171 |
172 | return parameters
173 |
174 |
175 | # In[26]:
176 |
177 | tf.reset_default_graph()
178 | with tf.Session() as sess_test:
179 | parameters = initialize_parameters()
180 | init = tf.global_variables_initializer()
181 | sess_test.run(init)
182 | print("W1 = " + str(parameters["W1"].eval()[1,1,1]))
183 | print("W2 = " + str(parameters["W2"].eval()[1,1,1]))
184 |
185 |
186 | # **Expected Output:**
187 | #
195 | #     W1 = [ 0.00131723  0.14176141 -0.04434952  0.09197326  0.14984085 -0.03514394
196 | #            -0.06847463  0.05245192]
205 | #     W2 = [-0.08566415  0.17750949  0.11974221  0.16773748 -0.0830943  -0.08058
206 | #            -0.00577033 -0.14643836  0.24162132 -0.05857408 -0.19055021  0.1345228
207 | #            -0.22779644 -0.1601823  -0.16117483 -0.10286498]
211 | #
212 |
213 | # ### 1.3 - Forward propagation
214 | #
215 | # In TensorFlow, there are built-in functions that carry out the convolution steps for you.
216 | #
217 | # - **tf.nn.conv2d(X,W1, strides = [1,s,s,1], padding = 'SAME'):** given an input $X$ and a group of filters $W1$, this function convolves $W1$'s filters on X. The third input ([1,s,s,1]) represents the strides for each dimension of the input (m, n_H_prev, n_W_prev, n_C_prev). You can read the full documentation [here](https://www.tensorflow.org/api_docs/python/tf/nn/conv2d)
218 | #
219 | # - **tf.nn.max_pool(A, ksize = [1,f,f,1], strides = [1,s,s,1], padding = 'SAME'):** given an input A, this function uses a window of size (f, f) and strides of size (s, s) to carry out max pooling over each window. You can read the full documentation [here](https://www.tensorflow.org/api_docs/python/tf/nn/max_pool)
220 | #
221 | # - **tf.nn.relu(Z1):** computes the elementwise ReLU of Z1 (which can be any shape). You can read the full documentation [here.](https://www.tensorflow.org/api_docs/python/tf/nn/relu)
222 | #
223 | # - **tf.contrib.layers.flatten(P)**: given an input P, this function flattens each example into a 1D vector while maintaining the batch-size. It returns a flattened tensor with shape [batch_size, k]. You can read the full documentation [here.](https://www.tensorflow.org/api_docs/python/tf/contrib/layers/flatten)
224 | #
225 | # - **tf.contrib.layers.fully_connected(F, num_outputs):** given the flattened input F, it returns the output computed using a fully connected layer. You can read the full documentation [here.](https://www.tensorflow.org/api_docs/python/tf/contrib/layers/fully_connected)
226 | #
227 | # In the last function above (`tf.contrib.layers.fully_connected`), the fully connected layer automatically initializes weights in the graph and keeps on training them as you train the model. Hence, you did not need to initialize those weights when initializing the parameters.
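# As a quick, optional illustration (not the graded code; the tensor names below are just for this example), the short sketch that follows chains these functions on a random batch to check the shapes they produce, assuming the TensorFlow 1.x API used in this notebook:
#
# ```python
# import numpy as np
# import tensorflow as tf
#
# x = tf.constant(np.random.randn(2, 64, 64, 3), dtype=tf.float32)    # dummy batch of 2 RGB images
# w = tf.constant(np.random.randn(4, 4, 3, 8), dtype=tf.float32)      # 8 filters of size 4x4x3
# z = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')        # -> (2, 64, 64, 8)
# a = tf.nn.relu(z)
# p = tf.nn.max_pool(a, ksize=[1, 8, 8, 1], strides=[1, 8, 8, 1], padding='SAME')  # -> (2, 8, 8, 8)
# f = tf.contrib.layers.flatten(p)                                    # -> (2, 512)
# out = tf.contrib.layers.fully_connected(f, 6, activation_fn=None)   # -> (2, 6)
# print(z.shape, p.shape, f.shape, out.shape)
# ```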
228 | #
229 | #
230 | # **Exercise**:
231 | #
232 | # Implement the `forward_propagation` function below to build the following model: `CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED`. You should use the functions above.
233 | #
234 | # In detail, we will use the following parameters for all the steps:
235 | # - Conv2D: stride 1, padding is "SAME"
236 | # - ReLU
237 | # - Max pool: Use an 8 by 8 filter size and an 8 by 8 stride, padding is "SAME"
238 | # - Conv2D: stride 1, padding is "SAME"
239 | # - ReLU
240 | # - Max pool: Use a 4 by 4 filter size and a 4 by 4 stride, padding is "SAME"
241 | # - Flatten the previous output.
242 | # - FULLYCONNECTED (FC) layer: Apply a fully connected layer without a non-linear activation function. Do not call the softmax here. This will result in 6 neurons in the output layer, which then get passed later to a softmax. In TensorFlow, the softmax and cost function are lumped together into a single function, which you'll call in a different function when computing the cost.
243 |
244 | # In[36]:
245 |
246 | # GRADED FUNCTION: forward_propagation
247 |
248 | def forward_propagation(X, parameters):
249 | """
250 | Implements the forward propagation for the model:
251 | CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED
252 |
253 | Arguments:
254 | X -- input dataset placeholder, of shape (input size, number of examples)
255 | parameters -- python dictionary containing your parameters "W1", "W2"
256 | the shapes are given in initialize_parameters
257 |
258 | Returns:
259 | Z3 -- the output of the last LINEAR unit
260 | """
261 |
262 | # Retrieve the parameters from the dictionary "parameters"
263 | W1 = parameters['W1']
264 | W2 = parameters['W2']
265 |
266 | ### START CODE HERE ###
267 | # CONV2D: stride of 1, padding 'SAME'
268 | Z1 = tf.nn.conv2d(X,W1, strides = [1,1,1,1], padding = 'SAME')
269 | # RELU
270 | A1 = tf.nn.relu(Z1)
271 | # MAXPOOL: window 8x8, stride 8, padding 'SAME'
272 | P1 = tf.nn.max_pool(A1, ksize = [1,8,8,1], strides = [1,8,8,1], padding = 'SAME')
273 | # CONV2D: filters W2, stride 1, padding 'SAME'
274 | Z2 = tf.nn.conv2d(P1,W2, strides = [1,1,1,1], padding = 'SAME')
275 | # RELU
276 | A2 = tf.nn.relu(Z2)
277 | # MAXPOOL: window 4x4, stride 4, padding 'SAME'
278 | P2 = tf.nn.max_pool(A2, ksize = [1,4,4,1], strides = [1,4,4,1], padding = 'SAME')
279 | # FLATTEN
280 | P2 = tf.contrib.layers.flatten(P2)
281 | # FULLY-CONNECTED without non-linear activation function (do not call softmax).
282 | # 6 neurons in output layer. Hint: one of the arguments should be "activation_fn=None"
283 | Z3 = tf.contrib.layers.fully_connected(P2,6,activation_fn=None)
284 | ### END CODE HERE ###
285 |
286 | return Z3
287 |
288 |
289 | # In[37]:
290 |
291 | tf.reset_default_graph()
292 |
293 | with tf.Session() as sess:
294 | np.random.seed(1)
295 | X, Y = create_placeholders(64, 64, 3, 6)
296 | parameters = initialize_parameters()
297 | Z3 = forward_propagation(X, parameters)
298 | init = tf.global_variables_initializer()
299 | sess.run(init)
300 | a = sess.run(Z3, {X: np.random.randn(2,64,64,3), Y: np.random.randn(2,6)})
301 | print("Z3 = " + str(a))
302 |
303 |
304 | # **Expected Output**:
305 | #
311 | #     Z3 = [[-0.44670227 -1.57208765 -1.53049231 -2.31013036 -1.29104376  0.46852064]
312 | #           [-0.17601591 -1.57972014 -1.4737016  -2.61672091 -1.00810647  0.5747785 ]]
315 |
316 | # ### 1.4 - Compute cost
317 | #
318 | # Implement the compute cost function below. You might find these two functions helpful:
319 | #
320 | # - **tf.nn.softmax_cross_entropy_with_logits(logits = Z3, labels = Y):** computes the softmax cross-entropy loss. This function both computes the softmax activation function as well as the resulting loss. You can check the full documentation [here.](https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits)
321 | # - **tf.reduce_mean:** computes the mean of elements across dimensions of a tensor. Use this to average the losses over all the examples to get the overall cost. You can check the full documentation [here.](https://www.tensorflow.org/api_docs/python/tf/reduce_mean)
322 | #
323 | # **Exercise**: Compute the cost below using the two functions above.
324 |
325 | # In[40]:
326 |
327 | # GRADED FUNCTION: compute_cost
328 |
329 | def compute_cost(Z3, Y):
330 | """
331 | Computes the cost
332 |
333 | Arguments:
334 | Z3 -- output of forward propagation (output of the last LINEAR unit), of shape (6, number of examples)
335 | Y -- "true" labels vector placeholder, same shape as Z3
336 |
337 | Returns:
338 | cost - Tensor of the cost function
339 | """
340 |
341 | ### START CODE HERE ### (1 line of code)
342 | cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = Z3, labels = Y))
343 | ### END CODE HERE ###
344 |
345 | return cost
346 |
347 |
348 | # In[41]:
349 |
350 | tf.reset_default_graph()
351 |
352 | with tf.Session() as sess:
353 | np.random.seed(1)
354 | X, Y = create_placeholders(64, 64, 3, 6)
355 | parameters = initialize_parameters()
356 | Z3 = forward_propagation(X, parameters)
357 | cost = compute_cost(Z3, Y)
358 | init = tf.global_variables_initializer()
359 | sess.run(init)
360 | a = sess.run(cost, {X: np.random.randn(4,64,64,3), Y: np.random.randn(4,6)})
361 | print("cost = " + str(a))
362 |
363 |
364 | # **Expected Output**:
365 | #
372 | #     cost = 2.91034
375 |
376 | # ## 1.5 - Model
377 | #
378 | # Finally you will merge the helper functions you implemented above to build a model. You will train it on the SIGNS dataset.
379 | #
380 | # You have implemented `random_mini_batches()` in the Optimization programming assignment of course 2. Remember that this function returns a list of mini-batches.
381 | #
382 | # **Exercise**: Complete the function below.
383 | #
384 | # The model below should:
385 | #
386 | # - create placeholders
387 | # - initialize parameters
388 | # - forward propagate
389 | # - compute the cost
390 | # - create an optimizer
391 | #
392 | # Finally you will create a session and run a for loop for num_epochs, get the mini-batches, and then for each mini-batch you will optimize the function. [Hint for initializing the variables](https://www.tensorflow.org/api_docs/python/tf/global_variables_initializer)
393 |
394 | # In[55]:
395 |
396 | # GRADED FUNCTION: model
397 |
398 | def model(X_train, Y_train, X_test, Y_test, learning_rate = 0.009,
399 | num_epochs = 100, minibatch_size = 64, print_cost = True):
400 | """
401 | Implements a three-layer ConvNet in Tensorflow:
402 | CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> FULLYCONNECTED
403 |
404 | Arguments:
405 | X_train -- training set, of shape (None, 64, 64, 3)
406 | Y_train -- training set labels, of shape (None, n_y = 6)
407 | X_test -- test set, of shape (None, 64, 64, 3)
408 | Y_test -- test set labels, of shape (None, n_y = 6)
409 | learning_rate -- learning rate of the optimization
410 | num_epochs -- number of epochs of the optimization loop
411 | minibatch_size -- size of a minibatch
412 | print_cost -- True to print the cost every 5 epochs
413 |
414 | Returns:
415 | train_accuracy -- real number, accuracy on the train set (X_train)
416 | test_accuracy -- real number, testing accuracy on the test set (X_test)
417 | parameters -- parameters learnt by the model. They can then be used to predict.
418 | """
419 |
420 | ops.reset_default_graph() # to be able to rerun the model without overwriting tf variables
421 | tf.set_random_seed(1) # to keep results consistent (tensorflow seed)
422 | seed = 3 # to keep results consistent (numpy seed)
423 | (m, n_H0, n_W0, n_C0) = X_train.shape
424 | n_y = Y_train.shape[1]
425 | costs = [] # To keep track of the cost
426 |
427 | # Create Placeholders of the correct shape
428 | ### START CODE HERE ### (1 line)
429 | X, Y = create_placeholders(n_H0, n_W0, n_C0, n_y)
430 | ### END CODE HERE ###
431 |
432 | # Initialize parameters
433 | ### START CODE HERE ### (1 line)
434 | parameters = initialize_parameters()
435 | ### END CODE HERE ###
436 |
437 | # Forward propagation: Build the forward propagation in the tensorflow graph
438 | ### START CODE HERE ### (1 line)
439 | Z3 = forward_propagation(X, parameters)
440 | ### END CODE HERE ###
441 |
442 | # Cost function: Add cost function to tensorflow graph
443 | ### START CODE HERE ### (1 line)
444 | cost = compute_cost(Z3, Y)
445 | ### END CODE HERE ###
446 |
447 | # Backpropagation: Define the tensorflow optimizer. Use an AdamOptimizer that minimizes the cost.
448 | ### START CODE HERE ### (1 line)
449 | optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
450 | ### END CODE HERE ###
451 |
452 | # Initialize all the variables globally
453 | init = tf.global_variables_initializer()
454 |
455 | # Start the session to compute the tensorflow graph
456 | with tf.Session() as sess:
457 |
458 | # Run the initialization
459 | sess.run(init)
460 |
461 | # Do the training loop
462 | for epoch in range(num_epochs):
463 |
464 | minibatch_cost = 0.
465 | num_minibatches = int(m / minibatch_size) # number of minibatches of size minibatch_size in the train set
466 | seed = seed + 1
467 | minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)
468 |
469 | for minibatch in minibatches:
470 |
471 | # Select a minibatch
472 | (minibatch_X, minibatch_Y) = minibatch
473 | # IMPORTANT: The line that runs the graph on a minibatch.
474 | # Run the session to execute the optimizer and the cost; the feed_dict should contain a minibatch for (X,Y).
475 | ### START CODE HERE ### (1 line)
476 | _ , temp_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})
477 | ### END CODE HERE ###
479 | minibatch_cost += temp_cost / num_minibatches
480 |
481 |
482 | # Print the cost every epoch
483 | if print_cost == True and epoch % 5 == 0:
484 | print ("Cost after epoch %i: %f" % (epoch, minibatch_cost))
485 | if print_cost == True and epoch % 1 == 0:
486 | costs.append(minibatch_cost)
487 |
488 |
489 | # plot the cost
490 | plt.plot(np.squeeze(costs))
491 | plt.ylabel('cost')
492 | plt.xlabel('iterations (per tens)')
493 | plt.title("Learning rate =" + str(learning_rate))
494 | plt.show()
495 |
496 | # Calculate the correct predictions
497 | predict_op = tf.argmax(Z3, 1)
498 | correct_prediction = tf.equal(predict_op, tf.argmax(Y, 1))
499 |
500 | # Calculate accuracy on the test set
501 | accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
502 | print(accuracy)
503 | train_accuracy = accuracy.eval({X: X_train, Y: Y_train})
504 | test_accuracy = accuracy.eval({X: X_test, Y: Y_test})
505 | print("Train Accuracy:", train_accuracy)
506 | print("Test Accuracy:", test_accuracy)
507 |
508 | return train_accuracy, test_accuracy, parameters
509 |
511 |
512 |
513 | # Run the following cell to train your model for 100 epochs. Check if your cost after epoch 0 and 5 matches our output. If not, stop the cell and go back to your code!
514 |
515 | # In[56]:
516 |
517 | _, _, parameters = model(X_train, Y_train, X_test, Y_test)
518 |
519 |
520 | # **Expected output**: although it may not match perfectly, your expected output should be close to ours and your cost value should decrease.
521 | #
525 | #     Cost after epoch 0 = 1.917929
534 | #     Cost after epoch 5 = 1.506757
543 | #     Train Accuracy     = 0.940741
553 | #     Test Accuracy      = 0.783333
560 | #
561 |
562 | # Congratulations! You have finished the assignment and built a model that recognizes SIGN language with almost 80% accuracy on the test set. If you wish, feel free to play around with this dataset further. You can actually improve its accuracy by spending more time tuning the hyperparameters, or using regularization (as this model clearly has a high variance).
563 | #
564 | # Once again, here's a thumbs up for your work!
565 |
566 | # In[52]:
567 |
568 | fname = "images/thumbs_up.jpg"
569 | image = np.array(ndimage.imread(fname, flatten=False))
570 | my_image = scipy.misc.imresize(image, size=(64,64))
571 | plt.imshow(my_image)
572 |
573 |
574 | # In[ ]:
575 |
576 |
577 |
578 |
--------------------------------------------------------------------------------
/py/Deep Neural Network Application v3.py:
--------------------------------------------------------------------------------
1 |
2 | # coding: utf-8
3 |
4 | # # Deep Neural Network for Image Classification: Application
5 | #
6 | # When you finish this, you will have finished the last programming assignment of Week 4, and also the last programming assignment of this course!
7 | #
8 | # You will use the functions you implemented in the previous assignment to build a deep network and apply it to cat vs. non-cat classification. Hopefully, you will see an improvement in accuracy relative to your previous logistic regression implementation.
9 | #
10 | # **After this assignment you will be able to:**
11 | # - Build and apply a deep neural network to supervised learning.
12 | #
13 | # Let's get started!
14 |
15 | # ## 1 - Packages
16 |
17 | # Let's first import all the packages that you will need during this assignment.
18 | # - [numpy](http://www.numpy.org) is the fundamental package for scientific computing with Python.
19 | # - [matplotlib](http://matplotlib.org) is a library to plot graphs in Python.
20 | # - [h5py](http://www.h5py.org) is a common package to interact with a dataset that is stored on an H5 file.
21 | # - [PIL](http://www.pythonware.com/products/pil/) and [scipy](https://www.scipy.org/) are used here to test your model with your own picture at the end.
22 | # - dnn_app_utils provides the functions implemented in the "Building your Deep Neural Network: Step by Step" assignment to this notebook.
23 | # - np.random.seed(1) is used to keep all the random function calls consistent. It will help us grade your work.
24 |
25 | # In[1]:
26 |
27 | import time
28 | import numpy as np
29 | import h5py
30 | import matplotlib.pyplot as plt
31 | import scipy
32 | from PIL import Image
33 | from scipy import ndimage
34 | from dnn_app_utils_v2 import *
35 |
36 | get_ipython().magic('matplotlib inline')
37 | plt.rcParams['figure.figsize'] = (5.0, 4.0) # set default size of plots
38 | plt.rcParams['image.interpolation'] = 'nearest'
39 | plt.rcParams['image.cmap'] = 'gray'
40 |
41 | get_ipython().magic('load_ext autoreload')
42 | get_ipython().magic('autoreload 2')
43 |
44 | np.random.seed(1)
45 |
46 |
47 | # ## 2 - Dataset
48 | #
49 | # You will use the same "Cat vs non-Cat" dataset as in "Logistic Regression as a Neural Network" (Assignment 2). The model you built there had 70% test accuracy on classifying cat vs. non-cat images. Hopefully, your new model will perform better!
50 | #
51 | # **Problem Statement**: You are given a dataset ("data.h5") containing:
52 | # - a training set of m_train images labelled as cat (1) or non-cat (0)
53 | # - a test set of m_test images labelled as cat and non-cat
54 | # - each image is of shape (num_px, num_px, 3) where 3 is for the 3 channels (RGB).
55 | #
56 | # Let's get more familiar with the dataset. Load the data by running the cell below.
57 |
58 | # In[2]:
59 |
60 | train_x_orig, train_y, test_x_orig, test_y, classes = load_data()
61 |
62 |
63 | # The following code will show you an image in the dataset. Feel free to change the index and re-run the cell multiple times to see other images.
64 |
65 | # In[6]:
66 |
67 | # Example of a picture
68 | index = 15
69 | plt.imshow(train_x_orig[index])
70 | print(train_x_orig.shape)
71 | print ("y = " + str(train_y[0,index]) + ". It's a " + classes[train_y[0,index]].decode("utf-8") + " picture.")
72 |
73 |
74 | # In[7]:
75 |
76 | # Explore your dataset
77 | m_train = train_x_orig.shape[0]
78 | num_px = train_x_orig.shape[1]
79 | m_test = test_x_orig.shape[0]
80 |
81 | print ("Number of training examples: " + str(m_train))
82 | print ("Number of testing examples: " + str(m_test))
83 | print ("Each image is of size: (" + str(num_px) + ", " + str(num_px) + ", 3)")
84 | print ("train_x_orig shape: " + str(train_x_orig.shape))
85 | print ("train_y shape: " + str(train_y.shape))
86 | print ("test_x_orig shape: " + str(test_x_orig.shape))
87 | print ("test_y shape: " + str(test_y.shape))
88 |
89 |
90 | # As usual, you reshape and standardize the images before feeding them to the network. The code is given in the cell below.
91 | #
92 | #
93 | #
94 | # Figure 1: Image to vector conversion.
95 |
96 | # In[8]:
97 |
98 | # Reshape the training and test examples
99 | train_x_flatten = train_x_orig.reshape(train_x_orig.shape[0], -1).T # The "-1" makes reshape flatten the remaining dimensions
100 | test_x_flatten = test_x_orig.reshape(test_x_orig.shape[0], -1).T
101 |
102 | # Standardize data to have feature values between 0 and 1.
103 | train_x = train_x_flatten/255.
104 | test_x = test_x_flatten/255.
105 |
106 | print ("train_x's shape: " + str(train_x.shape))
107 | print ("test_x's shape: " + str(test_x.shape))
108 |
109 |
110 | # $12,288$ equals $64 \times 64 \times 3$ which is the size of one reshaped image vector.
111 |
112 | # ## 3 - Architecture of your model
113 |
114 | # Now that you are familiar with the dataset, it is time to build a deep neural network to distinguish cat images from non-cat images.
115 | #
116 | # You will build two different models:
117 | # - A 2-layer neural network
118 | # - An L-layer deep neural network
119 | #
120 | # You will then compare the performance of these models, and also try out different values for $L$.
121 | #
122 | # Let's look at the two architectures.
123 | #
124 | # ### 3.1 - 2-layer neural network
125 | #
126 | #
127 | # Figure 2: 2-layer neural network.
# The model can be summarized as: ***INPUT -> LINEAR -> RELU -> LINEAR -> SIGMOID -> OUTPUT***.
128 | #
129 | # Detailed Architecture of figure 2:
130 | # - The input is a (64,64,3) image which is flattened to a vector of size $(12288,1)$.
131 | # - The corresponding vector: $[x_0,x_1,...,x_{12287}]^T$ is then multiplied by the weight matrix $W^{[1]}$ of size $(n^{[1]}, 12288)$.
132 | # - You then add a bias term and take its relu to get the following vector: $[a_0^{[1]}, a_1^{[1]},..., a_{n^{[1]}-1}^{[1]}]^T$.
133 | # - You then repeat the same process.
134 | # - You multiply the resulting vector by $W^{[2]}$ and add your intercept (bias).
135 | # - Finally, you take the sigmoid of the result. If it is greater than 0.5, you classify it to be a cat.
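# For reference, the forward pass just described can be written compactly as (with $x$ the flattened input and $\hat{y}$ the predicted probability that the image is a cat):
#
# $$z^{[1]} = W^{[1]} x + b^{[1]}, \qquad a^{[1]} = \mathrm{ReLU}(z^{[1]}),$$
# $$z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}, \qquad \hat{y} = \sigma(z^{[2]}),$$
#
# and the image is classified as a cat when $\hat{y} > 0.5$.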
136 | #
137 | # ### 3.2 - L-layer deep neural network
138 | #
139 | # It is hard to represent an L-layer deep neural network with the above representation. However, here is a simplified network representation:
140 | #
141 | #
142 | # Figure 3: L-layer neural network.
# The model can be summarized as: ***[LINEAR -> RELU] $\times$ (L-1) -> LINEAR -> SIGMOID***
143 | #
144 | # Detailed Architecture of figure 3:
145 | # - The input is a (64,64,3) image which is flattened to a vector of size (12288,1).
146 | # - The corresponding vector: $[x_0,x_1,...,x_{12287}]^T$ is then multiplied by the weight matrix $W^{[1]}$ and then you add the intercept $b^{[1]}$. The result is called the linear unit.
147 | # - Next, you take the relu of the linear unit. This process could be repeated several times for each $(W^{[l]}, b^{[l]})$ depending on the model architecture.
148 | # - Finally, you take the sigmoid of the final linear unit. If it is greater than 0.5, you classify it to be a cat.
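# In equations, with $A^{[0]} = X$ (the flattened input), the forward pass described above is:
#
# $$A^{[l]} = \mathrm{ReLU}\left(W^{[l]} A^{[l-1]} + b^{[l]}\right), \qquad l = 1, \dots, L-1,$$
# $$\hat{Y} = A^{[L]} = \sigma\left(W^{[L]} A^{[L-1]} + b^{[L]}\right).$$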
149 | #
150 | # ### 3.3 - General methodology
151 | #
152 | # As usual you will follow the Deep Learning methodology to build the model:
153 | # 1. Initialize parameters / Define hyperparameters
154 | # 2. Loop for num_iterations:
155 | # a. Forward propagation
156 | # b. Compute cost function
157 | # c. Backward propagation
158 | # d. Update parameters (using parameters, and grads from backprop)
159 | # 3. Use trained parameters to predict labels
160 | #
161 | # Let's now implement those two models!
162 |
163 | # ## 4 - Two-layer neural network
164 | #
165 | # **Question**: Use the helper functions you have implemented in the previous assignment to build a 2-layer neural network with the following structure: *LINEAR -> RELU -> LINEAR -> SIGMOID*. The functions you may need and their inputs are:
166 | # ```python
167 | # def initialize_parameters(n_x, n_h, n_y):
168 | # ...
169 | # return parameters
170 | # def linear_activation_forward(A_prev, W, b, activation):
171 | # ...
172 | # return A, cache
173 | # def compute_cost(AL, Y):
174 | # ...
175 | # return cost
176 | # def linear_activation_backward(dA, cache, activation):
177 | # ...
178 | # return dA_prev, dW, db
179 | # def update_parameters(parameters, grads, learning_rate):
180 | # ...
181 | # return parameters
182 | # ```
183 |
184 | # In[9]:
185 |
186 | ### CONSTANTS DEFINING THE MODEL ####
187 | n_x = 12288 # num_px * num_px * 3
188 | n_h = 7
189 | n_y = 1
190 | layers_dims = (n_x, n_h, n_y)
191 |
192 |
193 | # In[10]:
194 |
195 | # GRADED FUNCTION: two_layer_model
196 |
197 | def two_layer_model(X, Y, layers_dims, learning_rate = 0.0075, num_iterations = 3000, print_cost=False):
198 | """
199 | Implements a two-layer neural network: LINEAR->RELU->LINEAR->SIGMOID.
200 |
201 | Arguments:
202 | X -- input data, of shape (n_x, number of examples)
203 | Y -- true "label" vector (containing 0 if cat, 1 if non-cat), of shape (1, number of examples)
204 | layers_dims -- dimensions of the layers (n_x, n_h, n_y)
205 | num_iterations -- number of iterations of the optimization loop
206 | learning_rate -- learning rate of the gradient descent update rule
207 | print_cost -- If set to True, this will print the cost every 100 iterations
208 |
209 | Returns:
210 | parameters -- a dictionary containing W1, W2, b1, and b2
211 | """
212 |
213 | np.random.seed(1)
214 | grads = {}
215 | costs = [] # to keep track of the cost
216 | m = X.shape[1] # number of examples
217 | (n_x, n_h, n_y) = layers_dims
218 |
219 | # Initialize parameters dictionary, by calling one of the functions you'd previously implemented
220 | ### START CODE HERE ### (≈ 1 line of code)
221 | parameters = initialize_parameters(n_x, n_h, n_y)
222 | ### END CODE HERE ###
223 |
224 | # Get W1, b1, W2 and b2 from the dictionary parameters.
225 | W1 = parameters["W1"]
226 | b1 = parameters["b1"]
227 | W2 = parameters["W2"]
228 | b2 = parameters["b2"]
229 |
230 | # Loop (gradient descent)
231 |
232 | for i in range(0, num_iterations):
233 |
234 | # Forward propagation: LINEAR -> RELU -> LINEAR -> SIGMOID. Inputs: "X, W1, b1". Output: "A1, cache1, A2, cache2".
235 | ### START CODE HERE ### (≈ 2 lines of code)
236 | A1, cache1 = linear_activation_forward(X, W1, b1, activation = "relu")
237 | A2, cache2 = linear_activation_forward(A1, W2, b2, activation = "sigmoid")
238 | ### END CODE HERE ###
239 |
240 | # Compute cost
241 | ### START CODE HERE ### (≈ 1 line of code)
242 | cost = compute_cost(A2, Y)
243 | ### END CODE HERE ###
244 |
245 | # Initializing backward propagation
246 | dA2 = - (np.divide(Y, A2) - np.divide(1 - Y, 1 - A2))
247 |
248 | # Backward propagation. Inputs: "dA2, cache2, cache1". Outputs: "dA1, dW2, db2; also dA0 (not used), dW1, db1".
249 | ### START CODE HERE ### (≈ 2 lines of code)
250 | dA1, dW2, db2 = linear_activation_backward(dA2, cache2, activation = "sigmoid")
251 | dA0, dW1, db1 = linear_activation_backward(dA1, cache1, activation = "relu")
252 | ### END CODE HERE ###
253 |
254 | # Set grads['dW1'] to dW1, grads['db1'] to db1, grads['dW2'] to dW2, grads['db2'] to db2
255 | grads['dW1'] = dW1
256 | grads['db1'] = db1
257 | grads['dW2'] = dW2
258 | grads['db2'] = db2
259 |
260 | # Update parameters.
261 | ### START CODE HERE ### (approx. 1 line of code)
262 | parameters = update_parameters(parameters, grads, learning_rate)
263 | ### END CODE HERE ###
264 |
265 | # Retrieve W1, b1, W2, b2 from parameters
266 | W1 = parameters["W1"]
267 | b1 = parameters["b1"]
268 | W2 = parameters["W2"]
269 | b2 = parameters["b2"]
270 |
271 | # Print the cost every 100 iterations
272 | if print_cost and i % 100 == 0:
273 | print("Cost after iteration {}: {}".format(i, np.squeeze(cost)))
274 | if print_cost and i % 100 == 0:
275 | costs.append(cost)
276 |
277 | # plot the cost
278 |
279 | plt.plot(np.squeeze(costs))
280 | plt.ylabel('cost')
281 | plt.xlabel('iterations (per tens)')
282 | plt.title("Learning rate =" + str(learning_rate))
283 | plt.show()
284 |
285 | return parameters
286 |
287 |
288 | # Run the cell below to train your parameters. See if your model runs. The cost should be decreasing. It may take up to 5 minutes to run 2500 iterations. Check if the "Cost after iteration 0" matches the expected output below, if not click on the square (⬛) on the upper bar of the notebook to stop the cell and try to find your error.
289 |
290 | # In[11]:
291 |
292 | parameters = two_layer_model(train_x, train_y, layers_dims = (n_x, n_h, n_y), num_iterations = 2500, print_cost=True)
293 |
294 |
295 | # **Expected Output**:
296 | #
298 | #     Cost after iteration 0:    0.6930497356599888
302 | #     Cost after iteration 100:  0.6464320953428849
306 | #     ...
310 | #     Cost after iteration 2400: 0.048554785628770206
313 | #
314 |
315 | # Good thing you built a vectorized implementation! Otherwise it might have taken 10 times longer to train this.
316 | #
317 | # Now, you can use the trained parameters to classify images from the dataset. To see your predictions on the training and test sets, run the cell below.
318 |
319 | # In[12]:
320 |
321 | predictions_train = predict(train_x, train_y, parameters)
322 |
323 |
324 | # **Expected Output**:
325 | #
328 | #     Accuracy: 1.0
330 | #
331 |
332 | # In[13]:
333 |
334 | predictions_test = predict(test_x, test_y, parameters)
335 |
336 |
337 | # **Expected Output**:
338 | #
342 | #     Accuracy: 0.72
344 | #
345 |
346 | # **Note**: You may notice that running the model on fewer iterations (say 1500) gives better accuracy on the test set. This is called "early stopping" and we will talk about it in the next course. Early stopping is a way to prevent overfitting.
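# A minimal sketch of that idea (assumptions: `dev_x`/`dev_y` are a hypothetical held-out split you would carve out of the data yourself, and the helpers `initialize_parameters_deep`, `L_model_forward`, `L_model_backward`, `update_parameters`, and `compute_cost` come from `dnn_app_utils_v2`, as in the next section): keep the parameters with the lowest dev cost and stop once it stops improving.
#
# ```python
# import copy
#
# def train_with_early_stopping(X, Y, dev_x, dev_y, layers_dims, learning_rate=0.0075,
#                               num_iterations=3000, check_every=100, patience=5):
#     """Sketch only: plain gradient descent plus early stopping on a dev set."""
#     parameters = initialize_parameters_deep(layers_dims)
#     best_params, best_cost, bad_checks = copy.deepcopy(parameters), float("inf"), 0
#     for i in range(num_iterations):
#         AL, caches = L_model_forward(X, parameters)
#         grads = L_model_backward(AL, Y, caches)
#         parameters = update_parameters(parameters, grads, learning_rate)
#         if i % check_every == 0:
#             dev_AL, _ = L_model_forward(dev_x, parameters)   # evaluate on the held-out set
#             dev_cost = compute_cost(dev_AL, dev_y)
#             if dev_cost < best_cost:
#                 best_cost, best_params, bad_checks = dev_cost, copy.deepcopy(parameters), 0
#             else:
#                 bad_checks += 1
#                 if bad_checks >= patience:                   # dev cost stopped improving
#                     break
#     return best_params
# ```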
347 | #
348 | # Congratulations! It seems that your 2-layer neural network has better performance (72%) than the logistic regression implementation (70%, assignment week 2). Let's see if you can do even better with an $L$-layer model.
349 |
350 | # ## 5 - L-layer Neural Network
351 | #
352 | # **Question**: Use the helper functions you have implemented previously to build an $L$-layer neural network with the following structure: *[LINEAR -> RELU]$\times$(L-1) -> LINEAR -> SIGMOID*. The functions you may need and their inputs are:
353 | # ```python
354 | # def initialize_parameters_deep(layer_dims):
355 | # ...
356 | # return parameters
357 | # def L_model_forward(X, parameters):
358 | # ...
359 | # return AL, caches
360 | # def compute_cost(AL, Y):
361 | # ...
362 | # return cost
363 | # def L_model_backward(AL, Y, caches):
364 | # ...
365 | # return grads
366 | # def update_parameters(parameters, grads, learning_rate):
367 | # ...
368 | # return parameters
369 | # ```
370 |
371 | # In[14]:
372 |
373 | ### CONSTANTS ###
374 | layers_dims = [12288, 20, 7, 5, 1] # 5-layer model
375 |
376 |
377 | # In[17]:
378 |
379 | # GRADED FUNCTION: L_layer_model
380 |
381 | def L_layer_model(X, Y, layers_dims, learning_rate = 0.0075, num_iterations = 3000, print_cost=False):#lr was 0.009
382 | """
383 | Implements a L-layer neural network: [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID.
384 |
385 | Arguments:
386 | X -- data, numpy array of shape (num_px * num_px * 3, number of examples)
387 | Y -- true "label" vector (containing 0 if cat, 1 if non-cat), of shape (1, number of examples)
388 | layers_dims -- list containing the input size and each layer size, of length (number of layers + 1).
389 | learning_rate -- learning rate of the gradient descent update rule
390 | num_iterations -- number of iterations of the optimization loop
391 | print_cost -- if True, it prints the cost every 100 steps
392 |
393 | Returns:
394 | parameters -- parameters learnt by the model. They can then be used to predict.
395 | """
396 |
397 | np.random.seed(1)
398 | costs = [] # keep track of cost
399 |
400 | # Parameters initialization.
401 | ### START CODE HERE ###
402 | parameters = initialize_parameters_deep(layers_dims)
403 | ### END CODE HERE ###
404 |
405 | # Loop (gradient descent)
406 | for i in range(0, num_iterations):
407 |
408 | # Forward propagation: [LINEAR -> RELU]*(L-1) -> LINEAR -> SIGMOID.
409 | ### START CODE HERE ### (≈ 1 line of code)
410 | AL, caches = L_model_forward(X, parameters)
411 | ### END CODE HERE ###
412 |
413 | # Compute cost.
414 | ### START CODE HERE ### (≈ 1 line of code)
415 | cost = compute_cost(AL, Y)
416 | ### END CODE HERE ###
417 |
418 | # Backward propagation.
419 | ### START CODE HERE ### (≈ 1 line of code)
420 | grads = L_model_backward(AL, Y, caches)
421 | ### END CODE HERE ###
422 |
423 | # Update parameters.
424 | ### START CODE HERE ### (≈ 1 line of code)
425 | parameters = update_parameters(parameters, grads, learning_rate)
426 | ### END CODE HERE ###
427 |
428 | # Print the cost every 100 iterations
429 | if print_cost and i % 100 == 0:
430 | print ("Cost after iteration %i: %f" %(i, cost))
431 | if print_cost and i % 100 == 0:
432 | costs.append(cost)
433 |
434 | # plot the cost
435 | plt.plot(np.squeeze(costs))
436 | plt.ylabel('cost')
437 | plt.xlabel('iterations (per tens)')
438 | plt.title("Learning rate =" + str(learning_rate))
439 | plt.show()
440 |
441 | return parameters
442 |
443 |
444 | # You will now train the model as a 5-layer neural network.
445 | #
446 | # Run the cell below to train your model. The cost should decrease on every iteration. It may take up to 5 minutes to run 2500 iterations. Check if the "Cost after iteration 0" matches the expected output below, if not click on the square (⬛) on the upper bar of the notebook to stop the cell and try to find your error.
447 |
448 | # In[18]:
449 |
450 | parameters = L_layer_model(train_x, train_y, layers_dims, num_iterations = 2500, print_cost = True)
451 |
452 |
453 | # **Expected Output**:
454 | #
457 | #     Cost after iteration 0:    0.771749
461 | #     Cost after iteration 100:  0.672053
465 | #     ...
469 | #     Cost after iteration 2400: 0.092878
471 | #
472 |
473 | # In[19]:
474 |
475 | pred_train = predict(train_x, train_y, parameters)
476 |
477 |
478 | #
484 | #     Train Accuracy: 0.985645933014
487 | #
488 |
489 | # In[20]:
490 |
491 | pred_test = predict(test_x, test_y, parameters)
492 |
493 |
494 | # **Expected Output**:
495 | #
499 | #     Test Accuracy: 0.8
501 | #
502 |
503 | # Congrats! It seems that your 5-layer neural network has better performance (80%) than your 2-layer neural network (72%) on the same test set.
504 | #
505 | # This is good performance for this task. Nice job!
506 | #
507 | # Though in the next course on "Improving deep neural networks" you will learn how to obtain even higher accuracy by systematically searching for better hyperparameters (learning_rate, layers_dims, num_iterations, and others you'll also learn in the next course).
508 |
509 | # ## 6) Results Analysis
510 | #
511 | # First, let's take a look at some images the L-layer model labeled incorrectly. This will show a few mislabeled images.
512 |
513 | # In[21]:
514 |
515 | print_mislabeled_images(classes, test_x, test_y, pred_test)
516 |
517 |
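# The call above uses a plotting helper provided with the assignment. A minimal
# sketch of how such a helper could find and display mislabeled test images
# (illustrative only; the real print_mislabeled_images may differ in detail):

import numpy as np
import matplotlib.pyplot as plt

def show_mislabeled_sketch(classes, X, y, p, num_px=64, max_images=6):
    # Indices where prediction and label disagree (their sum is 1: one is 0, the other 1)
    mislabeled = np.where(p + y == 1)[1]
    for k, idx in enumerate(mislabeled[:max_images]):
        plt.subplot(1, max_images, k + 1)
        plt.imshow(X[:, idx].reshape(num_px, num_px, 3), interpolation='nearest')
        plt.axis('off')
        plt.title("Prediction: " + classes[int(p[0, idx])].decode("utf-8"))
    plt.show()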
518 | # **A few types of images the model tends to do poorly on include:**
519 | # - Cat body in an unusual position
520 | # - Cat appears against a background of a similar color
521 | # - Unusual cat color and species
522 | # - Camera Angle
523 | # - Brightness of the picture
524 | # - Scale variation (cat is very large or small in image)
525 |
526 | # ## 7) Test with your own image (optional/ungraded exercise) ##
527 | #
528 | # Congratulations on finishing this assignment. You can use your own image and see the output of your model. To do that:
529 | # 1. Click on "File" in the upper bar of this notebook, then click "Open" to go on your Coursera Hub.
530 | # 2. Add your image to this Jupyter Notebook's directory, in the "images" folder
531 | # 3. Change your image's name in the following code
532 | # 4. Run the code and check if the algorithm is right (1 = cat, 0 = non-cat)!
533 |
534 | # In[22]:
535 |
536 | ## START CODE HERE ##
537 | my_image = "my_image.jpg" # change this to the name of your image file
538 | my_label_y = [1] # the true class of your image (1 -> cat, 0 -> non-cat)
539 | ## END CODE HERE ##
540 |
541 | fname = "images/" + my_image
542 | image = np.array(ndimage.imread(fname, flatten=False))
543 | my_image = scipy.misc.imresize(image, size=(num_px,num_px)).reshape((num_px*num_px*3,1))
544 | my_predicted_image = predict(my_image, my_label_y, parameters)
545 |
546 | plt.imshow(image)
547 | print ("y = " + str(np.squeeze(my_predicted_image)) + ", your L-layer model predicts a \"" + classes[int(np.squeeze(my_predicted_image)),].decode("utf-8") + "\" picture.")
548 |
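# Note: scipy.ndimage.imread and scipy.misc.imresize have been removed from
# recent SciPy releases. A minimal alternative sketch using Pillow (assumes
# Pillow is installed, and that num_px, my_label_y and parameters are defined
# as above; dividing by 255 matches the standardized training inputs):

from PIL import Image
import numpy as np

img = Image.open("images/my_image.jpg").convert("RGB").resize((num_px, num_px))
my_image_flat = np.asarray(img).reshape((num_px * num_px * 3, 1)) / 255.
my_predicted_image = predict(my_image_flat, my_label_y, parameters)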
549 |
550 | # **References**:
551 | #
552 | # - for auto-reloading external modules: http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
553 |
--------------------------------------------------------------------------------
/py/Face Recognition for the Happy House.py.html:
--------------------------------------------------------------------------------
1 |
2 | # coding: utf-8
3 |
4 | # # Face Recognition for the Happy House
5 | #
6 | # Welcome to the first assignment of week 4! Here you will build a face recognition system. Many of the ideas presented here are from [FaceNet](https://arxiv.org/pdf/1503.03832.pdf). In lecture, we also talked about [DeepFace](https://research.fb.com/wp-content/uploads/2016/11/deepface-closing-the-gap-to-human-level-performance-in-face-verification.pdf).
7 | #
8 | # Face recognition problems commonly fall into two categories:
9 | #
10 | # - **Face Verification** - "is this the claimed person?". For example, at some airports, you can pass through customs by letting a system scan your passport and then verifying that you (the person carrying the passport) are the correct person. A mobile phone that unlocks using your face is also using face verification. This is a 1:1 matching problem.
11 | # - **Face Recognition** - "who is this person?". For example, the video lecture showed a face recognition video (https://www.youtube.com/watch?v=wr4rx0Spihs) of Baidu employees entering the office without needing to otherwise identify themselves. This is a 1:K matching problem.
12 | #
13 | # FaceNet learns a neural network that encodes a face image into a vector of 128 numbers. By comparing two such vectors, you can then determine if two pictures are of the same person.
14 | #
15 | # **In this assignment, you will:**
16 | # - Implement the triplet loss function
17 | # - Use a pretrained model to map face images into 128-dimensional encodings
18 | # - Use these encodings to perform face verification and face recognition
19 | #
20 | # In this exercise, we will be using a pre-trained model which represents ConvNet activations using a "channels first" convention, as opposed to the "channels last" convention used in lecture and previous programming assignments. In other words, a batch of images will be of shape $(m, n_C, n_H, n_W)$ instead of $(m, n_H, n_W, n_C)$. Both of these conventions have a reasonable amount of traction among open-source implementations; there isn't a uniform standard yet within the deep learning community.
21 | #
22 | # Let's load the required packages.
23 | #
24 |
25 | # In[5]:
26 |
27 | from keras.models import Sequential
28 | from keras.layers import Conv2D, ZeroPadding2D, Activation, Input, concatenate
29 | from keras.models import Model
30 | from keras.layers.normalization import BatchNormalization
31 | from keras.layers.pooling import MaxPooling2D, AveragePooling2D
32 | from keras.layers.merge import Concatenate
33 | from keras.layers.core import Lambda, Flatten, Dense
34 | from keras.initializers import glorot_uniform
35 | from keras.engine.topology import Layer
36 | from keras import backend as K
37 | K.set_image_data_format('channels_first')
38 | import cv2
39 | import os
40 | import numpy as np
41 | from numpy import genfromtxt
42 | import pandas as pd
43 | import tensorflow as tf
44 | from fr_utils import *
45 | from inception_blocks_v2 import *
46 |
47 | get_ipython().magic('matplotlib inline')
48 | get_ipython().magic('load_ext autoreload')
49 | get_ipython().magic('autoreload 2')
50 |
51 | np.set_printoptions(threshold=np.nan)
52 |
53 |
54 | # ## 0 - Naive Face Verification
55 | #
56 | # In Face Verification, you're given two images and you have to tell if they are of the same person. The simplest way to do this is to compare the two images pixel-by-pixel. If the distance between the raw images is less than a chosen threshold, it may be the same person!
57 | #
58 | #
59 | # **Figure 1**
60 |
61 | # Of course, this algorithm performs really poorly, since the pixel values change dramatically due to variations in lighting, orientation of the person's face, even minor changes in head position, and so on.
62 | #
63 | # You'll see that rather than using the raw image, you can learn an encoding $f(img)$ so that element-wise comparisons of this encoding give more accurate judgements as to whether two pictures are of the same person.
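# A minimal sketch of the naive pixel-space check described above, assuming the
# two images are already loaded as numpy arrays of identical shape (names and
# threshold are illustrative):

import numpy as np

def naive_same_person(img_a, img_b, threshold=100.0):
    # L2 distance between raw pixel values; a small distance suggests the same person
    dist = np.linalg.norm(img_a.astype("float64") - img_b.astype("float64"))
    return dist < threshold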
64 |
65 | # ## 1 - Encoding face images into a 128-dimensional vector
66 | #
67 | # ### 1.1 - Using a ConvNet to compute encodings
68 | #
69 | # The FaceNet model takes a lot of data and a long time to train. So following common practice in applied deep learning settings, let's just load weights that someone else has already trained. The network architecture follows the Inception model from [Szegedy *et al.*](https://arxiv.org/abs/1409.4842). We have provided an inception network implementation. You can look in the file `inception_blocks.py` to see how it is implemented (do so by going to "File->Open..." at the top of the Jupyter notebook).
70 | #
71 |
72 | # The key things you need to know are:
73 | #
74 | # - This network uses 96x96 dimensional RGB images as its input. Specifically, it takes a face image (or a batch of $m$ face images) as input, represented as a tensor of shape $(m, n_C, n_H, n_W) = (m, 3, 96, 96)$
75 | # - It outputs a matrix of shape $(m, 128)$ that encodes each input face image into a 128-dimensional vector
76 | #
77 | # Run the cell below to create the model for face images.
78 |
79 | # In[6]:
80 |
81 | FRmodel = faceRecoModel(input_shape=(3, 96, 96))
82 |
83 |
84 | # In[7]:
85 |
86 | print("Total Params:", FRmodel.count_params())
87 |
88 |
89 | # **Expected Output**
90 | #
91 | # | **Total Params** | 3743280 |
92 | #
97 | # By using a 128-neuron fully connected layer as its last layer, the model ensures that the output is an encoding vector of size 128. You then use the encodings to compare two face images as follows:
98 | #
99 | #
100 | # **Figure 2**: By computing a distance between two encodings and thresholding, you can determine if the two pictures represent the same person
101 | #
102 | # So, an encoding is a good one if:
103 | # - The encodings of two images of the same person are quite similar to each other
104 | # - The encodings of two images of different persons are very different
105 | #
106 | # The triplet loss function formalizes this, and tries to "push" the encodings of two images of the same person (Anchor and Positive) closer together, while "pulling" the encodings of two images of different persons (Anchor, Negative) further apart.
107 | #
108 | #
109 | #
110 | # **Figure 3**: In the next part, we will call the pictures from left to right: Anchor (A), Positive (P), Negative (N)
111 |
112 | #
113 | #
114 | # ### 1.2 - The Triplet Loss
115 | #
116 | # For an image $x$, we denote its encoding $f(x)$, where $f$ is the function computed by the neural network.
117 | #
118 | #
119 | #
120 | #
123 | #
124 | # Training will use triplets of images $(A, P, N)$:
125 | #
126 | # - A is an "Anchor" image--a picture of a person.
127 | # - P is a "Positive" image--a picture of the same person as the Anchor image.
128 | # - N is a "Negative" image--a picture of a different person than the Anchor image.
129 | #
130 | # These triplets are picked from our training dataset. We will write $(A^{(i)}, P^{(i)}, N^{(i)})$ to denote the $i$-th training example.
131 | #
132 | # You'd like to make sure that an image $A^{(i)}$ of an individual is closer to the Positive $P^{(i)}$ than to the Negative image $N^{(i)}$ by at least a margin $\alpha$:
133 | #
134 | # $$\mid \mid f(A^{(i)}) - f(P^{(i)}) \mid \mid_2^2 + \alpha < \mid \mid f(A^{(i)}) - f(N^{(i)}) \mid \mid_2^2$$
135 | #
136 | # You would thus like to minimize the following "triplet cost":
137 | #
138 | # $$\mathcal{J} = \sum^{N}_{i=1} \large[ \small \underbrace{\mid \mid f(A^{(i)}) - f(P^{(i)}) \mid \mid_2^2}_\text{(1)} - \underbrace{\mid \mid f(A^{(i)}) - f(N^{(i)}) \mid \mid_2^2}_\text{(2)} + \alpha \large ] \small_+ \tag{3}$$
139 | #
140 | # Here, we are using the notation "$[z]_+$" to denote $max(z,0)$.
141 | #
142 | # Notes:
143 | # - The term (1) is the squared distance between the anchor "A" and the positive "P" for a given triplet; you want this to be small.
144 | # - The term (2) is the squared distance between the anchor "A" and the negative "N" for a given triplet, you want this to be relatively large, so it thus makes sense to have a minus sign preceding it.
145 | # - $\alpha$ is called the margin. It is a hyperparameter that you should pick manually. We will use $\alpha = 0.2$.
146 | #
147 | # Most implementations also normalize the encoding vectors to have norm equal one (i.e., $\mid \mid f(img)\mid \mid_2$=1); you won't have to worry about that here.
148 | #
149 | # **Exercise**: Implement the triplet loss as defined by formula (3). Here are the 4 steps:
150 | # 1. Compute the distance between the encodings of "anchor" and "positive": $\mid \mid f(A^{(i)}) - f(P^{(i)}) \mid \mid_2^2$
151 | # 2. Compute the distance between the encodings of "anchor" and "negative": $\mid \mid f(A^{(i)}) - f(N^{(i)}) \mid \mid_2^2$
152 | # 3. Compute the formula per training example: $ \mid \mid f(A^{(i)}) - f(P^{(i)}) \mid \mid_2^2 - \mid \mid f(A^{(i)}) - f(N^{(i)}) \mid \mid_2^2 + \alpha$
153 | # 4. Compute the full formula by taking the max with zero and summing over the training examples:
154 | # $$\mathcal{J} = \sum^{N}_{i=1} \large[ \small \mid \mid f(A^{(i)}) - f(P^{(i)}) \mid \mid_2^2 - \mid \mid f(A^{(i)}) - f(N^{(i)}) \mid \mid_2^2 + \alpha \large ] \small_+ \tag{3}$$
155 | #
156 | # Useful functions: `tf.reduce_sum()`, `tf.square()`, `tf.subtract()`, `tf.add()`, `tf.reduce_mean`, `tf.maximum()`.
157 |
158 | # In[8]:
159 |
160 | # GRADED FUNCTION: triplet_loss
161 |
162 | def triplet_loss(y_true, y_pred, alpha = 0.2):
163 | """
164 | Implementation of the triplet loss as defined by formula (3)
165 |
166 | Arguments:
167 | y_true -- true labels, required when you define a loss in Keras, you don't need it in this function.
168 | y_pred -- python list containing three objects:
169 | anchor -- the encodings for the anchor images, of shape (None, 128)
170 | positive -- the encodings for the positive images, of shape (None, 128)
171 | negative -- the encodings for the negative images, of shape (None, 128)
172 |
173 | Returns:
174 | loss -- real number, value of the loss
175 | """
176 |
177 | anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]
178 |
179 | ### START CODE HERE ### (≈ 4 lines)
180 | # Step 1: Compute the (encoding) distance between the anchor and the positive
181 | pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)))
182 | # Step 2: Compute the (encoding) distance between the anchor and the negative
183 | neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, negative)))
184 | # Step 3: subtract the two previous distances and add alpha.
185 | basic_loss = tf.add(tf.subtract(pos_dist,neg_dist),alpha)
186 | # Step 4: Take the maximum of basic_loss and 0.0. Sum over the training examples.
187 | loss = tf.reduce_sum(tf.maximum(basic_loss,0.0))
188 | ### END CODE HERE ###
189 |
190 | return loss
191 |
192 |
193 | # In[9]:
194 |
195 | with tf.Session() as test:
196 | tf.set_random_seed(1)
197 | y_true = (None, None, None)
198 | y_pred = (tf.random_normal([3, 128], mean=6, stddev=0.1, seed = 1),
199 | tf.random_normal([3, 128], mean=1, stddev=1, seed = 1),
200 | tf.random_normal([3, 128], mean=3, stddev=4, seed = 1))
201 | loss = triplet_loss(y_true, y_pred)
202 |
203 | print("loss = " + str(loss.eval()))
204 |
205 |
206 | # **Expected Output**:
207 | #
208 | # | **loss** | 350.026 |
209 | #
219 |
220 | # ## 2 - Loading the trained model
221 | #
222 | # FaceNet is trained by minimizing the triplet loss. But since training requires a lot of data and a lot of computation, we won't train it from scratch here. Instead, we load a previously trained model. Load a model using the following cell; this might take a couple of minutes to run.
223 |
224 | # In[10]:
225 |
226 | FRmodel.compile(optimizer = 'adam', loss = triplet_loss, metrics = ['accuracy'])
227 | load_weights_from_FaceNet(FRmodel)
228 | # Note: I hit an unresolved issue at this step when running the notebook.
231 |
232 |
233 | # Here are some examples of distances between the encodings of three individuals:
234 | #
235 | #
236 | #
237 | # **Figure 4**: Example of distance outputs between three individuals' encodings
238 | #
239 | # Let's now use this model to perform face verification and face recognition!
240 |
241 | # ## 3 - Applying the model
242 |
243 | # Back to the Happy House! Residents are living blissfully since you implemented happiness recognition for the house in an earlier assignment.
244 | #
245 | # However, several issues keep coming up: The Happy House became so happy that every happy person in the neighborhood is coming to hang out in your living room. It is getting really crowded, which is having a negative impact on the residents of the house. All these random happy people are also eating all your food.
246 | #
247 | # So, you decide to change the door entry policy, and not just let random happy people enter anymore, even if they are happy! Instead, you'd like to build a **Face verification** system so as to only let people from a specified list come in. To get admitted, each person has to swipe an ID card (identification card) to identify themselves at the door. The face recognition system then checks that they are who they claim to be.
248 |
249 | # ### 3.1 - Face Verification
250 | #
251 | # Let's build a database containing one encoding vector for each person allowed to enter the happy house. To generate the encoding we use `img_to_encoding(image_path, model)` which basically runs the forward propagation of the model on the specified image.
252 | #
253 | # Run the following code to build the database (represented as a python dictionary). This database maps each person's name to a 128-dimensional encoding of their face.
254 |
255 | # In[11]:
256 |
257 | database = {}
258 | database["danielle"] = img_to_encoding("images/danielle.png", FRmodel)
259 | database["younes"] = img_to_encoding("images/younes.jpg", FRmodel)
260 | database["tian"] = img_to_encoding("images/tian.jpg", FRmodel)
261 | database["andrew"] = img_to_encoding("images/andrew.jpg", FRmodel)
262 | database["kian"] = img_to_encoding("images/kian.jpg", FRmodel)
263 | database["dan"] = img_to_encoding("images/dan.jpg", FRmodel)
264 | database["sebastiano"] = img_to_encoding("images/sebastiano.jpg", FRmodel)
265 | database["bertrand"] = img_to_encoding("images/bertrand.jpg", FRmodel)
266 | database["kevin"] = img_to_encoding("images/kevin.jpg", FRmodel)
267 | database["felix"] = img_to_encoding("images/felix.jpg", FRmodel)
268 | database["benoit"] = img_to_encoding("images/benoit.jpg", FRmodel)
269 | database["arnaud"] = img_to_encoding("images/arnaud.jpg", FRmodel)
270 |
271 |
272 | # Now, when someone shows up at your front door and swipes their ID card (thus giving you their name), you can look up their encoding in the database, and use it to check if the person standing at the front door matches the name on the ID.
273 | #
274 | # **Exercise**: Implement the verify() function which checks if the front-door camera picture (`image_path`) is actually the person called "identity". You will have to go through the following steps:
275 | # 1. Compute the encoding of the image from image_path
276 | # 2. Compute the distance between this encoding and the encoding of the identity image stored in the database
277 | # 3. Open the door if the distance is less than 0.7, else do not open.
278 | #
279 | # As presented above, you should use the L2 distance (np.linalg.norm). (Note: In this implementation, compare the L2 distance, not the square of the L2 distance, to the threshold 0.7.)
280 |
281 | # In[33]:
282 |
283 | # GRADED FUNCTION: verify
284 |
285 | def verify(image_path, identity, database, model):
286 | """
287 | Function that verifies if the person on the "image_path" image is "identity".
288 |
289 | Arguments:
290 | image_path -- path to an image
291 | identity -- string, name of the person you'd like to verify the identity. Has to be a resident of the Happy house.
292 | database -- python dictionary mapping names of allowed people's names (strings) to their encodings (vectors).
293 | model -- your Inception model instance in Keras
294 |
295 | Returns:
296 | dist -- distance between the image_path and the image of "identity" in the database.
297 | door_open -- True, if the door should open. False otherwise.
298 | """
299 |
300 | ### START CODE HERE ###
301 |
302 | # Step 1: Compute the encoding for the image. Use img_to_encoding() see example above. (≈ 1 line)
303 | encoding = img_to_encoding(image_path,model)
304 |
305 | # Step 2: Compute distance with identity's image (≈ 1 line)
306 | dist = np.linalg.norm((encoding-database[identity]))
307 | # Note: I hit an unresolved issue at this step when running the notebook.
310 |
311 | # Step 3: Open the door if dist < 0.7, else don't open (≈ 3 lines)
312 | if dist < 0.7:
313 | print("It's " + str(identity) + ", welcome home!")
314 | door_open = True
315 | else:
316 | print("It's not " + str(identity) + ", please go away")
317 | door_open = False
318 |
319 | ### END CODE HERE ###
320 |
321 | return dist, door_open
322 |
323 |
324 | # Younes is trying to enter the Happy House and the camera takes a picture of him ("images/camera_0.jpg"). Let's run your verification algorithm on this picture:
325 | #
326 | #
327 |
328 | # In[34]:
329 |
330 | verify("images/camera_0.jpg", "younes", database, FRmodel)
331 |
332 |
333 | # **Expected Output**:
334 | #
335 | # | **It's younes, welcome home!** | (0.65939283, True) |
336 | #
346 |
347 | # Benoit, who broke the aquarium last weekend, has been banned from the house and removed from the database. He stole Kian's ID card and came back to the house to try to present himself as Kian. The front-door camera took a picture of Benoit ("images/camera_2.jpg"). Let's run the verification algorithm to check if Benoit can enter.
348 | #
349 |
350 | # In[35]:
351 |
352 | verify("images/camera_2.jpg", "kian", database, FRmodel)
353 |
354 |
355 | # **Expected Output**:
356 | #
357 | # | **It's not kian, please go away** | (0.86224014, False) |
358 | #
368 |
369 | # ### 3.2 - Face Recognition
370 | #
371 | # Your face verification system is mostly working well. But since Kian got his ID card stolen, when he came back to the house that evening he couldn't get in!
372 | #
373 | # To reduce such shenanigans, you'd like to change your face verification system to a face recognition system. This way, no one has to carry an ID card anymore. An authorized person can just walk up to the house, and the front door will unlock for them!
374 | #
375 | # You'll implement a face recognition system that takes as input an image, and figures out if it is one of the authorized persons (and if so, who). Unlike the previous face verification system, we will no longer get a person's name as another input.
376 | #
377 | # **Exercise**: Implement `who_is_it()`. You will have to go through the following steps:
378 | # 1. Compute the target encoding of the image from image_path
379 | # 2. Find the encoding from the database that has smallest distance with the target encoding.
380 | # - Initialize the `min_dist` variable to a large enough number (100). It will help you keep track of what is the closest encoding to the input's encoding.
381 | # - Loop over the database dictionary's names and encodings. To loop use `for (name, db_enc) in database.items()`.
382 | # - Compute L2 distance between the target "encoding" and the current "encoding" from the database.
383 | # - If this distance is less than the min_dist, then set min_dist to dist, and identity to name.
384 |
385 | # In[36]:
386 |
387 | # GRADED FUNCTION: who_is_it
388 |
389 | def who_is_it(image_path, database, model):
390 | """
391 | Implements face recognition for the happy house by finding who is the person on the image_path image.
392 |
393 | Arguments:
394 | image_path -- path to an image
395 | database -- database containing image encodings along with the name of the person on the image
396 | model -- your Inception model instance in Keras
397 |
398 | Returns:
399 | min_dist -- the minimum distance between image_path encoding and the encodings from the database
400 | identity -- string, the name prediction for the person on image_path
401 | """
402 |
403 | ### START CODE HERE ###
404 |
405 | ## Step 1: Compute the target "encoding" for the image. Use img_to_encoding() see example above. ## (≈ 1 line)
406 | encoding = img_to_encoding(image_path,model)
407 |
408 | ## Step 2: Find the closest encoding ##
409 |
410 | # Initialize "min_dist" to a large value, say 100 (≈1 line)
411 | min_dist = 100
412 |
413 | # Loop over the database dictionary's names and encodings.
414 | for (name, db_enc) in database.items():
415 |
416 | # Compute L2 distance between the target "encoding" and the current "emb" from the database. (≈ 1 line)
417 | dist = np.linalg.norm((encoding-db_enc))
418 | # Note: same unresolved issue as above.
421 |
422 | # If this distance is less than the min_dist, then set min_dist to dist, and identity to name. (≈ 3 lines)
423 | if dist < min_dist:
424 | min_dist = dist
425 | identity = name
426 |
427 | ### END CODE HERE ###
428 |
429 | if min_dist > 0.7:
430 | print("Not in the database.")
431 | else:
432 | print ("it's " + str(identity) + ", the distance is " + str(min_dist))
433 |
434 | return min_dist, identity
435 |
436 |
437 | # Younes is at the front-door and the camera takes a picture of him ("images/camera_0.jpg"). Let's see if your who_it_is() algorithm identifies Younes.
438 |
439 | # In[37]:
440 |
441 | who_is_it("images/camera_0.jpg", database, FRmodel)
442 |
443 |
444 | # **Expected Output**:
445 | #
446 | # | **it's younes, the distance is 0.659393** | (0.65939283, 'younes') |
447 | #
457 |
458 | # You can change "`camera_0.jpg`" (picture of younes) to "`camera_1.jpg`" (picture of bertrand) and see the result.
459 |
460 | # Your Happy House is running well. It only lets in authorized persons, and people don't need to carry an ID card around anymore!
461 | #
462 | # You've now seen how a state-of-the-art face recognition system works.
463 | #
464 | # Although we won't implement it here, here're some ways to further improve the algorithm:
465 | # - Put more images of each person (under different lighting conditions, taken on different days, etc.) into the database. Then given a new image, compare the new face to multiple pictures of the person. This would increase accuracy (see the sketch after this list).
466 | # - Crop the images to just contain the face, and less of the "border" region around the face. This preprocessing removes some of the irrelevant pixels around the face, and also makes the algorithm more robust.
467 | #
468 |
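# A minimal sketch of the first improvement above: store several encodings per
# person and compare a new encoding against the closest one (names below are
# illustrative, not part of the assignment):

import numpy as np

def min_distance_to_person(encoding, person_encodings):
    # person_encodings: list of 128-dimensional encodings of the same person
    return min(np.linalg.norm(encoding - db_enc) for db_enc in person_encodings)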
469 | #
470 | # **What you should remember**:
471 | # - Face verification solves an easier 1:1 matching problem; face recognition addresses a harder 1:K matching problem.
472 | # - The triplet loss is an effective loss function for training a neural network to learn an encoding of a face image.
473 | # - The same encoding can be used for verification and recognition. Measuring distances between two images' encodings allows you to determine whether they are pictures of the same person.
474 |
475 | # Congrats on finishing this assignment!
476 | #
477 |
478 | # ### References:
479 | #
480 | # - Florian Schroff, Dmitry Kalenichenko, James Philbin (2015). [FaceNet: A Unified Embedding for Face Recognition and Clustering](https://arxiv.org/pdf/1503.03832.pdf)
481 | # - Yaniv Taigman, Ming Yang, Marc'Aurelio Ranzato, Lior Wolf (2014). [DeepFace: Closing the gap to human-level performance in face verification](https://research.fb.com/wp-content/uploads/2016/11/deepface-closing-the-gap-to-human-level-performance-in-face-verification.pdf)
482 | # - The pretrained model we use is inspired by Victor Sy Wang's implementation and was loaded using his code: https://github.com/iwantooxxoox/Keras-OpenFace.
483 | # - Our implementation also took a lot of inspiration from the official FaceNet github repository: https://github.com/davidsandberg/facenet
484 | #
485 |
--------------------------------------------------------------------------------
/py/Gradient Checking.py:
--------------------------------------------------------------------------------
1 |
2 | # coding: utf-8
3 |
4 | # # Gradient Checking
5 | #
6 | # Welcome to the final assignment for this week! In this assignment you will learn to implement and use gradient checking.
7 | #
8 | # You are part of a team working to make mobile payments available globally, and are asked to build a deep learning model to detect fraud--whenever someone makes a payment, you want to see if the payment might be fraudulent, such as if the user's account has been taken over by a hacker.
9 | #
10 | # But backpropagation is quite challenging to implement, and sometimes has bugs. Because this is a mission-critical application, your company's CEO wants to be really certain that your implementation of backpropagation is correct. Your CEO says, "Give me a proof that your backpropagation is actually working!" To give this reassurance, you are going to use "gradient checking".
11 | #
12 | # Let's do it!
13 |
14 | # In[1]:
15 |
16 | # Packages
17 | import numpy as np
18 | from testCases import *
19 | from gc_utils import sigmoid, relu, dictionary_to_vector, vector_to_dictionary, gradients_to_vector
20 |
21 |
22 | # ## 1) How does gradient checking work?
23 | #
24 | # Backpropagation computes the gradients $\frac{\partial J}{\partial \theta}$, where $\theta$ denotes the parameters of the model. $J$ is computed using forward propagation and your loss function.
25 | #
26 | # Because forward propagation is relatively easy to implement, you're confident you got that right, and so you're almost 100% sure that you're computing the cost $J$ correctly. Thus, you can use your code for computing $J$ to verify the code for computing $\frac{\partial J}{\partial \theta}$.
27 | #
28 | # Let's look back at the definition of a derivative (or gradient):
29 | # $$ \frac{\partial J}{\partial \theta} = \lim_{\varepsilon \to 0} \frac{J(\theta + \varepsilon) - J(\theta - \varepsilon)}{2 \varepsilon} \tag{1}$$
30 | #
31 | # If you're not familiar with the "$\displaystyle \lim_{\varepsilon \to 0}$" notation, it's just a way of saying "when $\varepsilon$ is really really small."
32 | #
33 | # We know the following:
34 | #
35 | # - $\frac{\partial J}{\partial \theta}$ is what you want to make sure you're computing correctly.
36 | # - You can compute $J(\theta + \varepsilon)$ and $J(\theta - \varepsilon)$ (in the case that $\theta$ is a real number), since you're confident your implementation for $J$ is correct.
37 | #
38 | # Let's use equation (1) and a small value for $\varepsilon$ to convince your CEO that your code for computing $\frac{\partial J}{\partial \theta}$ is correct!
39 |
40 | # ## 2) 1-dimensional gradient checking
41 | #
42 | # Consider a 1D linear function $J(\theta) = \theta x$. The model contains only a single real-valued parameter $\theta$, and takes $x$ as input.
43 | #
44 | # You will implement code to compute $J(.)$ and its derivative $\frac{\partial J}{\partial \theta}$. You will then use gradient checking to make sure your derivative computation for $J$ is correct.
45 | #
46 | #
47 | # **Figure 1** : **1D linear model**
48 | #
49 | # The diagram above shows the key computation steps: First start with $x$, then evaluate the function $J(x)$ ("forward propagation"). Then compute the derivative $\frac{\partial J}{\partial \theta}$ ("backward propagation").
50 | #
51 | # **Exercise**: implement "forward propagation" and "backward propagation" for this simple function. I.e., compute both $J(.)$ ("forward propagation") and its derivative with respect to $\theta$ ("backward propagation"), in two separate functions.
52 |
53 | # In[2]:
54 |
55 | # GRADED FUNCTION: forward_propagation
56 |
57 | def forward_propagation(x, theta):
58 | """
59 | Implement the linear forward propagation (compute J) presented in Figure 1 (J(theta) = theta * x)
60 |
61 | Arguments:
62 | x -- a real-valued input
63 | theta -- our parameter, a real number as well
64 |
65 | Returns:
66 | J -- the value of function J, computed using the formula J(theta) = theta * x
67 | """
68 |
69 | ### START CODE HERE ### (approx. 1 line)
70 | J = theta * x
71 | ### END CODE HERE ###
72 |
73 | return J
74 |
75 |
76 | # In[3]:
77 |
78 | x, theta = 2, 4
79 | J = forward_propagation(x, theta)
80 | print ("J = " + str(J))
81 |
82 |
83 | # **Expected Output**:
84 | #
85 | # | **J** | 8 |
86 | #
92 | # **Exercise**: Now, implement the backward propagation step (derivative computation) of Figure 1. That is, compute the derivative of $J(\theta) = \theta x$ with respect to $\theta$. To save you from doing the calculus, you should get $dtheta = \frac { \partial J }{ \partial \theta} = x$.
93 |
94 | # In[4]:
95 |
96 | # GRADED FUNCTION: backward_propagation
97 |
98 | def backward_propagation(x, theta):
99 | """
100 | Computes the derivative of J with respect to theta (see Figure 1).
101 |
102 | Arguments:
103 | x -- a real-valued input
104 | theta -- our parameter, a real number as well
105 |
106 | Returns:
107 | dtheta -- the gradient of the cost with respect to theta
108 | """
109 |
110 | ### START CODE HERE ### (approx. 1 line)
111 | dtheta = x
112 | ### END CODE HERE ###
113 |
114 | return dtheta
115 |
116 |
117 | # In[5]:
118 |
119 | x, theta = 2, 4
120 | dtheta = backward_propagation(x, theta)
121 | print ("dtheta = " + str(dtheta))
122 |
123 |
124 | # **Expected Output**:
125 | #
126 | # | **dtheta** | 2 |
127 | #
133 | # **Exercise**: To show that the `backward_propagation()` function is correctly computing the gradient $\frac{\partial J}{\partial \theta}$, let's implement gradient checking.
134 | #
135 | # **Instructions**:
136 | # - First compute "gradapprox" using the formula above (1) and a small value of $\varepsilon$. Here are the Steps to follow:
137 | # 1. $\theta^{+} = \theta + \varepsilon$
138 | # 2. $\theta^{-} = \theta - \varepsilon$
139 | # 3. $J^{+} = J(\theta^{+})$
140 | # 4. $J^{-} = J(\theta^{-})$
141 | # 5. $gradapprox = \frac{J^{+} - J^{-}}{2 \varepsilon}$
142 | # - Then compute the gradient using backward propagation, and store the result in a variable "grad"
143 | # - Finally, compute the relative difference between "gradapprox" and the "grad" using the following formula:
144 | # $$ difference = \frac {\mid\mid grad - gradapprox \mid\mid_2}{\mid\mid grad \mid\mid_2 + \mid\mid gradapprox \mid\mid_2} \tag{2}$$
145 | # You will need 3 Steps to compute this formula:
146 | # - 1'. compute the numerator using np.linalg.norm(...)
147 | # - 2'. compute the denominator. You will need to call np.linalg.norm(...) twice.
148 | # - 3'. divide them.
149 | # - If this difference is small (say less than $10^{-7}$), you can be quite confident that you have computed your gradient correctly. Otherwise, there may be a mistake in the gradient computation.
150 | #
151 |
152 | # In[6]:
153 |
154 | # GRADED FUNCTION: gradient_check
155 |
156 | def gradient_check(x, theta, epsilon = 1e-7):
157 | """
158 | Implement the 1D gradient checking presented in Figure 1.
159 |
160 | Arguments:
161 | x -- a real-valued input
162 | theta -- our parameter, a real number as well
163 | epsilon -- tiny shift to the input to compute approximated gradient with formula(1)
164 |
165 | Returns:
166 | difference -- difference (2) between the approximated gradient and the backward propagation gradient
167 | """
168 |
169 | # Compute gradapprox using the two-sided difference quotient in formula (1). epsilon is small enough, you don't need to worry about the limit.
170 | ### START CODE HERE ### (approx. 5 lines)
171 | thetaplus = theta + epsilon # Step 1
172 | thetaminus = theta - epsilon # Step 2
173 | J_plus = forward_propagation(x, thetaplus) # Step 3
174 | J_minus = forward_propagation(x, thetaminus) # Step 4
175 | gradapprox = (J_plus - J_minus) / (2 * epsilon) # Step 5
176 | ### END CODE HERE ###
177 |
178 | # Check if gradapprox is close enough to the output of backward_propagation()
179 | ### START CODE HERE ### (approx. 1 line)
180 | grad = backward_propagation(x,theta)
181 | ### END CODE HERE ###
182 |
183 | ### START CODE HERE ### (approx. 1 line)
184 | numerator = np.linalg.norm(grad - gradapprox) # Step 1'
185 | denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox) # Step 2'
186 | difference = numerator / denominator # Step 3'
187 | ### END CODE HERE ###
188 |
189 | if difference < 1e-7:
190 | print ("The gradient is correct!")
191 | else:
192 | print ("The gradient is wrong!")
193 |
194 | return difference
195 |
196 |
197 | # In[7]:
198 |
199 | x, theta = 2, 4
200 | difference = gradient_check(x, theta)
201 | print("difference = " + str(difference))
202 |
203 |
204 | # **Expected Output**:
205 | # The gradient is correct!
206 | #
207 | # | **difference** | 2.9193358103083e-10 |
208 | #
213 | # Congrats, the difference is smaller than the $10^{-7}$ threshold. So you can have high confidence that you've correctly computed the gradient in `backward_propagation()`.
214 | #
215 | # Now, in the more general case, your cost function $J$ has more than a single 1D input. When you are training a neural network, $\theta$ actually consists of multiple matrices $W^{[l]}$ and biases $b^{[l]}$! It is important to know how to do a gradient check with higher-dimensional inputs. Let's do it!
216 |
217 | # ## 3) N-dimensional gradient checking
218 |
219 | # The following figure describes the forward and backward propagation of your fraud detection model.
220 | #
221 | #
222 | # **Figure 2** : **deep neural network** *LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID*
223 | #
224 | # Let's look at your implementations for forward propagation and backward propagation.
225 |
226 | # In[8]:
227 |
228 | def forward_propagation_n(X, Y, parameters):
229 | """
230 | Implements the forward propagation (and computes the cost) presented in Figure 3.
231 |
232 | Arguments:
233 | X -- training set for m examples
234 | Y -- labels for m examples
235 | parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2", "W3", "b3":
236 | W1 -- weight matrix of shape (5, 4)
237 | b1 -- bias vector of shape (5, 1)
238 | W2 -- weight matrix of shape (3, 5)
239 | b2 -- bias vector of shape (3, 1)
240 | W3 -- weight matrix of shape (1, 3)
241 | b3 -- bias vector of shape (1, 1)
242 |
243 | Returns:
244 | cost -- the cost function (logistic cost for one example)
245 | """
246 |
247 | # retrieve parameters
248 | m = X.shape[1]
249 | W1 = parameters["W1"]
250 | b1 = parameters["b1"]
251 | W2 = parameters["W2"]
252 | b2 = parameters["b2"]
253 | W3 = parameters["W3"]
254 | b3 = parameters["b3"]
255 |
256 | # LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID
257 | Z1 = np.dot(W1, X) + b1
258 | A1 = relu(Z1)
259 | Z2 = np.dot(W2, A1) + b2
260 | A2 = relu(Z2)
261 | Z3 = np.dot(W3, A2) + b3
262 | A3 = sigmoid(Z3)
263 |
264 | # Cost
265 | logprobs = np.multiply(-np.log(A3),Y) + np.multiply(-np.log(1 - A3), 1 - Y)
266 | cost = 1./m * np.sum(logprobs)
267 |
268 | cache = (Z1, A1, W1, b1, Z2, A2, W2, b2, Z3, A3, W3, b3)
269 |
270 | return cost, cache
271 |
272 |
273 | # Now, run backward propagation.
274 |
275 | # In[16]:
276 |
277 | def backward_propagation_n(X, Y, cache):
278 | """
279 | Implement the backward propagation presented in figure 2.
280 |
281 | Arguments:
282 | X -- input datapoint, of shape (input size, 1)
283 | Y -- true "label"
284 | cache -- cache output from forward_propagation_n()
285 |
286 | Returns:
287 | gradients -- A dictionary with the gradients of the cost with respect to each parameter, activation and pre-activation variables.
288 | """
289 |
290 | m = X.shape[1]
291 | (Z1, A1, W1, b1, Z2, A2, W2, b2, Z3, A3, W3, b3) = cache
292 |
293 | dZ3 = A3 - Y
294 | dW3 = 1./m * np.dot(dZ3, A2.T)
295 | db3 = 1./m * np.sum(dZ3, axis=1, keepdims = True)
296 |
297 | dA2 = np.dot(W3.T, dZ3)
298 | dZ2 = np.multiply(dA2, np.int64(A2 > 0))
299 | dW2 = 1./m * np.dot(dZ2, A1.T)
300 | db2 = 1./m * np.sum(dZ2, axis=1, keepdims = True)
301 |
302 | dA1 = np.dot(W2.T, dZ2)
303 | dZ1 = np.multiply(dA1, np.int64(A1 > 0))
304 | dW1 = 1./m * np.dot(dZ1, X.T)
305 | db1 = 1./m * np.sum(dZ1, axis=1, keepdims = True)
306 |
307 | gradients = {"dZ3": dZ3, "dW3": dW3, "db3": db3,
308 | "dA2": dA2, "dZ2": dZ2, "dW2": dW2, "db2": db2,
309 | "dA1": dA1, "dZ1": dZ1, "dW1": dW1, "db1": db1}
310 |
311 | return gradients
312 |
313 |
314 | # You obtained some results on the fraud detection test set but you are not 100% sure of your model. Nobody's perfect! Let's implement gradient checking to verify if your gradients are correct.
315 |
316 | # **How does gradient checking work?**.
317 | #
318 | # As in 1) and 2), you want to compare "gradapprox" to the gradient computed by backpropagation. The formula is still:
319 | #
320 | # $$ \frac{\partial J}{\partial \theta} = \lim_{\varepsilon \to 0} \frac{J(\theta + \varepsilon) - J(\theta - \varepsilon)}{2 \varepsilon} \tag{1}$$
321 | #
322 | # However, $\theta$ is not a scalar anymore. It is a dictionary called "parameters". We implemented a function "`dictionary_to_vector()`" for you. It converts the "parameters" dictionary into a vector called "values", obtained by reshaping all parameters (W1, b1, W2, b2, W3, b3) into vectors and concatenating them.
323 | #
324 | # The inverse function is "`vector_to_dictionary`" which outputs back the "parameters" dictionary.
325 | #
326 | #
327 | # **Figure 2** : **dictionary_to_vector() and vector_to_dictionary()** You will need these functions in gradient_check_n()
328 | #
329 | # We have also converted the "gradients" dictionary into a vector "grad" using gradients_to_vector(). You don't need to worry about that.
330 | #
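# A minimal sketch of what dictionary_to_vector() could look like (the actual
# helper lives in gc_utils and may differ in detail):

import numpy as np

def dictionary_to_vector_sketch(parameters):
    keys = ["W1", "b1", "W2", "b2", "W3", "b3"]
    columns = [parameters[k].reshape(-1, 1) for k in keys]   # flatten each array into a column
    theta = np.concatenate(columns, axis=0)                  # one long column vector of all parameters
    return theta, keys                                       # the real helper also returns a second value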
331 | # **Exercise**: Implement gradient_check_n().
332 | #
333 | # **Instructions**: Here is pseudo-code that will help you implement the gradient check.
334 | #
335 | # For each i in num_parameters:
336 | # - To compute `J_plus[i]`:
337 | # 1. Set $\theta^{+}$ to `np.copy(parameters_values)`
338 | # 2. Set $\theta^{+}_i$ to $\theta^{+}_i + \varepsilon$
339 | # 3. Calculate $J^{+}_i$ using `forward_propagation_n(x, y, vector_to_dictionary(`$\theta^{+}$ `))`.
340 | # - To compute `J_minus[i]`: do the same thing with $\theta^{-}$
341 | # - Compute $gradapprox[i] = \frac{J^{+}_i - J^{-}_i}{2 \varepsilon}$
342 | #
343 | # Thus, you get a vector gradapprox, where gradapprox[i] is an approximation of the gradient with respect to `parameter_values[i]`. You can now compare this gradapprox vector to the gradients vector from backpropagation. Just like for the 1D case (Steps 1', 2', 3'), compute:
344 | # $$ difference = \frac {\| grad - gradapprox \|_2}{\| grad \|_2 + \| gradapprox \|_2 } \tag{3}$$
345 |
346 | # In[17]:
347 |
348 | # GRADED FUNCTION: gradient_check_n
349 |
350 | def gradient_check_n(parameters, gradients, X, Y, epsilon = 1e-7):
351 | """
352 | Checks if backward_propagation_n computes correctly the gradient of the cost output by forward_propagation_n
353 |
354 | Arguments:
355 | parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2", "W3", "b3":
356 | gradients -- output of backward_propagation_n, contains gradients of the cost with respect to the parameters.
357 | X -- input datapoint, of shape (input size, 1)
358 | Y -- true "label"
359 | epsilon -- tiny shift to the input to compute approximated gradient with formula(1)
360 |
361 | Returns:
362 | difference -- difference (2) between the approximated gradient and the backward propagation gradient
363 | """
364 |
365 | # Set-up variables
366 | parameters_values, _ = dictionary_to_vector(parameters)
367 | grad = gradients_to_vector(gradients)
368 | num_parameters = parameters_values.shape[0]
369 | J_plus = np.zeros((num_parameters, 1))
370 | J_minus = np.zeros((num_parameters, 1))
371 | gradapprox = np.zeros((num_parameters, 1))
372 |
373 | # Compute gradapprox
374 | for i in range(num_parameters):
375 |
376 | # Compute J_plus[i]. Inputs: "parameters_values, epsilon". Output = "J_plus[i]".
377 | # "_" is used because the function you have to outputs two parameters but we only care about the first one
378 | ### START CODE HERE ### (approx. 3 lines)
379 | thetaplus = np.copy(parameters_values) # Step 1
380 | thetaplus[i][0] = thetaplus[i][0] + epsilon # Step 2
381 | J_plus[i], _ = forward_propagation_n(X, Y, vector_to_dictionary(thetaplus)) # Step 3
382 | ### END CODE HERE ###
383 |
384 | # Compute J_minus[i]. Inputs: "parameters_values, epsilon". Output = "J_minus[i]".
385 | ### START CODE HERE ### (approx. 3 lines)
386 | thetaminus = np.copy(parameters_values) # Step 1
387 | thetaminus[i][0] = thetaminus[i][0] - epsilon # Step 2
388 | J_minus[i], _ = forward_propagation_n(X, Y, vector_to_dictionary(thetaminus)) # Step 3
389 | ### END CODE HERE ###
390 |
391 | # Compute gradapprox[i]
392 | ### START CODE HERE ### (approx. 1 line)
393 | gradapprox[i] = (J_plus[i] - J_minus[i]) / (2 * epsilon)
394 | ### END CODE HERE ###
395 |
396 | # Compare gradapprox to backward propagation gradients by computing difference.
397 | ### START CODE HERE ### (approx. 1 line)
398 | numerator = np.linalg.norm(grad - gradapprox) # Step 1'
399 | denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox) # Step 2'
400 | difference = numerator / denominator # Step 3'
401 | ### END CODE HERE ###
402 |
403 | if difference > 1e-7:
404 | print ("\033[93m" + "There is a mistake in the backward propagation! difference = " + str(difference) + "\033[0m")
405 | else:
406 | print ("\033[92m" + "Your backward propagation works perfectly fine! difference = " + str(difference) + "\033[0m")
407 |
408 | return difference
409 |
410 |
411 | # In[18]:
412 |
413 | X, Y, parameters = gradient_check_n_test_case()
414 |
415 | cost, cache = forward_propagation_n(X, Y, parameters)
416 | gradients = backward_propagation_n(X, Y, cache)
417 | difference = gradient_check_n(parameters, gradients, X, Y)
418 |
419 |
420 | # **Expected output**:
421 | #
422 | # | **There is a mistake in the backward propagation!** | difference = 0.285093156781 |
423 | #
428 |
429 | # It seems that there were errors in the `backward_propagation_n` code we gave you! Good that you've implemented the gradient check. Go back to `backward_propagation_n` and try to find/correct the errors *(Hint: check dW2 and db1)*. Rerun the gradient check when you think you've fixed it. Remember you'll need to re-execute the cell defining `backward_propagation_n()` if you modify the code.
430 | #
431 | # Can you get gradient check to declare your derivative computation correct? Even though this part of the assignment isn't graded, we strongly urge you to try to find the bug and re-run gradient check until you're convinced backprop is now correctly implemented.
432 | #
433 | # **Note**
434 | # - Gradient Checking is slow! Approximating the gradient with $\frac{\partial J}{\partial \theta} \approx \frac{J(\theta + \varepsilon) - J(\theta - \varepsilon)}{2 \varepsilon}$ is computationally costly. For this reason, we don't run gradient checking at every iteration during training. Just a few times to check if the gradient is correct.
435 | # - Gradient Checking, at least as we've presented it, doesn't work with dropout. You would usually run the gradient check algorithm without dropout to make sure your backprop is correct, then add dropout.
436 | #
437 | # Congrats, you can be confident that your deep learning model for fraud detection is working correctly! You can even use this to convince your CEO. :)
438 | #
439 | #
440 | # **What you should remember from this notebook**:
441 | # - Gradient checking verifies closeness between the gradients from backpropagation and the numerical approximation of the gradient (computed using forward propagation).
442 | # - Gradient checking is slow, so we don't run it in every iteration of training. You would usually run it only to make sure your code is correct, then turn it off and use backprop for the actual learning process.
443 |
444 | # In[ ]:
445 |
446 |
447 |
448 |
--------------------------------------------------------------------------------
/py/Initialization.py:
--------------------------------------------------------------------------------
1 |
2 | # coding: utf-8
3 |
4 | # # Initialization
5 | #
6 | # Welcome to the first assignment of "Improving Deep Neural Networks".
7 | #
8 | # Training your neural network requires specifying an initial value of the weights. A well chosen initialization method will help learning.
9 | #
10 | # If you completed the previous course of this specialization, you probably followed our instructions for weight initialization, and it has worked out so far. But how do you choose the initialization for a new neural network? In this notebook, you will see how different initializations lead to different results.
11 | #
12 | # A well chosen initialization can:
13 | # - Speed up the convergence of gradient descent
14 | # - Increase the odds of gradient descent converging to a lower training (and generalization) error
15 | #
16 | # To get started, run the following cell to load the packages and the planar dataset you will try to classify.
17 |
18 | # In[1]:
19 |
20 | import numpy as np
21 | import matplotlib.pyplot as plt
22 | import sklearn
23 | import sklearn.datasets
24 | from init_utils import sigmoid, relu, compute_loss, forward_propagation, backward_propagation
25 | from init_utils import update_parameters, predict, load_dataset, plot_decision_boundary, predict_dec
26 |
27 | get_ipython().magic('matplotlib inline')
28 | plt.rcParams['figure.figsize'] = (7.0, 4.0) # set default size of plots
29 | plt.rcParams['image.interpolation'] = 'nearest'
30 | plt.rcParams['image.cmap'] = 'gray'
31 |
32 | # load image dataset: blue/red dots in circles
33 | train_X, train_Y, test_X, test_Y = load_dataset()
34 |
35 |
36 | # You would like a classifier to separate the blue dots from the red dots.
37 |
38 | # ## 1 - Neural Network model
39 |
40 | # You will use a 3-layer neural network (already implemented for you). Here are the initialization methods you will experiment with:
41 | # - *Zeros initialization* -- setting `initialization = "zeros"` in the input argument.
42 | # - *Random initialization* -- setting `initialization = "random"` in the input argument. This initializes the weights to large random values.
43 | # - *He initialization* -- setting `initialization = "he"` in the input argument. This initializes the weights to random values scaled according to a paper by He et al., 2015.
44 | #
45 | # **Instructions**: Please quickly read over the code below, and run it. In the next part you will implement the three initialization methods that this `model()` calls.
46 |
47 | # In[2]:
48 |
49 | def model(X, Y, learning_rate = 0.01, num_iterations = 15000, print_cost = True, initialization = "he"):
50 | """
51 | Implements a three-layer neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SIGMOID.
52 |
53 | Arguments:
54 | X -- input data, of shape (2, number of examples)
55 | Y -- true "label" vector (containing 0 for red dots; 1 for blue dots), of shape (1, number of examples)
56 | learning_rate -- learning rate for gradient descent
57 | num_iterations -- number of iterations to run gradient descent
58 | print_cost -- if True, print the cost every 1000 iterations
59 | initialization -- flag to choose which initialization to use ("zeros","random" or "he")
60 |
61 | Returns:
62 | parameters -- parameters learnt by the model
63 | """
64 |
65 | grads = {}
66 | costs = [] # to keep track of the loss
67 | m = X.shape[1] # number of examples
68 | layers_dims = [X.shape[0], 10, 5, 1]
69 |
70 | # Initialize parameters dictionary.
71 | if initialization == "zeros":
72 | parameters = initialize_parameters_zeros(layers_dims)
73 | elif initialization == "random":
74 | parameters = initialize_parameters_random(layers_dims)
75 | elif initialization == "he":
76 | parameters = initialize_parameters_he(layers_dims)
77 |
78 | # Loop (gradient descent)
79 |
80 | for i in range(0, num_iterations):
81 |
82 | # Forward propagation: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID.
83 | a3, cache = forward_propagation(X, parameters)
84 |
85 | # Loss
86 | cost = compute_loss(a3, Y)
87 |
88 | # Backward propagation.
89 | grads = backward_propagation(X, Y, cache)
90 |
91 | # Update parameters.
92 | parameters = update_parameters(parameters, grads, learning_rate)
93 |
94 | # Print the loss every 1000 iterations
95 | if print_cost and i % 1000 == 0:
96 | print("Cost after iteration {}: {}".format(i, cost))
97 | costs.append(cost)
98 |
99 | # plot the loss
100 | plt.plot(costs)
101 | plt.ylabel('cost')
102 | plt.xlabel('iterations (per hundreds)')
103 | plt.title("Learning rate =" + str(learning_rate))
104 | plt.show()
105 |
106 | return parameters
107 |
108 |
109 | # ## 2 - Zero initialization
110 | #
111 | # There are two types of parameters to initialize in a neural network:
112 | # - the weight matrices $(W^{[1]}, W^{[2]}, W^{[3]}, ..., W^{[L-1]}, W^{[L]})$
113 | # - the bias vectors $(b^{[1]}, b^{[2]}, b^{[3]}, ..., b^{[L-1]}, b^{[L]})$
114 | #
115 | # **Exercise**: Implement the following function to initialize all parameters to zeros. You'll see later that this does not work well since it fails to "break symmetry", but let's try it anyway and see what happens. Use np.zeros((..,..)) with the correct shapes.
116 |
117 | # In[3]:
118 |
119 | # GRADED FUNCTION: initialize_parameters_zeros
120 |
121 | def initialize_parameters_zeros(layers_dims):
122 | """
123 | Arguments:
124 | layer_dims -- python array (list) containing the size of each layer.
125 |
126 | Returns:
127 | parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
128 | W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
129 | b1 -- bias vector of shape (layers_dims[1], 1)
130 | ...
131 | WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
132 | bL -- bias vector of shape (layers_dims[L], 1)
133 | """
134 |
135 | parameters = {}
136 | L = len(layers_dims) # number of layers in the network
137 |
138 | for l in range(1, L):
139 | ### START CODE HERE ### (≈ 2 lines of code)
140 | parameters['W' + str(l)] = np.zeros((layers_dims[l],layers_dims[l-1]))
141 | parameters['b' + str(l)] = np.zeros((layers_dims[l],1))
142 | ### END CODE HERE ###
143 | return parameters
144 |
145 |
146 | # In[4]:
147 |
148 | parameters = initialize_parameters_zeros([3,2,1])
149 | print("W1 = " + str(parameters["W1"]))
150 | print("b1 = " + str(parameters["b1"]))
151 | print("W2 = " + str(parameters["W2"]))
152 | print("b2 = " + str(parameters["b2"]))
153 |
154 |
155 | # **Expected Output**:
156 | #
157 | # | **W1** | [[ 0.  0.  0.] [ 0.  0.  0.]] |
158 | # | **b1** | [[ 0.] [ 0.]] |
159 | # | **W2** | [[ 0.  0.]] |
160 | # | **b2** | [[ 0.]] |
161 | #
195 | # Run the following code to train your model on 15,000 iterations using zeros initialization.
196 |
197 | # In[5]:
198 |
199 | parameters = model(train_X, train_Y, initialization = "zeros")
200 | print ("On the train set:")
201 | predictions_train = predict(train_X, train_Y, parameters)
202 | print ("On the test set:")
203 | predictions_test = predict(test_X, test_Y, parameters)
204 |
205 |
206 | # The performance is really bad: the cost does not really decrease, and the algorithm performs no better than random guessing. Why? Let's look at the details of the predictions and the decision boundary:
207 |
208 | # In[6]:
209 |
210 | print ("predictions_train = " + str(predictions_train))
211 | print ("predictions_test = " + str(predictions_test))
212 |
213 |
214 | # In[7]:
215 |
216 | plt.title("Model with Zeros initialization")
217 | axes = plt.gca()
218 | axes.set_xlim([-1.5,1.5])
219 | axes.set_ylim([-1.5,1.5])
220 | plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y)
221 |
222 |
223 | # The model is predicting 0 for every example.
224 | #
225 | # In general, initializing all the weights to zero results in the network failing to break symmetry. This means that every neuron in each layer will learn the same thing, and you might as well be training a neural network with $n^{[l]}=1$ for every layer, and the network is no more powerful than a linear classifier such as logistic regression.
226 |
227 | #
228 | # **What you should remember**:
229 | # - The weights $W^{[l]}$ should be initialized randomly to break symmetry.
230 | # - It is however okay to initialize the biases $b^{[l]}$ to zeros. Symmetry is still broken so long as $W^{[l]}$ is initialized randomly.
231 | #
232 |
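# A small illustration of the symmetry problem (illustrative, not part of the
# assignment): with zero weights and biases, every hidden unit computes exactly
# the same output, so every unit also receives the same gradient and they remain
# identical after every update.

import numpy as np

X_demo = np.random.randn(3, 5)                               # 3 input features, 5 examples
W1_demo = np.zeros((4, 3))                                   # zero-initialized layer with 4 hidden units
b1_demo = np.zeros((4, 1))
A1_demo = np.maximum(0, np.dot(W1_demo, X_demo) + b1_demo)   # ReLU activations
print(A1_demo)                                               # every row (unit) is identical (all zeros here)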
233 | # ## 3 - Random initialization
234 | #
235 | # To break symmetry, let's initialize the weights randomly. Following random initialization, each neuron can then proceed to learn a different function of its inputs. In this exercise, you will see what happens if the weights are initialized randomly, but to very large values.
236 | #
237 | # **Exercise**: Implement the following function to initialize your weights to large random values (scaled by \*10) and your biases to zeros. Use `np.random.randn(..,..) * 10` for weights and `np.zeros((.., ..))` for biases. We are using a fixed `np.random.seed(..)` to make sure your "random" weights match ours, so don't worry if running your code several times always gives you the same initial values for the parameters.
238 |
239 | # In[10]:
240 |
241 | # GRADED FUNCTION: initialize_parameters_random
242 |
243 | def initialize_parameters_random(layers_dims):
244 | """
245 | Arguments:
246 | layer_dims -- python array (list) containing the size of each layer.
247 |
248 | Returns:
249 | parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
250 | W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
251 | b1 -- bias vector of shape (layers_dims[1], 1)
252 | ...
253 | WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
254 | bL -- bias vector of shape (layers_dims[L], 1)
255 | """
256 |
257 | np.random.seed(3) # This seed makes sure your "random" numbers will be the same as ours
258 | parameters = {}
259 |     L = len(layers_dims)            # integer representing the number of layers, including the input layer
260 |
261 | for l in range(1, L):
262 | ### START CODE HERE ### (≈ 2 lines of code)
263 | parameters['W' + str(l)] = np.random.randn(layers_dims[l],layers_dims[l-1]) * 10
264 | parameters['b' + str(l)] = np.zeros((layers_dims[l],1))
265 | ### END CODE HERE ###
266 |
267 | return parameters
268 |
269 |
270 | # In[11]:
271 |
272 | parameters = initialize_parameters_random([3, 2, 1])
273 | print("W1 = " + str(parameters["W1"]))
274 | print("b1 = " + str(parameters["b1"]))
275 | print("W2 = " + str(parameters["W2"]))
276 | print("b2 = " + str(parameters["b2"]))
277 |
278 |
279 | # **Expected Output**:
280 | #
281 | # **W1**
282 | # [[ 17.88628473   4.36509851   0.96497468]
283 | #  [-18.63492703  -2.77388203  -3.54758979]]
284 | #
285 | # **b1**
286 | # [[ 0.]
287 | #  [ 0.]]
288 | #
289 | # **W2**
290 | # [[-0.82741481 -6.27000677]]
291 | #
292 | # **b2**
293 | # [[ 0.]]
318 |
319 | # Run the following code to train your model on 15,000 iterations using random initialization.
320 |
321 | # In[12]:
322 |
323 | parameters = model(train_X, train_Y, initialization = "random")
324 | print ("On the train set:")
325 | predictions_train = predict(train_X, train_Y, parameters)
326 | print ("On the test set:")
327 | predictions_test = predict(test_X, test_Y, parameters)
328 |
329 |
330 | # If you see "inf" as the cost after iteration 0, this is because of numerical roundoff; a more numerically sophisticated implementation would fix this. But it isn't worth worrying about for our purposes.
331 | #
332 | # Anyway, it looks like you have broken symmetry, and this gives better results than before. The model is no longer outputting all 0s.
333 |
334 | # In[13]:
335 |
336 | print (predictions_train)
337 | print (predictions_test)
338 |
339 |
340 | # In[14]:
341 |
342 | plt.title("Model with large random initialization")
343 | axes = plt.gca()
344 | axes.set_xlim([-1.5,1.5])
345 | axes.set_ylim([-1.5,1.5])
346 | plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y)
347 |
348 |
349 | # **Observations**:
350 | # - The cost starts very high. This is because with large random-valued weights, the last activation (sigmoid) outputs values that are very close to 0 or 1 for some examples, and when it gets such an example wrong it incurs a very high loss. Indeed, when $\log(a^{[3]}) = \log(0)$, the loss goes to infinity (see the small numeric check after this list).
351 | # - Poor initialization can lead to vanishing/exploding gradients, which also slows down the optimization algorithm.
352 | # - If you train this network longer you will see better results, but initializing with overly large random numbers slows down the optimization.
353 | #
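# As a small numeric check of the first observation above (illustration only, not part of the assignment), the loss $-\log(a)$ blows up as the predicted probability $a$ of the true class approaches 0:
#
# ```python
# import numpy as np
# for a in [0.5, 1e-2, 1e-5, 1e-10]:
#     print("a = %g  ->  -log(a) = %g" % (a, -np.log(a)))
# ```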
354 | #
355 | # **In summary**:
356 | # - Initializing weights to very large random values does not work well.
357 | # - Hopefully, initializing with small random values does better. The important question is: how small should these random values be? Let's find out in the next part!
358 |
359 | # ## 4 - He initialization
360 | #
361 | # Finally, try "He Initialization"; this is named for the first author of He et al., 2015. (If you have heard of "Xavier initialization", this is similar except Xavier initialization uses a scaling factor for the weights $W^{[l]}$ of `sqrt(1./layers_dims[l-1])` where He initialization would use `sqrt(2./layers_dims[l-1])`.)
362 | #
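# The sketch below (for comparison only, not part of the exercise) prints the two scaling factors side by side for an example set of layer sizes; the `layers_dims` value here is just an illustration:
#
# ```python
# import numpy as np
# layers_dims = [3, 2, 1]  # example layer sizes, not the assignment's dataset
# for l in range(1, len(layers_dims)):
#     n_prev = layers_dims[l - 1]
#     print("layer %d: xavier scale = %.4f, he scale = %.4f"
#           % (l, np.sqrt(1. / n_prev), np.sqrt(2. / n_prev)))
# ```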
363 | # **Exercise**: Implement the following function to initialize your parameters with He initialization.
364 | #
365 | # **Hint**: This function is similar to the previous `initialize_parameters_random(...)`. The only difference is that instead of multiplying `np.random.randn(..,..)` by 10, you will multiply it by $\sqrt{\frac{2}{\text{dimension of the previous layer}}}$, which is what He initialization recommends for layers with a ReLU activation.
366 |
367 | # In[15]:
368 |
369 | # GRADED FUNCTION: initialize_parameters_he
370 |
371 | def initialize_parameters_he(layers_dims):
372 | """
373 | Arguments:
374 | layer_dims -- python array (list) containing the size of each layer.
375 |
376 | Returns:
377 | parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
378 | W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
379 | b1 -- bias vector of shape (layers_dims[1], 1)
380 | ...
381 | WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
382 | bL -- bias vector of shape (layers_dims[L], 1)
383 | """
384 |
385 | np.random.seed(3)
386 | parameters = {}
387 | L = len(layers_dims) - 1 # integer representing the number of layers
388 |
389 | for l in range(1, L + 1):
390 | ### START CODE HERE ### (≈ 2 lines of code)
391 | parameters['W' + str(l)] = np.random.randn(layers_dims[l],layers_dims[l-1]) * np.sqrt(2./layers_dims[l-1])
392 | parameters['b' + str(l)] = np.zeros((layers_dims[l],1))
393 | ### END CODE HERE ###
394 |
395 | return parameters
396 |
397 |
398 | # In[16]:
399 |
400 | parameters = initialize_parameters_he([2, 4, 1])
401 | print("W1 = " + str(parameters["W1"]))
402 | print("b1 = " + str(parameters["b1"]))
403 | print("W2 = " + str(parameters["W2"]))
404 | print("b2 = " + str(parameters["b2"]))
405 |
406 |
407 | # **Expected Output**:
408 | #
409 | # **W1**
410 | # [[ 1.78862847  0.43650985]
411 | #  [ 0.09649747 -1.8634927 ]
412 | #  [-0.2773882  -0.35475898]
413 | #  [-0.08274148 -0.62700068]]
414 | #
415 | # **b1**
416 | # [[ 0.]
417 | #  [ 0.]
418 | #  [ 0.]
419 | #  [ 0.]]
420 | #
421 | # **W2**
422 | # [[-0.03098412 -0.33744411 -0.92904268  0.62552248]]
423 | #
424 | # **b2**
425 | # [[ 0.]]
450 |
451 | # Run the following code to train your model on 15,000 iterations using He initialization.
452 |
453 | # In[17]:
454 |
455 | parameters = model(train_X, train_Y, initialization = "he")
456 | print ("On the train set:")
457 | predictions_train = predict(train_X, train_Y, parameters)
458 | print ("On the test set:")
459 | predictions_test = predict(test_X, test_Y, parameters)
460 |
461 |
462 | # In[18]:
463 |
464 | plt.title("Model with He initialization")
465 | axes = plt.gca()
466 | axes.set_xlim([-1.5,1.5])
467 | axes.set_ylim([-1.5,1.5])
468 | plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y)
469 |
470 |
471 | # **Observations**:
472 | # - The model with He initialization separates the blue and the red dots very well in a small number of iterations.
473 | #
474 |
475 | # ## 5 - Conclusions
476 |
477 | # You have seen three different types of initializations. For the same number of iterations and the same hyperparameters, the comparison is:
478 | #
479 | # | **Model** | **Train accuracy** | **Problem/Comment** |
480 | # | --- | --- | --- |
481 | # | 3-layer NN with zeros initialization | 50% | fails to break symmetry |
482 | # | 3-layer NN with large random initialization | 83% | too large weights |
483 | # | 3-layer NN with He initialization | 99% | recommended method |
524 |
525 | #
526 | # **What you should remember from this notebook**:
527 | # - Different initializations lead to different results
528 | # - Random initialization is used to break symmetry and make sure different hidden units can learn different things
529 | # - Don't initialize to values that are too large
530 | # - He initialization works well for networks with ReLU activations.
531 |
--------------------------------------------------------------------------------
/py/Keras Tutorial Happy House v2.py.html:
--------------------------------------------------------------------------------
1 |
2 | # coding: utf-8
3 |
4 | # # Keras tutorial - the Happy House
5 | #
6 | # Welcome to the first assignment of week 2. In this assignment, you will:
7 | # 1. Learn to use Keras, a high-level neural networks API (programming framework), written in Python and capable of running on top of several lower-level frameworks including TensorFlow and CNTK.
8 | # 2. See how you can in a couple of hours build a deep learning algorithm.
9 | #
10 | # Why are we using Keras? Keras was developed to enable deep learning engineers to build and experiment with different models very quickly. Just as TensorFlow is a higher-level framework than Python, Keras is an even higher-level framework and provides additional abstractions. Being able to go from idea to result with the least possible delay is key to finding good models. However, Keras is more restrictive than the lower-level frameworks, so there are some very complex models that you can implement in TensorFlow but not (without more difficulty) in Keras. That being said, Keras will work fine for many common models.
11 | #
12 | # In this exercise, you'll work on the "Happy House" problem, which we'll explain below. Let's load the required packages and solve the problem of the Happy House!
13 |
14 | # In[3]:
15 |
16 | import numpy as np
17 | from keras import layers
18 | from keras.layers import Input, Dense, Activation, ZeroPadding2D, BatchNormalization, Flatten, Conv2D
19 | from keras.layers import AveragePooling2D, MaxPooling2D, Dropout, GlobalMaxPooling2D, GlobalAveragePooling2D
20 | from keras.models import Model
21 | from keras.preprocessing import image
22 | from keras.utils import layer_utils
23 | from keras.utils.data_utils import get_file
24 | from keras.applications.imagenet_utils import preprocess_input
25 | import pydot
26 | from IPython.display import SVG
27 | from keras.utils.vis_utils import model_to_dot
28 | from keras.utils import plot_model
29 | from kt_utils import *
30 |
31 | import keras.backend as K
32 | K.set_image_data_format('channels_last')
33 | import matplotlib.pyplot as plt
34 | from matplotlib.pyplot import imshow
35 |
36 | get_ipython().magic('matplotlib inline')
37 |
38 |
39 | # **Note**: As you can see, we've imported a lot of functions from Keras. You can use them easily just by calling them directly in the notebook. Ex: `X = Input(...)` or `X = ZeroPadding2D(...)`.
40 |
41 | # ## 1 - The Happy House
42 | #
43 | # For your next vacation, you decided to spend a week with five of your friends from school. It is a very convenient house with many things to do nearby. But the most important benefit is that everybody has committed to being happy when they are in the house. So anyone wanting to enter the house must prove their current state of happiness.
44 | #
45 | #
46 | # **Figure 1** : **the Happy House**
47 | #
48 | #
49 | # As a deep learning expert, to make sure the "Happy" rule is strictly applied, you are going to build an algorithm that uses pictures from the front-door camera to check if the person is happy or not. The door should open only if the person is happy.
50 | #
51 | # You have gathered pictures of your friends and yourself, taken by the front-door camera. The dataset is labeled.
52 | #
53 | #
54 | #
55 | # Run the following code to normalize the dataset and learn about its shapes.
56 |
57 | # In[4]:
58 |
59 | X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()
60 |
61 | # Normalize image vectors
62 | X_train = X_train_orig/255.
63 | X_test = X_test_orig/255.
64 |
65 | # Reshape
66 | Y_train = Y_train_orig.T
67 | Y_test = Y_test_orig.T
68 |
69 | print ("number of training examples = " + str(X_train.shape[0]))
70 | print ("number of test examples = " + str(X_test.shape[0]))
71 | print ("X_train shape: " + str(X_train.shape))
72 | print ("Y_train shape: " + str(Y_train.shape))
73 | print ("X_test shape: " + str(X_test.shape))
74 | print ("Y_test shape: " + str(Y_test.shape))
75 |
76 |
77 | # **Details of the "Happy" dataset**:
78 | # - Images are of shape (64,64,3)
79 | # - Training: 600 pictures
80 | # - Test: 150 pictures
81 | #
82 | # It is now time to solve the "Happy" Challenge.
83 |
84 | # ## 2 - Building a model in Keras
85 | #
86 | # Keras is very good for rapid prototyping. In just a short time you will be able to build a model that achieves outstanding results.
87 | #
88 | # Here is an example of a model in Keras:
89 | #
90 | # ```python
91 | # def model(input_shape):
92 | # # Define the input placeholder as a tensor with shape input_shape. Think of this as your input image!
93 | # X_input = Input(input_shape)
94 | #
95 | # # Zero-Padding: pads the border of X_input with zeroes
96 | # X = ZeroPadding2D((3, 3))(X_input)
97 | #
98 | # # CONV -> BN -> RELU Block applied to X
99 | # X = Conv2D(32, (7, 7), strides = (1, 1), name = 'conv0')(X)
100 | # X = BatchNormalization(axis = 3, name = 'bn0')(X)
101 | # X = Activation('relu')(X)
102 | #
103 | # # MAXPOOL
104 | # X = MaxPooling2D((2, 2), name='max_pool')(X)
105 | #
106 | # # FLATTEN X (means convert it to a vector) + FULLYCONNECTED
107 | # X = Flatten()(X)
108 | # X = Dense(1, activation='sigmoid', name='fc')(X)
109 | #
110 | # # Create model. This creates your Keras model instance, you'll use this instance to train/test the model.
111 | # model = Model(inputs = X_input, outputs = X, name='HappyModel')
112 | #
113 | # return model
114 | # ```
115 | #
116 | # Note that Keras uses a different convention with variable names than we've previously used with numpy and TensorFlow. In particular, rather than creating and assigning a new variable on each step of forward propagation such as `X`, `Z1`, `A1`, `Z2`, `A2`, etc. for the computations for the different layers, in Keras code each line above just reassigns `X` to a new value using `X = ...`. In other words, during each step of forward propagation, we are just writing the latest value in the computation into the same variable `X`. The only exception was `X_input`, which we kept separate and did not overwrite, since we needed it at the end to create the Keras model instance (`model = Model(inputs = X_input, ...)` above).
117 | #
118 | # **Exercise**: Implement a `HappyModel()`. This assignment is more open-ended than most. We suggest that you start by implementing a model using the architecture we suggest, and run through the rest of this assignment using that as your initial model. But after that, come back and take initiative to try out other model architectures. For example, you might take inspiration from the model above, but then vary the network architecture and hyperparameters however you wish. You can also use other functions such as `AveragePooling2D()`, `GlobalMaxPooling2D()`, `Dropout()`.
119 | #
120 | # **Note**: You have to be careful with your data's shapes. Use what you've learned in the videos to make sure your convolutional, pooling and fully-connected layers are adapted to the volumes you're applying it to.
121 |
122 | # In[6]:
123 |
124 | # GRADED FUNCTION: HappyModel
125 |
126 | def HappyModel(input_shape):
127 | """
128 | Implementation of the HappyModel.
129 |
130 | Arguments:
131 | input_shape -- shape of the images of the dataset
132 |
133 | Returns:
134 | model -- a Model() instance in Keras
135 | """
136 |
137 | ### START CODE HERE ###
138 | # Feel free to use the suggested outline in the text above to get started, and run through the whole
139 |     # exercise (including the later portions of this notebook) once. Then come back and try out other
140 |     # network architectures as well.
141 | # Define the input placeholder as a tensor with shape input_shape. Think of this as your input image!
142 | X_input = Input(input_shape)
143 |
144 | # Zero-Padding: pads the border of X_input with zeroes
145 | X = ZeroPadding2D((3, 3))(X_input)
146 |
147 | # CONV -> BN -> RELU Block applied to X
148 | X = Conv2D(32, (7, 7), strides = (1, 1), name = 'conv0')(X)
149 | X = BatchNormalization(axis = 3, name = 'bn0')(X)
150 | X = Activation('relu')(X)
151 |
152 | # MAXPOOL
153 | X = MaxPooling2D((2, 2), name='max_pool')(X)
154 |
155 | # FLATTEN X (means convert it to a vector) + FULLYCONNECTED
156 | X = Flatten()(X)
157 | X = Dense(1, activation='sigmoid', name='fc')(X)
158 |
159 | # Create model. This creates your Keras model instance, you'll use this instance to train/test the model.
160 | model = Model(inputs = X_input, outputs = X, name='HappyModel')
161 |
162 |
163 | ### END CODE HERE ###
164 |
165 | return model
166 |
167 |
168 | # You have now built a function to describe your model. To train and test this model, there are four steps in Keras:
169 | # 1. Create the model by calling the function above
170 | # 2. Compile the model by calling `model.compile(optimizer = "...", loss = "...", metrics = ["accuracy"])`
171 | # 3. Train the model on train data by calling `model.fit(x = ..., y = ..., epochs = ..., batch_size = ...)`
172 | # 4. Test the model on test data by calling `model.evaluate(x = ..., y = ...)`
173 | #
174 | # If you want to know more about `model.compile()`, `model.fit()`, `model.evaluate()` and their arguments, refer to the official [Keras documentation](https://keras.io/models/model/).
175 | #
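# Before working through the graded steps one at a time, here is a minimal end-to-end sketch of the four calls above chained together (an illustration only; it assumes `HappyModel`, `X_train`, `Y_train`, `X_test` and `Y_test` are defined as in this notebook, and the epoch count is arbitrary):
#
# ```python
# happyModel = HappyModel(X_train.shape[1:])                      # 1. create; input shape is (64, 64, 3)
# happyModel.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])  # 2. compile
# happyModel.fit(x=X_train, y=Y_train, epochs=5, batch_size=16)   # 3. train
# preds = happyModel.evaluate(x=X_test, y=Y_test)                 # 4. evaluate -> [loss, accuracy]
# ```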
176 | # **Exercise**: Implement step 1, i.e. create the model.
177 |
178 | # In[10]:
179 |
180 | ### START CODE HERE ### (1 line)
181 | happyModel = HappyModel((64,64,3))
182 | ### END CODE HERE ###
183 |
184 |
185 | # **Exercise**: Implement step 2, i.e. compile the model to configure the learning process. Choose the 3 arguments of `compile()` wisely. Hint: the Happy Challenge is a binary classification problem.
186 |
187 | # In[11]:
188 |
189 | ### START CODE HERE ### (1 line)
190 | happyModel.compile(optimizer = "adam",loss = "binary_crossentropy",metrics = ["accuracy"])
191 | ### END CODE HERE ###
192 |
193 |
194 | # **Exercise**: Implement step 3, i.e. train the model. Choose the number of epochs and the batch size.
195 |
196 | # In[12]:
197 |
198 | ### START CODE HERE ### (1 line)
199 | happyModel.fit(x = X_train, y = Y_train, epochs = 40, batch_size = 16)
200 | ### END CODE HERE ###
201 |
202 |
203 | # Note that if you run `fit()` again, the `model` will continue to train with the parameters it has already learnt instead of reinitializing them.
204 | #
205 | # **Exercise**: Implement step 4, i.e. test/evaluate the model.
206 |
207 | # In[14]:
208 |
209 | ### START CODE HERE ### (1 line)
210 | preds = happyModel.evaluate(x = X_test,y = Y_test)
211 | ### END CODE HERE ###
212 | print()
213 | print ("Loss = " + str(preds[0]))
214 | print ("Test Accuracy = " + str(preds[1]))
215 |
216 |
217 | # If your `happyModel()` function worked, you should have observed much better than random-guessing (50%) accuracy on the train and test sets.
218 | #
219 | # To give you a point of comparison, our model gets around **95% test accuracy in 40 epochs** (and 99% train accuracy) with a mini batch size of 16 and "adam" optimizer. But our model gets decent accuracy after just 2-5 epochs, so if you're comparing different models you can also train a variety of models on just a few epochs and see how they compare.
220 | #
221 | # If you have not yet achieved a very good accuracy (let's say more than 80%), here are some things you can play around with to try to achieve it:
222 | #
223 | # - Try using blocks of CONV->BATCHNORM->RELU such as:
224 | # ```python
225 | # X = Conv2D(32, (3, 3), strides = (1, 1), name = 'conv0')(X)
226 | # X = BatchNormalization(axis = 3, name = 'bn0')(X)
227 | # X = Activation('relu')(X)
228 | # ```
229 | # until your height and width dimensions are quite low and your number of channels quite large (≈32 for example). You are encoding useful information in a volume with a lot of channels. You can then flatten the volume and use a fully-connected layer.
230 | # - You can use MAXPOOL after such blocks. It will help you lower the dimension in height and width.
231 | # - Change your optimizer. We find Adam works well.
232 | # - If the model is struggling to run and you get memory issues, lower your batch_size (12 is usually a good compromise)
233 | # - Run on more epochs, until you see the train accuracy plateauing.
234 | #
235 | # Even if you have achieved a good accuracy, please feel free to keep playing with your model to try to get even better results.
236 | #
237 | # **Note**: If you perform hyperparameter tuning on your model, the test set actually becomes a dev set, and your model might end up overfitting to the test (dev) set. But just for the purpose of this assignment, we won't worry about that here.
238 | #
239 |
240 | # ## 3 - Conclusion
241 | #
242 | # Congratulations, you have solved the Happy House challenge!
243 | #
244 | # Now, you just need to link this model to the front-door camera of your house. We unfortunately won't go into the details of how to do that here.
245 |
246 | #
247 | # **What we would like you to remember from this assignment:**
248 | # - Keras is a tool we recommend for rapid prototyping. It allows you to quickly try out different model architectures. Are there any applications of deep learning to your daily life that you'd like to implement using Keras?
249 | # - Remember how to code a model in Keras and the four steps leading to the evaluation of your model on the test set. Create->Compile->Fit/Train->Evaluate/Test.
250 |
251 | # ## 4 - Test with your own image (Optional)
252 | #
253 | # Congratulations on finishing this assignment. You can now take a picture of your face and see if you could enter the Happy House. To do that:
254 | # 1. Click on "File" in the upper bar of this notebook, then click "Open" to go on your Coursera Hub.
255 | # 2. Add your image to this Jupyter Notebook's directory, in the "images" folder
256 | # 3. Write your image's name in the following code
257 | # 4. Run the code and check if the algorithm is right (0 is unhappy, 1 is happy)!
258 | #
259 | # The training/test sets were quite similar; for example, all the pictures were taken against the same background (since a front door camera is always mounted in the same position). This makes the problem easier, but a model trained on this data may or may not work on your own data. But feel free to give it a try!
260 |
261 | # In[18]:
262 |
263 | ### START CODE HERE ###
264 | img_path = 'images/beautiful.jpg'
265 | ### END CODE HERE ###
266 | img = image.load_img(img_path, target_size=(64, 64))
267 | imshow(img)
268 |
269 | x = image.img_to_array(img)
270 | x = np.expand_dims(x, axis=0)
271 | x = preprocess_input(x)
272 |
273 | print(happyModel.predict(x))
274 |
275 |
276 | # ## 5 - Other useful functions in Keras (Optional)
277 | #
278 | # Two other basic features of Keras that you'll find useful are:
279 | # - `model.summary()`: prints the details of your layers in a table with the sizes of its inputs/outputs
280 | # - `plot_model()`: plots your graph in a nice layout. You can even save it as ".png" using SVG() if you'd like to share it on social media ;). It is saved in "File" then "Open..." in the upper bar of the notebook.
281 | #
282 | # Run the following code.
283 |
284 | # In[16]:
285 |
286 | happyModel.summary()
287 |
288 |
289 | # In[17]:
290 |
291 | plot_model(happyModel, to_file='HappyModel.png')
292 | SVG(model_to_dot(happyModel).create(prog='dot', format='svg'))
293 |
294 |
295 | # In[ ]:
296 |
297 |
298 |
299 |
--------------------------------------------------------------------------------
/py/Neural machine translation with attention.py:
--------------------------------------------------------------------------------
1 |
2 | # coding: utf-8
3 |
4 | # # Neural Machine Translation
5 | #
6 | # Welcome to your first programming assignment for this week!
7 | #
8 | # You will build a Neural Machine Translation (NMT) model to translate human readable dates ("25th of June, 2009") into machine readable dates ("2009-06-25"). You will do this using an attention model, one of the most sophisticated sequence to sequence models.
9 | #
10 | # This notebook was produced together with NVIDIA's Deep Learning Institute.
11 | #
12 | # Let's load all the packages you will need for this assignment.
13 |
14 | # In[1]:
15 |
16 | from keras.layers import Bidirectional, Concatenate, Permute, Dot, Input, LSTM, Multiply
17 | from keras.layers import RepeatVector, Dense, Activation, Lambda
18 | from keras.optimizers import Adam
19 | from keras.utils import to_categorical
20 | from keras.models import load_model, Model
21 | import keras.backend as K
22 | import numpy as np
23 |
24 | from faker import Faker
25 | import random
26 | from tqdm import tqdm
27 | from babel.dates import format_date
28 | from nmt_utils import *
29 | import matplotlib.pyplot as plt
30 | get_ipython().magic('matplotlib inline')
31 |
32 |
33 | # ## 1 - Translating human readable dates into machine readable dates
34 | #
35 | # The model you will build here could be used to translate from one language to another, such as translating from English to Hindi. However, language translation requires massive datasets and usually takes days of training on GPUs. To give you a place to experiment with these models even without using massive datasets, we will instead use a simpler "date translation" task.
36 | #
37 | # The network will input a date written in a variety of possible formats (*e.g. "the 29th of August 1958", "03/30/1968", "24 JUNE 1987"*) and translate them into standardized, machine readable dates (*e.g. "1958-08-29", "1968-03-30", "1987-06-24"*). We will have the network learn to output dates in the common machine-readable format YYYY-MM-DD.
38 | #
39 | #
40 | #
41 | #
43 |
44 | # ### 1.1 - Dataset
45 | #
46 | # We will train the model on a dataset of 10000 human readable dates and their equivalent, standardized, machine readable dates. Let's run the following cells to load the dataset and print some examples.
47 |
48 | # In[2]:
49 |
50 | m = 10000
51 | dataset, human_vocab, machine_vocab, inv_machine_vocab = load_dataset(m)
52 |
53 |
54 | # In[3]:
55 |
56 | dataset[:10]
57 |
58 |
59 | # You've loaded:
60 | # - `dataset`: a list of tuples of (human readable date, machine readable date)
61 | # - `human_vocab`: a python dictionary mapping all characters used in the human readable dates to an integer-valued index
62 | # - `machine_vocab`: a python dictionary mapping all characters used in machine readable dates to an integer-valued index. These indices are not necessarily consistent with `human_vocab`.
63 | # - `inv_machine_vocab`: the inverse dictionary of `machine_vocab`, mapping from indices back to characters.
64 | #
65 | # Let's preprocess the data and map the raw text data into the index values. We will also use Tx=30 (which we assume is the maximum length of the human readable date; if we get a longer input, we would have to truncate it) and Ty=10 (since "YYYY-MM-DD" is 10 characters long).
66 |
67 | # In[4]:
68 |
69 | Tx = 30
70 | Ty = 10
71 | X, Y, Xoh, Yoh = preprocess_data(dataset, human_vocab, machine_vocab, Tx, Ty)
72 |
73 | print("X.shape:", X.shape)
74 | print("Y.shape:", Y.shape)
75 | print("Xoh.shape:", Xoh.shape)
76 | print("Yoh.shape:", Yoh.shape)
77 |
78 |
79 | # You now have:
80 | # - `X`: a processed version of the human readable dates in the training set, where each character is replaced by an index mapped to the character via `human_vocab`. Each date is further padded to $T_x$ values with a special character (< pad >). `X.shape = (m, Tx)`
81 | # - `Y`: a processed version of the machine readable dates in the training set, where each character is replaced by the index it is mapped to in `machine_vocab`. You should have `Y.shape = (m, Ty)`.
82 | # - `Xoh`: one-hot version of `X`, the "1" entry's index is mapped to the character thanks to `human_vocab`. `Xoh.shape = (m, Tx, len(human_vocab))`
83 | # - `Yoh`: one-hot version of `Y`, the "1" entry's index is mapped to the character thanks to `machine_vocab`. `Yoh.shape = (m, Ty, len(machine_vocab))`. Here, `len(machine_vocab) = 11` since there are 11 characters ('-' as well as 0-9). (See the sketch after this list.)
84 | #
85 |
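# As a minimal sketch (an assumption about what `preprocess_data` does for the one-hot step, not its actual code), the one-hot arrays can be reproduced from `X` and `Y` with `to_categorical`:
#
# ```python
# import numpy as np
# from keras.utils import to_categorical
#
# Xoh_check = np.array([to_categorical(x, num_classes=len(human_vocab)) for x in X])
# Yoh_check = np.array([to_categorical(y, num_classes=len(machine_vocab)) for y in Y])
# assert Xoh_check.shape == (m, Tx, len(human_vocab))
# assert Yoh_check.shape == (m, Ty, len(machine_vocab))
# ```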
86 | # Let's also look at some preprocessed training examples. Feel free to play with `index` in the cell below to navigate the dataset and see how source/target dates are preprocessed.
87 |
88 | # In[5]:
89 |
90 | index = 0
91 | print("Source date:", dataset[index][0])
92 | print("Target date:", dataset[index][1])
93 | print()
94 | print("Source after preprocessing (indices):", X[index])
95 | print("Target after preprocessing (indices):", Y[index])
96 | print()
97 | print("Source after preprocessing (one-hot):", Xoh[index])
98 | print("Target after preprocessing (one-hot):", Yoh[index])
99 |
100 |
101 | # ## 2 - Neural machine translation with attention
102 | #
103 | # If you had to translate a book's paragraph from French to English, you would not read the whole paragraph, then close the book and translate. Even during the translation process, you would read/re-read and focus on the parts of the French paragraph corresponding to the parts of the English you are writing down.
104 | #
105 | # The attention mechanism tells a Neural Machine Translation model which parts of the input it should pay attention to at each step.
106 | #
107 | #
108 | # ### 2.1 - Attention mechanism
109 | #
110 | # In this part, you will implement the attention mechanism presented in the lecture videos. Here is a figure to remind you how the model works. The diagram on the left shows the attention model. The diagram on the right shows what one "Attention" step does to calculate the attention variables $\alpha^{\langle t, t' \rangle}$, which are used to compute the context variable $context^{\langle t \rangle}$ for each timestep in the output ($t=1, \ldots, T_y$).
111 | #
112 | #
120 | # **Figure 1**: Neural machine translation with attention
121 | #
122 |
123 | #
124 | # Here are some properties of the model that you may notice:
125 | #
126 | # - There are two separate LSTMs in this model (see diagram on the left). Because the one at the bottom of the picture is a Bi-directional LSTM and comes *before* the attention mechanism, we will call it *pre-attention* Bi-LSTM. The LSTM at the top of the diagram comes *after* the attention mechanism, so we will call it the *post-attention* LSTM. The pre-attention Bi-LSTM goes through $T_x$ time steps; the post-attention LSTM goes through $T_y$ time steps.
127 | #
128 | # - The post-attention LSTM passes $s^{\langle t \rangle}, c^{\langle t \rangle}$ from one time step to the next. In the lecture videos, we were using only a basic RNN for the post-attention sequence model, so the state was captured entirely by the RNN output activation $s^{\langle t\rangle}$. But since we are using an LSTM here, the LSTM has both the output activation $s^{\langle t\rangle}$ and the hidden cell state $c^{\langle t\rangle}$. However, unlike previous text generation examples (such as Dinosaurus in week 1), in this model the post-attention LSTM at time $t$ will not take the specific generated $y^{\langle t-1 \rangle}$ as input; it only takes $s^{\langle t\rangle}$ and $c^{\langle t\rangle}$ as input. We have designed the model this way because, unlike language generation (where adjacent characters are highly correlated), there isn't as strong a dependency between the previous character and the next character in a YYYY-MM-DD date.
129 | #
130 | # - We use $a^{\langle t \rangle} = [\overrightarrow{a}^{\langle t \rangle}; \overleftarrow{a}^{\langle t \rangle}]$ to represent the concatenation of the activations of both the forward-direction and backward-directions of the pre-attention Bi-LSTM.
131 | #
132 | # - The diagram on the right uses a `RepeatVector` node to copy $s^{\langle t-1 \rangle}$'s value $T_x$ times, and then `Concatenation` to concatenate $s^{\langle t-1 \rangle}$ and $a^{\langle t' \rangle}$ to compute $e^{\langle t, t' \rangle}$, which is then passed through a softmax to compute $\alpha^{\langle t, t' \rangle}$. We'll explain how to use `RepeatVector` and `Concatenation` in Keras below.
133 | #
134 | # Let's implement this model. You will start by implementing two functions: `one_step_attention()` and `model()`.
135 | #
136 | # **1) `one_step_attention()`**: At step $t$, given all the hidden states of the Bi-LSTM ($[a^{\langle 1 \rangle},a^{\langle 2 \rangle}, ..., a^{\langle T_x \rangle}]$) and the previous hidden state of the second LSTM ($s^{\langle t-1 \rangle}$), `one_step_attention()` will compute the attention weights ($[\alpha^{\langle t,1 \rangle},\alpha^{\langle t,2 \rangle}, ..., \alpha^{\langle t,T_x \rangle}]$) and output the context vector (see Figure 1 (right) for details):
137 | # $$context^{\langle t \rangle} = \sum_{t' = 0}^{T_x} \alpha^{\langle t,t' \rangle}a^{\langle t' \rangle}\tag{1}$$
138 | #
139 | # Note that we are denoting the attention in this notebook $context^{\langle t \rangle}$. In the lecture videos, the context was denoted $c^{\langle t \rangle}$, but here we are calling it $context^{\langle t \rangle}$ to avoid confusion with the (post-attention) LSTM's internal memory cell variable, which is sometimes also denoted $c^{\langle t \rangle}$.
140 | #
141 | # **2) `model()`**: Implements the entire model. It first runs the input through a Bi-LSTM to get back $[a^{\langle 1 \rangle},a^{\langle 2 \rangle}, ..., a^{\langle T_x \rangle}]$. Then, it calls `one_step_attention()` $T_y$ times (in a `for` loop). At each iteration of this loop, it gives the computed context vector $context^{\langle t \rangle}$ to the second LSTM, and runs the output of the LSTM through a dense layer with softmax activation to generate a prediction $\hat{y}^{\langle t \rangle}$.
142 | #
143 | #
144 | #
145 | # **Exercise**: Implement `one_step_attention()`. The function `model()` will call the layers in `one_step_attention()` $T_y$ times using a for-loop, and it is important that all $T_y$ copies have the same weights. I.e., it should not re-initialize the weights every time. In other words, all $T_y$ steps should have shared weights. Here's how you can implement layers with shareable weights in Keras:
146 | # 1. Define the layer objects (as global variables, for example).
147 | # 2. Call these objects when propagating the input.
148 | #
149 | # We have defined the layers you need as global variables. Please run the following cells to create them. Please check the Keras documentation to make sure you understand what these layers are: [RepeatVector()](https://keras.io/layers/core/#repeatvector), [Concatenate()](https://keras.io/layers/merge/#concatenate), [Dense()](https://keras.io/layers/core/#dense), [Activation()](https://keras.io/layers/core/#activation), [Dot()](https://keras.io/layers/merge/#dot).
150 |
151 | # In[6]:
152 |
153 | # Defined shared layers as global variables
154 | repeator = RepeatVector(Tx)
155 | concatenator = Concatenate(axis=-1)
156 | densor1 = Dense(10, activation = "tanh")
157 | densor2 = Dense(1, activation = "relu")
158 | activator = Activation(softmax, name='attention_weights') # We are using a custom softmax(axis = 1) loaded in this notebook
159 | dotor = Dot(axes = 1)
160 |
161 |
162 | # Now you can use these layers to implement `one_step_attention()`. In order to propagate a Keras tensor object X through one of these layers, use `layer(X)` (or `layer([X,Y])` if it requires multiple inputs). For example, `densor(X)` will propagate X through the `Dense(1)` layer defined above.
163 |
164 | # In[7]:
165 |
166 | # GRADED FUNCTION: one_step_attention
167 |
168 | def one_step_attention(a, s_prev):
169 | """
170 | Performs one step of attention: Outputs a context vector computed as a dot product of the attention weights
171 | "alphas" and the hidden states "a" of the Bi-LSTM.
172 |
173 | Arguments:
174 | a -- hidden state output of the Bi-LSTM, numpy-array of shape (m, Tx, 2*n_a)
175 | s_prev -- previous hidden state of the (post-attention) LSTM, numpy-array of shape (m, n_s)
176 |
177 | Returns:
178 | context -- context vector, input of the next (post-attetion) LSTM cell
179 | """
180 |
181 | ### START CODE HERE ###
182 | # Use repeator to repeat s_prev to be of shape (m, Tx, n_s) so that you can concatenate it with all hidden states "a" (≈ 1 line)
183 | s_prev = repeator(s_prev)
184 | # Use concatenator to concatenate a and s_prev on the last axis (≈ 1 line)
185 | concat = concatenator([a,s_prev])
186 | # Use densor1 to propagate concat through a small fully-connected neural network to compute the "intermediate energies" variable e. (≈1 lines)
187 | e = densor1(concat)
188 | # Use densor2 to propagate e through a small fully-connected neural network to compute the "energies" variable energies. (≈1 lines)
189 | energies = densor2(e)
190 | # Use "activator" on "energies" to compute the attention weights "alphas" (≈ 1 line)
191 | alphas = activator(energies)
192 | # Use dotor together with "alphas" and "a" to compute the context vector to be given to the next (post-attention) LSTM-cell (≈ 1 line)
193 | context = dotor([alphas,a])
194 | ### END CODE HERE ###
195 |
196 | return context
197 |
198 |
199 | # You will be able to check the expected output of `one_step_attention()` after you've coded the `model()` function.
200 |
201 | # **Exercise**: Implement `model()` as explained in figure 2 and the text above. Again, we have defined global layers that will share weights to be used in `model()`.
202 |
203 | # In[8]:
204 |
205 | n_a = 32
206 | n_s = 64
207 | post_activation_LSTM_cell = LSTM(n_s, return_state = True)
208 | output_layer = Dense(len(machine_vocab), activation=softmax)
209 |
210 |
211 | # Now you can use these layers $T_y$ times in a `for` loop to generate the outputs, and their parameters will not be reinitialized. You will have to carry out the following steps:
212 | #
213 | # 1. Propagate the input into a [Bidirectional](https://keras.io/layers/wrappers/#bidirectional) [LSTM](https://keras.io/layers/recurrent/#lstm)
214 | # 2. Iterate for $t = 0, \dots, T_y-1$:
215 | #     1. Call `one_step_attention()` on $[a^{\langle 1 \rangle},a^{\langle 2 \rangle}, ..., a^{\langle T_x \rangle}]$ and $s^{\langle t-1 \rangle}$ to get the context vector $context^{\langle t \rangle}$.
216 | #     2. Give $context^{\langle t \rangle}$ to the post-attention LSTM cell. Remember to pass in the previous hidden state $s^{\langle t-1\rangle}$ and cell state $c^{\langle t-1\rangle}$ of this LSTM using `initial_state = [previous hidden state, previous cell state]`. Get back the new hidden state $s^{\langle t \rangle}$ and the new cell state $c^{\langle t \rangle}$.
217 | #     3. Apply a softmax layer to $s^{\langle t \rangle}$ to get the output.
218 | # 4. Save the output by adding it to the list of outputs.
219 | #
220 | # 3. Create your Keras model instance, it should have three inputs ("inputs", $s^{<0>}$ and $c^{<0>}$) and output the list of "outputs".
221 |
222 | # In[9]:
223 |
224 | # GRADED FUNCTION: model
225 |
226 | def model(Tx, Ty, n_a, n_s, human_vocab_size, machine_vocab_size):
227 | """
228 | Arguments:
229 | Tx -- length of the input sequence
230 | Ty -- length of the output sequence
231 | n_a -- hidden state size of the Bi-LSTM
232 | n_s -- hidden state size of the post-attention LSTM
233 | human_vocab_size -- size of the python dictionary "human_vocab"
234 | machine_vocab_size -- size of the python dictionary "machine_vocab"
235 |
236 | Returns:
237 | model -- Keras model instance
238 | """
239 |
240 | # Define the inputs of your model with a shape (Tx,)
241 | # Define s0 and c0, initial hidden state for the decoder LSTM of shape (n_s,)
242 | X = Input(shape=(Tx, human_vocab_size))
243 | s0 = Input(shape=(n_s,), name='s0')
244 | c0 = Input(shape=(n_s,), name='c0')
245 | s = s0
246 | c = c0
247 |
248 | # Initialize empty list of outputs
249 | outputs = []
250 |
251 | ### START CODE HERE ###
252 |
253 | # Step 1: Define your pre-attention Bi-LSTM. Remember to use return_sequences=True. (≈ 1 line)
254 |     a = Bidirectional(LSTM(n_a, return_sequences=True))(X)    # output shape: (m, Tx, 2*n_a)
255 |
256 | # Step 2: Iterate for Ty steps
257 | for t in range(Ty):
258 |
259 | # Step 2.A: Perform one step of the attention mechanism to get back the context vector at step t (≈ 1 line)
260 | context = one_step_attention(a, s)
261 |
262 | # Step 2.B: Apply the post-attention LSTM cell to the "context" vector.
263 | # Don't forget to pass: initial_state = [hidden state, cell state] (≈ 1 line)
264 | s, _, c = post_activation_LSTM_cell(context,initial_state = [s, c] )
265 |
266 | # Step 2.C: Apply Dense layer to the hidden state output of the post-attention LSTM (≈ 1 line)
267 | out = output_layer(s)
268 |
269 | # Step 2.D: Append "out" to the "outputs" list (≈ 1 line)
270 | outputs.append(out)
271 |
272 | # Step 3: Create model instance taking three inputs and returning the list of outputs. (≈ 1 line)
273 | model = Model(inputs=[X,s0,c0],outputs=outputs)
274 |
275 | ### END CODE HERE ###
276 |
277 | return model
278 |
279 |
280 | # Run the following cell to create your model.
281 |
282 | # In[10]:
283 |
284 | model = model(Tx, Ty, n_a, n_s, len(human_vocab), len(machine_vocab))
285 |
286 |
287 | # Let's get a summary of the model to check if it matches the expected output.
288 |
289 | # In[11]:
290 |
291 | model.summary()
292 |
293 |
294 | # **Expected Output**:
295 | #
296 | # Here is the summary you should see
297 | #
298 | # | **Total params** | 185,484 |
299 | # | **Trainable params** | 185,484 |
300 | # | **Non-trainable params** | 0 |
301 | # | **bidirectional_1's output shape** | (None, 30, 128) |
302 | # | **repeat_vector_1's output shape** | (None, 30, 128) |
303 | # | **concatenate_1's output shape** | (None, 30, 256) |
304 | # | **attention_weights's output shape** | (None, 30, 1) |
305 | # | **dot_1's output shape** | (None, 1, 128) |
306 | # | **dense_2's output shape** | (None, 11) |
372 |
373 | # As usual, after creating your model in Keras, you need to compile it and define what loss, optimizer and metrics you want to use. Compile your model using `categorical_crossentropy` loss, a custom [Adam](https://keras.io/optimizers/#adam) [optimizer](https://keras.io/optimizers/#usage-of-optimizers) (`learning rate = 0.005`, $\beta_1 = 0.9$, $\beta_2 = 0.999$, `decay = 0.01`) and `['accuracy']` metrics:
374 |
375 | # In[12]:
376 |
377 | ### START CODE HERE ### (≈2 lines)
378 | opt = Adam(lr=0.005, beta_1=0.9, beta_2=0.999,decay=0.01)
379 | model.compile(loss='categorical_crossentropy', optimizer=opt,metrics=['accuracy'])
380 | ### END CODE HERE ###
381 |
382 |
383 | # The last step is to define all your inputs and outputs to fit the model:
384 | # - You already have X of shape $(m = 10000, T_x = 30)$ containing the training examples.
385 | # - You need to create `s0` and `c0` to initialize your `post_activation_LSTM_cell` with 0s.
386 | # - Given the `model()` you coded, you need the "outputs" to be a list of $T_y = 10$ elements, each of shape `(m, len(machine_vocab))`, so that `outputs[t][i]` is the one-hot true label of the $t^{th}$ character of the $i^{th}$ training example (`X[i]`).
387 |
388 | # In[13]:
389 |
390 | s0 = np.zeros((m, n_s))
391 | c0 = np.zeros((m, n_s))
392 | outputs = list(Yoh.swapaxes(0,1))
393 |
394 |
395 | # Let's now fit the model and run it for one epoch.
396 |
397 | # In[14]:
398 |
399 | model.fit([Xoh, s0, c0], outputs, epochs=1, batch_size=100)
400 |
401 |
402 | # While training, you can see the loss as well as the accuracy on each of the 10 positions of the output.
403 | #
404 | #
405 | # Thus, `dense_2_acc_8: 0.89` means that you are predicting the 7th character of the output correctly 89% of the time in the current batch of data.
406 | #
407 | #
408 | # We have run this model for longer, and saved the weights. Run the next cell to load our weights. (By training a model for several minutes, you should be able to obtain a model of similar accuracy, but loading our model will save you time.)
409 |
410 | # In[15]:
411 |
412 | model.load_weights('models/model.h5')
413 |
414 |
415 | # You can now see the results on new examples.
416 |
417 | # In[16]:
418 |
419 | EXAMPLES = ['3 May 1979', '5 April 09', '21th of August 2016', 'Tue 10 Jul 2007', 'Saturday May 9 2018', 'March 3 2001', 'March 3rd 2001', '1 March 2001']
420 | for example in EXAMPLES:
421 |
422 | source = string_to_int(example, Tx, human_vocab)
423 | source = np.array(list(map(lambda x: to_categorical(x, num_classes=len(human_vocab)), source))).swapaxes(0,1)
424 | prediction = model.predict([source, s0, c0])
425 | prediction = np.argmax(prediction, axis = -1)
426 | output = [inv_machine_vocab[int(i)] for i in prediction]
427 |
428 | print("source:", example)
429 | print("output:", ''.join(output))
430 |
431 |
432 | # You can also change these examples to test with your own examples. The next part will give you a better sense on what the attention mechanism is doing--i.e., what part of the input the network is paying attention to when generating a particular output character.
433 |
434 | # ## 3 - Visualizing Attention (Optional / Ungraded)
435 | #
436 | # Since the problem has a fixed output length of 10, it is also possible to carry out this task using 10 different softmax units to generate the 10 characters of the output. But one advantage of the attention model is that each part of the output (say the month) knows it needs to depend only on a small part of the input (the characters in the input giving the month). We can visualize what part of the output is looking at what part of the input.
437 | #
438 | # Consider the task of translating "Saturday 9 May 2018" to "2018-05-09". If we visualize the computed $\alpha^{\langle t, t' \rangle}$ we get this:
439 | #
440 | #
441 | # **Figure 8**: Full Attention Map
442 | #
443 | # Notice how the output ignores the "Saturday" portion of the input. None of the output timesteps are paying much attention to that portion of the input. We also see that 9 has been translated as 09 and May has been correctly translated into 05, with the output paying attention to the parts of the input it needs in order to make the translation. The year mostly requires it to pay attention to the input's "18" in order to generate "2018."
444 | #
445 | #
446 |
447 | # ### 3.1 - Getting the activations from the network
448 | #
449 | # Let's now visualize the attention values in your network. We'll propagate an example through the network, then visualize the values of $\alpha^{\langle t, t' \rangle}$.
450 | #
451 | # To figure out where the attention values are located, let's start by printing a summary of the model.
452 |
453 | # In[17]:
454 |
455 | model.summary()
456 |
457 |
458 | # Navigate through the output of `model.summary()` above. You can see that the layer named `attention_weights` outputs the `alphas` of shape (m, 30, 1) before `dot_2` computes the context vector for every time step $t = 0, \ldots, T_y-1$. Let's get the activations from this layer.
459 | #
460 | # The function `attention_map()` pulls out the attention values from your model and plots them.
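# For the curious, here is a rough sketch of how such activations could be pulled out by hand. This is only an assumption about the general approach, not the actual `nmt_utils` implementation:
#
# ```python
# # Build a backend function returning the 'attention_weights' output of each of the Ty steps.
# attn_layer = model.get_layer('attention_weights')
# get_alphas = K.function(model.inputs, [attn_layer.get_output_at(t) for t in range(Ty)])
#
# example = "Tuesday 09 Oct 1993"
# encoded = np.array(list(map(lambda c: to_categorical(c, num_classes=len(human_vocab)),
#                             string_to_int(example, Tx, human_vocab))))
# alphas = get_alphas([encoded[np.newaxis, :, :], np.zeros((1, n_s)), np.zeros((1, n_s))])
# print(np.array(alphas).shape)  # (Ty, 1, Tx, 1): one weight per input position per output step
# ```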
461 |
462 | # In[18]:
463 |
464 | attention_map = plot_attention_map(model, human_vocab, inv_machine_vocab, "Tuesday 09 Oct 1993", num = 7, n_s = 64)
465 |
466 |
467 | # On the generated plot you can observe the values of the attention weights for each character of the predicted output. Examine this plot and check that where the network is paying attention makes sense to you.
468 | #
469 | # In the date translation application, you will observe that most of the time attention helps predict the year, and hasn't much impact on predicting the day/month.
470 |
471 | # ### Congratulations!
472 | #
473 | #
474 | # You have come to the end of this assignment
475 | #
476 | # **Here's what you should remember from this notebook**:
477 | #
478 | # - Machine translation models can be used to map from one sequence to another. They are useful not just for translating human languages (like French->English) but also for tasks like date format translation.
479 | # - An attention mechanism allows a network to focus on the most relevant parts of the input when producing a specific part of the output.
480 | # - A network using an attention mechanism can translate from inputs of length $T_x$ to outputs of length $T_y$, where $T_x$ and $T_y$ can be different.
481 | # - You can visualize attention weights $\alpha^{\langle t,t' \rangle}$ to see what the network is paying attention to while generating each output.
482 |
483 | # Congratulations on finishing this assignment! You are now able to implement an attention model and use it to learn complex mappings from one sequence to another.
484 |
--------------------------------------------------------------------------------
/py/Operations on word vectors.py:
--------------------------------------------------------------------------------
1 |
2 | # coding: utf-8
3 |
4 | # # Operations on word vectors
5 | #
6 | # Welcome to your first assignment of this week!
7 | #
8 | # Because word embeddings are very computationally expensive to train, most ML practitioners will load a pre-trained set of embeddings.
9 | #
10 | # **After this assignment you will be able to:**
11 | #
12 | # - Load pre-trained word vectors, and measure similarity using cosine similarity
13 | # - Use word embeddings to solve word analogy problems such as Man is to Woman as King is to ______.
14 | # - Modify word embeddings to reduce their gender bias
15 | #
16 | # Let's get started! Run the following cell to load the packages you will need.
17 |
18 | # In[1]:
19 |
20 | import numpy as np
21 | from w2v_utils import *
22 |
23 |
24 | # Next, let's load the word vectors. For this assignment, we will use 50-dimensional GloVe vectors to represent words. Run the following cell to load the `word_to_vec_map`.
25 |
26 | # In[2]:
27 |
28 | words, word_to_vec_map = read_glove_vecs('data/glove.6B.50d.txt')
29 |
30 |
31 | # You've loaded:
32 | # - `words`: set of words in the vocabulary.
33 | # - `word_to_vec_map`: dictionary mapping words to their GloVe vector representation.
34 | #
35 | # You've seen that one-hot vectors do not do a good job of capturing which words are similar. GloVe vectors provide much more useful information about the meaning of individual words. Let's now see how you can use GloVe vectors to decide how similar two words are.
36 | #
37 | #
38 |
39 | # # 1 - Cosine similarity
40 | #
41 | # To measure how similar two words are, we need a way to measure the degree of similarity between two embedding vectors for the two words. Given two vectors $u$ and $v$, cosine similarity is defined as follows:
42 | #
43 | # $$\text{CosineSimilarity}(u, v) = \frac{u \cdot v}{||u||_2 \, ||v||_2} = \cos(\theta) \tag{1}$$
44 | #
45 | # where $u \cdot v$ is the dot product (or inner product) of two vectors, $||u||_2$ is the norm (or length) of the vector $u$, and $\theta$ is the angle between $u$ and $v$. This similarity depends on the angle between $u$ and $v$. If $u$ and $v$ are very similar, their cosine similarity will be close to 1; if they are dissimilar, the cosine similarity will take a smaller value.
46 | #
47 | #
48 | # **Figure 1**: The cosine of the angle between two vectors is a measure of how similar they are
49 | #
50 | # **Exercise**: Implement the function `cosine_similarity()` to evaluate similarity between word vectors.
51 | #
52 | # **Reminder**: The norm of $u$ is defined as $ ||u||_2 = \sqrt{\sum_{i=1}^{n} u_i^2}$
53 |
54 | # In[3]:
55 |
56 | # GRADED FUNCTION: cosine_similarity
57 |
58 | def cosine_similarity(u, v):
59 | """
60 |     Cosine similarity reflects the degree of similarity between u and v
61 |
62 | Arguments:
63 | u -- a word vector of shape (n,)
64 | v -- a word vector of shape (n,)
65 |
66 | Returns:
67 | cosine_similarity -- the cosine similarity between u and v defined by the formula above.
68 | """
69 |
70 | distance = 0.0
71 |
72 | ### START CODE HERE ###
73 | # Compute the dot product between u and v (≈1 line)
74 | dot = np.dot(u,v)
75 | # Compute the L2 norm of u (≈1 line)
76 | norm_u = np.sqrt(np.sum(u * u))
77 |
78 | # Compute the L2 norm of v (≈1 line)
79 | norm_v = np.sqrt(np.sum(v * v))
80 | # Compute the cosine similarity defined by formula (1) (≈1 line)
81 | cosine_similarity = dot / (norm_u * norm_v)
82 | ### END CODE HERE ###
83 |
84 | return cosine_similarity
85 |
86 |
87 | # In[4]:
88 |
89 | father = word_to_vec_map["father"]
90 | mother = word_to_vec_map["mother"]
91 | ball = word_to_vec_map["ball"]
92 | crocodile = word_to_vec_map["crocodile"]
93 | france = word_to_vec_map["france"]
94 | italy = word_to_vec_map["italy"]
95 | paris = word_to_vec_map["paris"]
96 | rome = word_to_vec_map["rome"]
97 |
98 | print("cosine_similarity(father, mother) = ", cosine_similarity(father, mother))
99 | print("cosine_similarity(ball, crocodile) = ",cosine_similarity(ball, crocodile))
100 | print("cosine_similarity(france - paris, rome - italy) = ",cosine_similarity(france - paris, rome - italy))
101 |
102 |
103 | # **Expected Output**:
104 | #
105 | # | **cosine_similarity(father, mother)** | 0.890903844289 |
106 | # | **cosine_similarity(ball, crocodile)** | 0.274392462614 |
107 | # | **cosine_similarity(france - paris, rome - italy)** | -0.675147930817 |
131 |
132 | # After you get the correct expected output, please feel free to modify the inputs and measure the cosine similarity between other pairs of words! Playing around with the cosine similarity of other inputs will give you a better sense of how word vectors behave.
133 |
134 | # ## 2 - Word analogy task
135 | #
136 | # In the word analogy task, we complete the sentence "*a* is to *b* as *c* is to **____**". An example is '*man* is to *woman* as *king* is to *queen*' . In detail, we are trying to find a word *d*, such that the associated word vectors $e_a, e_b, e_c, e_d$ are related in the following manner: $e_b - e_a \approx e_d - e_c$. We will measure the similarity between $e_b - e_a$ and $e_d - e_c$ using cosine similarity.
137 | #
138 | # **Exercise**: Complete the code below to be able to perform word analogies!
139 |
140 | # In[7]:
141 |
142 | # GRADED FUNCTION: complete_analogy
143 |
144 | def complete_analogy(word_a, word_b, word_c, word_to_vec_map):
145 | """
146 | Performs the word analogy task as explained above: a is to b as c is to ____.
147 |
148 | Arguments:
149 | word_a -- a word, string
150 | word_b -- a word, string
151 | word_c -- a word, string
152 | word_to_vec_map -- dictionary that maps words to their corresponding vectors.
153 |
154 | Returns:
155 | best_word -- the word such that v_b - v_a is close to v_best_word - v_c, as measured by cosine similarity
156 | """
157 |
158 | # convert words to lower case
159 | word_a, word_b, word_c = word_a.lower(), word_b.lower(), word_c.lower()
160 |
161 | ### START CODE HERE ###
162 | # Get the word embeddings v_a, v_b and v_c (≈1-3 lines)
163 | e_a, e_b, e_c = word_to_vec_map[word_a], word_to_vec_map[word_b], word_to_vec_map[word_c]
164 | ### END CODE HERE ###
165 |
166 | words = word_to_vec_map.keys()
167 | max_cosine_sim = -100 # Initialize max_cosine_sim to a large negative number
168 | best_word = None # Initialize best_word with None, it will help keep track of the word to output
169 |
170 | # loop over the whole word vector set
171 | for w in words:
172 | # to avoid best_word being one of the input words, pass on them.
173 | if w in [word_a, word_b, word_c] :
174 | continue
175 |
176 | ### START CODE HERE ###
177 | # Compute cosine similarity between the vector (e_b - e_a) and the vector ((w's vector representation) - e_c) (≈1 line)
178 | cosine_sim = cosine_similarity(e_b - e_a, word_to_vec_map[w] - e_c)
179 |
180 | # If the cosine_sim is more than the max_cosine_sim seen so far,
181 | # then: set the new max_cosine_sim to the current cosine_sim and the best_word to the current word (≈3 lines)
182 | if cosine_sim > max_cosine_sim:
183 | max_cosine_sim = cosine_sim
184 | best_word = w
185 | ### END CODE HERE ###
186 |
187 | return best_word
188 |
189 |
190 | # Run the cell below to test your code; this may take 1-2 minutes.
191 |
192 | # In[9]:
193 |
194 | triads_to_try = [('italy', 'italian', 'china'), ('india', 'delhi', 'japan'), ('man', 'woman', 'boy'), ('small', 'smaller', 'big')]
195 | for triad in triads_to_try:
196 | print ('{} -> {} :: {} -> {}'.format( *triad, complete_analogy(*triad,word_to_vec_map)))
197 |
198 |
199 | # **Expected Output**:
200 | #
201 | # **italy -> italian** :: spain -> spanish
202 | # **india -> delhi** :: japan -> tokyo
203 | # **man -> woman** :: boy -> girl
204 | # **small -> smaller** :: large -> larger
205 | #
236 | # Once you get the correct expected output, please feel free to modify the input cells above to test your own analogies. Try to find some other analogy pairs that do work, but also find some where the algorithm doesn't give the right answer: for example, you can try small->smaller as big->? (a quick check of this case is sketched below).
237 |
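# A quick check of the failure case mentioned above (a sketch only; the exact output
# depends on the loaded embeddings, so don't treat it as a guaranteed result):
print('small -> smaller :: big -> {}'.format(complete_analogy('small', 'smaller', 'big', word_to_vec_map)))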
238 | # ### Congratulations!
239 | #
240 | # You've come to the end of this assignment. Here are the main points you should remember:
241 | #
242 | # - Cosine similarity is a good way to compare the similarity between pairs of word vectors. (Though L2 distance works too.)
243 | # - For NLP applications, using a pre-trained set of word vectors from the internet is often a good way to get started.
244 | #
245 | # Even though you have finished the graded portions, we recommend you also take a look at the rest of this notebook.
246 | #
247 | # Congratulations on finishing the graded portions of this notebook!
248 | #
249 |
250 | # ## 3 - Debiasing word vectors (OPTIONAL/UNGRADED)
251 |
252 | # In the following exercise, you will examine gender biases that can be reflected in a word embedding, and explore algorithms for reducing the bias. In addition to learning about the topic of debiasing, this exercise will also help hone your intuition about what word vectors are doing. This section involves a bit of linear algebra, though you can probably complete it even without being expert in linear algebra, and we encourage you to give it a shot. This portion of the notebook is optional and is not graded.
253 | #
254 | # Let's first see how the GloVe word embeddings relate to gender. You will first compute a vector $g = e_{woman}-e_{man}$, where $e_{woman}$ is the word vector corresponding to the word *woman*, and $e_{man}$ is the word vector corresponding to the word *man*. The resulting vector $g$ roughly encodes the concept of "gender". (You might get a more accurate representation if you compute $g_1 = e_{mother}-e_{father}$, $g_2 = e_{girl}-e_{boy}$, etc. and average over them. But just using $e_{woman}-e_{man}$ will give good enough results for now.)
255 | #
256 |
257 | # In[10]:
258 |
259 | g = word_to_vec_map['woman'] - word_to_vec_map['man']
260 | print(g)
261 |
262 |
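# A minimal sketch of the averaged variant mentioned above (assuming every listed
# word is present in word_to_vec_map):
pairs = [("woman", "man"), ("mother", "father"), ("girl", "boy")]
g_avg = np.mean([word_to_vec_map[a] - word_to_vec_map[b] for a, b in pairs], axis=0)
print(g_avg.shape)  # same 50-dimensional shape as g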
263 | # Now, you will consider the cosine similarity of different words with $g$. Consider what a positive value of similarity means vs a negative cosine similarity.
264 |
265 | # In[11]:
266 |
267 | print ('List of names and their similarities with constructed vector:')
268 |
269 | # girls and boys name
270 | name_list = ['john', 'marie', 'sophie', 'ronaldo', 'priya', 'rahul', 'danielle', 'reza', 'katy', 'yasmin']
271 |
272 | for w in name_list:
273 | print (w, cosine_similarity(word_to_vec_map[w], g))
274 |
275 |
276 | # As you can see, female first names tend to have a positive cosine similarity with our constructed vector $g$, while male first names tend to have a negative cosine similarity. This is not surprising, and the result seems acceptable.
277 | #
278 | # But let's try with some other words.
279 |
280 | # In[12]:
281 |
282 | print('Other words and their similarities:')
283 | word_list = ['lipstick', 'guns', 'science', 'arts', 'literature', 'warrior','doctor', 'tree', 'receptionist',
284 | 'technology', 'fashion', 'teacher', 'engineer', 'pilot', 'computer', 'singer']
285 | for w in word_list:
286 | print (w, cosine_similarity(word_to_vec_map[w], g))
287 |
288 |
289 | # Do you notice anything surprising? It is astonishing how these results reflect certain unhealthy gender stereotypes. For example, "computer" is closer to "man" while "literature" is closer to "woman". Ouch!
290 | #
291 | # We'll see below how to reduce the bias of these vectors, using an algorithm due to [Bolukbasi et al., 2016](https://arxiv.org/abs/1607.06520). Note that some word pairs such as "actor"/"actress" or "grandmother"/"grandfather" should remain gender-specific, while other words such as "receptionist" or "technology" should be neutralized, i.e. not be gender-related. You will have to treat these two types of words differently when debiasing.
292 | #
293 | # ### 3.1 - Neutralize bias for non-gender specific words
294 | #
295 | # The figure below should help you visualize what neutralizing does. If you're using a 50-dimensional word embedding, the 50-dimensional space can be split into two parts: the bias direction $g$, and the remaining 49 dimensions, which we'll call $g_{\perp}$. In linear algebra, we say that the 49-dimensional $g_{\perp}$ is perpendicular (or "orthogonal") to $g$, meaning it is at 90 degrees to $g$. The neutralization step takes a vector such as $e_{receptionist}$ and zeros out the component in the direction of $g$, giving us $e_{receptionist}^{debiased}$.
296 | #
297 | # Even though $g_{\perp}$ is 49 dimensional, given the limitations of what we can draw on a screen, we illustrate it using a 1 dimensional axis below.
298 | #
299 | #
300 | # **Figure 2**: The word vector for "receptionist" represented before and after applying the neutralize operation.
301 | #
302 | # **Exercise**: Implement `neutralize()` to remove the bias of words such as "receptionist" or "scientist". Given an input embedding $e$, you can use the following formulas to compute $e^{debiased}$:
303 | #
304 | # $$e^{bias\_component} = \frac{e \cdot g}{||g||_2^2} * g\tag{2}$$
305 | # $$e^{debiased} = e - e^{bias\_component}\tag{3}$$
306 | #
307 | # If you are an expert in linear algebra, you may recognize $e^{bias\_component}$ as the projection of $e$ onto the direction $g$. If you're not an expert in linear algebra, don't worry about this.
308 | #
309 | #
314 |
315 | # In[14]:
316 |
317 | def neutralize(word, g, word_to_vec_map):
318 | """
319 | Removes the bias of "word" by projecting it on the space orthogonal to the bias axis.
320 | This function ensures that gender neutral words are zero in the gender subspace.
321 |
322 | Arguments:
323 | word -- string indicating the word to debias
324 | g -- numpy-array of shape (50,), corresponding to the bias axis (such as gender)
325 | word_to_vec_map -- dictionary mapping words to their corresponding vectors.
326 |
327 | Returns:
328 | e_debiased -- neutralized word vector representation of the input "word"
329 | """
330 |
331 | ### START CODE HERE ###
332 | # Select word vector representation of "word". Use word_to_vec_map. (≈ 1 line)
333 | e = word_to_vec_map[word]
334 |
335 | # Compute e_biascomponent using the formula given above. (≈ 1 line)
336 | e_biascomponent = np.dot(e ,g) / np.sum(g * g) * g
337 |
338 | # Neutralize e by subtracting e_biascomponent from it
339 | # e_debiased should be equal to its orthogonal projection. (≈ 1 line)
340 | e_debiased = e - e_biascomponent
341 | ### END CODE HERE ###
342 |
343 | return e_debiased
344 |
345 |
346 | # In[15]:
347 |
348 | e = "receptionist"
349 | print("cosine similarity between " + e + " and g, before neutralizing: ", cosine_similarity(word_to_vec_map["receptionist"], g))
350 |
351 | e_debiased = neutralize("receptionist", g, word_to_vec_map)
352 | print("cosine similarity between " + e + " and g, after neutralizing: ", cosine_similarity(e_debiased, g))
353 |
354 |
355 | # **Expected Output**: The second result is essentially 0, up to numerical round-off (on the order of $10^{-17}$).
356 | #
357 | # **cosine similarity between receptionist and g, before neutralizing:** 0.330779417506
358 | # **cosine similarity between receptionist and g, after neutralizing:** -3.26732746085e-17
359 | #
376 | # ### 3.2 - Equalization algorithm for gender-specific words
377 | #
378 | # Next, let's see how debiasing can also be applied to word pairs such as "actress" and "actor." Equalization is applied to pairs of words that you might want to differ only through the gender property. As a concrete example, suppose that "actress" is closer to "babysit" than "actor." By applying neutralization to "babysit" we can reduce the gender stereotype associated with babysitting. But this still does not guarantee that "actor" and "actress" are equidistant from "babysit." The equalization algorithm takes care of this.
379 | #
380 | # The key idea behind equalization is to make sure that a particular pair of words is equidistant from the 49-dimensional $g_\perp$. The equalization step also ensures that the two equalized vectors are now the same distance from $e_{receptionist}^{debiased}$, or from any other word that has been neutralized. In pictures, this is how equalization works:
381 | #
382 | #
383 | #
384 | #
385 | # The derivation of the linear algebra to do this is a bit more complex. (See Bolukbasi et al., 2016 for details.) But the key equations are:
386 | #
387 | # $$ \mu = \frac{e_{w1} + e_{w2}}{2}\tag{4}$$
388 | #
389 | # $$ \mu_{B} = \frac {\mu \cdot \text{bias_axis}}{||\text{bias_axis}||_2^2} *\text{bias_axis}
390 | # \tag{5}$$
391 | #
392 | # $$\mu_{\perp} = \mu - \mu_{B} \tag{6}$$
393 | #
394 | # $$ e_{w1B} = \frac {e_{w1} \cdot \text{bias_axis}}{||\text{bias_axis}||_2^2} *\text{bias_axis}
395 | # \tag{7}$$
396 | # $$ e_{w2B} = \frac {e_{w2} \cdot \text{bias_axis}}{||\text{bias_axis}||_2^2} *\text{bias_axis}
397 | # \tag{8}$$
398 | #
399 | #
400 | # $$e_{w1B}^{corrected} = \sqrt{ \left|1 - ||\mu_{\perp} ||^2_2 \right|} * \frac{e_{\text{w1B}} - \mu_B} {||(e_{w1} - \mu_{\perp}) - \mu_B||_2} \tag{9}$$
401 | #
402 | #
403 | # $$e_{w2B}^{corrected} = \sqrt{ \left|1 - ||\mu_{\perp} ||^2_2 \right|} * \frac{e_{\text{w2B}} - \mu_B} {||(e_{w2} - \mu_{\perp}) - \mu_B||_2} \tag{10}$$
404 | #
405 | # $$e_1 = e_{w1B}^{corrected} + \mu_{\perp} \tag{11}$$
406 | # $$e_2 = e_{w2B}^{corrected} + \mu_{\perp} \tag{12}$$
407 | #
408 | #
409 | # **Exercise**: Implement the function below. Use the equations above to get the final equalized version of the pair of words. Good luck!
410 |
411 | # In[16]:
412 |
413 | def equalize(pair, bias_axis, word_to_vec_map):
414 | """
415 | Debias gender specific words by following the equalize method described in the figure above.
416 |
417 | Arguments:
418 | pair -- pair of strings of gender specific words to debias, e.g. ("actress", "actor")
419 | bias_axis -- numpy-array of shape (50,), vector corresponding to the bias axis, e.g. gender
420 | word_to_vec_map -- dictionary mapping words to their corresponding vectors
421 |
422 | Returns
423 | e_1 -- word vector corresponding to the first word
424 | e_2 -- word vector corresponding to the second word
425 | """
426 |
427 | ### START CODE HERE ###
428 | # Step 1: Select word vector representation of "word". Use word_to_vec_map. (≈ 2 lines)
429 | w1, w2 = pair
430 | e_w1, e_w2 = word_to_vec_map[w1],word_to_vec_map[w2]
431 |
432 | # Step 2: Compute the mean of e_w1 and e_w2 (≈ 1 line)
433 | mu = (e_w1 + e_w2) / 2
434 |
435 | # Step 3: Compute the projections of mu over the bias axis and the orthogonal axis (≈ 2 lines)
436 | mu_B = np.dot(mu, bias_axis) / np.sum(bias_axis * bias_axis) * bias_axis
437 | mu_orth = mu - mu_B
438 |
439 | # Step 4: Use equations (7) and (8) to compute e_w1B and e_w2B (≈2 lines)
440 | e_w1B = np.dot(e_w1, bias_axis) / np.sum(bias_axis * bias_axis) * bias_axis
441 | e_w2B = np.dot(e_w2, bias_axis) / np.sum(bias_axis * bias_axis) * bias_axis
442 |
443 | # Step 5: Adjust the Bias part of e_w1B and e_w2B using the formulas (9) and (10) given above (≈2 lines)
444 | corrected_e_w1B = np.sqrt(np.abs(1 - np.sum(mu_orth * mu_orth))) * (e_w1B - mu_B) / np.linalg.norm(e_w1 - mu_orth - mu_B)
445 | corrected_e_w2B = np.sqrt(np.abs(1 - np.sum(mu_orth * mu_orth))) * (e_w2B - mu_B) / np.linalg.norm(e_w2 - mu_orth - mu_B)
446 |
447 | # Step 6: Debias by equalizing e1 and e2 to the sum of their corrected projections (≈2 lines)
448 | e1 = corrected_e_w1B + mu_orth
449 | e2 = corrected_e_w2B + mu_orth
450 |
451 | ### END CODE HERE ###
452 |
453 | return e1, e2
454 |
455 |
456 | # In[17]:
457 |
458 | print("cosine similarities before equalizing:")
459 | print("cosine_similarity(word_to_vec_map[\"man\"], gender) = ", cosine_similarity(word_to_vec_map["man"], g))
460 | print("cosine_similarity(word_to_vec_map[\"woman\"], gender) = ", cosine_similarity(word_to_vec_map["woman"], g))
461 | print()
462 | e1, e2 = equalize(("man", "woman"), g, word_to_vec_map)
463 | print("cosine similarities after equalizing:")
464 | print("cosine_similarity(e1, gender) = ", cosine_similarity(e1, g))
465 | print("cosine_similarity(e2, gender) = ", cosine_similarity(e2, g))
466 |
467 |
468 | # **Expected Output**:
469 | #
470 | # cosine similarities before equalizing:
471 | # **cosine_similarity(word_to_vec_map["man"], gender)** = -0.117110957653
472 | # **cosine_similarity(word_to_vec_map["woman"], gender)** = 0.356666188463
473 | #
474 | # cosine similarities after equalizing:
475 | # **cosine_similarity(e1, gender)** = -0.700436428931
476 | # **cosine_similarity(e2, gender)** = 0.700436428931
477 | #
510 | # Please feel free to play with the input words in the cell above, to apply equalization to other pairs of words.
511 | #
512 | # These debiasing algorithms are very helpful for reducing bias, but are not perfect and do not eliminate all traces of bias. For example, one weakness of this implementation was that the bias direction $g$ was defined using only the pair of words _woman_ and _man_. As discussed earlier, if $g$ were defined by computing $g_1 = e_{woman} - e_{man}$; $g_2 = e_{mother} - e_{father}$; $g_3 = e_{girl} - e_{boy}$; and so on and averaging over them, you would obtain a better estimate of the "gender" dimension in the 50 dimensional word embedding space. Feel free to play with such variants as well.
513 | #
514 |
515 | # ### Congratulations
516 | #
517 | # You have come to the end of this notebook, and have seen a lot of the ways that word vectors can be used as well as modified.
518 | #
519 | # Congratulations on finishing this notebook!
520 | #
521 |
522 | # **References**:
523 | # - The debiasing algorithm is from Bolukbasi et al., 2016, [Man is to Computer Programmer as Woman is to
524 | # Homemaker? Debiasing Word Embeddings](https://papers.nips.cc/paper/6228-man-is-to-computer-programmer-as-woman-is-to-homemaker-debiasing-word-embeddings.pdf)
525 | # - The GloVe word embeddings were due to Jeffrey Pennington, Richard Socher, and Christopher D. Manning. (https://nlp.stanford.edu/projects/glove/)
526 | #
527 |
--------------------------------------------------------------------------------
/py/Python Basics With Numpy v3.py:
--------------------------------------------------------------------------------
1 |
2 | # coding: utf-8
3 |
4 | # # Python Basics with Numpy (optional assignment)
5 | #
6 | # Welcome to your first assignment. This exercise gives you a brief introduction to Python. Even if you've used Python before, this will help familiarize you with functions we'll need.
7 | #
8 | # **Instructions:**
9 | # - You will be using Python 3.
10 | # - Avoid using for-loops and while-loops, unless you are explicitly told to do so.
11 | # - Do not modify the (# GRADED FUNCTION [function name]) comment in some cells. Your work would not be graded if you change this. Each cell containing that comment should only contain one function.
12 | # - After coding your function, run the cell right below it to check if your result is correct.
13 | #
14 | # **After this assignment you will:**
15 | # - Be able to use iPython Notebooks
16 | # - Be able to use numpy functions and numpy matrix/vector operations
17 | # - Understand the concept of "broadcasting"
18 | # - Be able to vectorize code
19 | #
20 | # Let's get started!
21 |
22 | # ## About iPython Notebooks ##
23 | #
24 | # iPython Notebooks are interactive coding environments embedded in a webpage. You will be using iPython notebooks in this class. You only need to write code between the ### START CODE HERE ### and ### END CODE HERE ### comments. After writing your code, you can run the cell by either pressing "SHIFT"+"ENTER" or by clicking on "Run Cell" (denoted by a play symbol) in the upper bar of the notebook.
25 | #
26 | # We will often specify "(≈ X lines of code)" in the comments to tell you about how much code you need to write. It is just a rough estimate, so don't feel bad if your code is longer or shorter.
27 | #
28 | # **Exercise**: Set test to `"Hello World"` in the cell below to print "Hello World" and run the two cells below.
29 |
30 | # In[3]:
31 |
32 | ### START CODE HERE ### (≈ 1 line of code)
33 | test = "Hello World"
34 | ### END CODE HERE ###
35 |
36 |
37 | # In[4]:
38 |
39 | print ("test: " + test)
40 |
41 |
42 | # **Expected output**:
43 | # test: Hello World
44 |
45 | #
46 | # **What you need to remember**:
47 | # - Run your cells using SHIFT+ENTER (or "Run cell")
48 | # - Write code in the designated areas using Python 3 only
49 | # - Do not modify the code outside of the designated areas
50 |
51 | # ## 1 - Building basic functions with numpy ##
52 | #
53 | # Numpy is the main package for scientific computing in Python. It is maintained by a large community (www.numpy.org). In this exercise you will learn several key numpy functions such as np.exp, np.log, and np.reshape. You will need to know how to use these functions for future assignments.
54 | #
55 | # ### 1.1 - sigmoid function, np.exp() ###
56 | #
57 | # Before using np.exp(), you will use math.exp() to implement the sigmoid function. You will then see why np.exp() is preferable to math.exp().
58 | #
59 | # **Exercise**: Build a function that returns the sigmoid of a real number x. Use math.exp(x) for the exponential function.
60 | #
61 | # **Reminder**:
62 | # $sigmoid(x) = \frac{1}{1+e^{-x}}$ is sometimes also known as the logistic function. It is a non-linear function used not only in Machine Learning (Logistic Regression), but also in Deep Learning.
63 | #
64 | #
65 | #
66 | # To refer to a function belonging to a specific package you could call it using package_name.function(). Run the code below to see an example with math.exp().
67 |
68 | # In[13]:
69 |
70 | # GRADED FUNCTION: basic_sigmoid
71 |
72 | import math
73 |
74 | def basic_sigmoid(x):
75 | """
76 | Compute sigmoid of x.
77 |
78 | Arguments:
79 | x -- A scalar
80 |
81 | Return:
82 | s -- sigmoid(x)
83 | """
84 |
85 | ### START CODE HERE ### (≈ 1 line of code)
86 | s = 1 / (1 + math.exp(-x))
87 | ### END CODE HERE ###
88 |
89 | return s
90 |
91 |
92 | # In[14]:
93 |
94 | basic_sigmoid(3)
95 |
96 |
97 | # **Expected Output**:
98 | #
99 | # **basic_sigmoid(3)** = 0.9525741268224334
100 | #
106 | # Actually, we rarely use the "math" library in deep learning, because its functions only accept real-number inputs, whereas in deep learning we mostly work with matrices and vectors. This is why numpy is more useful.
107 |
108 | # In[15]:
109 |
110 | ### One reason why we use "numpy" instead of "math" in Deep Learning ###
111 | x = [1, 2, 3]
112 | basic_sigmoid(x) # you will see this give an error when you run it, because x is a vector.
113 |
114 |
115 | # In fact, if $ x = (x_1, x_2, ..., x_n)$ is a row vector then $np.exp(x)$ will apply the exponential function to every element of x. The output will thus be: $np.exp(x) = (e^{x_1}, e^{x_2}, ..., e^{x_n})$
116 |
117 | # In[16]:
118 |
119 | import numpy as np
120 |
121 | # example of np.exp
122 | x = np.array([1, 2, 3])
123 | print(np.exp(x)) # result is (exp(1), exp(2), exp(3))
124 |
125 |
126 | # Furthermore, if x is a vector, then a Python operation such as $s = x + 3$ or $s = \frac{1}{x}$ will output s as a vector of the same size as x.
127 |
128 | # In[17]:
129 |
130 | # example of vector operation
131 | x = np.array([1, 2, 3])
132 | print (x + 3)
133 |
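# The same holds for 1 / x: the operation is applied element-wise (a quick check,
# reusing the x defined just above):
print(1 / x)  # element-wise reciprocal of [1, 2, 3]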
134 |
135 | # Any time you need more info on a numpy function, we encourage you to look at [the official documentation](https://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.exp.html).
136 | #
137 | # You can also create a new cell in the notebook and write `np.exp?` (for example) to get quick access to the documentation.
138 | #
139 | # **Exercise**: Implement the sigmoid function using numpy.
140 | #
141 | # **Instructions**: x could now be either a real number, a vector, or a matrix. The data structures we use in numpy to represent these shapes (vectors, matrices...) are called numpy arrays. You don't need to know more for now.
142 | # $$ \text{For } x \in \mathbb{R}^n \text{, } sigmoid(x) = sigmoid\begin{pmatrix}
143 | # x_1 \\
144 | # x_2 \\
145 | # ... \\
146 | # x_n \\
147 | # \end{pmatrix} = \begin{pmatrix}
148 | # \frac{1}{1+e^{-x_1}} \\
149 | # \frac{1}{1+e^{-x_2}} \\
150 | # ... \\
151 | # \frac{1}{1+e^{-x_n}} \\
152 | # \end{pmatrix}\tag{1} $$
153 |
154 | # In[18]:
155 |
156 | # GRADED FUNCTION: sigmoid
157 |
158 | import numpy as np # this means you can access numpy functions by writing np.function() instead of numpy.function()
159 |
160 | def sigmoid(x):
161 | """
162 | Compute the sigmoid of x
163 |
164 | Arguments:
165 | x -- A scalar or numpy array of any size
166 |
167 | Return:
168 | s -- sigmoid(x)
169 | """
170 |
171 | ### START CODE HERE ### (≈ 1 line of code)
172 | s = 1 / (1 + np.exp(-x))
173 | ### END CODE HERE ###
174 |
175 | return s
176 |
177 |
178 | # In[19]:
179 |
180 | x = np.array([1, 2, 3])
181 | sigmoid(x)
182 |
183 |
184 | # **Expected Output**:
185 | #
186 | # **sigmoid([1,2,3])** = array([ 0.73105858, 0.88079708, 0.95257413])
187 | #
193 | # ### 1.2 - Sigmoid gradient
194 | #
195 | # As you've seen in lecture, you will need to compute gradients to optimize loss functions using backpropagation. Let's code your first gradient function.
196 | #
197 | # **Exercise**: Implement the function sigmoid_grad() to compute the gradient of the sigmoid function with respect to its input x. The formula is: $$sigmoid\_derivative(x) = \sigma'(x) = \sigma(x) (1 - \sigma(x))\tag{2}$$
198 | # You often code this function in two steps:
199 | # 1. Set s to be the sigmoid of x. You might find your sigmoid(x) function useful.
200 | # 2. Compute $\sigma'(x) = s(1-s)$
201 |
202 | # In[20]:
203 |
204 | # GRADED FUNCTION: sigmoid_derivative
205 |
206 | def sigmoid_derivative(x):
207 | """
208 | Compute the gradient (also called the slope or derivative) of the sigmoid function with respect to its input x.
209 | You can store the output of the sigmoid function into variables and then use it to calculate the gradient.
210 |
211 | Arguments:
212 | x -- A scalar or numpy array
213 |
214 | Return:
215 | ds -- Your computed gradient.
216 | """
217 |
218 | ### START CODE HERE ### (≈ 2 lines of code)
219 | s = 1 / (1 + np.exp(-x))
220 | ds = s * (1 - s)
221 | ### END CODE HERE ###
222 |
223 | return ds
224 |
225 |
226 | # In[21]:
227 |
228 | x = np.array([1, 2, 3])
229 | print ("sigmoid_derivative(x) = " + str(sigmoid_derivative(x)))
230 |
231 |
232 | # **Expected Output**:
233 | #
234 | # **sigmoid_derivative([1,2,3])** = [ 0.19661193 0.10499359 0.04517666]
235 | #
244 | # ### 1.3 - Reshaping arrays ###
245 | #
246 | # Two common numpy functions used in deep learning are [np.shape](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.shape.html) and [np.reshape()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html).
247 | # - X.shape is used to get the shape (dimension) of a matrix/vector X.
248 | # - X.reshape(...) is used to reshape X into some other dimension.
249 | #
250 | # For example, in computer science, an image is represented by a 3D array of shape $(length, height, depth = 3)$. However, when you read an image as the input of an algorithm you convert it to a vector of shape $(length*height*3, 1)$. In other words, you "unroll", or reshape, the 3D array into a 1D vector.
251 | #
252 | #
253 | #
254 | # **Exercise**: Implement `image2vector()` that takes an input of shape (length, height, 3) and returns a vector of shape (length\*height\*3, 1). For example, if you would like to reshape an array v of shape (a, b, c) into a vector of shape (a*b,c) you would do:
255 | # ``` python
256 | # v = v.reshape((v.shape[0]*v.shape[1], v.shape[2])) # v.shape[0] = a ; v.shape[1] = b ; v.shape[2] = c
257 | # ```
258 | # - Please don't hardcode the dimensions of the image as constants. Instead, look up the quantities you need with `image.shape[0]`, etc.
259 |
260 | # In[24]:
261 |
262 | # GRADED FUNCTION: image2vector
263 | def image2vector(image):
264 | """
265 | Argument:
266 | image -- a numpy array of shape (length, height, depth)
267 |
268 | Returns:
269 | v -- a vector of shape (length*height*depth, 1)
270 | """
271 |
272 | ### START CODE HERE ### (≈ 1 line of code)
273 | v = image.reshape((image.shape[0] * image.shape[1] * image.shape[2],1))
274 | ### END CODE HERE ###
275 |
276 | return v
277 |
278 |
279 | # In[25]:
280 |
281 | # This is a 3 by 3 by 2 array, typically images will be (num_px_x, num_px_y,3) where 3 represents the RGB values
282 | image = np.array([[[ 0.67826139, 0.29380381],
283 | [ 0.90714982, 0.52835647],
284 | [ 0.4215251 , 0.45017551]],
285 |
286 | [[ 0.92814219, 0.96677647],
287 | [ 0.85304703, 0.52351845],
288 | [ 0.19981397, 0.27417313]],
289 |
290 | [[ 0.60659855, 0.00533165],
291 | [ 0.10820313, 0.49978937],
292 | [ 0.34144279, 0.94630077]]])
293 |
294 | print ("image2vector(image) = " + str(image2vector(image)))
295 |
296 |
297 | # **Expected Output**:
298 | #
299 | # **image2vector(image)** =
300 | # [[ 0.67826139]
301 | #  [ 0.29380381]
302 | #  [ 0.90714982]
303 | #  [ 0.52835647]
304 | #  [ 0.4215251 ]
305 | #  [ 0.45017551]
306 | #  [ 0.92814219]
307 | #  [ 0.96677647]
308 | #  [ 0.85304703]
309 | #  [ 0.52351845]
310 | #  [ 0.19981397]
311 | #  [ 0.27417313]
312 | #  [ 0.60659855]
313 | #  [ 0.00533165]
314 | #  [ 0.10820313]
315 | #  [ 0.49978937]
316 | #  [ 0.34144279]
317 | #  [ 0.94630077]]
318 | #
326 | # ### 1.4 - Normalizing rows
327 | #
328 | # Another common technique we use in Machine Learning and Deep Learning is to normalize our data. It often leads to a better performance because gradient descent converges faster after normalization. Here, by normalization we mean changing x to $ \frac{x}{\| x\|} $ (dividing each row vector of x by its norm).
329 | #
330 | # For example, if $$x =
331 | # \begin{bmatrix}
332 | # 0 & 3 & 4 \\
333 | # 2 & 6 & 4 \\
334 | # \end{bmatrix}\tag{3}$$ then $$\| x\| = np.linalg.norm(x, axis = 1, keepdims = True) = \begin{bmatrix}
335 | # 5 \\
336 | # \sqrt{56} \\
337 | # \end{bmatrix}\tag{4} $$and $$ x\_normalized = \frac{x}{\| x\|} = \begin{bmatrix}
338 | # 0 & \frac{3}{5} & \frac{4}{5} \\
339 | # \frac{2}{\sqrt{56}} & \frac{6}{\sqrt{56}} & \frac{4}{\sqrt{56}} \\
340 | # \end{bmatrix}\tag{5}$$ Note that you can divide matrices of different sizes and it works fine: this is called broadcasting and you're going to learn about it in part 5.
341 | #
342 | #
343 | # **Exercise**: Implement normalizeRows() to normalize the rows of a matrix. After applying this function to an input matrix x, each row of x should be a vector of unit length (meaning length 1).
344 |
345 | # In[26]:
346 |
347 | # GRADED FUNCTION: normalizeRows
348 |
349 | def normalizeRows(x):
350 | """
351 | Implement a function that normalizes each row of the matrix x (to have unit length).
352 |
353 | Argument:
354 | x -- A numpy matrix of shape (n, m)
355 |
356 | Returns:
357 | x -- The normalized (by row) numpy matrix. You are allowed to modify x.
358 | """
359 |
360 | ### START CODE HERE ### (≈ 2 lines of code)
361 | # Compute x_norm as the norm 2 of x. Use np.linalg.norm(..., ord = 2, axis = ..., keepdims = True)
362 | x_norm = np.linalg.norm(x,ord = 2,axis = 1,keepdims = True)
363 |
364 | # Divide x by its norm.
365 | x = x / x_norm
366 | ### END CODE HERE ###
367 |
368 | return x
369 |
370 |
371 | # In[27]:
372 |
373 | x = np.array([
374 | [0, 3, 4],
375 | [1, 6, 4]])
376 | print("normalizeRows(x) = " + str(normalizeRows(x)))
377 |
378 |
379 | # **Expected Output**:
380 | #
381 | # **normalizeRows(x)** =
382 | # [[ 0. 0.6 0.8 ]
383 | #  [ 0.13736056 0.82416338 0.54944226]]
384 | #
392 | # **Note**:
393 | # In normalizeRows(), you can try to print the shapes of x_norm and x, and then rerun the assessment. You'll find out that they have different shapes. This is normal given that x_norm takes the norm of each row of x. So x_norm has the same number of rows but only 1 column. So how did it work when you divided x by x_norm? This is called broadcasting and we'll talk about it now!
394 |
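# A quick illustration of that broadcasting step (a sketch reusing the x defined in
# the cell above):
x_norm = np.linalg.norm(x, ord=2, axis=1, keepdims=True)
print("x.shape =", x.shape, ", x_norm.shape =", x_norm.shape)  # (2, 3) vs (2, 1)
print("(x / x_norm).shape =", (x / x_norm).shape)              # broadcasting gives (2, 3)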
395 | # ### 1.5 - Broadcasting and the softmax function ####
396 | # A very important concept to understand in numpy is "broadcasting". It is very useful for performing mathematical operations between arrays of different shapes. For the full details on broadcasting, you can read the official [broadcasting documentation](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
397 |
398 | # **Exercise**: Implement a softmax function using numpy. You can think of softmax as a normalizing function used when your algorithm needs to classify two or more classes. You will learn more about softmax in the second course of this specialization.
399 | #
400 | # **Instructions**:
401 | # - $ \text{for } x \in \mathbb{R}^{1\times n} \text{, } softmax(x) = softmax(\begin{bmatrix}
402 | # x_1 &&
403 | # x_2 &&
404 | # ... &&
405 | # x_n
406 | # \end{bmatrix}) = \begin{bmatrix}
407 | # \frac{e^{x_1}}{\sum_{j}e^{x_j}} &&
408 | # \frac{e^{x_2}}{\sum_{j}e^{x_j}} &&
409 | # ... &&
410 | # \frac{e^{x_n}}{\sum_{j}e^{x_j}}
411 | # \end{bmatrix} $
412 | #
413 | # - $\text{for a matrix } x \in \mathbb{R}^{m \times n} \text{, $x_{ij}$ maps to the element in the $i^{th}$ row and $j^{th}$ column of $x$, thus we have: }$ $$softmax(x) = softmax\begin{bmatrix}
414 | # x_{11} & x_{12} & x_{13} & \dots & x_{1n} \\
415 | # x_{21} & x_{22} & x_{23} & \dots & x_{2n} \\
416 | # \vdots & \vdots & \vdots & \ddots & \vdots \\
417 | # x_{m1} & x_{m2} & x_{m3} & \dots & x_{mn}
418 | # \end{bmatrix} = \begin{bmatrix}
419 | # \frac{e^{x_{11}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{12}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{13}}}{\sum_{j}e^{x_{1j}}} & \dots & \frac{e^{x_{1n}}}{\sum_{j}e^{x_{1j}}} \\
420 | # \frac{e^{x_{21}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{22}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{23}}}{\sum_{j}e^{x_{2j}}} & \dots & \frac{e^{x_{2n}}}{\sum_{j}e^{x_{2j}}} \\
421 | # \vdots & \vdots & \vdots & \ddots & \vdots \\
422 | # \frac{e^{x_{m1}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m2}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m3}}}{\sum_{j}e^{x_{mj}}} & \dots & \frac{e^{x_{mn}}}{\sum_{j}e^{x_{mj}}}
423 | # \end{bmatrix} = \begin{pmatrix}
424 | # softmax\text{(first row of x)} \\
425 | # softmax\text{(second row of x)} \\
426 | # ... \\
427 | # softmax\text{(last row of x)} \\
428 | # \end{pmatrix} $$
429 |
430 | # In[28]:
431 |
432 | # GRADED FUNCTION: softmax
433 |
434 | def softmax(x):
435 | """Calculates the softmax for each row of the input x.
436 |
437 | Your code should work for a row vector and also for matrices of shape (n, m).
438 |
439 | Argument:
440 | x -- A numpy matrix of shape (n,m)
441 |
442 | Returns:
443 | s -- A numpy matrix equal to the softmax of x, of shape (n,m)
444 | """
445 |
446 | ### START CODE HERE ### (≈ 3 lines of code)
447 | # Apply exp() element-wise to x. Use np.exp(...).
448 | x_exp = np.exp(x)
449 |
450 | # Create a vector x_sum that sums each row of x_exp. Use np.sum(..., axis = 1, keepdims = True).
451 | x_sum = np.sum(x_exp,axis = 1,keepdims = True)
452 |
453 | # Compute softmax(x) by dividing x_exp by x_sum. It should automatically use numpy broadcasting.
454 | s = x_exp / x_sum
455 |
456 | ### END CODE HERE ###
457 |
458 | return s
459 |
460 |
461 | # In[29]:
462 |
463 | x = np.array([
464 | [9, 2, 5, 0, 0],
465 | [7, 5, 0, 0 ,0]])
466 | print("softmax(x) = " + str(softmax(x)))
467 |
468 |
469 | # **Expected Output**:
470 | #
471 | # **softmax(x)** =
472 | # [[ 9.80897665e-01 8.94462891e-04 1.79657674e-02 1.21052389e-04 1.21052389e-04]
473 | #  [ 8.78679856e-01 1.18916387e-01 8.01252314e-04 8.01252314e-04 8.01252314e-04]]
474 | #
483 | # **Note**:
484 | # - If you print the shapes of x_exp, x_sum and s above and rerun the assessment cell, you will see that x_sum is of shape (2,1) while x_exp and s are of shape (2,5). **x_exp/x_sum** works due to python broadcasting.
485 | #
486 | # Congratulations! You now have a pretty good understanding of python numpy and have implemented a few useful functions that you will be using in deep learning.
487 |
488 | #
489 | # **What you need to remember:**
490 | # - np.exp(x) works for any np.array x and applies the exponential function to every coordinate
491 | # - the sigmoid function and its gradient
492 | # - image2vector is commonly used in deep learning
493 | # - np.reshape is widely used. In the future, you'll see that keeping your matrix/vector dimensions straight will go toward eliminating a lot of bugs.
494 | # - numpy has efficient built-in functions
495 | # - broadcasting is extremely useful
496 |
497 | # ## 2) Vectorization
498 |
499 | #
500 | # In deep learning, you deal with very large datasets. Hence, a non-computationally-optimal function can become a huge bottleneck in your algorithm and can result in a model that takes ages to run. To make sure that your code is computationally efficient, you will use vectorization. For example, try to tell the difference between the following implementations of the dot/outer/elementwise product.
501 |
502 | # In[30]:
503 |
504 | import time
505 |
506 | x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
507 | x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]
508 |
509 | ### CLASSIC DOT PRODUCT OF VECTORS IMPLEMENTATION ###
510 | tic = time.process_time()
511 | dot = 0
512 | for i in range(len(x1)):
513 | dot+= x1[i]*x2[i]
514 | toc = time.process_time()
515 | print ("dot = " + str(dot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")
516 |
517 | ### CLASSIC OUTER PRODUCT IMPLEMENTATION ###
518 | tic = time.process_time()
519 | outer = np.zeros((len(x1),len(x2))) # we create a len(x1)*len(x2) matrix with only zeros
520 | for i in range(len(x1)):
521 | for j in range(len(x2)):
522 | outer[i,j] = x1[i]*x2[j]
523 | toc = time.process_time()
524 | print ("outer = " + str(outer) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")
525 |
526 | ### CLASSIC ELEMENTWISE IMPLEMENTATION ###
527 | tic = time.process_time()
528 | mul = np.zeros(len(x1))
529 | for i in range(len(x1)):
530 | mul[i] = x1[i]*x2[i]
531 | toc = time.process_time()
532 | print ("elementwise multiplication = " + str(mul) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")
533 |
534 | ### CLASSIC GENERAL DOT PRODUCT IMPLEMENTATION ###
535 | W = np.random.rand(3,len(x1)) # Random 3*len(x1) numpy array
536 | tic = time.process_time()
537 | gdot = np.zeros(W.shape[0])
538 | for i in range(W.shape[0]):
539 | for j in range(len(x1)):
540 | gdot[i] += W[i,j]*x1[j]
541 | toc = time.process_time()
542 | print ("gdot = " + str(gdot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")
543 |
544 |
545 | # In[31]:
546 |
547 | x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
548 | x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]
549 |
550 | ### VECTORIZED DOT PRODUCT OF VECTORS ###
551 | tic = time.process_time()
552 | dot = np.dot(x1,x2)
553 | toc = time.process_time()
554 | print ("dot = " + str(dot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")
555 |
556 | ### VECTORIZED OUTER PRODUCT ###
557 | tic = time.process_time()
558 | outer = np.outer(x1,x2)
559 | toc = time.process_time()
560 | print ("outer = " + str(outer) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")
561 |
562 | ### VECTORIZED ELEMENTWISE MULTIPLICATION ###
563 | tic = time.process_time()
564 | mul = np.multiply(x1,x2)
565 | toc = time.process_time()
566 | print ("elementwise multiplication = " + str(mul) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")
567 |
568 | ### VECTORIZED GENERAL DOT PRODUCT ###
569 | tic = time.process_time()
570 | dot = np.dot(W,x1)
571 | toc = time.process_time()
572 | print ("gdot = " + str(dot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")
573 |
574 |
575 | # As you may have noticed, the vectorized implementation is much cleaner and more efficient. For bigger vectors/matrices, the differences in running time become even bigger.
576 | #
577 | # **Note** that `np.dot()` performs a matrix-matrix or matrix-vector multiplication. This is different from `np.multiply()` and the `*` operator (which is equivalent to `.*` in Matlab/Octave), both of which perform an element-wise multiplication.
578 |
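# A small side-by-side contrast of the two products (an illustrative sketch):
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])
print(A * B)         # element-wise product: [[ 5 12] [21 32]]
print(np.dot(A, B))  # matrix product:       [[19 22] [43 50]]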
579 | # ### 2.1 Implement the L1 and L2 loss functions
580 | #
581 | # **Exercise**: Implement the numpy vectorized version of the L1 loss. You may find the function abs(x) (absolute value of x) useful.
582 | #
583 | # **Reminder**:
584 | # - The loss is used to evaluate the performance of your model. The bigger your loss is, the more different your predictions ($ \hat{y} $) are from the true values ($y$). In deep learning, you use optimization algorithms like Gradient Descent to train your model and to minimize the cost.
585 | # - L1 loss is defined as:
586 | # $$\begin{align*} & L_1(\hat{y}, y) = \sum_{i=0}^m|y^{(i)} - \hat{y}^{(i)}| \end{align*}\tag{6}$$
587 |
588 | # In[41]:
589 |
590 | # GRADED FUNCTION: L1
591 |
592 | def L1(yhat, y):
593 | """
594 | Arguments:
595 | yhat -- vector of size m (predicted labels)
596 | y -- vector of size m (true labels)
597 |
598 | Returns:
599 | loss -- the value of the L1 loss function defined above
600 | """
601 |
602 | ### START CODE HERE ### (≈ 1 line of code)
603 | loss = np.sum(np.abs(yhat-y),axis = 0)
604 | ### END CODE HERE ###
605 |
606 | return loss
607 |
608 |
609 | # In[42]:
610 |
611 | yhat = np.array([.9, 0.2, 0.1, .4, .9])
612 | y = np.array([1, 0, 0, 1, 1])
613 | print("L1 = " + str(L1(yhat,y)))
614 |
615 |
616 | # **Expected Output**:
617 | #
618 | # **L1** = 1.1
619 | #
627 | # **Exercise**: Implement the numpy vectorized version of the L2 loss. There are several ways of implementing the L2 loss, but you may find the function np.dot() useful. As a reminder, if $x = [x_1, x_2, ..., x_n]$, then `np.dot(x,x)` = $\sum_{j=1}^n x_j^{2}$.
628 | #
629 | # - L2 loss is defined as $$\begin{align*} & L_2(\hat{y},y) = \sum_{i=0}^m(y^{(i)} - \hat{y}^{(i)})^2 \end{align*}\tag{7}$$
630 |
631 | # In[45]:
632 |
633 | # GRADED FUNCTION: L2
634 |
635 | def L2(yhat, y):
636 | """
637 | Arguments:
638 | yhat -- vector of size m (predicted labels)
639 | y -- vector of size m (true labels)
640 |
641 | Returns:
642 | loss -- the value of the L2 loss function defined above
643 | """
644 |
645 | ### START CODE HERE ### (≈ 1 line of code)
646 | loss = np.dot(np.abs(yhat-y),np.abs(yhat-y))
647 | ### END CODE HERE ###
648 |
649 | return loss
650 |
651 |
652 | # In[46]:
653 |
654 | yhat = np.array([.9, 0.2, 0.1, .4, .9])
655 | y = np.array([1, 0, 0, 1, 1])
656 | print("L2 = " + str(L2(yhat,y)))
657 |
658 |
659 | # **Expected Output**:
660 | #
661 | # **L2** = 0.43
662 | #
667 | # Congratulations on completing this assignment. We hope that this little warm-up exercise helps you in the future assignments, which will be more exciting and interesting!
668 |
669 | #
670 | # **What to remember:**
671 | # - Vectorization is very important in deep learning. It provides computational efficiency and clarity.
672 | # - You have reviewed the L1 and L2 loss.
673 | # - You are familiar with many numpy functions such as np.sum, np.dot, np.multiply, np.maximum, etc...
674 |
--------------------------------------------------------------------------------