├── .github └── ISSUE_TEMPLATE │ ├── asking-questions.md │ ├── bug_report.md │ ├── clarification.md │ └── feature_request.md ├── Assets └── banner.png ├── Course 1-Neural Networks & Deep Learning ├── Week 1 │ └── Week 1 Introduction to Deep Learning.pdf ├── Week 2 │ ├── Logistic_Regression_with_a_Neural_Network_mindset.ipynb │ ├── Logistic_Regression_with_a_Neural_Network_mindset.py │ ├── Python_Basics_with_Numpy.ipynb │ ├── Python_Basics_with_Numpy.py │ └── Week 2 Neural Network Basics.pdf ├── Week 3 │ ├── Planar_data_classification_with_one_hidden_layer.ipynb │ ├── Planar_data_classification_with_one_hidden_layer.py │ └── Week 3 Shallow Neural Networks.pdf └── Week 4 │ ├── Building_your_Deep_Neural_Network_Step_by_Step.ipynb │ ├── Building_your_Deep_Neural_Network_Step_by_Step.py │ ├── Deep Neural Network - Application.ipynb │ ├── Deep Neural Network - Application.py │ └── Week 4 Key Concepts on Deep Neural Networks.pdf ├── Course 2-Improving Deep Neural Networks ├── Week 1 │ ├── Gradient_Checking.ipynb │ ├── Gradient_Checking.py │ ├── Initialization.ipynb │ ├── Initialization.py │ ├── Regularization.ipynb │ ├── Regularization.py │ └── Week 1 Practical aspects of Deep Learning.pdf ├── Week 2 │ ├── Optimization_methods.ipynb │ ├── Optimization_methods.py │ └── Week 2 Optimization Algorithms.pdf └── Week 3 │ ├── Tensorflow_introduction.ipynb │ ├── Tensorflow_introduction.py │ └── Week 3 Hyperparameter tuning, Batch Normalization, Programming Frameworks.pdf ├── Course 3-Structuring MachineLearningProjects ├── Week 1 Bird Recognition in the City of Peacetopia Case Study.pdf └── Week 2 Autonomous Driving Case Study.pdf ├── Course 4-ConvolutionalNeuralNetworks ├── Week 1 │ ├── Convolution_model_Application.ipynb │ ├── Convolution_model_Application.py │ ├── Convolution_model_Step_by_Step_v1.ipynb │ ├── Convolution_model_Step_by_Step_v1.py │ └── Week 1 The Basics of ConvNets.pdf ├── Week 2 │ ├── Residual_Networks.ipynb │ ├── Residual_Networks.py │ ├── Transfer_learning_with_MobileNet_v1.ipynb │ ├── Transfer_learning_with_MobileNet_v1.py │ └── Week 2 Deep Convolutional Models.pdf ├── Week 3 │ ├── Autonomous_driving_application_Car_detection.ipynb │ ├── Autonomous_driving_application_Car_detection.py │ ├── Image_segmentation_Unet_v2.ipynb │ ├── Image_segmentation_Unet_v2.py │ └── Week 3 Detection Algorithms.pdf └── Week 4 │ ├── Art_Generation_with_Neural_Style_Transfer.ipynb │ ├── Art_Generation_with_Neural_Style_Transfer.py │ ├── Face_Recognition.ipynb │ ├── Face_Recognition.py │ └── Week 4 Special Applications Face Recognition and Neural Style Transfer.pdf ├── Course 5-SequenceModels ├── Week 1 │ ├── Building_a_Recurrent_Neural_Network_Step_by_Step.ipynb │ ├── Building_a_Recurrent_Neural_Network_Step_by_Step.py │ ├── Dinosaurus_Island_Character_level_language_model.ipynb │ ├── Dinosaurus_Island_Character_level_language_model.py │ ├── Improvise_a_Jazz_Solo_with_an_LSTM_Network_v4.ipynb │ ├── Improvise_a_Jazz_Solo_with_an_LSTM_Network_v4.py │ └── Week 1 Recurrent Neural Networks.pdf ├── Week 2 │ ├── Emoji_v3a.ipynb │ ├── Emoji_v3a.py │ ├── Operations_on_word_vectors_v2a.ipynb │ ├── Operations_on_word_vectors_v2a.py │ └── Week 2 Natural Language Processing and Word Embeddings.pdf ├── Week 3 │ ├── Neural_machine_translation_with_attention_v4a.ipynb │ ├── Neural_machine_translation_with_attention_v4a.py │ ├── Trigger_word_detection_v2a.ipynb │ ├── Trigger_word_detection_v2a.py │ └── Week 3 Sequence models and Attention Mechanism.pdf └── Week 4 │ ├── C5_W4_A1_Transformer_Subclass_v1.ipynb │ ├── 
C5_W4_A1_Transformer_Subclass_v1.py │ └── Week 4 Transformers.pdf ├── LICENSE └── README.md /.github/ISSUE_TEMPLATE/asking-questions.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Asking Questions 3 | about: For Asking Question about the repo. 4 | title: '' 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | 11 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/bug_report.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Bug report 3 | about: Create a report to help us improve 4 | title: '' 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | **Describe the bug** 11 | A clear and concise description of what the bug is. 12 | 13 | **To Reproduce** 14 | Steps to reproduce the behavior: 15 | 1. Go to '...' 16 | 2. Click on '....' 17 | 3. Scroll down to '....' 18 | 4. See error 19 | 20 | **Expected behavior** 21 | A clear and concise description of what you expected to happen. 22 | 23 | **Screenshots** 24 | If applicable, add screenshots to help explain your problem. 25 | 26 | **Desktop (please complete the following information):** 27 | - OS: [e.g. iOS] 28 | - Browser [e.g. chrome, safari] 29 | - Version [e.g. 22] 30 | 31 | **Smartphone (please complete the following information):** 32 | - Device: [e.g. iPhone6] 33 | - OS: [e.g. iOS8.1] 34 | - Browser [e.g. stock browser, safari] 35 | - Version [e.g. 22] 36 | 37 | **Additional context** 38 | Add any other context about the problem here. 39 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/clarification.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Clarification 3 | about: Question about files and code versions. 4 | title: '' 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | 11 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/feature_request.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Feature request 3 | about: Suggest an idea for this project 4 | title: '' 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | **Is your feature request related to a problem? Please describe.** 11 | A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] 12 | 13 | **Describe the solution you'd like** 14 | A clear and concise description of what you want to happen. 15 | 16 | **Describe alternatives you've considered** 17 | A clear and concise description of any alternative solutions or features you've considered. 18 | 19 | **Additional context** 20 | Add any other context or screenshots about the feature request here. 
21 | -------------------------------------------------------------------------------- /Assets/banner.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Assets/banner.png -------------------------------------------------------------------------------- /Course 1-Neural Networks & Deep Learning/Week 1/Week 1 Introduction to Deep Learning.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 1-Neural Networks & Deep Learning/Week 1/Week 1 Introduction to Deep Learning.pdf -------------------------------------------------------------------------------- /Course 1-Neural Networks & Deep Learning/Week 2/Python_Basics_with_Numpy.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding: utf-8 3 | 4 | # # Python Basics with Numpy (optional assignment) 5 | # 6 | # Welcome to your first assignment. This exercise gives you a brief introduction to Python. Even if you've used Python before, this will help familiarize you with the functions we'll need. 7 | # 8 | # **Instructions:** 9 | # - You will be using Python 3. 10 | # - Avoid using for-loops and while-loops, unless you are explicitly told to do so. 11 | # - After coding your function, run the cell right below it to check if your result is correct. 12 | # 13 | # **After this assignment you will:** 14 | # - Be able to use iPython Notebooks 15 | # - Be able to use numpy functions and numpy matrix/vector operations 16 | # - Understand the concept of "broadcasting" 17 | # - Be able to vectorize code 18 | # 19 | # Let's get started! 20 | # 21 | # ## Important Note on Submission to the AutoGrader 22 | # 23 | # Before submitting your assignment to the AutoGrader, please make sure you are not doing the following: 24 | # 25 | # 1. You have not added any _extra_ `print` statement(s) in the assignment. 26 | # 2. You have not added any _extra_ code cell(s) in the assignment. 27 | # 3. You have not changed any of the function parameters. 28 | # 4. You are not using any global variables inside your graded exercises. Unless specifically instructed to do so, please refrain from it and use the local variables instead. 29 | # 5. You are not changing the assignment code where it is not required, like creating _extra_ variables. 30 | # 31 | # If you do any of the following, you will get something like, `Grader not found` (or similarly unexpected) error upon submitting your assignment. Before asking for help/debugging the errors in your assignment, check for these first. If this is the case, and you don't remember the changes you have made, you can get a fresh copy of the assignment by following these [instructions](https://www.coursera.org/learn/neural-networks-deep-learning/supplement/iLwon/h-ow-to-refresh-your-workspace). 
32 | 33 | # ## Table of Contents 34 | # - [About iPython Notebooks](#0) 35 | # - [Exercise 1](#ex-1) 36 | # - [1 - Building basic functions with numpy](#1) 37 | # - [1.1 - sigmoid function, np.exp()](#1-1) 38 | # - [Exercise 2 - basic_sigmoid](#ex-2) 39 | # - [Exercise 3 - sigmoid](#ex-3) 40 | # - [1.2 - Sigmoid Gradient](#1-2) 41 | # - [Exercise 4 - sigmoid_derivative](#ex-4) 42 | # - [1.3 - Reshaping arrays](#1-3) 43 | # - [Exercise 5 - image2vector](#ex-5) 44 | # - [1.4 - Normalizing rows](#1-4) 45 | # - [Exercise 6 - normalize_rows](#ex-6) 46 | # - [Exercise 7 - softmax](#ex-7) 47 | # - [2 - Vectorization](#2) 48 | # - [2.1 Implement the L1 and L2 loss functions](#2-1) 49 | # - [Exercise 8 - L1](#ex-8) 50 | # - [Exercise 9 - L2](#ex-9) 51 | 52 | # 53 | # ## About iPython Notebooks ## 54 | # 55 | # iPython Notebooks are interactive coding environments embedded in a webpage. You will be using iPython notebooks in this class. You only need to write code between the # your code here comment. After writing your code, you can run the cell by either pressing "SHIFT"+"ENTER" or by clicking on "Run Cell" (denoted by a play symbol) in the upper bar of the notebook. 56 | # 57 | # We will often specify "(≈ X lines of code)" in the comments to tell you about how much code you need to write. It is just a rough estimate, so don't feel bad if your code is longer or shorter. 58 | # 59 | # 60 | # ### Exercise 1 61 | # Set test to `"Hello World"` in the cell below to print "Hello World" and run the two cells below. 62 | 63 | # In[1]: 64 | 65 | 66 | # (≈ 1 line of code) 67 | # test = 68 | # YOUR CODE STARTS HERE 69 | test = "Hello World" 70 | 71 | # YOUR CODE ENDS HERE 72 | 73 | 74 | # In[2]: 75 | 76 | 77 | print ("test: " + test) 78 | 79 | 80 | # **Expected output**: 81 | # test: Hello World 82 | 83 | # 84 | # What you need to remember : 85 | # 86 | # - Run your cells using SHIFT+ENTER (or "Run cell") 87 | # - Write code in the designated areas using Python 3 only 88 | # - Do not modify the code outside of the designated areas 89 | 90 | # 91 | # ## 1 - Building basic functions with numpy ## 92 | # 93 | # Numpy is the main package for scientific computing in Python. It is maintained by a large community (www.numpy.org). In this exercise you will learn several key numpy functions such as `np.exp`, `np.log`, and `np.reshape`. You will need to know how to use these functions for future assignments. 94 | # 95 | # 96 | # ### 1.1 - sigmoid function, np.exp() ### 97 | # 98 | # Before using `np.exp()`, you will use `math.exp()` to implement the sigmoid function. You will then see why `np.exp()` is preferable to `math.exp()`. 99 | # 100 | # 101 | # ### Exercise 2 - basic_sigmoid 102 | # Build a function that returns the sigmoid of a real number x. Use `math.exp(x)` for the exponential function. 103 | # 104 | # **Reminder**: 105 | # $sigmoid(x) = \frac{1}{1+e^{-x}}$ is sometimes also known as the logistic function. It is a non-linear function used not only in Machine Learning (Logistic Regression), but also in Deep Learning. 106 | # 107 | # 108 | # 109 | # To refer to a function belonging to a specific package you could call it using `package_name.function()`. Run the code below to see an example with `math.exp()`. 110 | 111 | # In[6]: 112 | 113 | 114 | import math 115 | from public_tests import * 116 | 117 | # GRADED FUNCTION: basic_sigmoid 118 | 119 | def basic_sigmoid(x): 120 | """ 121 | Compute sigmoid of x. 
122 | 123 | Arguments: 124 | x -- A scalar 125 | 126 | Return: 127 | s -- sigmoid(x) 128 | """ 129 | # (≈ 1 line of code) 130 | # s = 131 | # YOUR CODE STARTS HERE 132 | s = 1/(1+math.exp(-x)) 133 | 134 | # YOUR CODE ENDS HERE 135 | 136 | return s 137 | 138 | 139 | # In[7]: 140 | 141 | 142 | print("basic_sigmoid(1) = " + str(basic_sigmoid(1))) 143 | 144 | basic_sigmoid_test(basic_sigmoid) 145 | 146 | 147 | # Actually, we rarely use the "math" library in deep learning because the inputs of the functions are real numbers. In deep learning we mostly use matrices and vectors. This is why numpy is more useful. 148 | 149 | # In[8]: 150 | 151 | 152 | ### One reason why we use "numpy" instead of "math" in Deep Learning ### 153 | 154 | x = [1, 2, 3] # x becomes a python list object 155 | basic_sigmoid(x) # you will see this give an error when you run it, because x is a vector. 156 | 157 | 158 | # In fact, if $ x = (x_1, x_2, ..., x_n)$ is a row vector then `np.exp(x)` will apply the exponential function to every element of x. The output will thus be: `np.exp(x) = (e^{x_1}, e^{x_2}, ..., e^{x_n})` 159 | 160 | # In[9]: 161 | 162 | 163 | import numpy as np 164 | 165 | # example of np.exp 166 | t_x = np.array([1, 2, 3]) 167 | print(np.exp(t_x)) # result is (exp(1), exp(2), exp(3)) 168 | 169 | 170 | # Furthermore, if x is a vector, then a Python operation such as $s = x + 3$ or $s = \frac{1}{x}$ will output s as a vector of the same size as x. 171 | 172 | # In[10]: 173 | 174 | 175 | # example of vector operation 176 | t_x = np.array([1, 2, 3]) 177 | print (t_x + 3) 178 | 179 | 180 | # Any time you need more info on a numpy function, we encourage you to look at [the official documentation](https://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.exp.html). 181 | # 182 | # You can also create a new cell in the notebook and write `np.exp?` (for example) to get quick access to the documentation. 183 | # 184 | # 185 | # ### Exercise 3 - sigmoid 186 | # Implement the sigmoid function using numpy. 187 | # 188 | # **Instructions**: x could now be either a real number, a vector, or a matrix. The data structures we use in numpy to represent these shapes (vectors, matrices...) are called numpy arrays. You don't need to know more for now. 189 | # $$ \text{For } x \in \mathbb{R}^n \text{, } sigmoid(x) = sigmoid\begin{pmatrix} 190 | # x_1 \\ 191 | # x_2 \\ 192 | # ... \\ 193 | # x_n \\ 194 | # \end{pmatrix} = \begin{pmatrix} 195 | # \frac{1}{1+e^{-x_1}} \\ 196 | # \frac{1}{1+e^{-x_2}} \\ 197 | # ... \\ 198 | # \frac{1}{1+e^{-x_n}} \\ 199 | # \end{pmatrix}\tag{1} $$ 200 | 201 | # In[11]: 202 | 203 | 204 | # GRADED FUNCTION: sigmoid 205 | import numpy as np 206 | 207 | def sigmoid(x): 208 | """ 209 | Compute the sigmoid of x 210 | 211 | Arguments: 212 | x -- A scalar or numpy array of any size 213 | 214 | Return: 215 | s -- sigmoid(x) 216 | """ 217 | 218 | # (≈ 1 line of code) 219 | # s = 220 | # YOUR CODE STARTS HERE 221 | s=1/(1+np.exp(-x)) 222 | # YOUR CODE ENDS HERE 223 | 224 | return s 225 | 226 | 227 | # In[12]: 228 | 229 | 230 | t_x = np.array([1, 2, 3]) 231 | print("sigmoid(t_x) = " + str(sigmoid(t_x))) 232 | 233 | sigmoid_test(sigmoid) 234 | 235 | 236 | # 237 | # ### 1.2 - Sigmoid Gradient 238 | # 239 | # As you've seen in lecture, you will need to compute gradients to optimize loss functions using backpropagation. Let's code your first gradient function. 
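# As a quick optional sanity check (a minimal standalone sketch, separate from the graded cells; `_sigmoid` is just a local stand-in for the sigmoid implemented above), the closed-form derivative sigmoid(x) * (1 - sigmoid(x)) that the next exercise asks for can be compared against a two-sided finite-difference estimate:

import numpy as np

def _sigmoid(z):
    return 1 / (1 + np.exp(-z))                                  # same formula as the sigmoid above

z = np.array([-2.0, 0.0, 3.0])
eps = 1e-7
numeric = (_sigmoid(z + eps) - _sigmoid(z - eps)) / (2 * eps)    # finite-difference estimate of the slope
analytic = _sigmoid(z) * (1 - _sigmoid(z))                       # closed-form derivative
print(np.allclose(numeric, analytic, atol=1e-6))                 # expected: True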
240 | # 241 | # 242 | # ### Exercise 4 - sigmoid_derivative 243 | # Implement the function sigmoid_grad() to compute the gradient of the sigmoid function with respect to its input x. The formula is: 244 | # 245 | # $$sigmoid\_derivative(x) = \sigma'(x) = \sigma(x) (1 - \sigma(x))\tag{2}$$ 246 | # 247 | # You often code this function in two steps: 248 | # 1. Set s to be the sigmoid of x. You might find your sigmoid(x) function useful. 249 | # 2. Compute $\sigma'(x) = s(1-s)$ 250 | 251 | # In[13]: 252 | 253 | 254 | # GRADED FUNCTION: sigmoid_derivative 255 | 256 | def sigmoid_derivative(x): 257 | """ 258 | Compute the gradient (also called the slope or derivative) of the sigmoid function with respect to its input x. 259 | You can store the output of the sigmoid function into variables and then use it to calculate the gradient. 260 | 261 | Arguments: 262 | x -- A scalar or numpy array 263 | 264 | Return: 265 | ds -- Your computed gradient. 266 | """ 267 | 268 | #(≈ 2 lines of code) 269 | # s = 270 | # ds = 271 | # YOUR CODE STARTS HERE 272 | s = 1/(1+np.exp(-x)) 273 | ds = s*(1-s) 274 | # YOUR CODE ENDS HERE 275 | 276 | return ds 277 | 278 | 279 | # In[14]: 280 | 281 | 282 | t_x = np.array([1, 2, 3]) 283 | print ("sigmoid_derivative(t_x) = " + str(sigmoid_derivative(t_x))) 284 | 285 | sigmoid_derivative_test(sigmoid_derivative) 286 | 287 | 288 | # 289 | # ### 1.3 - Reshaping arrays ### 290 | # 291 | # Two common numpy functions used in deep learning are [np.shape](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.shape.html) and [np.reshape()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html). 292 | # - X.shape is used to get the shape (dimension) of a matrix/vector X. 293 | # - X.reshape(...) is used to reshape X into some other dimension. 294 | # 295 | # For example, in computer science, an image is represented by a 3D array of shape $(length, height, depth = 3)$. However, when you read an image as the input of an algorithm you convert it to a vector of shape $(length*height*3, 1)$. In other words, you "unroll", or reshape, the 3D array into a 1D vector. 296 | # 297 | # 298 | # 299 | # 300 | # ### Exercise 5 - image2vector 301 | # Implement `image2vector()` that takes an input of shape (length, height, 3) and returns a vector of shape (length\*height\*3, 1). For example, if you would like to reshape an array v of shape (a, b, c) into a vector of shape (a*b,c) you would do: 302 | # ``` python 303 | # v = v.reshape((v.shape[0] * v.shape[1], v.shape[2])) # v.shape[0] = a ; v.shape[1] = b ; v.shape[2] = c 304 | # ``` 305 | # - Please don't hardcode the dimensions of image as a constant. Instead look up the quantities you need with `image.shape[0]`, etc. 306 | # - You can use v = v.reshape(-1, 1). Just make sure you understand why it works. 
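# To see why the `-1` reshape works, here is a small standalone sketch on a toy array (nothing beyond numpy is assumed): the explicit product of the dimensions and the inferred `-1` give the same (a*b*c, 1) column vector.

import numpy as np

toy = np.arange(24).reshape(2, 3, 4)                                   # stand-in for a (length, height, depth) image
explicit = toy.reshape(toy.shape[0] * toy.shape[1] * toy.shape[2], 1)  # spell out 2 * 3 * 4 = 24
inferred = toy.reshape(-1, 1)                                          # let numpy infer the 24 from the remaining size
print(explicit.shape, inferred.shape)                                  # (24, 1) (24, 1)
print(np.array_equal(explicit, inferred))                              # True: identical row-major unrolling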
307 | 308 | # In[15]: 309 | 310 | 311 | # GRADED FUNCTION:image2vector 312 | 313 | def image2vector(image): 314 | """ 315 | Argument: 316 | image -- a numpy array of shape (length, height, depth) 317 | 318 | Returns: 319 | v -- a vector of shape (length*height*depth, 1) 320 | """ 321 | 322 | # (≈ 1 line of code) 323 | # v = 324 | # YOUR CODE STARTS HERE 325 | v = image.reshape((image.shape[2] * image.shape[1] * image.shape[0], 1)) 326 | # YOUR CODE ENDS HERE 327 | 328 | return v 329 | 330 | 331 | # In[16]: 332 | 333 | 334 | # This is a 3 by 3 by 2 array, typically images will be (num_px_x, num_px_y,3) where 3 represents the RGB values 335 | t_image = np.array([[[ 0.67826139, 0.29380381], 336 | [ 0.90714982, 0.52835647], 337 | [ 0.4215251 , 0.45017551]], 338 | 339 | [[ 0.92814219, 0.96677647], 340 | [ 0.85304703, 0.52351845], 341 | [ 0.19981397, 0.27417313]], 342 | 343 | [[ 0.60659855, 0.00533165], 344 | [ 0.10820313, 0.49978937], 345 | [ 0.34144279, 0.94630077]]]) 346 | 347 | print ("image2vector(image) = " + str(image2vector(t_image))) 348 | 349 | image2vector_test(image2vector) 350 | 351 | 352 | # 353 | # ### 1.4 - Normalizing rows 354 | # 355 | # Another common technique we use in Machine Learning and Deep Learning is to normalize our data. It often leads to a better performance because gradient descent converges faster after normalization. Here, by normalization we mean changing x to $ \frac{x}{\| x\|} $ (dividing each row vector of x by its norm). 356 | # 357 | # For example, if 358 | # $$x = \begin{bmatrix} 359 | # 0 & 3 & 4 \\ 360 | # 2 & 6 & 4 \\ 361 | # \end{bmatrix}\tag{3}$$ 362 | # then 363 | # $$\| x\| = \text{np.linalg.norm(x, axis=1, keepdims=True)} = \begin{bmatrix} 364 | # 5 \\ 365 | # \sqrt{56} \\ 366 | # \end{bmatrix}\tag{4} $$ 367 | # and 368 | # $$ x\_normalized = \frac{x}{\| x\|} = \begin{bmatrix} 369 | # 0 & \frac{3}{5} & \frac{4}{5} \\ 370 | # \frac{2}{\sqrt{56}} & \frac{6}{\sqrt{56}} & \frac{4}{\sqrt{56}} \\ 371 | # \end{bmatrix}\tag{5}$$ 372 | # 373 | # Note that you can divide matrices of different sizes and it works fine: this is called broadcasting and you're going to learn about it in part 5. 374 | # 375 | # With `keepdims=True` the result will broadcast correctly against the original x. 376 | # 377 | # `axis=1` means you are going to get the norm in a row-wise manner. If you need the norm in a column-wise way, you would need to set `axis=0`. 378 | # 379 | # numpy.linalg.norm has another parameter `ord` where we specify the type of normalization to be done (in the exercise below you'll do 2-norm). To get familiar with the types of normalization you can visit [numpy.linalg.norm](https://numpy.org/doc/stable/reference/generated/numpy.linalg.norm.html) 380 | # 381 | # 382 | # ### Exercise 6 - normalize_rows 383 | # Implement normalizeRows() to normalize the rows of a matrix. After applying this function to an input matrix x, each row of x should be a vector of unit length (meaning length 1). 384 | # 385 | # **Note**: Don't try to use `x /= x_norm`. For the matrix division numpy must broadcast the x_norm, which is not supported by the operant `/=` 386 | 387 | # In[17]: 388 | 389 | 390 | # GRADED FUNCTION: normalize_rows 391 | 392 | def normalize_rows(x): 393 | """ 394 | Implement a function that normalizes each row of the matrix x (to have unit length). 395 | 396 | Argument: 397 | x -- A numpy matrix of shape (n, m) 398 | 399 | Returns: 400 | x -- The normalized (by row) numpy matrix. You are allowed to modify x. 
401 | """ 402 | 403 | #(≈ 2 lines of code) 404 | # Compute x_norm as the norm 2 of x. Use np.linalg.norm(..., ord = 2, axis = ..., keepdims = True) 405 | # x_norm = 406 | # Divide x by its norm. 407 | # x = 408 | # YOUR CODE STARTS HERE 409 | x_norm = np.linalg.norm(x,ord = 2,axis=1,keepdims=True) 410 | x=x/x_norm 411 | # YOUR CODE ENDS HERE 412 | 413 | return x 414 | 415 | 416 | # In[18]: 417 | 418 | 419 | x = np.array([[0, 3, 4], 420 | [1, 6, 4]]) 421 | print("normalizeRows(x) = " + str(normalize_rows(x))) 422 | 423 | normalizeRows_test(normalize_rows) 424 | 425 | 426 | # **Note**: 427 | # In normalize_rows(), you can try to print the shapes of x_norm and x, and then rerun the assessment. You'll find out that they have different shapes. This is normal given that x_norm takes the norm of each row of x. So x_norm has the same number of rows but only 1 column. So how did it work when you divided x by x_norm? This is called broadcasting and we'll talk about it now! 428 | 429 | # 430 | # ### Exercise 7 - softmax 431 | # Implement a softmax function using numpy. You can think of softmax as a normalizing function used when your algorithm needs to classify two or more classes. You will learn more about softmax in the second course of this specialization. 432 | # 433 | # **Instructions**: 434 | # - $\text{for } x \in \mathbb{R}^{1\times n} \text{, }$ 435 | # 436 | # \begin{align*} 437 | # softmax(x) &= softmax\left(\begin{bmatrix} 438 | # x_1 && 439 | # x_2 && 440 | # ... && 441 | # x_n 442 | # \end{bmatrix}\right) \\&= \begin{bmatrix} 443 | # \frac{e^{x_1}}{\sum_{j}e^{x_j}} && 444 | # \frac{e^{x_2}}{\sum_{j}e^{x_j}} && 445 | # ... && 446 | # \frac{e^{x_n}}{\sum_{j}e^{x_j}} 447 | # \end{bmatrix} 448 | # \end{align*} 449 | # 450 | # - $\text{for a matrix } x \in \mathbb{R}^{m \times n} \text{, $x_{ij}$ maps to the element in the $i^{th}$ row and $j^{th}$ column of $x$, thus we have: }$ 451 | # 452 | # \begin{align*} 453 | # softmax(x) &= softmax\begin{bmatrix} 454 | # x_{11} & x_{12} & x_{13} & \dots & x_{1n} \\ 455 | # x_{21} & x_{22} & x_{23} & \dots & x_{2n} \\ 456 | # \vdots & \vdots & \vdots & \ddots & \vdots \\ 457 | # x_{m1} & x_{m2} & x_{m3} & \dots & x_{mn} 458 | # \end{bmatrix} \\ \\&= 459 | # \begin{bmatrix} 460 | # \frac{e^{x_{11}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{12}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{13}}}{\sum_{j}e^{x_{1j}}} & \dots & \frac{e^{x_{1n}}}{\sum_{j}e^{x_{1j}}} \\ 461 | # \frac{e^{x_{21}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{22}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{23}}}{\sum_{j}e^{x_{2j}}} & \dots & \frac{e^{x_{2n}}}{\sum_{j}e^{x_{2j}}} \\ 462 | # \vdots & \vdots & \vdots & \ddots & \vdots \\ 463 | # \frac{e^{x_{m1}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m2}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m3}}}{\sum_{j}e^{x_{mj}}} & \dots & \frac{e^{x_{mn}}}{\sum_{j}e^{x_{mj}}} 464 | # \end{bmatrix} \\ \\ &= \begin{pmatrix} 465 | # softmax\text{(first row of x)} \\ 466 | # softmax\text{(second row of x)} \\ 467 | # \vdots \\ 468 | # softmax\text{(last row of x)} \\ 469 | # \end{pmatrix} 470 | # \end{align*} 471 | 472 | # **Notes:** 473 | # Note that later in the course, you'll see "m" used to represent the "number of training examples", and each training example is in its own column of the matrix. Also, each feature will be in its own row (each row has data for the same feature). 474 | # Softmax should be performed for all features of each training example, so softmax would be performed on the columns (once we switch to that representation later in this course). 
475 | # 476 | # However, in this coding practice, we're just focusing on getting familiar with Python, so we're using the common math notation $m \times n$ 477 | # where $m$ is the number of rows and $n$ is the number of columns. 478 | 479 | # In[19]: 480 | 481 | 482 | # GRADED FUNCTION: softmax 483 | 484 | def softmax(x): 485 | """Calculates the softmax for each row of the input x. 486 | 487 | Your code should work for a row vector and also for matrices of shape (m,n). 488 | 489 | Argument: 490 | x -- A numpy matrix of shape (m,n) 491 | 492 | Returns: 493 | s -- A numpy matrix equal to the softmax of x, of shape (m,n) 494 | """ 495 | 496 | #(≈ 3 lines of code) 497 | # Apply exp() element-wise to x. Use np.exp(...). 498 | # x_exp = ... 499 | 500 | # Create a vector x_sum that sums each row of x_exp. Use np.sum(..., axis = 1, keepdims = True). 501 | # x_sum = ... 502 | 503 | # Compute softmax(x) by dividing x_exp by x_sum. It should automatically use numpy broadcasting. 504 | # s = ... 505 | 506 | # YOUR CODE STARTS HERE 507 | x_exp = np.exp(x) 508 | x_sum = np.sum(x_exp,axis = 1, keepdims=True) 509 | s = x_exp/x_sum 510 | # YOUR CODE ENDS HERE 511 | 512 | return s 513 | 514 | 515 | # In[20]: 516 | 517 | 518 | t_x = np.array([[9, 2, 5, 0, 0], 519 | [7, 5, 0, 0 ,0]]) 520 | print("softmax(x) = " + str(softmax(t_x))) 521 | 522 | softmax_test(softmax) 523 | 524 | 525 | # #### Notes 526 | # - If you print the shapes of x_exp, x_sum and s above and rerun the assessment cell, you will see that x_sum is of shape (2,1) while x_exp and s are of shape (2,5). **x_exp/x_sum** works due to python broadcasting. 527 | # 528 | # Congratulations! You now have a pretty good understanding of python numpy and have implemented a few useful functions that you will be using in deep learning. 529 | 530 | # 531 | # What you need to remember: 532 | # 533 | # - np.exp(x) works for any np.array x and applies the exponential function to every coordinate 534 | # - the sigmoid function and its gradient 535 | # - image2vector is commonly used in deep learning 536 | # - np.reshape is widely used. In the future, you'll see that keeping your matrix/vector dimensions straight will go toward eliminating a lot of bugs. 537 | # - numpy has efficient built-in functions 538 | # - broadcasting is extremely useful 539 | 540 | # 541 | # ## 2 - Vectorization 542 | # 543 | # 544 | # In deep learning, you deal with very large datasets. Hence, a non-computationally-optimal function can become a huge bottleneck in your algorithm and can result in a model that takes ages to run. To make sure that your code is computationally efficient, you will use vectorization. For example, try to tell the difference between the following implementations of the dot/outer/elementwise product. 
545 | 546 | # In[21]: 547 | 548 | 549 | import time 550 | 551 | x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0] 552 | x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0] 553 | 554 | ### CLASSIC DOT PRODUCT OF VECTORS IMPLEMENTATION ### 555 | tic = time.process_time() 556 | dot = 0 557 | 558 | for i in range(len(x1)): 559 | dot += x1[i] * x2[i] 560 | toc = time.process_time() 561 | print ("dot = " + str(dot) + "\n ----- Computation time = " + str(1000 * (toc - tic)) + "ms") 562 | 563 | ### CLASSIC OUTER PRODUCT IMPLEMENTATION ### 564 | tic = time.process_time() 565 | outer = np.zeros((len(x1), len(x2))) # we create a len(x1)*len(x2) matrix with only zeros 566 | 567 | for i in range(len(x1)): 568 | for j in range(len(x2)): 569 | outer[i,j] = x1[i] * x2[j] 570 | toc = time.process_time() 571 | print ("outer = " + str(outer) + "\n ----- Computation time = " + str(1000 * (toc - tic)) + "ms") 572 | 573 | ### CLASSIC ELEMENTWISE IMPLEMENTATION ### 574 | tic = time.process_time() 575 | mul = np.zeros(len(x1)) 576 | 577 | for i in range(len(x1)): 578 | mul[i] = x1[i] * x2[i] 579 | toc = time.process_time() 580 | print ("elementwise multiplication = " + str(mul) + "\n ----- Computation time = " + str(1000 * (toc - tic)) + "ms") 581 | 582 | ### CLASSIC GENERAL DOT PRODUCT IMPLEMENTATION ### 583 | W = np.random.rand(3,len(x1)) # Random 3*len(x1) numpy array 584 | tic = time.process_time() 585 | gdot = np.zeros(W.shape[0]) 586 | 587 | for i in range(W.shape[0]): 588 | for j in range(len(x1)): 589 | gdot[i] += W[i,j] * x1[j] 590 | toc = time.process_time() 591 | print ("gdot = " + str(gdot) + "\n ----- Computation time = " + str(1000 * (toc - tic)) + "ms") 592 | 593 | 594 | # In[22]: 595 | 596 | 597 | x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0] 598 | x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0] 599 | 600 | ### VECTORIZED DOT PRODUCT OF VECTORS ### 601 | tic = time.process_time() 602 | dot = np.dot(x1,x2) 603 | toc = time.process_time() 604 | print ("dot = " + str(dot) + "\n ----- Computation time = " + str(1000 * (toc - tic)) + "ms") 605 | 606 | ### VECTORIZED OUTER PRODUCT ### 607 | tic = time.process_time() 608 | outer = np.outer(x1,x2) 609 | toc = time.process_time() 610 | print ("outer = " + str(outer) + "\n ----- Computation time = " + str(1000 * (toc - tic)) + "ms") 611 | 612 | ### VECTORIZED ELEMENTWISE MULTIPLICATION ### 613 | tic = time.process_time() 614 | mul = np.multiply(x1,x2) 615 | toc = time.process_time() 616 | print ("elementwise multiplication = " + str(mul) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms") 617 | 618 | ### VECTORIZED GENERAL DOT PRODUCT ### 619 | tic = time.process_time() 620 | dot = np.dot(W,x1) 621 | toc = time.process_time() 622 | print ("gdot = " + str(dot) + "\n ----- Computation time = " + str(1000 * (toc - tic)) + "ms") 623 | 624 | 625 | # As you may have noticed, the vectorized implementation is much cleaner and more efficient. For bigger vectors/matrices, the differences in running time become even bigger. 626 | # 627 | # **Note** that `np.dot()` performs a matrix-matrix or matrix-vector multiplication. This is different from `np.multiply()` and the `*` operator (which is equivalent to `.*` in Matlab/Octave), which performs an element-wise multiplication. 628 | 629 | # 630 | # ### 2.1 Implement the L1 and L2 loss functions 631 | # 632 | # 633 | # ### Exercise 8 - L1 634 | # Implement the numpy vectorized version of the L1 loss. You may find the function abs(x) (absolute value of x) useful. 
635 | # 636 | # **Reminder**: 637 | # - The loss is used to evaluate the performance of your model. The bigger your loss is, the more different your predictions ($ \hat{y} $) are from the true values ($y$). In deep learning, you use optimization algorithms like Gradient Descent to train your model and to minimize the cost. 638 | # - L1 loss is defined as: 639 | # $$\begin{align*} & L_1(\hat{y}, y) = \sum_{i=0}^{m-1}|y^{(i)} - \hat{y}^{(i)}| \end{align*}\tag{6}$$ 640 | 641 | # In[23]: 642 | 643 | 644 | # GRADED FUNCTION: L1 645 | 646 | def L1(yhat, y): 647 | """ 648 | Arguments: 649 | yhat -- vector of size m (predicted labels) 650 | y -- vector of size m (true labels) 651 | 652 | Returns: 653 | loss -- the value of the L1 loss function defined above 654 | """ 655 | 656 | #(≈ 1 line of code) 657 | # loss = 658 | # YOUR CODE STARTS HERE 659 | loss = sum(abs(y-yhat)) 660 | # YOUR CODE ENDS HERE 661 | 662 | return loss 663 | 664 | 665 | # In[24]: 666 | 667 | 668 | yhat = np.array([.9, 0.2, 0.1, .4, .9]) 669 | y = np.array([1, 0, 0, 1, 1]) 670 | print("L1 = " + str(L1(yhat, y))) 671 | 672 | L1_test(L1) 673 | 674 | 675 | # 676 | # ### Exercise 9 - L2 677 | # Implement the numpy vectorized version of the L2 loss. There are several way of implementing the L2 loss but you may find the function np.dot() useful. As a reminder, if $x = [x_1, x_2, ..., x_n]$, then `np.dot(x,x)` = $\sum_{j=0}^n x_j^{2}$. 678 | # 679 | # - L2 loss is defined as $$\begin{align*} & L_2(\hat{y},y) = \sum_{i=0}^{m-1}(y^{(i)} - \hat{y}^{(i)})^2 \end{align*}\tag{7}$$ 680 | 681 | # In[27]: 682 | 683 | 684 | # GRADED FUNCTION: L2 685 | 686 | def L2(yhat, y): 687 | """ 688 | Arguments: 689 | yhat -- vector of size m (predicted labels) 690 | y -- vector of size m (true labels) 691 | 692 | Returns: 693 | loss -- the value of the L2 loss function defined above 694 | """ 695 | 696 | #(≈ 1 line of code) 697 | # loss = ... 698 | # YOUR CODE STARTS HERE 699 | loss = sum((y-yhat)**2) 700 | # YOUR CODE ENDS HERE 701 | 702 | return loss 703 | 704 | 705 | # In[28]: 706 | 707 | 708 | yhat = np.array([.9, 0.2, 0.1, .4, .9]) 709 | y = np.array([1, 0, 0, 1, 1]) 710 | 711 | print("L2 = " + str(L2(yhat, y))) 712 | 713 | L2_test(L2) 714 | 715 | 716 | # Congratulations on completing this assignment. We hope that this little warm-up exercise helps you in the future assignments, which will be more exciting and interesting! 717 | 718 | # 719 | # What to remember: 720 | # 721 | # - Vectorization is very important in deep learning. It provides computational efficiency and clarity. 722 | # - You have reviewed the L1 and L2 loss. 723 | # - You are familiar with many numpy functions such as np.sum, np.dot, np.multiply, np.maximum, etc... 
724 | -------------------------------------------------------------------------------- /Course 1-Neural Networks & Deep Learning/Week 2/Week 2 Neural Network Basics.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 1-Neural Networks & Deep Learning/Week 2/Week 2 Neural Network Basics.pdf -------------------------------------------------------------------------------- /Course 1-Neural Networks & Deep Learning/Week 3/Week 3 Shallow Neural Networks.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 1-Neural Networks & Deep Learning/Week 3/Week 3 Shallow Neural Networks.pdf -------------------------------------------------------------------------------- /Course 1-Neural Networks & Deep Learning/Week 4/Deep Neural Network - Application.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding: utf-8 3 | 4 | # # Deep Neural Network for Image Classification: Application 5 | # 6 | # By the time you complete this notebook, you will have finished the last programming assignment of Week 4, and also the last programming assignment of Course 1! Go you! 7 | # 8 | # To build your cat/not-a-cat classifier, you'll use the functions from the previous assignment to build a deep network. Hopefully, you'll see an improvement in accuracy over your previous logistic regression implementation. 9 | # 10 | # **After this assignment you will be able to:** 11 | # 12 | # - Build and train a deep L-layer neural network, and apply it to supervised learning 13 | # 14 | # Let's get started! 15 | # 16 | # ## Important Note on Submission to the AutoGrader 17 | # 18 | # Before submitting your assignment to the AutoGrader, please make sure you are not doing the following: 19 | # 20 | # 1. You have not added any _extra_ `print` statement(s) in the assignment. 21 | # 2. You have not added any _extra_ code cell(s) in the assignment. 22 | # 3. You have not changed any of the function parameters. 23 | # 4. You are not using any global variables inside your graded exercises. Unless specifically instructed to do so, please refrain from it and use the local variables instead. 24 | # 5. You are not changing the assignment code where it is not required, like creating _extra_ variables. 25 | # 26 | # If you do any of the following, you will get something like, `Grader Error: Grader feedback not found` (or similarly unexpected) error upon submitting your assignment. Before asking for help/debugging the errors in your assignment, check for these first. If this is the case, and you don't remember the changes you have made, you can get a fresh copy of the assignment by following these [instructions](https://www.coursera.org/learn/neural-networks-deep-learning/supplement/iLwon/h-ow-to-refresh-your-workspace). 
27 | 28 | # ## Table of Contents 29 | # - [1 - Packages](#1) 30 | # - [2 - Load and Process the Dataset](#2) 31 | # - [3 - Model Architecture](#3) 32 | # - [3.1 - 2-layer Neural Network](#3-1) 33 | # - [3.2 - L-layer Deep Neural Network](#3-2) 34 | # - [3.3 - General Methodology](#3-3) 35 | # - [4 - Two-layer Neural Network](#4) 36 | # - [Exercise 1 - two_layer_model](#ex-1) 37 | # - [4.1 - Train the model](#4-1) 38 | # - [5 - L-layer Neural Network](#5) 39 | # - [Exercise 2 - L_layer_model](#ex-2) 40 | # - [5.1 - Train the model](#5-1) 41 | # - [6 - Results Analysis](#6) 42 | # - [7 - Test with your own image (optional/ungraded exercise)](#7) 43 | 44 | # 45 | # ## 1 - Packages 46 | 47 | # Begin by importing all the packages you'll need during this assignment. 48 | # 49 | # - [numpy](https://www.numpy.org/) is the fundamental package for scientific computing with Python. 50 | # - [matplotlib](http://matplotlib.org) is a library to plot graphs in Python. 51 | # - [h5py](http://www.h5py.org) is a common package to interact with a dataset that is stored on an H5 file. 52 | # - [PIL](http://www.pythonware.com/products/pil/) and [scipy](https://www.scipy.org/) are used here to test your model with your own picture at the end. 53 | # - `dnn_app_utils` provides the functions implemented in the "Building your Deep Neural Network: Step by Step" assignment to this notebook. 54 | # - `np.random.seed(1)` is used to keep all the random function calls consistent. It helps grade your work - so please don't change it! 55 | 56 | # In[4]: 57 | 58 | 59 | import time 60 | import numpy as np 61 | import h5py 62 | import matplotlib.pyplot as plt 63 | import scipy 64 | from PIL import Image 65 | from scipy import ndimage 66 | from dnn_app_utils_v3 import * 67 | from public_tests import * 68 | 69 | get_ipython().run_line_magic('matplotlib', 'inline') 70 | plt.rcParams['figure.figsize'] = (5.0, 4.0) # set default size of plots 71 | plt.rcParams['image.interpolation'] = 'nearest' 72 | plt.rcParams['image.cmap'] = 'gray' 73 | 74 | get_ipython().run_line_magic('load_ext', 'autoreload') 75 | get_ipython().run_line_magic('autoreload', '2') 76 | 77 | np.random.seed(1) 78 | 79 | 80 | # 81 | # ## 2 - Load and Process the Dataset 82 | # 83 | # You'll be using the same "Cat vs non-Cat" dataset as in "Logistic Regression as a Neural Network" (Assignment 2). The model you built back then had 70% test accuracy on classifying cat vs non-cat images. Hopefully, your new model will perform even better! 84 | # 85 | # **Problem Statement**: You are given a dataset ("data.h5") containing: 86 | # - a training set of `m_train` images labelled as cat (1) or non-cat (0) 87 | # - a test set of `m_test` images labelled as cat and non-cat 88 | # - each image is of shape (num_px, num_px, 3) where 3 is for the 3 channels (RGB). 89 | # 90 | # Let's get more familiar with the dataset. Load the data by running the cell below. 91 | 92 | # In[5]: 93 | 94 | 95 | train_x_orig, train_y, test_x_orig, test_y, classes = load_data() 96 | 97 | 98 | # The following code will show you an image in the dataset. Feel free to change the index and re-run the cell multiple times to check out other images. 99 | 100 | # In[6]: 101 | 102 | 103 | # Example of a picture 104 | index = 25 105 | plt.imshow(train_x_orig[index]) 106 | print ("y = " + str(train_y[0,index]) + ". 
It's a " + classes[train_y[0,index]].decode("utf-8") + " picture.") 107 | 108 | 109 | # In[7]: 110 | 111 | 112 | # Explore your dataset 113 | m_train = train_x_orig.shape[0] 114 | num_px = train_x_orig.shape[1] 115 | m_test = test_x_orig.shape[0] 116 | 117 | print ("Number of training examples: " + str(m_train)) 118 | print ("Number of testing examples: " + str(m_test)) 119 | print ("Each image is of size: (" + str(num_px) + ", " + str(num_px) + ", 3)") 120 | print ("train_x_orig shape: " + str(train_x_orig.shape)) 121 | print ("train_y shape: " + str(train_y.shape)) 122 | print ("test_x_orig shape: " + str(test_x_orig.shape)) 123 | print ("test_y shape: " + str(test_y.shape)) 124 | 125 | 126 | # As usual, you reshape and standardize the images before feeding them to the network. The code is given in the cell below. 127 | # 128 | # 129 | #
Figure 1: Image to vector conversion.
130 | 131 | # In[8]: 132 | 133 | 134 | # Reshape the training and test examples 135 | train_x_flatten = train_x_orig.reshape(train_x_orig.shape[0], -1).T # The "-1" makes reshape flatten the remaining dimensions 136 | test_x_flatten = test_x_orig.reshape(test_x_orig.shape[0], -1).T 137 | 138 | # Standardize data to have feature values between 0 and 1. 139 | train_x = train_x_flatten/255. 140 | test_x = test_x_flatten/255. 141 | 142 | print ("train_x's shape: " + str(train_x.shape)) 143 | print ("test_x's shape: " + str(test_x.shape)) 144 | 145 | 146 | # **Note**: 147 | # $12,288$ equals $64 \times 64 \times 3$, which is the size of one reshaped image vector. 148 | 149 | # 150 | # ## 3 - Model Architecture 151 | 152 | # 153 | # ### 3.1 - 2-layer Neural Network 154 | # 155 | # Now that you're familiar with the dataset, it's time to build a deep neural network to distinguish cat images from non-cat images! 156 | # 157 | # You're going to build two different models: 158 | # 159 | # - A 2-layer neural network 160 | # - An L-layer deep neural network 161 | # 162 | # Then, you'll compare the performance of these models, and try out some different values for $L$. 163 | # 164 | # Let's look at the two architectures: 165 | # 166 | # 167 | #
Figure 2: 2-layer neural network.
The model can be summarized as: INPUT -> LINEAR -> RELU -> LINEAR -> SIGMOID -> OUTPUT.
168 | # 169 | # Detailed Architecture of Figure 2: 170 | # - The input is a (64,64,3) image which is flattened to a vector of size $(12288,1)$. 171 | # - The corresponding vector: $[x_0,x_1,...,x_{12287}]^T$ is then multiplied by the weight matrix $W^{[1]}$ of size $(n^{[1]}, 12288)$. 172 | # - Then, add a bias term and take its relu to get the following vector: $[a_0^{[1]}, a_1^{[1]},..., a_{n^{[1]}-1}^{[1]}]^T$. 173 | # - Multiply the resulting vector by $W^{[2]}$ and add the intercept (bias). 174 | # - Finally, take the sigmoid of the result. If it's greater than 0.5, classify it as a cat. 175 | # 176 | # 177 | # ### 3.2 - L-layer Deep Neural Network 178 | # 179 | # It's pretty difficult to represent an L-layer deep neural network using the above representation. However, here is a simplified network representation: 180 | # 181 | # 182 | #
Figure 3: L-layer neural network.
The model can be summarized as: [LINEAR -> RELU] $\times$ (L-1) -> LINEAR -> SIGMOID
183 | # 184 | # Detailed Architecture of Figure 3: 185 | # - The input is a (64,64,3) image which is flattened to a vector of size (12288,1). 186 | # - The corresponding vector: $[x_0,x_1,...,x_{12287}]^T$ is then multiplied by the weight matrix $W^{[1]}$ and then you add the intercept $b^{[1]}$. The result is called the linear unit. 187 | # - Next, take the relu of the linear unit. This process could be repeated several times for each $(W^{[l]}, b^{[l]})$ depending on the model architecture. 188 | # - Finally, take the sigmoid of the final linear unit. If it is greater than 0.5, classify it as a cat. 189 | # 190 | # 191 | # ### 3.3 - General Methodology 192 | # 193 | # As usual, you'll follow the Deep Learning methodology to build the model: 194 | # 195 | # 1. Initialize parameters / Define hyperparameters 196 | # 2. Loop for num_iterations: 197 | # a. Forward propagation 198 | # b. Compute cost function 199 | # c. Backward propagation 200 | # d. Update parameters (using parameters, and grads from backprop) 201 | # 3. Use trained parameters to predict labels 202 | # 203 | # Now go ahead and implement those two models! 204 | 205 | # 206 | # ## 4 - Two-layer Neural Network 207 | # 208 | # 209 | # ### Exercise 1 - two_layer_model 210 | # 211 | # Use the helper functions you have implemented in the previous assignment to build a 2-layer neural network with the following structure: *LINEAR -> RELU -> LINEAR -> SIGMOID*. The functions and their inputs are: 212 | # ```python 213 | # def initialize_parameters(n_x, n_h, n_y): 214 | # ... 215 | # return parameters 216 | # def linear_activation_forward(A_prev, W, b, activation): 217 | # ... 218 | # return A, cache 219 | # def compute_cost(AL, Y): 220 | # ... 221 | # return cost 222 | # def linear_activation_backward(dA, cache, activation): 223 | # ... 224 | # return dA_prev, dW, db 225 | # def update_parameters(parameters, grads, learning_rate): 226 | # ... 227 | # return parameters 228 | # ``` 229 | 230 | # In[9]: 231 | 232 | 233 | ### CONSTANTS DEFINING THE MODEL #### 234 | n_x = 12288 # num_px * num_px * 3 235 | n_h = 7 236 | n_y = 1 237 | layers_dims = (n_x, n_h, n_y) 238 | learning_rate = 0.0075 239 | 240 | 241 | # In[10]: 242 | 243 | 244 | # GRADED FUNCTION: two_layer_model 245 | 246 | def two_layer_model(X, Y, layers_dims, learning_rate = 0.0075, num_iterations = 3000, print_cost=False): 247 | """ 248 | Implements a two-layer neural network: LINEAR->RELU->LINEAR->SIGMOID. 249 | 250 | Arguments: 251 | X -- input data, of shape (n_x, number of examples) 252 | Y -- true "label" vector (containing 1 if cat, 0 if non-cat), of shape (1, number of examples) 253 | layers_dims -- dimensions of the layers (n_x, n_h, n_y) 254 | num_iterations -- number of iterations of the optimization loop 255 | learning_rate -- learning rate of the gradient descent update rule 256 | print_cost -- If set to True, this will print the cost every 100 iterations 257 | 258 | Returns: 259 | parameters -- a dictionary containing W1, W2, b1, and b2 260 | """ 261 | 262 | np.random.seed(1) 263 | grads = {} 264 | costs = [] # to keep track of the cost 265 | m = X.shape[1] # number of examples 266 | (n_x, n_h, n_y) = layers_dims 267 | 268 | # Initialize parameters dictionary, by calling one of the functions you'd previously implemented 269 | #(≈ 1 line of code) 270 | # parameters = ... 271 | # YOUR CODE STARTS HERE 272 | parameters = initialize_parameters(n_x, n_h, n_y) 273 | 274 | # YOUR CODE ENDS HERE 275 | 276 | # Get W1, b1, W2 and b2 from the dictionary parameters. 
277 | W1 = parameters["W1"] 278 | b1 = parameters["b1"] 279 | W2 = parameters["W2"] 280 | b2 = parameters["b2"] 281 | 282 | # Loop (gradient descent) 283 | 284 | for i in range(0, num_iterations): 285 | 286 | # Forward propagation: LINEAR -> RELU -> LINEAR -> SIGMOID. Inputs: "X, W1, b1, W2, b2". Output: "A1, cache1, A2, cache2". 287 | #(≈ 2 lines of code) 288 | # A1, cache1 = ... 289 | # A2, cache2 = ... 290 | # YOUR CODE STARTS HERE 291 | A1, cache1 = linear_activation_forward(X, W1, b1, activation="relu") 292 | A2, cache2 = linear_activation_forward(A1, W2, b2, activation="sigmoid") 293 | 294 | # YOUR CODE ENDS HERE 295 | 296 | # Compute cost 297 | #(≈ 1 line of code) 298 | # cost = ... 299 | # YOUR CODE STARTS HERE 300 | 301 | cost = compute_cost(A2, Y) 302 | # YOUR CODE ENDS HERE 303 | 304 | # Initializing backward propagation 305 | dA2 = - (np.divide(Y, A2) - np.divide(1 - Y, 1 - A2)) 306 | 307 | # Backward propagation. Inputs: "dA2, cache2, cache1". Outputs: "dA1, dW2, db2; also dA0 (not used), dW1, db1". 308 | #(≈ 2 lines of code) 309 | # dA1, dW2, db2 = ... 310 | # dA0, dW1, db1 = ... 311 | # YOUR CODE STARTS HERE 312 | dA1, dW2, db2 = linear_activation_backward(dA2, cache2, activation="sigmoid") 313 | dA0, dW1, db1 = linear_activation_backward(dA1, cache1, activation="relu") 314 | 315 | # YOUR CODE ENDS HERE 316 | 317 | # Set grads['dWl'] to dW1, grads['db1'] to db1, grads['dW2'] to dW2, grads['db2'] to db2 318 | grads['dW1'] = dW1 319 | grads['db1'] = db1 320 | grads['dW2'] = dW2 321 | grads['db2'] = db2 322 | 323 | # Update parameters. 324 | #(approx. 1 line of code) 325 | # parameters = ... 326 | # YOUR CODE STARTS HERE 327 | 328 | parameters = update_parameters(parameters, grads, learning_rate) 329 | # YOUR CODE ENDS HERE 330 | 331 | # Retrieve W1, b1, W2, b2 from parameters 332 | W1 = parameters["W1"] 333 | b1 = parameters["b1"] 334 | W2 = parameters["W2"] 335 | b2 = parameters["b2"] 336 | 337 | # Print the cost every 100 iterations 338 | if print_cost and i % 100 == 0 or i == num_iterations - 1: 339 | print("Cost after iteration {}: {}".format(i, np.squeeze(cost))) 340 | if i % 100 == 0 or i == num_iterations: 341 | costs.append(cost) 342 | 343 | return parameters, costs 344 | 345 | def plot_costs(costs, learning_rate=0.0075): 346 | plt.plot(np.squeeze(costs)) 347 | plt.ylabel('cost') 348 | plt.xlabel('iterations (per hundreds)') 349 | plt.title("Learning rate =" + str(learning_rate)) 350 | plt.show() 351 | 352 | 353 | # In[11]: 354 | 355 | 356 | parameters, costs = two_layer_model(train_x, train_y, layers_dims = (n_x, n_h, n_y), num_iterations = 2, print_cost=False) 357 | 358 | print("Cost after first iteration: " + str(costs[0])) 359 | 360 | two_layer_model_test(two_layer_model) 361 | 362 | 363 | # **Expected output:** 364 | # 365 | # ``` 366 | # cost after iteration 1 must be around 0.69 367 | # ``` 368 | 369 | # 370 | # ### 4.1 - Train the model 371 | # 372 | # If your code passed the previous cell, run the cell below to train your parameters. 373 | # 374 | # - The cost should decrease on every iteration. 375 | # 376 | # - It may take up to 5 minutes to run 2500 iterations. 377 | 378 | # In[12]: 379 | 380 | 381 | parameters, costs = two_layer_model(train_x, train_y, layers_dims = (n_x, n_h, n_y), num_iterations = 2500, print_cost=True) 382 | plot_costs(costs, learning_rate) 383 | 384 | 385 | # **Expected Output**: 386 | # 387 | # 388 | # 389 | # 390 | # 391 | # 392 | # 393 | # 394 | # 395 | # 396 | # 397 | # 398 | # 399 | # 400 | # 401 | # 402 | # 403 | #
Cost after iteration 0 0.6930497356599888
Cost after iteration 100 0.6464320953428849
... ...
Cost after iteration 2499 0.04421498215868956
404 | 405 | # **Nice!** You successfully trained the model. Good thing you built a vectorized implementation! Otherwise it might have taken 10 times longer to train this. 406 | # 407 | # Now, you can use the trained parameters to classify images from the dataset. To see your predictions on the training and test sets, run the cell below. 408 | 409 | # In[13]: 410 | 411 | 412 | predictions_train = predict(train_x, train_y, parameters) 413 | 414 | 415 | # **Expected Output**: 416 | # 417 | # 418 | # 419 | # 420 | # 421 | #
Accuracy 0.9999999999999998
422 | 423 | # In[14]: 424 | 425 | 426 | predictions_test = predict(test_x, test_y, parameters) 427 | 428 | 429 | # **Expected Output**: 430 | # 431 | # 432 | # 433 | # 434 | # 435 | # 436 | #
Accuracy 0.72
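# The `predict` helper called in the two cells above ships with `dnn_app_utils_v3` and is not reproduced in this file. As a rough, hypothetical sketch of its final step only (the real helper also runs the forward pass with the trained parameters), the reported accuracy comes from thresholding the output-layer sigmoid activations at 0.5 -- `accuracy_from_probs` below is an illustrative name, not the library function:

import numpy as np

def accuracy_from_probs(AL, Y):
    """Toy stand-in: AL holds output-layer activations and Y the true labels, both of shape (1, m)."""
    predictions = (AL > 0.5).astype(int)      # classify as "cat" when the probability exceeds 0.5
    return np.mean(predictions == Y)          # fraction of examples labelled correctly

print(accuracy_from_probs(np.array([[0.9, 0.2, 0.6]]), np.array([[1, 0, 0]])))  # 2 of 3 correct -> ~0.667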
437 | 438 | # ### Congratulations! It seems that your 2-layer neural network has better performance (72%) than the logistic regression implementation (70%, assignment week 2). Let's see if you can do even better with an $L$-layer model. 439 | # 440 | # **Note**: You may notice that running the model on fewer iterations (say 1500) gives better accuracy on the test set. This is called "early stopping" and you'll hear more about it in the next course. Early stopping is a way to prevent overfitting. 441 | 442 | # 443 | # ## 5 - L-layer Neural Network 444 | # 445 | # 446 | # ### Exercise 2 - L_layer_model 447 | # 448 | # Use the helper functions you implemented previously to build an $L$-layer neural network with the following structure: *[LINEAR -> RELU]$\times$(L-1) -> LINEAR -> SIGMOID*. The functions and their inputs are: 449 | # ```python 450 | # def initialize_parameters_deep(layers_dims): 451 | # ... 452 | # return parameters 453 | # def L_model_forward(X, parameters): 454 | # ... 455 | # return AL, caches 456 | # def compute_cost(AL, Y): 457 | # ... 458 | # return cost 459 | # def L_model_backward(AL, Y, caches): 460 | # ... 461 | # return grads 462 | # def update_parameters(parameters, grads, learning_rate): 463 | # ... 464 | # return parameters 465 | # ``` 466 | 467 | # In[15]: 468 | 469 | 470 | ### CONSTANTS ### 471 | layers_dims = [12288, 20, 7, 5, 1] # 4-layer model 472 | 473 | 474 | # In[16]: 475 | 476 | 477 | # GRADED FUNCTION: L_layer_model 478 | 479 | def L_layer_model(X, Y, layers_dims, learning_rate = 0.0075, num_iterations = 3000, print_cost=False): 480 | """ 481 | Implements a L-layer neural network: [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID. 482 | 483 | Arguments: 484 | X -- input data, of shape (n_x, number of examples) 485 | Y -- true "label" vector (containing 1 if cat, 0 if non-cat), of shape (1, number of examples) 486 | layers_dims -- list containing the input size and each layer size, of length (number of layers + 1). 487 | learning_rate -- learning rate of the gradient descent update rule 488 | num_iterations -- number of iterations of the optimization loop 489 | print_cost -- if True, it prints the cost every 100 steps 490 | 491 | Returns: 492 | parameters -- parameters learnt by the model. They can then be used to predict. 493 | """ 494 | 495 | np.random.seed(1) 496 | costs = [] # keep track of cost 497 | 498 | # Parameters initialization. 499 | #(≈ 1 line of code) 500 | # parameters = ... 501 | # YOUR CODE STARTS HERE 502 | parameters = initialize_parameters_deep(layers_dims) 503 | 504 | # YOUR CODE ENDS HERE 505 | 506 | # Loop (gradient descent) 507 | for i in range(0, num_iterations): 508 | 509 | # Forward propagation: [LINEAR -> RELU]*(L-1) -> LINEAR -> SIGMOID. 510 | #(≈ 1 line of code) 511 | # AL, caches = ... 512 | # YOUR CODE STARTS HERE 513 | AL, caches = L_model_forward(X, parameters) 514 | 515 | # YOUR CODE ENDS HERE 516 | 517 | # Compute cost. 518 | #(≈ 1 line of code) 519 | # cost = ... 520 | # YOUR CODE STARTS HERE 521 | cost = compute_cost(AL, Y) 522 | 523 | # YOUR CODE ENDS HERE 524 | 525 | # Backward propagation. 526 | #(≈ 1 line of code) 527 | # grads = ... 528 | # YOUR CODE STARTS HERE 529 | grads = L_model_backward(AL, Y, caches) 530 | 531 | # YOUR CODE ENDS HERE 532 | 533 | # Update parameters. 534 | #(≈ 1 line of code) 535 | # parameters = ... 
536 | # YOUR CODE STARTS HERE 537 | parameters = update_parameters(parameters, grads, learning_rate) 538 | 539 | # YOUR CODE ENDS HERE 540 | 541 | # Print the cost every 100 iterations 542 | if print_cost and i % 100 == 0 or i == num_iterations - 1: 543 | print("Cost after iteration {}: {}".format(i, np.squeeze(cost))) 544 | if i % 100 == 0 or i == num_iterations: 545 | costs.append(cost) 546 | 547 | return parameters, costs 548 | 549 | 550 | # In[17]: 551 | 552 | 553 | parameters, costs = L_layer_model(train_x, train_y, layers_dims, num_iterations = 1, print_cost = False) 554 | 555 | print("Cost after first iteration: " + str(costs[0])) 556 | 557 | L_layer_model_test(L_layer_model) 558 | 559 | 560 | # 561 | # ### 5.1 - Train the model 562 | # 563 | # If your code passed the previous cell, run the cell below to train your model as a 4-layer neural network. 564 | # 565 | # - The cost should decrease on every iteration. 566 | # 567 | # - It may take up to 5 minutes to run 2500 iterations. 568 | 569 | # In[18]: 570 | 571 | 572 | parameters, costs = L_layer_model(train_x, train_y, layers_dims, num_iterations = 2500, print_cost = True) 573 | 574 | 575 | # **Expected Output**: 576 | # 577 | # 578 | # 579 | # 580 | # 581 | # 582 | # 583 | # 584 | # 585 | # 586 | # 587 | # 588 | # 589 | # 590 | # 591 | # 592 | # 593 | #
# | Cost after iteration 0    | 0.771749 |
# | Cost after iteration 100  | 0.672053 |
# | ...                       | ...      |
# | Cost after iteration 2499 | 0.088439 |
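# 
# **Optional check (ungraded)**: the note earlier pointed out that stopping around 1,500 iterations can give better test accuracy ("early stopping"). A minimal sketch of how you could compare iteration budgets; it assumes `predict` returns the (1, m) array of 0/1 predictions, as it is used in the cells below:
# 
# ```python
# for n_iter in (1500, 2500):
#     params_tmp, _ = L_layer_model(train_x, train_y, layers_dims,
#                                   num_iterations=n_iter, print_cost=False)
#     preds = predict(test_x, test_y, params_tmp)
#     print(n_iter, "iterations -> test accuracy:", np.mean(preds == test_y))
# ```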
594 | 595 | # In[19]: 596 | 597 | 598 | pred_train = predict(train_x, train_y, parameters) 599 | 600 | 601 | # **Expected Output**: 602 | # 603 | # 604 | # 605 | # 608 | # 611 | # 612 | #
# | Train Accuracy | 0.985645933014 |
613 | 614 | # In[20]: 615 | 616 | 617 | pred_test = predict(test_x, test_y, parameters) 618 | 619 | 620 | # **Expected Output**: 621 | # 622 | # 623 | # 624 | # 625 | # 626 | # 627 | #
# | Test Accuracy | 0.8 |
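# 
# **Optional (ungraded)**: since `pred_test` holds the model's 0/1 predictions, you can also break the remaining test error down by type. A small sketch, assuming `pred_test` and `test_y` are the (1, m) arrays used in the cells above:
# 
# ```python
# false_positives = int(np.sum((pred_test == 1) & (test_y == 0)))  # non-cat images predicted as cat
# false_negatives = int(np.sum((pred_test == 0) & (test_y == 1)))  # cat images predicted as non-cat
# print("false positives:", false_positives, "| false negatives:", false_negatives)
# ```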
628 | 629 | # ### Congrats! It seems that your 4-layer neural network has better performance (80%) than your 2-layer neural network (72%) on the same test set. 630 | # 631 | # This is pretty good performance for this task. Nice job! 632 | # 633 | # In the next course on "Improving deep neural networks," you'll be able to obtain even higher accuracy by systematically searching for better hyperparameters: learning_rate, layers_dims, or num_iterations, for example. 634 | 635 | # 636 | # ## 6 - Results Analysis 637 | # 638 | # First, take a look at some images the L-layer model labeled incorrectly. This will show a few mislabeled images. 639 | 640 | # In[21]: 641 | 642 | 643 | print_mislabeled_images(classes, test_x, test_y, pred_test) 644 | 645 | 646 | # **A few types of images the model tends to do poorly on include:** 647 | # - Cat body in an unusual position 648 | # - Cat appears against a background of a similar color 649 | # - Unusual cat color and species 650 | # - Camera Angle 651 | # - Brightness of the picture 652 | # - Scale variation (cat is very large or small in image) 653 | 654 | # ### Congratulations on finishing this assignment! 655 | # 656 | # You just built and trained a deep L-layer neural network, and applied it in order to distinguish cats from non-cats, a very serious and important task in deep learning. ;) 657 | # 658 | # By now, you've also completed all the assignments for Course 1 in the Deep Learning Specialization. Amazing work! If you'd like to test out how closely you resemble a cat yourself, there's an optional ungraded exercise below, where you can test your own image. 659 | # 660 | # Great work and hope to see you in the next course! 661 | 662 | # 663 | # ## 7 - Test with your own image (optional/ungraded exercise) ## 664 | # 665 | # From this point, if you so choose, you can use your own image to test the output of your model. To do that follow these steps: 666 | # 667 | # 1. Click on "File" in the upper bar of this notebook, then click "Open" to go on your Coursera Hub. 668 | # 2. Add your image to this Jupyter Notebook's directory, in the "images" folder 669 | # 3. Change your image's name in the following code 670 | # 4. Run the code and check if the algorithm is right (1 = cat, 0 = non-cat)! 671 | 672 | # In[22]: 673 | 674 | 675 | ## START CODE HERE ## 676 | my_image = "my_image.jpg" # change this to the name of your image file 677 | my_label_y = [1] # the true class of your image (1 -> cat, 0 -> non-cat) 678 | ## END CODE HERE ## 679 | 680 | fname = "images/" + my_image 681 | image = np.array(Image.open(fname).resize((num_px, num_px))) 682 | plt.imshow(image) 683 | image = image / 255. 
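# Flatten the (num_px, num_px, 3) array into a (num_px * num_px * 3, 1) column vector
# so it matches the (n_x, m) input layout the model was trained on.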
684 | image = image.reshape((1, num_px * num_px * 3)).T 685 | 686 | my_predicted_image = predict(image, my_label_y, parameters) 687 | 688 | 689 | print ("y = " + str(np.squeeze(my_predicted_image)) + ", your L-layer model predicts a \"" + classes[int(np.squeeze(my_predicted_image)),].decode("utf-8") + "\" picture.") 690 | 691 | 692 | # **References**: 693 | # 694 | # - for auto-reloading external module: http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython 695 | -------------------------------------------------------------------------------- /Course 1-Neural Networks & Deep Learning/Week 4/Week 4 Key Concepts on Deep Neural Networks.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 1-Neural Networks & Deep Learning/Week 4/Week 4 Key Concepts on Deep Neural Networks.pdf -------------------------------------------------------------------------------- /Course 2-Improving Deep Neural Networks/Week 1/Gradient_Checking.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding: utf-8 3 | 4 | # # Gradient Checking 5 | # 6 | # Welcome to the final assignment for this week! In this assignment you'll be implementing gradient checking. 7 | # 8 | # By the end of this notebook, you'll be able to: 9 | # 10 | # Implement gradient checking to verify the accuracy of your backprop implementation. 11 | # 12 | # ## Important Note on Submission to the AutoGrader 13 | # 14 | # Before submitting your assignment to the AutoGrader, please make sure you are not doing the following: 15 | # 16 | # 1. You have not added any _extra_ `print` statement(s) in the assignment. 17 | # 2. You have not added any _extra_ code cell(s) in the assignment. 18 | # 3. You have not changed any of the function parameters. 19 | # 4. You are not using any global variables inside your graded exercises. Unless specifically instructed to do so, please refrain from it and use the local variables instead. 20 | # 5. You are not changing the assignment code where it is not required, like creating _extra_ variables. 21 | # 22 | # If you do any of the following, you will get something like, `Grader Error: Grader feedback not found` (or similarly unexpected) error upon submitting your assignment. Before asking for help/debugging the errors in your assignment, check for these first. If this is the case, and you don't remember the changes you have made, you can get a fresh copy of the assignment by following these [instructions](https://www.coursera.org/learn/deep-neural-network/supplement/QWEnZ/h-ow-to-refresh-your-workspace). 
23 | 24 | # ## Table of Contents 25 | # - [1 - Packages](#1) 26 | # - [2 - Problem Statement](#2) 27 | # - [3 - How does Gradient Checking work?](#3) 28 | # - [4 - 1-Dimensional Gradient Checking](#4) 29 | # - [Exercise 1 - forward_propagation](#ex-1) 30 | # - [Exercise 2 - backward_propagation](#ex-2) 31 | # - [Exercise 3 - gradient_check](#ex-3) 32 | # - [5 - N-Dimensional Gradient Checking](#5) 33 | # - [Exercise 4 - gradient_check_n](#ex-4) 34 | 35 | # 36 | # ## 1 - Packages 37 | 38 | # In[1]: 39 | 40 | 41 | import numpy as np 42 | from testCases import * 43 | from public_tests import * 44 | from gc_utils import sigmoid, relu, dictionary_to_vector, vector_to_dictionary, gradients_to_vector 45 | 46 | get_ipython().run_line_magic('load_ext', 'autoreload') 47 | get_ipython().run_line_magic('autoreload', '2') 48 | 49 | 50 | # 51 | # ## 2 - Problem Statement 52 | # 53 | # You are part of a team working to make mobile payments available globally, and are asked to build a deep learning model to detect fraud--whenever someone makes a payment, you want to see if the payment might be fraudulent, such as if the user's account has been taken over by a hacker. 54 | # 55 | # You already know that backpropagation is quite challenging to implement, and sometimes has bugs. Because this is a mission-critical application, your company's CEO wants to be really certain that your implementation of backpropagation is correct. Your CEO says, "Give me proof that your backpropagation is actually working!" To give this reassurance, you are going to use "gradient checking." 56 | # 57 | # Let's do it! 58 | 59 | # 60 | # ## 3 - How does Gradient Checking work? 61 | # Backpropagation computes the gradients $\frac{\partial J}{\partial \theta}$, where $\theta$ denotes the parameters of the model. $J$ is computed using forward propagation and your loss function. 62 | # 63 | # Because forward propagation is relatively easy to implement, you're confident you got that right, and so you're almost 100% sure that you're computing the cost $J$ correctly. Thus, you can use your code for computing $J$ to verify the code for computing $\frac{\partial J}{\partial \theta}$. 64 | # 65 | # Let's look back at the definition of a derivative (or gradient):$$ \frac{\partial J}{\partial \theta} = \lim_{\varepsilon \to 0} \frac{J(\theta + \varepsilon) - J(\theta - \varepsilon)}{2 \varepsilon} \tag{1}$$ 66 | # 67 | # If you're not familiar with the "$\displaystyle \lim_{\varepsilon \to 0}$" notation, it's just a way of saying "when $\varepsilon$ is really, really small." 68 | # 69 | # You know the following: 70 | # 71 | # $\frac{\partial J}{\partial \theta}$ is what you want to make sure you're computing correctly. 72 | # You can compute $J(\theta + \varepsilon)$ and $J(\theta - \varepsilon)$ (in the case that $\theta$ is a real number), since you're confident your implementation for $J$ is correct. 73 | # Let's use equation (1) and a small value for $\varepsilon$ to convince your CEO that your code for computing $\frac{\partial J}{\partial \theta}$ is correct! 74 | 75 | # 76 | # ## 4 - 1-Dimensional Gradient Checking 77 | # 78 | # Consider a 1D linear function $J(\theta) = \theta x$. The model contains only a single real-valued parameter $\theta$, and takes $x$ as input. 79 | # 80 | # You will implement code to compute $J(.)$ and its derivative $\frac{\partial J}{\partial \theta}$. You will then use gradient checking to make sure your derivative computation for $J$ is correct. 81 | # 82 | # 83 | #
# **Figure 1**: 1D linear model
84 | # 85 | # The diagram above shows the key computation steps: First start with $x$, then evaluate the function $J(x)$ ("forward propagation"). Then compute the derivative $\frac{\partial J}{\partial \theta}$ ("backward propagation"). 86 | # 87 | # 88 | # ### Exercise 1 - forward_propagation 89 | # 90 | # Implement `forward propagation`. For this simple function compute $J(.)$ 91 | 92 | # In[2]: 93 | 94 | 95 | # GRADED FUNCTION: forward_propagation 96 | 97 | def forward_propagation(x, theta): 98 | """ 99 | Implement the linear forward propagation (compute J) presented in Figure 1 (J(theta) = theta * x) 100 | 101 | Arguments: 102 | x -- a real-valued input 103 | theta -- our parameter, a real number as well 104 | 105 | Returns: 106 | J -- the value of function J, computed using the formula J(theta) = theta * x 107 | """ 108 | 109 | # (approx. 1 line) 110 | # J = 111 | # YOUR CODE STARTS HERE 112 | J = np.dot(theta,x) 113 | 114 | # YOUR CODE ENDS HERE 115 | 116 | return J 117 | 118 | 119 | # In[3]: 120 | 121 | 122 | x, theta = 2, 4 123 | J = forward_propagation(x, theta) 124 | print ("J = " + str(J)) 125 | forward_propagation_test(forward_propagation) 126 | 127 | 128 | # 129 | # ### Exercise 2 - backward_propagation 130 | # 131 | # Now, implement the `backward propagation` step (derivative computation) of Figure 1. That is, compute the derivative of $J(\theta) = \theta x$ with respect to $\theta$. To save you from doing the calculus, you should get $dtheta = \frac { \partial J }{ \partial \theta} = x$. 132 | 133 | # In[4]: 134 | 135 | 136 | # GRADED FUNCTION: backward_propagation 137 | 138 | def backward_propagation(x, theta): 139 | """ 140 | Computes the derivative of J with respect to theta (see Figure 1). 141 | 142 | Arguments: 143 | x -- a real-valued input 144 | theta -- our parameter, a real number as well 145 | 146 | Returns: 147 | dtheta -- the gradient of the cost with respect to theta 148 | """ 149 | 150 | # (approx. 1 line) 151 | # dtheta = 152 | # YOUR CODE STARTS HERE 153 | 154 | dtheta=x 155 | # YOUR CODE ENDS HERE 156 | 157 | return dtheta 158 | 159 | 160 | # In[5]: 161 | 162 | 163 | x, theta = 3, 4 164 | dtheta = backward_propagation(x, theta) 165 | print ("dtheta = " + str(dtheta)) 166 | backward_propagation_test(backward_propagation) 167 | 168 | 169 | # #### Expected output: 170 | # 171 | # ``` 172 | # dtheta = 3 173 | # All tests passed. 174 | # ``` 175 | 176 | # 177 | # ### Exercise 3 - gradient_check 178 | # 179 | # To show that the `backward_propagation()` function is correctly computing the gradient $\frac{\partial J}{\partial \theta}$, let's implement gradient checking. 180 | # 181 | # **Instructions**: 182 | # - First compute "gradapprox" using the formula above (1) and a small value of $\varepsilon$. Here are the Steps to follow: 183 | # 1. $\theta^{+} = \theta + \varepsilon$ 184 | # 2. $\theta^{-} = \theta - \varepsilon$ 185 | # 3. $J^{+} = J(\theta^{+})$ 186 | # 4. $J^{-} = J(\theta^{-})$ 187 | # 5. $gradapprox = \frac{J^{+} - J^{-}}{2 \varepsilon}$ 188 | # - Then compute the gradient using backward propagation, and store the result in a variable "grad" 189 | # - Finally, compute the relative difference between "gradapprox" and the "grad" using the following formula: 190 | # $$ difference = \frac {\mid\mid grad - gradapprox \mid\mid_2}{\mid\mid grad \mid\mid_2 + \mid\mid gradapprox \mid\mid_2} \tag{2}$$ 191 | # You will need 3 Steps to compute this formula: 192 | # - 1'. compute the numerator using np.linalg.norm(...) 193 | # - 2'. compute the denominator. 
You will need to call np.linalg.norm(...) twice. 194 | # - 3'. divide them. 195 | # - If this difference is small (say less than $10^{-7}$), you can be quite confident that you have computed your gradient correctly. Otherwise, there may be a mistake in the gradient computation. 196 | # 197 | 198 | # In[8]: 199 | 200 | 201 | # GRADED FUNCTION: gradient_check 202 | 203 | def gradient_check(x, theta, epsilon=1e-7, print_msg=False): 204 | """ 205 | Implement the gradient checking presented in Figure 1. 206 | 207 | Arguments: 208 | x -- a float input 209 | theta -- our parameter, a float as well 210 | epsilon -- tiny shift to the input to compute approximated gradient with formula(1) 211 | 212 | Returns: 213 | difference -- difference (2) between the approximated gradient and the backward propagation gradient. Float output 214 | """ 215 | 216 | # Compute gradapprox using right side of formula (1). epsilon is small enough, you don't need to worry about the limit. 217 | # (approx. 5 lines) 218 | # theta_plus = # Step 1 219 | # theta_minus = # Step 2 220 | # J_plus = # Step 3 221 | # J_minus = # Step 4 222 | # gradapprox = # Step 5 223 | # YOUR CODE STARTS HERE 224 | 225 | thetaplus = theta + epsilon # Step 1 226 | thetaminus = theta - epsilon # Step 2 227 | J_plus = np.dot(thetaplus,x) # Step 3 228 | J_minus = np.dot(thetaminus,x) # Step 4 229 | gradapprox = (J_plus - J_minus)/(2*epsilon) 230 | # YOUR CODE ENDS HERE 231 | 232 | # Check if gradapprox is close enough to the output of backward_propagation() 233 | #(approx. 1 line) DO NOT USE "grad = gradapprox" 234 | # grad = 235 | # YOUR CODE STARTS HERE 236 | grad = x 237 | 238 | # YOUR CODE ENDS HERE 239 | 240 | #(approx. 3 lines) 241 | # numerator = # Step 1' 242 | # denominator = # Step 2' 243 | # difference = # Step 3' 244 | # YOUR CODE STARTS HERE 245 | 246 | numerator = np.linalg.norm(gradapprox-grad) # Step 1' 247 | denominator = np.linalg.norm(gradapprox) + np.linalg.norm(grad) # Step 2' 248 | difference = numerator/denominator # Step 3' 249 | # YOUR CODE ENDS HERE 250 | if print_msg: 251 | if difference > 2e-7: 252 | print ("\033[93m" + "There is a mistake in the backward propagation! difference = " + str(difference) + "\033[0m") 253 | else: 254 | print ("\033[92m" + "Your backward propagation works perfectly fine! difference = " + str(difference) + "\033[0m") 255 | 256 | return difference 257 | 258 | 259 | # In[9]: 260 | 261 | 262 | x, theta = 3, 4 263 | difference = gradient_check(x, theta, print_msg=True) 264 | 265 | 266 | # **Expected output**: 267 | # 268 | # 269 | # 270 | # 271 | # 272 | # 273 | #
# Your backward propagation works perfectly fine! difference = 7.814075313343006e-11
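# 
# To see the check fail, you can compare the same two-sided approximation against a deliberately wrong derivative (here $2x$ instead of $x$). This is an ungraded, illustrative snippet that reuses `forward_propagation` from above:
# 
# ```python
# x, theta, eps = 3.0, 4.0, 1e-7
# wrong_grad = 2 * x                                    # pretend backprop returned 2x
# gradapprox = (forward_propagation(x, theta + eps)
#               - forward_propagation(x, theta - eps)) / (2 * eps)
# difference = np.abs(wrong_grad - gradapprox) / (np.abs(wrong_grad) + np.abs(gradapprox))
# print(difference)                                     # about 0.33, far above the 2e-7 threshold
# ```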
274 | 275 | # Congrats, the difference is smaller than the $2 * 10^{-7}$ threshold. So you can have high confidence that you've correctly computed the gradient in `backward_propagation()`. 276 | # 277 | # Now, in the more general case, your cost function $J$ has more than a single 1D input. When you are training a neural network, $\theta$ actually consists of multiple matrices $W^{[l]}$ and biases $b^{[l]}$! It is important to know how to do a gradient check with higher-dimensional inputs. Let's do it! 278 | 279 | # 280 | # ## 5 - N-Dimensional Gradient Checking 281 | 282 | # The following figure describes the forward and backward propagation of your fraud detection model. 283 | # 284 | # 285 | #
# **Figure 2**: Deep neural network. LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID
286 | # 287 | # Let's look at your implementations for forward propagation and backward propagation. 288 | 289 | # In[10]: 290 | 291 | 292 | def forward_propagation_n(X, Y, parameters): 293 | """ 294 | Implements the forward propagation (and computes the cost) presented in Figure 3. 295 | 296 | Arguments: 297 | X -- training set for m examples 298 | Y -- labels for m examples 299 | parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2", "W3", "b3": 300 | W1 -- weight matrix of shape (5, 4) 301 | b1 -- bias vector of shape (5, 1) 302 | W2 -- weight matrix of shape (3, 5) 303 | b2 -- bias vector of shape (3, 1) 304 | W3 -- weight matrix of shape (1, 3) 305 | b3 -- bias vector of shape (1, 1) 306 | 307 | Returns: 308 | cost -- the cost function (logistic cost for m examples) 309 | cache -- a tuple with the intermediate values (Z1, A1, W1, b1, Z2, A2, W2, b2, Z3, A3, W3, b3) 310 | 311 | """ 312 | 313 | # retrieve parameters 314 | m = X.shape[1] 315 | W1 = parameters["W1"] 316 | b1 = parameters["b1"] 317 | W2 = parameters["W2"] 318 | b2 = parameters["b2"] 319 | W3 = parameters["W3"] 320 | b3 = parameters["b3"] 321 | 322 | # LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID 323 | Z1 = np.dot(W1, X) + b1 324 | A1 = relu(Z1) 325 | Z2 = np.dot(W2, A1) + b2 326 | A2 = relu(Z2) 327 | Z3 = np.dot(W3, A2) + b3 328 | A3 = sigmoid(Z3) 329 | 330 | # Cost 331 | log_probs = np.multiply(-np.log(A3),Y) + np.multiply(-np.log(1 - A3), 1 - Y) 332 | cost = 1. / m * np.sum(log_probs) 333 | 334 | cache = (Z1, A1, W1, b1, Z2, A2, W2, b2, Z3, A3, W3, b3) 335 | 336 | return cost, cache 337 | 338 | 339 | # Now, run backward propagation. 340 | 341 | # In[14]: 342 | 343 | 344 | def backward_propagation_n(X, Y, cache): 345 | """ 346 | Implement the backward propagation presented in figure 2. 347 | 348 | Arguments: 349 | X -- input datapoint, of shape (input size, 1) 350 | Y -- true "label" 351 | cache -- cache output from forward_propagation_n() 352 | 353 | Returns: 354 | gradients -- A dictionary with the gradients of the cost with respect to each parameter, activation and pre-activation variables. 355 | """ 356 | 357 | m = X.shape[1] 358 | (Z1, A1, W1, b1, Z2, A2, W2, b2, Z3, A3, W3, b3) = cache 359 | 360 | dZ3 = A3 - Y 361 | dW3 = 1./m * np.dot(dZ3, A2.T) 362 | db3 = 1./m * np.sum(dZ3, axis=1, keepdims = True) 363 | 364 | dA2 = np.dot(W3.T, dZ3) 365 | dZ2 = np.multiply(dA2, np.int64(A2 > 0)) 366 | dW2 = 1./m * np.dot(dZ2, A1.T) 367 | db2 = 1./m * np.sum(dZ2, axis=1, keepdims = True) 368 | 369 | dA1 = np.dot(W2.T, dZ2) 370 | dZ1 = np.multiply(dA1, np.int64(A1 > 0)) 371 | dW1 = 1./m * np.dot(dZ1, X.T) 372 | db1 = 1./m * np.sum(dZ1, axis=1, keepdims = True) 373 | 374 | gradients = {"dZ3": dZ3, "dW3": dW3, "db3": db3, 375 | "dA2": dA2, "dZ2": dZ2, "dW2": dW2, "db2": db2, 376 | "dA1": dA1, "dZ1": dZ1, "dW1": dW1, "db1": db1} 377 | 378 | return gradients 379 | 380 | 381 | # You obtained some results on the fraud detection test set but you are not 100% sure of your model. Nobody's perfect! Let's implement gradient checking to verify if your gradients are correct. 382 | 383 | # **How does gradient checking work?**. 384 | # 385 | # As in Section 3 and 4, you want to compare "gradapprox" to the gradient computed by backpropagation. The formula is still: 386 | # 387 | # $$ \frac{\partial J}{\partial \theta} = \lim_{\varepsilon \to 0} \frac{J(\theta + \varepsilon) - J(\theta - \varepsilon)}{2 \varepsilon} \tag{1}$$ 388 | # 389 | # However, $\theta$ is not a scalar anymore. 
It is a dictionary called "parameters". The function "`dictionary_to_vector()`" has been implemented for you. It converts the "parameters" dictionary into a vector called "values", obtained by reshaping all parameters (W1, b1, W2, b2, W3, b3) into vectors and concatenating them. 390 | # 391 | # The inverse function is "`vector_to_dictionary`" which outputs back the "parameters" dictionary. 392 | # 393 | # 394 | #
# **Figure 3**: dictionary_to_vector() and vector_to_dictionary(). You will need these functions in gradient_check_n()
395 | # 396 | # The "gradients" dictionary has also been converted into a vector "grad" using gradients_to_vector(), so you don't need to worry about that. 397 | # 398 | # Now, for every single parameter in your vector, you will apply the same procedure as for the gradient_check exercise. You will store each gradient approximation in a vector `gradapprox`. If the check goes as expected, each value in this approximation must match the real gradient values stored in the `grad` vector. 399 | # 400 | # Note that `grad` is calculated using the function `gradients_to_vector`, which uses the gradients outputs of the `backward_propagation_n` function. 401 | # 402 | # 403 | # ### Exercise 4 - gradient_check_n 404 | # 405 | # Implement the function below. 406 | # 407 | # **Instructions**: Here is pseudo-code that will help you implement the gradient check. 408 | # 409 | # For each i in num_parameters: 410 | # - To compute `J_plus[i]`: 411 | # 1. Set $\theta^{+}$ to `np.copy(parameters_values)` 412 | # 2. Set $\theta^{+}_i$ to $\theta^{+}_i + \varepsilon$ 413 | # 3. Calculate $J^{+}_i$ using to `forward_propagation_n(x, y, vector_to_dictionary(`$\theta^{+}$ `))`. 414 | # - To compute `J_minus[i]`: do the same thing with $\theta^{-}$ 415 | # - Compute $gradapprox[i] = \frac{J^{+}_i - J^{-}_i}{2 \varepsilon}$ 416 | # 417 | # Thus, you get a vector gradapprox, where gradapprox[i] is an approximation of the gradient with respect to `parameter_values[i]`. You can now compare this gradapprox vector to the gradients vector from backpropagation. Just like for the 1D case (Steps 1', 2', 3'), compute: 418 | # $$ difference = \frac {\| grad - gradapprox \|_2}{\| grad \|_2 + \| gradapprox \|_2 } \tag{3}$$ 419 | # 420 | # **Note**: Use `np.linalg.norm` to get the norms 421 | 422 | # In[15]: 423 | 424 | 425 | # GRADED FUNCTION: gradient_check_n 426 | 427 | def gradient_check_n(parameters, gradients, X, Y, epsilon=1e-7, print_msg=False): 428 | """ 429 | Checks if backward_propagation_n computes correctly the gradient of the cost output by forward_propagation_n 430 | 431 | Arguments: 432 | parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2", "W3", "b3" 433 | grad -- output of backward_propagation_n, contains gradients of the cost with respect to the parameters 434 | X -- input datapoint, of shape (input size, number of examples) 435 | Y -- true "label" 436 | epsilon -- tiny shift to the input to compute approximated gradient with formula(1) 437 | 438 | Returns: 439 | difference -- difference (2) between the approximated gradient and the backward propagation gradient 440 | """ 441 | 442 | # Set-up variables 443 | parameters_values, _ = dictionary_to_vector(parameters) 444 | 445 | grad = gradients_to_vector(gradients) 446 | num_parameters = parameters_values.shape[0] 447 | J_plus = np.zeros((num_parameters, 1)) 448 | J_minus = np.zeros((num_parameters, 1)) 449 | gradapprox = np.zeros((num_parameters, 1)) 450 | 451 | # Compute gradapprox 452 | for i in range(num_parameters): 453 | 454 | # Compute J_plus[i]. Inputs: "parameters_values, epsilon". Output = "J_plus[i]". 455 | # "_" is used because the function you have outputs two parameters but we only care about the first one 456 | #(approx. 
3 lines) 457 | # theta_plus = # Step 1 458 | # theta_plus[i] = # Step 2 459 | # J_plus[i], _ = # Step 3 460 | # YOUR CODE STARTS HERE 461 | thetaplus = np.copy(parameters_values) # Step 1 462 | thetaplus[i][0] = thetaplus[i][0] + epsilon # Step 2 463 | J_plus[i], _ = forward_propagation_n(X, Y, vector_to_dictionary(thetaplus)) # Step 3 464 | 465 | # YOUR CODE ENDS HERE 466 | 467 | # Compute J_minus[i]. Inputs: "parameters_values, epsilon". Output = "J_minus[i]". 468 | #(approx. 3 lines) 469 | # theta_minus = # Step 1 470 | # theta_minus[i] = # Step 2 471 | # J_minus[i], _ = # Step 3 472 | # YOUR CODE STARTS HERE 473 | 474 | thetaminus = np.copy(parameters_values) # Step 1 475 | thetaminus[i][0] = thetaminus[i][0] - epsilon # Step 2 476 | J_minus[i], _ = forward_propagation_n(X, Y, vector_to_dictionary(thetaminus)) # Step 3 477 | # YOUR CODE ENDS HERE 478 | 479 | # Compute gradapprox[i] 480 | # (approx. 1 line) 481 | # gradapprox[i] = 482 | # YOUR CODE STARTS HERE 483 | 484 | gradapprox[i] = (J_plus[i] - J_minus[i])/(2*epsilon) 485 | # YOUR CODE ENDS HERE 486 | 487 | # Compare gradapprox to backward propagation gradients by computing difference. 488 | # (approx. 3 line) 489 | # numerator = # Step 1' 490 | # denominator = # Step 2' 491 | # difference = # Step 3' 492 | # YOUR CODE STARTS HERE 493 | 494 | numerator = np.linalg.norm(grad - gradapprox) # Step 1' 495 | denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox) # Step 2' 496 | difference = numerator/denominator # Step 3' 497 | # YOUR CODE ENDS HERE 498 | if print_msg: 499 | if difference > 2e-7: 500 | print ("\033[93m" + "There is a mistake in the backward propagation! difference = " + str(difference) + "\033[0m") 501 | else: 502 | print ("\033[92m" + "Your backward propagation works perfectly fine! difference = " + str(difference) + "\033[0m") 503 | 504 | return difference 505 | 506 | 507 | # In[16]: 508 | 509 | 510 | X, Y, parameters = gradient_check_n_test_case() 511 | 512 | cost, cache = forward_propagation_n(X, Y, parameters) 513 | gradients = backward_propagation_n(X, Y, cache) 514 | difference = gradient_check_n(parameters, gradients, X, Y, 1e-7, True) 515 | expected_values = [0.2850931567761623, 1.1890913024229996e-07] 516 | assert not(type(difference) == np.ndarray), "You are not using np.linalg.norm for numerator or denominator" 517 | assert np.any(np.isclose(difference, expected_values)), "Wrong value. It is not one of the expected values" 518 | 519 | 520 | # **Expected output**: 521 | # 522 | # 523 | # 524 | # 525 | # 526 | # 527 | #
# There is a mistake in the backward propagation! difference = 0.2850931567761623
528 | 529 | # It seems that there were errors in the `backward_propagation_n` code! Good thing you've implemented the gradient check. Go back to `backward_propagation` and try to find/correct the errors *(Hint: check dW2 and db1)*. Rerun the gradient check when you think you've fixed it. Remember, you'll need to re-execute the cell defining `backward_propagation_n()` if you modify the code. 530 | # 531 | # Can you get gradient check to declare your derivative computation correct? Even though this part of the assignment isn't graded, you should try to find the bug and re-run gradient check until you're convinced backprop is now correctly implemented. 532 | # 533 | # **Notes** 534 | # - Gradient Checking is slow! Approximating the gradient with $\frac{\partial J}{\partial \theta} \approx \frac{J(\theta + \varepsilon) - J(\theta - \varepsilon)}{2 \varepsilon}$ is computationally costly. For this reason, we don't run gradient checking at every iteration during training. Just a few times to check if the gradient is correct. 535 | # - Gradient Checking, at least as we've presented it, doesn't work with dropout. You would usually run the gradient check algorithm without dropout to make sure your backprop is correct, then add dropout. 536 | # 537 | # Congrats! Now you can be confident that your deep learning model for fraud detection is working correctly! You can even use this to convince your CEO. :) 538 | #
539 | # 540 | # 541 | # **What you should remember from this notebook**: 542 | # - Gradient checking verifies closeness between the gradients from backpropagation and the numerical approximation of the gradient (computed using forward propagation). 543 | # - Gradient checking is slow, so you don't want to run it in every iteration of training. You would usually run it only to make sure your code is correct, then turn it off and use backprop for the actual learning process. 544 | -------------------------------------------------------------------------------- /Course 2-Improving Deep Neural Networks/Week 1/Initialization.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding: utf-8 3 | 4 | # # Initialization 5 | # 6 | # Welcome to the first assignment of Improving Deep Neural Networks! 7 | # 8 | # Training your neural network requires specifying an initial value of the weights. A well-chosen initialization method helps the learning process. 9 | # 10 | # If you completed the previous course of this specialization, you probably followed the instructions for weight initialization, and seen that it's worked pretty well so far. But how do you choose the initialization for a new neural network? In this notebook, you'll try out a few different initializations, including random, zeros, and He initialization, and see how each leads to different results. 11 | # 12 | # A well-chosen initialization can: 13 | # - Speed up the convergence of gradient descent 14 | # - Increase the odds of gradient descent converging to a lower training (and generalization) error 15 | # 16 | # Let's get started! 17 | # 18 | # ## Important Note on Submission to the AutoGrader 19 | # 20 | # Before submitting your assignment to the AutoGrader, please make sure you are not doing the following: 21 | # 22 | # 1. You have not added any _extra_ `print` statement(s) in the assignment. 23 | # 2. You have not added any _extra_ code cell(s) in the assignment. 24 | # 3. You have not changed any of the function parameters. 25 | # 4. You are not using any global variables inside your graded exercises. Unless specifically instructed to do so, please refrain from it and use the local variables instead. 26 | # 5. You are not changing the assignment code where it is not required, like creating _extra_ variables. 27 | # 28 | # If you do any of the following, you will get something like, `Grader Error: Grader feedback not found` (or similarly unexpected) error upon submitting your assignment. Before asking for help/debugging the errors in your assignment, check for these first. If this is the case, and you don't remember the changes you have made, you can get a fresh copy of the assignment by following these [instructions](https://www.coursera.org/learn/deep-neural-network/supplement/QWEnZ/h-ow-to-refresh-your-workspace). 
29 | 30 | # ## Table of Contents 31 | # - [1 - Packages](#1) 32 | # - [2 - Loading the Dataset](#2) 33 | # - [3 - Neural Network Model](#3) 34 | # - [4 - Zero Initialization](#4) 35 | # - [Exercise 1 - initialize_parameters_zeros](#ex-1) 36 | # - [5 - Random Initialization](#5) 37 | # - [Exercise 2 - initialize_parameters_random](#ex-2) 38 | # - [6 - He Initialization](#6) 39 | # - [Exercise 3 - initialize_parameters_he](#ex-3) 40 | # - [7 - Conclusions](#7) 41 | 42 | # 43 | # ## 1 - Packages 44 | 45 | # In[1]: 46 | 47 | 48 | import numpy as np 49 | import matplotlib.pyplot as plt 50 | import sklearn 51 | import sklearn.datasets 52 | from public_tests import * 53 | from init_utils import sigmoid, relu, compute_loss, forward_propagation, backward_propagation 54 | from init_utils import update_parameters, predict, load_dataset, plot_decision_boundary, predict_dec 55 | 56 | get_ipython().run_line_magic('matplotlib', 'inline') 57 | plt.rcParams['figure.figsize'] = (7.0, 4.0) # set default size of plots 58 | plt.rcParams['image.interpolation'] = 'nearest' 59 | plt.rcParams['image.cmap'] = 'gray' 60 | 61 | get_ipython().run_line_magic('load_ext', 'autoreload') 62 | get_ipython().run_line_magic('autoreload', '2') 63 | 64 | # load image dataset: blue/red dots in circles 65 | # train_X, train_Y, test_X, test_Y = load_dataset() 66 | 67 | 68 | # 69 | # ## 2 - Loading the Dataset 70 | 71 | # In[2]: 72 | 73 | 74 | train_X, train_Y, test_X, test_Y = load_dataset() 75 | 76 | 77 | # For this classifier, you want to separate the blue dots from the red dots. 78 | 79 | # 80 | # ## 3 - Neural Network Model 81 | 82 | # You'll use a 3-layer neural network (already implemented for you). These are the initialization methods you'll experiment with: 83 | # - *Zeros initialization* -- setting `initialization = "zeros"` in the input argument. 84 | # - *Random initialization* -- setting `initialization = "random"` in the input argument. This initializes the weights to large random values. 85 | # - *He initialization* -- setting `initialization = "he"` in the input argument. This initializes the weights to random values scaled according to a paper by He et al., 2015. 86 | # 87 | # **Instructions**: Instructions: Read over the code below, and run it. In the next part, you'll implement the three initialization methods that this `model()` calls. 88 | 89 | # In[3]: 90 | 91 | 92 | def model(X, Y, learning_rate = 0.01, num_iterations = 15000, print_cost = True, initialization = "he"): 93 | """ 94 | Implements a three-layer neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SIGMOID. 95 | 96 | Arguments: 97 | X -- input data, of shape (2, number of examples) 98 | Y -- true "label" vector (containing 0 for red dots; 1 for blue dots), of shape (1, number of examples) 99 | learning_rate -- learning rate for gradient descent 100 | num_iterations -- number of iterations to run gradient descent 101 | print_cost -- if True, print the cost every 1000 iterations 102 | initialization -- flag to choose which initialization to use ("zeros","random" or "he") 103 | 104 | Returns: 105 | parameters -- parameters learnt by the model 106 | """ 107 | 108 | grads = {} 109 | costs = [] # to keep track of the loss 110 | m = X.shape[1] # number of examples 111 | layers_dims = [X.shape[0], 10, 5, 1] 112 | 113 | # Initialize parameters dictionary. 
114 | if initialization == "zeros": 115 | parameters = initialize_parameters_zeros(layers_dims) 116 | elif initialization == "random": 117 | parameters = initialize_parameters_random(layers_dims) 118 | elif initialization == "he": 119 | parameters = initialize_parameters_he(layers_dims) 120 | 121 | # Loop (gradient descent) 122 | 123 | for i in range(num_iterations): 124 | 125 | # Forward propagation: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID. 126 | a3, cache = forward_propagation(X, parameters) 127 | 128 | # Loss 129 | cost = compute_loss(a3, Y) 130 | 131 | # Backward propagation. 132 | grads = backward_propagation(X, Y, cache) 133 | 134 | # Update parameters. 135 | parameters = update_parameters(parameters, grads, learning_rate) 136 | 137 | # Print the loss every 1000 iterations 138 | if print_cost and i % 1000 == 0: 139 | print("Cost after iteration {}: {}".format(i, cost)) 140 | costs.append(cost) 141 | 142 | # plot the loss 143 | plt.plot(costs) 144 | plt.ylabel('cost') 145 | plt.xlabel('iterations (per hundreds)') 146 | plt.title("Learning rate =" + str(learning_rate)) 147 | plt.show() 148 | 149 | return parameters 150 | 151 | 152 | # 153 | # ## 4 - Zero Initialization 154 | # 155 | # There are two types of parameters to initialize in a neural network: 156 | # - the weight matrices $(W^{[1]}, W^{[2]}, W^{[3]}, ..., W^{[L-1]}, W^{[L]})$ 157 | # - the bias vectors $(b^{[1]}, b^{[2]}, b^{[3]}, ..., b^{[L-1]}, b^{[L]})$ 158 | # 159 | # 160 | # ### Exercise 1 - initialize_parameters_zeros 161 | # 162 | # Implement the following function to initialize all parameters to zeros. You'll see later that this does not work well since it fails to "break symmetry," but try it anyway and see what happens. Use `np.zeros((..,..))` with the correct shapes. 163 | 164 | # In[4]: 165 | 166 | 167 | # GRADED FUNCTION: initialize_parameters_zeros 168 | 169 | def initialize_parameters_zeros(layers_dims): 170 | """ 171 | Arguments: 172 | layer_dims -- python array (list) containing the size of each layer. 173 | 174 | Returns: 175 | parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL": 176 | W1 -- weight matrix of shape (layers_dims[1], layers_dims[0]) 177 | b1 -- bias vector of shape (layers_dims[1], 1) 178 | ... 179 | WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1]) 180 | bL -- bias vector of shape (layers_dims[L], 1) 181 | """ 182 | 183 | parameters = {} 184 | L = len(layers_dims) # number of layers in the network 185 | 186 | for l in range(1, L): 187 | #(≈ 2 lines of code) 188 | # parameters['W' + str(l)] = 189 | # parameters['b' + str(l)] = 190 | # YOUR CODE STARTS HERE 191 | parameters['W' + str(l)] = np.zeros((layers_dims[l], layers_dims[l-1])) 192 | parameters['b' + str(l)] = np.zeros((layers_dims[l], 1)) 193 | 194 | # YOUR CODE ENDS HERE 195 | return parameters 196 | 197 | 198 | # In[5]: 199 | 200 | 201 | parameters = initialize_parameters_zeros([3, 2, 1]) 202 | print("W1 = " + str(parameters["W1"])) 203 | print("b1 = " + str(parameters["b1"])) 204 | print("W2 = " + str(parameters["W2"])) 205 | print("b2 = " + str(parameters["b2"])) 206 | initialize_parameters_zeros_test(initialize_parameters_zeros) 207 | 208 | 209 | # Run the following code to train your model on 15,000 iterations using zeros initialization. 
210 | 211 | # In[6]: 212 | 213 | 214 | parameters = model(train_X, train_Y, initialization = "zeros") 215 | print ("On the train set:") 216 | predictions_train = predict(train_X, train_Y, parameters) 217 | print ("On the test set:") 218 | predictions_test = predict(test_X, test_Y, parameters) 219 | 220 | 221 | # The performance is terrible, the cost doesn't decrease, and the algorithm performs no better than random guessing. Why? Take a look at the details of the predictions and the decision boundary: 222 | 223 | # In[7]: 224 | 225 | 226 | print ("predictions_train = " + str(predictions_train)) 227 | print ("predictions_test = " + str(predictions_test)) 228 | 229 | 230 | # In[8]: 231 | 232 | 233 | plt.title("Model with Zeros initialization") 234 | axes = plt.gca() 235 | axes.set_xlim([-1.5,1.5]) 236 | axes.set_ylim([-1.5,1.5]) 237 | plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y) 238 | 239 | 240 | # __Note__: For sake of simplicity calculations below are done using only one example at a time. 241 | # 242 | # Since the weights and biases are zero, multiplying by the weights creates the zero vector which gives 0 when the activation function is ReLU. As `z = 0` 243 | # 244 | # $$a = ReLU(z) = max(0, z) = 0$$ 245 | # 246 | # At the classification layer, where the activation function is sigmoid you then get (for either input): 247 | # 248 | # $$\sigma(z) = \frac{1}{ 1 + e^{-(z)}} = \frac{1}{2} = y_{pred}$$ 249 | # 250 | # As for every example you are getting a 0.5 chance of it being true our cost function becomes helpless in adjusting the weights. 251 | # 252 | # Your loss function: 253 | # $$ \mathcal{L}(a, y) = - y \ln(y_{pred}) - (1-y) \ln(1-y_{pred})$$ 254 | # 255 | # For `y=1`, `y_pred=0.5` it becomes: 256 | # 257 | # $$ \mathcal{L}(0, 1) = - (1) \ln(\frac{1}{2}) = 0.6931471805599453$$ 258 | # 259 | # For `y=0`, `y_pred=0.5` it becomes: 260 | # 261 | # $$ \mathcal{L}(0, 0) = - (1) \ln(\frac{1}{2}) = 0.6931471805599453$$ 262 | # 263 | # As you can see with the prediction being 0.5 whether the actual (`y`) value is 1 or 0 you get the same loss value for both, so none of the weights get adjusted and you are stuck with the same old value of the weights. 264 | # 265 | # This is why you can see that the model is predicting 0 for every example! No wonder it's doing so badly. 266 | # 267 | # In general, initializing all the weights to zero results in the network failing to break symmetry. This means that every neuron in each layer will learn the same thing, so you might as well be training a neural network with $n^{[l]}=1$ for every layer. This way, the network is no more powerful than a linear classifier like logistic regression. 268 | 269 | # 270 | # 271 | # **What you should remember**: 272 | # - The weights $W^{[l]}$ should be initialized randomly to break symmetry. 273 | # - However, it's okay to initialize the biases $b^{[l]}$ to zeros. Symmetry is still broken so long as $W^{[l]}$ is initialized randomly. 274 | # 275 | 276 | # 277 | # ## 5 - Random Initialization 278 | # 279 | # To break symmetry, initialize the weights randomly. Following random initialization, each neuron can then proceed to learn a different function of its inputs. In this exercise, you'll see what happens when the weights are initialized randomly, but to very large values. 280 | # 281 | # 282 | # ### Exercise 2 - initialize_parameters_random 283 | # 284 | # Implement the following function to initialize your weights to large random values (scaled by \*10) and your biases to zeros. 
Use `np.random.randn(..,..) * 10` for weights and `np.zeros((.., ..))` for biases. You're using a fixed `np.random.seed(..)` to make sure your "random" weights match ours, so don't worry if running your code several times always gives you the same initial values for the parameters. 285 | 286 | # In[9]: 287 | 288 | 289 | # GRADED FUNCTION: initialize_parameters_random 290 | 291 | def initialize_parameters_random(layers_dims): 292 | """ 293 | Arguments: 294 | layer_dims -- python array (list) containing the size of each layer. 295 | 296 | Returns: 297 | parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL": 298 | W1 -- weight matrix of shape (layers_dims[1], layers_dims[0]) 299 | b1 -- bias vector of shape (layers_dims[1], 1) 300 | ... 301 | WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1]) 302 | bL -- bias vector of shape (layers_dims[L], 1) 303 | """ 304 | 305 | np.random.seed(3) # This seed makes sure your "random" numbers will be the as ours 306 | parameters = {} 307 | L = len(layers_dims) # integer representing the number of layers 308 | 309 | for l in range(1, L): 310 | #(≈ 2 lines of code) 311 | # parameters['W' + str(l)] = 312 | # parameters['b' + str(l)] = 313 | # YOUR CODE STARTS HERE 314 | 315 | parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1])*10 316 | parameters['b' + str(l)] = np.zeros((layers_dims[l], 1)) 317 | # YOUR CODE ENDS HERE 318 | 319 | return parameters 320 | 321 | 322 | # In[10]: 323 | 324 | 325 | parameters = initialize_parameters_random([3, 2, 1]) 326 | print("W1 = " + str(parameters["W1"])) 327 | print("b1 = " + str(parameters["b1"])) 328 | print("W2 = " + str(parameters["W2"])) 329 | print("b2 = " + str(parameters["b2"])) 330 | initialize_parameters_random_test(initialize_parameters_random) 331 | 332 | 333 | # Run the following code to train your model on 15,000 iterations using random initialization. 334 | 335 | # In[11]: 336 | 337 | 338 | parameters = model(train_X, train_Y, initialization = "random") 339 | print ("On the train set:") 340 | predictions_train = predict(train_X, train_Y, parameters) 341 | print ("On the test set:") 342 | predictions_test = predict(test_X, test_Y, parameters) 343 | 344 | 345 | # If you see "inf" as the cost after the iteration 0, this is because of numerical roundoff. A more numerically sophisticated implementation would fix this, but for the purposes of this notebook, it isn't really worth worrying about. 346 | # 347 | # In any case, you've now broken the symmetry, and this gives noticeably better accuracy than before. The model is no longer outputting all 0s. Progress! 348 | 349 | # In[12]: 350 | 351 | 352 | print (predictions_train) 353 | print (predictions_test) 354 | 355 | 356 | # In[13]: 357 | 358 | 359 | plt.title("Model with large random initialization") 360 | axes = plt.gca() 361 | axes.set_xlim([-1.5,1.5]) 362 | axes.set_ylim([-1.5,1.5]) 363 | plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y) 364 | 365 | 366 | # **Observations**: 367 | # - The cost starts very high. This is because with large random-valued weights, the last activation (sigmoid) outputs results that are very close to 0 or 1 for some examples, and when it gets that example wrong it incurs a very high loss for that example. Indeed, when $\log(a^{[3]}) = \log(0)$, the loss goes to infinity. 368 | # - Poor initialization can lead to vanishing/exploding gradients, which also slows down the optimization algorithm. 
369 | # - If you train this network longer you will see better results, but initializing with overly large random numbers slows down the optimization. 370 | # 371 | # 372 | # 373 | # **In summary**: 374 | # - Initializing weights to very large random values doesn't work well. 375 | # - Initializing with small random values should do better. The important question is, how small should be these random values be? Let's find out up next! 376 | # 377 | # 378 | # 379 | # **Optional Read:** 380 | # 381 | # 382 | # The main difference between Gaussian variable (`numpy.random.randn()`) and uniform random variable is the distribution of the generated random numbers: 383 | # 384 | # - numpy.random.rand() produces numbers in a [uniform distribution](https://raw.githubusercontent.com/jahnog/deeplearning-notes/master/Course2/images/rand.jpg). 385 | # - and numpy.random.randn() produces numbers in a [normal distribution](https://raw.githubusercontent.com/jahnog/deeplearning-notes/master/Course2/images/randn.jpg). 386 | # 387 | # When used for weight initialization, randn() helps most the weights to Avoid being close to the extremes, allocating most of them in the center of the range. 388 | # 389 | # An intuitive way to see it is, for example, if you take the [sigmoid() activation function](https://raw.githubusercontent.com/jahnog/deeplearning-notes/master/Course2/images/sigmoid.jpg). 390 | # 391 | # You’ll remember that the slope near 0 or near 1 is extremely small, so the weights near those extremes will converge much more slowly to the solution, and having most of them near the center will speed the convergence. 392 | 393 | # 394 | # ## 6 - He Initialization 395 | # 396 | # Finally, try "He Initialization"; this is named for the first author of He et al., 2015. (If you have heard of "Xavier initialization", this is similar except Xavier initialization uses a scaling factor for the weights $W^{[l]}$ of `sqrt(1./layers_dims[l-1])` where He initialization would use `sqrt(2./layers_dims[l-1])`.) 397 | # 398 | # 399 | # ### Exercise 3 - initialize_parameters_he 400 | # 401 | # Implement the following function to initialize your parameters with He initialization. This function is similar to the previous `initialize_parameters_random(...)`. The only difference is that instead of multiplying `np.random.randn(..,..)` by 10, you will multiply it by $\sqrt{\frac{2}{\text{dimension of the previous layer}}}$, which is what He initialization recommends for layers with a ReLU activation. 402 | 403 | # In[16]: 404 | 405 | 406 | # GRADED FUNCTION: initialize_parameters_he 407 | 408 | def initialize_parameters_he(layers_dims): 409 | """ 410 | Arguments: 411 | layer_dims -- python array (list) containing the size of each layer. 412 | 413 | Returns: 414 | parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL": 415 | W1 -- weight matrix of shape (layers_dims[1], layers_dims[0]) 416 | b1 -- bias vector of shape (layers_dims[1], 1) 417 | ... 
418 | WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1]) 419 | bL -- bias vector of shape (layers_dims[L], 1) 420 | """ 421 | 422 | np.random.seed(3) 423 | parameters = {} 424 | L = len(layers_dims) - 1 # integer representing the number of layers 425 | import math 426 | for l in range(1, L + 1): 427 | ### START CODE HERE ### (≈ 2 lines of code) 428 | parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1])*math.sqrt(2./layers_dims[l-1]) 429 | parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))*math.sqrt(2./layers_dims[l-1]) 430 | ### END CODE HERE ### 431 | 432 | return parameters 433 | 434 | 435 | # In[17]: 436 | 437 | 438 | parameters = initialize_parameters_he([2, 4, 1]) 439 | print("W1 = " + str(parameters["W1"])) 440 | print("b1 = " + str(parameters["b1"])) 441 | print("W2 = " + str(parameters["W2"])) 442 | print("b2 = " + str(parameters["b2"])) 443 | 444 | initialize_parameters_he_test(initialize_parameters_he) 445 | # parameters 446 | 447 | 448 | # **Expected output** 449 | # 450 | # ``` 451 | # W1 = [[ 1.78862847 0.43650985] 452 | # [ 0.09649747 -1.8634927 ] 453 | # [-0.2773882 -0.35475898] 454 | # [-0.08274148 -0.62700068]] 455 | # b1 = [[0.] [0.] [0.] [0.]] 456 | # W2 = [[-0.03098412 -0.33744411 -0.92904268 0.62552248]] 457 | # b2 = [[0.]] 458 | # ``` 459 | 460 | # Run the following code to train your model on 15,000 iterations using He initialization. 461 | 462 | # In[18]: 463 | 464 | 465 | parameters = model(train_X, train_Y, initialization = "he") 466 | print ("On the train set:") 467 | predictions_train = predict(train_X, train_Y, parameters) 468 | print ("On the test set:") 469 | predictions_test = predict(test_X, test_Y, parameters) 470 | 471 | 472 | # In[19]: 473 | 474 | 475 | plt.title("Model with He initialization") 476 | axes = plt.gca() 477 | axes.set_xlim([-1.5,1.5]) 478 | axes.set_ylim([-1.5,1.5]) 479 | plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y) 480 | 481 | 482 | # **Observations**: 483 | # - The model with He initialization separates the blue and the red dots very well in a small number of iterations. 484 | # 485 | 486 | # 487 | # ## 7 - Conclusions 488 | 489 | # You've tried three different types of initializations. For the same number of iterations and same hyperparameters, the comparison is: 490 | # 491 | # 492 | # 493 | # 496 | # 499 | # 502 | # 503 | # 506 | # 509 | # 512 | # 513 | # 516 | # 519 | # 522 | # 523 | # 524 | # 527 | # 530 | # 533 | # 534 | #
# | Model                                       | Train accuracy | Problem/Comment         |
# | ------------------------------------------- | -------------- | ----------------------- |
# | 3-layer NN with zeros initialization        | 50%            | fails to break symmetry |
# | 3-layer NN with large random initialization | 83%            | too large weights       |
# | 3-layer NN with He initialization           | 99%            | recommended method      |
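# 
# The scale differences behind this table are easy to see directly. Here is an ungraded sketch comparing the spread of weights drawn with the three schemes discussed above, for a layer with 10 inputs (the `fan_in` name is only used in this example):
# 
# ```python
# fan_in = 10
# np.random.seed(3)
# w_large  = np.random.randn(5, fan_in) * 10                    # "random" (*10) initialization
# w_xavier = np.random.randn(5, fan_in) * np.sqrt(1. / fan_in)  # Xavier scaling
# w_he     = np.random.randn(5, fan_in) * np.sqrt(2. / fan_in)  # He scaling
# print(np.std(w_large), np.std(w_xavier), np.std(w_he))        # roughly 10 vs 0.32 vs 0.45
# ```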
535 | 536 | # **Congratulations**! You've completed this notebook on Initialization. 537 | # 538 | # Here's a quick recap of the main takeaways: 539 | # 540 | # 541 | # 542 | # - Different initializations lead to very different results 543 | # - Random initialization is used to break symmetry and make sure different hidden units can learn different things 544 | # - Resist initializing to values that are too large! 545 | # - He initialization works well for networks with ReLU activations 546 | -------------------------------------------------------------------------------- /Course 2-Improving Deep Neural Networks/Week 1/Week 1 Practical aspects of Deep Learning.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 2-Improving Deep Neural Networks/Week 1/Week 1 Practical aspects of Deep Learning.pdf -------------------------------------------------------------------------------- /Course 2-Improving Deep Neural Networks/Week 2/Week 2 Optimization Algorithms.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 2-Improving Deep Neural Networks/Week 2/Week 2 Optimization Algorithms.pdf -------------------------------------------------------------------------------- /Course 2-Improving Deep Neural Networks/Week 3/Week 3 Hyperparameter tuning, Batch Normalization, Programming Frameworks.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 2-Improving Deep Neural Networks/Week 3/Week 3 Hyperparameter tuning, Batch Normalization, Programming Frameworks.pdf -------------------------------------------------------------------------------- /Course 3-Structuring MachineLearningProjects/Week 1 Bird Recognition in the City of Peacetopia Case Study.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 3-Structuring MachineLearningProjects/Week 1 Bird Recognition in the City of Peacetopia Case Study.pdf -------------------------------------------------------------------------------- /Course 3-Structuring MachineLearningProjects/Week 2 Autonomous Driving Case Study.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 3-Structuring MachineLearningProjects/Week 2 Autonomous Driving Case Study.pdf -------------------------------------------------------------------------------- /Course 4-ConvolutionalNeuralNetworks/Week 1/Convolution_model_Application.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding: utf-8 3 | 4 | # # Convolutional Neural Networks: Application 5 | # 6 | # Welcome to Course 4's second assignment! 
In this notebook, you will: 7 | # 8 | # - Create a mood classifer using the TF Keras Sequential API 9 | # - Build a ConvNet to identify sign language digits using the TF Keras Functional API 10 | # 11 | # **After this assignment you will be able to:** 12 | # 13 | # - Build and train a ConvNet in TensorFlow for a __binary__ classification problem 14 | # - Build and train a ConvNet in TensorFlow for a __multiclass__ classification problem 15 | # - Explain different use cases for the Sequential and Functional APIs 16 | # 17 | # To complete this assignment, you should already be familiar with TensorFlow. If you are not, please refer back to the **TensorFlow Tutorial** of the third week of Course 2 ("**Improving deep neural networks**"). 18 | # 19 | # ## Important Note on Submission to the AutoGrader 20 | # 21 | # Before submitting your assignment to the AutoGrader, please make sure you are not doing the following: 22 | # 23 | # 1. You have not added any _extra_ `print` statement(s) in the assignment. 24 | # 2. You have not added any _extra_ code cell(s) in the assignment. 25 | # 3. You have not changed any of the function parameters. 26 | # 4. You are not using any global variables inside your graded exercises. Unless specifically instructed to do so, please refrain from it and use the local variables instead. 27 | # 5. You are not changing the assignment code where it is not required, like creating _extra_ variables. 28 | # 29 | # If you do any of the following, you will get something like, `Grader Error: Grader feedback not found` (or similarly unexpected) error upon submitting your assignment. Before asking for help/debugging the errors in your assignment, check for these first. If this is the case, and you don't remember the changes you have made, you can get a fresh copy of the assignment by following these [instructions](https://www.coursera.org/learn/convolutional-neural-networks/supplement/DS4yP/h-ow-to-refresh-your-workspace). 30 | 31 | # ## Table of Contents 32 | # 33 | # - [1 - Packages](#1) 34 | # - [1.1 - Load the Data and Split the Data into Train/Test Sets](#1-1) 35 | # - [2 - Layers in TF Keras](#2) 36 | # - [3 - The Sequential API](#3) 37 | # - [3.1 - Create the Sequential Model](#3-1) 38 | # - [Exercise 1 - happyModel](#ex-1) 39 | # - [3.2 - Train and Evaluate the Model](#3-2) 40 | # - [4 - The Functional API](#4) 41 | # - [4.1 - Load the SIGNS Dataset](#4-1) 42 | # - [4.2 - Split the Data into Train/Test Sets](#4-2) 43 | # - [4.3 - Forward Propagation](#4-3) 44 | # - [Exercise 2 - convolutional_model](#ex-2) 45 | # - [4.4 - Train the Model](#4-4) 46 | # - [5 - History Object](#5) 47 | # - [6 - Bibliography](#6) 48 | 49 | # 50 | # ## 1 - Packages 51 | # 52 | # As usual, begin by loading in the packages. 53 | 54 | # In[1]: 55 | 56 | 57 | import math 58 | import numpy as np 59 | import h5py 60 | import matplotlib.pyplot as plt 61 | from matplotlib.pyplot import imread 62 | import scipy 63 | from PIL import Image 64 | import pandas as pd 65 | import tensorflow as tf 66 | import tensorflow.keras.layers as tfl 67 | from tensorflow.python.framework import ops 68 | from cnn_utils import * 69 | from test_utils import summary, comparator 70 | 71 | get_ipython().run_line_magic('matplotlib', 'inline') 72 | np.random.seed(1) 73 | 74 | 75 | # 76 | # ### 1.1 - Load the Data and Split the Data into Train/Test Sets 77 | # 78 | # You'll be using the Happy House dataset for this part of the assignment, which contains images of peoples' faces. 
Your task will be to build a ConvNet that determines whether the people in the images are smiling or not -- because they only get to enter the house if they're smiling! 79 | 80 | # In[2]: 81 | 82 | 83 | X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_happy_dataset() 84 | 85 | # Normalize image vectors 86 | X_train = X_train_orig/255. 87 | X_test = X_test_orig/255. 88 | 89 | # Reshape 90 | Y_train = Y_train_orig.T 91 | Y_test = Y_test_orig.T 92 | 93 | print ("number of training examples = " + str(X_train.shape[0])) 94 | print ("number of test examples = " + str(X_test.shape[0])) 95 | print ("X_train shape: " + str(X_train.shape)) 96 | print ("Y_train shape: " + str(Y_train.shape)) 97 | print ("X_test shape: " + str(X_test.shape)) 98 | print ("Y_test shape: " + str(Y_test.shape)) 99 | 100 | 101 | # You can display the images contained in the dataset. Images are **64x64** pixels in RGB format (3 channels). 102 | 103 | # In[7]: 104 | 105 | 106 | index = 124 107 | plt.imshow(X_train_orig[index]) #display sample training image 108 | plt.show() 109 | 110 | 111 | # 112 | # ## 2 - Layers in TF Keras 113 | # 114 | # In the previous assignment, you created layers manually in numpy. In TF Keras, you don't have to write code directly to create layers. Rather, TF Keras has pre-defined layers you can use. 115 | # 116 | # When you create a layer in TF Keras, you are creating a function that takes some input and transforms it into an output you can reuse later. Nice and easy! 117 | 118 | # 119 | # ## 3 - The Sequential API 120 | # 121 | # In the previous assignment, you built helper functions using `numpy` to understand the mechanics behind convolutional neural networks. Most practical applications of deep learning today are built using programming frameworks, which have many built-in functions you can simply call. Keras is a high-level abstraction built on top of TensorFlow, which allows for even more simplified and optimized model creation and training. 122 | # 123 | # For the first part of this assignment, you'll create a model using TF Keras' Sequential API, which allows you to build layer by layer, and is ideal for building models where each layer has **exactly one** input tensor and **one** output tensor. 124 | # 125 | # As you'll see, using the Sequential API is simple and straightforward, but is only appropriate for simpler, more straightforward tasks. Later in this notebook you'll spend some time building with a more flexible, powerful alternative: the Functional API. 126 | # 127 | 128 | # 129 | # ### 3.1 - Create the Sequential Model 130 | # 131 | # As mentioned earlier, the TensorFlow Keras Sequential API can be used to build simple models with layer operations that proceed in a sequential order. 132 | # 133 | # You can also add layers incrementally to a Sequential model with the `.add()` method, or remove them using the `.pop()` method, much like you would in a regular Python list. 134 | # 135 | # Actually, you can think of a Sequential model as behaving like a list of layers. Like Python lists, Sequential layers are ordered, and the order in which they are specified matters. If your model is non-linear or contains layers with multiple inputs or outputs, a Sequential model wouldn't be the right choice! 136 | # 137 | # For any layer construction in Keras, you'll need to specify the input shape in advance. This is because in Keras, the shape of the weights is based on the shape of the inputs. The weights are only created when the model first sees some input data. 
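As a quick, non-graded aside, the list-like `.add()`/`.pop()` behaviour and the lazy weight creation described above can be seen in a few lines (the layer sizes below are arbitrary):

```python
import tensorflow as tf
import tensorflow.keras.layers as tfl

m = tf.keras.Sequential()
m.add(tfl.Dense(4))          # add layers incrementally, like appending to a list
m.add(tfl.Dense(1))
m.pop()                      # remove the last layer again
print(len(m.layers))         # 1
print(m.built)               # False -- no weights exist yet

m(tf.zeros((1, 3)))          # the first input fixes the input shape...
print([w.shape for w in m.weights])  # ...and only now are the weights created: [(3, 4), (4,)]
```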
Sequential models can be created by passing a list of layers to the Sequential constructor, like you will do in the next assignment. 138 | # 139 | # 140 | # ### Exercise 1 - happyModel 141 | # 142 | # Implement the `happyModel` function below to build the following model: `ZEROPAD2D -> CONV2D -> BATCHNORM -> RELU -> MAXPOOL -> FLATTEN -> DENSE`. Take help from [tf.keras.layers](https://www.tensorflow.org/api_docs/python/tf/keras/layers) 143 | # 144 | # Also, plug in the following parameters for all the steps: 145 | # 146 | # - [ZeroPadding2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/ZeroPadding2D): padding 3, input shape 64 x 64 x 3 147 | # - [Conv2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D): Use 32 7x7 filters, stride 1 148 | # - [BatchNormalization](https://www.tensorflow.org/api_docs/python/tf/keras/layers/BatchNormalization): for axis 3 149 | # - [ReLU](https://www.tensorflow.org/api_docs/python/tf/keras/layers/ReLU) 150 | # - [MaxPool2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool2D): Using default parameters 151 | # - [Flatten](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten) the previous output. 152 | # - Fully-connected ([Dense](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense)) layer: Apply a fully connected layer with 1 neuron and a sigmoid activation. 153 | # 154 | # 155 | # **Hint:** 156 | # 157 | # Use **tfl** as shorthand for **tensorflow.keras.layers** 158 | 159 | # In[8]: 160 | 161 | 162 | def happyModel(): 163 | """ 164 | Implements the forward propagation for the binary classification model: 165 | ZEROPAD2D -> CONV2D -> BATCHNORM -> RELU -> MAXPOOL -> FLATTEN -> DENSE 166 | 167 | Note that for simplicity and grading purposes, you'll hard-code all the values 168 | such as the stride and kernel (filter) sizes. 169 | Normally, functions should take these values as function parameters. 
170 | 171 | Arguments: 172 | None 173 | 174 | Returns: 175 | model -- TF Keras model (object containing the information for the entire training process) 176 | """ 177 | model = tf.keras.Sequential([ 178 | ## ZeroPadding2D with padding 3, input shape of 64 x 64 x 3 179 | tfl.ZeroPadding2D(padding=(3, 3), input_shape=(64, 64, 3)), 180 | ## Conv2D with 32 7x7 filters and stride of 1 181 | tfl.Conv2D(32, (7,7)), 182 | ## BatchNormalization for axis 3 183 | tfl.BatchNormalization(axis=-1), 184 | ## ReLU 185 | tfl.ReLU(), 186 | ## Max Pooling 2D with default parameters 187 | tfl.MaxPool2D(), 188 | ## Flatten layer 189 | tfl.Flatten(), 190 | ## Dense layer with 1 unit for output & 'sigmoid' activation 191 | tfl.Dense(1, activation='sigmoid') 192 | ]) 193 | 194 | return model 195 | 196 | 197 | # In[9]: 198 | 199 | 200 | happy_model = happyModel() 201 | # Print a summary for each layer 202 | for layer in summary(happy_model): 203 | print(layer) 204 | 205 | output = [['ZeroPadding2D', (None, 70, 70, 3), 0, ((3, 3), (3, 3))], 206 | ['Conv2D', (None, 64, 64, 32), 4736, 'valid', 'linear', 'GlorotUniform'], 207 | ['BatchNormalization', (None, 64, 64, 32), 128], 208 | ['ReLU', (None, 64, 64, 32), 0], 209 | ['MaxPooling2D', (None, 32, 32, 32), 0, (2, 2), (2, 2), 'valid'], 210 | ['Flatten', (None, 32768), 0], 211 | ['Dense', (None, 1), 32769, 'sigmoid']] 212 | 213 | comparator(summary(happy_model), output) 214 | 215 | 216 | # #### Expected Output: 217 | # 218 | # ``` 219 | # ['ZeroPadding2D', (None, 70, 70, 3), 0, ((3, 3), (3, 3))] 220 | # ['Conv2D', (None, 64, 64, 32), 4736, 'valid', 'linear', 'GlorotUniform'] 221 | # ['BatchNormalization', (None, 64, 64, 32), 128] 222 | # ['ReLU', (None, 64, 64, 32), 0] 223 | # ['MaxPooling2D', (None, 32, 32, 32), 0, (2, 2), (2, 2), 'valid'] 224 | # ['Flatten', (None, 32768), 0] 225 | # ['Dense', (None, 1), 32769, 'sigmoid'] 226 | # All tests passed! 227 | # ``` 228 | 229 | # Now that your model is created, you can compile it for training with an optimizer and loss of your choice. When the string `accuracy` is specified as a metric, the type of accuracy used will be automatically converted based on the loss function used. This is one of the many optimizations built into TensorFlow that make your life easier! If you'd like to read more on how the compiler operates, check the docs [here](https://www.tensorflow.org/api_docs/python/tf/keras/Model#compile). 230 | 231 | # In[10]: 232 | 233 | 234 | happy_model.compile(optimizer='adam', 235 | loss='binary_crossentropy', 236 | metrics=['accuracy']) 237 | 238 | 239 | # It's time to check your model's parameters with the `.summary()` method. This will display the types of layers you have, the shape of the outputs, and how many parameters are in each layer. 240 | 241 | # In[11]: 242 | 243 | 244 | happy_model.summary() 245 | 246 | 247 | # 248 | # ### 3.2 - Train and Evaluate the Model 249 | # 250 | # After creating the model, compiling it with your choice of optimizer and loss function, and doing a sanity check on its contents, you are now ready to build! 251 | # 252 | # Simply call `.fit()` to train. That's it! No need for mini-batching, saving, or complex backpropagation computations. That's all been done for you, as you're using a TensorFlow dataset with the batches specified already. You do have the option to specify epoch number or minibatch size if you like (for example, in the case of an un-batched dataset). 
253 | 254 | # In[12]: 255 | 256 | 257 | happy_model.fit(X_train, Y_train, epochs=10, batch_size=16) 258 | 259 | 260 | # After that completes, just use `.evaluate()` to evaluate against your test set. This function will print the value of the loss function and the performance metrics specified during the compilation of the model. In this case, the `binary_crossentropy` and the `accuracy` respectively. 261 | 262 | # In[13]: 263 | 264 | 265 | happy_model.evaluate(X_test, Y_test) 266 | 267 | 268 | # Easy, right? But what if you need to build a model with shared layers, branches, or multiple inputs and outputs? This is where Sequential, with its beautifully simple yet limited functionality, won't be able to help you. 269 | # 270 | # Next up: Enter the Functional API, your slightly more complex, highly flexible friend. 271 | 272 | # 273 | # ## 4 - The Functional API 274 | 275 | # Welcome to the second half of the assignment, where you'll use Keras' flexible [Functional API](https://www.tensorflow.org/guide/keras/functional) to build a ConvNet that can differentiate between 6 sign language digits. 276 | # 277 | # The Functional API can handle models with non-linear topology, shared layers, as well as layers with multiple inputs or outputs. Imagine that, where the Sequential API requires the model to move in a linear fashion through its layers, the Functional API allows much more flexibility. Where Sequential is a straight line, a Functional model is a graph, where the nodes of the layers can connect in many more ways than one. 278 | # 279 | # In the visual example below, the one possible direction of the movement Sequential model is shown in contrast to a skip connection, which is just one of the many ways a Functional model can be constructed. A skip connection, as you might have guessed, skips some layer in the network and feeds the output to a later layer in the network. Don't worry, you'll be spending more time with skip connections very soon! 280 | 281 | # 282 | 283 | # 284 | # ### 4.1 - Load the SIGNS Dataset 285 | # 286 | # As a reminder, the SIGNS dataset is a collection of 6 signs representing numbers from 0 to 5. 287 | 288 | # In[14]: 289 | 290 | 291 | # Loading the data (signs) 292 | X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_signs_dataset() 293 | 294 | 295 | # 296 | # 297 | # The next cell will show you an example of a labelled image in the dataset. Feel free to change the value of `index` below and re-run to see different examples. 298 | 299 | # In[15]: 300 | 301 | 302 | # Example of an image from the dataset 303 | index = 9 304 | plt.imshow(X_train_orig[index]) 305 | print ("y = " + str(np.squeeze(Y_train_orig[:, index]))) 306 | 307 | 308 | # 309 | # ### 4.2 - Split the Data into Train/Test Sets 310 | # 311 | # In Course 2, you built a fully-connected network for this dataset. But since this is an image dataset, it is more natural to apply a ConvNet to it. 312 | # 313 | # To get started, let's examine the shapes of your data. 314 | 315 | # In[16]: 316 | 317 | 318 | X_train = X_train_orig/255. 319 | X_test = X_test_orig/255. 
320 | Y_train = convert_to_one_hot(Y_train_orig, 6).T 321 | Y_test = convert_to_one_hot(Y_test_orig, 6).T 322 | print ("number of training examples = " + str(X_train.shape[0])) 323 | print ("number of test examples = " + str(X_test.shape[0])) 324 | print ("X_train shape: " + str(X_train.shape)) 325 | print ("Y_train shape: " + str(Y_train.shape)) 326 | print ("X_test shape: " + str(X_test.shape)) 327 | print ("Y_test shape: " + str(Y_test.shape)) 328 | 329 | 330 | # 331 | # ### 4.3 - Forward Propagation 332 | # 333 | # In TensorFlow, there are built-in functions that implement the convolution steps for you. By now, you should be familiar with how TensorFlow builds computational graphs. In the [Functional API](https://www.tensorflow.org/guide/keras/functional), you create a graph of layers. This is what allows such great flexibility. 334 | # 335 | # However, the following model could also be defined using the Sequential API since the information flow is on a single line. But don't deviate. What we want you to learn is to use the functional API. 336 | # 337 | # Begin building your graph of layers by creating an input node that functions as a callable object: 338 | # 339 | # - **input_img = tf.keras.Input(shape=input_shape):** 340 | # 341 | # Then, create a new node in the graph of layers by calling a layer on the `input_img` object: 342 | # 343 | # - **tf.keras.layers.Conv2D(filters= ... , kernel_size= ... , padding='same')(input_img):** Read the full documentation on [Conv2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D). 344 | # 345 | # - **tf.keras.layers.MaxPool2D(pool_size=(f, f), strides=(s, s), padding='same'):** `MaxPool2D()` downsamples your input using a window of size (f, f) and strides of size (s, s) to carry out max pooling over each window. For max pooling, you usually operate on a single example at a time and a single channel at a time. Read the full documentation on [MaxPool2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool2D). 346 | # 347 | # - **tf.keras.layers.ReLU():** computes the elementwise ReLU of Z (which can be any shape). You can read the full documentation on [ReLU](https://www.tensorflow.org/api_docs/python/tf/keras/layers/ReLU). 348 | # 349 | # - **tf.keras.layers.Flatten()**: given a tensor "P", this function takes each training (or test) example in the batch and flattens it into a 1D vector. 350 | # 351 | # * If a tensor P has the shape (batch_size,h,w,c), it returns a flattened tensor with shape (batch_size, k), where $k=h \times w \times c$. "k" equals the product of all the dimension sizes other than the first dimension. 352 | # 353 | # * For example, given a tensor with dimensions [100, 2, 3, 4], it flattens the tensor to be of shape [100, 24], where 24 = 2 * 3 * 4. You can read the full documentation on [Flatten](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten). 354 | # 355 | # - **tf.keras.layers.Dense(units= ... , activation='softmax')(F):** given the flattened input F, it returns the output computed using a fully connected layer. You can read the full documentation on [Dense](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense). 356 | # 357 | # In the last function above (`tf.keras.layers.Dense()`), the fully connected layer automatically initializes weights in the graph and keeps on training them as you train the model. Hence, you did not need to initialize those weights when initializing the parameters. 
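To make this chaining pattern concrete before the graded exercise, here is a small, non-graded sketch with illustrative layer sizes; note how `Flatten` turns a (batch, h, w, c) tensor into (batch, h*w*c):

```python
import tensorflow as tf
import tensorflow.keras.layers as tfl

input_img = tf.keras.Input(shape=(64, 64, 3))                            # input node
x = tfl.Conv2D(filters=8, kernel_size=4, padding='same')(input_img)      # (None, 64, 64, 8)
x = tfl.MaxPool2D(pool_size=(8, 8), strides=(8, 8), padding='same')(x)   # (None, 8, 8, 8)
f = tfl.Flatten()(x)                                                     # (None, 512), since 8*8*8 = 512
outputs = tfl.Dense(units=6, activation='softmax')(f)                    # (None, 6)

toy_model = tf.keras.Model(inputs=input_img, outputs=outputs)
print(f.shape, outputs.shape)
```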
358 | # 359 | # Lastly, before creating the model, you'll need to define the output using the last of the function's compositions (in this example, a Dense layer): 360 | # 361 | # - **outputs = tf.keras.layers.Dense(units=6, activation='softmax')(F)** 362 | # 363 | # 364 | # #### Window, kernel, filter, pool 365 | # 366 | # The words "kernel" and "filter" are used to refer to the same thing. The word "filter" accounts for the amount of "kernels" that will be used in a single convolution layer. "Pool" is the name of the operation that takes the max or average value of the kernels. 367 | # 368 | # This is why the parameter `pool_size` refers to `kernel_size`, and you use `(f,f)` to refer to the filter size. 369 | # 370 | # Pool size and kernel size refer to the same thing in different objects - They refer to the shape of the window where the operation takes place. 371 | 372 | # 373 | # ### Exercise 2 - convolutional_model 374 | # 375 | # Implement the `convolutional_model` function below to build the following model: `CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> DENSE`. Use the functions above! 376 | # 377 | # Also, plug in the following parameters for all the steps: 378 | # 379 | # - [Conv2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D): Use 8 4 by 4 filters, stride 1, padding is "SAME" 380 | # - [ReLU](https://www.tensorflow.org/api_docs/python/tf/keras/layers/ReLU) 381 | # - [MaxPool2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool2D): Use an 8 by 8 filter size and an 8 by 8 stride, padding is "SAME" 382 | # - **Conv2D**: Use 16 2 by 2 filters, stride 1, padding is "SAME" 383 | # - **ReLU** 384 | # - **MaxPool2D**: Use a 4 by 4 filter size and a 4 by 4 stride, padding is "SAME" 385 | # - [Flatten](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten) the previous output. 386 | # - Fully-connected ([Dense](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense)) layer: Apply a fully connected layer with 6 neurons and a softmax activation. 387 | 388 | # In[17]: 389 | 390 | 391 | # GRADED FUNCTION: convolutional_model 392 | 393 | def convolutional_model(input_shape): 394 | """ 395 | Implements the forward propagation for the model: 396 | CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> DENSE 397 | 398 | Note that for simplicity and grading purposes, you'll hard-code some values 399 | such as the stride and kernel (filter) sizes. 400 | Normally, functions should take these values as function parameters. 401 | 402 | Arguments: 403 | input_img -- input dataset, of shape (input_shape) 404 | 405 | Returns: 406 | model -- TF Keras model (object containing the information for the entire training process) 407 | """ 408 | 409 | input_img = tf.keras.Input(shape=input_shape) 410 | ## CONV2D: 8 filters 4x4, stride of 1, padding 'SAME' 411 | # Z1 = None 412 | ## RELU 413 | # A1 = None 414 | ## MAXPOOL: window 8x8, stride 8, padding 'SAME' 415 | # P1 = None 416 | ## CONV2D: 16 filters 2x2, stride 1, padding 'SAME' 417 | # Z2 = None 418 | ## RELU 419 | # A2 = None 420 | ## MAXPOOL: window 4x4, stride 4, padding 'SAME' 421 | # P2 = None 422 | ## FLATTEN 423 | # F = None 424 | ## Dense layer 425 | ## 6 neurons in output layer. 
Hint: one of the arguments should be "activation='softmax'" 426 | # outputs = None 427 | # YOUR CODE STARTS HERE 428 | Z1 = tfl.Conv2D(8, 4, activation='linear', padding="same", strides=1)(input_img) 429 | A1 = tfl.ReLU()(Z1) 430 | P1 = tfl.MaxPool2D(pool_size=(8, 8), strides=(8, 8), padding='same')(A1) 431 | Z2 = tfl.Conv2D(16, 2, activation='linear', padding="same", strides=1)(P1) 432 | A2 = tfl.ReLU()(Z2) 433 | P2 = tfl.MaxPool2D(pool_size=(4, 4), strides=(4, 4), padding='same')(A2) 434 | F = tfl.Flatten()(P2) 435 | outputs = tfl.Dense(6, activation='softmax')(F) 436 | 437 | # YOUR CODE ENDS HERE 438 | model = tf.keras.Model(inputs=input_img, outputs=outputs) 439 | return model 440 | 441 | 442 | # In[18]: 443 | 444 | 445 | conv_model = convolutional_model((64, 64, 3)) 446 | conv_model.compile(optimizer='adam', 447 | loss='categorical_crossentropy', 448 | metrics=['accuracy']) 449 | conv_model.summary() 450 | 451 | output = [['InputLayer', [(None, 64, 64, 3)], 0], 452 | ['Conv2D', (None, 64, 64, 8), 392, 'same', 'linear', 'GlorotUniform'], 453 | ['ReLU', (None, 64, 64, 8), 0], 454 | ['MaxPooling2D', (None, 8, 8, 8), 0, (8, 8), (8, 8), 'same'], 455 | ['Conv2D', (None, 8, 8, 16), 528, 'same', 'linear', 'GlorotUniform'], 456 | ['ReLU', (None, 8, 8, 16), 0], 457 | ['MaxPooling2D', (None, 2, 2, 16), 0, (4, 4), (4, 4), 'same'], 458 | ['Flatten', (None, 64), 0], 459 | ['Dense', (None, 6), 390, 'softmax']] 460 | 461 | comparator(summary(conv_model), output) 462 | 463 | 464 | # Both the Sequential and Functional APIs return a TF Keras model object. The only difference is how inputs are handled inside the object model! 465 | 466 | # 467 | # ### 4.4 - Train the Model 468 | 469 | # In[19]: 470 | 471 | 472 | train_dataset = tf.data.Dataset.from_tensor_slices((X_train, Y_train)).batch(64) 473 | test_dataset = tf.data.Dataset.from_tensor_slices((X_test, Y_test)).batch(64) 474 | history = conv_model.fit(train_dataset, epochs=100, validation_data=test_dataset) 475 | 476 | 477 | # 478 | # ## 5 - History Object 479 | # 480 | # The history object is an output of the `.fit()` operation, and provides a record of all the loss and metric values in memory. It's stored as a dictionary that you can retrieve at `history.history`: 481 | 482 | # In[20]: 483 | 484 | 485 | history.history 486 | 487 | 488 | # Now visualize the loss over time using `history.history`: 489 | 490 | # In[21]: 491 | 492 | 493 | # The history.history["loss"] entry is a dictionary with as many values as epochs that the 494 | # model was trained on. 495 | df_loss_acc = pd.DataFrame(history.history) 496 | df_loss= df_loss_acc[['loss','val_loss']] 497 | df_loss.rename(columns={'loss':'train','val_loss':'validation'},inplace=True) 498 | df_acc= df_loss_acc[['accuracy','val_accuracy']] 499 | df_acc.rename(columns={'accuracy':'train','val_accuracy':'validation'},inplace=True) 500 | df_loss.plot(title='Model loss',figsize=(12,8)).set(xlabel='Epoch',ylabel='Loss') 501 | df_acc.plot(title='Model Accuracy',figsize=(12,8)).set(xlabel='Epoch',ylabel='Accuracy') 502 | 503 | 504 | # **Congratulations**! You've finished the assignment and built two models: One that recognizes smiles, and another that recognizes SIGN language with almost 80% accuracy on the test set. In addition to that, you now also understand the applications of two Keras APIs: Sequential and Functional. Nicely done! 505 | # 506 | # By now, you know a bit about how the Functional API works and may have glimpsed the possibilities. 
In your next assignment, you'll really get a feel for its power when you get the opportunity to build a very deep ConvNet, using ResNets! 507 | 508 | # 509 | # ## 6 - Bibliography 510 | # 511 | # You're always encouraged to read the official documentation. To that end, you can find the docs for the Sequential and Functional APIs here: 512 | # 513 | # https://www.tensorflow.org/guide/keras/sequential_model 514 | # 515 | # https://www.tensorflow.org/guide/keras/functional 516 | -------------------------------------------------------------------------------- /Course 4-ConvolutionalNeuralNetworks/Week 1/Week 1 The Basics of ConvNets.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 4-ConvolutionalNeuralNetworks/Week 1/Week 1 The Basics of ConvNets.pdf -------------------------------------------------------------------------------- /Course 4-ConvolutionalNeuralNetworks/Week 2/Transfer_learning_with_MobileNet_v1.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding: utf-8 3 | 4 | # # Transfer Learning with MobileNetV2 5 | 6 | # Welcome to this week's assignment, where you'll be using transfer learning on a pre-trained CNN to build an Alpaca/Not Alpaca classifier! 7 | # 8 | # 9 | # 10 | # A pre-trained model is a network that's already been trained on a large dataset and saved, which allows you to use it to customize your own model cheaply and efficiently. The one you'll be using, MobileNetV2, was designed to provide fast and computationally efficient performance. It's been pre-trained on ImageNet, a dataset containing over 14 million images and 1000 classes. 11 | # 12 | # By the end of this assignment, you will be able to: 13 | # 14 | # - Create a dataset from a directory 15 | # - Preprocess and augment data using the Sequential API 16 | # - Adapt a pretrained model to new data and train a classifier using the Functional API and MobileNet 17 | # - Fine-tune a classifier's final layers to improve accuracy 18 | # 19 | # ## Important Note on Submission to the AutoGrader 20 | # 21 | # Before submitting your assignment to the AutoGrader, please make sure you are not doing the following: 22 | # 23 | # 1. You have not added any _extra_ `print` statement(s) in the assignment. 24 | # 2. You have not added any _extra_ code cell(s) in the assignment. 25 | # 3. You have not changed any of the function parameters. 26 | # 4. You are not using any global variables inside your graded exercises. Unless specifically instructed to do so, please refrain from it and use the local variables instead. 27 | # 5. You are not changing the assignment code where it is not required, like creating _extra_ variables. 28 | # 29 | # If you do any of the following, you will get something like, `Grader Error: Grader feedback not found` (or similarly unexpected) error upon submitting your assignment. Before asking for help/debugging the errors in your assignment, check for these first. If this is the case, and you don't remember the changes you have made, you can get a fresh copy of the assignment by following these [instructions](https://www.coursera.org/learn/convolutional-neural-networks/supplement/DS4yP/h-ow-to-refresh-your-workspace). 
30 | 31 | # ## Table of Content 32 | # 33 | # - [1 - Packages](#1) 34 | # - [1.1 Create the Dataset and Split it into Training and Validation Sets](#1-1) 35 | # - [2 - Preprocess and Augment Training Data](#2) 36 | # - [Exercise 1 - data_augmenter](#ex-1) 37 | # - [3 - Using MobileNetV2 for Transfer Learning](#3) 38 | # - [3.1 - Inside a MobileNetV2 Convolutional Building Block](#3-1) 39 | # - [3.2 - Layer Freezing with the Functional API](#3-2) 40 | # - [Exercise 2 - alpaca_model](#ex-2) 41 | # - [3.3 - Fine-tuning the Model](#3-3) 42 | # - [Exercise 3](#ex-3) 43 | 44 | # 45 | # ## 1 - Packages 46 | 47 | # In[2]: 48 | 49 | 50 | import matplotlib.pyplot as plt 51 | import numpy as np 52 | import os 53 | import tensorflow as tf 54 | import tensorflow.keras.layers as tfl 55 | 56 | from tensorflow.keras.preprocessing import image_dataset_from_directory 57 | from tensorflow.keras.layers.experimental.preprocessing import RandomFlip, RandomRotation 58 | 59 | 60 | # 61 | # ### 1.1 Create the Dataset and Split it into Training and Validation Sets 62 | # 63 | # When training and evaluating deep learning models in Keras, generating a dataset from image files stored on disk is simple and fast. Call `image_data_set_from_directory()` to read from the directory and create both training and validation datasets. 64 | # 65 | # If you're specifying a validation split, you'll also need to specify the subset for each portion. Just set the training set to `subset='training'` and the validation set to `subset='validation'`. 66 | # 67 | # You'll also set your seeds to match each other, so your training and validation sets don't overlap. :) 68 | 69 | # In[3]: 70 | 71 | 72 | BATCH_SIZE = 32 73 | IMG_SIZE = (160, 160) 74 | directory = "dataset/" 75 | train_dataset = image_dataset_from_directory(directory, 76 | shuffle=True, 77 | batch_size=BATCH_SIZE, 78 | image_size=IMG_SIZE, 79 | validation_split=0.2, 80 | subset='training', 81 | seed=42) 82 | validation_dataset = image_dataset_from_directory(directory, 83 | shuffle=True, 84 | batch_size=BATCH_SIZE, 85 | image_size=IMG_SIZE, 86 | validation_split=0.2, 87 | subset='validation', 88 | seed=42) 89 | 90 | 91 | # Now let's take a look at some of the images from the training set: 92 | # 93 | # **Note:** The original dataset has some mislabelled images in it as well. 94 | 95 | # In[4]: 96 | 97 | 98 | class_names = train_dataset.class_names 99 | 100 | plt.figure(figsize=(10, 10)) 101 | for images, labels in train_dataset.take(1): 102 | for i in range(9): 103 | ax = plt.subplot(3, 3, i + 1) 104 | plt.imshow(images[i].numpy().astype("uint8")) 105 | plt.title(class_names[labels[i]]) 106 | plt.axis("off") 107 | 108 | 109 | # 110 | # ## 2 - Preprocess and Augment Training Data 111 | # 112 | # You may have encountered `dataset.prefetch` in a previous TensorFlow assignment, as an important extra step in data preprocessing. 113 | # 114 | # Using `prefetch()` prevents a memory bottleneck that can occur when reading from disk. It sets aside some data and keeps it ready for when it's needed, by creating a source dataset from your input data, applying a transformation to preprocess it, then iterating over the dataset one element at a time. Because the iteration is streaming, the data doesn't need to fit into memory. 115 | # 116 | # You can set the number of elements to prefetch manually, or you can use `tf.data.experimental.AUTOTUNE` to choose the parameters automatically. 
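For reference, both options look like this on a toy dataset (the buffer size of 2 below is just an arbitrary example):

```python
import tensorflow as tf

ds = tf.data.Dataset.range(100).batch(10)
ds_manual = ds.prefetch(buffer_size=2)                            # prefetch a fixed number of batches
ds_auto = ds.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)  # let tf.data pick the buffer size
```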
Autotune prompts `tf.data` to tune that value dynamically at runtime, by tracking the time spent in each operation and feeding those times into an optimization algorithm. The optimization algorithm tries to find the best allocation of its CPU budget across all tunable operations. 117 | # 118 | # To increase diversity in the training set and help your model learn the data better, it's standard practice to augment the images by transforming them, i.e., randomly flipping and rotating them. Keras' Sequential API offers a straightforward method for these kinds of data augmentations, with built-in, customizable preprocessing layers. These layers are saved with the rest of your model and can be re-used later. Ahh, so convenient! 119 | # 120 | # As always, you're invited to read the official docs, which you can find for data augmentation [here](https://www.tensorflow.org/tutorials/images/data_augmentation). 121 | # 122 | 123 | # In[6]: 124 | 125 | 126 | AUTOTUNE = tf.data.experimental.AUTOTUNE 127 | train_dataset = train_dataset.prefetch(buffer_size=AUTOTUNE) 128 | 129 | 130 | # 131 | # ### Exercise 1 - data_augmenter 132 | # 133 | # Implement a function for data augmentation. Use a `Sequential` keras model composed of 2 layers: 134 | # * `RandomFlip('horizontal')` 135 | # * `RandomRotation(0.2)` 136 | 137 | # In[8]: 138 | 139 | 140 | # UNQ_C1 141 | # GRADED FUNCTION: data_augmenter 142 | def data_augmenter(): 143 | ''' 144 | Create a Sequential model composed of 2 layers 145 | Returns: 146 | tf.keras.Sequential 147 | ''' 148 | ### START CODE HERE 149 | data_augmentation = tf.keras.Sequential() 150 | data_augmentation.add(RandomFlip('horizontal')) 151 | data_augmentation.add(RandomRotation(0.2)) 152 | ### END CODE HERE 153 | 154 | return data_augmentation 155 | 156 | 157 | # In[9]: 158 | 159 | 160 | augmenter = data_augmenter() 161 | 162 | assert(augmenter.layers[0].name.startswith('random_flip')), "First layer must be RandomFlip" 163 | assert augmenter.layers[0].mode == 'horizontal', "RadomFlip parameter must be horizontal" 164 | assert(augmenter.layers[1].name.startswith('random_rotation')), "Second layer must be RandomRotation" 165 | assert augmenter.layers[1].factor == 0.2, "Rotation factor must be 0.2" 166 | assert len(augmenter.layers) == 2, "The model must have only 2 layers" 167 | 168 | print('\033[92mAll tests passed!') 169 | 170 | 171 | # Take a look at how an image from the training set has been augmented with simple transformations: 172 | # 173 | # From one cute animal, to 9 variations of that cute animal, in three lines of code. Now your model has a lot more to learn from. 174 | 175 | # In[10]: 176 | 177 | 178 | data_augmentation = data_augmenter() 179 | 180 | for image, _ in train_dataset.take(1): 181 | plt.figure(figsize=(10, 10)) 182 | first_image = image[0] 183 | for i in range(9): 184 | ax = plt.subplot(3, 3, i + 1) 185 | augmented_image = data_augmentation(tf.expand_dims(first_image, 0)) 186 | plt.imshow(augmented_image[0] / 255) 187 | plt.axis('off') 188 | 189 | 190 | # Next, you'll apply your first tool from the MobileNet application in TensorFlow, to normalize your input. Since you're using a pre-trained model that was trained on the normalization values [-1,1], it's best practice to reuse that standard with tf.keras.applications.mobilenet_v2.preprocess_input. 
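As a quick sanity check (the pixel values below are only illustrative), `preprocess_input` for MobileNetV2 rescales raw pixel values from [0, 255] into the [-1, 1] range the network was trained on:

```python
import numpy as np
import tensorflow as tf

pixels = np.array([[0.0, 127.5, 255.0]])
print(tf.keras.applications.mobilenet_v2.preprocess_input(pixels))  # [[-1.  0.  1.]]
```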
191 | 192 | # 193 | # 194 | # **What you should remember:** 195 | # 196 | # * When calling image_data_set_from_directory(), specify the train/val subsets and match the seeds to prevent overlap 197 | # * Use prefetch() to prevent memory bottlenecks when reading from disk 198 | # * Give your model more to learn from with simple data augmentations like rotation and flipping. 199 | # * When using a pretrained model, it's best to reuse the weights it was trained on. 200 | 201 | # In[11]: 202 | 203 | 204 | preprocess_input = tf.keras.applications.mobilenet_v2.preprocess_input 205 | 206 | 207 | # 208 | # ## 3 - Using MobileNetV2 for Transfer Learning 209 | # 210 | # MobileNetV2 was trained on ImageNet and is optimized to run on mobile and other low-power applications. It's 155 layers deep (just in case you felt the urge to plot the model yourself, prepare for a long journey!) and very efficient for object detection and image segmentation tasks, as well as classification tasks like this one. The architecture has three defining characteristics: 211 | # 212 | # * Depthwise separable convolutions 213 | # * Thin input and output bottlenecks between layers 214 | # * Shortcut connections between bottleneck layers 215 | # 216 | # 217 | # ### 3.1 - Inside a MobileNetV2 Convolutional Building Block 218 | # 219 | # MobileNetV2 uses depthwise separable convolutions as efficient building blocks. Traditional convolutions are often very resource-intensive, and depthwise separable convolutions are able to reduce the number of trainable parameters and operations and also speed up convolutions in two steps: 220 | # 221 | # 1. The first step calculates an intermediate result by convolving on each of the channels independently. This is the depthwise convolution. 222 | # 223 | # 2. In the second step, another convolution merges the outputs of the previous step into one. This gets a single result from a single feature at a time, and then is applied to all the filters in the output layer. This is the pointwise convolution, or: **Shape of the depthwise convolution X Number of filters.** 224 | # 225 | # 226 | #
Figure 1: MobileNetV2 Architecture. This diagram was inspired by the original.
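A back-of-the-envelope comparison (with assumed kernel size and channel counts, biases ignored) shows why this two-step factorization is so much cheaper than a standard convolution:

```python
k, c_in, c_out = 3, 32, 64                # assumed kernel size and channel counts

standard = k * k * c_in * c_out           # ordinary convolution: 18,432 weights
depthwise = k * k * c_in                  # step 1: one k x k kernel per input channel
pointwise = 1 * 1 * c_in * c_out          # step 2: 1x1 convolution that merges the channels
separable = depthwise + pointwise         # 2,336 weights in total

print(standard, separable, round(standard / separable, 1))   # 18432 2336 7.9
```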
227 | # 228 | # Each block consists of an inverted residual structure with a bottleneck at each end. These bottlenecks encode the intermediate inputs and outputs in a low dimensional space, and prevent non-linearities from destroying important information. 229 | # 230 | # The shortcut connections, which are similar to the ones in traditional residual networks, serve the same purpose of speeding up training and improving predictions. These connections skip over the intermediate convolutions and connect the bottleneck layers. 231 | 232 | # Let's try to train your base model using all the layers from the pretrained model. 233 | # 234 | # Similarly to how you reused the pretrained normalization values MobileNetV2 was trained on, you'll also load the pretrained weights from ImageNet by specifying `weights='imagenet'`. 235 | 236 | # In[12]: 237 | 238 | 239 | IMG_SHAPE = IMG_SIZE + (3,) 240 | base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE, 241 | include_top=True, 242 | weights='imagenet') 243 | 244 | 245 | # Print the model summary below to see all the model's layers, the shapes of their outputs, and the total number of parameters, trainable and non-trainable. 246 | 247 | # In[13]: 248 | 249 | 250 | base_model.summary() 251 | 252 | 253 | # Note the last 2 layers here. They are the so called top layers, and they are responsible of the classification in the model 254 | 255 | # In[14]: 256 | 257 | 258 | nb_layers = len(base_model.layers) 259 | print(base_model.layers[nb_layers - 2].name) 260 | print(base_model.layers[nb_layers - 1].name) 261 | 262 | 263 | # Notice some of the layers in the summary like `Conv2D` and `DepthwiseConv2D` and how they follow the progression of expansion to depthwise convolution to projection. In combination with BatchNormalization and ReLU, these make up the bottleneck layers mentioned earlier. 264 | 265 | # 266 | # 267 | # **What you should remember**: 268 | # 269 | # * MobileNetV2's unique features are: 270 | # * Depthwise separable convolutions that provide lightweight feature filtering and creation 271 | # * Input and output bottlenecks that preserve important information on either end of the block 272 | # * Depthwise separable convolutions deal with both spatial and depth (number of channels) dimensions 273 | 274 | # Next, choose the first batch from the tensorflow dataset to use the images, and run it through the MobileNetV2 base model to test out the predictions on some of your images. 275 | 276 | # In[15]: 277 | 278 | 279 | image_batch, label_batch = next(iter(train_dataset)) 280 | feature_batch = base_model(image_batch) 281 | print(feature_batch.shape) 282 | 283 | 284 | # In[16]: 285 | 286 | 287 | #Shows the different label probabilities in one tensor 288 | label_batch 289 | 290 | 291 | # Now decode the predictions made by the model. Earlier, when you printed the shape of the batch, it would have returned (32, 1000). The number 32 refers to the batch size and 1000 refers to the 1000 classes the model was pretrained on. The predictions returned by the base model below follow this format: 292 | # 293 | # First the class number, then a human-readable label, and last the probability of the image belonging to that class. You'll notice that there are two of these returned for each image in the batch - these the top two probabilities returned for that image. 
294 | 295 | # In[17]: 296 | 297 | 298 | base_model.trainable = False 299 | image_var = tf.Variable(preprocess_input(image_batch)) 300 | pred = base_model(image_var) 301 | 302 | tf.keras.applications.mobilenet_v2.decode_predictions(pred.numpy(), top=2) 303 | 304 | 305 | # Uh-oh. There's a whole lot of labels here, some of them hilariously wrong, but none of them say "alpaca." 306 | # 307 | # This is because MobileNet pretrained over ImageNet doesn't have the correct labels for alpacas, so when you use the full model, all you get is a bunch of incorrectly classified images. 308 | # 309 | # Fortunately, you can delete the top layer, which contains all the classification labels, and create a new classification layer. 310 | 311 | # 312 | # ### 3.2 - Layer Freezing with the Functional API 313 | # 314 | # 315 | # 316 | # In the next sections, you'll see how you can use a pretrained model to modify the classifier task so that it's able to recognize alpacas. You can achieve this in three steps: 317 | # 318 | # 1. Delete the top layer (the classification layer) 319 | # * Set `include_top` in `base_model` as False 320 | # 2. Add a new classifier layer 321 | # * Train only one layer by freezing the rest of the network 322 | # * As mentioned before, a single neuron is enough to solve a binary classification problem. 323 | # 3. Freeze the base model and train the newly-created classifier layer 324 | # * Set `base model.trainable=False` to avoid changing the weights and train *only* the new layer 325 | # * Set training in `base_model` to False to avoid keeping track of statistics in the batch norm layer 326 | 327 | # 328 | # ### Exercise 2 - alpaca_model 329 | 330 | # In[18]: 331 | 332 | 333 | # UNQ_C2 334 | # GRADED FUNCTION 335 | def alpaca_model(image_shape=IMG_SIZE, data_augmentation=data_augmenter()): 336 | ''' Define a tf.keras model for binary classification out of the MobileNetV2 model 337 | Arguments: 338 | image_shape -- Image width and height 339 | data_augmentation -- data augmentation function 340 | Returns: 341 | Returns: 342 | tf.keras.model 343 | ''' 344 | 345 | 346 | input_shape = image_shape + (3,) 347 | 348 | ### START CODE HERE 349 | 350 | base_model = tf.keras.applications.MobileNetV2(input_shape=input_shape, 351 | include_top=False, # <== Important!!!! 
352 | weights='imagenet') # From imageNet 353 | # Freeze the base model by making it non trainable 354 | base_model.trainable = False 355 | 356 | # create the input layer (Same as the imageNetv2 input size) 357 | inputs = tf.keras.Input(shape=input_shape) 358 | 359 | # apply data augmentation to the inputs 360 | x = data_augmentation(inputs) 361 | 362 | # data preprocessing using the same weights the model was trained on 363 | x = tf.keras.applications.mobilenet_v2.preprocess_input(x) 364 | 365 | # set training to False to avoid keeping track of statistics in the batch norm layer 366 | x = base_model(x, training=False) 367 | 368 | # Add the new Binary classification layers 369 | # use global avg pooling to summarize the info in each channel 370 | x = tfl.GlobalAveragePooling2D()(x) 371 | #include dropout with probability of 0.2 to avoid overfitting 372 | x = tfl.Dropout(rate=0.2)(x) 373 | 374 | # create a prediction layer with one neuron (as a classifier only needs one) 375 | prediction_layer = tfl.Dense(1) 376 | 377 | ### END CODE HERE 378 | 379 | outputs = prediction_layer(x) 380 | model = tf.keras.Model(inputs, outputs) 381 | 382 | 383 | return model 384 | 385 | 386 | # Create your new model using the data_augmentation function defined earlier. 387 | 388 | # In[19]: 389 | 390 | 391 | model2 = alpaca_model(IMG_SIZE, data_augmentation) 392 | 393 | 394 | # In[20]: 395 | 396 | 397 | from test_utils import summary, comparator 398 | 399 | alpaca_summary = [['InputLayer', [(None, 160, 160, 3)], 0], 400 | ['Sequential', (None, 160, 160, 3), 0], 401 | ['TensorFlowOpLayer', [(None, 160, 160, 3)], 0], 402 | ['TensorFlowOpLayer', [(None, 160, 160, 3)], 0], 403 | ['Functional', (None, 5, 5, 1280), 2257984], 404 | ['GlobalAveragePooling2D', (None, 1280), 0], 405 | ['Dropout', (None, 1280), 0, 0.2], 406 | ['Dense', (None, 1), 1281, 'linear']] #linear is the default activation 407 | 408 | comparator(summary(model2), alpaca_summary) 409 | 410 | for layer in summary(model2): 411 | print(layer) 412 | 413 | 414 | 415 | # The base learning rate has been set for you, so you can go ahead and compile the new model and run it for 5 epochs: 416 | 417 | # In[21]: 418 | 419 | 420 | base_learning_rate = 0.001 421 | model2.compile(optimizer=tf.keras.optimizers.Adam(lr=base_learning_rate), 422 | loss=tf.keras.losses.BinaryCrossentropy(from_logits=True), 423 | metrics=['accuracy']) 424 | 425 | 426 | # In[22]: 427 | 428 | 429 | initial_epochs = 5 430 | history = model2.fit(train_dataset, validation_data=validation_dataset, epochs=initial_epochs) 431 | 432 | 433 | # Plot the training and validation accuracy: 434 | 435 | # In[23]: 436 | 437 | 438 | acc = [0.] + history.history['accuracy'] 439 | val_acc = [0.] 
+ history.history['val_accuracy'] 440 | 441 | loss = history.history['loss'] 442 | val_loss = history.history['val_loss'] 443 | 444 | plt.figure(figsize=(8, 8)) 445 | plt.subplot(2, 1, 1) 446 | plt.plot(acc, label='Training Accuracy') 447 | plt.plot(val_acc, label='Validation Accuracy') 448 | plt.legend(loc='lower right') 449 | plt.ylabel('Accuracy') 450 | plt.ylim([min(plt.ylim()),1]) 451 | plt.title('Training and Validation Accuracy') 452 | 453 | plt.subplot(2, 1, 2) 454 | plt.plot(loss, label='Training Loss') 455 | plt.plot(val_loss, label='Validation Loss') 456 | plt.legend(loc='upper right') 457 | plt.ylabel('Cross Entropy') 458 | plt.ylim([0,1.0]) 459 | plt.title('Training and Validation Loss') 460 | plt.xlabel('epoch') 461 | plt.show() 462 | 463 | 464 | # In[24]: 465 | 466 | 467 | class_names 468 | 469 | 470 | # The results are ok, but could be better. Next, try some fine-tuning. 471 | 472 | # 473 | # ### 3.3 - Fine-tuning the Model 474 | # 475 | # You could try fine-tuning the model by re-running the optimizer in the last layers to improve accuracy. When you use a smaller learning rate, you take smaller steps to adapt it a little more closely to the new data. In transfer learning, the way you achieve this is by unfreezing the layers at the end of the network, and then re-training your model on the final layers with a very low learning rate. Adapting your learning rate to go over these layers in smaller steps can yield more fine details - and higher accuracy. 476 | # 477 | # The intuition for what's happening: when the network is in its earlier stages, it trains on low-level features, like edges. In the later layers, more complex, high-level features like wispy hair or pointy ears begin to emerge. For transfer learning, the low-level features can be kept the same, as they have common features for most images. When you add new data, you generally want the high-level features to adapt to it, which is rather like letting the network learn to detect features more related to your data, such as soft fur or big teeth. 478 | # 479 | # To achieve this, just unfreeze the final layers and re-run the optimizer with a smaller learning rate, while keeping all the other layers frozen. 480 | # 481 | # Where the final layers actually begin is a bit arbitrary, so feel free to play around with this number a bit. The important takeaway is that the later layers are the part of your network that contain the fine details (pointy ears, hairy tails) that are more specific to your problem. 482 | # 483 | # First, unfreeze the base model by setting `base_model.trainable=True`, set a layer to fine-tune from, then re-freeze all the layers before it. Run it again for another few epochs, and see if your accuracy improved! 484 | 485 | # 486 | # ### Exercise 3 487 | 488 | # In[27]: 489 | 490 | 491 | # UNQ_C3 492 | base_model = model2.layers[4] 493 | base_model.trainable = True 494 | # Let's take a look to see how many layers are in the base model 495 | print("Number of layers in the base model: ", len(base_model.layers)) 496 | 497 | # Fine-tune from this layer onwards 498 | fine_tune_at = 120 499 | 500 | ### START CODE HERE 501 | 502 | # Freeze all the layers before the `fine_tune_at` layer 503 | for layer in base_model.layers[:fine_tune_at]: 504 | layer.trainable = True 505 | 506 | # Define a BinaryCrossentropy loss function. 
Use from_logits=True 507 | loss_function=tf.python.keras.losses.BinaryCrossentropy(from_logits=True) 508 | # Define an Adam optimizer with a learning rate of 0.1 * base_learning_rate 509 | optimizer = tf.keras.optimizers.Adam(lr=base_learning_rate*0.1) 510 | # Use accuracy as evaluation metric 511 | metrics=['accuracy'] 512 | ### END CODE HERE 513 | 514 | model2.compile(loss=loss_function, 515 | optimizer = optimizer, 516 | metrics=metrics) 517 | 518 | 519 | # In[28]: 520 | 521 | 522 | assert type(loss_function) == tf.python.keras.losses.BinaryCrossentropy, "Not the correct layer" 523 | assert loss_function.from_logits, "Use from_logits=True" 524 | assert type(optimizer) == tf.keras.optimizers.Adam, "This is not an Adam optimizer" 525 | assert optimizer.lr == base_learning_rate / 10, "Wrong learning rate" 526 | assert metrics[0] == 'accuracy', "Wrong metric" 527 | 528 | print('\033[92mAll tests passed!') 529 | 530 | 531 | # In[ ]: 532 | 533 | 534 | fine_tune_epochs = 5 535 | total_epochs = initial_epochs + fine_tune_epochs 536 | 537 | history_fine = model2.fit(train_dataset, 538 | epochs=total_epochs, 539 | initial_epoch=history.epoch[-1], 540 | validation_data=validation_dataset) 541 | 542 | 543 | # Ahhh, quite an improvement! A little fine-tuning can really go a long way. 544 | 545 | # In[ ]: 546 | 547 | 548 | acc += history_fine.history['accuracy'] 549 | val_acc += history_fine.history['val_accuracy'] 550 | 551 | loss += history_fine.history['loss'] 552 | val_loss += history_fine.history['val_loss'] 553 | 554 | 555 | # In[ ]: 556 | 557 | 558 | plt.figure(figsize=(8, 8)) 559 | plt.subplot(2, 1, 1) 560 | plt.plot(acc, label='Training Accuracy') 561 | plt.plot(val_acc, label='Validation Accuracy') 562 | plt.ylim([0, 1]) 563 | plt.plot([initial_epochs-1,initial_epochs-1], 564 | plt.ylim(), label='Start Fine Tuning') 565 | plt.legend(loc='lower right') 566 | plt.title('Training and Validation Accuracy') 567 | 568 | plt.subplot(2, 1, 2) 569 | plt.plot(loss, label='Training Loss') 570 | plt.plot(val_loss, label='Validation Loss') 571 | plt.ylim([0, 1.0]) 572 | plt.plot([initial_epochs-1,initial_epochs-1], 573 | plt.ylim(), label='Start Fine Tuning') 574 | plt.legend(loc='upper right') 575 | plt.title('Training and Validation Loss') 576 | plt.xlabel('epoch') 577 | plt.show() 578 | 579 | 580 | # 581 | # 582 | # **What you should remember**: 583 | # 584 | # * To adapt the classifier to new data: Delete the top layer, add a new classification layer, and train only on that layer 585 | # * When freezing layers, avoid keeping track of statistics (like in the batch normalization layer) 586 | # * Fine-tune the final layers of your model to capture high-level details near the end of the network and potentially improve accuracy 587 | 588 | # ## Congratulations! 589 | # 590 | # You've completed this assignment on transfer learning and fine-tuning. Here's a quick recap of all you just accomplished: 591 | # 592 | # * Created a dataset from a directory 593 | # * Augmented data with the Sequential API 594 | # * Adapted a pretrained model to new data with the Functional API and MobileNetV2 595 | # * Fine-tuned the classifier's final layers and boosted the model's accuracy 596 | # 597 | # That's awesome! 
598 | -------------------------------------------------------------------------------- /Course 4-ConvolutionalNeuralNetworks/Week 2/Week 2 Deep Convolutional Models.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 4-ConvolutionalNeuralNetworks/Week 2/Week 2 Deep Convolutional Models.pdf -------------------------------------------------------------------------------- /Course 4-ConvolutionalNeuralNetworks/Week 3/Week 3 Detection Algorithms.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 4-ConvolutionalNeuralNetworks/Week 3/Week 3 Detection Algorithms.pdf -------------------------------------------------------------------------------- /Course 4-ConvolutionalNeuralNetworks/Week 4/Week 4 Special Applications Face Recognition and Neural Style Transfer.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 4-ConvolutionalNeuralNetworks/Week 4/Week 4 Special Applications Face Recognition and Neural Style Transfer.pdf -------------------------------------------------------------------------------- /Course 5-SequenceModels/Week 1/Week 1 Recurrent Neural Networks.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 5-SequenceModels/Week 1/Week 1 Recurrent Neural Networks.pdf -------------------------------------------------------------------------------- /Course 5-SequenceModels/Week 2/Operations_on_word_vectors_v2a.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding: utf-8 3 | 4 | # # Operations on Word Vectors 5 | # 6 | # Welcome to your first assignment of Week 2, Course 5 of the Deep Learning Specialization! 7 | # 8 | # Because word embeddings are very computationally expensive to train, most ML practitioners will load a pre-trained set of embeddings. In this notebook you'll try your hand at loading, measuring similarity between, and modifying pre-trained embeddings. 9 | # 10 | # **After this assignment you'll be able to**: 11 | # 12 | # * Explain how word embeddings capture relationships between words 13 | # * Load pre-trained word vectors 14 | # * Measure similarity between word vectors using cosine similarity 15 | # * Use word embeddings to solve word analogy problems such as Man is to Woman as King is to ______. 16 | # 17 | # At the end of this notebook you'll have a chance to try an optional exercise, where you'll modify word embeddings to reduce their gender bias. Reducing bias is an important consideration in ML, so you're encouraged to take this challenge! 18 | # 19 | # ## Important Note on Submission to the AutoGrader 20 | # 21 | # Before submitting your assignment to the AutoGrader, please make sure you are not doing the following: 22 | # 23 | # 1. You have not added any _extra_ `print` statement(s) in the assignment. 24 | # 2. You have not added any _extra_ code cell(s) in the assignment. 25 | # 3. 
You have not changed any of the function parameters. 26 | # 4. You are not using any global variables inside your graded exercises. Unless specifically instructed to do so, please refrain from it and use the local variables instead. 27 | # 5. You are not changing the assignment code where it is not required, like creating _extra_ variables. 28 | # 29 | # If you do any of the following, you will get something like, `Grader Error: Grader feedback not found` (or similarly unexpected) error upon submitting your assignment. Before asking for help/debugging the errors in your assignment, check for these first. If this is the case, and you don't remember the changes you have made, you can get a fresh copy of the assignment by following these [instructions](https://www.coursera.org/learn/nlp-sequence-models/supplement/qHIve/h-ow-to-refresh-your-workspace). 30 | 31 | # ## Table of Contents 32 | # 33 | # - [Packages](#0) 34 | # - [1 - Load the Word Vectors](#1) 35 | # - [2 - Embedding Vectors Versus One-Hot Vectors](#2) 36 | # - [3 - Cosine Similarity](#3) 37 | # - [Exercise 1 - cosine_similarity](#ex-1) 38 | # - [4 - Word Analogy Task](#4) 39 | # - [Exercise 2 - complete_analogy](#ex-2) 40 | # - [5 - Debiasing Word Vectors (OPTIONAL/UNGRADED)](#5) 41 | # - [5.1 - Neutralize Bias for Non-Gender Specific Words](#5-1) 42 | # - [Exercise 3 - neutralize](#ex-3) 43 | # - [5.2 - Equalization Algorithm for Gender-Specific Words](#5-2) 44 | # - [Exercise 4 - equalize](#ex-4) 45 | # - [6 - References](#6) 46 | 47 | # 48 | # ## Packages 49 | # 50 | # Let's get started! Run the following cell to load the packages you'll need. 51 | 52 | # In[1]: 53 | 54 | 55 | import numpy as np 56 | from w2v_utils import * 57 | 58 | 59 | # 60 | # ## 1 - Load the Word Vectors 61 | # 62 | # For this assignment, you'll use 50-dimensional GloVe vectors to represent words. 63 | # Run the following cell to load the `word_to_vec_map`. 64 | 65 | # In[2]: 66 | 67 | 68 | words, word_to_vec_map = read_glove_vecs('data/glove.6B.50d.txt') 69 | 70 | 71 | # You've loaded: 72 | # - `words`: set of words in the vocabulary. 73 | # - `word_to_vec_map`: dictionary mapping words to their GloVe vector representation. 74 | # 75 | # 76 | # ## 2 - Embedding Vectors Versus One-Hot Vectors 77 | # Recall from the lesson videos that one-hot vectors don't do a good job of capturing the level of similarity between words. This is because every one-hot vector has the same Euclidean distance from any other one-hot vector. 78 | # 79 | # Embedding vectors, such as GloVe vectors, provide much more useful information about the meaning of individual words. 80 | # Now, see how you can use GloVe vectors to measure the similarity between two words! 81 | 82 | # 83 | # ## 3 - Cosine Similarity 84 | # 85 | # To measure the similarity between two words, you need a way to measure the degree of similarity between two embedding vectors for the two words. Given two vectors $u$ and $v$, cosine similarity is defined as follows: 86 | # 87 | # $$\text{CosineSimilarity(u, v)} = \frac {u \cdot v} {||u||_2 ||v||_2} = cos(\theta) \tag{1}$$ 88 | # 89 | # * $u \cdot v$ is the dot product (or inner product) of two vectors 90 | # * $||u||_2$ is the norm (or length) of the vector $u$ 91 | # * $\theta$ is the angle between $u$ and $v$. 92 | # * The cosine similarity depends on the angle between $u$ and $v$. 93 | # * If $u$ and $v$ are very similar, their cosine similarity will be close to 1. 94 | # * If they are dissimilar, the cosine similarity will take a smaller value. 
95 | # 96 | # 97 | #
Figure 1: The cosine of the angle between two vectors is a measure of their similarity.
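Before implementing it, here is a tiny numeric illustration (with made-up toy vectors) of why this matters: distinct one-hot vectors always have cosine similarity 0, whereas formula (1) distinguishes dense vectors that point in similar and dissimilar directions:

```python
import numpy as np

def cos(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

one_hot_a = np.array([1.0, 0.0, 0.0])
one_hot_b = np.array([0.0, 1.0, 0.0])
print(cos(one_hot_a, one_hot_b))          # 0.0 -- true for ANY pair of distinct one-hot vectors

u = np.array([1.0, 2.0, 3.0])
v = np.array([2.0, 4.0, 6.1])             # nearly the same direction as u
w = np.array([-1.0, -2.0, -3.0])          # opposite direction
print(cos(u, v), cos(u, w))               # ~1.0 and -1.0
```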
98 | # 99 | # 100 | # ### Exercise 1 - cosine_similarity 101 | # 102 | # Implement the function `cosine_similarity()` to evaluate the similarity between word vectors. 103 | # 104 | # **Reminder**: The norm of $u$ is defined as $ ||u||_2 = \sqrt{\sum_{i=1}^{n} u_i^2}$ 105 | # 106 | # #### Additional Hints 107 | # * You may find [np.dot](https://numpy.org/doc/stable/reference/generated/numpy.dot.html), [np.sum](https://numpy.org/doc/stable/reference/generated/numpy.sum.html), or [np.sqrt](https://numpy.org/doc/stable/reference/generated/numpy.sqrt.html) useful depending upon the implementation that you choose. 108 | 109 | # In[6]: 110 | 111 | 112 | # UNQ_C1 (UNIQUE CELL IDENTIFIER, DO NOT EDIT) 113 | # GRADED FUNCTION: cosine_similarity 114 | 115 | def cosine_similarity(u, v): 116 | """ 117 | Cosine similarity reflects the degree of similarity between u and v 118 | 119 | Arguments: 120 | u -- a word vector of shape (n,) 121 | v -- a word vector of shape (n,) 122 | 123 | Returns: 124 | cosine_similarity -- the cosine similarity between u and v defined by the formula above. 125 | """ 126 | 127 | # Special case. Consider the case u = [0, 0], v=[0, 0] 128 | if np.all(u == v): 129 | return 1 130 | 131 | ### START CODE HERE ### 132 | # Compute the dot product between u and v (≈1 line) 133 | dot = np.dot(u, v) 134 | 135 | # Compute the L2 norm of u (≈1 line) 136 | norm_u = np.sqrt(np.dot(u, u)) 137 | 138 | # Compute the L2 norm of v (≈1 line) 139 | norm_v = np.sqrt(np.dot(v, v)) 140 | 141 | # Avoid division by 0 142 | if np.isclose(norm_u * norm_v, 0, atol=1e-32): 143 | return 0 144 | 145 | # Compute the cosine similarity defined by formula (1) (≈1 line) 146 | cosine_similarity = dot / norm_u / norm_v 147 | ### END CODE HERE ### 148 | 149 | return cosine_similarity 150 | 151 | 152 | # In[7]: 153 | 154 | 155 | # START SKIP FOR GRADING 156 | father = word_to_vec_map["father"] 157 | mother = word_to_vec_map["mother"] 158 | ball = word_to_vec_map["ball"] 159 | crocodile = word_to_vec_map["crocodile"] 160 | france = word_to_vec_map["france"] 161 | italy = word_to_vec_map["italy"] 162 | paris = word_to_vec_map["paris"] 163 | rome = word_to_vec_map["rome"] 164 | 165 | print("cosine_similarity(father, mother) = ", cosine_similarity(father, mother)) 166 | print("cosine_similarity(ball, crocodile) = ",cosine_similarity(ball, crocodile)) 167 | print("cosine_similarity(france - paris, rome - italy) = ",cosine_similarity(france - paris, rome - italy)) 168 | # END SKIP FOR GRADING 169 | 170 | # PUBLIC TESTS 171 | def cosine_similarity_test(target): 172 | a = np.random.uniform(-10, 10, 10) 173 | b = np.random.uniform(-10, 10, 10) 174 | c = np.random.uniform(-1, 1, 23) 175 | 176 | assert np.isclose(cosine_similarity(a, a), 1), "cosine_similarity(a, a) must be 1" 177 | assert np.isclose(cosine_similarity((c >= 0) * 1, (c < 0) * 1), 0), "cosine_similarity(a, not(a)) must be 0" 178 | assert np.isclose(cosine_similarity(a, -a), -1), "cosine_similarity(a, -a) must be -1" 179 | assert np.isclose(cosine_similarity(a, b), cosine_similarity(a * 2, b * 4)), "cosine_similarity must be scale-independent. You must divide by the product of the norms of each input" 180 | 181 | print("\033[92mAll test passed!") 182 | 183 | cosine_similarity_test(cosine_similarity) 184 | 185 | 186 | # #### Try different words! 187 | # 188 | # After you get the correct expected output, please feel free to modify the inputs and measure the cosine similarity between other pairs of words! 
Playing around with the cosine similarity of other inputs will give you a better sense of how word vectors behave. 189 | 190 | # 191 | # ## 4 - Word Analogy Task 192 | # 193 | # * In the word analogy task, complete this sentence: 194 | # "*a* is to *b* as *c* is to **____**". 195 | # 196 | # * An example is: 197 | # '*man* is to *woman* as *king* is to *queen*' . 198 | # 199 | # * You're trying to find a word *d*, such that the associated word vectors $e_a, e_b, e_c, e_d$ are related in the following manner: 200 | # $e_b - e_a \approx e_d - e_c$ 201 | # * Measure the similarity between $e_b - e_a$ and $e_d - e_c$ using cosine similarity. 202 | # 203 | # 204 | # ### Exercise 2 - complete_analogy 205 | # 206 | # Complete the code below to perform word analogies! 207 | 208 | # In[8]: 209 | 210 | 211 | # UNQ_C2 (UNIQUE CELL IDENTIFIER, DO NOT EDIT) 212 | # GRADED FUNCTION: complete_analogy 213 | 214 | def complete_analogy(word_a, word_b, word_c, word_to_vec_map): 215 | """ 216 | Performs the word analogy task as explained above: a is to b as c is to ____. 217 | 218 | Arguments: 219 | word_a -- a word, string 220 | word_b -- a word, string 221 | word_c -- a word, string 222 | word_to_vec_map -- dictionary that maps words to their corresponding vectors. 223 | 224 | Returns: 225 | best_word -- the word such that v_b - v_a is close to v_best_word - v_c, as measured by cosine similarity 226 | """ 227 | 228 | # convert words to lowercase 229 | word_a, word_b, word_c = word_a.lower(), word_b.lower(), word_c.lower() 230 | 231 | ### START CODE HERE ### 232 | # Get the word embeddings e_a, e_b and e_c (≈1-3 lines) 233 | e_a, e_b, e_c = word_to_vec_map[word_a], word_to_vec_map[word_b], word_to_vec_map[word_c] 234 | ### END CODE HERE ### 235 | 236 | words = word_to_vec_map.keys() 237 | max_cosine_sim = -100 # Initialize max_cosine_sim to a large negative number 238 | best_word = None # Initialize best_word with None, it will help keep track of the word to output 239 | 240 | # loop over the whole word vector set 241 | for w in words: 242 | # to avoid best_word being one the input words, skip the input word_c 243 | # skip word_c from query 244 | if w == word_c: 245 | continue 246 | 247 | ### START CODE HERE ### 248 | # Compute cosine similarity between the vector (e_b - e_a) and the vector ((w's vector representation) - e_c) (≈1 line) 249 | cosine_sim = cosine_similarity((e_b - e_a), (word_to_vec_map[w] - e_c)) 250 | 251 | # If the cosine_sim is more than the max_cosine_sim seen so far, 252 | # then: set the new max_cosine_sim to the current cosine_sim and the best_word to the current word (≈3 lines) 253 | if cosine_sim > max_cosine_sim: 254 | max_cosine_sim = cosine_sim 255 | best_word = w 256 | ### END CODE HERE ### 257 | 258 | return best_word 259 | 260 | 261 | # In[9]: 262 | 263 | 264 | # PUBLIC TEST 265 | def complete_analogy_test(target): 266 | a = [3, 3] # Center at a 267 | a_nw = [2, 4] # North-West oriented vector from a 268 | a_s = [3, 2] # South oriented vector from a 269 | 270 | c = [-2, 1] # Center at c 271 | # Create a controlled word to vec map 272 | word_to_vec_map = {'a': a, 273 | 'synonym_of_a': a, 274 | 'a_nw': a_nw, 275 | 'a_s': a_s, 276 | 'c': c, 277 | 'c_n': [-2, 2], # N 278 | 'c_ne': [-1, 2], # NE 279 | 'c_e': [-1, 1], # E 280 | 'c_se': [-1, 0], # SE 281 | 'c_s': [-2, 0], # S 282 | 'c_sw': [-3, 0], # SW 283 | 'c_w': [-3, 1], # W 284 | 'c_nw': [-3, 2] # NW 285 | } 286 | 287 | # Convert lists to np.arrays 288 | for key in word_to_vec_map.keys(): 289 | word_to_vec_map[key] = 
np.array(word_to_vec_map[key]) 290 | 291 | assert(target('a', 'a_nw', 'c', word_to_vec_map) == 'c_nw') 292 | assert(target('a', 'a_s', 'c', word_to_vec_map) == 'c_s') 293 | assert(target('a', 'synonym_of_a', 'c', word_to_vec_map) != 'c'), "Best word cannot be input query" 294 | assert(target('a', 'c', 'a', word_to_vec_map) == 'c') 295 | 296 | print("\033[92mAll tests passed") 297 | 298 | complete_analogy_test(complete_analogy) 299 | 300 | 301 | # Run the cell below to test your code. Patience, young grasshopper...this may take 1-2 minutes. 302 | 303 | # In[10]: 304 | 305 | 306 | # START SKIP FOR GRADING 307 | triads_to_try = [('italy', 'italian', 'spain'), ('india', 'delhi', 'japan'), ('man', 'woman', 'boy'), ('small', 'smaller', 'large')] 308 | for triad in triads_to_try: 309 | print ('{} -> {} :: {} -> {}'.format( *triad, complete_analogy(*triad, word_to_vec_map))) 310 | 311 | # END SKIP FOR GRADING 312 | 313 | 314 | # Once you get the output, try modifying the input cells above to test your own analogies. 315 | # 316 | # **Hint**: Try to find some other analogy pairs that will work, along with some others where the algorithm doesn't give the right answer: 317 | # * For example, you can try small->smaller as big->? 318 | 319 | # ## Congratulations! 320 | # 321 | # You've come to the end of the graded portion of the assignment. By now, you've: 322 | # 323 | # * Loaded some pre-trained word vectors 324 | # * Measured the similarity between word vectors using cosine similarity 325 | # * Used word embeddings to solve word analogy problems such as Man is to Woman as King is to __. 326 | # 327 | # Cosine similarity is a relatively simple and intuitive, yet powerful, method you can use to capture nuanced relationships between words. These exercises should be helpful to you in explaining how it works, and applying it to your own projects! 328 | 329 | # 330 | # What you should remember: 331 | # 332 | # - Cosine similarity is a good way to compare the similarity between pairs of word vectors. 333 | # - Note that L2 (Euclidean) distance also works. 334 | # - For NLP applications, using a pre-trained set of word vectors is often a great way to get started. 335 | # 336 | # Even though you've finished the graded portion, please take a look at the rest of this notebook to learn about debiasing word vectors. 337 | 338 | # 339 | # ## 5 - Debiasing Word Vectors (OPTIONAL/UNGRADED) 340 | 341 | # In the following exercise, you'll examine gender biases that can be reflected in a word embedding, and explore algorithms for reducing the bias. In addition to learning about the topic of debiasing, this exercise will also help hone your intuition about what word vectors are doing. This section involves a bit of linear algebra, though you can certainly complete it without being an expert! Go ahead and give it a shot. This portion of the notebook is optional and is not graded...so just have fun and explore. 342 | # 343 | # First, see how the GloVe word embeddings relate to gender. You'll begin by computing a vector $g = e_{woman}-e_{man}$, where $e_{woman}$ represents the word vector corresponding to the word *woman*, and $e_{man}$ corresponds to the word vector corresponding to the word *man*. The resulting vector $g$ roughly encodes the concept of "gender". 344 | # 345 | # You might get a more accurate representation if you compute $g_1 = e_{mother}-e_{father}$, $g_2 = e_{girl}-e_{boy}$, etc. and average over them, but just using $e_{woman}-e_{man}$ will give good enough results for now. 
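# Optional illustration: a small sketch of the averaging idea described above. It assumes the
# GloVe `word_to_vec_map` loaded earlier in this notebook, and that the words in `gender_pairs`
# (e.g. "girl"/"boy") are present in that vocabulary -- an assumption, so adjust the pairs if a
# word is missing from your embedding file.
import numpy as np

gender_pairs = [("woman", "man"), ("mother", "father"), ("girl", "boy")]
g_avg = np.mean([word_to_vec_map[f] - word_to_vec_map[m] for f, m in gender_pairs], axis=0)
# g_avg could be used in place of the single-difference vector g computed in the next cell;
# averaging several pairs usually gives a slightly cleaner estimate of the "gender" direction.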
346 | # 347 | 348 | # In[ ]: 349 | 350 | 351 | g = word_to_vec_map['woman'] - word_to_vec_map['man'] 352 | print(g) 353 | 354 | 355 | # Now, consider the cosine similarity of different words with $g$. What does a positive value of similarity mean, versus a negative cosine similarity? 356 | 357 | # In[ ]: 358 | 359 | 360 | print ('List of names and their similarities with constructed vector:') 361 | 362 | # girls and boys name 363 | name_list = ['john', 'marie', 'sophie', 'ronaldo', 'priya', 'rahul', 'danielle', 'reza', 'katy', 'yasmin'] 364 | 365 | for w in name_list: 366 | print (w, cosine_similarity(word_to_vec_map[w], g)) 367 | 368 | 369 | # As you can see, female first names tend to have a positive cosine similarity with our constructed vector $g$, while male first names tend to have a negative cosine similarity. This is not surprising, and the result seems acceptable. 370 | # 371 | # Now try with some other words: 372 | 373 | # In[ ]: 374 | 375 | 376 | print('Other words and their similarities:') 377 | word_list = ['lipstick', 'guns', 'science', 'arts', 'literature', 'warrior','doctor', 'tree', 'receptionist', 378 | 'technology', 'fashion', 'teacher', 'engineer', 'pilot', 'computer', 'singer'] 379 | for w in word_list: 380 | print (w, cosine_similarity(word_to_vec_map[w], g)) 381 | 382 | 383 | # Do you notice anything surprising? It is astonishing how these results reflect certain unhealthy gender stereotypes. For example, we see “computer” is negative and is closer in value to male first names, while “literature” is positive and is closer to female first names. Ouch! 384 | # 385 | # You'll see below how to reduce the bias of these vectors, using an algorithm due to [Boliukbasi et al., 2016](https://arxiv.org/abs/1607.06520). Note that some word pairs such as "actor"/"actress" or "grandmother"/"grandfather" should remain gender-specific, while other words such as "receptionist" or "technology" should be neutralized, i.e. not be gender-related. You'll have to treat these two types of words differently when debiasing. 386 | # 387 | # 388 | # ### 5.1 - Neutralize Bias for Non-Gender Specific Words 389 | # 390 | # The figure below should help you visualize what neutralizing does. If you're using a 50-dimensional word embedding, the 50 dimensional space can be split into two parts: The bias-direction $g$, and the remaining 49 dimensions, which is called $g_{\perp}$ here. In linear algebra, we say that the 49-dimensional $g_{\perp}$ is perpendicular (or "orthogonal") to $g$, meaning it is at 90 degrees to $g$. The neutralization step takes a vector such as $e_{receptionist}$ and zeros out the component in the direction of $g$, giving us $e_{receptionist}^{debiased}$. 391 | # 392 | # Even though $g_{\perp}$ is 49-dimensional, given the limitations of what you can draw on a 2D screen, it's illustrated using a 1-dimensional axis below. 393 | # 394 | # 395 | #
Figure 2: The word vector for "receptionist" represented before and after applying the neutralize operation.
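# Optional illustration (ungraded): a tiny 2-D numeric example of the projection that
# neutralization performs, easy to check by hand. g_toy and e_toy are made-up vectors, not part
# of the assignment; the general formulas are given in Exercise 3 below.
import numpy as np

g_toy = np.array([1.0, 0.0])   # pretend bias direction
e_toy = np.array([2.0, 3.0])   # pretend word vector

bias_component = (np.dot(e_toy, g_toy) / np.dot(g_toy, g_toy)) * g_toy  # -> [2., 0.]
e_toy_debiased = e_toy - bias_component                                 # -> [0., 3.]

print(bias_component, e_toy_debiased)
print(np.dot(e_toy_debiased, g_toy))  # 0.0: nothing left along the bias direction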
396 | # 397 | # 398 | # ### Exercise 3 - neutralize 399 | # 400 | # Implement `neutralize()` to remove the bias of words such as "receptionist" or "scientist." 401 | # 402 | # Given an input embedding $e$, you can use the following formulas to compute $e^{debiased}$: 403 | # 404 | # $$e^{bias\_component} = \frac{e \cdot g}{||g||_2^2} * g\tag{2}$$ 405 | # $$e^{debiased} = e - e^{bias\_component}\tag{3}$$ 406 | # 407 | # If you are an expert in linear algebra, you may recognize $e^{bias\_component}$ as the projection of $e$ onto the direction $g$. If you're not an expert in linear algebra, don't worry about this. ;) 408 | # 409 | # **Note:** The [paper](https://papers.nips.cc/paper/6228-man-is-to-computer-programmer-as-woman-is-to-homemaker-debiasing-word-embeddings.pdf), which the debiasing algorithm is from, assumes all word vectors to have L2 norm as 1 and hence the need for the calculations below: 410 | 411 | # In[ ]: 412 | 413 | 414 | # The paper assumes all word vectors to have L2 norm as 1 and hence the need for this calculation 415 | from tqdm import tqdm 416 | word_to_vec_map_unit_vectors = { 417 | word: embedding / np.linalg.norm(embedding) 418 | for word, embedding in tqdm(word_to_vec_map.items()) 419 | } 420 | g_unit = word_to_vec_map_unit_vectors['woman'] - word_to_vec_map_unit_vectors['man'] 421 | 422 | 423 | # In[ ]: 424 | 425 | 426 | def neutralize(word, g, word_to_vec_map): 427 | """ 428 | Removes the bias of "word" by projecting it on the space orthogonal to the bias axis. 429 | This function ensures that gender neutral words are zero in the gender subspace. 430 | 431 | Arguments: 432 | word -- string indicating the word to debias 433 | g -- numpy-array of shape (50,), corresponding to the bias axis (such as gender) 434 | word_to_vec_map -- dictionary mapping words to their corresponding vectors. 435 | 436 | Returns: 437 | e_debiased -- neutralized word vector representation of the input "word" 438 | """ 439 | 440 | ### START CODE HERE ### 441 | # Select word vector representation of "word". Use word_to_vec_map. (≈ 1 line) 442 | e = word_to_vec_map[word] 443 | 444 | # Compute e_biascomponent using the formula given above. (≈ 1 line) 445 | e_biascomponent = np.dot(e, g) / np.dot(g, g) * g 446 | 447 | # Neutralize e by subtracting e_biascomponent from it 448 | # e_debiased should be equal to its orthogonal projection. (≈ 1 line) 449 | e_debiased = e - e_biascomponent 450 | ### END CODE HERE ### 451 | 452 | return e_debiased 453 | 454 | 455 | # In[ ]: 456 | 457 | 458 | word = "receptionist" 459 | print("cosine similarity between " + word + " and g, before neutralizing: ", cosine_similarity(word_to_vec_map[word], g)) 460 | 461 | e_debiased = neutralize(word, g_unit, word_to_vec_map_unit_vectors) 462 | print("cosine similarity between " + word + " and g_unit, after neutralizing: ", cosine_similarity(e_debiased, g_unit)) 463 | 464 | 465 | # **Expected Output**: The second result is essentially 0, up to numerical rounding (on the order of $10^{-17}$). 466 | # 467 | # 468 | # 469 | # 470 | # 473 | # 476 | # 477 | # 478 | # 481 | # 484 | #
471 | # cosine similarity between receptionist and g, before neutralizing: 0.3307794175059374
479 | # cosine similarity between receptionist and g_unit, after neutralizing: 3.5723165491646677e-17
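# Optional check (ungraded): the same neutralization can be applied to other words that should
# be gender-neutral, such as "technology" or "engineer" from the word_list above. This assumes
# the cells above have been run and that these words are in the vocabulary. The
# post-neutralization similarity should again be ~0 up to floating-point error.
for w in ["technology", "engineer"]:
    e_deb = neutralize(w, g_unit, word_to_vec_map_unit_vectors)
    print(w, cosine_similarity(word_to_vec_map[w], g), cosine_similarity(e_deb, g_unit))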
485 | 486 | # 487 | # ### 5.2 - Equalization Algorithm for Gender-Specific Words 488 | # 489 | # Next, let's see how debiasing can also be applied to word pairs such as "actress" and "actor." Equalization is applied to pairs of words that you might want to have differ only through the gender property. As a concrete example, suppose that "actress" is closer to "babysit" than "actor." By applying neutralization to "babysit," you can reduce the gender stereotype associated with babysitting. But this still does not guarantee that "actor" and "actress" are equidistant from "babysit." The equalization algorithm takes care of this. 490 | # 491 | # The key idea behind equalization is to make sure that a particular pair of words are equidistant from the 49-dimensional $g_\perp$. The equalization step also ensures that the two equalized steps are now the same distance from $e_{receptionist}^{debiased}$, or from any other work that has been neutralized. Visually, this is how equalization works: 492 | # 493 | # 494 | # 495 | # 496 | # The derivation of the linear algebra to do this is a bit more complex. (See Bolukbasi et al., 2016 in the References for details.) Here are the key equations: 497 | # 498 | # 499 | # $$ \mu = \frac{e_{w1} + e_{w2}}{2}\tag{4}$$ 500 | # 501 | # $$ \mu_{B} = \frac {\mu \cdot \text{bias_axis}}{||\text{bias_axis}||_2^2} *\text{bias_axis} 502 | # \tag{5}$$ 503 | # 504 | # $$\mu_{\perp} = \mu - \mu_{B} \tag{6}$$ 505 | # 506 | # $$ e_{w1B} = \frac {e_{w1} \cdot \text{bias_axis}}{||\text{bias_axis}||_2^2} *\text{bias_axis} 507 | # \tag{7}$$ 508 | # $$ e_{w2B} = \frac {e_{w2} \cdot \text{bias_axis}}{||\text{bias_axis}||_2^2} *\text{bias_axis} 509 | # \tag{8}$$ 510 | # 511 | # 512 | # $$e_{w1B}^{corrected} = \sqrt{{1 - ||\mu_{\perp} ||^2_2}} * \frac{e_{\text{w1B}} - \mu_B} {||e_{w1} - \mu_B||_2} \tag{9}$$ 513 | # 514 | # 515 | # $$e_{w2B}^{corrected} = \sqrt{{1 - ||\mu_{\perp} ||^2_2}} * \frac{e_{\text{w2B}} - \mu_B} {||e_{w2} - \mu_B||_2} \tag{10}$$ 516 | # 517 | # $$e_1 = e_{w1B}^{corrected} + \mu_{\perp} \tag{11}$$ 518 | # $$e_2 = e_{w2B}^{corrected} + \mu_{\perp} \tag{12}$$ 519 | # 520 | # 521 | # 522 | # ### Exercise 4 - equalize 523 | # 524 | # Implement the `equalize()` function below. 525 | # 526 | # Use the equations above to get the final equalized version of the pair of words. Good luck! 527 | # 528 | # **Hint** 529 | # - Use [np.linalg.norm](https://numpy.org/doc/stable/reference/generated/numpy.linalg.norm.html) 530 | 531 | # In[ ]: 532 | 533 | 534 | def equalize(pair, bias_axis, word_to_vec_map): 535 | """ 536 | Debias gender specific words by following the equalize method described in the figure above. 537 | 538 | Arguments: 539 | pair -- pair of strings of gender specific words to debias, e.g. ("actress", "actor") 540 | bias_axis -- numpy-array of shape (50,), vector corresponding to the bias axis, e.g. gender 541 | word_to_vec_map -- dictionary mapping words to their corresponding vectors 542 | 543 | Returns 544 | e_1 -- word vector corresponding to the first word 545 | e_2 -- word vector corresponding to the second word 546 | """ 547 | 548 | ### START CODE HERE ### 549 | # Step 1: Select word vector representation of "word". Use word_to_vec_map. 
(≈ 2 lines) 550 | w1, w2 = pair[0], pair[1] 551 | e_w1, e_w2 = word_to_vec_map[w1], word_to_vec_map[w2] 552 | 553 | # Step 2: Compute the mean of e_w1 and e_w2 (≈ 1 line) 554 | mu = (e_w1 + e_w2) / 2 555 | 556 | # Step 3: Compute the projections of mu over the bias axis and the orthogonal axis (≈ 2 lines) 557 | mu_B = np.dot(mu, bias_axis) / np.dot(bias_axis, bias_axis) * bias_axis 558 | mu_orth = mu - mu_B 559 | 560 | # Step 4: Use equations (7) and (8) to compute e_w1B and e_w2B (≈2 lines) 561 | e_w1B = np.dot(e_w1, bias_axis) / np.dot(bias_axis, bias_axis) * bias_axis 562 | e_w2B = np.dot(e_w2, bias_axis) / np.dot(bias_axis, bias_axis) * bias_axis 563 | 564 | # Step 5: Adjust the Bias part of e_w1B and e_w2B using the formulas (9) and (10) given above; note the hint to use np.linalg.norm for the denominator (≈2 lines) 565 | corrected_e_w1B = np.sqrt(abs(1 - np.dot(mu_orth, mu_orth))) * (e_w1B - mu_B) / np.linalg.norm((e_w1 - mu_orth) - mu_B) 566 | corrected_e_w2B = np.sqrt(abs(1 - np.dot(mu_orth, mu_orth))) * (e_w2B - mu_B) / np.linalg.norm((e_w2 - mu_orth) - mu_B) 567 | 568 | # Step 6: Debias by equalizing e1 and e2 to the sum of their corrected projections (≈2 lines) 569 | e1 = corrected_e_w1B + mu_orth 570 | e2 = corrected_e_w2B + mu_orth 571 | 572 | ### END CODE HERE ### 573 | 574 | return e1, e2 575 | 576 | 577 | # In[ ]: 578 | 579 | 580 | print("cosine similarities before equalizing:") 581 | print("cosine_similarity(word_to_vec_map[\"man\"], gender) = ", cosine_similarity(word_to_vec_map["man"], g)) 582 | print("cosine_similarity(word_to_vec_map[\"woman\"], gender) = ", cosine_similarity(word_to_vec_map["woman"], g)) 583 | print() 584 | e1, e2 = equalize(("man", "woman"), g_unit, word_to_vec_map_unit_vectors) 585 | print("cosine similarities after equalizing:") 586 | print("cosine_similarity(e1, gender) = ", cosine_similarity(e1, g_unit)) 587 | print("cosine_similarity(e2, gender) = ", cosine_similarity(e2, g_unit)) 588 | 589 | 590 | # **Expected Output**: 591 | # 592 | # cosine similarities before equalizing: 593 | #
596 | # cosine_similarity(word_to_vec_map["man"], gender) = -0.117110957653
604 | # cosine_similarity(word_to_vec_map["woman"], gender) = 0.356666188463
611 | #
612 | # cosine similarities after equalizing:
613 | #
616 | # cosine_similarity(e1, gender) = -0.058578740443554995
624 | # cosine_similarity(e2, gender) = 0.058578740443555
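# Optional sketch (ungraded): the same equalization applied to the "actress"/"actor" pair
# discussed at the start of this section, assuming both words are present in the GloVe
# vocabulary and the cells above have been run. After equalizing, the two similarities should
# be (approximately) equal in magnitude and opposite in sign.
e_act1, e_act2 = equalize(("actress", "actor"), g_unit, word_to_vec_map_unit_vectors)
print("cosine_similarity(e_act1, g_unit) = ", cosine_similarity(e_act1, g_unit))
print("cosine_similarity(e_act2, g_unit) = ", cosine_similarity(e_act2, g_unit))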
631 | 632 | # Go ahead and play with the input words in the cell above, to apply equalization to other pairs of words. 633 | # 634 | # Hint: Try... 635 | # 636 | # These debiasing algorithms are very helpful for reducing bias, but aren't perfect and don't eliminate all traces of bias. For example, one weakness of this implementation was that the bias direction $g$ was defined using only the pair of words _woman_ and _man_. As discussed earlier, if $g$ were defined by computing $g_1 = e_{woman} - e_{man}$; $g_2 = e_{mother} - e_{father}$; $g_3 = e_{girl} - e_{boy}$; and so on and averaging over them, you would obtain a better estimate of the "gender" dimension in the 50 dimensional word embedding space. Feel free to play with these types of variants as well! 637 | 638 | # ### Congratulations! 639 | # 640 | # You have come to the end of both graded and ungraded portions of this notebook, and have seen several of the ways that word vectors can be applied and modified. Great work pushing your knowledge in the areas of neutralizing and equalizing word vectors! See you next time. 641 | 642 | # 643 | # ## 6 - References 644 | # 645 | # - The debiasing algorithm is from Bolukbasi et al., 2016, [Man is to Computer Programmer as Woman is to 646 | # Homemaker? Debiasing Word Embeddings](https://papers.nips.cc/paper/6228-man-is-to-computer-programmer-as-woman-is-to-homemaker-debiasing-word-embeddings.pdf) 647 | # - The GloVe word embeddings were due to Jeffrey Pennington, Richard Socher, and Christopher D. Manning. (https://nlp.stanford.edu/projects/glove/) 648 | # 649 | 650 | # In[ ]: 651 | 652 | 653 | 654 | 655 | -------------------------------------------------------------------------------- /Course 5-SequenceModels/Week 2/Week 2 Natural Language Processing and Word Embeddings.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 5-SequenceModels/Week 2/Week 2 Natural Language Processing and Word Embeddings.pdf -------------------------------------------------------------------------------- /Course 5-SequenceModels/Week 3/Week 3 Sequence models and Attention Mechanism.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 5-SequenceModels/Week 3/Week 3 Sequence models and Attention Mechanism.pdf -------------------------------------------------------------------------------- /Course 5-SequenceModels/Week 4/Week 4 Transformers.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 5-SequenceModels/Week 4/Week 4 Transformers.pdf -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2023 Padra Esfandiyar 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the 
Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Deep Learning Specialization on Coursera (offered by deeplearning.ai) 2 | 3 |

4 | 5 | Programming assignments and quizzes answers from all courses in the Coursera [Deep Learning Specialization](https://www.coursera.org/specializations/deep-learning) offered by [](https://www.deeplearning.ai/courses/deep-learning-specialization/) 6 | ## Credits 7 | 8 | This repo contains my work for this specialization. The code base, quiz questions and diagrams are taken from the [Deep Learning Specialization](https://www.coursera.org/specializations/deep-learning), unless specified otherwise. 9 | 10 | ## Courses 11 | 12 | The [Deep Learning Specialization](https://www.coursera.org/specializations/deep-learning) on Coursera contains five courses: 13 | 14 | - Course 1: [Neural Networks and Deep Learning](https://www.coursera.org/learn/neural-networks-deep-learning?specialization=deep-learning) 15 | - Course 2: [Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization](https://www.coursera.org/learn/deep-neural-network?specialization=deep-learning) 16 | - Course 3: [Structuring Machine Learning Projects](https://www.coursera.org/learn/machine-learning-projects?specialization=deep-learning) 17 | - Course 4: [Convolutional Neural Networks](https://www.coursera.org/learn/convolutional-neural-networks?specialization=deep-learning) 18 | - Course 5: [Sequence Models](https://www.coursera.org/learn/nlp-sequence-models?specialization=deep-learning) 19 | 20 | ## Specialization Info 21 | 22 | - The Deep Learning Specialization is a foundational program that will help you understand the capabilities, challenges, and consequences of deep learning and prepare you to participate in the development of leading-edge AI technology.In this Specialization, you will build and train neural network architectures such as Convolutional Neural Networks, Recurrent Neural Networks, LSTMs, Transformers, and learn how to make them better with strategies such as Dropout, BatchNorm, Xavier/He initialization, and more. Get ready to master theoretical concepts and their industry applications using Python and TensorFlow and tackle real-world cases such as speech recognition, music synthesis, chatbots, machine translation, natural language processing, and more.AI is transforming many industries. The Deep Learning Specialization provides a pathway for you to take the definitive step in the world of AI by helping you gain the knowledge and skills to level up your career. Along the way, you will also get career advice from deep learning experts from industry and academia. 
23 | 24 | ## Applied Learning Project 25 | ### By the end you’ll be able to: 26 | 27 | - Build and train deep neural networks, implement vectorized neural networks, identify architecture parameters, and apply DL to your applications 28 | 29 | - Use best practices to train and develop test sets and analyze bias/variance for building DL applications, use standard NN techniques, apply optimization algorithms, and implement a neural network in TensorFlow 30 | 31 | - Use strategies for reducing errors in ML systems, understand complex ML settings, and apply end-to-end, transfer, and multi-task learning 32 | 33 | - Build a Convolutional Neural Network, apply it to visual detection and recognition tasks, use neural style transfer to generate art, and apply these algorithms to image, video, and other 2D/3D data 34 | 35 | - Build and train Recurrent Neural Networks and their variants (GRUs, LSTMs), apply RNNs to character-level language modeling, work with NLP and Word Embeddings, and use HuggingFace tokenizers and transformers to perform Named Entity Recognition and Question Answering 36 | 37 | 38 | ## What you will learn 39 | 40 | - Build and train deep neural networks, identify key architecture parameters, implement vectorized neural networks, and apply deep learning to your applications 41 | 42 | - Train and develop test sets, analyze variance for DL applications, use standard techniques and optimization algorithms, and build neural networks in TensorFlow 43 | 44 | - Build a CNN and apply it to detection and recognition tasks, use neural style transfer to generate art, and apply algorithms to image and video data 45 | 46 | - Build and train RNNs, work with NLP and Word Embeddings, and use HuggingFace tokenizers and transformer models to perform NER and Question Answering 47 | 48 | ## Usage 49 | 50 | This repo contains the assignment notebooks with my completed code (plus code from contributors), organized as Course/Week to mirror the course structure. 51 | The assignment notebooks are subject to change over time. 52 | 53 | # Connect with your mentors and fellow learners on Slack! 54 | Once you have enrolled in the course, you are invited to join a Slack workspace for this specialization: 55 | Please join the Slack workspace by going to the following link: [deeplearningai-nlp.slack.com](https://deeplearningai-nlp.slack.com) 56 | This Slack workspace includes all courses of this specialization.
57 | 58 | ## Programming Assignments 59 | 60 | ### Course 1: Neural Networks and Deep Learning 61 | 62 | 63 | * Week 1 64 | + No Lab 65 | + Week 1 Quiz [Introduction to Deep Learning](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%201-Neural%20Networks%20%26%20Deep%20Learning/Week%201/Week%201%20Introduction%20to%20Deep%20Learning.pdf) 66 | * Week 2 Labs & Quiz: 67 | + [Python Basics with Numpy (optional assignment)](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%201-Neural%20Networks%20%26%20Deep%20Learning/Week%202/Python_Basics_with_Numpy.ipynb) 68 | + [Logistic Regression with a Neural Network mindset](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%201-Neural%20Networks%20%26%20Deep%20Learning/Week%202/Logistic_Regression_with_a_Neural_Network_mindset.ipynb) 69 | + Week 2 Quiz [Neural Network Basics](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%201-Neural%20Networks%20%26%20Deep%20Learning/Week%202/Week%202%20Neural%20Network%20Basics.pdf) 70 | 71 | * Week 3 Lab & Quiz: 72 | + [Planar data classification with one hidden layer](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%201-Neural%20Networks%20%26%20Deep%20Learning/Week%203/Planar_data_classification_with_one_hidden_layer.ipynb) 73 | + Week 3 Quiz [Shallow Neural Networks](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%201-Neural%20Networks%20%26%20Deep%20Learning/Week%203/Week%203%20Shallow%20Neural%20Networks.pdf) 74 | 75 | * Week 4 Labs & Quiz: 76 | + [Building your Deep Neural Network: Step by Step](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%201-Neural%20Networks%20%26%20Deep%20Learning/Week%204/Building_your_Deep_Neural_Network_Step_by_Step.ipynb) 77 | + [Deep Neural Network for Image Classification: Application](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%201-Neural%20Networks%20%26%20Deep%20Learning/Week%204/Deep%20Neural%20Network%20-%20Application.ipynb) 78 | + Week 4 Quiz [Key Concepts on Deep Neural Networks](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%201-Neural%20Networks%20%26%20Deep%20Learning/Week%204/Week%204%20Key%20Concepts%20on%20Deep%20Neural%20Networks.pdf) 79 | 80 | 81 | 82 | ### Course 2: Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization 83 | 84 | * Week 1 Labs & Quiz: 85 | + [Initialization](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%202-Improving%20Deep%20Neural%20Networks/Week%201/Initialization.ipynb) 86 | + [Regularization](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%202-Improving%20Deep%20Neural%20Networks/Week%201/Regularization.ipynb) 87 | + [Gradient Checking](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%202-Improving%20Deep%20Neural%20Networks/Week%201/Gradient_Checking.ipynb) 88 | + Week 1 Quiz [Practical aspects of Deep 
Learning](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%202-Improving%20Deep%20Neural%20Networks/Week%201/Week%201%20Practical%20aspects%20of%20Deep%20Learning.pdf) 89 | * Week 2 Labs & Quiz: 90 | + [Optimization Methods](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%202-Improving%20Deep%20Neural%20Networks/Week%202/Optimization_methods.ipynb) 91 | + Week 2 Quiz [Optimization Algorithms](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%202-Improving%20Deep%20Neural%20Networks/Week%202/Week%202%20Optimization%20Algorithms.pdf) 92 | * Week 3 Labs & Quiz: 93 | + [Introduction to TensorFlow](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%202-Improving%20Deep%20Neural%20Networks/Week%203/Tensorflow_introduction.ipynb) 94 | + Week 3 Quiz [Hyperparameter tuning, Batch Normalization, Programming Frameworks](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%202-Improving%20Deep%20Neural%20Networks/Week%203/Week%203%20Hyperparameter%20tuning%2C%20Batch%20Normalization%2C%20Programming%20Frameworks.pdf) 95 | 96 | 97 | ### Course 3: Structuring Machine Learning Projects 98 | 99 | * Week 1 Labs & Quiz: 100 | + No Lab 101 | + Week 1 Quiz [Bird Recognition in the City of Peacetopia (Case Study)](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%203-Structuring%20MachineLearningProjects/Week%201%20Bird%20Recognition%20in%20the%20City%20of%20Peacetopia%20Case%20Study.pdf) 102 | * Week 2 Labs & Quiz: 103 | + No Lab 104 | + Week 2 Quiz [Autonomous Driving (Case Study)](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%203-Structuring%20MachineLearningProjects/Week%202%20Autonomous%20Driving%20Case%20Study.pdf) 105 | 106 | 107 | ### Course 4: Convolutional Neural Networks 108 | 109 | * Week 1 Labs & Quiz: 110 | + [Convolutional Neural Networks: Step by Step](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%204-ConvolutionalNeuralNetworks/Week%201/Convolution_model_Step_by_Step_v1.ipynb) 111 | + [Convolutional Neural Networks: Application](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%204-ConvolutionalNeuralNetworks/Week%201/Convolution_model_Application.ipynb) 112 | + Week 1 Quiz [The Basics of ConvNets](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%204-ConvolutionalNeuralNetworks/Week%201/Week%201%20The%20Basics%20of%20ConvNets.pdf) 113 | * Week 2 Labs & Quiz: 114 | + [Residual Networks](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%204-ConvolutionalNeuralNetworks/Week%202/Residual_Networks.ipynb) 115 | + [Transfer Learning with MobileNetV2](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%204-ConvolutionalNeuralNetworks/Week%202/Transfer_learning_with_MobileNet_v1.ipynb) 116 | + Week 2 Quiz [Deep Convolutional Models](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%204-ConvolutionalNeuralNetworks/Week%202/Week%202%20Deep%20Convolutional%20Models.pdf) 117 | * Week 3 Labs & Quiz: 118 | + [Autonomous Driving - Car 
Detection](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%204-ConvolutionalNeuralNetworks/Week%203/Autonomous_driving_application_Car_detection.ipynb) 119 | + [Image Segmentation with U-Net](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%204-ConvolutionalNeuralNetworks/Week%203/Image_segmentation_Unet_v2.ipynb) 120 | + Week 3 Quiz [Detection Algorithms](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%204-ConvolutionalNeuralNetworks/Week%203/Week%203%20Detection%20Algorithms.pdf) 121 | * Week 4 Labs & Quiz: 122 | + [Face Recognition](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%204-ConvolutionalNeuralNetworks/Week%204/Face_Recognition.ipynb) 123 | + [Deep Learning & Art: Neural Style Transfer](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%204-ConvolutionalNeuralNetworks/Week%204/Art_Generation_with_Neural_Style_Transfer.ipynb) 124 | + Week 4 Quiz [Special Applications Face Recognition and Neural Style Transfer](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%204-ConvolutionalNeuralNetworks/Week%204/Week%204%20Special%20Applications%20Face%20Recognition%20and%20Neural%20Style%20Transfer.pdf) 125 | 126 | 127 | ### Course 5: Sequence Models 128 | * Week 1 Labs & Quiz: 129 | + [Building your Recurrent Neural Network - Step by Step](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%205-SequenceModels/Week%201/Building_a_Recurrent_Neural_Network_Step_by_Step.ipynb) 130 | + [Character level language model - Dinosaurus Island](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%205-SequenceModels/Week%201/Dinosaurus_Island_Character_level_language_model.ipynb) 131 | + [Improvise a Jazz Solo with an LSTM Network](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%205-SequenceModels/Week%201/Improvise_a_Jazz_Solo_with_an_LSTM_Network_v4.ipynb) 132 | + Week 1 Quiz [Recurrent Neural Networks](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%205-SequenceModels/Week%201/Week%201%20Recurrent%20Neural%20Networks.pdf) 133 | * Week 2 Labs & Quiz: 134 | + [Operations on Word Vectors](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%205-SequenceModels/Week%202/Operations_on_word_vectors_v2a.ipynb) 135 | + [Emojify!](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%205-SequenceModels/Week%202/Emoji_v3a.ipynb) 136 | + Week 2 Quiz [Natural Language Processing and Word Embeddings](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%205-SequenceModels/Week%202/Week%202%20Natural%20Language%20Processing%20and%20Word%20Embeddings.pdf) 137 | * Week 3 Labs & Quiz: 138 | + [Neural Machine Translation](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%205-SequenceModels/Week%203/Neural_machine_translation_with_attention_v4a.ipynb) 139 | + [Trigger Word 
Detection](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%205-SequenceModels/Week%203/Trigger_word_detection_v2a.ipynb) 140 | + Week 3 Quiz [Sequence Models and Attention Mechanism](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%205-SequenceModels/Week%203/Week%203%20Sequence%20models%20and%20Attention%20Mechanism.pdf) 141 | * Week 4 Labs & Quiz: 142 | + [Transformer Network](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%205-SequenceModels/Week%204/C5_W4_A1_Transformer_Subclass_v1.ipynb) 143 | + Week 4 Quiz [Trasformers](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%205-SequenceModels/Week%204/Week%204%20Transformers.pdf) 144 | 145 | 146 | ## Certificate 147 | 148 | 1. [Neural Networks and Deep Learning](https://www.coursera.org/account/accomplishments/verify/9LV2D8ND4UNV) 149 | 2. [Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization](https://www.coursera.org/account/accomplishments/verify/7ETLGGB85S9N) 150 | 3. [Structuring Machine Learning Projects](https://www.coursera.org/account/accomplishments/verify/4S4BS8BEEPR4) 151 | 4. [Convolutional Neural Networks](https://www.coursera.org/account/accomplishments/verify/T862BAR8XNV3) 152 | 5. [Sequence Models](https://www.coursera.org/account/accomplishments/verify/KFGNFGUWZGGB) 153 | 6. [Deep Learning Specialization(Final Certificate)]() 154 | 155 | -------------------------------------------------------------------------------------------------------------- 156 | ## References 157 | 1. [Neural Networks and Deep Learning](https://www.coursera.org/learn/neural-networks-deep-learning?specialization=deep-learning) 158 | 2. [Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization](https://www.coursera.org/learn/deep-neural-network?specialization=deep-learning) 159 | 3. [Structuring Machine Learning Projects](https://www.coursera.org/learn/machine-learning-projects?specialization=deep-learning) 160 | 4. [Convolutional Neural Networks](https://www.coursera.org/learn/convolutional-neural-networks?specialization=deep-learning) 161 | 5. [Sequence Models](https://www.coursera.org/learn/nlp-sequence-models?specialization=deep-learning) 162 | 163 | ---------------------------------------------------------------------------------------------------------------- 164 | 165 | ## 📝 License 166 | The gem is available as open source under the terms of the [MIT license](https://opensource.org/licenses/MIT). 167 | 168 | ---------------------------------------------------------------------------------------------------------------- 169 | 170 | ## Disclaimer 171 | I recognize the hard time people spend on building intuition, understanding new concepts and debugging assignments. The solutions uploaded here are **only for reference**. They are meant to unblock you if you get stuck somewhere. Please do not copy any part of the code as-is (the programming assignments are fairly easy if you read the instructions carefully). Similarly, try out the quizzes yourself before you refer to the quiz solutions. 172 | 173 | --------------------------------------------------------------------------------