├── .github └── ISSUE_TEMPLATE │ ├── asking-questions.md │ ├── bug_report.md │ ├── clarification.md │ └── feature_request.md ├── Assets └── banner.png ├── Course 1-Neural Networks & Deep Learning ├── Week 1 │ └── Week 1 Introduction to Deep Learning.pdf ├── Week 2 │ ├── Logistic_Regression_with_a_Neural_Network_mindset.ipynb │ ├── Logistic_Regression_with_a_Neural_Network_mindset.py │ ├── Python_Basics_with_Numpy.ipynb │ ├── Python_Basics_with_Numpy.py │ └── Week 2 Neural Network Basics.pdf ├── Week 3 │ ├── Planar_data_classification_with_one_hidden_layer.ipynb │ ├── Planar_data_classification_with_one_hidden_layer.py │ └── Week 3 Shallow Neural Networks.pdf └── Week 4 │ ├── Building_your_Deep_Neural_Network_Step_by_Step.ipynb │ ├── Building_your_Deep_Neural_Network_Step_by_Step.py │ ├── Deep Neural Network - Application.ipynb │ ├── Deep Neural Network - Application.py │ └── Week 4 Key Concepts on Deep Neural Networks.pdf ├── Course 2-Improving Deep Neural Networks ├── Week 1 │ ├── Gradient_Checking.ipynb │ ├── Gradient_Checking.py │ ├── Initialization.ipynb │ ├── Initialization.py │ ├── Regularization.ipynb │ ├── Regularization.py │ └── Week 1 Practical aspects of Deep Learning.pdf ├── Week 2 │ ├── Optimization_methods.ipynb │ ├── Optimization_methods.py │ └── Week 2 Optimization Algorithms.pdf └── Week 3 │ ├── Tensorflow_introduction.ipynb │ ├── Tensorflow_introduction.py │ └── Week 3 Hyperparameter tuning, Batch Normalization, Programming Frameworks.pdf ├── Course 3-Structuring MachineLearningProjects ├── Week 1 Bird Recognition in the City of Peacetopia Case Study.pdf └── Week 2 Autonomous Driving Case Study.pdf ├── Course 4-ConvolutionalNeuralNetworks ├── Week 1 │ ├── Convolution_model_Application.ipynb │ ├── Convolution_model_Application.py │ ├── Convolution_model_Step_by_Step_v1.ipynb │ ├── Convolution_model_Step_by_Step_v1.py │ └── Week 1 The Basics of ConvNets.pdf ├── Week 2 │ ├── Residual_Networks.ipynb │ ├── Residual_Networks.py │ ├── Transfer_learning_with_MobileNet_v1.ipynb │ ├── Transfer_learning_with_MobileNet_v1.py │ └── Week 2 Deep Convolutional Models.pdf ├── Week 3 │ ├── Autonomous_driving_application_Car_detection.ipynb │ ├── Autonomous_driving_application_Car_detection.py │ ├── Image_segmentation_Unet_v2.ipynb │ ├── Image_segmentation_Unet_v2.py │ └── Week 3 Detection Algorithms.pdf └── Week 4 │ ├── Art_Generation_with_Neural_Style_Transfer.ipynb │ ├── Art_Generation_with_Neural_Style_Transfer.py │ ├── Face_Recognition.ipynb │ ├── Face_Recognition.py │ └── Week 4 Special Applications Face Recognition and Neural Style Transfer.pdf ├── Course 5-SequenceModels ├── Week 1 │ ├── Building_a_Recurrent_Neural_Network_Step_by_Step.ipynb │ ├── Building_a_Recurrent_Neural_Network_Step_by_Step.py │ ├── Dinosaurus_Island_Character_level_language_model.ipynb │ ├── Dinosaurus_Island_Character_level_language_model.py │ ├── Improvise_a_Jazz_Solo_with_an_LSTM_Network_v4.ipynb │ ├── Improvise_a_Jazz_Solo_with_an_LSTM_Network_v4.py │ └── Week 1 Recurrent Neural Networks.pdf ├── Week 2 │ ├── Emoji_v3a.ipynb │ ├── Emoji_v3a.py │ ├── Operations_on_word_vectors_v2a.ipynb │ ├── Operations_on_word_vectors_v2a.py │ └── Week 2 Natural Language Processing and Word Embeddings.pdf ├── Week 3 │ ├── Neural_machine_translation_with_attention_v4a.ipynb │ ├── Neural_machine_translation_with_attention_v4a.py │ ├── Trigger_word_detection_v2a.ipynb │ ├── Trigger_word_detection_v2a.py │ └── Week 3 Sequence models and Attention Mechanism.pdf └── Week 4 │ ├── C5_W4_A1_Transformer_Subclass_v1.ipynb │ ├── 
C5_W4_A1_Transformer_Subclass_v1.py │ └── Week 4 Transformers.pdf ├── LICENSE └── README.md /.github/ISSUE_TEMPLATE/asking-questions.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Asking Questions 3 | about: For Asking Question about the repo. 4 | title: '' 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | 11 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/bug_report.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Bug report 3 | about: Create a report to help us improve 4 | title: '' 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | **Describe the bug** 11 | A clear and concise description of what the bug is. 12 | 13 | **To Reproduce** 14 | Steps to reproduce the behavior: 15 | 1. Go to '...' 16 | 2. Click on '....' 17 | 3. Scroll down to '....' 18 | 4. See error 19 | 20 | **Expected behavior** 21 | A clear and concise description of what you expected to happen. 22 | 23 | **Screenshots** 24 | If applicable, add screenshots to help explain your problem. 25 | 26 | **Desktop (please complete the following information):** 27 | - OS: [e.g. iOS] 28 | - Browser [e.g. chrome, safari] 29 | - Version [e.g. 22] 30 | 31 | **Smartphone (please complete the following information):** 32 | - Device: [e.g. iPhone6] 33 | - OS: [e.g. iOS8.1] 34 | - Browser [e.g. stock browser, safari] 35 | - Version [e.g. 22] 36 | 37 | **Additional context** 38 | Add any other context about the problem here. 39 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/clarification.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Clarification 3 | about: Question about files and code versions. 4 | title: '' 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | 11 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/feature_request.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Feature request 3 | about: Suggest an idea for this project 4 | title: '' 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | **Is your feature request related to a problem? Please describe.** 11 | A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] 12 | 13 | **Describe the solution you'd like** 14 | A clear and concise description of what you want to happen. 15 | 16 | **Describe alternatives you've considered** 17 | A clear and concise description of any alternative solutions or features you've considered. 18 | 19 | **Additional context** 20 | Add any other context or screenshots about the feature request here. 
21 | -------------------------------------------------------------------------------- /Assets/banner.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Assets/banner.png -------------------------------------------------------------------------------- /Course 1-Neural Networks & Deep Learning/Week 1/Week 1 Introduction to Deep Learning.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 1-Neural Networks & Deep Learning/Week 1/Week 1 Introduction to Deep Learning.pdf -------------------------------------------------------------------------------- /Course 1-Neural Networks & Deep Learning/Week 2/Python_Basics_with_Numpy.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding: utf-8 3 | 4 | # # Python Basics with Numpy (optional assignment) 5 | # 6 | # Welcome to your first assignment. This exercise gives you a brief introduction to Python. Even if you've used Python before, this will help familiarize you with the functions we'll need. 7 | # 8 | # **Instructions:** 9 | # - You will be using Python 3. 10 | # - Avoid using for-loops and while-loops, unless you are explicitly told to do so. 11 | # - After coding your function, run the cell right below it to check if your result is correct. 12 | # 13 | # **After this assignment you will:** 14 | # - Be able to use iPython Notebooks 15 | # - Be able to use numpy functions and numpy matrix/vector operations 16 | # - Understand the concept of "broadcasting" 17 | # - Be able to vectorize code 18 | # 19 | # Let's get started! 20 | # 21 | # ## Important Note on Submission to the AutoGrader 22 | # 23 | # Before submitting your assignment to the AutoGrader, please make sure you are not doing the following: 24 | # 25 | # 1. You have not added any _extra_ `print` statement(s) in the assignment. 26 | # 2. You have not added any _extra_ code cell(s) in the assignment. 27 | # 3. You have not changed any of the function parameters. 28 | # 4. You are not using any global variables inside your graded exercises. Unless specifically instructed to do so, please refrain from it and use the local variables instead. 29 | # 5. You are not changing the assignment code where it is not required, like creating _extra_ variables. 30 | # 31 | # If you do any of the following, you will get something like, `Grader not found` (or similarly unexpected) error upon submitting your assignment. Before asking for help/debugging the errors in your assignment, check for these first. If this is the case, and you don't remember the changes you have made, you can get a fresh copy of the assignment by following these [instructions](https://www.coursera.org/learn/neural-networks-deep-learning/supplement/iLwon/h-ow-to-refresh-your-workspace). 
32 | 33 | # ## Table of Contents 34 | # - [About iPython Notebooks](#0) 35 | # - [Exercise 1](#ex-1) 36 | # - [1 - Building basic functions with numpy](#1) 37 | # - [1.1 - sigmoid function, np.exp()](#1-1) 38 | # - [Exercise 2 - basic_sigmoid](#ex-2) 39 | # - [Exercise 3 - sigmoid](#ex-3) 40 | # - [1.2 - Sigmoid Gradient](#1-2) 41 | # - [Exercise 4 - sigmoid_derivative](#ex-4) 42 | # - [1.3 - Reshaping arrays](#1-3) 43 | # - [Exercise 5 - image2vector](#ex-5) 44 | # - [1.4 - Normalizing rows](#1-4) 45 | # - [Exercise 6 - normalize_rows](#ex-6) 46 | # - [Exercise 7 - softmax](#ex-7) 47 | # - [2 - Vectorization](#2) 48 | # - [2.1 Implement the L1 and L2 loss functions](#2-1) 49 | # - [Exercise 8 - L1](#ex-8) 50 | # - [Exercise 9 - L2](#ex-9) 51 | 52 | # 53 | # ## About iPython Notebooks ## 54 | # 55 | # iPython Notebooks are interactive coding environments embedded in a webpage. You will be using iPython notebooks in this class. You only need to write code between the # your code here comment. After writing your code, you can run the cell by either pressing "SHIFT"+"ENTER" or by clicking on "Run Cell" (denoted by a play symbol) in the upper bar of the notebook. 56 | # 57 | # We will often specify "(≈ X lines of code)" in the comments to tell you about how much code you need to write. It is just a rough estimate, so don't feel bad if your code is longer or shorter. 58 | # 59 | # 60 | # ### Exercise 1 61 | # Set test to `"Hello World"` in the cell below to print "Hello World" and run the two cells below. 62 | 63 | # In[1]: 64 | 65 | 66 | # (≈ 1 line of code) 67 | # test = 68 | # YOUR CODE STARTS HERE 69 | test = "Hello World" 70 | 71 | # YOUR CODE ENDS HERE 72 | 73 | 74 | # In[2]: 75 | 76 | 77 | print ("test: " + test) 78 | 79 | 80 | # **Expected output**: 81 | # test: Hello World 82 | 83 | # 84 | # What you need to remember : 85 | # 86 | # - Run your cells using SHIFT+ENTER (or "Run cell") 87 | # - Write code in the designated areas using Python 3 only 88 | # - Do not modify the code outside of the designated areas 89 | 90 | # 91 | # ## 1 - Building basic functions with numpy ## 92 | # 93 | # Numpy is the main package for scientific computing in Python. It is maintained by a large community (www.numpy.org). In this exercise you will learn several key numpy functions such as `np.exp`, `np.log`, and `np.reshape`. You will need to know how to use these functions for future assignments. 94 | # 95 | # 96 | # ### 1.1 - sigmoid function, np.exp() ### 97 | # 98 | # Before using `np.exp()`, you will use `math.exp()` to implement the sigmoid function. You will then see why `np.exp()` is preferable to `math.exp()`. 99 | # 100 | # 101 | # ### Exercise 2 - basic_sigmoid 102 | # Build a function that returns the sigmoid of a real number x. Use `math.exp(x)` for the exponential function. 103 | # 104 | # **Reminder**: 105 | # $sigmoid(x) = \frac{1}{1+e^{-x}}$ is sometimes also known as the logistic function. It is a non-linear function used not only in Machine Learning (Logistic Regression), but also in Deep Learning. 106 | # 107 | # 108 | # 109 | # To refer to a function belonging to a specific package you could call it using `package_name.function()`. Run the code below to see an example with `math.exp()`. 110 | 111 | # In[6]: 112 | 113 | 114 | import math 115 | from public_tests import * 116 | 117 | # GRADED FUNCTION: basic_sigmoid 118 | 119 | def basic_sigmoid(x): 120 | """ 121 | Compute sigmoid of x. 
122 | 123 | Arguments: 124 | x -- A scalar 125 | 126 | Return: 127 | s -- sigmoid(x) 128 | """ 129 | # (≈ 1 line of code) 130 | # s = 131 | # YOUR CODE STARTS HERE 132 | s = 1/(1+math.exp(-x)) 133 | 134 | # YOUR CODE ENDS HERE 135 | 136 | return s 137 | 138 | 139 | # In[7]: 140 | 141 | 142 | print("basic_sigmoid(1) = " + str(basic_sigmoid(1))) 143 | 144 | basic_sigmoid_test(basic_sigmoid) 145 | 146 | 147 | # Actually, we rarely use the "math" library in deep learning because the inputs of the functions are real numbers. In deep learning we mostly use matrices and vectors. This is why numpy is more useful. 148 | 149 | # In[8]: 150 | 151 | 152 | ### One reason why we use "numpy" instead of "math" in Deep Learning ### 153 | 154 | x = [1, 2, 3] # x becomes a python list object 155 | basic_sigmoid(x) # you will see this give an error when you run it, because x is a vector. 156 | 157 | 158 | # In fact, if $ x = (x_1, x_2, ..., x_n)$ is a row vector then `np.exp(x)` will apply the exponential function to every element of x. The output will thus be: `np.exp(x) = (e^{x_1}, e^{x_2}, ..., e^{x_n})` 159 | 160 | # In[9]: 161 | 162 | 163 | import numpy as np 164 | 165 | # example of np.exp 166 | t_x = np.array([1, 2, 3]) 167 | print(np.exp(t_x)) # result is (exp(1), exp(2), exp(3)) 168 | 169 | 170 | # Furthermore, if x is a vector, then a Python operation such as $s = x + 3$ or $s = \frac{1}{x}$ will output s as a vector of the same size as x. 171 | 172 | # In[10]: 173 | 174 | 175 | # example of vector operation 176 | t_x = np.array([1, 2, 3]) 177 | print (t_x + 3) 178 | 179 | 180 | # Any time you need more info on a numpy function, we encourage you to look at [the official documentation](https://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.exp.html). 181 | # 182 | # You can also create a new cell in the notebook and write `np.exp?` (for example) to get quick access to the documentation. 183 | # 184 | # 185 | # ### Exercise 3 - sigmoid 186 | # Implement the sigmoid function using numpy. 187 | # 188 | # **Instructions**: x could now be either a real number, a vector, or a matrix. The data structures we use in numpy to represent these shapes (vectors, matrices...) are called numpy arrays. You don't need to know more for now. 189 | # $$ \text{For } x \in \mathbb{R}^n \text{, } sigmoid(x) = sigmoid\begin{pmatrix} 190 | # x_1 \\ 191 | # x_2 \\ 192 | # ... \\ 193 | # x_n \\ 194 | # \end{pmatrix} = \begin{pmatrix} 195 | # \frac{1}{1+e^{-x_1}} \\ 196 | # \frac{1}{1+e^{-x_2}} \\ 197 | # ... \\ 198 | # \frac{1}{1+e^{-x_n}} \\ 199 | # \end{pmatrix}\tag{1} $$ 200 | 201 | # In[11]: 202 | 203 | 204 | # GRADED FUNCTION: sigmoid 205 | import numpy as np 206 | 207 | def sigmoid(x): 208 | """ 209 | Compute the sigmoid of x 210 | 211 | Arguments: 212 | x -- A scalar or numpy array of any size 213 | 214 | Return: 215 | s -- sigmoid(x) 216 | """ 217 | 218 | # (≈ 1 line of code) 219 | # s = 220 | # YOUR CODE STARTS HERE 221 | s=1/(1+np.exp(-x)) 222 | # YOUR CODE ENDS HERE 223 | 224 | return s 225 | 226 | 227 | # In[12]: 228 | 229 | 230 | t_x = np.array([1, 2, 3]) 231 | print("sigmoid(t_x) = " + str(sigmoid(t_x))) 232 | 233 | sigmoid_test(sigmoid) 234 | 235 | 236 | # 237 | # ### 1.2 - Sigmoid Gradient 238 | # 239 | # As you've seen in lecture, you will need to compute gradients to optimize loss functions using backpropagation. Let's code your first gradient function. 
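# As a quick optional sanity check (a minimal standalone sketch, separate from the graded cells; `_sigmoid` is just a local stand-in for the sigmoid implemented above), the closed-form derivative sigmoid(x) * (1 - sigmoid(x)) that the next exercise asks for can be compared against a two-sided finite-difference estimate:

import numpy as np

def _sigmoid(z):
    return 1 / (1 + np.exp(-z))                                  # same formula as the sigmoid above

z = np.array([-2.0, 0.0, 3.0])
eps = 1e-7
numeric = (_sigmoid(z + eps) - _sigmoid(z - eps)) / (2 * eps)    # finite-difference estimate of the slope
analytic = _sigmoid(z) * (1 - _sigmoid(z))                       # closed-form derivative
print(np.allclose(numeric, analytic, atol=1e-6))                 # expected: True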
240 | # 241 | # 242 | # ### Exercise 4 - sigmoid_derivative 243 | # Implement the function sigmoid_grad() to compute the gradient of the sigmoid function with respect to its input x. The formula is: 244 | # 245 | # $$sigmoid\_derivative(x) = \sigma'(x) = \sigma(x) (1 - \sigma(x))\tag{2}$$ 246 | # 247 | # You often code this function in two steps: 248 | # 1. Set s to be the sigmoid of x. You might find your sigmoid(x) function useful. 249 | # 2. Compute $\sigma'(x) = s(1-s)$ 250 | 251 | # In[13]: 252 | 253 | 254 | # GRADED FUNCTION: sigmoid_derivative 255 | 256 | def sigmoid_derivative(x): 257 | """ 258 | Compute the gradient (also called the slope or derivative) of the sigmoid function with respect to its input x. 259 | You can store the output of the sigmoid function into variables and then use it to calculate the gradient. 260 | 261 | Arguments: 262 | x -- A scalar or numpy array 263 | 264 | Return: 265 | ds -- Your computed gradient. 266 | """ 267 | 268 | #(≈ 2 lines of code) 269 | # s = 270 | # ds = 271 | # YOUR CODE STARTS HERE 272 | s = 1/(1+np.exp(-x)) 273 | ds = s*(1-s) 274 | # YOUR CODE ENDS HERE 275 | 276 | return ds 277 | 278 | 279 | # In[14]: 280 | 281 | 282 | t_x = np.array([1, 2, 3]) 283 | print ("sigmoid_derivative(t_x) = " + str(sigmoid_derivative(t_x))) 284 | 285 | sigmoid_derivative_test(sigmoid_derivative) 286 | 287 | 288 | # 289 | # ### 1.3 - Reshaping arrays ### 290 | # 291 | # Two common numpy functions used in deep learning are [np.shape](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.shape.html) and [np.reshape()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html). 292 | # - X.shape is used to get the shape (dimension) of a matrix/vector X. 293 | # - X.reshape(...) is used to reshape X into some other dimension. 294 | # 295 | # For example, in computer science, an image is represented by a 3D array of shape $(length, height, depth = 3)$. However, when you read an image as the input of an algorithm you convert it to a vector of shape $(length*height*3, 1)$. In other words, you "unroll", or reshape, the 3D array into a 1D vector. 296 | # 297 | # 298 | # 299 | # 300 | # ### Exercise 5 - image2vector 301 | # Implement `image2vector()` that takes an input of shape (length, height, 3) and returns a vector of shape (length\*height\*3, 1). For example, if you would like to reshape an array v of shape (a, b, c) into a vector of shape (a*b,c) you would do: 302 | # ``` python 303 | # v = v.reshape((v.shape[0] * v.shape[1], v.shape[2])) # v.shape[0] = a ; v.shape[1] = b ; v.shape[2] = c 304 | # ``` 305 | # - Please don't hardcode the dimensions of image as a constant. Instead look up the quantities you need with `image.shape[0]`, etc. 306 | # - You can use v = v.reshape(-1, 1). Just make sure you understand why it works. 
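# To see why the `-1` reshape works, here is a small standalone sketch on a toy array (nothing beyond numpy is assumed): the explicit product of the dimensions and the inferred `-1` give the same (a*b*c, 1) column vector.

import numpy as np

toy = np.arange(24).reshape(2, 3, 4)                                   # stand-in for a (length, height, depth) image
explicit = toy.reshape(toy.shape[0] * toy.shape[1] * toy.shape[2], 1)  # spell out 2 * 3 * 4 = 24
inferred = toy.reshape(-1, 1)                                          # let numpy infer the 24 from the remaining size
print(explicit.shape, inferred.shape)                                  # (24, 1) (24, 1)
print(np.array_equal(explicit, inferred))                              # True: identical row-major unrolling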
307 | 308 | # In[15]: 309 | 310 | 311 | # GRADED FUNCTION:image2vector 312 | 313 | def image2vector(image): 314 | """ 315 | Argument: 316 | image -- a numpy array of shape (length, height, depth) 317 | 318 | Returns: 319 | v -- a vector of shape (length*height*depth, 1) 320 | """ 321 | 322 | # (≈ 1 line of code) 323 | # v = 324 | # YOUR CODE STARTS HERE 325 | v = image.reshape((image.shape[2] * image.shape[1] * image.shape[0], 1)) 326 | # YOUR CODE ENDS HERE 327 | 328 | return v 329 | 330 | 331 | # In[16]: 332 | 333 | 334 | # This is a 3 by 3 by 2 array, typically images will be (num_px_x, num_px_y,3) where 3 represents the RGB values 335 | t_image = np.array([[[ 0.67826139, 0.29380381], 336 | [ 0.90714982, 0.52835647], 337 | [ 0.4215251 , 0.45017551]], 338 | 339 | [[ 0.92814219, 0.96677647], 340 | [ 0.85304703, 0.52351845], 341 | [ 0.19981397, 0.27417313]], 342 | 343 | [[ 0.60659855, 0.00533165], 344 | [ 0.10820313, 0.49978937], 345 | [ 0.34144279, 0.94630077]]]) 346 | 347 | print ("image2vector(image) = " + str(image2vector(t_image))) 348 | 349 | image2vector_test(image2vector) 350 | 351 | 352 | # 353 | # ### 1.4 - Normalizing rows 354 | # 355 | # Another common technique we use in Machine Learning and Deep Learning is to normalize our data. It often leads to a better performance because gradient descent converges faster after normalization. Here, by normalization we mean changing x to $ \frac{x}{\| x\|} $ (dividing each row vector of x by its norm). 356 | # 357 | # For example, if 358 | # $$x = \begin{bmatrix} 359 | # 0 & 3 & 4 \\ 360 | # 2 & 6 & 4 \\ 361 | # \end{bmatrix}\tag{3}$$ 362 | # then 363 | # $$\| x\| = \text{np.linalg.norm(x, axis=1, keepdims=True)} = \begin{bmatrix} 364 | # 5 \\ 365 | # \sqrt{56} \\ 366 | # \end{bmatrix}\tag{4} $$ 367 | # and 368 | # $$ x\_normalized = \frac{x}{\| x\|} = \begin{bmatrix} 369 | # 0 & \frac{3}{5} & \frac{4}{5} \\ 370 | # \frac{2}{\sqrt{56}} & \frac{6}{\sqrt{56}} & \frac{4}{\sqrt{56}} \\ 371 | # \end{bmatrix}\tag{5}$$ 372 | # 373 | # Note that you can divide matrices of different sizes and it works fine: this is called broadcasting and you're going to learn about it in part 5. 374 | # 375 | # With `keepdims=True` the result will broadcast correctly against the original x. 376 | # 377 | # `axis=1` means you are going to get the norm in a row-wise manner. If you need the norm in a column-wise way, you would need to set `axis=0`. 378 | # 379 | # numpy.linalg.norm has another parameter `ord` where we specify the type of normalization to be done (in the exercise below you'll do 2-norm). To get familiar with the types of normalization you can visit [numpy.linalg.norm](https://numpy.org/doc/stable/reference/generated/numpy.linalg.norm.html) 380 | # 381 | # 382 | # ### Exercise 6 - normalize_rows 383 | # Implement normalizeRows() to normalize the rows of a matrix. After applying this function to an input matrix x, each row of x should be a vector of unit length (meaning length 1). 384 | # 385 | # **Note**: Don't try to use `x /= x_norm`. For the matrix division numpy must broadcast the x_norm, which is not supported by the operant `/=` 386 | 387 | # In[17]: 388 | 389 | 390 | # GRADED FUNCTION: normalize_rows 391 | 392 | def normalize_rows(x): 393 | """ 394 | Implement a function that normalizes each row of the matrix x (to have unit length). 395 | 396 | Argument: 397 | x -- A numpy matrix of shape (n, m) 398 | 399 | Returns: 400 | x -- The normalized (by row) numpy matrix. You are allowed to modify x. 
401 | """ 402 | 403 | #(≈ 2 lines of code) 404 | # Compute x_norm as the norm 2 of x. Use np.linalg.norm(..., ord = 2, axis = ..., keepdims = True) 405 | # x_norm = 406 | # Divide x by its norm. 407 | # x = 408 | # YOUR CODE STARTS HERE 409 | x_norm = np.linalg.norm(x,ord = 2,axis=1,keepdims=True) 410 | x=x/x_norm 411 | # YOUR CODE ENDS HERE 412 | 413 | return x 414 | 415 | 416 | # In[18]: 417 | 418 | 419 | x = np.array([[0, 3, 4], 420 | [1, 6, 4]]) 421 | print("normalizeRows(x) = " + str(normalize_rows(x))) 422 | 423 | normalizeRows_test(normalize_rows) 424 | 425 | 426 | # **Note**: 427 | # In normalize_rows(), you can try to print the shapes of x_norm and x, and then rerun the assessment. You'll find out that they have different shapes. This is normal given that x_norm takes the norm of each row of x. So x_norm has the same number of rows but only 1 column. So how did it work when you divided x by x_norm? This is called broadcasting and we'll talk about it now! 428 | 429 | # 430 | # ### Exercise 7 - softmax 431 | # Implement a softmax function using numpy. You can think of softmax as a normalizing function used when your algorithm needs to classify two or more classes. You will learn more about softmax in the second course of this specialization. 432 | # 433 | # **Instructions**: 434 | # - $\text{for } x \in \mathbb{R}^{1\times n} \text{, }$ 435 | # 436 | # \begin{align*} 437 | # softmax(x) &= softmax\left(\begin{bmatrix} 438 | # x_1 && 439 | # x_2 && 440 | # ... && 441 | # x_n 442 | # \end{bmatrix}\right) \\&= \begin{bmatrix} 443 | # \frac{e^{x_1}}{\sum_{j}e^{x_j}} && 444 | # \frac{e^{x_2}}{\sum_{j}e^{x_j}} && 445 | # ... && 446 | # \frac{e^{x_n}}{\sum_{j}e^{x_j}} 447 | # \end{bmatrix} 448 | # \end{align*} 449 | # 450 | # - $\text{for a matrix } x \in \mathbb{R}^{m \times n} \text{, $x_{ij}$ maps to the element in the $i^{th}$ row and $j^{th}$ column of $x$, thus we have: }$ 451 | # 452 | # \begin{align*} 453 | # softmax(x) &= softmax\begin{bmatrix} 454 | # x_{11} & x_{12} & x_{13} & \dots & x_{1n} \\ 455 | # x_{21} & x_{22} & x_{23} & \dots & x_{2n} \\ 456 | # \vdots & \vdots & \vdots & \ddots & \vdots \\ 457 | # x_{m1} & x_{m2} & x_{m3} & \dots & x_{mn} 458 | # \end{bmatrix} \\ \\&= 459 | # \begin{bmatrix} 460 | # \frac{e^{x_{11}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{12}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{13}}}{\sum_{j}e^{x_{1j}}} & \dots & \frac{e^{x_{1n}}}{\sum_{j}e^{x_{1j}}} \\ 461 | # \frac{e^{x_{21}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{22}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{23}}}{\sum_{j}e^{x_{2j}}} & \dots & \frac{e^{x_{2n}}}{\sum_{j}e^{x_{2j}}} \\ 462 | # \vdots & \vdots & \vdots & \ddots & \vdots \\ 463 | # \frac{e^{x_{m1}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m2}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m3}}}{\sum_{j}e^{x_{mj}}} & \dots & \frac{e^{x_{mn}}}{\sum_{j}e^{x_{mj}}} 464 | # \end{bmatrix} \\ \\ &= \begin{pmatrix} 465 | # softmax\text{(first row of x)} \\ 466 | # softmax\text{(second row of x)} \\ 467 | # \vdots \\ 468 | # softmax\text{(last row of x)} \\ 469 | # \end{pmatrix} 470 | # \end{align*} 471 | 472 | # **Notes:** 473 | # Note that later in the course, you'll see "m" used to represent the "number of training examples", and each training example is in its own column of the matrix. Also, each feature will be in its own row (each row has data for the same feature). 474 | # Softmax should be performed for all features of each training example, so softmax would be performed on the columns (once we switch to that representation later in this course). 
475 | # 476 | # However, in this coding practice, we're just focusing on getting familiar with Python, so we're using the common math notation $m \times n$ 477 | # where $m$ is the number of rows and $n$ is the number of columns. 478 | 479 | # In[19]: 480 | 481 | 482 | # GRADED FUNCTION: softmax 483 | 484 | def softmax(x): 485 | """Calculates the softmax for each row of the input x. 486 | 487 | Your code should work for a row vector and also for matrices of shape (m,n). 488 | 489 | Argument: 490 | x -- A numpy matrix of shape (m,n) 491 | 492 | Returns: 493 | s -- A numpy matrix equal to the softmax of x, of shape (m,n) 494 | """ 495 | 496 | #(≈ 3 lines of code) 497 | # Apply exp() element-wise to x. Use np.exp(...). 498 | # x_exp = ... 499 | 500 | # Create a vector x_sum that sums each row of x_exp. Use np.sum(..., axis = 1, keepdims = True). 501 | # x_sum = ... 502 | 503 | # Compute softmax(x) by dividing x_exp by x_sum. It should automatically use numpy broadcasting. 504 | # s = ... 505 | 506 | # YOUR CODE STARTS HERE 507 | x_exp = np.exp(x) 508 | x_sum = np.sum(x_exp,axis = 1, keepdims=True) 509 | s = x_exp/x_sum 510 | # YOUR CODE ENDS HERE 511 | 512 | return s 513 | 514 | 515 | # In[20]: 516 | 517 | 518 | t_x = np.array([[9, 2, 5, 0, 0], 519 | [7, 5, 0, 0 ,0]]) 520 | print("softmax(x) = " + str(softmax(t_x))) 521 | 522 | softmax_test(softmax) 523 | 524 | 525 | # #### Notes 526 | # - If you print the shapes of x_exp, x_sum and s above and rerun the assessment cell, you will see that x_sum is of shape (2,1) while x_exp and s are of shape (2,5). **x_exp/x_sum** works due to python broadcasting. 527 | # 528 | # Congratulations! You now have a pretty good understanding of python numpy and have implemented a few useful functions that you will be using in deep learning. 529 | 530 | # 531 | # What you need to remember: 532 | # 533 | # - np.exp(x) works for any np.array x and applies the exponential function to every coordinate 534 | # - the sigmoid function and its gradient 535 | # - image2vector is commonly used in deep learning 536 | # - np.reshape is widely used. In the future, you'll see that keeping your matrix/vector dimensions straight will go toward eliminating a lot of bugs. 537 | # - numpy has efficient built-in functions 538 | # - broadcasting is extremely useful 539 | 540 | # 541 | # ## 2 - Vectorization 542 | # 543 | # 544 | # In deep learning, you deal with very large datasets. Hence, a non-computationally-optimal function can become a huge bottleneck in your algorithm and can result in a model that takes ages to run. To make sure that your code is computationally efficient, you will use vectorization. For example, try to tell the difference between the following implementations of the dot/outer/elementwise product. 
545 | 546 | # In[21]: 547 | 548 | 549 | import time 550 | 551 | x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0] 552 | x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0] 553 | 554 | ### CLASSIC DOT PRODUCT OF VECTORS IMPLEMENTATION ### 555 | tic = time.process_time() 556 | dot = 0 557 | 558 | for i in range(len(x1)): 559 | dot += x1[i] * x2[i] 560 | toc = time.process_time() 561 | print ("dot = " + str(dot) + "\n ----- Computation time = " + str(1000 * (toc - tic)) + "ms") 562 | 563 | ### CLASSIC OUTER PRODUCT IMPLEMENTATION ### 564 | tic = time.process_time() 565 | outer = np.zeros((len(x1), len(x2))) # we create a len(x1)*len(x2) matrix with only zeros 566 | 567 | for i in range(len(x1)): 568 | for j in range(len(x2)): 569 | outer[i,j] = x1[i] * x2[j] 570 | toc = time.process_time() 571 | print ("outer = " + str(outer) + "\n ----- Computation time = " + str(1000 * (toc - tic)) + "ms") 572 | 573 | ### CLASSIC ELEMENTWISE IMPLEMENTATION ### 574 | tic = time.process_time() 575 | mul = np.zeros(len(x1)) 576 | 577 | for i in range(len(x1)): 578 | mul[i] = x1[i] * x2[i] 579 | toc = time.process_time() 580 | print ("elementwise multiplication = " + str(mul) + "\n ----- Computation time = " + str(1000 * (toc - tic)) + "ms") 581 | 582 | ### CLASSIC GENERAL DOT PRODUCT IMPLEMENTATION ### 583 | W = np.random.rand(3,len(x1)) # Random 3*len(x1) numpy array 584 | tic = time.process_time() 585 | gdot = np.zeros(W.shape[0]) 586 | 587 | for i in range(W.shape[0]): 588 | for j in range(len(x1)): 589 | gdot[i] += W[i,j] * x1[j] 590 | toc = time.process_time() 591 | print ("gdot = " + str(gdot) + "\n ----- Computation time = " + str(1000 * (toc - tic)) + "ms") 592 | 593 | 594 | # In[22]: 595 | 596 | 597 | x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0] 598 | x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0] 599 | 600 | ### VECTORIZED DOT PRODUCT OF VECTORS ### 601 | tic = time.process_time() 602 | dot = np.dot(x1,x2) 603 | toc = time.process_time() 604 | print ("dot = " + str(dot) + "\n ----- Computation time = " + str(1000 * (toc - tic)) + "ms") 605 | 606 | ### VECTORIZED OUTER PRODUCT ### 607 | tic = time.process_time() 608 | outer = np.outer(x1,x2) 609 | toc = time.process_time() 610 | print ("outer = " + str(outer) + "\n ----- Computation time = " + str(1000 * (toc - tic)) + "ms") 611 | 612 | ### VECTORIZED ELEMENTWISE MULTIPLICATION ### 613 | tic = time.process_time() 614 | mul = np.multiply(x1,x2) 615 | toc = time.process_time() 616 | print ("elementwise multiplication = " + str(mul) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms") 617 | 618 | ### VECTORIZED GENERAL DOT PRODUCT ### 619 | tic = time.process_time() 620 | dot = np.dot(W,x1) 621 | toc = time.process_time() 622 | print ("gdot = " + str(dot) + "\n ----- Computation time = " + str(1000 * (toc - tic)) + "ms") 623 | 624 | 625 | # As you may have noticed, the vectorized implementation is much cleaner and more efficient. For bigger vectors/matrices, the differences in running time become even bigger. 626 | # 627 | # **Note** that `np.dot()` performs a matrix-matrix or matrix-vector multiplication. This is different from `np.multiply()` and the `*` operator (which is equivalent to `.*` in Matlab/Octave), which performs an element-wise multiplication. 628 | 629 | # 630 | # ### 2.1 Implement the L1 and L2 loss functions 631 | # 632 | # 633 | # ### Exercise 8 - L1 634 | # Implement the numpy vectorized version of the L1 loss. You may find the function abs(x) (absolute value of x) useful. 
635 | # 636 | # **Reminder**: 637 | # - The loss is used to evaluate the performance of your model. The bigger your loss is, the more different your predictions ($ \hat{y} $) are from the true values ($y$). In deep learning, you use optimization algorithms like Gradient Descent to train your model and to minimize the cost. 638 | # - L1 loss is defined as: 639 | # $$\begin{align*} & L_1(\hat{y}, y) = \sum_{i=0}^{m-1}|y^{(i)} - \hat{y}^{(i)}| \end{align*}\tag{6}$$ 640 | 641 | # In[23]: 642 | 643 | 644 | # GRADED FUNCTION: L1 645 | 646 | def L1(yhat, y): 647 | """ 648 | Arguments: 649 | yhat -- vector of size m (predicted labels) 650 | y -- vector of size m (true labels) 651 | 652 | Returns: 653 | loss -- the value of the L1 loss function defined above 654 | """ 655 | 656 | #(≈ 1 line of code) 657 | # loss = 658 | # YOUR CODE STARTS HERE 659 | loss = sum(abs(y-yhat)) 660 | # YOUR CODE ENDS HERE 661 | 662 | return loss 663 | 664 | 665 | # In[24]: 666 | 667 | 668 | yhat = np.array([.9, 0.2, 0.1, .4, .9]) 669 | y = np.array([1, 0, 0, 1, 1]) 670 | print("L1 = " + str(L1(yhat, y))) 671 | 672 | L1_test(L1) 673 | 674 | 675 | # 676 | # ### Exercise 9 - L2 677 | # Implement the numpy vectorized version of the L2 loss. There are several way of implementing the L2 loss but you may find the function np.dot() useful. As a reminder, if $x = [x_1, x_2, ..., x_n]$, then `np.dot(x,x)` = $\sum_{j=0}^n x_j^{2}$. 678 | # 679 | # - L2 loss is defined as $$\begin{align*} & L_2(\hat{y},y) = \sum_{i=0}^{m-1}(y^{(i)} - \hat{y}^{(i)})^2 \end{align*}\tag{7}$$ 680 | 681 | # In[27]: 682 | 683 | 684 | # GRADED FUNCTION: L2 685 | 686 | def L2(yhat, y): 687 | """ 688 | Arguments: 689 | yhat -- vector of size m (predicted labels) 690 | y -- vector of size m (true labels) 691 | 692 | Returns: 693 | loss -- the value of the L2 loss function defined above 694 | """ 695 | 696 | #(≈ 1 line of code) 697 | # loss = ... 698 | # YOUR CODE STARTS HERE 699 | loss = sum((y-yhat)**2) 700 | # YOUR CODE ENDS HERE 701 | 702 | return loss 703 | 704 | 705 | # In[28]: 706 | 707 | 708 | yhat = np.array([.9, 0.2, 0.1, .4, .9]) 709 | y = np.array([1, 0, 0, 1, 1]) 710 | 711 | print("L2 = " + str(L2(yhat, y))) 712 | 713 | L2_test(L2) 714 | 715 | 716 | # Congratulations on completing this assignment. We hope that this little warm-up exercise helps you in the future assignments, which will be more exciting and interesting! 717 | 718 | # 719 | # What to remember: 720 | # 721 | # - Vectorization is very important in deep learning. It provides computational efficiency and clarity. 722 | # - You have reviewed the L1 and L2 loss. 723 | # - You are familiar with many numpy functions such as np.sum, np.dot, np.multiply, np.maximum, etc... 
724 | -------------------------------------------------------------------------------- /Course 1-Neural Networks & Deep Learning/Week 2/Week 2 Neural Network Basics.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 1-Neural Networks & Deep Learning/Week 2/Week 2 Neural Network Basics.pdf -------------------------------------------------------------------------------- /Course 1-Neural Networks & Deep Learning/Week 3/Week 3 Shallow Neural Networks.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 1-Neural Networks & Deep Learning/Week 3/Week 3 Shallow Neural Networks.pdf -------------------------------------------------------------------------------- /Course 1-Neural Networks & Deep Learning/Week 4/Deep Neural Network - Application.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding: utf-8 3 | 4 | # # Deep Neural Network for Image Classification: Application 5 | # 6 | # By the time you complete this notebook, you will have finished the last programming assignment of Week 4, and also the last programming assignment of Course 1! Go you! 7 | # 8 | # To build your cat/not-a-cat classifier, you'll use the functions from the previous assignment to build a deep network. Hopefully, you'll see an improvement in accuracy over your previous logistic regression implementation. 9 | # 10 | # **After this assignment you will be able to:** 11 | # 12 | # - Build and train a deep L-layer neural network, and apply it to supervised learning 13 | # 14 | # Let's get started! 15 | # 16 | # ## Important Note on Submission to the AutoGrader 17 | # 18 | # Before submitting your assignment to the AutoGrader, please make sure you are not doing the following: 19 | # 20 | # 1. You have not added any _extra_ `print` statement(s) in the assignment. 21 | # 2. You have not added any _extra_ code cell(s) in the assignment. 22 | # 3. You have not changed any of the function parameters. 23 | # 4. You are not using any global variables inside your graded exercises. Unless specifically instructed to do so, please refrain from it and use the local variables instead. 24 | # 5. You are not changing the assignment code where it is not required, like creating _extra_ variables. 25 | # 26 | # If you do any of the following, you will get something like, `Grader Error: Grader feedback not found` (or similarly unexpected) error upon submitting your assignment. Before asking for help/debugging the errors in your assignment, check for these first. If this is the case, and you don't remember the changes you have made, you can get a fresh copy of the assignment by following these [instructions](https://www.coursera.org/learn/neural-networks-deep-learning/supplement/iLwon/h-ow-to-refresh-your-workspace). 
27 | 28 | # ## Table of Contents 29 | # - [1 - Packages](#1) 30 | # - [2 - Load and Process the Dataset](#2) 31 | # - [3 - Model Architecture](#3) 32 | # - [3.1 - 2-layer Neural Network](#3-1) 33 | # - [3.2 - L-layer Deep Neural Network](#3-2) 34 | # - [3.3 - General Methodology](#3-3) 35 | # - [4 - Two-layer Neural Network](#4) 36 | # - [Exercise 1 - two_layer_model](#ex-1) 37 | # - [4.1 - Train the model](#4-1) 38 | # - [5 - L-layer Neural Network](#5) 39 | # - [Exercise 2 - L_layer_model](#ex-2) 40 | # - [5.1 - Train the model](#5-1) 41 | # - [6 - Results Analysis](#6) 42 | # - [7 - Test with your own image (optional/ungraded exercise)](#7) 43 | 44 | # 45 | # ## 1 - Packages 46 | 47 | # Begin by importing all the packages you'll need during this assignment. 48 | # 49 | # - [numpy](https://www.numpy.org/) is the fundamental package for scientific computing with Python. 50 | # - [matplotlib](http://matplotlib.org) is a library to plot graphs in Python. 51 | # - [h5py](http://www.h5py.org) is a common package to interact with a dataset that is stored on an H5 file. 52 | # - [PIL](http://www.pythonware.com/products/pil/) and [scipy](https://www.scipy.org/) are used here to test your model with your own picture at the end. 53 | # - `dnn_app_utils` provides the functions implemented in the "Building your Deep Neural Network: Step by Step" assignment to this notebook. 54 | # - `np.random.seed(1)` is used to keep all the random function calls consistent. It helps grade your work - so please don't change it! 55 | 56 | # In[4]: 57 | 58 | 59 | import time 60 | import numpy as np 61 | import h5py 62 | import matplotlib.pyplot as plt 63 | import scipy 64 | from PIL import Image 65 | from scipy import ndimage 66 | from dnn_app_utils_v3 import * 67 | from public_tests import * 68 | 69 | get_ipython().run_line_magic('matplotlib', 'inline') 70 | plt.rcParams['figure.figsize'] = (5.0, 4.0) # set default size of plots 71 | plt.rcParams['image.interpolation'] = 'nearest' 72 | plt.rcParams['image.cmap'] = 'gray' 73 | 74 | get_ipython().run_line_magic('load_ext', 'autoreload') 75 | get_ipython().run_line_magic('autoreload', '2') 76 | 77 | np.random.seed(1) 78 | 79 | 80 | # 81 | # ## 2 - Load and Process the Dataset 82 | # 83 | # You'll be using the same "Cat vs non-Cat" dataset as in "Logistic Regression as a Neural Network" (Assignment 2). The model you built back then had 70% test accuracy on classifying cat vs non-cat images. Hopefully, your new model will perform even better! 84 | # 85 | # **Problem Statement**: You are given a dataset ("data.h5") containing: 86 | # - a training set of `m_train` images labelled as cat (1) or non-cat (0) 87 | # - a test set of `m_test` images labelled as cat and non-cat 88 | # - each image is of shape (num_px, num_px, 3) where 3 is for the 3 channels (RGB). 89 | # 90 | # Let's get more familiar with the dataset. Load the data by running the cell below. 91 | 92 | # In[5]: 93 | 94 | 95 | train_x_orig, train_y, test_x_orig, test_y, classes = load_data() 96 | 97 | 98 | # The following code will show you an image in the dataset. Feel free to change the index and re-run the cell multiple times to check out other images. 99 | 100 | # In[6]: 101 | 102 | 103 | # Example of a picture 104 | index = 25 105 | plt.imshow(train_x_orig[index]) 106 | print ("y = " + str(train_y[0,index]) + ". 
It's a " + classes[train_y[0,index]].decode("utf-8") + " picture.") 107 | 108 | 109 | # In[7]: 110 | 111 | 112 | # Explore your dataset 113 | m_train = train_x_orig.shape[0] 114 | num_px = train_x_orig.shape[1] 115 | m_test = test_x_orig.shape[0] 116 | 117 | print ("Number of training examples: " + str(m_train)) 118 | print ("Number of testing examples: " + str(m_test)) 119 | print ("Each image is of size: (" + str(num_px) + ", " + str(num_px) + ", 3)") 120 | print ("train_x_orig shape: " + str(train_x_orig.shape)) 121 | print ("train_y shape: " + str(train_y.shape)) 122 | print ("test_x_orig shape: " + str(test_x_orig.shape)) 123 | print ("test_y shape: " + str(test_y.shape)) 124 | 125 | 126 | # As usual, you reshape and standardize the images before feeding them to the network. The code is given in the cell below. 127 | # 128 | # 129 | #
Figure 1: Image to vector conversion.
130 | 131 | # In[8]: 132 | 133 | 134 | # Reshape the training and test examples 135 | train_x_flatten = train_x_orig.reshape(train_x_orig.shape[0], -1).T # The "-1" makes reshape flatten the remaining dimensions 136 | test_x_flatten = test_x_orig.reshape(test_x_orig.shape[0], -1).T 137 | 138 | # Standardize data to have feature values between 0 and 1. 139 | train_x = train_x_flatten/255. 140 | test_x = test_x_flatten/255. 141 | 142 | print ("train_x's shape: " + str(train_x.shape)) 143 | print ("test_x's shape: " + str(test_x.shape)) 144 | 145 | 146 | # **Note**: 147 | # $12,288$ equals $64 \times 64 \times 3$, which is the size of one reshaped image vector. 148 | 149 | # 150 | # ## 3 - Model Architecture 151 | 152 | # 153 | # ### 3.1 - 2-layer Neural Network 154 | # 155 | # Now that you're familiar with the dataset, it's time to build a deep neural network to distinguish cat images from non-cat images! 156 | # 157 | # You're going to build two different models: 158 | # 159 | # - A 2-layer neural network 160 | # - An L-layer deep neural network 161 | # 162 | # Then, you'll compare the performance of these models, and try out some different values for $L$. 163 | # 164 | # Let's look at the two architectures: 165 | # 166 | # 167 | #
Figure 2: 2-layer neural network.
The model can be summarized as: INPUT -> LINEAR -> RELU -> LINEAR -> SIGMOID -> OUTPUT.
168 | # 169 | # Detailed Architecture of Figure 2: 170 | # - The input is a (64,64,3) image which is flattened to a vector of size $(12288,1)$. 171 | # - The corresponding vector: $[x_0,x_1,...,x_{12287}]^T$ is then multiplied by the weight matrix $W^{[1]}$ of size $(n^{[1]}, 12288)$. 172 | # - Then, add a bias term and take its relu to get the following vector: $[a_0^{[1]}, a_1^{[1]},..., a_{n^{[1]}-1}^{[1]}]^T$. 173 | # - Multiply the resulting vector by $W^{[2]}$ and add the intercept (bias). 174 | # - Finally, take the sigmoid of the result. If it's greater than 0.5, classify it as a cat. 175 | # 176 | # 177 | # ### 3.2 - L-layer Deep Neural Network 178 | # 179 | # It's pretty difficult to represent an L-layer deep neural network using the above representation. However, here is a simplified network representation: 180 | # 181 | # 182 | #
Figure 3: L-layer neural network.
The model can be summarized as: [LINEAR -> RELU] $\times$ (L-1) -> LINEAR -> SIGMOID
183 | # 184 | # Detailed Architecture of Figure 3: 185 | # - The input is a (64,64,3) image which is flattened to a vector of size (12288,1). 186 | # - The corresponding vector: $[x_0,x_1,...,x_{12287}]^T$ is then multiplied by the weight matrix $W^{[1]}$ and then you add the intercept $b^{[1]}$. The result is called the linear unit. 187 | # - Next, take the relu of the linear unit. This process could be repeated several times for each $(W^{[l]}, b^{[l]})$ depending on the model architecture. 188 | # - Finally, take the sigmoid of the final linear unit. If it is greater than 0.5, classify it as a cat. 189 | # 190 | # 191 | # ### 3.3 - General Methodology 192 | # 193 | # As usual, you'll follow the Deep Learning methodology to build the model: 194 | # 195 | # 1. Initialize parameters / Define hyperparameters 196 | # 2. Loop for num_iterations: 197 | # a. Forward propagation 198 | # b. Compute cost function 199 | # c. Backward propagation 200 | # d. Update parameters (using parameters, and grads from backprop) 201 | # 3. Use trained parameters to predict labels 202 | # 203 | # Now go ahead and implement those two models! 204 | 205 | # 206 | # ## 4 - Two-layer Neural Network 207 | # 208 | # 209 | # ### Exercise 1 - two_layer_model 210 | # 211 | # Use the helper functions you have implemented in the previous assignment to build a 2-layer neural network with the following structure: *LINEAR -> RELU -> LINEAR -> SIGMOID*. The functions and their inputs are: 212 | # ```python 213 | # def initialize_parameters(n_x, n_h, n_y): 214 | # ... 215 | # return parameters 216 | # def linear_activation_forward(A_prev, W, b, activation): 217 | # ... 218 | # return A, cache 219 | # def compute_cost(AL, Y): 220 | # ... 221 | # return cost 222 | # def linear_activation_backward(dA, cache, activation): 223 | # ... 224 | # return dA_prev, dW, db 225 | # def update_parameters(parameters, grads, learning_rate): 226 | # ... 227 | # return parameters 228 | # ``` 229 | 230 | # In[9]: 231 | 232 | 233 | ### CONSTANTS DEFINING THE MODEL #### 234 | n_x = 12288 # num_px * num_px * 3 235 | n_h = 7 236 | n_y = 1 237 | layers_dims = (n_x, n_h, n_y) 238 | learning_rate = 0.0075 239 | 240 | 241 | # In[10]: 242 | 243 | 244 | # GRADED FUNCTION: two_layer_model 245 | 246 | def two_layer_model(X, Y, layers_dims, learning_rate = 0.0075, num_iterations = 3000, print_cost=False): 247 | """ 248 | Implements a two-layer neural network: LINEAR->RELU->LINEAR->SIGMOID. 249 | 250 | Arguments: 251 | X -- input data, of shape (n_x, number of examples) 252 | Y -- true "label" vector (containing 1 if cat, 0 if non-cat), of shape (1, number of examples) 253 | layers_dims -- dimensions of the layers (n_x, n_h, n_y) 254 | num_iterations -- number of iterations of the optimization loop 255 | learning_rate -- learning rate of the gradient descent update rule 256 | print_cost -- If set to True, this will print the cost every 100 iterations 257 | 258 | Returns: 259 | parameters -- a dictionary containing W1, W2, b1, and b2 260 | """ 261 | 262 | np.random.seed(1) 263 | grads = {} 264 | costs = [] # to keep track of the cost 265 | m = X.shape[1] # number of examples 266 | (n_x, n_h, n_y) = layers_dims 267 | 268 | # Initialize parameters dictionary, by calling one of the functions you'd previously implemented 269 | #(≈ 1 line of code) 270 | # parameters = ... 271 | # YOUR CODE STARTS HERE 272 | parameters = initialize_parameters(n_x, n_h, n_y) 273 | 274 | # YOUR CODE ENDS HERE 275 | 276 | # Get W1, b1, W2 and b2 from the dictionary parameters. 
277 | W1 = parameters["W1"] 278 | b1 = parameters["b1"] 279 | W2 = parameters["W2"] 280 | b2 = parameters["b2"] 281 | 282 | # Loop (gradient descent) 283 | 284 | for i in range(0, num_iterations): 285 | 286 | # Forward propagation: LINEAR -> RELU -> LINEAR -> SIGMOID. Inputs: "X, W1, b1, W2, b2". Output: "A1, cache1, A2, cache2". 287 | #(≈ 2 lines of code) 288 | # A1, cache1 = ... 289 | # A2, cache2 = ... 290 | # YOUR CODE STARTS HERE 291 | A1, cache1 = linear_activation_forward(X, W1, b1, activation="relu") 292 | A2, cache2 = linear_activation_forward(A1, W2, b2, activation="sigmoid") 293 | 294 | # YOUR CODE ENDS HERE 295 | 296 | # Compute cost 297 | #(≈ 1 line of code) 298 | # cost = ... 299 | # YOUR CODE STARTS HERE 300 | 301 | cost = compute_cost(A2, Y) 302 | # YOUR CODE ENDS HERE 303 | 304 | # Initializing backward propagation 305 | dA2 = - (np.divide(Y, A2) - np.divide(1 - Y, 1 - A2)) 306 | 307 | # Backward propagation. Inputs: "dA2, cache2, cache1". Outputs: "dA1, dW2, db2; also dA0 (not used), dW1, db1". 308 | #(≈ 2 lines of code) 309 | # dA1, dW2, db2 = ... 310 | # dA0, dW1, db1 = ... 311 | # YOUR CODE STARTS HERE 312 | dA1, dW2, db2 = linear_activation_backward(dA2, cache2, activation="sigmoid") 313 | dA0, dW1, db1 = linear_activation_backward(dA1, cache1, activation="relu") 314 | 315 | # YOUR CODE ENDS HERE 316 | 317 | # Set grads['dWl'] to dW1, grads['db1'] to db1, grads['dW2'] to dW2, grads['db2'] to db2 318 | grads['dW1'] = dW1 319 | grads['db1'] = db1 320 | grads['dW2'] = dW2 321 | grads['db2'] = db2 322 | 323 | # Update parameters. 324 | #(approx. 1 line of code) 325 | # parameters = ... 326 | # YOUR CODE STARTS HERE 327 | 328 | parameters = update_parameters(parameters, grads, learning_rate) 329 | # YOUR CODE ENDS HERE 330 | 331 | # Retrieve W1, b1, W2, b2 from parameters 332 | W1 = parameters["W1"] 333 | b1 = parameters["b1"] 334 | W2 = parameters["W2"] 335 | b2 = parameters["b2"] 336 | 337 | # Print the cost every 100 iterations 338 | if print_cost and i % 100 == 0 or i == num_iterations - 1: 339 | print("Cost after iteration {}: {}".format(i, np.squeeze(cost))) 340 | if i % 100 == 0 or i == num_iterations: 341 | costs.append(cost) 342 | 343 | return parameters, costs 344 | 345 | def plot_costs(costs, learning_rate=0.0075): 346 | plt.plot(np.squeeze(costs)) 347 | plt.ylabel('cost') 348 | plt.xlabel('iterations (per hundreds)') 349 | plt.title("Learning rate =" + str(learning_rate)) 350 | plt.show() 351 | 352 | 353 | # In[11]: 354 | 355 | 356 | parameters, costs = two_layer_model(train_x, train_y, layers_dims = (n_x, n_h, n_y), num_iterations = 2, print_cost=False) 357 | 358 | print("Cost after first iteration: " + str(costs[0])) 359 | 360 | two_layer_model_test(two_layer_model) 361 | 362 | 363 | # **Expected output:** 364 | # 365 | # ``` 366 | # cost after iteration 1 must be around 0.69 367 | # ``` 368 | 369 | # 370 | # ### 4.1 - Train the model 371 | # 372 | # If your code passed the previous cell, run the cell below to train your parameters. 373 | # 374 | # - The cost should decrease on every iteration. 375 | # 376 | # - It may take up to 5 minutes to run 2500 iterations. 377 | 378 | # In[12]: 379 | 380 | 381 | parameters, costs = two_layer_model(train_x, train_y, layers_dims = (n_x, n_h, n_y), num_iterations = 2500, print_cost=True) 382 | plot_costs(costs, learning_rate) 383 | 384 | 385 | # **Expected Output**: 386 | # 387 | # 388 | # 389 | # 390 | # 391 | # 392 | # 393 | # 394 | # 395 | # 396 | # 397 | # 398 | # 399 | # 400 | # 401 | # 402 | # 403 | #
Cost after iteration 0 0.6930497356599888
Cost after iteration 100 0.6464320953428849
... ...
Cost after iteration 2499 0.04421498215868956
404 | 405 | # **Nice!** You successfully trained the model. Good thing you built a vectorized implementation! Otherwise it might have taken 10 times longer to train this. 406 | # 407 | # Now, you can use the trained parameters to classify images from the dataset. To see your predictions on the training and test sets, run the cell below. 408 | 409 | # In[13]: 410 | 411 | 412 | predictions_train = predict(train_x, train_y, parameters) 413 | 414 | 415 | # **Expected Output**: 416 | # 417 | # 418 | # 419 | # 420 | # 421 | #
Accuracy 0.9999999999999998
422 | 423 | # In[14]: 424 | 425 | 426 | predictions_test = predict(test_x, test_y, parameters) 427 | 428 | 429 | # **Expected Output**: 430 | # 431 | # 432 | # 433 | # 434 | # 435 | # 436 | #
Accuracy 0.72
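# The `predict` helper called in the two cells above ships with `dnn_app_utils_v3` and is not reproduced in this file. As a rough, hypothetical sketch of its final step only (the real helper also runs the forward pass with the trained parameters), the reported accuracy comes from thresholding the output-layer sigmoid activations at 0.5 -- `accuracy_from_probs` below is an illustrative name, not the library function:

import numpy as np

def accuracy_from_probs(AL, Y):
    """Toy stand-in: AL holds output-layer activations and Y the true labels, both of shape (1, m)."""
    predictions = (AL > 0.5).astype(int)      # classify as "cat" when the probability exceeds 0.5
    return np.mean(predictions == Y)          # fraction of examples labelled correctly

print(accuracy_from_probs(np.array([[0.9, 0.2, 0.6]]), np.array([[1, 0, 0]])))  # 2 of 3 correct -> ~0.667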
437 | 438 | # ### Congratulations! It seems that your 2-layer neural network has better performance (72%) than the logistic regression implementation (70%, assignment week 2). Let's see if you can do even better with an $L$-layer model. 439 | # 440 | # **Note**: You may notice that running the model on fewer iterations (say 1500) gives better accuracy on the test set. This is called "early stopping" and you'll hear more about it in the next course. Early stopping is a way to prevent overfitting. 441 | 442 | # 443 | # ## 5 - L-layer Neural Network 444 | # 445 | # 446 | # ### Exercise 2 - L_layer_model 447 | # 448 | # Use the helper functions you implemented previously to build an $L$-layer neural network with the following structure: *[LINEAR -> RELU]$\times$(L-1) -> LINEAR -> SIGMOID*. The functions and their inputs are: 449 | # ```python 450 | # def initialize_parameters_deep(layers_dims): 451 | # ... 452 | # return parameters 453 | # def L_model_forward(X, parameters): 454 | # ... 455 | # return AL, caches 456 | # def compute_cost(AL, Y): 457 | # ... 458 | # return cost 459 | # def L_model_backward(AL, Y, caches): 460 | # ... 461 | # return grads 462 | # def update_parameters(parameters, grads, learning_rate): 463 | # ... 464 | # return parameters 465 | # ``` 466 | 467 | # In[15]: 468 | 469 | 470 | ### CONSTANTS ### 471 | layers_dims = [12288, 20, 7, 5, 1] # 4-layer model 472 | 473 | 474 | # In[16]: 475 | 476 | 477 | # GRADED FUNCTION: L_layer_model 478 | 479 | def L_layer_model(X, Y, layers_dims, learning_rate = 0.0075, num_iterations = 3000, print_cost=False): 480 | """ 481 | Implements a L-layer neural network: [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID. 482 | 483 | Arguments: 484 | X -- input data, of shape (n_x, number of examples) 485 | Y -- true "label" vector (containing 1 if cat, 0 if non-cat), of shape (1, number of examples) 486 | layers_dims -- list containing the input size and each layer size, of length (number of layers + 1). 487 | learning_rate -- learning rate of the gradient descent update rule 488 | num_iterations -- number of iterations of the optimization loop 489 | print_cost -- if True, it prints the cost every 100 steps 490 | 491 | Returns: 492 | parameters -- parameters learnt by the model. They can then be used to predict. 493 | """ 494 | 495 | np.random.seed(1) 496 | costs = [] # keep track of cost 497 | 498 | # Parameters initialization. 499 | #(≈ 1 line of code) 500 | # parameters = ... 501 | # YOUR CODE STARTS HERE 502 | parameters = initialize_parameters_deep(layers_dims) 503 | 504 | # YOUR CODE ENDS HERE 505 | 506 | # Loop (gradient descent) 507 | for i in range(0, num_iterations): 508 | 509 | # Forward propagation: [LINEAR -> RELU]*(L-1) -> LINEAR -> SIGMOID. 510 | #(≈ 1 line of code) 511 | # AL, caches = ... 512 | # YOUR CODE STARTS HERE 513 | AL, caches = L_model_forward(X, parameters) 514 | 515 | # YOUR CODE ENDS HERE 516 | 517 | # Compute cost. 518 | #(≈ 1 line of code) 519 | # cost = ... 520 | # YOUR CODE STARTS HERE 521 | cost = compute_cost(AL, Y) 522 | 523 | # YOUR CODE ENDS HERE 524 | 525 | # Backward propagation. 526 | #(≈ 1 line of code) 527 | # grads = ... 528 | # YOUR CODE STARTS HERE 529 | grads = L_model_backward(AL, Y, caches) 530 | 531 | # YOUR CODE ENDS HERE 532 | 533 | # Update parameters. 534 | #(≈ 1 line of code) 535 | # parameters = ... 
536 | # YOUR CODE STARTS HERE 537 | parameters = update_parameters(parameters, grads, learning_rate) 538 | 539 | # YOUR CODE ENDS HERE 540 | 541 | # Print the cost every 100 iterations 542 | if print_cost and i % 100 == 0 or i == num_iterations - 1: 543 | print("Cost after iteration {}: {}".format(i, np.squeeze(cost))) 544 | if i % 100 == 0 or i == num_iterations: 545 | costs.append(cost) 546 | 547 | return parameters, costs 548 | 549 | 550 | # In[17]: 551 | 552 | 553 | parameters, costs = L_layer_model(train_x, train_y, layers_dims, num_iterations = 1, print_cost = False) 554 | 555 | print("Cost after first iteration: " + str(costs[0])) 556 | 557 | L_layer_model_test(L_layer_model) 558 | 559 | 560 | # 561 | # ### 5.1 - Train the model 562 | # 563 | # If your code passed the previous cell, run the cell below to train your model as a 4-layer neural network. 564 | # 565 | # - The cost should decrease on every iteration. 566 | # 567 | # - It may take up to 5 minutes to run 2500 iterations. 568 | 569 | # In[18]: 570 | 571 | 572 | parameters, costs = L_layer_model(train_x, train_y, layers_dims, num_iterations = 2500, print_cost = True) 573 | 574 | 575 | # **Expected Output**: 576 | # 577 | # 578 | # 579 | # 580 | # 581 | # 582 | # 583 | # 584 | # 585 | # 586 | # 587 | # 588 | # 589 | # 590 | # 591 | # 592 | # 593 | #
# | Cost after iteration 0    | 0.771749 |
# | Cost after iteration 100  | 0.672053 |
# | ...                       | ...      |
# | Cost after iteration 2499 | 0.088439 |
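# 
# **Optional check (ungraded)**: the note earlier pointed out that stopping around 1,500 iterations can give better test accuracy ("early stopping"). A minimal sketch of how you could compare iteration budgets; it assumes `predict` returns the (1, m) array of 0/1 predictions, as it is used in the cells below:
# 
# ```python
# for n_iter in (1500, 2500):
#     params_tmp, _ = L_layer_model(train_x, train_y, layers_dims,
#                                   num_iterations=n_iter, print_cost=False)
#     preds = predict(test_x, test_y, params_tmp)
#     print(n_iter, "iterations -> test accuracy:", np.mean(preds == test_y))
# ```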
594 | 595 | # In[19]: 596 | 597 | 598 | pred_train = predict(train_x, train_y, parameters) 599 | 600 | 601 | # **Expected Output**: 602 | # 603 | # 604 | # 605 | # 608 | # 611 | # 612 | #
# | Train Accuracy | 0.985645933014 |
613 | 614 | # In[20]: 615 | 616 | 617 | pred_test = predict(test_x, test_y, parameters) 618 | 619 | 620 | # **Expected Output**: 621 | # 622 | # 623 | # 624 | # 625 | # 626 | # 627 | #
# | Test Accuracy | 0.8 |
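# 
# **Optional (ungraded)**: since `pred_test` holds the model's 0/1 predictions, you can also break the remaining test error down by type. A small sketch, assuming `pred_test` and `test_y` are the (1, m) arrays used in the cells above:
# 
# ```python
# false_positives = int(np.sum((pred_test == 1) & (test_y == 0)))  # non-cat images predicted as cat
# false_negatives = int(np.sum((pred_test == 0) & (test_y == 1)))  # cat images predicted as non-cat
# print("false positives:", false_positives, "| false negatives:", false_negatives)
# ```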
628 | 629 | # ### Congrats! It seems that your 4-layer neural network has better performance (80%) than your 2-layer neural network (72%) on the same test set. 630 | # 631 | # This is pretty good performance for this task. Nice job! 632 | # 633 | # In the next course on "Improving deep neural networks," you'll be able to obtain even higher accuracy by systematically searching for better hyperparameters: learning_rate, layers_dims, or num_iterations, for example. 634 | 635 | # 636 | # ## 6 - Results Analysis 637 | # 638 | # First, take a look at some images the L-layer model labeled incorrectly. This will show a few mislabeled images. 639 | 640 | # In[21]: 641 | 642 | 643 | print_mislabeled_images(classes, test_x, test_y, pred_test) 644 | 645 | 646 | # **A few types of images the model tends to do poorly on include:** 647 | # - Cat body in an unusual position 648 | # - Cat appears against a background of a similar color 649 | # - Unusual cat color and species 650 | # - Camera Angle 651 | # - Brightness of the picture 652 | # - Scale variation (cat is very large or small in image) 653 | 654 | # ### Congratulations on finishing this assignment! 655 | # 656 | # You just built and trained a deep L-layer neural network, and applied it in order to distinguish cats from non-cats, a very serious and important task in deep learning. ;) 657 | # 658 | # By now, you've also completed all the assignments for Course 1 in the Deep Learning Specialization. Amazing work! If you'd like to test out how closely you resemble a cat yourself, there's an optional ungraded exercise below, where you can test your own image. 659 | # 660 | # Great work and hope to see you in the next course! 661 | 662 | # 663 | # ## 7 - Test with your own image (optional/ungraded exercise) ## 664 | # 665 | # From this point, if you so choose, you can use your own image to test the output of your model. To do that follow these steps: 666 | # 667 | # 1. Click on "File" in the upper bar of this notebook, then click "Open" to go on your Coursera Hub. 668 | # 2. Add your image to this Jupyter Notebook's directory, in the "images" folder 669 | # 3. Change your image's name in the following code 670 | # 4. Run the code and check if the algorithm is right (1 = cat, 0 = non-cat)! 671 | 672 | # In[22]: 673 | 674 | 675 | ## START CODE HERE ## 676 | my_image = "my_image.jpg" # change this to the name of your image file 677 | my_label_y = [1] # the true class of your image (1 -> cat, 0 -> non-cat) 678 | ## END CODE HERE ## 679 | 680 | fname = "images/" + my_image 681 | image = np.array(Image.open(fname).resize((num_px, num_px))) 682 | plt.imshow(image) 683 | image = image / 255. 
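# Flatten the (num_px, num_px, 3) array into a (num_px * num_px * 3, 1) column vector
# so it matches the (n_x, m) input layout the model was trained on.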
684 | image = image.reshape((1, num_px * num_px * 3)).T 685 | 686 | my_predicted_image = predict(image, my_label_y, parameters) 687 | 688 | 689 | print ("y = " + str(np.squeeze(my_predicted_image)) + ", your L-layer model predicts a \"" + classes[int(np.squeeze(my_predicted_image)),].decode("utf-8") + "\" picture.") 690 | 691 | 692 | # **References**: 693 | # 694 | # - for auto-reloading external module: http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython 695 | -------------------------------------------------------------------------------- /Course 1-Neural Networks & Deep Learning/Week 4/Week 4 Key Concepts on Deep Neural Networks.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 1-Neural Networks & Deep Learning/Week 4/Week 4 Key Concepts on Deep Neural Networks.pdf -------------------------------------------------------------------------------- /Course 2-Improving Deep Neural Networks/Week 1/Gradient_Checking.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding: utf-8 3 | 4 | # # Gradient Checking 5 | # 6 | # Welcome to the final assignment for this week! In this assignment you'll be implementing gradient checking. 7 | # 8 | # By the end of this notebook, you'll be able to: 9 | # 10 | # Implement gradient checking to verify the accuracy of your backprop implementation. 11 | # 12 | # ## Important Note on Submission to the AutoGrader 13 | # 14 | # Before submitting your assignment to the AutoGrader, please make sure you are not doing the following: 15 | # 16 | # 1. You have not added any _extra_ `print` statement(s) in the assignment. 17 | # 2. You have not added any _extra_ code cell(s) in the assignment. 18 | # 3. You have not changed any of the function parameters. 19 | # 4. You are not using any global variables inside your graded exercises. Unless specifically instructed to do so, please refrain from it and use the local variables instead. 20 | # 5. You are not changing the assignment code where it is not required, like creating _extra_ variables. 21 | # 22 | # If you do any of the following, you will get something like, `Grader Error: Grader feedback not found` (or similarly unexpected) error upon submitting your assignment. Before asking for help/debugging the errors in your assignment, check for these first. If this is the case, and you don't remember the changes you have made, you can get a fresh copy of the assignment by following these [instructions](https://www.coursera.org/learn/deep-neural-network/supplement/QWEnZ/h-ow-to-refresh-your-workspace). 
23 | 24 | # ## Table of Contents 25 | # - [1 - Packages](#1) 26 | # - [2 - Problem Statement](#2) 27 | # - [3 - How does Gradient Checking work?](#3) 28 | # - [4 - 1-Dimensional Gradient Checking](#4) 29 | # - [Exercise 1 - forward_propagation](#ex-1) 30 | # - [Exercise 2 - backward_propagation](#ex-2) 31 | # - [Exercise 3 - gradient_check](#ex-3) 32 | # - [5 - N-Dimensional Gradient Checking](#5) 33 | # - [Exercise 4 - gradient_check_n](#ex-4) 34 | 35 | # 36 | # ## 1 - Packages 37 | 38 | # In[1]: 39 | 40 | 41 | import numpy as np 42 | from testCases import * 43 | from public_tests import * 44 | from gc_utils import sigmoid, relu, dictionary_to_vector, vector_to_dictionary, gradients_to_vector 45 | 46 | get_ipython().run_line_magic('load_ext', 'autoreload') 47 | get_ipython().run_line_magic('autoreload', '2') 48 | 49 | 50 | # 51 | # ## 2 - Problem Statement 52 | # 53 | # You are part of a team working to make mobile payments available globally, and are asked to build a deep learning model to detect fraud--whenever someone makes a payment, you want to see if the payment might be fraudulent, such as if the user's account has been taken over by a hacker. 54 | # 55 | # You already know that backpropagation is quite challenging to implement, and sometimes has bugs. Because this is a mission-critical application, your company's CEO wants to be really certain that your implementation of backpropagation is correct. Your CEO says, "Give me proof that your backpropagation is actually working!" To give this reassurance, you are going to use "gradient checking." 56 | # 57 | # Let's do it! 58 | 59 | # 60 | # ## 3 - How does Gradient Checking work? 61 | # Backpropagation computes the gradients $\frac{\partial J}{\partial \theta}$, where $\theta$ denotes the parameters of the model. $J$ is computed using forward propagation and your loss function. 62 | # 63 | # Because forward propagation is relatively easy to implement, you're confident you got that right, and so you're almost 100% sure that you're computing the cost $J$ correctly. Thus, you can use your code for computing $J$ to verify the code for computing $\frac{\partial J}{\partial \theta}$. 64 | # 65 | # Let's look back at the definition of a derivative (or gradient):$$ \frac{\partial J}{\partial \theta} = \lim_{\varepsilon \to 0} \frac{J(\theta + \varepsilon) - J(\theta - \varepsilon)}{2 \varepsilon} \tag{1}$$ 66 | # 67 | # If you're not familiar with the "$\displaystyle \lim_{\varepsilon \to 0}$" notation, it's just a way of saying "when $\varepsilon$ is really, really small." 68 | # 69 | # You know the following: 70 | # 71 | # $\frac{\partial J}{\partial \theta}$ is what you want to make sure you're computing correctly. 72 | # You can compute $J(\theta + \varepsilon)$ and $J(\theta - \varepsilon)$ (in the case that $\theta$ is a real number), since you're confident your implementation for $J$ is correct. 73 | # Let's use equation (1) and a small value for $\varepsilon$ to convince your CEO that your code for computing $\frac{\partial J}{\partial \theta}$ is correct! 74 | 75 | # 76 | # ## 4 - 1-Dimensional Gradient Checking 77 | # 78 | # Consider a 1D linear function $J(\theta) = \theta x$. The model contains only a single real-valued parameter $\theta$, and takes $x$ as input. 79 | # 80 | # You will implement code to compute $J(.)$ and its derivative $\frac{\partial J}{\partial \theta}$. You will then use gradient checking to make sure your derivative computation for $J$ is correct. 81 | # 82 | # 83 | #
# **Figure 1**: 1D linear model
84 | # 85 | # The diagram above shows the key computation steps: First start with $x$, then evaluate the function $J(x)$ ("forward propagation"). Then compute the derivative $\frac{\partial J}{\partial \theta}$ ("backward propagation"). 86 | # 87 | # 88 | # ### Exercise 1 - forward_propagation 89 | # 90 | # Implement `forward propagation`. For this simple function compute $J(.)$ 91 | 92 | # In[2]: 93 | 94 | 95 | # GRADED FUNCTION: forward_propagation 96 | 97 | def forward_propagation(x, theta): 98 | """ 99 | Implement the linear forward propagation (compute J) presented in Figure 1 (J(theta) = theta * x) 100 | 101 | Arguments: 102 | x -- a real-valued input 103 | theta -- our parameter, a real number as well 104 | 105 | Returns: 106 | J -- the value of function J, computed using the formula J(theta) = theta * x 107 | """ 108 | 109 | # (approx. 1 line) 110 | # J = 111 | # YOUR CODE STARTS HERE 112 | J = np.dot(theta,x) 113 | 114 | # YOUR CODE ENDS HERE 115 | 116 | return J 117 | 118 | 119 | # In[3]: 120 | 121 | 122 | x, theta = 2, 4 123 | J = forward_propagation(x, theta) 124 | print ("J = " + str(J)) 125 | forward_propagation_test(forward_propagation) 126 | 127 | 128 | # 129 | # ### Exercise 2 - backward_propagation 130 | # 131 | # Now, implement the `backward propagation` step (derivative computation) of Figure 1. That is, compute the derivative of $J(\theta) = \theta x$ with respect to $\theta$. To save you from doing the calculus, you should get $dtheta = \frac { \partial J }{ \partial \theta} = x$. 132 | 133 | # In[4]: 134 | 135 | 136 | # GRADED FUNCTION: backward_propagation 137 | 138 | def backward_propagation(x, theta): 139 | """ 140 | Computes the derivative of J with respect to theta (see Figure 1). 141 | 142 | Arguments: 143 | x -- a real-valued input 144 | theta -- our parameter, a real number as well 145 | 146 | Returns: 147 | dtheta -- the gradient of the cost with respect to theta 148 | """ 149 | 150 | # (approx. 1 line) 151 | # dtheta = 152 | # YOUR CODE STARTS HERE 153 | 154 | dtheta=x 155 | # YOUR CODE ENDS HERE 156 | 157 | return dtheta 158 | 159 | 160 | # In[5]: 161 | 162 | 163 | x, theta = 3, 4 164 | dtheta = backward_propagation(x, theta) 165 | print ("dtheta = " + str(dtheta)) 166 | backward_propagation_test(backward_propagation) 167 | 168 | 169 | # #### Expected output: 170 | # 171 | # ``` 172 | # dtheta = 3 173 | # All tests passed. 174 | # ``` 175 | 176 | # 177 | # ### Exercise 3 - gradient_check 178 | # 179 | # To show that the `backward_propagation()` function is correctly computing the gradient $\frac{\partial J}{\partial \theta}$, let's implement gradient checking. 180 | # 181 | # **Instructions**: 182 | # - First compute "gradapprox" using the formula above (1) and a small value of $\varepsilon$. Here are the Steps to follow: 183 | # 1. $\theta^{+} = \theta + \varepsilon$ 184 | # 2. $\theta^{-} = \theta - \varepsilon$ 185 | # 3. $J^{+} = J(\theta^{+})$ 186 | # 4. $J^{-} = J(\theta^{-})$ 187 | # 5. $gradapprox = \frac{J^{+} - J^{-}}{2 \varepsilon}$ 188 | # - Then compute the gradient using backward propagation, and store the result in a variable "grad" 189 | # - Finally, compute the relative difference between "gradapprox" and the "grad" using the following formula: 190 | # $$ difference = \frac {\mid\mid grad - gradapprox \mid\mid_2}{\mid\mid grad \mid\mid_2 + \mid\mid gradapprox \mid\mid_2} \tag{2}$$ 191 | # You will need 3 Steps to compute this formula: 192 | # - 1'. compute the numerator using np.linalg.norm(...) 193 | # - 2'. compute the denominator. 
You will need to call np.linalg.norm(...) twice. 194 | # - 3'. divide them. 195 | # - If this difference is small (say less than $10^{-7}$), you can be quite confident that you have computed your gradient correctly. Otherwise, there may be a mistake in the gradient computation. 196 | # 197 | 198 | # In[8]: 199 | 200 | 201 | # GRADED FUNCTION: gradient_check 202 | 203 | def gradient_check(x, theta, epsilon=1e-7, print_msg=False): 204 | """ 205 | Implement the gradient checking presented in Figure 1. 206 | 207 | Arguments: 208 | x -- a float input 209 | theta -- our parameter, a float as well 210 | epsilon -- tiny shift to the input to compute approximated gradient with formula(1) 211 | 212 | Returns: 213 | difference -- difference (2) between the approximated gradient and the backward propagation gradient. Float output 214 | """ 215 | 216 | # Compute gradapprox using right side of formula (1). epsilon is small enough, you don't need to worry about the limit. 217 | # (approx. 5 lines) 218 | # theta_plus = # Step 1 219 | # theta_minus = # Step 2 220 | # J_plus = # Step 3 221 | # J_minus = # Step 4 222 | # gradapprox = # Step 5 223 | # YOUR CODE STARTS HERE 224 | 225 | thetaplus = theta + epsilon # Step 1 226 | thetaminus = theta - epsilon # Step 2 227 | J_plus = np.dot(thetaplus,x) # Step 3 228 | J_minus = np.dot(thetaminus,x) # Step 4 229 | gradapprox = (J_plus - J_minus)/(2*epsilon) 230 | # YOUR CODE ENDS HERE 231 | 232 | # Check if gradapprox is close enough to the output of backward_propagation() 233 | #(approx. 1 line) DO NOT USE "grad = gradapprox" 234 | # grad = 235 | # YOUR CODE STARTS HERE 236 | grad = x 237 | 238 | # YOUR CODE ENDS HERE 239 | 240 | #(approx. 3 lines) 241 | # numerator = # Step 1' 242 | # denominator = # Step 2' 243 | # difference = # Step 3' 244 | # YOUR CODE STARTS HERE 245 | 246 | numerator = np.linalg.norm(gradapprox-grad) # Step 1' 247 | denominator = np.linalg.norm(gradapprox) + np.linalg.norm(grad) # Step 2' 248 | difference = numerator/denominator # Step 3' 249 | # YOUR CODE ENDS HERE 250 | if print_msg: 251 | if difference > 2e-7: 252 | print ("\033[93m" + "There is a mistake in the backward propagation! difference = " + str(difference) + "\033[0m") 253 | else: 254 | print ("\033[92m" + "Your backward propagation works perfectly fine! difference = " + str(difference) + "\033[0m") 255 | 256 | return difference 257 | 258 | 259 | # In[9]: 260 | 261 | 262 | x, theta = 3, 4 263 | difference = gradient_check(x, theta, print_msg=True) 264 | 265 | 266 | # **Expected output**: 267 | # 268 | # 269 | # 270 | # 271 | # 272 | # 273 | #
# Your backward propagation works perfectly fine! difference = 7.814075313343006e-11
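# 
# To see the check fail, you can compare the same two-sided approximation against a deliberately wrong derivative (here $2x$ instead of $x$). This is an ungraded, illustrative snippet that reuses `forward_propagation` from above:
# 
# ```python
# x, theta, eps = 3.0, 4.0, 1e-7
# wrong_grad = 2 * x                                    # pretend backprop returned 2x
# gradapprox = (forward_propagation(x, theta + eps)
#               - forward_propagation(x, theta - eps)) / (2 * eps)
# difference = np.abs(wrong_grad - gradapprox) / (np.abs(wrong_grad) + np.abs(gradapprox))
# print(difference)                                     # about 0.33, far above the 2e-7 threshold
# ```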
274 | 275 | # Congrats, the difference is smaller than the $2 * 10^{-7}$ threshold. So you can have high confidence that you've correctly computed the gradient in `backward_propagation()`. 276 | # 277 | # Now, in the more general case, your cost function $J$ has more than a single 1D input. When you are training a neural network, $\theta$ actually consists of multiple matrices $W^{[l]}$ and biases $b^{[l]}$! It is important to know how to do a gradient check with higher-dimensional inputs. Let's do it! 278 | 279 | # 280 | # ## 5 - N-Dimensional Gradient Checking 281 | 282 | # The following figure describes the forward and backward propagation of your fraud detection model. 283 | # 284 | # 285 | #
# **Figure 2**: Deep neural network. LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID
286 | # 287 | # Let's look at your implementations for forward propagation and backward propagation. 288 | 289 | # In[10]: 290 | 291 | 292 | def forward_propagation_n(X, Y, parameters): 293 | """ 294 | Implements the forward propagation (and computes the cost) presented in Figure 3. 295 | 296 | Arguments: 297 | X -- training set for m examples 298 | Y -- labels for m examples 299 | parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2", "W3", "b3": 300 | W1 -- weight matrix of shape (5, 4) 301 | b1 -- bias vector of shape (5, 1) 302 | W2 -- weight matrix of shape (3, 5) 303 | b2 -- bias vector of shape (3, 1) 304 | W3 -- weight matrix of shape (1, 3) 305 | b3 -- bias vector of shape (1, 1) 306 | 307 | Returns: 308 | cost -- the cost function (logistic cost for m examples) 309 | cache -- a tuple with the intermediate values (Z1, A1, W1, b1, Z2, A2, W2, b2, Z3, A3, W3, b3) 310 | 311 | """ 312 | 313 | # retrieve parameters 314 | m = X.shape[1] 315 | W1 = parameters["W1"] 316 | b1 = parameters["b1"] 317 | W2 = parameters["W2"] 318 | b2 = parameters["b2"] 319 | W3 = parameters["W3"] 320 | b3 = parameters["b3"] 321 | 322 | # LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID 323 | Z1 = np.dot(W1, X) + b1 324 | A1 = relu(Z1) 325 | Z2 = np.dot(W2, A1) + b2 326 | A2 = relu(Z2) 327 | Z3 = np.dot(W3, A2) + b3 328 | A3 = sigmoid(Z3) 329 | 330 | # Cost 331 | log_probs = np.multiply(-np.log(A3),Y) + np.multiply(-np.log(1 - A3), 1 - Y) 332 | cost = 1. / m * np.sum(log_probs) 333 | 334 | cache = (Z1, A1, W1, b1, Z2, A2, W2, b2, Z3, A3, W3, b3) 335 | 336 | return cost, cache 337 | 338 | 339 | # Now, run backward propagation. 340 | 341 | # In[14]: 342 | 343 | 344 | def backward_propagation_n(X, Y, cache): 345 | """ 346 | Implement the backward propagation presented in figure 2. 347 | 348 | Arguments: 349 | X -- input datapoint, of shape (input size, 1) 350 | Y -- true "label" 351 | cache -- cache output from forward_propagation_n() 352 | 353 | Returns: 354 | gradients -- A dictionary with the gradients of the cost with respect to each parameter, activation and pre-activation variables. 355 | """ 356 | 357 | m = X.shape[1] 358 | (Z1, A1, W1, b1, Z2, A2, W2, b2, Z3, A3, W3, b3) = cache 359 | 360 | dZ3 = A3 - Y 361 | dW3 = 1./m * np.dot(dZ3, A2.T) 362 | db3 = 1./m * np.sum(dZ3, axis=1, keepdims = True) 363 | 364 | dA2 = np.dot(W3.T, dZ3) 365 | dZ2 = np.multiply(dA2, np.int64(A2 > 0)) 366 | dW2 = 1./m * np.dot(dZ2, A1.T) 367 | db2 = 1./m * np.sum(dZ2, axis=1, keepdims = True) 368 | 369 | dA1 = np.dot(W2.T, dZ2) 370 | dZ1 = np.multiply(dA1, np.int64(A1 > 0)) 371 | dW1 = 1./m * np.dot(dZ1, X.T) 372 | db1 = 1./m * np.sum(dZ1, axis=1, keepdims = True) 373 | 374 | gradients = {"dZ3": dZ3, "dW3": dW3, "db3": db3, 375 | "dA2": dA2, "dZ2": dZ2, "dW2": dW2, "db2": db2, 376 | "dA1": dA1, "dZ1": dZ1, "dW1": dW1, "db1": db1} 377 | 378 | return gradients 379 | 380 | 381 | # You obtained some results on the fraud detection test set but you are not 100% sure of your model. Nobody's perfect! Let's implement gradient checking to verify if your gradients are correct. 382 | 383 | # **How does gradient checking work?**. 384 | # 385 | # As in Section 3 and 4, you want to compare "gradapprox" to the gradient computed by backpropagation. The formula is still: 386 | # 387 | # $$ \frac{\partial J}{\partial \theta} = \lim_{\varepsilon \to 0} \frac{J(\theta + \varepsilon) - J(\theta - \varepsilon)}{2 \varepsilon} \tag{1}$$ 388 | # 389 | # However, $\theta$ is not a scalar anymore. 
It is a dictionary called "parameters". The function "`dictionary_to_vector()`" has been implemented for you. It converts the "parameters" dictionary into a vector called "values", obtained by reshaping all parameters (W1, b1, W2, b2, W3, b3) into vectors and concatenating them. 390 | # 391 | # The inverse function is "`vector_to_dictionary`" which outputs back the "parameters" dictionary. 392 | # 393 | # 394 | #
# **Figure 3**: dictionary_to_vector() and vector_to_dictionary(). You will need these functions in gradient_check_n()
395 | # 396 | # The "gradients" dictionary has also been converted into a vector "grad" using gradients_to_vector(), so you don't need to worry about that. 397 | # 398 | # Now, for every single parameter in your vector, you will apply the same procedure as for the gradient_check exercise. You will store each gradient approximation in a vector `gradapprox`. If the check goes as expected, each value in this approximation must match the real gradient values stored in the `grad` vector. 399 | # 400 | # Note that `grad` is calculated using the function `gradients_to_vector`, which uses the gradients outputs of the `backward_propagation_n` function. 401 | # 402 | # 403 | # ### Exercise 4 - gradient_check_n 404 | # 405 | # Implement the function below. 406 | # 407 | # **Instructions**: Here is pseudo-code that will help you implement the gradient check. 408 | # 409 | # For each i in num_parameters: 410 | # - To compute `J_plus[i]`: 411 | # 1. Set $\theta^{+}$ to `np.copy(parameters_values)` 412 | # 2. Set $\theta^{+}_i$ to $\theta^{+}_i + \varepsilon$ 413 | # 3. Calculate $J^{+}_i$ using to `forward_propagation_n(x, y, vector_to_dictionary(`$\theta^{+}$ `))`. 414 | # - To compute `J_minus[i]`: do the same thing with $\theta^{-}$ 415 | # - Compute $gradapprox[i] = \frac{J^{+}_i - J^{-}_i}{2 \varepsilon}$ 416 | # 417 | # Thus, you get a vector gradapprox, where gradapprox[i] is an approximation of the gradient with respect to `parameter_values[i]`. You can now compare this gradapprox vector to the gradients vector from backpropagation. Just like for the 1D case (Steps 1', 2', 3'), compute: 418 | # $$ difference = \frac {\| grad - gradapprox \|_2}{\| grad \|_2 + \| gradapprox \|_2 } \tag{3}$$ 419 | # 420 | # **Note**: Use `np.linalg.norm` to get the norms 421 | 422 | # In[15]: 423 | 424 | 425 | # GRADED FUNCTION: gradient_check_n 426 | 427 | def gradient_check_n(parameters, gradients, X, Y, epsilon=1e-7, print_msg=False): 428 | """ 429 | Checks if backward_propagation_n computes correctly the gradient of the cost output by forward_propagation_n 430 | 431 | Arguments: 432 | parameters -- python dictionary containing your parameters "W1", "b1", "W2", "b2", "W3", "b3" 433 | grad -- output of backward_propagation_n, contains gradients of the cost with respect to the parameters 434 | X -- input datapoint, of shape (input size, number of examples) 435 | Y -- true "label" 436 | epsilon -- tiny shift to the input to compute approximated gradient with formula(1) 437 | 438 | Returns: 439 | difference -- difference (2) between the approximated gradient and the backward propagation gradient 440 | """ 441 | 442 | # Set-up variables 443 | parameters_values, _ = dictionary_to_vector(parameters) 444 | 445 | grad = gradients_to_vector(gradients) 446 | num_parameters = parameters_values.shape[0] 447 | J_plus = np.zeros((num_parameters, 1)) 448 | J_minus = np.zeros((num_parameters, 1)) 449 | gradapprox = np.zeros((num_parameters, 1)) 450 | 451 | # Compute gradapprox 452 | for i in range(num_parameters): 453 | 454 | # Compute J_plus[i]. Inputs: "parameters_values, epsilon". Output = "J_plus[i]". 455 | # "_" is used because the function you have outputs two parameters but we only care about the first one 456 | #(approx. 
3 lines) 457 | # theta_plus = # Step 1 458 | # theta_plus[i] = # Step 2 459 | # J_plus[i], _ = # Step 3 460 | # YOUR CODE STARTS HERE 461 | thetaplus = np.copy(parameters_values) # Step 1 462 | thetaplus[i][0] = thetaplus[i][0] + epsilon # Step 2 463 | J_plus[i], _ = forward_propagation_n(X, Y, vector_to_dictionary(thetaplus)) # Step 3 464 | 465 | # YOUR CODE ENDS HERE 466 | 467 | # Compute J_minus[i]. Inputs: "parameters_values, epsilon". Output = "J_minus[i]". 468 | #(approx. 3 lines) 469 | # theta_minus = # Step 1 470 | # theta_minus[i] = # Step 2 471 | # J_minus[i], _ = # Step 3 472 | # YOUR CODE STARTS HERE 473 | 474 | thetaminus = np.copy(parameters_values) # Step 1 475 | thetaminus[i][0] = thetaminus[i][0] - epsilon # Step 2 476 | J_minus[i], _ = forward_propagation_n(X, Y, vector_to_dictionary(thetaminus)) # Step 3 477 | # YOUR CODE ENDS HERE 478 | 479 | # Compute gradapprox[i] 480 | # (approx. 1 line) 481 | # gradapprox[i] = 482 | # YOUR CODE STARTS HERE 483 | 484 | gradapprox[i] = (J_plus[i] - J_minus[i])/(2*epsilon) 485 | # YOUR CODE ENDS HERE 486 | 487 | # Compare gradapprox to backward propagation gradients by computing difference. 488 | # (approx. 3 line) 489 | # numerator = # Step 1' 490 | # denominator = # Step 2' 491 | # difference = # Step 3' 492 | # YOUR CODE STARTS HERE 493 | 494 | numerator = np.linalg.norm(grad - gradapprox) # Step 1' 495 | denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox) # Step 2' 496 | difference = numerator/denominator # Step 3' 497 | # YOUR CODE ENDS HERE 498 | if print_msg: 499 | if difference > 2e-7: 500 | print ("\033[93m" + "There is a mistake in the backward propagation! difference = " + str(difference) + "\033[0m") 501 | else: 502 | print ("\033[92m" + "Your backward propagation works perfectly fine! difference = " + str(difference) + "\033[0m") 503 | 504 | return difference 505 | 506 | 507 | # In[16]: 508 | 509 | 510 | X, Y, parameters = gradient_check_n_test_case() 511 | 512 | cost, cache = forward_propagation_n(X, Y, parameters) 513 | gradients = backward_propagation_n(X, Y, cache) 514 | difference = gradient_check_n(parameters, gradients, X, Y, 1e-7, True) 515 | expected_values = [0.2850931567761623, 1.1890913024229996e-07] 516 | assert not(type(difference) == np.ndarray), "You are not using np.linalg.norm for numerator or denominator" 517 | assert np.any(np.isclose(difference, expected_values)), "Wrong value. It is not one of the expected values" 518 | 519 | 520 | # **Expected output**: 521 | # 522 | # 523 | # 524 | # 525 | # 526 | # 527 | #
# There is a mistake in the backward propagation! difference = 0.2850931567761623
528 | 529 | # It seems that there were errors in the `backward_propagation_n` code! Good thing you've implemented the gradient check. Go back to `backward_propagation` and try to find/correct the errors *(Hint: check dW2 and db1)*. Rerun the gradient check when you think you've fixed it. Remember, you'll need to re-execute the cell defining `backward_propagation_n()` if you modify the code. 530 | # 531 | # Can you get gradient check to declare your derivative computation correct? Even though this part of the assignment isn't graded, you should try to find the bug and re-run gradient check until you're convinced backprop is now correctly implemented. 532 | # 533 | # **Notes** 534 | # - Gradient Checking is slow! Approximating the gradient with $\frac{\partial J}{\partial \theta} \approx \frac{J(\theta + \varepsilon) - J(\theta - \varepsilon)}{2 \varepsilon}$ is computationally costly. For this reason, we don't run gradient checking at every iteration during training. Just a few times to check if the gradient is correct. 535 | # - Gradient Checking, at least as we've presented it, doesn't work with dropout. You would usually run the gradient check algorithm without dropout to make sure your backprop is correct, then add dropout. 536 | # 537 | # Congrats! Now you can be confident that your deep learning model for fraud detection is working correctly! You can even use this to convince your CEO. :) 538 | #
539 | # 540 | # 541 | # **What you should remember from this notebook**: 542 | # - Gradient checking verifies closeness between the gradients from backpropagation and the numerical approximation of the gradient (computed using forward propagation). 543 | # - Gradient checking is slow, so you don't want to run it in every iteration of training. You would usually run it only to make sure your code is correct, then turn it off and use backprop for the actual learning process. 544 | -------------------------------------------------------------------------------- /Course 2-Improving Deep Neural Networks/Week 1/Initialization.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding: utf-8 3 | 4 | # # Initialization 5 | # 6 | # Welcome to the first assignment of Improving Deep Neural Networks! 7 | # 8 | # Training your neural network requires specifying an initial value of the weights. A well-chosen initialization method helps the learning process. 9 | # 10 | # If you completed the previous course of this specialization, you probably followed the instructions for weight initialization, and seen that it's worked pretty well so far. But how do you choose the initialization for a new neural network? In this notebook, you'll try out a few different initializations, including random, zeros, and He initialization, and see how each leads to different results. 11 | # 12 | # A well-chosen initialization can: 13 | # - Speed up the convergence of gradient descent 14 | # - Increase the odds of gradient descent converging to a lower training (and generalization) error 15 | # 16 | # Let's get started! 17 | # 18 | # ## Important Note on Submission to the AutoGrader 19 | # 20 | # Before submitting your assignment to the AutoGrader, please make sure you are not doing the following: 21 | # 22 | # 1. You have not added any _extra_ `print` statement(s) in the assignment. 23 | # 2. You have not added any _extra_ code cell(s) in the assignment. 24 | # 3. You have not changed any of the function parameters. 25 | # 4. You are not using any global variables inside your graded exercises. Unless specifically instructed to do so, please refrain from it and use the local variables instead. 26 | # 5. You are not changing the assignment code where it is not required, like creating _extra_ variables. 27 | # 28 | # If you do any of the following, you will get something like, `Grader Error: Grader feedback not found` (or similarly unexpected) error upon submitting your assignment. Before asking for help/debugging the errors in your assignment, check for these first. If this is the case, and you don't remember the changes you have made, you can get a fresh copy of the assignment by following these [instructions](https://www.coursera.org/learn/deep-neural-network/supplement/QWEnZ/h-ow-to-refresh-your-workspace). 
29 | 30 | # ## Table of Contents 31 | # - [1 - Packages](#1) 32 | # - [2 - Loading the Dataset](#2) 33 | # - [3 - Neural Network Model](#3) 34 | # - [4 - Zero Initialization](#4) 35 | # - [Exercise 1 - initialize_parameters_zeros](#ex-1) 36 | # - [5 - Random Initialization](#5) 37 | # - [Exercise 2 - initialize_parameters_random](#ex-2) 38 | # - [6 - He Initialization](#6) 39 | # - [Exercise 3 - initialize_parameters_he](#ex-3) 40 | # - [7 - Conclusions](#7) 41 | 42 | # 43 | # ## 1 - Packages 44 | 45 | # In[1]: 46 | 47 | 48 | import numpy as np 49 | import matplotlib.pyplot as plt 50 | import sklearn 51 | import sklearn.datasets 52 | from public_tests import * 53 | from init_utils import sigmoid, relu, compute_loss, forward_propagation, backward_propagation 54 | from init_utils import update_parameters, predict, load_dataset, plot_decision_boundary, predict_dec 55 | 56 | get_ipython().run_line_magic('matplotlib', 'inline') 57 | plt.rcParams['figure.figsize'] = (7.0, 4.0) # set default size of plots 58 | plt.rcParams['image.interpolation'] = 'nearest' 59 | plt.rcParams['image.cmap'] = 'gray' 60 | 61 | get_ipython().run_line_magic('load_ext', 'autoreload') 62 | get_ipython().run_line_magic('autoreload', '2') 63 | 64 | # load image dataset: blue/red dots in circles 65 | # train_X, train_Y, test_X, test_Y = load_dataset() 66 | 67 | 68 | # 69 | # ## 2 - Loading the Dataset 70 | 71 | # In[2]: 72 | 73 | 74 | train_X, train_Y, test_X, test_Y = load_dataset() 75 | 76 | 77 | # For this classifier, you want to separate the blue dots from the red dots. 78 | 79 | # 80 | # ## 3 - Neural Network Model 81 | 82 | # You'll use a 3-layer neural network (already implemented for you). These are the initialization methods you'll experiment with: 83 | # - *Zeros initialization* -- setting `initialization = "zeros"` in the input argument. 84 | # - *Random initialization* -- setting `initialization = "random"` in the input argument. This initializes the weights to large random values. 85 | # - *He initialization* -- setting `initialization = "he"` in the input argument. This initializes the weights to random values scaled according to a paper by He et al., 2015. 86 | # 87 | # **Instructions**: Instructions: Read over the code below, and run it. In the next part, you'll implement the three initialization methods that this `model()` calls. 88 | 89 | # In[3]: 90 | 91 | 92 | def model(X, Y, learning_rate = 0.01, num_iterations = 15000, print_cost = True, initialization = "he"): 93 | """ 94 | Implements a three-layer neural network: LINEAR->RELU->LINEAR->RELU->LINEAR->SIGMOID. 95 | 96 | Arguments: 97 | X -- input data, of shape (2, number of examples) 98 | Y -- true "label" vector (containing 0 for red dots; 1 for blue dots), of shape (1, number of examples) 99 | learning_rate -- learning rate for gradient descent 100 | num_iterations -- number of iterations to run gradient descent 101 | print_cost -- if True, print the cost every 1000 iterations 102 | initialization -- flag to choose which initialization to use ("zeros","random" or "he") 103 | 104 | Returns: 105 | parameters -- parameters learnt by the model 106 | """ 107 | 108 | grads = {} 109 | costs = [] # to keep track of the loss 110 | m = X.shape[1] # number of examples 111 | layers_dims = [X.shape[0], 10, 5, 1] 112 | 113 | # Initialize parameters dictionary. 
114 | if initialization == "zeros": 115 | parameters = initialize_parameters_zeros(layers_dims) 116 | elif initialization == "random": 117 | parameters = initialize_parameters_random(layers_dims) 118 | elif initialization == "he": 119 | parameters = initialize_parameters_he(layers_dims) 120 | 121 | # Loop (gradient descent) 122 | 123 | for i in range(num_iterations): 124 | 125 | # Forward propagation: LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID. 126 | a3, cache = forward_propagation(X, parameters) 127 | 128 | # Loss 129 | cost = compute_loss(a3, Y) 130 | 131 | # Backward propagation. 132 | grads = backward_propagation(X, Y, cache) 133 | 134 | # Update parameters. 135 | parameters = update_parameters(parameters, grads, learning_rate) 136 | 137 | # Print the loss every 1000 iterations 138 | if print_cost and i % 1000 == 0: 139 | print("Cost after iteration {}: {}".format(i, cost)) 140 | costs.append(cost) 141 | 142 | # plot the loss 143 | plt.plot(costs) 144 | plt.ylabel('cost') 145 | plt.xlabel('iterations (per hundreds)') 146 | plt.title("Learning rate =" + str(learning_rate)) 147 | plt.show() 148 | 149 | return parameters 150 | 151 | 152 | # 153 | # ## 4 - Zero Initialization 154 | # 155 | # There are two types of parameters to initialize in a neural network: 156 | # - the weight matrices $(W^{[1]}, W^{[2]}, W^{[3]}, ..., W^{[L-1]}, W^{[L]})$ 157 | # - the bias vectors $(b^{[1]}, b^{[2]}, b^{[3]}, ..., b^{[L-1]}, b^{[L]})$ 158 | # 159 | # 160 | # ### Exercise 1 - initialize_parameters_zeros 161 | # 162 | # Implement the following function to initialize all parameters to zeros. You'll see later that this does not work well since it fails to "break symmetry," but try it anyway and see what happens. Use `np.zeros((..,..))` with the correct shapes. 163 | 164 | # In[4]: 165 | 166 | 167 | # GRADED FUNCTION: initialize_parameters_zeros 168 | 169 | def initialize_parameters_zeros(layers_dims): 170 | """ 171 | Arguments: 172 | layer_dims -- python array (list) containing the size of each layer. 173 | 174 | Returns: 175 | parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL": 176 | W1 -- weight matrix of shape (layers_dims[1], layers_dims[0]) 177 | b1 -- bias vector of shape (layers_dims[1], 1) 178 | ... 179 | WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1]) 180 | bL -- bias vector of shape (layers_dims[L], 1) 181 | """ 182 | 183 | parameters = {} 184 | L = len(layers_dims) # number of layers in the network 185 | 186 | for l in range(1, L): 187 | #(≈ 2 lines of code) 188 | # parameters['W' + str(l)] = 189 | # parameters['b' + str(l)] = 190 | # YOUR CODE STARTS HERE 191 | parameters['W' + str(l)] = np.zeros((layers_dims[l], layers_dims[l-1])) 192 | parameters['b' + str(l)] = np.zeros((layers_dims[l], 1)) 193 | 194 | # YOUR CODE ENDS HERE 195 | return parameters 196 | 197 | 198 | # In[5]: 199 | 200 | 201 | parameters = initialize_parameters_zeros([3, 2, 1]) 202 | print("W1 = " + str(parameters["W1"])) 203 | print("b1 = " + str(parameters["b1"])) 204 | print("W2 = " + str(parameters["W2"])) 205 | print("b2 = " + str(parameters["b2"])) 206 | initialize_parameters_zeros_test(initialize_parameters_zeros) 207 | 208 | 209 | # Run the following code to train your model on 15,000 iterations using zeros initialization. 
210 | 211 | # In[6]: 212 | 213 | 214 | parameters = model(train_X, train_Y, initialization = "zeros") 215 | print ("On the train set:") 216 | predictions_train = predict(train_X, train_Y, parameters) 217 | print ("On the test set:") 218 | predictions_test = predict(test_X, test_Y, parameters) 219 | 220 | 221 | # The performance is terrible, the cost doesn't decrease, and the algorithm performs no better than random guessing. Why? Take a look at the details of the predictions and the decision boundary: 222 | 223 | # In[7]: 224 | 225 | 226 | print ("predictions_train = " + str(predictions_train)) 227 | print ("predictions_test = " + str(predictions_test)) 228 | 229 | 230 | # In[8]: 231 | 232 | 233 | plt.title("Model with Zeros initialization") 234 | axes = plt.gca() 235 | axes.set_xlim([-1.5,1.5]) 236 | axes.set_ylim([-1.5,1.5]) 237 | plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y) 238 | 239 | 240 | # __Note__: For sake of simplicity calculations below are done using only one example at a time. 241 | # 242 | # Since the weights and biases are zero, multiplying by the weights creates the zero vector which gives 0 when the activation function is ReLU. As `z = 0` 243 | # 244 | # $$a = ReLU(z) = max(0, z) = 0$$ 245 | # 246 | # At the classification layer, where the activation function is sigmoid you then get (for either input): 247 | # 248 | # $$\sigma(z) = \frac{1}{ 1 + e^{-(z)}} = \frac{1}{2} = y_{pred}$$ 249 | # 250 | # As for every example you are getting a 0.5 chance of it being true our cost function becomes helpless in adjusting the weights. 251 | # 252 | # Your loss function: 253 | # $$ \mathcal{L}(a, y) = - y \ln(y_{pred}) - (1-y) \ln(1-y_{pred})$$ 254 | # 255 | # For `y=1`, `y_pred=0.5` it becomes: 256 | # 257 | # $$ \mathcal{L}(0, 1) = - (1) \ln(\frac{1}{2}) = 0.6931471805599453$$ 258 | # 259 | # For `y=0`, `y_pred=0.5` it becomes: 260 | # 261 | # $$ \mathcal{L}(0, 0) = - (1) \ln(\frac{1}{2}) = 0.6931471805599453$$ 262 | # 263 | # As you can see with the prediction being 0.5 whether the actual (`y`) value is 1 or 0 you get the same loss value for both, so none of the weights get adjusted and you are stuck with the same old value of the weights. 264 | # 265 | # This is why you can see that the model is predicting 0 for every example! No wonder it's doing so badly. 266 | # 267 | # In general, initializing all the weights to zero results in the network failing to break symmetry. This means that every neuron in each layer will learn the same thing, so you might as well be training a neural network with $n^{[l]}=1$ for every layer. This way, the network is no more powerful than a linear classifier like logistic regression. 268 | 269 | # 270 | # 271 | # **What you should remember**: 272 | # - The weights $W^{[l]}$ should be initialized randomly to break symmetry. 273 | # - However, it's okay to initialize the biases $b^{[l]}$ to zeros. Symmetry is still broken so long as $W^{[l]}$ is initialized randomly. 274 | # 275 | 276 | # 277 | # ## 5 - Random Initialization 278 | # 279 | # To break symmetry, initialize the weights randomly. Following random initialization, each neuron can then proceed to learn a different function of its inputs. In this exercise, you'll see what happens when the weights are initialized randomly, but to very large values. 280 | # 281 | # 282 | # ### Exercise 2 - initialize_parameters_random 283 | # 284 | # Implement the following function to initialize your weights to large random values (scaled by \*10) and your biases to zeros. 
Use `np.random.randn(..,..) * 10` for weights and `np.zeros((.., ..))` for biases. You're using a fixed `np.random.seed(..)` to make sure your "random" weights match ours, so don't worry if running your code several times always gives you the same initial values for the parameters. 285 | 286 | # In[9]: 287 | 288 | 289 | # GRADED FUNCTION: initialize_parameters_random 290 | 291 | def initialize_parameters_random(layers_dims): 292 | """ 293 | Arguments: 294 | layer_dims -- python array (list) containing the size of each layer. 295 | 296 | Returns: 297 | parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL": 298 | W1 -- weight matrix of shape (layers_dims[1], layers_dims[0]) 299 | b1 -- bias vector of shape (layers_dims[1], 1) 300 | ... 301 | WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1]) 302 | bL -- bias vector of shape (layers_dims[L], 1) 303 | """ 304 | 305 | np.random.seed(3) # This seed makes sure your "random" numbers will be the as ours 306 | parameters = {} 307 | L = len(layers_dims) # integer representing the number of layers 308 | 309 | for l in range(1, L): 310 | #(≈ 2 lines of code) 311 | # parameters['W' + str(l)] = 312 | # parameters['b' + str(l)] = 313 | # YOUR CODE STARTS HERE 314 | 315 | parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1])*10 316 | parameters['b' + str(l)] = np.zeros((layers_dims[l], 1)) 317 | # YOUR CODE ENDS HERE 318 | 319 | return parameters 320 | 321 | 322 | # In[10]: 323 | 324 | 325 | parameters = initialize_parameters_random([3, 2, 1]) 326 | print("W1 = " + str(parameters["W1"])) 327 | print("b1 = " + str(parameters["b1"])) 328 | print("W2 = " + str(parameters["W2"])) 329 | print("b2 = " + str(parameters["b2"])) 330 | initialize_parameters_random_test(initialize_parameters_random) 331 | 332 | 333 | # Run the following code to train your model on 15,000 iterations using random initialization. 334 | 335 | # In[11]: 336 | 337 | 338 | parameters = model(train_X, train_Y, initialization = "random") 339 | print ("On the train set:") 340 | predictions_train = predict(train_X, train_Y, parameters) 341 | print ("On the test set:") 342 | predictions_test = predict(test_X, test_Y, parameters) 343 | 344 | 345 | # If you see "inf" as the cost after the iteration 0, this is because of numerical roundoff. A more numerically sophisticated implementation would fix this, but for the purposes of this notebook, it isn't really worth worrying about. 346 | # 347 | # In any case, you've now broken the symmetry, and this gives noticeably better accuracy than before. The model is no longer outputting all 0s. Progress! 348 | 349 | # In[12]: 350 | 351 | 352 | print (predictions_train) 353 | print (predictions_test) 354 | 355 | 356 | # In[13]: 357 | 358 | 359 | plt.title("Model with large random initialization") 360 | axes = plt.gca() 361 | axes.set_xlim([-1.5,1.5]) 362 | axes.set_ylim([-1.5,1.5]) 363 | plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y) 364 | 365 | 366 | # **Observations**: 367 | # - The cost starts very high. This is because with large random-valued weights, the last activation (sigmoid) outputs results that are very close to 0 or 1 for some examples, and when it gets that example wrong it incurs a very high loss for that example. Indeed, when $\log(a^{[3]}) = \log(0)$, the loss goes to infinity. 368 | # - Poor initialization can lead to vanishing/exploding gradients, which also slows down the optimization algorithm. 
369 | # - If you train this network longer you will see better results, but initializing with overly large random numbers slows down the optimization. 370 | # 371 | # 372 | # 373 | # **In summary**: 374 | # - Initializing weights to very large random values doesn't work well. 375 | # - Initializing with small random values should do better. The important question is, how small should be these random values be? Let's find out up next! 376 | # 377 | # 378 | # 379 | # **Optional Read:** 380 | # 381 | # 382 | # The main difference between Gaussian variable (`numpy.random.randn()`) and uniform random variable is the distribution of the generated random numbers: 383 | # 384 | # - numpy.random.rand() produces numbers in a [uniform distribution](https://raw.githubusercontent.com/jahnog/deeplearning-notes/master/Course2/images/rand.jpg). 385 | # - and numpy.random.randn() produces numbers in a [normal distribution](https://raw.githubusercontent.com/jahnog/deeplearning-notes/master/Course2/images/randn.jpg). 386 | # 387 | # When used for weight initialization, randn() helps most the weights to Avoid being close to the extremes, allocating most of them in the center of the range. 388 | # 389 | # An intuitive way to see it is, for example, if you take the [sigmoid() activation function](https://raw.githubusercontent.com/jahnog/deeplearning-notes/master/Course2/images/sigmoid.jpg). 390 | # 391 | # You’ll remember that the slope near 0 or near 1 is extremely small, so the weights near those extremes will converge much more slowly to the solution, and having most of them near the center will speed the convergence. 392 | 393 | # 394 | # ## 6 - He Initialization 395 | # 396 | # Finally, try "He Initialization"; this is named for the first author of He et al., 2015. (If you have heard of "Xavier initialization", this is similar except Xavier initialization uses a scaling factor for the weights $W^{[l]}$ of `sqrt(1./layers_dims[l-1])` where He initialization would use `sqrt(2./layers_dims[l-1])`.) 397 | # 398 | # 399 | # ### Exercise 3 - initialize_parameters_he 400 | # 401 | # Implement the following function to initialize your parameters with He initialization. This function is similar to the previous `initialize_parameters_random(...)`. The only difference is that instead of multiplying `np.random.randn(..,..)` by 10, you will multiply it by $\sqrt{\frac{2}{\text{dimension of the previous layer}}}$, which is what He initialization recommends for layers with a ReLU activation. 402 | 403 | # In[16]: 404 | 405 | 406 | # GRADED FUNCTION: initialize_parameters_he 407 | 408 | def initialize_parameters_he(layers_dims): 409 | """ 410 | Arguments: 411 | layer_dims -- python array (list) containing the size of each layer. 412 | 413 | Returns: 414 | parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL": 415 | W1 -- weight matrix of shape (layers_dims[1], layers_dims[0]) 416 | b1 -- bias vector of shape (layers_dims[1], 1) 417 | ... 
418 | WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1]) 419 | bL -- bias vector of shape (layers_dims[L], 1) 420 | """ 421 | 422 | np.random.seed(3) 423 | parameters = {} 424 | L = len(layers_dims) - 1 # integer representing the number of layers 425 | import math 426 | for l in range(1, L + 1): 427 | ### START CODE HERE ### (≈ 2 lines of code) 428 | parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1])*math.sqrt(2./layers_dims[l-1]) 429 | parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))*math.sqrt(2./layers_dims[l-1]) 430 | ### END CODE HERE ### 431 | 432 | return parameters 433 | 434 | 435 | # In[17]: 436 | 437 | 438 | parameters = initialize_parameters_he([2, 4, 1]) 439 | print("W1 = " + str(parameters["W1"])) 440 | print("b1 = " + str(parameters["b1"])) 441 | print("W2 = " + str(parameters["W2"])) 442 | print("b2 = " + str(parameters["b2"])) 443 | 444 | initialize_parameters_he_test(initialize_parameters_he) 445 | # parameters 446 | 447 | 448 | # **Expected output** 449 | # 450 | # ``` 451 | # W1 = [[ 1.78862847 0.43650985] 452 | # [ 0.09649747 -1.8634927 ] 453 | # [-0.2773882 -0.35475898] 454 | # [-0.08274148 -0.62700068]] 455 | # b1 = [[0.] [0.] [0.] [0.]] 456 | # W2 = [[-0.03098412 -0.33744411 -0.92904268 0.62552248]] 457 | # b2 = [[0.]] 458 | # ``` 459 | 460 | # Run the following code to train your model on 15,000 iterations using He initialization. 461 | 462 | # In[18]: 463 | 464 | 465 | parameters = model(train_X, train_Y, initialization = "he") 466 | print ("On the train set:") 467 | predictions_train = predict(train_X, train_Y, parameters) 468 | print ("On the test set:") 469 | predictions_test = predict(test_X, test_Y, parameters) 470 | 471 | 472 | # In[19]: 473 | 474 | 475 | plt.title("Model with He initialization") 476 | axes = plt.gca() 477 | axes.set_xlim([-1.5,1.5]) 478 | axes.set_ylim([-1.5,1.5]) 479 | plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y) 480 | 481 | 482 | # **Observations**: 483 | # - The model with He initialization separates the blue and the red dots very well in a small number of iterations. 484 | # 485 | 486 | # 487 | # ## 7 - Conclusions 488 | 489 | # You've tried three different types of initializations. For the same number of iterations and same hyperparameters, the comparison is: 490 | # 491 | # 492 | # 493 | # 496 | # 499 | # 502 | # 503 | # 506 | # 509 | # 512 | # 513 | # 516 | # 519 | # 522 | # 523 | # 524 | # 527 | # 530 | # 533 | # 534 | #
# | Model                                       | Train accuracy | Problem/Comment         |
# | ------------------------------------------- | -------------- | ----------------------- |
# | 3-layer NN with zeros initialization        | 50%            | fails to break symmetry |
# | 3-layer NN with large random initialization | 83%            | too large weights       |
# | 3-layer NN with He initialization           | 99%            | recommended method      |
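# 
# The scale differences behind this table are easy to see directly. Here is an ungraded sketch comparing the spread of weights drawn with the three schemes discussed above, for a layer with 10 inputs (the `fan_in` name is only used in this example):
# 
# ```python
# fan_in = 10
# np.random.seed(3)
# w_large  = np.random.randn(5, fan_in) * 10                    # "random" (*10) initialization
# w_xavier = np.random.randn(5, fan_in) * np.sqrt(1. / fan_in)  # Xavier scaling
# w_he     = np.random.randn(5, fan_in) * np.sqrt(2. / fan_in)  # He scaling
# print(np.std(w_large), np.std(w_xavier), np.std(w_he))        # roughly 10 vs 0.32 vs 0.45
# ```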
535 | 536 | # **Congratulations**! You've completed this notebook on Initialization. 537 | # 538 | # Here's a quick recap of the main takeaways: 539 | # 540 | # 541 | # 542 | # - Different initializations lead to very different results 543 | # - Random initialization is used to break symmetry and make sure different hidden units can learn different things 544 | # - Resist initializing to values that are too large! 545 | # - He initialization works well for networks with ReLU activations 546 | -------------------------------------------------------------------------------- /Course 2-Improving Deep Neural Networks/Week 1/Week 1 Practical aspects of Deep Learning.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 2-Improving Deep Neural Networks/Week 1/Week 1 Practical aspects of Deep Learning.pdf -------------------------------------------------------------------------------- /Course 2-Improving Deep Neural Networks/Week 2/Week 2 Optimization Algorithms.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 2-Improving Deep Neural Networks/Week 2/Week 2 Optimization Algorithms.pdf -------------------------------------------------------------------------------- /Course 2-Improving Deep Neural Networks/Week 3/Week 3 Hyperparameter tuning, Batch Normalization, Programming Frameworks.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 2-Improving Deep Neural Networks/Week 3/Week 3 Hyperparameter tuning, Batch Normalization, Programming Frameworks.pdf -------------------------------------------------------------------------------- /Course 3-Structuring MachineLearningProjects/Week 1 Bird Recognition in the City of Peacetopia Case Study.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 3-Structuring MachineLearningProjects/Week 1 Bird Recognition in the City of Peacetopia Case Study.pdf -------------------------------------------------------------------------------- /Course 3-Structuring MachineLearningProjects/Week 2 Autonomous Driving Case Study.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 3-Structuring MachineLearningProjects/Week 2 Autonomous Driving Case Study.pdf -------------------------------------------------------------------------------- /Course 4-ConvolutionalNeuralNetworks/Week 1/Convolution_model_Application.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding: utf-8 3 | 4 | # # Convolutional Neural Networks: Application 5 | # 6 | # Welcome to Course 4's second assignment! 
In this notebook, you will: 7 | # 8 | # - Create a mood classifer using the TF Keras Sequential API 9 | # - Build a ConvNet to identify sign language digits using the TF Keras Functional API 10 | # 11 | # **After this assignment you will be able to:** 12 | # 13 | # - Build and train a ConvNet in TensorFlow for a __binary__ classification problem 14 | # - Build and train a ConvNet in TensorFlow for a __multiclass__ classification problem 15 | # - Explain different use cases for the Sequential and Functional APIs 16 | # 17 | # To complete this assignment, you should already be familiar with TensorFlow. If you are not, please refer back to the **TensorFlow Tutorial** of the third week of Course 2 ("**Improving deep neural networks**"). 18 | # 19 | # ## Important Note on Submission to the AutoGrader 20 | # 21 | # Before submitting your assignment to the AutoGrader, please make sure you are not doing the following: 22 | # 23 | # 1. You have not added any _extra_ `print` statement(s) in the assignment. 24 | # 2. You have not added any _extra_ code cell(s) in the assignment. 25 | # 3. You have not changed any of the function parameters. 26 | # 4. You are not using any global variables inside your graded exercises. Unless specifically instructed to do so, please refrain from it and use the local variables instead. 27 | # 5. You are not changing the assignment code where it is not required, like creating _extra_ variables. 28 | # 29 | # If you do any of the following, you will get something like, `Grader Error: Grader feedback not found` (or similarly unexpected) error upon submitting your assignment. Before asking for help/debugging the errors in your assignment, check for these first. If this is the case, and you don't remember the changes you have made, you can get a fresh copy of the assignment by following these [instructions](https://www.coursera.org/learn/convolutional-neural-networks/supplement/DS4yP/h-ow-to-refresh-your-workspace). 30 | 31 | # ## Table of Contents 32 | # 33 | # - [1 - Packages](#1) 34 | # - [1.1 - Load the Data and Split the Data into Train/Test Sets](#1-1) 35 | # - [2 - Layers in TF Keras](#2) 36 | # - [3 - The Sequential API](#3) 37 | # - [3.1 - Create the Sequential Model](#3-1) 38 | # - [Exercise 1 - happyModel](#ex-1) 39 | # - [3.2 - Train and Evaluate the Model](#3-2) 40 | # - [4 - The Functional API](#4) 41 | # - [4.1 - Load the SIGNS Dataset](#4-1) 42 | # - [4.2 - Split the Data into Train/Test Sets](#4-2) 43 | # - [4.3 - Forward Propagation](#4-3) 44 | # - [Exercise 2 - convolutional_model](#ex-2) 45 | # - [4.4 - Train the Model](#4-4) 46 | # - [5 - History Object](#5) 47 | # - [6 - Bibliography](#6) 48 | 49 | # 50 | # ## 1 - Packages 51 | # 52 | # As usual, begin by loading in the packages. 53 | 54 | # In[1]: 55 | 56 | 57 | import math 58 | import numpy as np 59 | import h5py 60 | import matplotlib.pyplot as plt 61 | from matplotlib.pyplot import imread 62 | import scipy 63 | from PIL import Image 64 | import pandas as pd 65 | import tensorflow as tf 66 | import tensorflow.keras.layers as tfl 67 | from tensorflow.python.framework import ops 68 | from cnn_utils import * 69 | from test_utils import summary, comparator 70 | 71 | get_ipython().run_line_magic('matplotlib', 'inline') 72 | np.random.seed(1) 73 | 74 | 75 | # 76 | # ### 1.1 - Load the Data and Split the Data into Train/Test Sets 77 | # 78 | # You'll be using the Happy House dataset for this part of the assignment, which contains images of peoples' faces. 
Your task will be to build a ConvNet that determines whether the people in the images are smiling or not -- because they only get to enter the house if they're smiling! 79 | 80 | # In[2]: 81 | 82 | 83 | X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_happy_dataset() 84 | 85 | # Normalize image vectors 86 | X_train = X_train_orig/255. 87 | X_test = X_test_orig/255. 88 | 89 | # Reshape 90 | Y_train = Y_train_orig.T 91 | Y_test = Y_test_orig.T 92 | 93 | print ("number of training examples = " + str(X_train.shape[0])) 94 | print ("number of test examples = " + str(X_test.shape[0])) 95 | print ("X_train shape: " + str(X_train.shape)) 96 | print ("Y_train shape: " + str(Y_train.shape)) 97 | print ("X_test shape: " + str(X_test.shape)) 98 | print ("Y_test shape: " + str(Y_test.shape)) 99 | 100 | 101 | # You can display the images contained in the dataset. Images are **64x64** pixels in RGB format (3 channels). 102 | 103 | # In[7]: 104 | 105 | 106 | index = 124 107 | plt.imshow(X_train_orig[index]) #display sample training image 108 | plt.show() 109 | 110 | 111 | # 112 | # ## 2 - Layers in TF Keras 113 | # 114 | # In the previous assignment, you created layers manually in numpy. In TF Keras, you don't have to write code directly to create layers. Rather, TF Keras has pre-defined layers you can use. 115 | # 116 | # When you create a layer in TF Keras, you are creating a function that takes some input and transforms it into an output you can reuse later. Nice and easy! 117 | 118 | # 119 | # ## 3 - The Sequential API 120 | # 121 | # In the previous assignment, you built helper functions using `numpy` to understand the mechanics behind convolutional neural networks. Most practical applications of deep learning today are built using programming frameworks, which have many built-in functions you can simply call. Keras is a high-level abstraction built on top of TensorFlow, which allows for even more simplified and optimized model creation and training. 122 | # 123 | # For the first part of this assignment, you'll create a model using TF Keras' Sequential API, which allows you to build layer by layer, and is ideal for building models where each layer has **exactly one** input tensor and **one** output tensor. 124 | # 125 | # As you'll see, using the Sequential API is simple and straightforward, but is only appropriate for simpler, more straightforward tasks. Later in this notebook you'll spend some time building with a more flexible, powerful alternative: the Functional API. 126 | # 127 | 128 | # 129 | # ### 3.1 - Create the Sequential Model 130 | # 131 | # As mentioned earlier, the TensorFlow Keras Sequential API can be used to build simple models with layer operations that proceed in a sequential order. 132 | # 133 | # You can also add layers incrementally to a Sequential model with the `.add()` method, or remove them using the `.pop()` method, much like you would in a regular Python list. 134 | # 135 | # Actually, you can think of a Sequential model as behaving like a list of layers. Like Python lists, Sequential layers are ordered, and the order in which they are specified matters. If your model is non-linear or contains layers with multiple inputs or outputs, a Sequential model wouldn't be the right choice! 136 | # 137 | # For any layer construction in Keras, you'll need to specify the input shape in advance. This is because in Keras, the shape of the weights is based on the shape of the inputs. The weights are only created when the model first sees some input data. 
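As a quick, non-graded aside, the list-like `.add()`/`.pop()` behaviour and the lazy weight creation described above can be seen in a few lines (the layer sizes below are arbitrary):

```python
import tensorflow as tf
import tensorflow.keras.layers as tfl

m = tf.keras.Sequential()
m.add(tfl.Dense(4))          # add layers incrementally, like appending to a list
m.add(tfl.Dense(1))
m.pop()                      # remove the last layer again
print(len(m.layers))         # 1
print(m.built)               # False -- no weights exist yet

m(tf.zeros((1, 3)))          # the first input fixes the input shape...
print([w.shape for w in m.weights])  # ...and only now are the weights created: [(3, 4), (4,)]
```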
Sequential models can be created by passing a list of layers to the Sequential constructor, like you will do in the next assignment. 138 | # 139 | # 140 | # ### Exercise 1 - happyModel 141 | # 142 | # Implement the `happyModel` function below to build the following model: `ZEROPAD2D -> CONV2D -> BATCHNORM -> RELU -> MAXPOOL -> FLATTEN -> DENSE`. Take help from [tf.keras.layers](https://www.tensorflow.org/api_docs/python/tf/keras/layers) 143 | # 144 | # Also, plug in the following parameters for all the steps: 145 | # 146 | # - [ZeroPadding2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/ZeroPadding2D): padding 3, input shape 64 x 64 x 3 147 | # - [Conv2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D): Use 32 7x7 filters, stride 1 148 | # - [BatchNormalization](https://www.tensorflow.org/api_docs/python/tf/keras/layers/BatchNormalization): for axis 3 149 | # - [ReLU](https://www.tensorflow.org/api_docs/python/tf/keras/layers/ReLU) 150 | # - [MaxPool2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool2D): Using default parameters 151 | # - [Flatten](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten) the previous output. 152 | # - Fully-connected ([Dense](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense)) layer: Apply a fully connected layer with 1 neuron and a sigmoid activation. 153 | # 154 | # 155 | # **Hint:** 156 | # 157 | # Use **tfl** as shorthand for **tensorflow.keras.layers** 158 | 159 | # In[8]: 160 | 161 | 162 | def happyModel(): 163 | """ 164 | Implements the forward propagation for the binary classification model: 165 | ZEROPAD2D -> CONV2D -> BATCHNORM -> RELU -> MAXPOOL -> FLATTEN -> DENSE 166 | 167 | Note that for simplicity and grading purposes, you'll hard-code all the values 168 | such as the stride and kernel (filter) sizes. 169 | Normally, functions should take these values as function parameters. 
170 | 171 | Arguments: 172 | None 173 | 174 | Returns: 175 | model -- TF Keras model (object containing the information for the entire training process) 176 | """ 177 | model = tf.keras.Sequential([ 178 | ## ZeroPadding2D with padding 3, input shape of 64 x 64 x 3 179 | tfl.ZeroPadding2D(padding=(3, 3), input_shape=(64, 64, 3)), 180 | ## Conv2D with 32 7x7 filters and stride of 1 181 | tfl.Conv2D(32, (7,7)), 182 | ## BatchNormalization for axis 3 183 | tfl.BatchNormalization(axis=-1), 184 | ## ReLU 185 | tfl.ReLU(), 186 | ## Max Pooling 2D with default parameters 187 | tfl.MaxPool2D(), 188 | ## Flatten layer 189 | tfl.Flatten(), 190 | ## Dense layer with 1 unit for output & 'sigmoid' activation 191 | tfl.Dense(1, activation='sigmoid') 192 | ]) 193 | 194 | return model 195 | 196 | 197 | # In[9]: 198 | 199 | 200 | happy_model = happyModel() 201 | # Print a summary for each layer 202 | for layer in summary(happy_model): 203 | print(layer) 204 | 205 | output = [['ZeroPadding2D', (None, 70, 70, 3), 0, ((3, 3), (3, 3))], 206 | ['Conv2D', (None, 64, 64, 32), 4736, 'valid', 'linear', 'GlorotUniform'], 207 | ['BatchNormalization', (None, 64, 64, 32), 128], 208 | ['ReLU', (None, 64, 64, 32), 0], 209 | ['MaxPooling2D', (None, 32, 32, 32), 0, (2, 2), (2, 2), 'valid'], 210 | ['Flatten', (None, 32768), 0], 211 | ['Dense', (None, 1), 32769, 'sigmoid']] 212 | 213 | comparator(summary(happy_model), output) 214 | 215 | 216 | # #### Expected Output: 217 | # 218 | # ``` 219 | # ['ZeroPadding2D', (None, 70, 70, 3), 0, ((3, 3), (3, 3))] 220 | # ['Conv2D', (None, 64, 64, 32), 4736, 'valid', 'linear', 'GlorotUniform'] 221 | # ['BatchNormalization', (None, 64, 64, 32), 128] 222 | # ['ReLU', (None, 64, 64, 32), 0] 223 | # ['MaxPooling2D', (None, 32, 32, 32), 0, (2, 2), (2, 2), 'valid'] 224 | # ['Flatten', (None, 32768), 0] 225 | # ['Dense', (None, 1), 32769, 'sigmoid'] 226 | # All tests passed! 227 | # ``` 228 | 229 | # Now that your model is created, you can compile it for training with an optimizer and loss of your choice. When the string `accuracy` is specified as a metric, the type of accuracy used will be automatically converted based on the loss function used. This is one of the many optimizations built into TensorFlow that make your life easier! If you'd like to read more on how the compiler operates, check the docs [here](https://www.tensorflow.org/api_docs/python/tf/keras/Model#compile). 230 | 231 | # In[10]: 232 | 233 | 234 | happy_model.compile(optimizer='adam', 235 | loss='binary_crossentropy', 236 | metrics=['accuracy']) 237 | 238 | 239 | # It's time to check your model's parameters with the `.summary()` method. This will display the types of layers you have, the shape of the outputs, and how many parameters are in each layer. 240 | 241 | # In[11]: 242 | 243 | 244 | happy_model.summary() 245 | 246 | 247 | # 248 | # ### 3.2 - Train and Evaluate the Model 249 | # 250 | # After creating the model, compiling it with your choice of optimizer and loss function, and doing a sanity check on its contents, you are now ready to build! 251 | # 252 | # Simply call `.fit()` to train. That's it! No need for mini-batching, saving, or complex backpropagation computations. That's all been done for you, as you're using a TensorFlow dataset with the batches specified already. You do have the option to specify epoch number or minibatch size if you like (for example, in the case of an un-batched dataset). 
253 | 254 | # In[12]: 255 | 256 | 257 | happy_model.fit(X_train, Y_train, epochs=10, batch_size=16) 258 | 259 | 260 | # After that completes, just use `.evaluate()` to evaluate against your test set. This function will print the value of the loss function and the performance metrics specified during the compilation of the model. In this case, the `binary_crossentropy` and the `accuracy` respectively. 261 | 262 | # In[13]: 263 | 264 | 265 | happy_model.evaluate(X_test, Y_test) 266 | 267 | 268 | # Easy, right? But what if you need to build a model with shared layers, branches, or multiple inputs and outputs? This is where Sequential, with its beautifully simple yet limited functionality, won't be able to help you. 269 | # 270 | # Next up: Enter the Functional API, your slightly more complex, highly flexible friend. 271 | 272 | # 273 | # ## 4 - The Functional API 274 | 275 | # Welcome to the second half of the assignment, where you'll use Keras' flexible [Functional API](https://www.tensorflow.org/guide/keras/functional) to build a ConvNet that can differentiate between 6 sign language digits. 276 | # 277 | # The Functional API can handle models with non-linear topology, shared layers, as well as layers with multiple inputs or outputs. Imagine that, where the Sequential API requires the model to move in a linear fashion through its layers, the Functional API allows much more flexibility. Where Sequential is a straight line, a Functional model is a graph, where the nodes of the layers can connect in many more ways than one. 278 | # 279 | # In the visual example below, the one possible direction of the movement Sequential model is shown in contrast to a skip connection, which is just one of the many ways a Functional model can be constructed. A skip connection, as you might have guessed, skips some layer in the network and feeds the output to a later layer in the network. Don't worry, you'll be spending more time with skip connections very soon! 280 | 281 | # 282 | 283 | # 284 | # ### 4.1 - Load the SIGNS Dataset 285 | # 286 | # As a reminder, the SIGNS dataset is a collection of 6 signs representing numbers from 0 to 5. 287 | 288 | # In[14]: 289 | 290 | 291 | # Loading the data (signs) 292 | X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_signs_dataset() 293 | 294 | 295 | # 296 | # 297 | # The next cell will show you an example of a labelled image in the dataset. Feel free to change the value of `index` below and re-run to see different examples. 298 | 299 | # In[15]: 300 | 301 | 302 | # Example of an image from the dataset 303 | index = 9 304 | plt.imshow(X_train_orig[index]) 305 | print ("y = " + str(np.squeeze(Y_train_orig[:, index]))) 306 | 307 | 308 | # 309 | # ### 4.2 - Split the Data into Train/Test Sets 310 | # 311 | # In Course 2, you built a fully-connected network for this dataset. But since this is an image dataset, it is more natural to apply a ConvNet to it. 312 | # 313 | # To get started, let's examine the shapes of your data. 314 | 315 | # In[16]: 316 | 317 | 318 | X_train = X_train_orig/255. 319 | X_test = X_test_orig/255. 
320 | Y_train = convert_to_one_hot(Y_train_orig, 6).T 321 | Y_test = convert_to_one_hot(Y_test_orig, 6).T 322 | print ("number of training examples = " + str(X_train.shape[0])) 323 | print ("number of test examples = " + str(X_test.shape[0])) 324 | print ("X_train shape: " + str(X_train.shape)) 325 | print ("Y_train shape: " + str(Y_train.shape)) 326 | print ("X_test shape: " + str(X_test.shape)) 327 | print ("Y_test shape: " + str(Y_test.shape)) 328 | 329 | 330 | # 331 | # ### 4.3 - Forward Propagation 332 | # 333 | # In TensorFlow, there are built-in functions that implement the convolution steps for you. By now, you should be familiar with how TensorFlow builds computational graphs. In the [Functional API](https://www.tensorflow.org/guide/keras/functional), you create a graph of layers. This is what allows such great flexibility. 334 | # 335 | # However, the following model could also be defined using the Sequential API since the information flow is on a single line. But don't deviate. What we want you to learn is to use the functional API. 336 | # 337 | # Begin building your graph of layers by creating an input node that functions as a callable object: 338 | # 339 | # - **input_img = tf.keras.Input(shape=input_shape):** 340 | # 341 | # Then, create a new node in the graph of layers by calling a layer on the `input_img` object: 342 | # 343 | # - **tf.keras.layers.Conv2D(filters= ... , kernel_size= ... , padding='same')(input_img):** Read the full documentation on [Conv2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D). 344 | # 345 | # - **tf.keras.layers.MaxPool2D(pool_size=(f, f), strides=(s, s), padding='same'):** `MaxPool2D()` downsamples your input using a window of size (f, f) and strides of size (s, s) to carry out max pooling over each window. For max pooling, you usually operate on a single example at a time and a single channel at a time. Read the full documentation on [MaxPool2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool2D). 346 | # 347 | # - **tf.keras.layers.ReLU():** computes the elementwise ReLU of Z (which can be any shape). You can read the full documentation on [ReLU](https://www.tensorflow.org/api_docs/python/tf/keras/layers/ReLU). 348 | # 349 | # - **tf.keras.layers.Flatten()**: given a tensor "P", this function takes each training (or test) example in the batch and flattens it into a 1D vector. 350 | # 351 | # * If a tensor P has the shape (batch_size,h,w,c), it returns a flattened tensor with shape (batch_size, k), where $k=h \times w \times c$. "k" equals the product of all the dimension sizes other than the first dimension. 352 | # 353 | # * For example, given a tensor with dimensions [100, 2, 3, 4], it flattens the tensor to be of shape [100, 24], where 24 = 2 * 3 * 4. You can read the full documentation on [Flatten](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten). 354 | # 355 | # - **tf.keras.layers.Dense(units= ... , activation='softmax')(F):** given the flattened input F, it returns the output computed using a fully connected layer. You can read the full documentation on [Dense](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense). 356 | # 357 | # In the last function above (`tf.keras.layers.Dense()`), the fully connected layer automatically initializes weights in the graph and keeps on training them as you train the model. Hence, you did not need to initialize those weights when initializing the parameters. 
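To make this chaining pattern concrete before the graded exercise, here is a small, non-graded sketch with illustrative layer sizes; note how `Flatten` turns a (batch, h, w, c) tensor into (batch, h*w*c):

```python
import tensorflow as tf
import tensorflow.keras.layers as tfl

input_img = tf.keras.Input(shape=(64, 64, 3))                            # input node
x = tfl.Conv2D(filters=8, kernel_size=4, padding='same')(input_img)      # (None, 64, 64, 8)
x = tfl.MaxPool2D(pool_size=(8, 8), strides=(8, 8), padding='same')(x)   # (None, 8, 8, 8)
f = tfl.Flatten()(x)                                                     # (None, 512), since 8*8*8 = 512
outputs = tfl.Dense(units=6, activation='softmax')(f)                    # (None, 6)

toy_model = tf.keras.Model(inputs=input_img, outputs=outputs)
print(f.shape, outputs.shape)
```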
358 | # 359 | # Lastly, before creating the model, you'll need to define the output using the last of the function's compositions (in this example, a Dense layer): 360 | # 361 | # - **outputs = tf.keras.layers.Dense(units=6, activation='softmax')(F)** 362 | # 363 | # 364 | # #### Window, kernel, filter, pool 365 | # 366 | # The words "kernel" and "filter" are used to refer to the same thing. The word "filter" accounts for the amount of "kernels" that will be used in a single convolution layer. "Pool" is the name of the operation that takes the max or average value of the kernels. 367 | # 368 | # This is why the parameter `pool_size` refers to `kernel_size`, and you use `(f,f)` to refer to the filter size. 369 | # 370 | # Pool size and kernel size refer to the same thing in different objects - They refer to the shape of the window where the operation takes place. 371 | 372 | # 373 | # ### Exercise 2 - convolutional_model 374 | # 375 | # Implement the `convolutional_model` function below to build the following model: `CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> DENSE`. Use the functions above! 376 | # 377 | # Also, plug in the following parameters for all the steps: 378 | # 379 | # - [Conv2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D): Use 8 4 by 4 filters, stride 1, padding is "SAME" 380 | # - [ReLU](https://www.tensorflow.org/api_docs/python/tf/keras/layers/ReLU) 381 | # - [MaxPool2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool2D): Use an 8 by 8 filter size and an 8 by 8 stride, padding is "SAME" 382 | # - **Conv2D**: Use 16 2 by 2 filters, stride 1, padding is "SAME" 383 | # - **ReLU** 384 | # - **MaxPool2D**: Use a 4 by 4 filter size and a 4 by 4 stride, padding is "SAME" 385 | # - [Flatten](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten) the previous output. 386 | # - Fully-connected ([Dense](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense)) layer: Apply a fully connected layer with 6 neurons and a softmax activation. 387 | 388 | # In[17]: 389 | 390 | 391 | # GRADED FUNCTION: convolutional_model 392 | 393 | def convolutional_model(input_shape): 394 | """ 395 | Implements the forward propagation for the model: 396 | CONV2D -> RELU -> MAXPOOL -> CONV2D -> RELU -> MAXPOOL -> FLATTEN -> DENSE 397 | 398 | Note that for simplicity and grading purposes, you'll hard-code some values 399 | such as the stride and kernel (filter) sizes. 400 | Normally, functions should take these values as function parameters. 401 | 402 | Arguments: 403 | input_img -- input dataset, of shape (input_shape) 404 | 405 | Returns: 406 | model -- TF Keras model (object containing the information for the entire training process) 407 | """ 408 | 409 | input_img = tf.keras.Input(shape=input_shape) 410 | ## CONV2D: 8 filters 4x4, stride of 1, padding 'SAME' 411 | # Z1 = None 412 | ## RELU 413 | # A1 = None 414 | ## MAXPOOL: window 8x8, stride 8, padding 'SAME' 415 | # P1 = None 416 | ## CONV2D: 16 filters 2x2, stride 1, padding 'SAME' 417 | # Z2 = None 418 | ## RELU 419 | # A2 = None 420 | ## MAXPOOL: window 4x4, stride 4, padding 'SAME' 421 | # P2 = None 422 | ## FLATTEN 423 | # F = None 424 | ## Dense layer 425 | ## 6 neurons in output layer. 
Hint: one of the arguments should be "activation='softmax'" 426 | # outputs = None 427 | # YOUR CODE STARTS HERE 428 | Z1 = tfl.Conv2D(8, 4, activation='linear', padding="same", strides=1)(input_img) 429 | A1 = tfl.ReLU()(Z1) 430 | P1 = tfl.MaxPool2D(pool_size=(8, 8), strides=(8, 8), padding='same')(A1) 431 | Z2 = tfl.Conv2D(16, 2, activation='linear', padding="same", strides=1)(P1) 432 | A2 = tfl.ReLU()(Z2) 433 | P2 = tfl.MaxPool2D(pool_size=(4, 4), strides=(4, 4), padding='same')(A2) 434 | F = tfl.Flatten()(P2) 435 | outputs = tfl.Dense(6, activation='softmax')(F) 436 | 437 | # YOUR CODE ENDS HERE 438 | model = tf.keras.Model(inputs=input_img, outputs=outputs) 439 | return model 440 | 441 | 442 | # In[18]: 443 | 444 | 445 | conv_model = convolutional_model((64, 64, 3)) 446 | conv_model.compile(optimizer='adam', 447 | loss='categorical_crossentropy', 448 | metrics=['accuracy']) 449 | conv_model.summary() 450 | 451 | output = [['InputLayer', [(None, 64, 64, 3)], 0], 452 | ['Conv2D', (None, 64, 64, 8), 392, 'same', 'linear', 'GlorotUniform'], 453 | ['ReLU', (None, 64, 64, 8), 0], 454 | ['MaxPooling2D', (None, 8, 8, 8), 0, (8, 8), (8, 8), 'same'], 455 | ['Conv2D', (None, 8, 8, 16), 528, 'same', 'linear', 'GlorotUniform'], 456 | ['ReLU', (None, 8, 8, 16), 0], 457 | ['MaxPooling2D', (None, 2, 2, 16), 0, (4, 4), (4, 4), 'same'], 458 | ['Flatten', (None, 64), 0], 459 | ['Dense', (None, 6), 390, 'softmax']] 460 | 461 | comparator(summary(conv_model), output) 462 | 463 | 464 | # Both the Sequential and Functional APIs return a TF Keras model object. The only difference is how inputs are handled inside the object model! 465 | 466 | # 467 | # ### 4.4 - Train the Model 468 | 469 | # In[19]: 470 | 471 | 472 | train_dataset = tf.data.Dataset.from_tensor_slices((X_train, Y_train)).batch(64) 473 | test_dataset = tf.data.Dataset.from_tensor_slices((X_test, Y_test)).batch(64) 474 | history = conv_model.fit(train_dataset, epochs=100, validation_data=test_dataset) 475 | 476 | 477 | # 478 | # ## 5 - History Object 479 | # 480 | # The history object is an output of the `.fit()` operation, and provides a record of all the loss and metric values in memory. It's stored as a dictionary that you can retrieve at `history.history`: 481 | 482 | # In[20]: 483 | 484 | 485 | history.history 486 | 487 | 488 | # Now visualize the loss over time using `history.history`: 489 | 490 | # In[21]: 491 | 492 | 493 | # The history.history["loss"] entry is a dictionary with as many values as epochs that the 494 | # model was trained on. 495 | df_loss_acc = pd.DataFrame(history.history) 496 | df_loss= df_loss_acc[['loss','val_loss']] 497 | df_loss.rename(columns={'loss':'train','val_loss':'validation'},inplace=True) 498 | df_acc= df_loss_acc[['accuracy','val_accuracy']] 499 | df_acc.rename(columns={'accuracy':'train','val_accuracy':'validation'},inplace=True) 500 | df_loss.plot(title='Model loss',figsize=(12,8)).set(xlabel='Epoch',ylabel='Loss') 501 | df_acc.plot(title='Model Accuracy',figsize=(12,8)).set(xlabel='Epoch',ylabel='Accuracy') 502 | 503 | 504 | # **Congratulations**! You've finished the assignment and built two models: One that recognizes smiles, and another that recognizes SIGN language with almost 80% accuracy on the test set. In addition to that, you now also understand the applications of two Keras APIs: Sequential and Functional. Nicely done! 505 | # 506 | # By now, you know a bit about how the Functional API works and may have glimpsed the possibilities. 
In your next assignment, you'll really get a feel for its power when you get the opportunity to build a very deep ConvNet, using ResNets! 507 | 508 | # 509 | # ## 6 - Bibliography 510 | # 511 | # You're always encouraged to read the official documentation. To that end, you can find the docs for the Sequential and Functional APIs here: 512 | # 513 | # https://www.tensorflow.org/guide/keras/sequential_model 514 | # 515 | # https://www.tensorflow.org/guide/keras/functional 516 | -------------------------------------------------------------------------------- /Course 4-ConvolutionalNeuralNetworks/Week 1/Week 1 The Basics of ConvNets.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 4-ConvolutionalNeuralNetworks/Week 1/Week 1 The Basics of ConvNets.pdf -------------------------------------------------------------------------------- /Course 4-ConvolutionalNeuralNetworks/Week 2/Transfer_learning_with_MobileNet_v1.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding: utf-8 3 | 4 | # # Transfer Learning with MobileNetV2 5 | 6 | # Welcome to this week's assignment, where you'll be using transfer learning on a pre-trained CNN to build an Alpaca/Not Alpaca classifier! 7 | # 8 | # 9 | # 10 | # A pre-trained model is a network that's already been trained on a large dataset and saved, which allows you to use it to customize your own model cheaply and efficiently. The one you'll be using, MobileNetV2, was designed to provide fast and computationally efficient performance. It's been pre-trained on ImageNet, a dataset containing over 14 million images and 1000 classes. 11 | # 12 | # By the end of this assignment, you will be able to: 13 | # 14 | # - Create a dataset from a directory 15 | # - Preprocess and augment data using the Sequential API 16 | # - Adapt a pretrained model to new data and train a classifier using the Functional API and MobileNet 17 | # - Fine-tune a classifier's final layers to improve accuracy 18 | # 19 | # ## Important Note on Submission to the AutoGrader 20 | # 21 | # Before submitting your assignment to the AutoGrader, please make sure you are not doing the following: 22 | # 23 | # 1. You have not added any _extra_ `print` statement(s) in the assignment. 24 | # 2. You have not added any _extra_ code cell(s) in the assignment. 25 | # 3. You have not changed any of the function parameters. 26 | # 4. You are not using any global variables inside your graded exercises. Unless specifically instructed to do so, please refrain from it and use the local variables instead. 27 | # 5. You are not changing the assignment code where it is not required, like creating _extra_ variables. 28 | # 29 | # If you do any of the following, you will get something like, `Grader Error: Grader feedback not found` (or similarly unexpected) error upon submitting your assignment. Before asking for help/debugging the errors in your assignment, check for these first. If this is the case, and you don't remember the changes you have made, you can get a fresh copy of the assignment by following these [instructions](https://www.coursera.org/learn/convolutional-neural-networks/supplement/DS4yP/h-ow-to-refresh-your-workspace). 
30 | 31 | # ## Table of Content 32 | # 33 | # - [1 - Packages](#1) 34 | # - [1.1 Create the Dataset and Split it into Training and Validation Sets](#1-1) 35 | # - [2 - Preprocess and Augment Training Data](#2) 36 | # - [Exercise 1 - data_augmenter](#ex-1) 37 | # - [3 - Using MobileNetV2 for Transfer Learning](#3) 38 | # - [3.1 - Inside a MobileNetV2 Convolutional Building Block](#3-1) 39 | # - [3.2 - Layer Freezing with the Functional API](#3-2) 40 | # - [Exercise 2 - alpaca_model](#ex-2) 41 | # - [3.3 - Fine-tuning the Model](#3-3) 42 | # - [Exercise 3](#ex-3) 43 | 44 | # 45 | # ## 1 - Packages 46 | 47 | # In[2]: 48 | 49 | 50 | import matplotlib.pyplot as plt 51 | import numpy as np 52 | import os 53 | import tensorflow as tf 54 | import tensorflow.keras.layers as tfl 55 | 56 | from tensorflow.keras.preprocessing import image_dataset_from_directory 57 | from tensorflow.keras.layers.experimental.preprocessing import RandomFlip, RandomRotation 58 | 59 | 60 | # 61 | # ### 1.1 Create the Dataset and Split it into Training and Validation Sets 62 | # 63 | # When training and evaluating deep learning models in Keras, generating a dataset from image files stored on disk is simple and fast. Call `image_data_set_from_directory()` to read from the directory and create both training and validation datasets. 64 | # 65 | # If you're specifying a validation split, you'll also need to specify the subset for each portion. Just set the training set to `subset='training'` and the validation set to `subset='validation'`. 66 | # 67 | # You'll also set your seeds to match each other, so your training and validation sets don't overlap. :) 68 | 69 | # In[3]: 70 | 71 | 72 | BATCH_SIZE = 32 73 | IMG_SIZE = (160, 160) 74 | directory = "dataset/" 75 | train_dataset = image_dataset_from_directory(directory, 76 | shuffle=True, 77 | batch_size=BATCH_SIZE, 78 | image_size=IMG_SIZE, 79 | validation_split=0.2, 80 | subset='training', 81 | seed=42) 82 | validation_dataset = image_dataset_from_directory(directory, 83 | shuffle=True, 84 | batch_size=BATCH_SIZE, 85 | image_size=IMG_SIZE, 86 | validation_split=0.2, 87 | subset='validation', 88 | seed=42) 89 | 90 | 91 | # Now let's take a look at some of the images from the training set: 92 | # 93 | # **Note:** The original dataset has some mislabelled images in it as well. 94 | 95 | # In[4]: 96 | 97 | 98 | class_names = train_dataset.class_names 99 | 100 | plt.figure(figsize=(10, 10)) 101 | for images, labels in train_dataset.take(1): 102 | for i in range(9): 103 | ax = plt.subplot(3, 3, i + 1) 104 | plt.imshow(images[i].numpy().astype("uint8")) 105 | plt.title(class_names[labels[i]]) 106 | plt.axis("off") 107 | 108 | 109 | # 110 | # ## 2 - Preprocess and Augment Training Data 111 | # 112 | # You may have encountered `dataset.prefetch` in a previous TensorFlow assignment, as an important extra step in data preprocessing. 113 | # 114 | # Using `prefetch()` prevents a memory bottleneck that can occur when reading from disk. It sets aside some data and keeps it ready for when it's needed, by creating a source dataset from your input data, applying a transformation to preprocess it, then iterating over the dataset one element at a time. Because the iteration is streaming, the data doesn't need to fit into memory. 115 | # 116 | # You can set the number of elements to prefetch manually, or you can use `tf.data.experimental.AUTOTUNE` to choose the parameters automatically. 
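For reference, both options look like this on a toy dataset (the buffer size of 2 below is just an arbitrary example):

```python
import tensorflow as tf

ds = tf.data.Dataset.range(100).batch(10)
ds_manual = ds.prefetch(buffer_size=2)                            # prefetch a fixed number of batches
ds_auto = ds.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)  # let tf.data pick the buffer size
```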
Autotune prompts `tf.data` to tune that value dynamically at runtime, by tracking the time spent in each operation and feeding those times into an optimization algorithm. The optimization algorithm tries to find the best allocation of its CPU budget across all tunable operations. 117 | # 118 | # To increase diversity in the training set and help your model learn the data better, it's standard practice to augment the images by transforming them, i.e., randomly flipping and rotating them. Keras' Sequential API offers a straightforward method for these kinds of data augmentations, with built-in, customizable preprocessing layers. These layers are saved with the rest of your model and can be re-used later. Ahh, so convenient! 119 | # 120 | # As always, you're invited to read the official docs, which you can find for data augmentation [here](https://www.tensorflow.org/tutorials/images/data_augmentation). 121 | # 122 | 123 | # In[6]: 124 | 125 | 126 | AUTOTUNE = tf.data.experimental.AUTOTUNE 127 | train_dataset = train_dataset.prefetch(buffer_size=AUTOTUNE) 128 | 129 | 130 | # 131 | # ### Exercise 1 - data_augmenter 132 | # 133 | # Implement a function for data augmentation. Use a `Sequential` keras model composed of 2 layers: 134 | # * `RandomFlip('horizontal')` 135 | # * `RandomRotation(0.2)` 136 | 137 | # In[8]: 138 | 139 | 140 | # UNQ_C1 141 | # GRADED FUNCTION: data_augmenter 142 | def data_augmenter(): 143 | ''' 144 | Create a Sequential model composed of 2 layers 145 | Returns: 146 | tf.keras.Sequential 147 | ''' 148 | ### START CODE HERE 149 | data_augmentation = tf.keras.Sequential() 150 | data_augmentation.add(RandomFlip('horizontal')) 151 | data_augmentation.add(RandomRotation(0.2)) 152 | ### END CODE HERE 153 | 154 | return data_augmentation 155 | 156 | 157 | # In[9]: 158 | 159 | 160 | augmenter = data_augmenter() 161 | 162 | assert(augmenter.layers[0].name.startswith('random_flip')), "First layer must be RandomFlip" 163 | assert augmenter.layers[0].mode == 'horizontal', "RadomFlip parameter must be horizontal" 164 | assert(augmenter.layers[1].name.startswith('random_rotation')), "Second layer must be RandomRotation" 165 | assert augmenter.layers[1].factor == 0.2, "Rotation factor must be 0.2" 166 | assert len(augmenter.layers) == 2, "The model must have only 2 layers" 167 | 168 | print('\033[92mAll tests passed!') 169 | 170 | 171 | # Take a look at how an image from the training set has been augmented with simple transformations: 172 | # 173 | # From one cute animal, to 9 variations of that cute animal, in three lines of code. Now your model has a lot more to learn from. 174 | 175 | # In[10]: 176 | 177 | 178 | data_augmentation = data_augmenter() 179 | 180 | for image, _ in train_dataset.take(1): 181 | plt.figure(figsize=(10, 10)) 182 | first_image = image[0] 183 | for i in range(9): 184 | ax = plt.subplot(3, 3, i + 1) 185 | augmented_image = data_augmentation(tf.expand_dims(first_image, 0)) 186 | plt.imshow(augmented_image[0] / 255) 187 | plt.axis('off') 188 | 189 | 190 | # Next, you'll apply your first tool from the MobileNet application in TensorFlow, to normalize your input. Since you're using a pre-trained model that was trained on the normalization values [-1,1], it's best practice to reuse that standard with tf.keras.applications.mobilenet_v2.preprocess_input. 
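As a quick sanity check (the pixel values below are only illustrative), `preprocess_input` for MobileNetV2 rescales raw pixel values from [0, 255] into the [-1, 1] range the network was trained on:

```python
import numpy as np
import tensorflow as tf

pixels = np.array([[0.0, 127.5, 255.0]])
print(tf.keras.applications.mobilenet_v2.preprocess_input(pixels))  # [[-1.  0.  1.]]
```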
191 | 192 | # 193 | # 194 | # **What you should remember:** 195 | # 196 | # * When calling image_data_set_from_directory(), specify the train/val subsets and match the seeds to prevent overlap 197 | # * Use prefetch() to prevent memory bottlenecks when reading from disk 198 | # * Give your model more to learn from with simple data augmentations like rotation and flipping. 199 | # * When using a pretrained model, it's best to reuse the weights it was trained on. 200 | 201 | # In[11]: 202 | 203 | 204 | preprocess_input = tf.keras.applications.mobilenet_v2.preprocess_input 205 | 206 | 207 | # 208 | # ## 3 - Using MobileNetV2 for Transfer Learning 209 | # 210 | # MobileNetV2 was trained on ImageNet and is optimized to run on mobile and other low-power applications. It's 155 layers deep (just in case you felt the urge to plot the model yourself, prepare for a long journey!) and very efficient for object detection and image segmentation tasks, as well as classification tasks like this one. The architecture has three defining characteristics: 211 | # 212 | # * Depthwise separable convolutions 213 | # * Thin input and output bottlenecks between layers 214 | # * Shortcut connections between bottleneck layers 215 | # 216 | # 217 | # ### 3.1 - Inside a MobileNetV2 Convolutional Building Block 218 | # 219 | # MobileNetV2 uses depthwise separable convolutions as efficient building blocks. Traditional convolutions are often very resource-intensive, and depthwise separable convolutions are able to reduce the number of trainable parameters and operations and also speed up convolutions in two steps: 220 | # 221 | # 1. The first step calculates an intermediate result by convolving on each of the channels independently. This is the depthwise convolution. 222 | # 223 | # 2. In the second step, another convolution merges the outputs of the previous step into one. This gets a single result from a single feature at a time, and then is applied to all the filters in the output layer. This is the pointwise convolution, or: **Shape of the depthwise convolution X Number of filters.** 224 | # 225 | # 226 | #
Figure 1: MobileNetV2 Architecture. This diagram was inspired by the original.
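A back-of-the-envelope comparison (with assumed kernel size and channel counts, biases ignored) shows why this two-step factorization is so much cheaper than a standard convolution:

```python
k, c_in, c_out = 3, 32, 64                # assumed kernel size and channel counts

standard = k * k * c_in * c_out           # ordinary convolution: 18,432 weights
depthwise = k * k * c_in                  # step 1: one k x k kernel per input channel
pointwise = 1 * 1 * c_in * c_out          # step 2: 1x1 convolution that merges the channels
separable = depthwise + pointwise         # 2,336 weights in total

print(standard, separable, round(standard / separable, 1))   # 18432 2336 7.9
```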
227 | # 228 | # Each block consists of an inverted residual structure with a bottleneck at each end. These bottlenecks encode the intermediate inputs and outputs in a low dimensional space, and prevent non-linearities from destroying important information. 229 | # 230 | # The shortcut connections, which are similar to the ones in traditional residual networks, serve the same purpose of speeding up training and improving predictions. These connections skip over the intermediate convolutions and connect the bottleneck layers. 231 | 232 | # Let's try to train your base model using all the layers from the pretrained model. 233 | # 234 | # Similarly to how you reused the pretrained normalization values MobileNetV2 was trained on, you'll also load the pretrained weights from ImageNet by specifying `weights='imagenet'`. 235 | 236 | # In[12]: 237 | 238 | 239 | IMG_SHAPE = IMG_SIZE + (3,) 240 | base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE, 241 | include_top=True, 242 | weights='imagenet') 243 | 244 | 245 | # Print the model summary below to see all the model's layers, the shapes of their outputs, and the total number of parameters, trainable and non-trainable. 246 | 247 | # In[13]: 248 | 249 | 250 | base_model.summary() 251 | 252 | 253 | # Note the last 2 layers here. They are the so called top layers, and they are responsible of the classification in the model 254 | 255 | # In[14]: 256 | 257 | 258 | nb_layers = len(base_model.layers) 259 | print(base_model.layers[nb_layers - 2].name) 260 | print(base_model.layers[nb_layers - 1].name) 261 | 262 | 263 | # Notice some of the layers in the summary like `Conv2D` and `DepthwiseConv2D` and how they follow the progression of expansion to depthwise convolution to projection. In combination with BatchNormalization and ReLU, these make up the bottleneck layers mentioned earlier. 264 | 265 | # 266 | # 267 | # **What you should remember**: 268 | # 269 | # * MobileNetV2's unique features are: 270 | # * Depthwise separable convolutions that provide lightweight feature filtering and creation 271 | # * Input and output bottlenecks that preserve important information on either end of the block 272 | # * Depthwise separable convolutions deal with both spatial and depth (number of channels) dimensions 273 | 274 | # Next, choose the first batch from the tensorflow dataset to use the images, and run it through the MobileNetV2 base model to test out the predictions on some of your images. 275 | 276 | # In[15]: 277 | 278 | 279 | image_batch, label_batch = next(iter(train_dataset)) 280 | feature_batch = base_model(image_batch) 281 | print(feature_batch.shape) 282 | 283 | 284 | # In[16]: 285 | 286 | 287 | #Shows the different label probabilities in one tensor 288 | label_batch 289 | 290 | 291 | # Now decode the predictions made by the model. Earlier, when you printed the shape of the batch, it would have returned (32, 1000). The number 32 refers to the batch size and 1000 refers to the 1000 classes the model was pretrained on. The predictions returned by the base model below follow this format: 292 | # 293 | # First the class number, then a human-readable label, and last the probability of the image belonging to that class. You'll notice that there are two of these returned for each image in the batch - these the top two probabilities returned for that image. 
294 | 295 | # In[17]: 296 | 297 | 298 | base_model.trainable = False 299 | image_var = tf.Variable(preprocess_input(image_batch)) 300 | pred = base_model(image_var) 301 | 302 | tf.keras.applications.mobilenet_v2.decode_predictions(pred.numpy(), top=2) 303 | 304 | 305 | # Uh-oh. There's a whole lot of labels here, some of them hilariously wrong, but none of them say "alpaca." 306 | # 307 | # This is because MobileNet pretrained over ImageNet doesn't have the correct labels for alpacas, so when you use the full model, all you get is a bunch of incorrectly classified images. 308 | # 309 | # Fortunately, you can delete the top layer, which contains all the classification labels, and create a new classification layer. 310 | 311 | # 312 | # ### 3.2 - Layer Freezing with the Functional API 313 | # 314 | # 315 | # 316 | # In the next sections, you'll see how you can use a pretrained model to modify the classifier task so that it's able to recognize alpacas. You can achieve this in three steps: 317 | # 318 | # 1. Delete the top layer (the classification layer) 319 | # * Set `include_top` in `base_model` as False 320 | # 2. Add a new classifier layer 321 | # * Train only one layer by freezing the rest of the network 322 | # * As mentioned before, a single neuron is enough to solve a binary classification problem. 323 | # 3. Freeze the base model and train the newly-created classifier layer 324 | # * Set `base model.trainable=False` to avoid changing the weights and train *only* the new layer 325 | # * Set training in `base_model` to False to avoid keeping track of statistics in the batch norm layer 326 | 327 | # 328 | # ### Exercise 2 - alpaca_model 329 | 330 | # In[18]: 331 | 332 | 333 | # UNQ_C2 334 | # GRADED FUNCTION 335 | def alpaca_model(image_shape=IMG_SIZE, data_augmentation=data_augmenter()): 336 | ''' Define a tf.keras model for binary classification out of the MobileNetV2 model 337 | Arguments: 338 | image_shape -- Image width and height 339 | data_augmentation -- data augmentation function 340 | Returns: 341 | Returns: 342 | tf.keras.model 343 | ''' 344 | 345 | 346 | input_shape = image_shape + (3,) 347 | 348 | ### START CODE HERE 349 | 350 | base_model = tf.keras.applications.MobileNetV2(input_shape=input_shape, 351 | include_top=False, # <== Important!!!! 
352 | weights='imagenet') # From imageNet 353 | # Freeze the base model by making it non trainable 354 | base_model.trainable = False 355 | 356 | # create the input layer (Same as the imageNetv2 input size) 357 | inputs = tf.keras.Input(shape=input_shape) 358 | 359 | # apply data augmentation to the inputs 360 | x = data_augmentation(inputs) 361 | 362 | # data preprocessing using the same weights the model was trained on 363 | x = tf.keras.applications.mobilenet_v2.preprocess_input(x) 364 | 365 | # set training to False to avoid keeping track of statistics in the batch norm layer 366 | x = base_model(x, training=False) 367 | 368 | # Add the new Binary classification layers 369 | # use global avg pooling to summarize the info in each channel 370 | x = tfl.GlobalAveragePooling2D()(x) 371 | #include dropout with probability of 0.2 to avoid overfitting 372 | x = tfl.Dropout(rate=0.2)(x) 373 | 374 | # create a prediction layer with one neuron (as a classifier only needs one) 375 | prediction_layer = tfl.Dense(1) 376 | 377 | ### END CODE HERE 378 | 379 | outputs = prediction_layer(x) 380 | model = tf.keras.Model(inputs, outputs) 381 | 382 | 383 | return model 384 | 385 | 386 | # Create your new model using the data_augmentation function defined earlier. 387 | 388 | # In[19]: 389 | 390 | 391 | model2 = alpaca_model(IMG_SIZE, data_augmentation) 392 | 393 | 394 | # In[20]: 395 | 396 | 397 | from test_utils import summary, comparator 398 | 399 | alpaca_summary = [['InputLayer', [(None, 160, 160, 3)], 0], 400 | ['Sequential', (None, 160, 160, 3), 0], 401 | ['TensorFlowOpLayer', [(None, 160, 160, 3)], 0], 402 | ['TensorFlowOpLayer', [(None, 160, 160, 3)], 0], 403 | ['Functional', (None, 5, 5, 1280), 2257984], 404 | ['GlobalAveragePooling2D', (None, 1280), 0], 405 | ['Dropout', (None, 1280), 0, 0.2], 406 | ['Dense', (None, 1), 1281, 'linear']] #linear is the default activation 407 | 408 | comparator(summary(model2), alpaca_summary) 409 | 410 | for layer in summary(model2): 411 | print(layer) 412 | 413 | 414 | 415 | # The base learning rate has been set for you, so you can go ahead and compile the new model and run it for 5 epochs: 416 | 417 | # In[21]: 418 | 419 | 420 | base_learning_rate = 0.001 421 | model2.compile(optimizer=tf.keras.optimizers.Adam(lr=base_learning_rate), 422 | loss=tf.keras.losses.BinaryCrossentropy(from_logits=True), 423 | metrics=['accuracy']) 424 | 425 | 426 | # In[22]: 427 | 428 | 429 | initial_epochs = 5 430 | history = model2.fit(train_dataset, validation_data=validation_dataset, epochs=initial_epochs) 431 | 432 | 433 | # Plot the training and validation accuracy: 434 | 435 | # In[23]: 436 | 437 | 438 | acc = [0.] + history.history['accuracy'] 439 | val_acc = [0.] 
+ history.history['val_accuracy'] 440 | 441 | loss = history.history['loss'] 442 | val_loss = history.history['val_loss'] 443 | 444 | plt.figure(figsize=(8, 8)) 445 | plt.subplot(2, 1, 1) 446 | plt.plot(acc, label='Training Accuracy') 447 | plt.plot(val_acc, label='Validation Accuracy') 448 | plt.legend(loc='lower right') 449 | plt.ylabel('Accuracy') 450 | plt.ylim([min(plt.ylim()),1]) 451 | plt.title('Training and Validation Accuracy') 452 | 453 | plt.subplot(2, 1, 2) 454 | plt.plot(loss, label='Training Loss') 455 | plt.plot(val_loss, label='Validation Loss') 456 | plt.legend(loc='upper right') 457 | plt.ylabel('Cross Entropy') 458 | plt.ylim([0,1.0]) 459 | plt.title('Training and Validation Loss') 460 | plt.xlabel('epoch') 461 | plt.show() 462 | 463 | 464 | # In[24]: 465 | 466 | 467 | class_names 468 | 469 | 470 | # The results are ok, but could be better. Next, try some fine-tuning. 471 | 472 | # 473 | # ### 3.3 - Fine-tuning the Model 474 | # 475 | # You could try fine-tuning the model by re-running the optimizer in the last layers to improve accuracy. When you use a smaller learning rate, you take smaller steps to adapt it a little more closely to the new data. In transfer learning, the way you achieve this is by unfreezing the layers at the end of the network, and then re-training your model on the final layers with a very low learning rate. Adapting your learning rate to go over these layers in smaller steps can yield more fine details - and higher accuracy. 476 | # 477 | # The intuition for what's happening: when the network is in its earlier stages, it trains on low-level features, like edges. In the later layers, more complex, high-level features like wispy hair or pointy ears begin to emerge. For transfer learning, the low-level features can be kept the same, as they have common features for most images. When you add new data, you generally want the high-level features to adapt to it, which is rather like letting the network learn to detect features more related to your data, such as soft fur or big teeth. 478 | # 479 | # To achieve this, just unfreeze the final layers and re-run the optimizer with a smaller learning rate, while keeping all the other layers frozen. 480 | # 481 | # Where the final layers actually begin is a bit arbitrary, so feel free to play around with this number a bit. The important takeaway is that the later layers are the part of your network that contain the fine details (pointy ears, hairy tails) that are more specific to your problem. 482 | # 483 | # First, unfreeze the base model by setting `base_model.trainable=True`, set a layer to fine-tune from, then re-freeze all the layers before it. Run it again for another few epochs, and see if your accuracy improved! 484 | 485 | # 486 | # ### Exercise 3 487 | 488 | # In[27]: 489 | 490 | 491 | # UNQ_C3 492 | base_model = model2.layers[4] 493 | base_model.trainable = True 494 | # Let's take a look to see how many layers are in the base model 495 | print("Number of layers in the base model: ", len(base_model.layers)) 496 | 497 | # Fine-tune from this layer onwards 498 | fine_tune_at = 120 499 | 500 | ### START CODE HERE 501 | 502 | # Freeze all the layers before the `fine_tune_at` layer 503 | for layer in base_model.layers[:fine_tune_at]: 504 | layer.trainable = True 505 | 506 | # Define a BinaryCrossentropy loss function. 
Use from_logits=True 507 | loss_function=tf.python.keras.losses.BinaryCrossentropy(from_logits=True) 508 | # Define an Adam optimizer with a learning rate of 0.1 * base_learning_rate 509 | optimizer = tf.keras.optimizers.Adam(lr=base_learning_rate*0.1) 510 | # Use accuracy as evaluation metric 511 | metrics=['accuracy'] 512 | ### END CODE HERE 513 | 514 | model2.compile(loss=loss_function, 515 | optimizer = optimizer, 516 | metrics=metrics) 517 | 518 | 519 | # In[28]: 520 | 521 | 522 | assert type(loss_function) == tf.python.keras.losses.BinaryCrossentropy, "Not the correct layer" 523 | assert loss_function.from_logits, "Use from_logits=True" 524 | assert type(optimizer) == tf.keras.optimizers.Adam, "This is not an Adam optimizer" 525 | assert optimizer.lr == base_learning_rate / 10, "Wrong learning rate" 526 | assert metrics[0] == 'accuracy', "Wrong metric" 527 | 528 | print('\033[92mAll tests passed!') 529 | 530 | 531 | # In[ ]: 532 | 533 | 534 | fine_tune_epochs = 5 535 | total_epochs = initial_epochs + fine_tune_epochs 536 | 537 | history_fine = model2.fit(train_dataset, 538 | epochs=total_epochs, 539 | initial_epoch=history.epoch[-1], 540 | validation_data=validation_dataset) 541 | 542 | 543 | # Ahhh, quite an improvement! A little fine-tuning can really go a long way. 544 | 545 | # In[ ]: 546 | 547 | 548 | acc += history_fine.history['accuracy'] 549 | val_acc += history_fine.history['val_accuracy'] 550 | 551 | loss += history_fine.history['loss'] 552 | val_loss += history_fine.history['val_loss'] 553 | 554 | 555 | # In[ ]: 556 | 557 | 558 | plt.figure(figsize=(8, 8)) 559 | plt.subplot(2, 1, 1) 560 | plt.plot(acc, label='Training Accuracy') 561 | plt.plot(val_acc, label='Validation Accuracy') 562 | plt.ylim([0, 1]) 563 | plt.plot([initial_epochs-1,initial_epochs-1], 564 | plt.ylim(), label='Start Fine Tuning') 565 | plt.legend(loc='lower right') 566 | plt.title('Training and Validation Accuracy') 567 | 568 | plt.subplot(2, 1, 2) 569 | plt.plot(loss, label='Training Loss') 570 | plt.plot(val_loss, label='Validation Loss') 571 | plt.ylim([0, 1.0]) 572 | plt.plot([initial_epochs-1,initial_epochs-1], 573 | plt.ylim(), label='Start Fine Tuning') 574 | plt.legend(loc='upper right') 575 | plt.title('Training and Validation Loss') 576 | plt.xlabel('epoch') 577 | plt.show() 578 | 579 | 580 | # 581 | # 582 | # **What you should remember**: 583 | # 584 | # * To adapt the classifier to new data: Delete the top layer, add a new classification layer, and train only on that layer 585 | # * When freezing layers, avoid keeping track of statistics (like in the batch normalization layer) 586 | # * Fine-tune the final layers of your model to capture high-level details near the end of the network and potentially improve accuracy 587 | 588 | # ## Congratulations! 589 | # 590 | # You've completed this assignment on transfer learning and fine-tuning. Here's a quick recap of all you just accomplished: 591 | # 592 | # * Created a dataset from a directory 593 | # * Augmented data with the Sequential API 594 | # * Adapted a pretrained model to new data with the Functional API and MobileNetV2 595 | # * Fine-tuned the classifier's final layers and boosted the model's accuracy 596 | # 597 | # That's awesome! 
598 | -------------------------------------------------------------------------------- /Course 4-ConvolutionalNeuralNetworks/Week 2/Week 2 Deep Convolutional Models.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 4-ConvolutionalNeuralNetworks/Week 2/Week 2 Deep Convolutional Models.pdf -------------------------------------------------------------------------------- /Course 4-ConvolutionalNeuralNetworks/Week 3/Week 3 Detection Algorithms.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 4-ConvolutionalNeuralNetworks/Week 3/Week 3 Detection Algorithms.pdf -------------------------------------------------------------------------------- /Course 4-ConvolutionalNeuralNetworks/Week 4/Week 4 Special Applications Face Recognition and Neural Style Transfer.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 4-ConvolutionalNeuralNetworks/Week 4/Week 4 Special Applications Face Recognition and Neural Style Transfer.pdf -------------------------------------------------------------------------------- /Course 5-SequenceModels/Week 1/Week 1 Recurrent Neural Networks.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 5-SequenceModels/Week 1/Week 1 Recurrent Neural Networks.pdf -------------------------------------------------------------------------------- /Course 5-SequenceModels/Week 2/Operations_on_word_vectors_v2a.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding: utf-8 3 | 4 | # # Operations on Word Vectors 5 | # 6 | # Welcome to your first assignment of Week 2, Course 5 of the Deep Learning Specialization! 7 | # 8 | # Because word embeddings are very computationally expensive to train, most ML practitioners will load a pre-trained set of embeddings. In this notebook you'll try your hand at loading, measuring similarity between, and modifying pre-trained embeddings. 9 | # 10 | # **After this assignment you'll be able to**: 11 | # 12 | # * Explain how word embeddings capture relationships between words 13 | # * Load pre-trained word vectors 14 | # * Measure similarity between word vectors using cosine similarity 15 | # * Use word embeddings to solve word analogy problems such as Man is to Woman as King is to ______. 16 | # 17 | # At the end of this notebook you'll have a chance to try an optional exercise, where you'll modify word embeddings to reduce their gender bias. Reducing bias is an important consideration in ML, so you're encouraged to take this challenge! 18 | # 19 | # ## Important Note on Submission to the AutoGrader 20 | # 21 | # Before submitting your assignment to the AutoGrader, please make sure you are not doing the following: 22 | # 23 | # 1. You have not added any _extra_ `print` statement(s) in the assignment. 24 | # 2. You have not added any _extra_ code cell(s) in the assignment. 25 | # 3. 
You have not changed any of the function parameters. 26 | # 4. You are not using any global variables inside your graded exercises. Unless specifically instructed to do so, please refrain from it and use the local variables instead. 27 | # 5. You are not changing the assignment code where it is not required, like creating _extra_ variables. 28 | # 29 | # If you do any of the following, you will get something like, `Grader Error: Grader feedback not found` (or similarly unexpected) error upon submitting your assignment. Before asking for help/debugging the errors in your assignment, check for these first. If this is the case, and you don't remember the changes you have made, you can get a fresh copy of the assignment by following these [instructions](https://www.coursera.org/learn/nlp-sequence-models/supplement/qHIve/h-ow-to-refresh-your-workspace). 30 | 31 | # ## Table of Contents 32 | # 33 | # - [Packages](#0) 34 | # - [1 - Load the Word Vectors](#1) 35 | # - [2 - Embedding Vectors Versus One-Hot Vectors](#2) 36 | # - [3 - Cosine Similarity](#3) 37 | # - [Exercise 1 - cosine_similarity](#ex-1) 38 | # - [4 - Word Analogy Task](#4) 39 | # - [Exercise 2 - complete_analogy](#ex-2) 40 | # - [5 - Debiasing Word Vectors (OPTIONAL/UNGRADED)](#5) 41 | # - [5.1 - Neutralize Bias for Non-Gender Specific Words](#5-1) 42 | # - [Exercise 3 - neutralize](#ex-3) 43 | # - [5.2 - Equalization Algorithm for Gender-Specific Words](#5-2) 44 | # - [Exercise 4 - equalize](#ex-4) 45 | # - [6 - References](#6) 46 | 47 | # 48 | # ## Packages 49 | # 50 | # Let's get started! Run the following cell to load the packages you'll need. 51 | 52 | # In[1]: 53 | 54 | 55 | import numpy as np 56 | from w2v_utils import * 57 | 58 | 59 | # 60 | # ## 1 - Load the Word Vectors 61 | # 62 | # For this assignment, you'll use 50-dimensional GloVe vectors to represent words. 63 | # Run the following cell to load the `word_to_vec_map`. 64 | 65 | # In[2]: 66 | 67 | 68 | words, word_to_vec_map = read_glove_vecs('data/glove.6B.50d.txt') 69 | 70 | 71 | # You've loaded: 72 | # - `words`: set of words in the vocabulary. 73 | # - `word_to_vec_map`: dictionary mapping words to their GloVe vector representation. 74 | # 75 | # 76 | # ## 2 - Embedding Vectors Versus One-Hot Vectors 77 | # Recall from the lesson videos that one-hot vectors don't do a good job of capturing the level of similarity between words. This is because every one-hot vector has the same Euclidean distance from any other one-hot vector. 78 | # 79 | # Embedding vectors, such as GloVe vectors, provide much more useful information about the meaning of individual words. 80 | # Now, see how you can use GloVe vectors to measure the similarity between two words! 81 | 82 | # 83 | # ## 3 - Cosine Similarity 84 | # 85 | # To measure the similarity between two words, you need a way to measure the degree of similarity between two embedding vectors for the two words. Given two vectors $u$ and $v$, cosine similarity is defined as follows: 86 | # 87 | # $$\text{CosineSimilarity(u, v)} = \frac {u \cdot v} {||u||_2 ||v||_2} = cos(\theta) \tag{1}$$ 88 | # 89 | # * $u \cdot v$ is the dot product (or inner product) of two vectors 90 | # * $||u||_2$ is the norm (or length) of the vector $u$ 91 | # * $\theta$ is the angle between $u$ and $v$. 92 | # * The cosine similarity depends on the angle between $u$ and $v$. 93 | # * If $u$ and $v$ are very similar, their cosine similarity will be close to 1. 94 | # * If they are dissimilar, the cosine similarity will take a smaller value. 
95 | # 96 | # 97 | #
Figure 1: The cosine of the angle between two vectors is a measure of their similarity.
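Before implementing it, here is a tiny numeric illustration (with made-up toy vectors) of why this matters: distinct one-hot vectors always have cosine similarity 0, whereas formula (1) distinguishes dense vectors that point in similar and dissimilar directions:

```python
import numpy as np

def cos(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

one_hot_a = np.array([1.0, 0.0, 0.0])
one_hot_b = np.array([0.0, 1.0, 0.0])
print(cos(one_hot_a, one_hot_b))          # 0.0 -- true for ANY pair of distinct one-hot vectors

u = np.array([1.0, 2.0, 3.0])
v = np.array([2.0, 4.0, 6.1])             # nearly the same direction as u
w = np.array([-1.0, -2.0, -3.0])          # opposite direction
print(cos(u, v), cos(u, w))               # ~1.0 and -1.0
```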
98 | # 99 | # 100 | # ### Exercise 1 - cosine_similarity 101 | # 102 | # Implement the function `cosine_similarity()` to evaluate the similarity between word vectors. 103 | # 104 | # **Reminder**: The norm of $u$ is defined as $ ||u||_2 = \sqrt{\sum_{i=1}^{n} u_i^2}$ 105 | # 106 | # #### Additional Hints 107 | # * You may find [np.dot](https://numpy.org/doc/stable/reference/generated/numpy.dot.html), [np.sum](https://numpy.org/doc/stable/reference/generated/numpy.sum.html), or [np.sqrt](https://numpy.org/doc/stable/reference/generated/numpy.sqrt.html) useful depending upon the implementation that you choose. 108 | 109 | # In[6]: 110 | 111 | 112 | # UNQ_C1 (UNIQUE CELL IDENTIFIER, DO NOT EDIT) 113 | # GRADED FUNCTION: cosine_similarity 114 | 115 | def cosine_similarity(u, v): 116 | """ 117 | Cosine similarity reflects the degree of similarity between u and v 118 | 119 | Arguments: 120 | u -- a word vector of shape (n,) 121 | v -- a word vector of shape (n,) 122 | 123 | Returns: 124 | cosine_similarity -- the cosine similarity between u and v defined by the formula above. 125 | """ 126 | 127 | # Special case. Consider the case u = [0, 0], v=[0, 0] 128 | if np.all(u == v): 129 | return 1 130 | 131 | ### START CODE HERE ### 132 | # Compute the dot product between u and v (≈1 line) 133 | dot = np.dot(u, v) 134 | 135 | # Compute the L2 norm of u (≈1 line) 136 | norm_u = np.sqrt(np.dot(u, u)) 137 | 138 | # Compute the L2 norm of v (≈1 line) 139 | norm_v = np.sqrt(np.dot(v, v)) 140 | 141 | # Avoid division by 0 142 | if np.isclose(norm_u * norm_v, 0, atol=1e-32): 143 | return 0 144 | 145 | # Compute the cosine similarity defined by formula (1) (≈1 line) 146 | cosine_similarity = dot / norm_u / norm_v 147 | ### END CODE HERE ### 148 | 149 | return cosine_similarity 150 | 151 | 152 | # In[7]: 153 | 154 | 155 | # START SKIP FOR GRADING 156 | father = word_to_vec_map["father"] 157 | mother = word_to_vec_map["mother"] 158 | ball = word_to_vec_map["ball"] 159 | crocodile = word_to_vec_map["crocodile"] 160 | france = word_to_vec_map["france"] 161 | italy = word_to_vec_map["italy"] 162 | paris = word_to_vec_map["paris"] 163 | rome = word_to_vec_map["rome"] 164 | 165 | print("cosine_similarity(father, mother) = ", cosine_similarity(father, mother)) 166 | print("cosine_similarity(ball, crocodile) = ",cosine_similarity(ball, crocodile)) 167 | print("cosine_similarity(france - paris, rome - italy) = ",cosine_similarity(france - paris, rome - italy)) 168 | # END SKIP FOR GRADING 169 | 170 | # PUBLIC TESTS 171 | def cosine_similarity_test(target): 172 | a = np.random.uniform(-10, 10, 10) 173 | b = np.random.uniform(-10, 10, 10) 174 | c = np.random.uniform(-1, 1, 23) 175 | 176 | assert np.isclose(cosine_similarity(a, a), 1), "cosine_similarity(a, a) must be 1" 177 | assert np.isclose(cosine_similarity((c >= 0) * 1, (c < 0) * 1), 0), "cosine_similarity(a, not(a)) must be 0" 178 | assert np.isclose(cosine_similarity(a, -a), -1), "cosine_similarity(a, -a) must be -1" 179 | assert np.isclose(cosine_similarity(a, b), cosine_similarity(a * 2, b * 4)), "cosine_similarity must be scale-independent. You must divide by the product of the norms of each input" 180 | 181 | print("\033[92mAll test passed!") 182 | 183 | cosine_similarity_test(cosine_similarity) 184 | 185 | 186 | # #### Try different words! 187 | # 188 | # After you get the correct expected output, please feel free to modify the inputs and measure the cosine similarity between other pairs of words! 
Playing around with the cosine similarity of other inputs will give you a better sense of how word vectors behave. 189 | 190 | # 191 | # ## 4 - Word Analogy Task 192 | # 193 | # * In the word analogy task, complete this sentence: 194 | # "*a* is to *b* as *c* is to **____**". 195 | # 196 | # * An example is: 197 | # '*man* is to *woman* as *king* is to *queen*' . 198 | # 199 | # * You're trying to find a word *d*, such that the associated word vectors $e_a, e_b, e_c, e_d$ are related in the following manner: 200 | # $e_b - e_a \approx e_d - e_c$ 201 | # * Measure the similarity between $e_b - e_a$ and $e_d - e_c$ using cosine similarity. 202 | # 203 | # 204 | # ### Exercise 2 - complete_analogy 205 | # 206 | # Complete the code below to perform word analogies! 207 | 208 | # In[8]: 209 | 210 | 211 | # UNQ_C2 (UNIQUE CELL IDENTIFIER, DO NOT EDIT) 212 | # GRADED FUNCTION: complete_analogy 213 | 214 | def complete_analogy(word_a, word_b, word_c, word_to_vec_map): 215 | """ 216 | Performs the word analogy task as explained above: a is to b as c is to ____. 217 | 218 | Arguments: 219 | word_a -- a word, string 220 | word_b -- a word, string 221 | word_c -- a word, string 222 | word_to_vec_map -- dictionary that maps words to their corresponding vectors. 223 | 224 | Returns: 225 | best_word -- the word such that v_b - v_a is close to v_best_word - v_c, as measured by cosine similarity 226 | """ 227 | 228 | # convert words to lowercase 229 | word_a, word_b, word_c = word_a.lower(), word_b.lower(), word_c.lower() 230 | 231 | ### START CODE HERE ### 232 | # Get the word embeddings e_a, e_b and e_c (≈1-3 lines) 233 | e_a, e_b, e_c = word_to_vec_map[word_a], word_to_vec_map[word_b], word_to_vec_map[word_c] 234 | ### END CODE HERE ### 235 | 236 | words = word_to_vec_map.keys() 237 | max_cosine_sim = -100 # Initialize max_cosine_sim to a large negative number 238 | best_word = None # Initialize best_word with None, it will help keep track of the word to output 239 | 240 | # loop over the whole word vector set 241 | for w in words: 242 | # to avoid best_word being one the input words, skip the input word_c 243 | # skip word_c from query 244 | if w == word_c: 245 | continue 246 | 247 | ### START CODE HERE ### 248 | # Compute cosine similarity between the vector (e_b - e_a) and the vector ((w's vector representation) - e_c) (≈1 line) 249 | cosine_sim = cosine_similarity((e_b - e_a), (word_to_vec_map[w] - e_c)) 250 | 251 | # If the cosine_sim is more than the max_cosine_sim seen so far, 252 | # then: set the new max_cosine_sim to the current cosine_sim and the best_word to the current word (≈3 lines) 253 | if cosine_sim > max_cosine_sim: 254 | max_cosine_sim = cosine_sim 255 | best_word = w 256 | ### END CODE HERE ### 257 | 258 | return best_word 259 | 260 | 261 | # In[9]: 262 | 263 | 264 | # PUBLIC TEST 265 | def complete_analogy_test(target): 266 | a = [3, 3] # Center at a 267 | a_nw = [2, 4] # North-West oriented vector from a 268 | a_s = [3, 2] # South oriented vector from a 269 | 270 | c = [-2, 1] # Center at c 271 | # Create a controlled word to vec map 272 | word_to_vec_map = {'a': a, 273 | 'synonym_of_a': a, 274 | 'a_nw': a_nw, 275 | 'a_s': a_s, 276 | 'c': c, 277 | 'c_n': [-2, 2], # N 278 | 'c_ne': [-1, 2], # NE 279 | 'c_e': [-1, 1], # E 280 | 'c_se': [-1, 0], # SE 281 | 'c_s': [-2, 0], # S 282 | 'c_sw': [-3, 0], # SW 283 | 'c_w': [-3, 1], # W 284 | 'c_nw': [-3, 2] # NW 285 | } 286 | 287 | # Convert lists to np.arrays 288 | for key in word_to_vec_map.keys(): 289 | word_to_vec_map[key] = 
np.array(word_to_vec_map[key]) 290 | 291 | assert(target('a', 'a_nw', 'c', word_to_vec_map) == 'c_nw') 292 | assert(target('a', 'a_s', 'c', word_to_vec_map) == 'c_s') 293 | assert(target('a', 'synonym_of_a', 'c', word_to_vec_map) != 'c'), "Best word cannot be input query" 294 | assert(target('a', 'c', 'a', word_to_vec_map) == 'c') 295 | 296 | print("\033[92mAll tests passed") 297 | 298 | complete_analogy_test(complete_analogy) 299 | 300 | 301 | # Run the cell below to test your code. Patience, young grasshopper...this may take 1-2 minutes. 302 | 303 | # In[10]: 304 | 305 | 306 | # START SKIP FOR GRADING 307 | triads_to_try = [('italy', 'italian', 'spain'), ('india', 'delhi', 'japan'), ('man', 'woman', 'boy'), ('small', 'smaller', 'large')] 308 | for triad in triads_to_try: 309 | print ('{} -> {} :: {} -> {}'.format( *triad, complete_analogy(*triad, word_to_vec_map))) 310 | 311 | # END SKIP FOR GRADING 312 | 313 | 314 | # Once you get the output, try modifying the input cells above to test your own analogies. 315 | # 316 | # **Hint**: Try to find some other analogy pairs that will work, along with some others where the algorithm doesn't give the right answer: 317 | # * For example, you can try small->smaller as big->? 318 | 319 | # ## Congratulations! 320 | # 321 | # You've come to the end of the graded portion of the assignment. By now, you've: 322 | # 323 | # * Loaded some pre-trained word vectors 324 | # * Measured the similarity between word vectors using cosine similarity 325 | # * Used word embeddings to solve word analogy problems such as Man is to Woman as King is to __. 326 | # 327 | # Cosine similarity is a relatively simple and intuitive, yet powerful, method you can use to capture nuanced relationships between words. These exercises should be helpful to you in explaining how it works, and applying it to your own projects! 328 | 329 | # 330 | # What you should remember: 331 | # 332 | # - Cosine similarity is a good way to compare the similarity between pairs of word vectors. 333 | # - Note that L2 (Euclidean) distance also works. 334 | # - For NLP applications, using a pre-trained set of word vectors is often a great way to get started. 335 | # 336 | # Even though you've finished the graded portion, please take a look at the rest of this notebook to learn about debiasing word vectors. 337 | 338 | # 339 | # ## 5 - Debiasing Word Vectors (OPTIONAL/UNGRADED) 340 | 341 | # In the following exercise, you'll examine gender biases that can be reflected in a word embedding, and explore algorithms for reducing the bias. In addition to learning about the topic of debiasing, this exercise will also help hone your intuition about what word vectors are doing. This section involves a bit of linear algebra, though you can certainly complete it without being an expert! Go ahead and give it a shot. This portion of the notebook is optional and is not graded...so just have fun and explore. 342 | # 343 | # First, see how the GloVe word embeddings relate to gender. You'll begin by computing a vector $g = e_{woman}-e_{man}$, where $e_{woman}$ represents the word vector corresponding to the word *woman*, and $e_{man}$ corresponds to the word vector corresponding to the word *man*. The resulting vector $g$ roughly encodes the concept of "gender". 344 | # 345 | # You might get a more accurate representation if you compute $g_1 = e_{mother}-e_{father}$, $g_2 = e_{girl}-e_{boy}$, etc. and average over them, but just using $e_{woman}-e_{man}$ will give good enough results for now. 
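# Optional illustration: a small sketch of the averaging idea described above. It assumes the
# GloVe `word_to_vec_map` loaded earlier in this notebook, and that the words in `gender_pairs`
# (e.g. "girl"/"boy") are present in that vocabulary -- an assumption, so adjust the pairs if a
# word is missing from your embedding file.
import numpy as np

gender_pairs = [("woman", "man"), ("mother", "father"), ("girl", "boy")]
g_avg = np.mean([word_to_vec_map[f] - word_to_vec_map[m] for f, m in gender_pairs], axis=0)
# g_avg could be used in place of the single-difference vector g computed in the next cell;
# averaging several pairs usually gives a slightly cleaner estimate of the "gender" direction.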
346 | # 347 | 348 | # In[ ]: 349 | 350 | 351 | g = word_to_vec_map['woman'] - word_to_vec_map['man'] 352 | print(g) 353 | 354 | 355 | # Now, consider the cosine similarity of different words with $g$. What does a positive value of similarity mean, versus a negative cosine similarity? 356 | 357 | # In[ ]: 358 | 359 | 360 | print ('List of names and their similarities with constructed vector:') 361 | 362 | # girls and boys name 363 | name_list = ['john', 'marie', 'sophie', 'ronaldo', 'priya', 'rahul', 'danielle', 'reza', 'katy', 'yasmin'] 364 | 365 | for w in name_list: 366 | print (w, cosine_similarity(word_to_vec_map[w], g)) 367 | 368 | 369 | # As you can see, female first names tend to have a positive cosine similarity with our constructed vector $g$, while male first names tend to have a negative cosine similarity. This is not surprising, and the result seems acceptable. 370 | # 371 | # Now try with some other words: 372 | 373 | # In[ ]: 374 | 375 | 376 | print('Other words and their similarities:') 377 | word_list = ['lipstick', 'guns', 'science', 'arts', 'literature', 'warrior','doctor', 'tree', 'receptionist', 378 | 'technology', 'fashion', 'teacher', 'engineer', 'pilot', 'computer', 'singer'] 379 | for w in word_list: 380 | print (w, cosine_similarity(word_to_vec_map[w], g)) 381 | 382 | 383 | # Do you notice anything surprising? It is astonishing how these results reflect certain unhealthy gender stereotypes. For example, we see “computer” is negative and is closer in value to male first names, while “literature” is positive and is closer to female first names. Ouch! 384 | # 385 | # You'll see below how to reduce the bias of these vectors, using an algorithm due to [Boliukbasi et al., 2016](https://arxiv.org/abs/1607.06520). Note that some word pairs such as "actor"/"actress" or "grandmother"/"grandfather" should remain gender-specific, while other words such as "receptionist" or "technology" should be neutralized, i.e. not be gender-related. You'll have to treat these two types of words differently when debiasing. 386 | # 387 | # 388 | # ### 5.1 - Neutralize Bias for Non-Gender Specific Words 389 | # 390 | # The figure below should help you visualize what neutralizing does. If you're using a 50-dimensional word embedding, the 50 dimensional space can be split into two parts: The bias-direction $g$, and the remaining 49 dimensions, which is called $g_{\perp}$ here. In linear algebra, we say that the 49-dimensional $g_{\perp}$ is perpendicular (or "orthogonal") to $g$, meaning it is at 90 degrees to $g$. The neutralization step takes a vector such as $e_{receptionist}$ and zeros out the component in the direction of $g$, giving us $e_{receptionist}^{debiased}$. 391 | # 392 | # Even though $g_{\perp}$ is 49-dimensional, given the limitations of what you can draw on a 2D screen, it's illustrated using a 1-dimensional axis below. 393 | # 394 | # 395 | #
Figure 2: The word vector for "receptionist" represented before and after applying the neutralize operation.
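# Optional illustration (ungraded): a tiny 2-D numeric example of the projection that
# neutralization performs, easy to check by hand. g_toy and e_toy are made-up vectors, not part
# of the assignment; the general formulas are given in Exercise 3 below.
import numpy as np

g_toy = np.array([1.0, 0.0])   # pretend bias direction
e_toy = np.array([2.0, 3.0])   # pretend word vector

bias_component = (np.dot(e_toy, g_toy) / np.dot(g_toy, g_toy)) * g_toy  # -> [2., 0.]
e_toy_debiased = e_toy - bias_component                                 # -> [0., 3.]

print(bias_component, e_toy_debiased)
print(np.dot(e_toy_debiased, g_toy))  # 0.0: nothing left along the bias direction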
396 | # 397 | # 398 | # ### Exercise 3 - neutralize 399 | # 400 | # Implement `neutralize()` to remove the bias of words such as "receptionist" or "scientist." 401 | # 402 | # Given an input embedding $e$, you can use the following formulas to compute $e^{debiased}$: 403 | # 404 | # $$e^{bias\_component} = \frac{e \cdot g}{||g||_2^2} * g\tag{2}$$ 405 | # $$e^{debiased} = e - e^{bias\_component}\tag{3}$$ 406 | # 407 | # If you are an expert in linear algebra, you may recognize $e^{bias\_component}$ as the projection of $e$ onto the direction $g$. If you're not an expert in linear algebra, don't worry about this. ;) 408 | # 409 | # **Note:** The [paper](https://papers.nips.cc/paper/6228-man-is-to-computer-programmer-as-woman-is-to-homemaker-debiasing-word-embeddings.pdf), which the debiasing algorithm is from, assumes all word vectors to have L2 norm as 1 and hence the need for the calculations below: 410 | 411 | # In[ ]: 412 | 413 | 414 | # The paper assumes all word vectors to have L2 norm as 1 and hence the need for this calculation 415 | from tqdm import tqdm 416 | word_to_vec_map_unit_vectors = { 417 | word: embedding / np.linalg.norm(embedding) 418 | for word, embedding in tqdm(word_to_vec_map.items()) 419 | } 420 | g_unit = word_to_vec_map_unit_vectors['woman'] - word_to_vec_map_unit_vectors['man'] 421 | 422 | 423 | # In[ ]: 424 | 425 | 426 | def neutralize(word, g, word_to_vec_map): 427 | """ 428 | Removes the bias of "word" by projecting it on the space orthogonal to the bias axis. 429 | This function ensures that gender neutral words are zero in the gender subspace. 430 | 431 | Arguments: 432 | word -- string indicating the word to debias 433 | g -- numpy-array of shape (50,), corresponding to the bias axis (such as gender) 434 | word_to_vec_map -- dictionary mapping words to their corresponding vectors. 435 | 436 | Returns: 437 | e_debiased -- neutralized word vector representation of the input "word" 438 | """ 439 | 440 | ### START CODE HERE ### 441 | # Select word vector representation of "word". Use word_to_vec_map. (≈ 1 line) 442 | e = word_to_vec_map[word] 443 | 444 | # Compute e_biascomponent using the formula given above. (≈ 1 line) 445 | e_biascomponent = np.dot(e, g) / np.dot(g, g) * g 446 | 447 | # Neutralize e by subtracting e_biascomponent from it 448 | # e_debiased should be equal to its orthogonal projection. (≈ 1 line) 449 | e_debiased = e - e_biascomponent 450 | ### END CODE HERE ### 451 | 452 | return e_debiased 453 | 454 | 455 | # In[ ]: 456 | 457 | 458 | word = "receptionist" 459 | print("cosine similarity between " + word + " and g, before neutralizing: ", cosine_similarity(word_to_vec_map[word], g)) 460 | 461 | e_debiased = neutralize(word, g_unit, word_to_vec_map_unit_vectors) 462 | print("cosine similarity between " + word + " and g_unit, after neutralizing: ", cosine_similarity(e_debiased, g_unit)) 463 | 464 | 465 | # **Expected Output**: The second result is essentially 0, up to numerical rounding (on the order of $10^{-17}$). 466 | # 467 | # 468 | # 469 | # 470 | # 473 | # 476 | # 477 | # 478 | # 481 | # 484 | #
471 | # cosine similarity between receptionist and g, before neutralizing: 0.3307794175059374
479 | # cosine similarity between receptionist and g_unit, after neutralizing: 3.5723165491646677e-17
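# Optional check (ungraded): the same neutralization can be applied to other words that should
# be gender-neutral, such as "technology" or "engineer" from the word_list above. This assumes
# the cells above have been run and that these words are in the vocabulary. The
# post-neutralization similarity should again be ~0 up to floating-point error.
for w in ["technology", "engineer"]:
    e_deb = neutralize(w, g_unit, word_to_vec_map_unit_vectors)
    print(w, cosine_similarity(word_to_vec_map[w], g), cosine_similarity(e_deb, g_unit))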
485 | 486 | # 487 | # ### 5.2 - Equalization Algorithm for Gender-Specific Words 488 | # 489 | # Next, let's see how debiasing can also be applied to word pairs such as "actress" and "actor." Equalization is applied to pairs of words that you might want to have differ only through the gender property. As a concrete example, suppose that "actress" is closer to "babysit" than "actor." By applying neutralization to "babysit," you can reduce the gender stereotype associated with babysitting. But this still does not guarantee that "actor" and "actress" are equidistant from "babysit." The equalization algorithm takes care of this. 490 | # 491 | # The key idea behind equalization is to make sure that a particular pair of words are equidistant from the 49-dimensional $g_\perp$. The equalization step also ensures that the two equalized steps are now the same distance from $e_{receptionist}^{debiased}$, or from any other work that has been neutralized. Visually, this is how equalization works: 492 | # 493 | # 494 | # 495 | # 496 | # The derivation of the linear algebra to do this is a bit more complex. (See Bolukbasi et al., 2016 in the References for details.) Here are the key equations: 497 | # 498 | # 499 | # $$ \mu = \frac{e_{w1} + e_{w2}}{2}\tag{4}$$ 500 | # 501 | # $$ \mu_{B} = \frac {\mu \cdot \text{bias_axis}}{||\text{bias_axis}||_2^2} *\text{bias_axis} 502 | # \tag{5}$$ 503 | # 504 | # $$\mu_{\perp} = \mu - \mu_{B} \tag{6}$$ 505 | # 506 | # $$ e_{w1B} = \frac {e_{w1} \cdot \text{bias_axis}}{||\text{bias_axis}||_2^2} *\text{bias_axis} 507 | # \tag{7}$$ 508 | # $$ e_{w2B} = \frac {e_{w2} \cdot \text{bias_axis}}{||\text{bias_axis}||_2^2} *\text{bias_axis} 509 | # \tag{8}$$ 510 | # 511 | # 512 | # $$e_{w1B}^{corrected} = \sqrt{{1 - ||\mu_{\perp} ||^2_2}} * \frac{e_{\text{w1B}} - \mu_B} {||e_{w1} - \mu_B||_2} \tag{9}$$ 513 | # 514 | # 515 | # $$e_{w2B}^{corrected} = \sqrt{{1 - ||\mu_{\perp} ||^2_2}} * \frac{e_{\text{w2B}} - \mu_B} {||e_{w2} - \mu_B||_2} \tag{10}$$ 516 | # 517 | # $$e_1 = e_{w1B}^{corrected} + \mu_{\perp} \tag{11}$$ 518 | # $$e_2 = e_{w2B}^{corrected} + \mu_{\perp} \tag{12}$$ 519 | # 520 | # 521 | # 522 | # ### Exercise 4 - equalize 523 | # 524 | # Implement the `equalize()` function below. 525 | # 526 | # Use the equations above to get the final equalized version of the pair of words. Good luck! 527 | # 528 | # **Hint** 529 | # - Use [np.linalg.norm](https://numpy.org/doc/stable/reference/generated/numpy.linalg.norm.html) 530 | 531 | # In[ ]: 532 | 533 | 534 | def equalize(pair, bias_axis, word_to_vec_map): 535 | """ 536 | Debias gender specific words by following the equalize method described in the figure above. 537 | 538 | Arguments: 539 | pair -- pair of strings of gender specific words to debias, e.g. ("actress", "actor") 540 | bias_axis -- numpy-array of shape (50,), vector corresponding to the bias axis, e.g. gender 541 | word_to_vec_map -- dictionary mapping words to their corresponding vectors 542 | 543 | Returns 544 | e_1 -- word vector corresponding to the first word 545 | e_2 -- word vector corresponding to the second word 546 | """ 547 | 548 | ### START CODE HERE ### 549 | # Step 1: Select word vector representation of "word". Use word_to_vec_map. 
(≈ 2 lines) 550 | w1, w2 = pair[0], pair[1] 551 | e_w1, e_w2 = word_to_vec_map[w1], word_to_vec_map[w2] 552 | 553 | # Step 2: Compute the mean of e_w1 and e_w2 (≈ 1 line) 554 | mu = (e_w1 + e_w2) / 2 555 | 556 | # Step 3: Compute the projections of mu over the bias axis and the orthogonal axis (≈ 2 lines) 557 | mu_B = np.dot(mu, bias_axis) / np.dot(bias_axis, bias_axis) * bias_axis 558 | mu_orth = mu - mu_B 559 | 560 | # Step 4: Use equations (7) and (8) to compute e_w1B and e_w2B (≈2 lines) 561 | e_w1B = np.dot(e_w1, bias_axis) / np.dot(bias_axis, bias_axis) * bias_axis 562 | e_w2B = np.dot(e_w2, bias_axis) / np.dot(bias_axis, bias_axis) * bias_axis 563 | 564 | # Step 5: Adjust the Bias part of e_w1B and e_w2B using the formulas (9) and (10) given above; note the hint to use np.linalg.norm for the denominator (≈2 lines) 565 | corrected_e_w1B = np.sqrt(abs(1 - np.dot(mu_orth, mu_orth))) * (e_w1B - mu_B) / np.linalg.norm((e_w1 - mu_orth) - mu_B) 566 | corrected_e_w2B = np.sqrt(abs(1 - np.dot(mu_orth, mu_orth))) * (e_w2B - mu_B) / np.linalg.norm((e_w2 - mu_orth) - mu_B) 567 | 568 | # Step 6: Debias by equalizing e1 and e2 to the sum of their corrected projections (≈2 lines) 569 | e1 = corrected_e_w1B + mu_orth 570 | e2 = corrected_e_w2B + mu_orth 571 | 572 | ### END CODE HERE ### 573 | 574 | return e1, e2 575 | 576 | 577 | # In[ ]: 578 | 579 | 580 | print("cosine similarities before equalizing:") 581 | print("cosine_similarity(word_to_vec_map[\"man\"], gender) = ", cosine_similarity(word_to_vec_map["man"], g)) 582 | print("cosine_similarity(word_to_vec_map[\"woman\"], gender) = ", cosine_similarity(word_to_vec_map["woman"], g)) 583 | print() 584 | e1, e2 = equalize(("man", "woman"), g_unit, word_to_vec_map_unit_vectors) 585 | print("cosine similarities after equalizing:") 586 | print("cosine_similarity(e1, gender) = ", cosine_similarity(e1, g_unit)) 587 | print("cosine_similarity(e2, gender) = ", cosine_similarity(e2, g_unit)) 588 | 589 | 590 | # **Expected Output**: 591 | # 592 | # cosine similarities before equalizing: 593 | #
596 | # cosine_similarity(word_to_vec_map["man"], gender) = -0.117110957653
604 | # cosine_similarity(word_to_vec_map["woman"], gender) = 0.356666188463
611 | #
612 | # cosine similarities after equalizing:
613 | #
616 | # cosine_similarity(e1, gender) = -0.058578740443554995
624 | # cosine_similarity(e2, gender) = 0.058578740443555
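# Optional sketch (ungraded): the same equalization applied to the "actress"/"actor" pair
# discussed at the start of this section, assuming both words are present in the GloVe
# vocabulary and the cells above have been run. After equalizing, the two similarities should
# be (approximately) equal in magnitude and opposite in sign.
e_act1, e_act2 = equalize(("actress", "actor"), g_unit, word_to_vec_map_unit_vectors)
print("cosine_similarity(e_act1, g_unit) = ", cosine_similarity(e_act1, g_unit))
print("cosine_similarity(e_act2, g_unit) = ", cosine_similarity(e_act2, g_unit))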
631 | 632 | # Go ahead and play with the input words in the cell above, to apply equalization to other pairs of words. 633 | # 634 | # Hint: Try... 635 | # 636 | # These debiasing algorithms are very helpful for reducing bias, but aren't perfect and don't eliminate all traces of bias. For example, one weakness of this implementation was that the bias direction $g$ was defined using only the pair of words _woman_ and _man_. As discussed earlier, if $g$ were defined by computing $g_1 = e_{woman} - e_{man}$; $g_2 = e_{mother} - e_{father}$; $g_3 = e_{girl} - e_{boy}$; and so on and averaging over them, you would obtain a better estimate of the "gender" dimension in the 50 dimensional word embedding space. Feel free to play with these types of variants as well! 637 | 638 | # ### Congratulations! 639 | # 640 | # You have come to the end of both graded and ungraded portions of this notebook, and have seen several of the ways that word vectors can be applied and modified. Great work pushing your knowledge in the areas of neutralizing and equalizing word vectors! See you next time. 641 | 642 | # 643 | # ## 6 - References 644 | # 645 | # - The debiasing algorithm is from Bolukbasi et al., 2016, [Man is to Computer Programmer as Woman is to 646 | # Homemaker? Debiasing Word Embeddings](https://papers.nips.cc/paper/6228-man-is-to-computer-programmer-as-woman-is-to-homemaker-debiasing-word-embeddings.pdf) 647 | # - The GloVe word embeddings were due to Jeffrey Pennington, Richard Socher, and Christopher D. Manning. (https://nlp.stanford.edu/projects/glove/) 648 | # 649 | 650 | # In[ ]: 651 | 652 | 653 | 654 | 655 | -------------------------------------------------------------------------------- /Course 5-SequenceModels/Week 2/Week 2 Natural Language Processing and Word Embeddings.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 5-SequenceModels/Week 2/Week 2 Natural Language Processing and Word Embeddings.pdf -------------------------------------------------------------------------------- /Course 5-SequenceModels/Week 3/Week 3 Sequence models and Attention Mechanism.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 5-SequenceModels/Week 3/Week 3 Sequence models and Attention Mechanism.pdf -------------------------------------------------------------------------------- /Course 5-SequenceModels/Week 4/Week 4 Transformers.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TheKidPadra/DeepLearning.AI-Deep-Learning-Specialization/0a951ae6671f739e86b9a2846c821df1a0215ef6/Course 5-SequenceModels/Week 4/Week 4 Transformers.pdf -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2023 Padra Esfandiyar 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the 
Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Deep Learning Specialization on Coursera (offered by deeplearning.ai) 2 | 3 |

4 | 5 | Programming assignments and quizzes answers from all courses in the Coursera [Deep Learning Specialization](https://www.coursera.org/specializations/deep-learning) offered by [](https://www.deeplearning.ai/courses/deep-learning-specialization/) 6 | ## Credits 7 | 8 | This repo contains my work for this specialization. The code base, quiz questions and diagrams are taken from the [Deep Learning Specialization](https://www.coursera.org/specializations/deep-learning), unless specified otherwise. 9 | 10 | ## Courses 11 | 12 | The [Deep Learning Specialization](https://www.coursera.org/specializations/deep-learning) on Coursera contains five courses: 13 | 14 | - Course 1: [Neural Networks and Deep Learning](https://www.coursera.org/learn/neural-networks-deep-learning?specialization=deep-learning) 15 | - Course 2: [Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization](https://www.coursera.org/learn/deep-neural-network?specialization=deep-learning) 16 | - Course 3: [Structuring Machine Learning Projects](https://www.coursera.org/learn/machine-learning-projects?specialization=deep-learning) 17 | - Course 4: [Convolutional Neural Networks](https://www.coursera.org/learn/convolutional-neural-networks?specialization=deep-learning) 18 | - Course 5: [Sequence Models](https://www.coursera.org/learn/nlp-sequence-models?specialization=deep-learning) 19 | 20 | ## Specialization Info 21 | 22 | - The Deep Learning Specialization is a foundational program that will help you understand the capabilities, challenges, and consequences of deep learning and prepare you to participate in the development of leading-edge AI technology.In this Specialization, you will build and train neural network architectures such as Convolutional Neural Networks, Recurrent Neural Networks, LSTMs, Transformers, and learn how to make them better with strategies such as Dropout, BatchNorm, Xavier/He initialization, and more. Get ready to master theoretical concepts and their industry applications using Python and TensorFlow and tackle real-world cases such as speech recognition, music synthesis, chatbots, machine translation, natural language processing, and more.AI is transforming many industries. The Deep Learning Specialization provides a pathway for you to take the definitive step in the world of AI by helping you gain the knowledge and skills to level up your career. Along the way, you will also get career advice from deep learning experts from industry and academia. 
23 | 24 | ## Applied Learning Project 25 | ### By the end you’ll be able to: 26 | 27 | - Build and train deep neural networks, implement vectorized neural networks, identify architecture parameters, and apply DL to your applications 28 | 29 | - Use best practices to train and develop test sets and analyze bias/variance for building DL applications, use standard NN techniques, apply optimization algorithms, and implement a neural network in TensorFlow 30 | 31 | - Use strategies for reducing errors in ML systems, understand complex ML settings, and apply end-to-end, transfer, and multi-task learning 32 | 33 | - Build a Convolutional Neural Network, apply it to visual detection and recognition tasks, use neural style transfer to generate art, and apply these algorithms to image, video, and other 2D/3D data 34 | 35 | - Build and train Recurrent Neural Networks and their variants (GRUs, LSTMs), apply RNNs to character-level language modeling, work with NLP and Word Embeddings, and use HuggingFace tokenizers and transformers to perform Named Entity Recognition and Question Answering 36 | 37 | 38 | ## What you will learn 39 | 40 | - Build and train deep neural networks, identify key architecture parameters, implement vectorized neural networks, and apply deep learning to your applications 41 | 42 | - Train and develop test sets, analyze variance for DL applications, use standard techniques and optimization algorithms, and build neural networks in TensorFlow 43 | 44 | - Build a CNN and apply it to detection and recognition tasks, use neural style transfer to generate art, and apply algorithms to image and video data 45 | 46 | - Build and train RNNs, work with NLP and Word Embeddings, and use HuggingFace tokenizers and transformer models to perform NER and Question Answering 47 | 48 | ## Usage 49 | 50 | This repo contains the assignment notebooks with my completed code (plus code from contributors), organized as Course/Week to mirror the course structure. 51 | The assignment notebooks are subject to change over time. 52 | 53 | # Connect with your mentors and fellow learners on Slack! 54 | Once you have enrolled in the course, you are invited to join a Slack workspace for this specialization: 55 | Please join the Slack workspace by going to the following link: [deeplearningai-nlp.slack.com](https://deeplearningai-nlp.slack.com) 56 | This Slack workspace includes all courses of this specialization.
57 | 58 | ## Programming Assignments 59 | 60 | ### Course 1: Neural Networks and Deep Learning 61 | 62 | 63 | * Week 1 64 | + No Lab 65 | + Week 1 Quiz [Introduction to Deep Learning](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%201-Neural%20Networks%20%26%20Deep%20Learning/Week%201/Week%201%20Introduction%20to%20Deep%20Learning.pdf) 66 | * Week 2 Labs & Quiz: 67 | + [Python Basics with Numpy (optional assignment)](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%201-Neural%20Networks%20%26%20Deep%20Learning/Week%202/Python_Basics_with_Numpy.ipynb) 68 | + [Logistic Regression with a Neural Network mindset](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%201-Neural%20Networks%20%26%20Deep%20Learning/Week%202/Logistic_Regression_with_a_Neural_Network_mindset.ipynb) 69 | + Week 2 Quiz [Neural Network Basics](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%201-Neural%20Networks%20%26%20Deep%20Learning/Week%202/Week%202%20Neural%20Network%20Basics.pdf) 70 | 71 | * Week 3 Lab & Quiz: 72 | + [Planar data classification with one hidden layer](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%201-Neural%20Networks%20%26%20Deep%20Learning/Week%203/Planar_data_classification_with_one_hidden_layer.ipynb) 73 | + Week 3 Quiz [Shallow Neural Networks](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%201-Neural%20Networks%20%26%20Deep%20Learning/Week%203/Week%203%20Shallow%20Neural%20Networks.pdf) 74 | 75 | * Week 4 Labs & Quiz: 76 | + [Building your Deep Neural Network: Step by Step](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%201-Neural%20Networks%20%26%20Deep%20Learning/Week%204/Building_your_Deep_Neural_Network_Step_by_Step.ipynb) 77 | + [Deep Neural Network for Image Classification: Application](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%201-Neural%20Networks%20%26%20Deep%20Learning/Week%204/Deep%20Neural%20Network%20-%20Application.ipynb) 78 | + Week 4 Quiz [Key Concepts on Deep Neural Networks](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%201-Neural%20Networks%20%26%20Deep%20Learning/Week%204/Week%204%20Key%20Concepts%20on%20Deep%20Neural%20Networks.pdf) 79 | 80 | 81 | 82 | ### Course 2: Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization 83 | 84 | * Week 1 Labs & Quiz: 85 | + [Initialization](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%202-Improving%20Deep%20Neural%20Networks/Week%201/Initialization.ipynb) 86 | + [Regularization](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%202-Improving%20Deep%20Neural%20Networks/Week%201/Regularization.ipynb) 87 | + [Gradient Checking](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%202-Improving%20Deep%20Neural%20Networks/Week%201/Gradient_Checking.ipynb) 88 | + Week 1 Quiz [Practical aspects of Deep 
Learning](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%202-Improving%20Deep%20Neural%20Networks/Week%201/Week%201%20Practical%20aspects%20of%20Deep%20Learning.pdf) 89 | * Week 2 Labs & Quiz: 90 | + [Optimization Methods](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%202-Improving%20Deep%20Neural%20Networks/Week%202/Optimization_methods.ipynb) 91 | + Week 2 Quiz [Optimization Algorithms](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%202-Improving%20Deep%20Neural%20Networks/Week%202/Week%202%20Optimization%20Algorithms.pdf) 92 | * Week 3 Labs & Quiz: 93 | + [Introduction to TensorFlow](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%202-Improving%20Deep%20Neural%20Networks/Week%203/Tensorflow_introduction.ipynb) 94 | + Week 3 Quiz [Hyperparameter tuning, Batch Normalization, Programming Frameworks](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%202-Improving%20Deep%20Neural%20Networks/Week%203/Week%203%20Hyperparameter%20tuning%2C%20Batch%20Normalization%2C%20Programming%20Frameworks.pdf) 95 | 96 | 97 | ### Course 3: Structuring Machine Learning Projects 98 | 99 | * Week 1 Labs & Quiz: 100 | + No Lab 101 | + Week 1 Quiz [Bird Recognition in the City of Peacetopia (Case Study)](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%203-Structuring%20MachineLearningProjects/Week%201%20Bird%20Recognition%20in%20the%20City%20of%20Peacetopia%20Case%20Study.pdf) 102 | * Week 2 Labs & Quiz: 103 | + No Lab 104 | + Week 2 Quiz [Autonomous Driving (Case Study)](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%203-Structuring%20MachineLearningProjects/Week%202%20Autonomous%20Driving%20Case%20Study.pdf) 105 | 106 | 107 | ### Course 4: Convolutional Neural Networks 108 | 109 | * Week 1 Labs & Quiz: 110 | + [Convolutional Neural Networks: Step by Step](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%204-ConvolutionalNeuralNetworks/Week%201/Convolution_model_Step_by_Step_v1.ipynb) 111 | + [Convolutional Neural Networks: Application](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%204-ConvolutionalNeuralNetworks/Week%201/Convolution_model_Application.ipynb) 112 | + Week 1 Quiz [The Basics of ConvNets](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%204-ConvolutionalNeuralNetworks/Week%201/Week%201%20The%20Basics%20of%20ConvNets.pdf) 113 | * Week 2 Labs & Quiz: 114 | + [Residual Networks](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%204-ConvolutionalNeuralNetworks/Week%202/Residual_Networks.ipynb) 115 | + [Transfer Learning with MobileNetV2](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%204-ConvolutionalNeuralNetworks/Week%202/Transfer_learning_with_MobileNet_v1.ipynb) 116 | + Week 2 Quiz [Deep Convolutional Models](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%204-ConvolutionalNeuralNetworks/Week%202/Week%202%20Deep%20Convolutional%20Models.pdf) 117 | * Week 3 Labs & Quiz: 118 | + [Autonomous Driving - Car 
Detection](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%204-ConvolutionalNeuralNetworks/Week%203/Autonomous_driving_application_Car_detection.ipynb) 119 | + [Image Segmentation with U-Net](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%204-ConvolutionalNeuralNetworks/Week%203/Image_segmentation_Unet_v2.ipynb) 120 | + Week 3 Quiz [Detection Algorithms](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%204-ConvolutionalNeuralNetworks/Week%203/Week%203%20Detection%20Algorithms.pdf) 121 | * Week 4 Labs & Quiz: 122 | + [Face Recognition](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%204-ConvolutionalNeuralNetworks/Week%204/Face_Recognition.ipynb) 123 | + [Deep Learning & Art: Neural Style Transfer](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%204-ConvolutionalNeuralNetworks/Week%204/Art_Generation_with_Neural_Style_Transfer.ipynb) 124 | + Week 4 Quiz [Special Applications Face Recognition and Neural Style Transfer](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%204-ConvolutionalNeuralNetworks/Week%204/Week%204%20Special%20Applications%20Face%20Recognition%20and%20Neural%20Style%20Transfer.pdf) 125 | 126 | 127 | ### Course 5: Sequence Models 128 | * Week 1 Labs & Quiz: 129 | + [Building your Recurrent Neural Network - Step by Step](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%205-SequenceModels/Week%201/Building_a_Recurrent_Neural_Network_Step_by_Step.ipynb) 130 | + [Character level language model - Dinosaurus Island](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%205-SequenceModels/Week%201/Dinosaurus_Island_Character_level_language_model.ipynb) 131 | + [Improvise a Jazz Solo with an LSTM Network](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%205-SequenceModels/Week%201/Improvise_a_Jazz_Solo_with_an_LSTM_Network_v4.ipynb) 132 | + Week 1 Quiz [Recurrent Neural Networks](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%205-SequenceModels/Week%201/Week%201%20Recurrent%20Neural%20Networks.pdf) 133 | * Week 2 Labs & Quiz: 134 | + [Operations on Word Vectors](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%205-SequenceModels/Week%202/Operations_on_word_vectors_v2a.ipynb) 135 | + [Emojify!](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%205-SequenceModels/Week%202/Emoji_v3a.ipynb) 136 | + Week 2 Quiz [Natural Language Processing and Word Embeddings](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%205-SequenceModels/Week%202/Week%202%20Natural%20Language%20Processing%20and%20Word%20Embeddings.pdf) 137 | * Week 3 Labs & Quiz: 138 | + [Neural Machine Translation](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%205-SequenceModels/Week%203/Neural_machine_translation_with_attention_v4a.ipynb) 139 | + [Trigger Word 
Detection](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%205-SequenceModels/Week%203/Trigger_word_detection_v2a.ipynb) 140 | + Week 3 Quiz [Sequence Models and Attention Mechanism](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%205-SequenceModels/Week%203/Week%203%20Sequence%20models%20and%20Attention%20Mechanism.pdf) 141 | * Week 4 Labs & Quiz: 142 | + [Transformer Network](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%205-SequenceModels/Week%204/C5_W4_A1_Transformer_Subclass_v1.ipynb) 143 | + Week 4 Quiz [Trasformers](https://github.com/TheKidPadra/DeepLearning.AI-CourseraDeepLearningSpecialization-/blob/main/Course%205-SequenceModels/Week%204/Week%204%20Transformers.pdf) 144 | 145 | 146 | ## Certificate 147 | 148 | 1. [Neural Networks and Deep Learning](https://www.coursera.org/account/accomplishments/verify/9LV2D8ND4UNV) 149 | 2. [Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization](https://www.coursera.org/account/accomplishments/verify/7ETLGGB85S9N) 150 | 3. [Structuring Machine Learning Projects](https://www.coursera.org/account/accomplishments/verify/4S4BS8BEEPR4) 151 | 4. [Convolutional Neural Networks](https://www.coursera.org/account/accomplishments/verify/T862BAR8XNV3) 152 | 5. [Sequence Models](https://www.coursera.org/account/accomplishments/verify/KFGNFGUWZGGB) 153 | 6. [Deep Learning Specialization(Final Certificate)]() 154 | 155 | -------------------------------------------------------------------------------------------------------------- 156 | ## References 157 | 1. [Neural Networks and Deep Learning](https://www.coursera.org/learn/neural-networks-deep-learning?specialization=deep-learning) 158 | 2. [Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization](https://www.coursera.org/learn/deep-neural-network?specialization=deep-learning) 159 | 3. [Structuring Machine Learning Projects](https://www.coursera.org/learn/machine-learning-projects?specialization=deep-learning) 160 | 4. [Convolutional Neural Networks](https://www.coursera.org/learn/convolutional-neural-networks?specialization=deep-learning) 161 | 5. [Sequence Models](https://www.coursera.org/learn/nlp-sequence-models?specialization=deep-learning) 162 | 163 | ---------------------------------------------------------------------------------------------------------------- 164 | 165 | ## 📝 License 166 | The gem is available as open source under the terms of the [MIT license](https://opensource.org/licenses/MIT). 167 | 168 | ---------------------------------------------------------------------------------------------------------------- 169 | 170 | ## Disclaimer 171 | I recognize the hard time people spend on building intuition, understanding new concepts and debugging assignments. The solutions uploaded here are **only for reference**. They are meant to unblock you if you get stuck somewhere. Please do not copy any part of the code as-is (the programming assignments are fairly easy if you read the instructions carefully). Similarly, try out the quizzes yourself before you refer to the quiz solutions. 172 | 173 | --------------------------------------------------------------------------------