├── C3_W2_Collaborative_RecSys_Assignment.ipynb ├── C3_W3_A1_Assignment.ipynb └── C3_W2_RecSysNN_Assignment.ipynb /C3_W2_Collaborative_RecSys_Assignment.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "Lzk7iX_CodX6", 7 | "tags": [] 8 | }, 9 | "source": [ 10 | "# Practice lab: Collaborative Filtering Recommender Systems\n", 11 | "\n", 12 | "In this exercise, you will implement collaborative filtering to build a recommender system for movies. \n", 13 | "\n", 14 | "# Outline\n", 15 | "- [ 1 - Notation](#1)\n", 16 | "- [ 2 - Recommender Systems](#2)\n", 17 | "- [ 3 - Movie ratings dataset](#3)\n", 18 | "- [ 4 - Collaborative filtering learning algorithm](#4)\n", 19 | " - [ 4.1 Collaborative filtering cost function](#4.1)\n", 20 | " - [ Exercise 1](#ex01)\n", 21 | "- [ 5 - Learning movie recommendations](#5)\n", 22 | "- [ 6 - Recommendations](#6)\n", 23 | "- [ 7 - Congratulations!](#7)\n", 24 | "\n", 25 | "\n" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "## Packages \n", 33 | "We will use the now familiar NumPy and Tensorflow Packages." 34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": 1, 39 | "metadata": {}, 40 | "outputs": [], 41 | "source": [ 42 | "import numpy as np\n", 43 | "import tensorflow as tf\n", 44 | "from tensorflow import keras\n", 45 | "from recsys_utils import *" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": {}, 51 | "source": [ 52 | "\n", 53 | "## 1 - Notation\n" 54 | ] 55 | }, 56 | { 57 | "cell_type": "markdown", 58 | "metadata": {}, 59 | "source": [ 60 | "|General
Notation | Description| Python (if any) |\n", 61 | "|:-------------|:------------------------------------------------------------||\n", 62 | "| $r(i,j)$ | scalar; = 1 if user j rated game i = 0 otherwise ||\n", 63 | "| $y(i,j)$ | scalar; = rating given by user j on game i (if r(i,j) = 1 is defined) ||\n", 64 | "|$\\mathbf{w}^{(j)}$ | vector; parameters for user j ||\n", 65 | "|$b^{(j)}$ | scalar; parameter for user j ||\n", 66 | "| $\\mathbf{x}^{(i)}$ | vector; feature ratings for movie i || \n", 67 | "| $n_u$ | number of users |num_users|\n", 68 | "| $n_m$ | number of movies | num_movies |\n", 69 | "| $n$ | number of features | num_features |\n", 70 | "| $\\mathbf{X}$ | matrix of vectors $\\mathbf{x}^{(i)}$ | X |\n", 71 | "| $\\mathbf{W}$ | matrix of vectors $\\mathbf{w}^{(j)}$ | W |\n", 72 | "| $\\mathbf{b}$ | vector of bias parameters $b^{(j)}$ | b |\n", 73 | "| $\\mathbf{R}$ | matrix of elements $r(i,j)$ | R |\n", 74 | "\n" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": { 80 | "tags": [] 81 | }, 82 | "source": [ 83 | "\n", 84 | "## 2 - Recommender Systems \n", 85 | "In this lab, you will implement the collaborative filtering learning algorithm and apply it to a dataset of movie ratings.\n", 86 | "The goal of a collaborative filtering recommender system is to generate two vectors: For each user, a 'parameter vector' that embodies the movie tastes of a user. For each movie, a feature vector of the same size which embodies some description of the movie. The dot product of the two vectors plus the bias term should produce an estimate of the rating the user might give to that movie.\n", 87 | "\n", 88 | "The diagram below details how these vectors are learned." 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "metadata": {}, 94 | "source": [ 95 | "
\n", 96 | " \n", 97 | "
" 98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "metadata": {}, 103 | "source": [ 104 | "Existing ratings are provided in matrix form as shown. $Y$ contains ratings; 0.5 to 5 inclusive in 0.5 steps. 0 if the movie has not been rated. $R$ has a 1 where movies have been rated. Movies are in rows, users in columns. Each user has a parameter vector $w^{user}$ and bias. Each movie has a feature vector $x^{movie}$. These vectors are simultaneously learned by using the existing user/movie ratings as training data. One training example is shown above: $\\mathbf{w}^{(1)} \\cdot \\mathbf{x}^{(1)} + b^{(1)} = 4$. It is worth noting that the feature vector $x^{movie}$ must satisfy all the users while the user vector $w^{user}$ must satisfy all the movies. This is the source of the name of this approach - all the users collaborate to generate the rating set. " 105 | ] 106 | }, 107 | { 108 | "cell_type": "markdown", 109 | "metadata": {}, 110 | "source": [ 111 | "
\n", 112 | " \n", 113 | "
" 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "Once the feature vectors and parameters are learned, they can be used to predict how a user might rate an unrated movie. This is shown in the diagram above. The equation is an example of predicting a rating for user one on movie zero." 121 | ] 122 | }, 123 | { 124 | "cell_type": "markdown", 125 | "metadata": {}, 126 | "source": [ 127 | "\n", 128 | "In this exercise, you will implement the function `cofiCostFunc` that computes the collaborative filtering\n", 129 | "objective function. After implementing the objective function, you will use a TensorFlow custom training loop to learn the parameters for collaborative filtering. The first step is to detail the data set and data structures that will be used in the lab." 130 | ] 131 | }, 132 | { 133 | "cell_type": "markdown", 134 | "metadata": { 135 | "id": "6-09Hto6odYD" 136 | }, 137 | "source": [ 138 | "\n", 139 | "## 3 - Movie ratings dataset \n", 140 | "The data set is derived from the [MovieLens \"ml-latest-small\"](https://grouplens.org/datasets/movielens/latest/) dataset. \n", 141 | "[F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19. ]\n", 142 | "\n", 143 | "The original dataset has 9000 movies rated by 600 users. The dataset has been reduced in size to focus on movies from the years since 2000. This dataset consists of ratings on a scale of 0.5 to 5 in 0.5 step increments. The reduced dataset has $n_u = 443$ users, and $n_m= 4778$ movies. \n", 144 | "\n", 145 | "Below, you will load the movie dataset into the variables $Y$ and $R$.\n", 146 | "\n", 147 | "The matrix $Y$ (a $n_m \\times n_u$ matrix) stores the ratings $y^{(i,j)}$. The matrix $R$ is an binary-valued indicator matrix, where $R(i,j) = 1$ if user $j$ gave a rating to movie $i$, and $R(i,j)=0$ otherwise. \n", 148 | "\n", 149 | "Throughout this part of the exercise, you will also be working with the\n", 150 | "matrices, $\\mathbf{X}$, $\\mathbf{W}$ and $\\mathbf{b}$: \n", 151 | "\n", 152 | "$$\\mathbf{X} = \n", 153 | "\\begin{bmatrix}\n", 154 | "--- (\\mathbf{x}^{(0)})^T --- \\\\\n", 155 | "--- (\\mathbf{x}^{(1)})^T --- \\\\\n", 156 | "\\vdots \\\\\n", 157 | "--- (\\mathbf{x}^{(n_m-1)})^T --- \\\\\n", 158 | "\\end{bmatrix} , \\quad\n", 159 | "\\mathbf{W} = \n", 160 | "\\begin{bmatrix}\n", 161 | "--- (\\mathbf{w}^{(0)})^T --- \\\\\n", 162 | "--- (\\mathbf{w}^{(1)})^T --- \\\\\n", 163 | "\\vdots \\\\\n", 164 | "--- (\\mathbf{w}^{(n_u-1)})^T --- \\\\\n", 165 | "\\end{bmatrix},\\quad\n", 166 | "\\mathbf{ b} = \n", 167 | "\\begin{bmatrix}\n", 168 | " b^{(0)} \\\\\n", 169 | " b^{(1)} \\\\\n", 170 | "\\vdots \\\\\n", 171 | "b^{(n_u-1)} \\\\\n", 172 | "\\end{bmatrix}\\quad\n", 173 | "$$ \n", 174 | "\n", 175 | "The $i$-th row of $\\mathbf{X}$ corresponds to the\n", 176 | "feature vector $x^{(i)}$ for the $i$-th movie, and the $j$-th row of\n", 177 | "$\\mathbf{W}$ corresponds to one parameter vector $\\mathbf{w}^{(j)}$, for the\n", 178 | "$j$-th user. Both $x^{(i)}$ and $\\mathbf{w}^{(j)}$ are $n$-dimensional\n", 179 | "vectors. 
For the purposes of this exercise, you will use $n=10$, and\n", 180 | "therefore, $\\mathbf{x}^{(i)}$ and $\\mathbf{w}^{(j)}$ have 10 elements.\n", 181 | "Correspondingly, $\\mathbf{X}$ is a\n", 182 | "$n_m \\times 10$ matrix and $\\mathbf{W}$ is a $n_u \\times 10$ matrix.\n", 183 | "\n", 184 | "We will start by loading the movie ratings dataset to understand the structure of the data.\n", 185 | "We will load $Y$ and $R$ with the movie dataset. \n", 186 | "We'll also load $\\mathbf{X}$, $\\mathbf{W}$, and $\\mathbf{b}$ with pre-computed values. These values will be learned later in the lab, but we'll use pre-computed values to develop the cost model." 187 | ] 188 | }, 189 | { 190 | "cell_type": "code", 191 | "execution_count": 2, 192 | "metadata": {}, 193 | "outputs": [ 194 | { 195 | "name": "stdout", 196 | "output_type": "stream", 197 | "text": [ 198 | "Y (4778, 443) R (4778, 443)\n", 199 | "X (4778, 10)\n", 200 | "W (443, 10)\n", 201 | "b (1, 443)\n", 202 | "num_features 10\n", 203 | "num_movies 4778\n", 204 | "num_users 443\n" 205 | ] 206 | } 207 | ], 208 | "source": [ 209 | "#Load data\n", 210 | "X, W, b, num_movies, num_features, num_users = load_precalc_params_small()\n", 211 | "Y, R = load_ratings_small()\n", 212 | "\n", 213 | "print(\"Y\", Y.shape, \"R\", R.shape)\n", 214 | "print(\"X\", X.shape)\n", 215 | "print(\"W\", W.shape)\n", 216 | "print(\"b\", b.shape)\n", 217 | "print(\"num_features\", num_features)\n", 218 | "print(\"num_movies\", num_movies)\n", 219 | "print(\"num_users\", num_users)" 220 | ] 221 | }, 222 | { 223 | "cell_type": "code", 224 | "execution_count": 3, 225 | "metadata": { 226 | "id": "bxm1O_wbodYF" 227 | }, 228 | "outputs": [ 229 | { 230 | "name": "stdout", 231 | "output_type": "stream", 232 | "text": [ 233 | "Average rating for movie 1 : 3.400 / 5\n" 234 | ] 235 | } 236 | ], 237 | "source": [ 238 | "# From the matrix, we can compute statistics like average rating.\n", 239 | "tsmean = np.mean(Y[0, R[0, :].astype(bool)])\n", 240 | "print(f\"Average rating for movie 1 : {tsmean:0.3f} / 5\" )" 241 | ] 242 | }, 243 | { 244 | "cell_type": "markdown", 245 | "metadata": {}, 246 | "source": [ 247 | "\n", 248 | "## 4 - Collaborative filtering learning algorithm \n", 249 | "\n", 250 | "Now, you will begin implementing the collaborative filtering learning\n", 251 | "algorithm. You will start by implementing the objective function. \n", 252 | "\n", 253 | "The collaborative filtering algorithm in the setting of movie\n", 254 | "recommendations considers a set of $n$-dimensional parameter vectors\n", 255 | "$\\mathbf{x}^{(0)},...,\\mathbf{x}^{(n_m-1)}$, $\\mathbf{w}^{(0)},...,\\mathbf{w}^{(n_u-1)}$ and $b^{(0)},...,b^{(n_u-1)}$, where the\n", 256 | "model predicts the rating for movie $i$ by user $j$ as\n", 257 | "$y^{(i,j)} = \\mathbf{w}^{(j)}\\cdot \\mathbf{x}^{(i)} + b^{(i)}$ . Given a dataset that consists of\n", 258 | "a set of ratings produced by some users on some movies, you wish to\n", 259 | "learn the parameter vectors $\\mathbf{x}^{(0)},...,\\mathbf{x}^{(n_m-1)},\n", 260 | "\\mathbf{w}^{(0)},...,\\mathbf{w}^{(n_u-1)}$ and $b^{(0)},...,b^{(n_u-1)}$ that produce the best fit (minimizes\n", 261 | "the squared error).\n", 262 | "\n", 263 | "You will complete the code in cofiCostFunc to compute the cost\n", 264 | "function for collaborative filtering. 
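Before implementing the cost function, it can help to sanity-check the model itself: a single predicted rating is just $\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)}$ applied to the pre-computed parameters loaded above. The short sketch below is not part of the graded code; the indices are purely illustrative, and it assumes `X`, `W`, `b`, `Y`, and `R` as loaded in the earlier cells.

```python
import numpy as np

# Illustrative indices: movie 0 and user 0
i, j = 0, 0

# Model prediction: w(j) . x(i) + b(j)
predicted = np.dot(W[j, :], X[i, :]) + b[0, j]

# Only compare against the stored rating if user j actually rated movie i
if R[i, j]:
    print(f"Predicted {predicted:0.2f}, actual rating {Y[i, j]}")
else:
    print(f"Predicted {predicted:0.2f} (movie {i} was not rated by user {j})")
```

Note that `b` has shape `(1, num_users)`, so the bias for user $j$ is `b[0, j]`.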
" 265 | ] 266 | }, 267 | { 268 | "cell_type": "markdown", 269 | "metadata": { 270 | "id": "bcqg0LJWodYH" 271 | }, 272 | "source": [ 273 | "\n", 274 | "\n", 275 | "### 4.1 Collaborative filtering cost function\n", 276 | "\n", 277 | "The collaborative filtering cost function is given by\n", 278 | "$$J({\\mathbf{x}^{(0)},...,\\mathbf{x}^{(n_m-1)},\\mathbf{w}^{(0)},b^{(0)},...,\\mathbf{w}^{(n_u-1)},b^{(n_u-1)}})= \\frac{1}{2}\\sum_{(i,j):r(i,j)=1}(\\mathbf{w}^{(j)} \\cdot \\mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})^2\n", 279 | "+\\underbrace{\n", 280 | "\\frac{\\lambda}{2}\n", 281 | "\\sum_{j=0}^{n_u-1}\\sum_{k=0}^{n-1}(\\mathbf{w}^{(j)}_k)^2\n", 282 | "+ \\frac{\\lambda}{2}\\sum_{i=0}^{n_m-1}\\sum_{k=0}^{n-1}(\\mathbf{x}_k^{(i)})^2\n", 283 | "}_{regularization}\n", 284 | "\\tag{1}$$\n", 285 | "The first summation in (1) is \"for all $i$, $j$ where $r(i,j)$ equals $1$\" and could be written:\n", 286 | "\n", 287 | "$$\n", 288 | "= \\frac{1}{2}\\sum_{j=0}^{n_u-1} \\sum_{i=0}^{n_m-1}r(i,j)*(\\mathbf{w}^{(j)} \\cdot \\mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})^2\n", 289 | "+\\text{regularization}\n", 290 | "$$\n", 291 | "\n", 292 | "You should now write cofiCostFunc (collaborative filtering cost function) to return this cost." 293 | ] 294 | }, 295 | { 296 | "cell_type": "markdown", 297 | "metadata": {}, 298 | "source": [ 299 | "\n", 300 | "### Exercise 1\n", 301 | "\n", 302 | "**For loop Implementation:** \n", 303 | "Start by implementing the cost function using for loops.\n", 304 | "Consider developing the cost function in two steps. First, develop the cost function without regularization. A test case that does not include regularization is provided below to test your implementation. Once that is working, add regularization and run the tests that include regularization. Note that you should be accumulating the cost for user $j$ and movie $i$ only if $R(i,j) = 1$." 
305 | ] 306 | }, 307 | { 308 | "cell_type": "code", 309 | "execution_count": 19, 310 | "metadata": {}, 311 | "outputs": [], 312 | "source": [ 313 | "# GRADED FUNCTION: cofi_cost_func\n", 314 | "# UNQ_C1\n", 315 | "\n", 316 | "def cofi_cost_func(X, W, b, Y, R, lambda_):\n", 317 | " \"\"\"\n", 318 | " Returns the cost for the content-based filtering\n", 319 | " Args:\n", 320 | " X (ndarray (num_movies,num_features)): matrix of item features\n", 321 | " W (ndarray (num_users,num_features)) : matrix of user parameters\n", 322 | " b (ndarray (1, num_users) : vector of user parameters\n", 323 | " Y (ndarray (num_movies,num_users) : matrix of user ratings of movies\n", 324 | " R (ndarray (num_movies,num_users) : matrix, where R(i, j) = 1 if the i-th movies was rated by the j-th user\n", 325 | " lambda_ (float): regularization parameter\n", 326 | " Returns:\n", 327 | " J (float) : Cost\n", 328 | " \"\"\"\n", 329 | " nm, nu = Y.shape\n", 330 | " J = 0\n", 331 | " ### START CODE HERE ### \n", 332 | " for j in range(nu):\n", 333 | " w = W[j,:]\n", 334 | " b_j = b[0,j]\n", 335 | " for i in range(nm):\n", 336 | " x = X[i,:]\n", 337 | " y = Y[i,j]\n", 338 | " r = R[i,j]\n", 339 | " J += np.square(r * (np.dot(w,x) + b_j - y ) )\n", 340 | " J += lambda_* (np.sum(np.square(W)) + np.sum(np.square(X))) \n", 341 | " J = J/2\n", 342 | " ### END CODE HERE ### \n", 343 | "\n", 344 | " return J" 345 | ] 346 | }, 347 | { 348 | "cell_type": "code", 349 | "execution_count": 20, 350 | "metadata": {}, 351 | "outputs": [ 352 | { 353 | "name": "stdout", 354 | "output_type": "stream", 355 | "text": [ 356 | "\u001b[92mAll tests passed!\n" 357 | ] 358 | } 359 | ], 360 | "source": [ 361 | "# Public tests\n", 362 | "from public_tests import *\n", 363 | "test_cofi_cost_func(cofi_cost_func)" 364 | ] 365 | }, 366 | { 367 | "cell_type": "markdown", 368 | "metadata": {}, 369 | "source": [ 370 | "
\n", 371 | " Click for hints\n", 372 | " You can structure the code in two for loops similar to the summation in (1). \n", 373 | " Implement the code without regularization first. \n", 374 | " Note that some of the elements in (1) are vectors. Use np.dot(). You can also use np.square().\n", 375 | " Pay close attention to which elements are indexed by i and which are indexed by j. Don't forget to divide by two.\n", 376 | " \n", 377 | "```python \n", 378 | " ### START CODE HERE ### \n", 379 | " for j in range(nu):\n", 380 | " \n", 381 | " \n", 382 | " for i in range(nm):\n", 383 | " \n", 384 | " \n", 385 | " ### END CODE HERE ### \n", 386 | "``` \n", 387 | "
\n", 388 | " Click for more hints\n", 389 | " \n", 390 | " Here is some more details. The code below pulls out each element from the matrix before using it. \n", 391 | " One could also reference the matrix directly. \n", 392 | " This code does not contain regularization.\n", 393 | " \n", 394 | "```python \n", 395 | " nm,nu = Y.shape\n", 396 | " J = 0\n", 397 | " ### START CODE HERE ### \n", 398 | " for j in range(nu):\n", 399 | " w = W[j,:]\n", 400 | " b_j = b[0,j]\n", 401 | " for i in range(nm):\n", 402 | " x = \n", 403 | " y = \n", 404 | " r =\n", 405 | " J += \n", 406 | " J = J/2\n", 407 | " ### END CODE HERE ### \n", 408 | "\n", 409 | "```\n", 410 | " \n", 411 | "
\n", 412 | " Last Resort (full non-regularized implementation)\n", 413 | " \n", 414 | "```python \n", 415 | " nm,nu = Y.shape\n", 416 | " J = 0\n", 417 | " ### START CODE HERE ### \n", 418 | " for j in range(nu):\n", 419 | " w = W[j,:]\n", 420 | " b_j = b[0,j]\n", 421 | " for i in range(nm):\n", 422 | " x = X[i,:]\n", 423 | " y = Y[i,j]\n", 424 | " r = R[i,j]\n", 425 | " J += np.square(r * (np.dot(w,x) + b_j - y ) )\n", 426 | " J = J/2\n", 427 | " ### END CODE HERE ### \n", 428 | "```\n", 429 | " \n", 430 | "
\n", 431 | " regularization\n", 432 | " Regularization just squares each element of the W array and X array and them sums all the squared elements.\n", 433 | " You can utilize np.square() and np.sum().\n", 434 | "\n", 435 | "
\n", 436 | " regularization details\n", 437 | " \n", 438 | "```python \n", 439 | " J += lambda_* (np.sum(np.square(W)) + np.sum(np.square(X)))\n", 440 | "```\n", 441 | " \n", 442 | "
\n", 443 | "
\n", 444 | "
\n", 445 | "
\n", 446 | "\n", 447 | " \n" 448 | ] 449 | }, 450 | { 451 | "cell_type": "code", 452 | "execution_count": null, 453 | "metadata": {}, 454 | "outputs": [], 455 | "source": [ 456 | "# Reduce the data set size so that this runs faster\n", 457 | "num_users_r = 4\n", 458 | "num_movies_r = 5 \n", 459 | "num_features_r = 3\n", 460 | "\n", 461 | "X_r = X[:num_movies_r, :num_features_r]\n", 462 | "W_r = W[:num_users_r, :num_features_r]\n", 463 | "b_r = b[0, :num_users_r].reshape(1,-1)\n", 464 | "Y_r = Y[:num_movies_r, :num_users_r]\n", 465 | "R_r = R[:num_movies_r, :num_users_r]\n", 466 | "\n", 467 | "# Evaluate cost function\n", 468 | "J = cofi_cost_func(X_r, W_r, b_r, Y_r, R_r, 0);\n", 469 | "print(f\"Cost: {J:0.2f}\")" 470 | ] 471 | }, 472 | { 473 | "cell_type": "markdown", 474 | "metadata": { 475 | "id": "xGznmQ91odYL" 476 | }, 477 | "source": [ 478 | "**Expected Output (lambda = 0)**: \n", 479 | "$13.67$." 480 | ] 481 | }, 482 | { 483 | "cell_type": "code", 484 | "execution_count": null, 485 | "metadata": {}, 486 | "outputs": [], 487 | "source": [ 488 | "# Evaluate cost function with regularization \n", 489 | "J = cofi_cost_func(X_r, W_r, b_r, Y_r, R_r, 1.5);\n", 490 | "print(f\"Cost (with regularization): {J:0.2f}\")" 491 | ] 492 | }, 493 | { 494 | "cell_type": "markdown", 495 | "metadata": { 496 | "id": "1xbepzUUodYP" 497 | }, 498 | "source": [ 499 | "**Expected Output**:\n", 500 | "\n", 501 | "28.09" 502 | ] 503 | }, 504 | { 505 | "cell_type": "markdown", 506 | "metadata": {}, 507 | "source": [ 508 | "**Vectorized Implementation**\n", 509 | "\n", 510 | "It is important to create a vectorized implementation to compute $J$, since it will later be called many times during optimization. The linear algebra utilized is not the focus of this series, so the implementation is provided. If you are an expert in linear algebra, feel free to create your version without referencing the code below. \n", 511 | "\n", 512 | "Run the code below and verify that it produces the same results as the non-vectorized version." 513 | ] 514 | }, 515 | { 516 | "cell_type": "code", 517 | "execution_count": 8, 518 | "metadata": {}, 519 | "outputs": [], 520 | "source": [ 521 | "def cofi_cost_func_v(X, W, b, Y, R, lambda_):\n", 522 | " \"\"\"\n", 523 | " Returns the cost for the content-based filtering\n", 524 | " Vectorized for speed. 
Uses tensorflow operations to be compatible with custom training loop.\n", 525 | " Args:\n", 526 | " X (ndarray (num_movies,num_features)): matrix of item features\n", 527 | " W (ndarray (num_users,num_features)) : matrix of user parameters\n", 528 | " b (ndarray (1, num_users) : vector of user parameters\n", 529 | " Y (ndarray (num_movies,num_users) : matrix of user ratings of movies\n", 530 | " R (ndarray (num_movies,num_users) : matrix, where R(i, j) = 1 if the i-th movies was rated by the j-th user\n", 531 | " lambda_ (float): regularization parameter\n", 532 | " Returns:\n", 533 | " J (float) : Cost\n", 534 | " \"\"\"\n", 535 | " j = (tf.linalg.matmul(X, tf.transpose(W)) + b - Y)*R\n", 536 | " J = 0.5 * tf.reduce_sum(j**2) + (lambda_/2) * (tf.reduce_sum(X**2) + tf.reduce_sum(W**2))\n", 537 | " return J" 538 | ] 539 | }, 540 | { 541 | "cell_type": "code", 542 | "execution_count": null, 543 | "metadata": {}, 544 | "outputs": [], 545 | "source": [ 546 | "# Evaluate cost function\n", 547 | "J = cofi_cost_func_v(X_r, W_r, b_r, Y_r, R_r, 0);\n", 548 | "print(f\"Cost: {J:0.2f}\")\n", 549 | "\n", 550 | "# Evaluate cost function with regularization \n", 551 | "J = cofi_cost_func_v(X_r, W_r, b_r, Y_r, R_r, 1.5);\n", 552 | "print(f\"Cost (with regularization): {J:0.2f}\")" 553 | ] 554 | }, 555 | { 556 | "cell_type": "markdown", 557 | "metadata": { 558 | "id": "1xbepzUUodYP" 559 | }, 560 | "source": [ 561 | "**Expected Output**: \n", 562 | "Cost: 13.67 \n", 563 | "Cost (with regularization): 28.09" 564 | ] 565 | }, 566 | { 567 | "cell_type": "markdown", 568 | "metadata": { 569 | "id": "ilaeM8yWodYR" 570 | }, 571 | "source": [ 572 | "\n", 573 | "## 5 - Learning movie recommendations \n", 574 | "------------------------------\n", 575 | "\n", 576 | "After you have finished implementing the collaborative filtering cost\n", 577 | "function, you can start training your algorithm to make\n", 578 | "movie recommendations for yourself. \n", 579 | "\n", 580 | "In the cell below, you can enter your own movie choices. The algorithm will then make recommendations for you! We have filled out some values according to our preferences, but after you have things working with our choices, you should change this to match your tastes.\n", 581 | "A list of all movies in the dataset is in the file [movie list](data/small_movie_list.csv)." 582 | ] 583 | }, 584 | { 585 | "cell_type": "code", 586 | "execution_count": 9, 587 | "metadata": { 588 | "id": "WJO8Jr0UodYR" 589 | }, 590 | "outputs": [ 591 | { 592 | "name": "stdout", 593 | "output_type": "stream", 594 | "text": [ 595 | "\n", 596 | "New user ratings:\n", 597 | "\n", 598 | "Rated 5.0 for Shrek (2001)\n", 599 | "Rated 5.0 for Harry Potter and the Sorcerer's Stone (a.k.a. 
Harry Potter and the Philosopher's Stone) (2001)\n", 600 | "Rated 2.0 for Amelie (Fabuleux destin d'Amélie Poulain, Le) (2001)\n", 601 | "Rated 5.0 for Harry Potter and the Chamber of Secrets (2002)\n", 602 | "Rated 5.0 for Pirates of the Caribbean: The Curse of the Black Pearl (2003)\n", 603 | "Rated 5.0 for Lord of the Rings: The Return of the King, The (2003)\n", 604 | "Rated 3.0 for Eternal Sunshine of the Spotless Mind (2004)\n", 605 | "Rated 5.0 for Incredibles, The (2004)\n", 606 | "Rated 2.0 for Persuasion (2007)\n", 607 | "Rated 5.0 for Toy Story 3 (2010)\n", 608 | "Rated 3.0 for Inception (2010)\n", 609 | "Rated 1.0 for Louis Theroux: Law & Disorder (2008)\n", 610 | "Rated 1.0 for Nothing to Declare (Rien à déclarer) (2010)\n" 611 | ] 612 | } 613 | ], 614 | "source": [ 615 | "movieList, movieList_df = load_Movie_List_pd()\n", 616 | "\n", 617 | "my_ratings = np.zeros(num_movies) # Initialize my ratings\n", 618 | "\n", 619 | "# Check the file small_movie_list.csv for id of each movie in our dataset\n", 620 | "# For example, Toy Story 3 (2010) has ID 2700, so to rate it \"5\", you can set\n", 621 | "my_ratings[2700] = 5 \n", 622 | "\n", 623 | "#Or suppose you did not enjoy Persuasion (2007), you can set\n", 624 | "my_ratings[2609] = 2;\n", 625 | "\n", 626 | "# We have selected a few movies we liked / did not like and the ratings we\n", 627 | "# gave are as follows:\n", 628 | "my_ratings[929] = 5 # Lord of the Rings: The Return of the King, The\n", 629 | "my_ratings[246] = 5 # Shrek (2001)\n", 630 | "my_ratings[2716] = 3 # Inception\n", 631 | "my_ratings[1150] = 5 # Incredibles, The (2004)\n", 632 | "my_ratings[382] = 2 # Amelie (Fabuleux destin d'Amélie Poulain, Le)\n", 633 | "my_ratings[366] = 5 # Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)\n", 634 | "my_ratings[622] = 5 # Harry Potter and the Chamber of Secrets (2002)\n", 635 | "my_ratings[988] = 3 # Eternal Sunshine of the Spotless Mind (2004)\n", 636 | "my_ratings[2925] = 1 # Louis Theroux: Law & Disorder (2008)\n", 637 | "my_ratings[2937] = 1 # Nothing to Declare (Rien à déclarer)\n", 638 | "my_ratings[793] = 5 # Pirates of the Caribbean: The Curse of the Black Pearl (2003)\n", 639 | "my_rated = [i for i in range(len(my_ratings)) if my_ratings[i] > 0]\n", 640 | "\n", 641 | "print('\\nNew user ratings:\\n')\n", 642 | "for i in range(len(my_ratings)):\n", 643 | " if my_ratings[i] > 0 :\n", 644 | " print(f'Rated {my_ratings[i]} for {movieList_df.loc[i,\"title\"]}');" 645 | ] 646 | }, 647 | { 648 | "cell_type": "markdown", 649 | "metadata": {}, 650 | "source": [ 651 | "Now, let's add these reviews to $Y$ and $R$ and normalize the ratings." 652 | ] 653 | }, 654 | { 655 | "cell_type": "code", 656 | "execution_count": 10, 657 | "metadata": {}, 658 | "outputs": [], 659 | "source": [ 660 | "# Reload ratings and add new ratings\n", 661 | "Y, R = load_ratings_small()\n", 662 | "Y = np.c_[my_ratings, Y]\n", 663 | "R = np.c_[(my_ratings != 0).astype(int), R]\n", 664 | "\n", 665 | "# Normalize the Dataset\n", 666 | "Ynorm, Ymean = normalizeRatings(Y, R)" 667 | ] 668 | }, 669 | { 670 | "cell_type": "markdown", 671 | "metadata": {}, 672 | "source": [ 673 | "Let's prepare to train the model. Initialize the parameters and select the Adam optimizer." 
674 | ] 675 | }, 676 | { 677 | "cell_type": "code", 678 | "execution_count": 11, 679 | "metadata": { 680 | "tags": [] 681 | }, 682 | "outputs": [], 683 | "source": [ 684 | "# Useful Values\n", 685 | "num_movies, num_users = Y.shape\n", 686 | "num_features = 100\n", 687 | "\n", 688 | "# Set Initial Parameters (W, X), use tf.Variable to track these variables\n", 689 | "tf.random.set_seed(1234) # for consistent results\n", 690 | "W = tf.Variable(tf.random.normal((num_users, num_features),dtype=tf.float64), name='W')\n", 691 | "X = tf.Variable(tf.random.normal((num_movies, num_features),dtype=tf.float64), name='X')\n", 692 | "b = tf.Variable(tf.random.normal((1, num_users), dtype=tf.float64), name='b')\n", 693 | "\n", 694 | "# Instantiate an optimizer.\n", 695 | "optimizer = keras.optimizers.Adam(learning_rate=1e-1)" 696 | ] 697 | }, 698 | { 699 | "cell_type": "markdown", 700 | "metadata": {}, 701 | "source": [ 702 | "Let's now train the collaborative filtering model. This will learn the parameters $\\mathbf{X}$, $\\mathbf{W}$, and $\\mathbf{b}$. " 703 | ] 704 | }, 705 | { 706 | "cell_type": "markdown", 707 | "metadata": {}, 708 | "source": [ 709 | "The operations involved in learning $w$, $b$, and $x$ simultaneously do not fall into the typical 'layers' offered in the TensorFlow neural network package. Consequently, the flow used in Course 2: Model, Compile(), Fit(), Predict(), are not directly applicable. Instead, we can use a custom training loop.\n", 710 | "\n", 711 | "Recall from earlier labs the steps of gradient descent.\n", 712 | "- repeat until convergence:\n", 713 | " - compute forward pass\n", 714 | " - compute the derivatives of the loss relative to parameters\n", 715 | " - update the parameters using the learning rate and the computed derivatives \n", 716 | " \n", 717 | "TensorFlow has the marvelous capability of calculating the derivatives for you. This is shown below. Within the `tf.GradientTape()` section, operations on Tensorflow Variables are tracked. When `tape.gradient()` is later called, it will return the gradient of the loss relative to the tracked variables. The gradients can then be applied to the parameters using an optimizer. \n", 718 | "This is a very brief introduction to a useful feature of TensorFlow and other machine learning frameworks. 
Further information can be found by investigating \"custom training loops\" within the framework of interest.\n", 719 | " \n" 720 | ] 721 | }, 722 | { 723 | "cell_type": "code", 724 | "execution_count": 12, 725 | "metadata": {}, 726 | "outputs": [ 727 | { 728 | "name": "stdout", 729 | "output_type": "stream", 730 | "text": [ 731 | "Training loss at iteration 0: 2321191.3\n", 732 | "Training loss at iteration 20: 136168.7\n", 733 | "Training loss at iteration 40: 51863.3\n", 734 | "Training loss at iteration 60: 24598.8\n", 735 | "Training loss at iteration 80: 13630.4\n", 736 | "Training loss at iteration 100: 8487.6\n", 737 | "Training loss at iteration 120: 5807.7\n", 738 | "Training loss at iteration 140: 4311.6\n", 739 | "Training loss at iteration 160: 3435.2\n", 740 | "Training loss at iteration 180: 2902.1\n" 741 | ] 742 | } 743 | ], 744 | "source": [ 745 | "iterations = 200\n", 746 | "lambda_ = 1\n", 747 | "for iter in range(iterations):\n", 748 | " # Use TensorFlow’s GradientTape\n", 749 | " # to record the operations used to compute the cost \n", 750 | " with tf.GradientTape() as tape:\n", 751 | "\n", 752 | " # Compute the cost (forward pass included in cost)\n", 753 | " cost_value = cofi_cost_func_v(X, W, b, Ynorm, R, lambda_)\n", 754 | "\n", 755 | " # Use the gradient tape to automatically retrieve\n", 756 | " # the gradients of the trainable variables with respect to the loss\n", 757 | " grads = tape.gradient( cost_value, [X,W,b] )\n", 758 | "\n", 759 | " # Run one step of gradient descent by updating\n", 760 | " # the value of the variables to minimize the loss.\n", 761 | " optimizer.apply_gradients( zip(grads, [X,W,b]) )\n", 762 | "\n", 763 | " # Log periodically.\n", 764 | " if iter % 20 == 0:\n", 765 | " print(f\"Training loss at iteration {iter}: {cost_value:0.1f}\")" 766 | ] 767 | }, 768 | { 769 | "cell_type": "markdown", 770 | "metadata": { 771 | "id": "SSzUL7eQodYS" 772 | }, 773 | "source": [ 774 | "\n", 775 | "## 6 - Recommendations\n", 776 | "Below, we compute the ratings for all the movies and users and display the movies that are recommended. These are based on the movies and ratings entered as `my_ratings[]` above. To predict the rating of movie $i$ for user $j$, you compute $\\mathbf{w}^{(j)} \\cdot \\mathbf{x}^{(i)} + b^{(j)}$. This can be computed for all ratings using matrix multiplication." 777 | ] 778 | }, 779 | { 780 | "cell_type": "code", 781 | "execution_count": null, 782 | "metadata": { 783 | "id": "ns266wKtodYT" 784 | }, 785 | "outputs": [], 786 | "source": [ 787 | "# Make a prediction using trained weights and biases\n", 788 | "p = np.matmul(X.numpy(), np.transpose(W.numpy())) + b.numpy()\n", 789 | "\n", 790 | "#restore the mean\n", 791 | "pm = p + Ymean\n", 792 | "\n", 793 | "my_predictions = pm[:,0]\n", 794 | "\n", 795 | "# sort predictions\n", 796 | "ix = tf.argsort(my_predictions, direction='DESCENDING')\n", 797 | "\n", 798 | "for i in range(17):\n", 799 | " j = ix[i]\n", 800 | " if j not in my_rated:\n", 801 | " print(f'Predicting rating {my_predictions[j]:0.2f} for movie {movieList[j]}')\n", 802 | "\n", 803 | "print('\\n\\nOriginal vs Predicted ratings:\\n')\n", 804 | "for i in range(len(my_ratings)):\n", 805 | " if my_ratings[i] > 0:\n", 806 | " print(f'Original {my_ratings[i]}, Predicted {my_predictions[i]:0.2f} for {movieList[i]}')" 807 | ] 808 | }, 809 | { 810 | "cell_type": "markdown", 811 | "metadata": {}, 812 | "source": [ 813 | "In practice, additional information can be utilized to enhance our predictions. 
Above, the predicted ratings for the first few hundred movies lie in a small range. We can augment the above by selecting from those top movies, movies that have high average ratings and movies with more than 20 ratings. This section uses a [Pandas](https://pandas.pydata.org/) data frame which has many handy sorting features." 814 | ] 815 | }, 816 | { 817 | "cell_type": "code", 818 | "execution_count": null, 819 | "metadata": {}, 820 | "outputs": [], 821 | "source": [ 822 | "filter=(movieList_df[\"number of ratings\"] > 20)\n", 823 | "movieList_df[\"pred\"] = my_predictions\n", 824 | "movieList_df = movieList_df.reindex(columns=[\"pred\", \"mean rating\", \"number of ratings\", \"title\"])\n", 825 | "movieList_df.loc[ix[:300]].loc[filter].sort_values(\"mean rating\", ascending=False)" 826 | ] 827 | }, 828 | { 829 | "cell_type": "markdown", 830 | "metadata": {}, 831 | "source": [ 832 | "\n", 833 | "## 7 - Congratulations! \n", 834 | "You have implemented a useful recommender system!" 835 | ] 836 | } 837 | ], 838 | "metadata": { 839 | "kernelspec": { 840 | "display_name": "Python 3", 841 | "language": "python", 842 | "name": "python3" 843 | }, 844 | "language_info": { 845 | "codemirror_mode": { 846 | "name": "ipython", 847 | "version": 3 848 | }, 849 | "file_extension": ".py", 850 | "mimetype": "text/x-python", 851 | "name": "python", 852 | "nbconvert_exporter": "python", 853 | "pygments_lexer": "ipython3", 854 | "version": "3.7.6" 855 | } 856 | }, 857 | "nbformat": 4, 858 | "nbformat_minor": 4 859 | } 860 | -------------------------------------------------------------------------------- /C3_W3_A1_Assignment.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Deep Q-Learning - Lunar Lander\n", 8 | "\n", 9 | "In this assignment, you will train an agent to land a lunar lander safely on a landing pad on the surface of the moon.\n", 10 | "\n", 11 | "\n", 12 | "# Outline\n", 13 | "- [ 1 - Import Packages ](#1)\n", 14 | "- [ 2 - Hyperparameters](#2)\n", 15 | "- [ 3 - The Lunar Lander Environment](#3)\n", 16 | " - [ 3.1 Action Space](#3.1)\n", 17 | " - [ 3.2 Observation Space](#3.2)\n", 18 | " - [ 3.3 Rewards](#3.3)\n", 19 | " - [ 3.4 Episode Termination](#3.4)\n", 20 | "- [ 4 - Load the Environment](#4)\n", 21 | "- [ 5 - Interacting with the Gym Environment](#5)\n", 22 | " - [ 5.1 Exploring the Environment's Dynamics](#5.1)\n", 23 | "- [ 6 - Deep Q-Learning](#6)\n", 24 | " - [ 6.1 Target Network](#6.1)\n", 25 | " - [ Exercise 1](#ex01)\n", 26 | " - [ 6.2 Experience Replay](#6.2)\n", 27 | "- [ 7 - Deep Q-Learning Algorithm with Experience Replay](#7)\n", 28 | " - [ Exercise 2](#ex02)\n", 29 | "- [ 8 - Update the Network Weights](#8)\n", 30 | "- [ 9 - Train the Agent](#9)\n", 31 | "- [ 10 - See the Trained Agent In Action](#10)\n", 32 | "- [ 11 - Congratulations!](#11)\n", 33 | "- [ 12 - References](#12)\n" 34 | ] 35 | }, 36 | { 37 | "cell_type": "markdown", 38 | "metadata": {}, 39 | "source": [ 40 | "\n", 41 | "## 1 - Import Packages\n", 42 | "\n", 43 | "We'll make use of the following packages:\n", 44 | "- `numpy` is a package for scientific computing in python.\n", 45 | "- `deque` will be our data structure for our memory buffer.\n", 46 | "- `namedtuple` will be used to store the experience tuples.\n", 47 | "- The `gym` toolkit is a collection of environments that can be used to test reinforcement learning algorithms. 
We should note that in this notebook we are using `gym` version `0.24.0`.\n", 48 | "- `PIL.Image` and `pyvirtualdisplay` are needed to render the Lunar Lander environment.\n", 49 | "- We will use several modules from the `tensorflow.keras` framework for building deep learning models.\n", 50 | "- `utils` is a module that contains helper functions for this assignment. You do not need to modify the code in this file.\n", 51 | "\n", 52 | "Run the cell below to import all the necessary packages." 53 | ] 54 | }, 55 | { 56 | "cell_type": "code", 57 | "execution_count": 1, 58 | "metadata": { 59 | "id": "KYbOPKRtfQOr" 60 | }, 61 | "outputs": [], 62 | "source": [ 63 | "import time\n", 64 | "from collections import deque, namedtuple\n", 65 | "\n", 66 | "import gym\n", 67 | "import numpy as np\n", 68 | "import PIL.Image\n", 69 | "import tensorflow as tf\n", 70 | "import utils\n", 71 | "\n", 72 | "from pyvirtualdisplay import Display\n", 73 | "from tensorflow.keras import Sequential\n", 74 | "from tensorflow.keras.layers import Dense, Input\n", 75 | "from tensorflow.keras.losses import MSE\n", 76 | "from tensorflow.keras.optimizers import Adam" 77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": 2, 82 | "metadata": {}, 83 | "outputs": [], 84 | "source": [ 85 | "# Set up a virtual display to render the Lunar Lander environment.\n", 86 | "Display(visible=0, size=(840, 480)).start();\n", 87 | "\n", 88 | "# Set the random seed for TensorFlow\n", 89 | "tf.random.set_seed(utils.SEED)" 90 | ] 91 | }, 92 | { 93 | "cell_type": "markdown", 94 | "metadata": {}, 95 | "source": [ 96 | "\n", 97 | "## 2 - Hyperparameters\n", 98 | "\n", 99 | "Run the cell below to set the hyperparameters." 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": 3, 105 | "metadata": {}, 106 | "outputs": [], 107 | "source": [ 108 | "MEMORY_SIZE = 100_000 # size of memory buffer\n", 109 | "GAMMA = 0.995 # discount factor\n", 110 | "ALPHA = 1e-3 # learning rate \n", 111 | "NUM_STEPS_FOR_UPDATE = 4 # perform a learning update every C time steps" 112 | ] 113 | }, 114 | { 115 | "cell_type": "markdown", 116 | "metadata": {}, 117 | "source": [ 118 | "\n", 119 | "## 3 - The Lunar Lander Environment\n", 120 | "\n", 121 | "In this notebook we will be using [OpenAI's Gym Library](https://www.gymlibrary.ml/). The Gym library provides a wide variety of environments for reinforcement learning. To put it simply, an environment represents a problem or task to be solved. In this notebook, we will try to solve the Lunar Lander environment using reinforcement learning.\n", 122 | "\n", 123 | "The goal of the Lunar Lander environment is to land the lunar lander safely on the landing pad on the surface of the moon. The landing pad is designated by two flag poles and it is always at coordinates `(0,0)` but the lander is also allowed to land outside of the landing pad. The lander starts at the top center of the environment with a random initial force applied to its center of mass and has infinite fuel. The environment is considered solved if you get `200` points. \n", 124 | "\n", 125 | "
\n", 126 | "
\n", 127 | "
\n", 128 | " \n", 129 | "
Fig 1. Lunar Lander Environment.
\n", 130 | "
\n", 131 | "\n", 132 | "\n", 133 | "\n", 134 | "\n", 135 | "### 3.1 Action Space\n", 136 | "\n", 137 | "The agent has four discrete actions available:\n", 138 | "\n", 139 | "* Do nothing.\n", 140 | "* Fire right engine.\n", 141 | "* Fire main engine.\n", 142 | "* Fire left engine.\n", 143 | "\n", 144 | "Each action has a corresponding numerical value:\n", 145 | "\n", 146 | "```python\n", 147 | "Do nothing = 0\n", 148 | "Fire right engine = 1\n", 149 | "Fire main engine = 2\n", 150 | "Fire left engine = 3\n", 151 | "```\n", 152 | "\n", 153 | "\n", 154 | "### 3.2 Observation Space\n", 155 | "\n", 156 | "The agent's observation space consists of a state vector with 8 variables:\n", 157 | "\n", 158 | "* Its $(x,y)$ coordinates. The landing pad is always at coordinates $(0,0)$.\n", 159 | "* Its linear velocities $(\\dot x,\\dot y)$.\n", 160 | "* Its angle $\\theta$.\n", 161 | "* Its angular velocity $\\dot \\theta$.\n", 162 | "* Two booleans, $l$ and $r$, that represent whether each leg is in contact with the ground or not.\n", 163 | "\n", 164 | "\n", 165 | "### 3.3 Rewards\n", 166 | "\n", 167 | "The Lunar Lander environment has the following reward system:\n", 168 | "\n", 169 | "* Landing on the landing pad and coming to rest is about 100-140 points.\n", 170 | "* If the lander moves away from the landing pad, it loses reward. \n", 171 | "* If the lander crashes, it receives -100 points.\n", 172 | "* If the lander comes to rest, it receives +100 points.\n", 173 | "* Each leg with ground contact is +10 points.\n", 174 | "* Firing the main engine is -0.3 points each frame.\n", 175 | "* Firing the side engine is -0.03 points each frame.\n", 176 | "\n", 177 | "\n", 178 | "### 3.4 Episode Termination\n", 179 | "\n", 180 | "An episode ends (i.e the environment enters a terminal state) if:\n", 181 | "\n", 182 | "* The lunar lander crashes (i.e if the body of the lunar lander comes in contact with the surface of the moon).\n", 183 | "\n", 184 | "* The lander's $x$-coordinate is greater than 1.\n", 185 | "\n", 186 | "You can check out the [Open AI Gym documentation](https://www.gymlibrary.ml/environments/box2d/lunar_lander/) for a full description of the environment. " 187 | ] 188 | }, 189 | { 190 | "cell_type": "markdown", 191 | "metadata": {}, 192 | "source": [ 193 | "\n", 194 | "## 4 - Load the Environment\n", 195 | "\n", 196 | "We start by loading the `LunarLander-v2` environment from the `gym` library by using the `.make()` method. `LunarLander-v2` is the latest version of the Lunar Lander environment and you can read about its version history in the [Open AI Gym documentation](https://www.gymlibrary.ml/environments/box2d/lunar_lander/#version-history)." 197 | ] 198 | }, 199 | { 200 | "cell_type": "code", 201 | "execution_count": 4, 202 | "metadata": { 203 | "id": "ILVMYKewfR0n" 204 | }, 205 | "outputs": [], 206 | "source": [ 207 | "env = gym.make('LunarLander-v2')" 208 | ] 209 | }, 210 | { 211 | "cell_type": "markdown", 212 | "metadata": {}, 213 | "source": [ 214 | "Once we load the environment we use the `.reset()` method to reset the environment to the initial state. The lander starts at the top center of the environment and we can render the first frame of the environment by using the `.render()` method." 
215 | ] 216 | }, 217 | { 218 | "cell_type": "code", 219 | "execution_count": 5, 220 | "metadata": {}, 221 | "outputs": [ 222 | { 223 | "data": { 224 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAlgAAAGQCAIAAAD9V4nPAAAPk0lEQVR4nO3df2yUdZ4H8CkUoUoR0UW3EDaKgsgqtxwnxpBAbsO5nsetMdhsiEfiedoY43lnzMUYN/bOGDU5fYpRYxpDjD+jXeMF6i1REjDAGX6Uc9AF5eSXUgqClG2hoBWe+2NcddtaWjrzPM88z+v1D32G6Tyf+WSezzvzzHyfVuSAQfq3Wz/cfWTVxWP/em/n+l371u7eveHQod1ff33stt+8ebz7yynn/11J976zfeVHO5Zt3vy7ESOqJky46upf3DKqcuzEMbM//+P/vLv2sf37Py7p3iF9hsVdAJSZ+jv2tXZumFB9ddWIcZeNu37mJYtnzPj7r78+lsvlToXfjBh+TqkLOGv4mDAMc7lcd/fxmTMWnl912SXn/fKs4aMnjJk955o7Sr13SB9BCIPT+dW+U+E3546aVNj89MDKlpamws8jh48ZMayq1AVUDhv51VcdhZ/b9x042LW18POoynOrz6qZNGlmqQuAlBGEMAjfvR0sbHZ+ta/jWOsXX3waeSEVhX9WbfzP6rNqDnZtK2xOGDN75l8sjLwYKG+CEAbhyIndlcOqRp91YWFz6/7/2rZt5Xf/W5GrCHNhqWuo+FMKFuzcsaG1Y0Ph58phI39y9hWXXjqn1DVAmghCGKj6O/bt7Vg/cczswuaRE7sPfvlpe/veeKt6r+XJC86+fP/RDwqbE8ZcPW3a/IqKin5/CfieIISBOtT1yeizLhxVObawuf2L32/b9m6sFX3r/7avae3cUPgGTS6Xm1g9e9q0+fGWBGVEEMKA9Ph08FDXJ61tW44d+zLeqgre2/zkhOrZrZ3rC5sXjr5q6mW/HDmy5N9fhXQQhHB69XfsO3B0y7iqS0cMP7twy67Dq3746WDkep753Pbxu18e39598nhhc2L17N/8ujHyqqAsCUIYkB++HTxwdMuu3esLawcTosebwvPPntLVfWj06J/EWxWUhcq4C4Ckq79jX2vnxp+OnjmsYnjhls873u/z08ET33Qc697/2R/X5Xp9d7TXt0l73tDHHX7kF050t1dXX9B77x9uXT7+Z5OOd7dXjTgvl8tNrJ7961899srvbuvvuQGCEE7r5KnuL459+IuL/rGw2dq58ZNPVoXhqd73rB55UUXFsFGVY3qfuqzoeUvPG/q4w4/8wqjKsXv39nHa873NT/7Lz99v7Vx/6bhf5XK5c0dN2n8sP27czw4f3tPf04PM8x1r6M9v/2l3a+fGUZXnjj/n57lc7uSp7v9tW9r89r/HXVff5s68t+biKTWj/6p65E9zudyxr7/4rGPd8hUPdnW1x10aJJfPCKE/X588drDrD99ttnZu+OgPv4+xnv71+KSw46u9Xd0Hx46dEG9VkHDeEcJpzP3Lf5186bVHTuy6cPSMz9rXvf3f/xF3Rf2ZO/PeSZdcVTls1OETO453dL6z+rG4KwIgFebNunfhjcEVV/xN3IWc3j//w9q/vf63lZUj4y4EgNQZMaLkf1xi6IYN8yU4AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAykpF3AVAMW3alDt1Knf0aG7XrtyaNbkXXoi7oJTSZ9JEEJIqmzb1vMW8LgV9Jk0EIanSe0D3YF4XhT6TJoKQVDntgO7BvD4z+kyaCEJSZbADugfzeoD0mTQRhKTKEAd0D7NmFfPR0kSfSZPKuAuABPFOJRr6TKIIQjLNRI6GPpNkgpBsMZGjoc+UEUFIypnI0dBnypcgJG1M5GjoM0AShWEYdwmZoM+kybC4C0i/mpqaJ554IgzDF198cf78+XGXAwBRmTVr1muvvRb+uba2tscff3z69OlxV5dO3qlEQ5+B01iwYMHq1avDfrW0tNxzzz3nnXde3MWmigEdDX0GflRdXd327dv7j8Aempuba2tr4y48JQzoaOgz0FN1dXV9ff3Ro0cHFYE/dOLEicbGxjlz5sT9VMqbAR0NfQa+N3Xq1MbGxjPOv9527txZX19/ySWXxP3MypIBHQ19BnK5XG7evHnNzc1FjMAe1q1bV1dXV1VVFfcTLScGdDT0GbJu0aJFLS0tpYvAHpqamhYsWBD3ky4PBnQ09Bmy67777mtra4ssAn+ovb19yZIls/zFmn4Z0NHQZ8ic7xbFJ8HWrVvvv//+mpqauLuSRKEBHQl9hgzpc1F8QqxcuXLx4sVxdyhZQgM6EvoMmTCQRfEJ8dJLL7l4W0FoQEdCnyHlzmBRfBK4eFvOgI6KPkM6DX1RfEJk+eJtoQEdCX2GtCn6oviEyODF20IDOhL6DOlR6kXxSZCpi7eFBnQk9BnSIOJF8UlQuHjb5MmT4+59CYUGdCT0GcpbjIviEyLFF28LDehI6DOUpUQtik+I9F28LTSgI6HPUGaSvCg+Cdrb25966ql0XLwtNKAjoc9QNspoUXwSpODibaEBHQl9hjJQpoviE6J8L94WGtCR0GdIrtQsik+I5ubmurq6MnqPGBrQkdBnSKK0LopPiJaWlvr6+uR/jhga0JHQZ0iWLCyKT462trbGxsbEftc0NKAjoc+lVltb29TUFJbhWRmilsFF8YmSwEM0NKAjoc8l8l3+9VYuZ2WIjkXxiZKcQzQ0oCOhz8XVT/71lvCzMkRhxowZpRvoDFHsh2hoQEdCn4tiUPnXpwSelaHkFi9eXJyBTenFcoiGBnQk9Hkohp5/vSXnrAyl5QJpZSrKQzQ0oCOhz2egFPnXW+xnZSihlStXlvoFRKlFcIiGBnQk9Hngosm/Pjlxmh41NTUHDx6M5WVE6ZToEA0N6Ejo82nFmH+9OXFa3q6//vq4X0KUVnEP0dCAjoQ+/5hE5V9vTpyWn/vvvz
/ulw3RKcohGhrQkdDnHhKef31y4rQM+MNJWXbGh2hoQEdCnwvKMf96c+K0Iu4C+jBixIh8Pj9t2rS4CyF+mzdvXr58eXNz86ZNmwZy/zAMKyqS+KpOmYz3uba29uabb164cGHchRTZ/v37l/9J3LVEKnEv5VmzZm3cuDHuKkicAR6iGR/Qkclmn9Oaf316++23C4fbvn374q6l5JL1Ur7tttuef/75uKsg6fo5RLM5oKOXqT5nKv96G+xZmXKUoJfyU089dffdd8ddBeWk9yGaqQEdoyz0OeP511uKT5wm5aW8evXquXPnxl0F5SrFhygRk38DkbITp/EH4aRJk/L5/NixY+MuBBiEI0eOtA/M4cOH4y729OTfmUnHidOYg3DBggXLli2Ltwag1BKbmvKvWMr6rEycQfjggw8+/PDDMRYAJFAEqSn/SmrJkiVBEOzZsyfuQgYqtiBsamryKgSGaFCpKf+i9NZbbwVBsGbNmrgLOb0YgrCqqiqfz1922WXR7xqAKLW0tDQ0NLz88stxF9KfqIPwmmuuef/99yPeKQAxOnToUBAEQRAcP3487lr6MCzKndXV1UlBgKy54IILHnnkka6urueee+7yyy+Pu5yeontH+Oyzz955552R7Q6AZFqxYkUQBO+8807chXwroiBct27dtddeG82+AEi+rVu3BkHwxhtvdHR0xFtJyYNw8uTJ+Xz+nHPOKfWOACg7R48ebWhoCIIgxgsvlDYIb7rppjfffLOkuwAgBV588cUgCD744IPod13CIKyvr3/ooYdK9/gApMzq1asbGho++uijHTt2RLbTUgXhW2+9deONN5bowQFIsV27dgVB8Nxzz3V3d0ewu+IHYXV1dT6fv/jii4v+yABkx8mTJ4MgaGhoaG1tLemOihyEc+bMKYsL6gBQLpqamoIgKN0y9GIG4V133fX0008X8QEBoGD9+vVBEKxdu7bobxCLFoSNjY233357sR4NAHpra2srLLco4seHxQnC9evXX3311UV5KAA4rWeeeSYIgqJ8uXSoQTh16tR8Pj9y5MihlwIAg7J8+fKGhobDhw8PZQHikIKwtrb29ddfH8ojAMAQbdmyJQiCZcuWndnlac48CB955JEHHnjgjH8dAIroyJEjheUWg7146RkGYXNz8w033HBmvwsApbN06dIgCGpqagb4By4GHYTjxo3L5/MTJ04cfG0AEJGVK1cGQbBz586PP/64/3sOLgjnzZu3atWqIRQGANHZvn17Q0PDCy+8cPz48R+7zyD+Qv0999wjBQEoI1OmTHn22Wfb29sfffTR8ePH33LLLb3vM9B3hEuXLr311luLWh4AROrVV19taGgYNWrUD68GOqAgbGlpmTlzZskKA4DorFu3LgiCTZs27dmzJ3faIJw+fXo+nx8+fHgktQFARD7//PMgCIIg6C8IFy1a9Morr0RWEwBE70e/LPP4449LQQBSr+93hCtWrLjuuusiLgUAotczCMePH5/P5y+66KJYqgGAiP3ZqdH58+cfOHBACgKQHd8H4X333TfAy7IBQGp8e2r0pZde6nO9PQCkW0Uul9uyZcuVV14ZdyUAEIOKMAzjrgEAYjOIi24DQPoIQgAyTRACkGmCEIBME4QAZJogBCDTBCEAmSYIAcg0QQhApglCADJNEAKQaYIQgEwThABkmiAEINMEIQCZJggByDRBCECmCUIAMk0QApBpghCATBOEAGSaIAQg0wQhAJkmCAHINEEIQKYJQgAyTRACkGmCEIBME4QAZJogBCDTBCEAmSYIAcg0QQhApglCADJNEAKQaYIQgEwThABkmiAEINMEIQCZJggByDRBCECmCUIAMk0QApBpghCATBOEAGSaIAQg0wQhAJkmCAHINEEIQKYJQgAyTRACkGmCEIBME4QAZJogBCDTBCEAmSYIAcg0QQhApglCADJNEAKQaYIQgEwThABkmiAEINP+H4bqlqkM8g9eAAAAAElFTkSuQmCC\n", 225 | "text/plain": [ 226 | "" 227 | ] 228 | }, 229 | "execution_count": 5, 230 | "metadata": {}, 231 | "output_type": "execute_result" 232 | } 233 | ], 234 | "source": [ 235 | "env.reset()\n", 236 | "PIL.Image.fromarray(env.render(mode='rgb_array'))" 237 | ] 238 | }, 239 | { 240 | "cell_type": "markdown", 241 | "metadata": {}, 242 | "source": [ 243 | "In order to build our neural network later on we need to know the size of the state vector and the number of valid actions. We can get this information from our environment by using the `.observation_space.shape` and `action_space.n` methods, respectively." 244 | ] 245 | }, 246 | { 247 | "cell_type": "code", 248 | "execution_count": 6, 249 | "metadata": { 250 | "id": "x3fdqdG4CUu2" 251 | }, 252 | "outputs": [ 253 | { 254 | "name": "stdout", 255 | "output_type": "stream", 256 | "text": [ 257 | "State Shape: (8,)\n", 258 | "Number of actions: 4\n" 259 | ] 260 | } 261 | ], 262 | "source": [ 263 | "state_size = env.observation_space.shape\n", 264 | "num_actions = env.action_space.n\n", 265 | "\n", 266 | "print('State Shape:', state_size)\n", 267 | "print('Number of actions:', num_actions)" 268 | ] 269 | }, 270 | { 271 | "cell_type": "markdown", 272 | "metadata": {}, 273 | "source": [ 274 | "\n", 275 | "## 5 - Interacting with the Gym Environment\n", 276 | "\n", 277 | "The Gym library implements the standard “agent-environment loop” formalism:\n", 278 | "\n", 279 | "
\n", 280 | "
\n", 281 | "\n", 282 | "
Fig 2. Agent-environment Loop Formalism.
\n", 283 | "
\n", 284 | "
\n", 285 | "\n", 286 | "In the standard “agent-environment loop” formalism, an agent interacts with the environment in discrete time steps $t=0,1,2,...$. At each time step $t$, the agent uses a policy $\\pi$ to select an action $A_t$ based on its observation of the environment's state $S_t$. The agent receives a numerical reward $R_t$ and on the next time step, moves to a new state $S_{t+1}$.\n", 287 | "\n", 288 | "\n", 289 | "### 5.1 Exploring the Environment's Dynamics\n", 290 | "\n", 291 | "In Open AI's Gym environments, we use the `.step()` method to run a single time step of the environment's dynamics. In the version of `gym` that we are using the `.step()` method accepts an action and returns four values:\n", 292 | "\n", 293 | "* `observation` (**object**): an environment-specific object representing your observation of the environment. In the Lunar Lander environment this corresponds to a numpy array containing the positions and velocities of the lander as described in section [3.2 Observation Space](#3.2).\n", 294 | "\n", 295 | "\n", 296 | "* `reward` (**float**): amount of reward returned as a result of taking the given action. In the Lunar Lander environment this corresponds to a float of type `numpy.float64` as described in section [3.3 Rewards](#3.3).\n", 297 | "\n", 298 | "\n", 299 | "* `done` (**boolean**): When done is `True`, it indicates the episode has terminated and it’s time to reset the environment. \n", 300 | "\n", 301 | "\n", 302 | "* `info` (**dictionary**): diagnostic information useful for debugging. We won't be using this variable in this notebook but it is shown here for completeness.\n", 303 | "\n", 304 | "To begin an episode, we need to reset the environment to an initial state. We do this by using the `.reset()` method. " 305 | ] 306 | }, 307 | { 308 | "cell_type": "code", 309 | "execution_count": 7, 310 | "metadata": {}, 311 | "outputs": [], 312 | "source": [ 313 | "# Reset the environment and get the initial state.\n", 314 | "initial_state = env.reset()" 315 | ] 316 | }, 317 | { 318 | "cell_type": "markdown", 319 | "metadata": {}, 320 | "source": [ 321 | "Once the environment is reset, the agent can start taking actions in the environment by using the `.step()` method. Note that the agent can only take one action per time step. \n", 322 | "\n", 323 | "In the cell below you can select different actions and see how the returned values change depending on the action taken. 
Remember that in this environment the agent has four discrete actions available and we specify them in code by using their corresponding numerical value:\n", 324 | "\n", 325 | "```python\n", 326 | "Do nothing = 0\n", 327 | "Fire right engine = 1\n", 328 | "Fire main engine = 2\n", 329 | "Fire left engine = 3\n", 330 | "```" 331 | ] 332 | }, 333 | { 334 | "cell_type": "code", 335 | "execution_count": 8, 336 | "metadata": {}, 337 | "outputs": [ 338 | { 339 | "name": "stdout", 340 | "output_type": "stream", 341 | "text": [ 342 | "Initial State: [0.002 1.422 0.194 0.506 -0.002 -0.044 0.000 0.000]\n", 343 | "Action: 0\n", 344 | "Next State: [0.004 1.433 0.194 0.480 -0.004 -0.044 0.000 0.000]\n", 345 | "Reward Received: 1.1043263227541047\n", 346 | "Episode Terminated: False\n", 347 | "Info: {}\n" 348 | ] 349 | } 350 | ], 351 | "source": [ 352 | "# Select an action\n", 353 | "action = 0\n", 354 | "\n", 355 | "# Run a single time step of the environment's dynamics with the given action.\n", 356 | "next_state, reward, done, info = env.step(action)\n", 357 | "\n", 358 | "with np.printoptions(formatter={'float': '{:.3f}'.format}):\n", 359 | " print(\"Initial State:\", initial_state)\n", 360 | " print(\"Action:\", action)\n", 361 | " print(\"Next State:\", next_state)\n", 362 | " print(\"Reward Received:\", reward)\n", 363 | " print(\"Episode Terminated:\", done)\n", 364 | " print(\"Info:\", info)" 365 | ] 366 | }, 367 | { 368 | "cell_type": "markdown", 369 | "metadata": {}, 370 | "source": [ 371 | "In practice, when we train the agent we use a loop to allow the agent to take many consecutive actions during an episode." 372 | ] 373 | }, 374 | { 375 | "cell_type": "markdown", 376 | "metadata": {}, 377 | "source": [ 378 | "\n", 379 | "## 6 - Deep Q-Learning\n", 380 | "\n", 381 | "In cases where both the state and action space are discrete we can estimate the action-value function iteratively by using the Bellman equation:\n", 382 | "\n", 383 | "$$\n", 384 | "Q_{i+1}(s,a) = R + \\gamma \\max_{a'}Q_i(s',a')\n", 385 | "$$\n", 386 | "\n", 387 | "This iterative method converges to the optimal action-value function $Q^*(s,a)$ as $i\\to\\infty$. This means that the agent just needs to gradually explore the state-action space and keep updating the estimate of $Q(s,a)$ until it converges to the optimal action-value function $Q^*(s,a)$. However, in cases where the state space is continuous it becomes practically impossible to explore the entire state-action space. Consequently, this also makes it practically impossible to gradually estimate $Q(s,a)$ until it converges to $Q^*(s,a)$.\n", 388 | "\n", 389 | "In the Deep $Q$-Learning, we solve this problem by using a neural network to estimate the action-value function $Q(s,a)\\approx Q^*(s,a)$. We call this neural network a $Q$-Network and it can be trained by adjusting its weights at each iteration to minimize the mean-squared error in the Bellman equation.\n", 390 | "\n", 391 | "Unfortunately, using neural networks in reinforcement learning to estimate action-value functions has proven to be highly unstable. Luckily, there's a couple of techniques that can be employed to avoid instabilities. These techniques consist of using a ***Target Network*** and ***Experience Replay***. We will explore these two techniques in the following sections." 
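For intuition about the iterative update above, the sketch below runs the tabular Bellman backup $Q_{i+1}(s,a) = R + \gamma \max_{a'}Q_i(s',a')$ on a tiny, made-up deterministic problem. The states, actions, rewards and transitions here are purely illustrative and have nothing to do with the Lunar Lander — whose continuous state space is exactly why we need a neural network instead of a table.

```python
import numpy as np

# Tiny made-up deterministic problem: 3 states, 2 actions
next_s = np.array([[1, 2],
                   [0, 2],
                   [2, 2]])           # next_s[s, a] = state reached by taking action a in state s
rewards = np.array([[0.0, 1.0],
                    [0.0, 2.0],
                    [0.0, 0.0]])      # immediate reward for taking action a in state s
gamma = 0.9

Q = np.zeros((3, 2))                  # Q[s, a], initialized arbitrarily
for _ in range(50):
    # Bellman backup applied to every (s, a) at once:
    # Q[next_s] gathers the successor state's Q-values, and max over axis=2 is max over a'
    Q = rewards + gamma * np.max(Q[next_s], axis=2)

print(Q)   # converges to the optimal action-value function for this toy problem
```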
392 | ] 393 | }, 394 | { 395 | "cell_type": "markdown", 396 | "metadata": {}, 397 | "source": [ 398 | "\n", 399 | "### 6.1 Target Network\n", 400 | "\n", 401 | "We can train the $Q$-Network by adjusting it's weights at each iteration to minimize the mean-squared error in the Bellman equation, where the target values are given by:\n", 402 | "\n", 403 | "$$\n", 404 | "y = R + \\gamma \\max_{a'}Q(s',a';w)\n", 405 | "$$\n", 406 | "\n", 407 | "where $w$ are the weights of the $Q$-Network. This means that we are adjusting the weights $w$ at each iteration to minimize the following error:\n", 408 | "\n", 409 | "$$\n", 410 | "\\overbrace{\\underbrace{R + \\gamma \\max_{a'}Q(s',a'; w)}_{\\rm {y~target}} - Q(s,a;w)}^{\\rm {Error}}\n", 411 | "$$\n", 412 | "\n", 413 | "Notice that this forms a problem because the $y$ target is changing on every iteration. Having a constantly moving target can lead to oscillations and instabilities. To avoid this, we can create\n", 414 | "a separate neural network for generating the $y$ targets. We call this separate neural network the **target $\\hat Q$-Network** and it will have the same architecture as the original $Q$-Network. By using the target $\\hat Q$-Network, the above error becomes:\n", 415 | "\n", 416 | "$$\n", 417 | "\\overbrace{\\underbrace{R + \\gamma \\max_{a'}\\hat{Q}(s',a'; w^-)}_{\\rm {y~target}} - Q(s,a;w)}^{\\rm {Error}}\n", 418 | "$$\n", 419 | "\n", 420 | "where $w^-$ and $w$ are the weights the target $\\hat Q$-Network and $Q$-Network, respectively.\n", 421 | "\n", 422 | "In practice, we will use the following algorithm: every $C$ time steps we will use the $\\hat Q$-Network to generate the $y$ targets and update the weights of the target $\\hat Q$-Network using the weights of the $Q$-Network. We will update the weights $w^-$ of the the target $\\hat Q$-Network using a **soft update**. This means that we will update the weights $w^-$ using the following rule:\n", 423 | " \n", 424 | "$$\n", 425 | "w^-\\leftarrow \\tau w + (1 - \\tau) w^-\n", 426 | "$$\n", 427 | "\n", 428 | "where $\\tau\\ll 1$. By using the soft update, we are ensuring that the target values, $y$, change slowly, which greatly improves the stability of our learning algorithm." 429 | ] 430 | }, 431 | { 432 | "cell_type": "markdown", 433 | "metadata": {}, 434 | "source": [ 435 | "\n", 436 | "### Exercise 1\n", 437 | "\n", 438 | "In this exercise you will create the $Q$ and target $\\hat Q$ networks and set the optimizer. Remember that the Deep $Q$-Network (DQN) is a neural network that approximates the action-value function $Q(s,a)\\approx Q^*(s,a)$. It does this by learning how to map states to $Q$ values.\n", 439 | "\n", 440 | "To solve the Lunar Lander environment, we are going to employ a DQN with the following architecture:\n", 441 | "\n", 442 | "* An `Input` layer that takes `state_size` as input.\n", 443 | "\n", 444 | "* A `Dense` layer with `64` units and a `relu` activation function.\n", 445 | "\n", 446 | "* A `Dense` layer with `64` units and a `relu` activation function.\n", 447 | "\n", 448 | "* A `Dense` layer with `num_actions` units and a `linear` activation function. This will be the output layer of our network.\n", 449 | "\n", 450 | "\n", 451 | "In the cell below you should create the $Q$-Network and the target $\\hat Q$-Network using the model architecture described above. 
Remember that both the $Q$-Network and the target $\\hat Q$-Network have the same architecture.\n", 452 | "\n", 453 | "Lastly, you should set `Adam` as the optimizer with a learning rate equal to `ALPHA`. Recall that `ALPHA` was defined in the [Hyperparameters](#2) section. We should note that for this exercise you should use the already imported packages:\n", 454 | "```python\n", 455 | "from tensorflow.keras.layers import Dense, Input\n", 456 | "from tensorflow.keras.optimizers import Adam\n", 457 | "```" 458 | ] 459 | }, 460 | { 461 | "cell_type": "code", 462 | "execution_count": 9, 463 | "metadata": {}, 464 | "outputs": [], 465 | "source": [ 466 | "# UNQ_C1\n", 467 | "# GRADED CELL\n", 468 | "\n", 469 | "# Create the Q-Network\n", 470 | "q_network = Sequential([\n", 471 | " Input(shape=state_size), \n", 472 | " Dense(units=64, activation='relu'), \n", 473 | " Dense(units=64, activation='relu'), \n", 474 | " Dense(units=num_actions, activation='linear'),\n", 475 | " ])\n", 476 | "\n", 477 | "# Create the target Q^-Network\n", 478 | "target_q_network = Sequential([\n", 479 | " Input(shape=state_size), \n", 480 | " Dense(units=64, activation='relu'), \n", 481 | " Dense(units=64, activation='relu'), \n", 482 | " Dense(units=num_actions, activation='linear'), \n", 483 | " ])\n", 484 | "\n", 485 | "### START CODE HERE ### \n", 486 | "optimizer = Adam(learning_rate=ALPHA)\n", 487 | "### END CODE HERE ###" 488 | ] 489 | }, 490 | { 491 | "cell_type": "code", 492 | "execution_count": 10, 493 | "metadata": {}, 494 | "outputs": [ 495 | { 496 | "name": "stdout", 497 | "output_type": "stream", 498 | "text": [ 499 | "\u001b[92mAll tests passed!\n", 500 | "\u001b[92mAll tests passed!\n", 501 | "\u001b[92mAll tests passed!\n" 502 | ] 503 | } 504 | ], 505 | "source": [ 506 | "# UNIT TEST\n", 507 | "from public_tests import *\n", 508 | "\n", 509 | "test_network(q_network)\n", 510 | "test_network(target_q_network)\n", 511 | "test_optimizer(optimizer, ALPHA) " 512 | ] 513 | }, 514 | { 515 | "cell_type": "markdown", 516 | "metadata": {}, 517 | "source": [ 518 | "
\n", 519 |  "    Click for hints\n", 520 |  "    \n", 521 |  "```python\n", 522 |  "# Create the Q-Network\n", 523 |  "q_network = Sequential([\n", 524 |  "    Input(shape=state_size), \n", 525 |  "    Dense(units=64, activation='relu'), \n", 526 |  "    Dense(units=64, activation='relu'), \n", 527 |  "    Dense(units=num_actions, activation='linear'),\n", 528 |  "    ])\n", 529 |  "\n", 530 |  "# Create the target Q^-Network\n", 531 |  "target_q_network = Sequential([\n", 532 |  "    Input(shape=state_size), \n", 533 |  "    Dense(units=64, activation='relu'), \n", 534 |  "    Dense(units=64, activation='relu'), \n", 535 |  "    Dense(units=num_actions, activation='linear'), \n", 536 |  "    ])\n", 537 |  "\n", 538 |  "optimizer = Adam(learning_rate=ALPHA) \n", 539 |  "``` " 540 |    ] 541 |   }, 542 |   { 543 |    "cell_type": "markdown", 544 |    "metadata": {}, 545 |    "source": [ 546 |     "\n", 547 |     "### 6.2 Experience Replay\n", 548 |     "\n", 549 |     "When an agent interacts with the environment, the states, actions, and rewards the agent experiences are sequential by nature. If the agent tries to learn from these consecutive experiences it can run into problems due to the strong correlations between them. To avoid this, we employ a technique known as **Experience Replay** to generate uncorrelated experiences for training our agent. Experience replay consists of storing the agent's experiences (i.e., the states, actions, and rewards the agent receives) in a memory buffer and then sampling a random mini-batch of experiences from the buffer to do the learning. The experience tuples $(S_t, A_t, R_t, S_{t+1})$ will be added to the memory buffer at each time step as the agent interacts with the environment.\n", 550 |     "\n", 551 |     "For convenience, we will store the experiences as named tuples." 552 |    ] 553 |   }, 554 |   { 555 |    "cell_type": "code", 556 |    "execution_count": 11, 557 |    "metadata": {}, 558 |    "outputs": [], 559 |    "source": [ 560 |     "# Store experiences as named tuples\n", 561 |     "experience = namedtuple(\"Experience\", field_names=[\"state\", \"action\", \"reward\", \"next_state\", \"done\"])" 562 |    ] 563 |   }, 564 |   { 565 |    "cell_type": "markdown", 566 |    "metadata": {}, 567 |    "source": [ 568 |     "By using experience replay we avoid problematic correlations, oscillations and instabilities. In addition, experience replay also allows the agent to potentially use the same experience in multiple weight updates, which increases data efficiency." 569 |    ] 570 |   }, 571 |   { 572 |    "cell_type": "markdown", 573 |    "metadata": {}, 574 |    "source": [ 575 |     "\n", 576 |     "## 7 - Deep Q-Learning Algorithm with Experience Replay\n", 577 |     "\n", 578 |     "Now that we know all the techniques that we are going to use, we can put them together to arrive at the Deep Q-Learning Algorithm with Experience Replay.\n", 579 |     "
\n", 580 | "
\n", 581 | "
\n", 582 | " \n", 583 | "
Fig 3. Deep Q-Learning with Experience Replay.
\n", 584 | "
" 585 | ] 586 | }, 587 | { 588 | "cell_type": "markdown", 589 | "metadata": {}, 590 | "source": [ 591 | "\n", 592 | "### Exercise 2\n", 593 | "\n", 594 | "In this exercise you will implement line ***12*** of the algorithm outlined in *Fig 3* above and you will also compute the loss between the $y$ targets and the $Q(s,a)$ values. In the cell below, complete the `compute_loss` function by setting the $y$ targets equal to:\n", 595 | "\n", 596 | "$$\n", 597 | "\\begin{equation}\n", 598 | " y_j =\n", 599 | " \\begin{cases}\n", 600 | " R_j & \\text{if episode terminates at step } j+1\\\\\n", 601 | " R_j + \\gamma \\max_{a'}\\hat{Q}(s_{j+1},a') & \\text{otherwise}\\\\\n", 602 | " \\end{cases} \n", 603 | "\\end{equation}\n", 604 | "$$\n", 605 | "\n", 606 | "Here are a couple of things to note:\n", 607 | "\n", 608 | "* The `compute_loss` function takes in a mini-batch of experience tuples. This mini-batch of experience tuples is unpacked to extract the `states`, `actions`, `rewards`, `next_states`, and `done_vals`. You should keep in mind that these variables are *TensorFlow Tensors* whose size will depend on the mini-batch size. For example, if the mini-batch size is `64` then both `rewards` and `done_vals` will be TensorFlow Tensors with `64` elements.\n", 609 | "\n", 610 | "\n", 611 | "* Using `if/else` statements to set the $y$ targets will not work when the variables are tensors with many elements. However, notice that you can use the `done_vals` to implement the above in a single line of code. To do this, recall that the `done` variable is a Boolean variable that takes the value `True` when an episode terminates at step $j+1$ and it is `False` otherwise. Taking into account that a Boolean value of `True` has the numerical value of `1` and a Boolean value of `False` has the numerical value of `0`, you can use the factor `(1 - done_vals)` to implement the above in a single line of code. Here's a hint: notice that `(1 - done_vals)` has a value of `0` when `done_vals` is `True` and a value of `1` when `done_vals` is `False`. \n", 612 | "\n", 613 | "Lastly, compute the loss by calculating the Mean-Squared Error (`MSE`) between the `y_targets` and the `q_values`. 
To calculate the mean-squared error you should use the already imported package `MSE`:\n", 614 | "```python\n", 615 | "from tensorflow.keras.losses import MSE\n", 616 | "```" 617 | ] 618 | }, 619 | { 620 | "cell_type": "code", 621 | "execution_count": 12, 622 | "metadata": {}, 623 | "outputs": [], 624 | "source": [ 625 | "# UNQ_C2\n", 626 | "# GRADED FUNCTION: calculate_loss\n", 627 | "\n", 628 | "def compute_loss(experiences, gamma, q_network, target_q_network):\n", 629 | " \"\"\" \n", 630 | " Calculates the loss.\n", 631 | " \n", 632 | " Args:\n", 633 | " experiences: (tuple) tuple of [\"state\", \"action\", \"reward\", \"next_state\", \"done\"] namedtuples\n", 634 | " gamma: (float) The discount factor.\n", 635 | " q_network: (tf.keras.Sequential) Keras model for predicting the q_values\n", 636 | " target_q_network: (tf.keras.Sequential) Karas model for predicting the targets\n", 637 | " \n", 638 | " Returns:\n", 639 | " loss: (TensorFlow Tensor(shape=(0,), dtype=int32)) the Mean-Squared Error between\n", 640 | " the y targets and the Q(s,a) values.\n", 641 | " \"\"\"\n", 642 | " \n", 643 | " # Unpack the mini-batch of experience tuples\n", 644 | " states, actions, rewards, next_states, done_vals = experiences\n", 645 | " \n", 646 | " # Compute max Q^(s,a)\n", 647 | " max_qsa = tf.reduce_max(target_q_network(next_states), axis=-1)\n", 648 | " \n", 649 | " # Set y = R if episode terminates, otherwise set y = R + γ max Q^(s,a).\n", 650 | " ### START CODE HERE ### \n", 651 | " y_targets = rewards + (gamma * max_qsa * (1 - done_vals))\n", 652 | " ### END CODE HERE ###\n", 653 | " \n", 654 | " # Get the q_values\n", 655 | " q_values = q_network(states)\n", 656 | " q_values = tf.gather_nd(q_values, tf.stack([tf.range(q_values.shape[0]),\n", 657 | " tf.cast(actions, tf.int32)], axis=1))\n", 658 | " \n", 659 | " # Compute the loss\n", 660 | " ### START CODE HERE ### \n", 661 | " loss = MSE(y_targets, q_values) \n", 662 | " ### END CODE HERE ### \n", 663 | " \n", 664 | " return loss" 665 | ] 666 | }, 667 | { 668 | "cell_type": "code", 669 | "execution_count": 13, 670 | "metadata": {}, 671 | "outputs": [ 672 | { 673 | "name": "stdout", 674 | "output_type": "stream", 675 | "text": [ 676 | "\u001b[92mAll tests passed!\n" 677 | ] 678 | } 679 | ], 680 | "source": [ 681 | "# UNIT TEST \n", 682 | "test_compute_loss(compute_loss)" 683 | ] 684 | }, 685 | { 686 | "cell_type": "markdown", 687 | "metadata": {}, 688 | "source": [ 689 | "
\n", 690 | " Click for hints\n", 691 | " \n", 692 | "```python\n", 693 | "def compute_loss(experiences, gamma, q_network, target_q_network):\n", 694 | " \"\"\" \n", 695 | " Calculates the loss.\n", 696 | " \n", 697 | " Args:\n", 698 | " experiences: (tuple) tuple of [\"state\", \"action\", \"reward\", \"next_state\", \"done\"] namedtuples\n", 699 | " gamma: (float) The discount factor.\n", 700 | " q_network: (tf.keras.Sequential) Keras model for predicting the q_values\n", 701 | " target_q_network: (tf.keras.Sequential) Karas model for predicting the targets\n", 702 | " \n", 703 | " Returns:\n", 704 | " loss: (TensorFlow Tensor(shape=(0,), dtype=int32)) the Mean-Squared Error between\n", 705 | " the y targets and the Q(s,a) values.\n", 706 | " \"\"\"\n", 707 | "\n", 708 | " \n", 709 | " # Unpack the mini-batch of experience tuples\n", 710 | " states, actions, rewards, next_states, done_vals = experiences\n", 711 | " \n", 712 | " # Compute max Q^(s,a)\n", 713 | " max_qsa = tf.reduce_max(target_q_network(next_states), axis=-1)\n", 714 | " \n", 715 | " # Set y = R if episode terminates, otherwise set y = R + γ max Q^(s,a).\n", 716 | " y_targets = rewards + (gamma * max_qsa * (1 - done_vals))\n", 717 | " \n", 718 | " # Get the q_values\n", 719 | " q_values = q_network(states)\n", 720 | " q_values = tf.gather_nd(q_values, tf.stack([tf.range(q_values.shape[0]),\n", 721 | " tf.cast(actions, tf.int32)], axis=1))\n", 722 | " \n", 723 | " # Calculate the loss\n", 724 | " loss = MSE(y_targets, q_values)\n", 725 | " \n", 726 | " return loss\n", 727 | "\n", 728 | "``` \n", 729 | " " 730 | ] 731 | }, 732 | { 733 | "cell_type": "markdown", 734 | "metadata": {}, 735 | "source": [ 736 | "\n", 737 | "## 8 - Update the Network Weights\n", 738 | "\n", 739 | "We will use the `agent_learn` function below to implement lines ***12 -14*** of the algorithm outlined in [Fig 3](#7). The `agent_learn` function will update the weights of the $Q$ and target $\\hat Q$ networks using a custom training loop. Because we are using a custom training loop we need to retrieve the gradients via a `tf.GradientTape` instance, and then call `optimizer.apply_gradients()` to update the weights of our $Q$-Network. Note that we are also using the `@tf.function` decorator to increase performance. Without this decorator our training will take twice as long. If you would like to know more about how to increase performance with `@tf.function` take a look at the [TensorFlow documentation](https://www.tensorflow.org/guide/function).\n", 740 | "\n", 741 | "The last line of this function updates the weights of the target $\\hat Q$-Network using a [soft update](#6.1). If you want to know how this is implemented in code we encourage you to take a look at the `utils.update_target_network` function in the `utils` module." 
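Since the last step of `agent_learn` below is exactly the soft update described in Section 6.1, here is a rough sketch of what such a helper could look like. To be clear, this is an illustrative assumption, not the actual code of `utils.update_target_network` (which is provided by the `utils` module), and `TAU` is just a stand-in name for whatever soft-update rate the notebook's hyperparameters define.

```python
import tensorflow as tf

TAU = 1e-3  # assumed soft-update rate; placeholder for the notebook's real hyperparameter

def soft_update(q_network, target_q_network, tau=TAU):
    """Sketch of a soft update: w_target <- tau * w + (1 - tau) * w_target.
    Illustrative only; not the course's utils.update_target_network."""
    for target_var, q_var in zip(target_q_network.trainable_variables,
                                 q_network.trainable_variables):
        # Blend a small fraction of the Q-Network weights into the target network,
        # so the y targets drift slowly rather than jumping at every update.
        target_var.assign(tau * q_var + (1.0 - tau) * target_var)

# Usage after each gradient step, mirroring the call at the end of agent_learn:
# soft_update(q_network, target_q_network)
```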
742 | ] 743 | }, 744 | { 745 | "cell_type": "code", 746 | "execution_count": 14, 747 | "metadata": {}, 748 | "outputs": [], 749 | "source": [ 750 | "@tf.function\n", 751 | "def agent_learn(experiences, gamma):\n", 752 | " \"\"\"\n", 753 | " Updates the weights of the Q networks.\n", 754 | " \n", 755 | " Args:\n", 756 | " experiences: (tuple) tuple of [\"state\", \"action\", \"reward\", \"next_state\", \"done\"] namedtuples\n", 757 | " gamma: (float) The discount factor.\n", 758 | " \n", 759 | " \"\"\"\n", 760 | " \n", 761 | " # Calculate the loss\n", 762 | " with tf.GradientTape() as tape:\n", 763 | " loss = compute_loss(experiences, gamma, q_network, target_q_network)\n", 764 | "\n", 765 | " # Get the gradients of the loss with respect to the weights.\n", 766 | " gradients = tape.gradient(loss, q_network.trainable_variables)\n", 767 | " \n", 768 | " # Update the weights of the q_network.\n", 769 | " optimizer.apply_gradients(zip(gradients, q_network.trainable_variables))\n", 770 | "\n", 771 | " # update the weights of target q_network\n", 772 | " utils.update_target_network(q_network, target_q_network)" 773 | ] 774 | }, 775 | { 776 | "cell_type": "markdown", 777 | "metadata": {}, 778 | "source": [ 779 | "\n", 780 | "## 9 - Train the Agent\n", 781 | "\n", 782 | "We are now ready to train our agent to solve the Lunar Lander environment. In the cell below we will implement the algorithm in [Fig 3](#7) line by line (please note that we have included the same algorithm below for easy reference. This will prevent you from scrolling up and down the notebook):\n", 783 | "\n", 784 | "* **Line 1**: We initialize the `memory_buffer` with a capacity of $N =$ `MEMORY_SIZE`. Notice that we are using a `deque` as the data structure for our `memory_buffer`.\n", 785 | "\n", 786 | "\n", 787 | "* **Line 2**: We skip this line since we already initialized the `q_network` in [Exercise 1](#ex01).\n", 788 | "\n", 789 | "\n", 790 | "* **Line 3**: We initialize the `target_q_network` by setting its weights to be equal to those of the `q_network`.\n", 791 | "\n", 792 | "\n", 793 | "* **Line 4**: We start the outer loop. Notice that we have set $M =$ `num_episodes = 2000`. This number is reasonable because the agent should be able to solve the Lunar Lander environment in less than `2000` episodes using this notebook's default parameters.\n", 794 | "\n", 795 | "\n", 796 | "* **Line 5**: We use the `.reset()` method to reset the environment to the initial state and get the initial state.\n", 797 | "\n", 798 | "\n", 799 | "* **Line 6**: We start the inner loop. Notice that we have set $T =$ `max_num_timesteps = 1000`. This means that the episode will automatically terminate if the episode hasn't terminated after `1000` time steps.\n", 800 | "\n", 801 | "\n", 802 | "* **Line 7**: The agent observes the current `state` and chooses an `action` using an $\\epsilon$-greedy policy. Our agent starts out using a value of $\\epsilon =$ `epsilon = 1` which yields an $\\epsilon$-greedy policy that is equivalent to the equiprobable random policy. This means that at the beginning of our training, the agent is just going to take random actions regardless of the observed `state`. As training progresses we will decrease the value of $\\epsilon$ slowly towards a minimum value using a given $\\epsilon$-decay rate. We want this minimum value to be close to zero because a value of $\\epsilon = 0$ will yield an $\\epsilon$-greedy policy that is equivalent to the greedy policy. 
This means that towards the end of training, the agent will lean towards selecting the `action` that it believes (based on its past experiences) will maximize $Q(s,a)$. We will set the minimum $\epsilon$ value to be `0.01` and not exactly 0 because we always want to keep a little bit of exploration during training. If you want to know how this is implemented in code we encourage you to take a look at the `utils.get_action` function in the `utils` module (a rough, illustrative sketch of this kind of $\epsilon$-greedy selection and of the mini-batch sampling is shown right after this list).\n", 803 |     "\n", 804 |     "\n", 805 |     "* **Line 8**: We use the `.step()` method to take the given `action` in the environment and get the `reward` and the `next_state`. \n", 806 |     "\n", 807 |     "\n", 808 |     "* **Line 9**: We store the `experience(state, action, reward, next_state, done)` tuple in our `memory_buffer`. Notice that we also store the `done` variable so that we can keep track of when an episode terminates. This allowed us to set the $y$ targets in [Exercise 2](#ex02).\n", 809 |     "\n", 810 |     "\n", 811 |     "* **Line 10**: We check if the conditions are met to perform a learning update. We do this by using our custom `utils.check_update_conditions` function. This function checks if $C =$ `NUM_STEPS_FOR_UPDATE = 4` time steps have occurred and if our `memory_buffer` has enough experience tuples to fill a mini-batch. For example, if the mini-batch size is `64`, then our `memory_buffer` should have at least `64` experience tuples in order to pass the latter condition. If the conditions are met, then the `utils.check_update_conditions` function will return a value of `True`, otherwise it will return a value of `False`.\n", 812 |     "\n", 813 |     "\n", 814 |     "* **Lines 11 - 14**: If the `update` variable is `True` then we perform a learning update. The learning update consists of sampling a random mini-batch of experience tuples from our `memory_buffer`, setting the $y$ targets, performing gradient descent, and updating the weights of the networks. We will use the `agent_learn` function we defined in [Section 8](#8) to perform the latter three steps.\n", 815 |     "\n", 816 |     "\n", 817 |     "* **Line 15**: At the end of each iteration of the inner loop we set `next_state` as our new `state` so that the loop can start again from this new state. In addition, we check if the episode has reached a terminal state (i.e., we check if `done = True`). If a terminal state has been reached, then we break out of the inner loop.\n", 818 |     "\n", 819 |     "\n", 820 |     "* **Line 16**: At the end of each iteration of the outer loop we update the value of $\epsilon$, and check if the environment has been solved. We consider that the environment has been solved if the agent receives an average of `200` points in the last `100` episodes. If the environment has not been solved we continue the outer loop and start a new episode.\n", 821 |     "\n", 822 |     "Finally, we wanted to note that we have included some extra variables to keep track of the total number of points the agent received in each episode. This will help us determine if the agent has solved the environment and it will also allow us to see how our agent performed during training. We also use the `time` module to measure how long the training takes. \n", 823 |     "\n", 824 |     "
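Following up on the Line 7 and Lines 11 - 14 bullets above, here is a rough sketch of what the $\epsilon$-greedy action selection, the $\epsilon$ decay, and the mini-batch sampling might look like. These are illustrative assumptions only: the real `utils.get_action`, `utils.get_new_eps`, and `utils.get_experiences` live in the course's `utils` module and may be implemented differently, and the constants `E_DECAY`, `E_MIN`, and `MINIBATCH_SIZE` below are invented placeholders rather than the notebook's actual hyperparameters.

```python
import random
from collections import namedtuple

import numpy as np
import tensorflow as tf

# Invented placeholder constants; the notebook defines its own values elsewhere.
E_DECAY = 0.995       # assumed epsilon-decay rate applied once per episode
E_MIN = 0.01          # assumed minimum epsilon value
MINIBATCH_SIZE = 64   # assumed mini-batch size

experience = namedtuple("Experience",
                        field_names=["state", "action", "reward", "next_state", "done"])

def get_action(q_values, epsilon):
    """Epsilon-greedy selection: a random action with probability epsilon,
    otherwise the greedy action. Sketch of what utils.get_action may do."""
    if random.random() > epsilon:
        return int(np.argmax(q_values.numpy()[0]))   # greedy action
    return random.choice(range(q_values.shape[1]))   # exploratory random action

def get_new_eps(epsilon):
    """Decay epsilon toward E_MIN after each episode (sketch of utils.get_new_eps)."""
    return max(E_MIN, E_DECAY * epsilon)

def get_experiences(memory_buffer):
    """Sample a random mini-batch from the buffer and repack it as TensorFlow
    tensors, matching how compute_loss unpacks it (sketch of utils.get_experiences)."""
    batch = random.sample(memory_buffer, k=MINIBATCH_SIZE)
    states = tf.convert_to_tensor(np.array([e.state for e in batch]), dtype=tf.float32)
    actions = tf.convert_to_tensor(np.array([e.action for e in batch]), dtype=tf.float32)
    rewards = tf.convert_to_tensor(np.array([e.reward for e in batch]), dtype=tf.float32)
    next_states = tf.convert_to_tensor(np.array([e.next_state for e in batch]), dtype=tf.float32)
    done_vals = tf.convert_to_tensor(np.array([float(e.done) for e in batch]), dtype=tf.float32)
    return states, actions, rewards, next_states, done_vals
```

A helper like `utils.check_update_conditions` could then simply test something along the lines of `(t + 1) % NUM_STEPS_FOR_UPDATE == 0 and len(memory_buffer) >= MINIBATCH_SIZE`, but again, the actual implementation may differ.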
\n", 825 | "
\n", 826 | "
\n", 827 | " \n", 828 | "
Fig 4. Deep Q-Learning with Experience Replay.
\n", 829 | "
\n", 830 | "
\n", 831 | "\n", 832 | "**Note:** With this notebook's default parameters, the following cell takes between 10 to 15 minutes to run. " 833 | ] 834 | }, 835 | { 836 | "cell_type": "code", 837 | "execution_count": null, 838 | "metadata": {}, 839 | "outputs": [ 840 | { 841 | "name": "stdout", 842 | "output_type": "stream", 843 | "text": [ 844 | "Episode 100 | Total point average of the last 100 episodes: -150.85\n", 845 | "Episode 140 | Total point average of the last 100 episodes: -128.23" 846 | ] 847 | } 848 | ], 849 | "source": [ 850 | "start = time.time()\n", 851 | "\n", 852 | "num_episodes = 2000\n", 853 | "max_num_timesteps = 1000\n", 854 | "\n", 855 | "total_point_history = []\n", 856 | "\n", 857 | "num_p_av = 100 # number of total points to use for averaging\n", 858 | "epsilon = 1.0 # initial ε value for ε-greedy policy\n", 859 | "\n", 860 | "# Create a memory buffer D with capacity N\n", 861 | "memory_buffer = deque(maxlen=MEMORY_SIZE)\n", 862 | "\n", 863 | "# Set the target network weights equal to the Q-Network weights\n", 864 | "target_q_network.set_weights(q_network.get_weights())\n", 865 | "\n", 866 | "for i in range(num_episodes):\n", 867 | " \n", 868 | " # Reset the environment to the initial state and get the initial state\n", 869 | " state = env.reset()\n", 870 | " total_points = 0\n", 871 | " \n", 872 | " for t in range(max_num_timesteps):\n", 873 | " \n", 874 | " # From the current state S choose an action A using an ε-greedy policy\n", 875 | " state_qn = np.expand_dims(state, axis=0) # state needs to be the right shape for the q_network\n", 876 | " q_values = q_network(state_qn)\n", 877 | " action = utils.get_action(q_values, epsilon)\n", 878 | " \n", 879 | " # Take action A and receive reward R and the next state S'\n", 880 | " next_state, reward, done, _ = env.step(action)\n", 881 | " \n", 882 | " # Store experience tuple (S,A,R,S') in the memory buffer.\n", 883 | " # We store the done variable as well for convenience.\n", 884 | " memory_buffer.append(experience(state, action, reward, next_state, done))\n", 885 | " \n", 886 | " # Only update the network every NUM_STEPS_FOR_UPDATE time steps.\n", 887 | " update = utils.check_update_conditions(t, NUM_STEPS_FOR_UPDATE, memory_buffer)\n", 888 | " \n", 889 | " if update:\n", 890 | " # Sample random mini-batch of experience tuples (S,A,R,S') from D\n", 891 | " experiences = utils.get_experiences(memory_buffer)\n", 892 | " \n", 893 | " # Set the y targets, perform a gradient descent step,\n", 894 | " # and update the network weights.\n", 895 | " agent_learn(experiences, GAMMA)\n", 896 | " \n", 897 | " state = next_state.copy()\n", 898 | " total_points += reward\n", 899 | " \n", 900 | " if done:\n", 901 | " break\n", 902 | " \n", 903 | " total_point_history.append(total_points)\n", 904 | " av_latest_points = np.mean(total_point_history[-num_p_av:])\n", 905 | " \n", 906 | " # Update the ε value\n", 907 | " epsilon = utils.get_new_eps(epsilon)\n", 908 | "\n", 909 | " print(f\"\\rEpisode {i+1} | Total point average of the last {num_p_av} episodes: {av_latest_points:.2f}\", end=\"\")\n", 910 | "\n", 911 | " if (i+1) % num_p_av == 0:\n", 912 | " print(f\"\\rEpisode {i+1} | Total point average of the last {num_p_av} episodes: {av_latest_points:.2f}\")\n", 913 | "\n", 914 | " # We will consider that the environment is solved if we get an\n", 915 | " # average of 200 points in the last 100 episodes.\n", 916 | " if av_latest_points >= 200.0:\n", 917 | " print(f\"\\n\\nEnvironment solved in {i+1} episodes!\")\n", 918 | " 
q_network.save('lunar_lander_model.h5')\n", 919 | " break\n", 920 | " \n", 921 | "tot_time = time.time() - start\n", 922 | "\n", 923 | "print(f\"\\nTotal Runtime: {tot_time:.2f} s ({(tot_time/60):.2f} min)\")" 924 | ] 925 | }, 926 | { 927 | "cell_type": "markdown", 928 | "metadata": {}, 929 | "source": [ 930 | "We can plot the point history to see how our agent improved during training." 931 | ] 932 | }, 933 | { 934 | "cell_type": "code", 935 | "execution_count": null, 936 | "metadata": { 937 | "id": "E_EUXxurfe8m", 938 | "scrolled": false 939 | }, 940 | "outputs": [], 941 | "source": [ 942 | "# Plot the point history\n", 943 | "utils.plot_history(total_point_history)" 944 | ] 945 | }, 946 | { 947 | "cell_type": "markdown", 948 | "metadata": { 949 | "id": "c_xwgaX5MnYt" 950 | }, 951 | "source": [ 952 | "\n", 953 | "## 10 - See the Trained Agent In Action\n", 954 | "\n", 955 | "Now that we have trained our agent, we can see it in action. We will use the `utils.create_video` function to create a video of our agent interacting with the environment using the trained $Q$-Network. The `utils.create_video` function uses the `imageio` library to create the video. This library produces some warnings that can be distracting, so, to suppress these warnings we run the code below." 956 | ] 957 | }, 958 | { 959 | "cell_type": "code", 960 | "execution_count": null, 961 | "metadata": {}, 962 | "outputs": [], 963 | "source": [ 964 | "# Suppress warnings from imageio\n", 965 | "import logging\n", 966 | "logging.getLogger().setLevel(logging.ERROR)" 967 | ] 968 | }, 969 | { 970 | "cell_type": "markdown", 971 | "metadata": {}, 972 | "source": [ 973 | "In the cell below we create a video of our agent interacting with the Lunar Lander environment using the trained `q_network`. The video is saved to the `videos` folder with the given `filename`. We use the `utils.embed_mp4` function to embed the video in the Jupyter Notebook so that we can see it here directly without having to download it.\n", 974 | "\n", 975 | "We should note that since the lunar lander starts with a random initial force applied to its center of mass, every time you run the cell below you will see a different video. If the agent was trained properly, it should be able to land the lunar lander in the landing pad every time, regardless of the initial force applied to its center of mass." 976 | ] 977 | }, 978 | { 979 | "cell_type": "code", 980 | "execution_count": null, 981 | "metadata": { 982 | "id": "3Ttb_zLeJKiG" 983 | }, 984 | "outputs": [], 985 | "source": [ 986 | "filename = \"./videos/lunar_lander.mp4\"\n", 987 | "\n", 988 | "utils.create_video(filename, env, q_network)\n", 989 | "utils.embed_mp4(filename)" 990 | ] 991 | }, 992 | { 993 | "cell_type": "markdown", 994 | "metadata": {}, 995 | "source": [ 996 | "\n", 997 | "## 11 - Congratulations!\n", 998 | "\n", 999 | "You have successfully used Deep Q-Learning with Experience Replay to train an agent to land a lunar lander safely on a landing pad on the surface of the moon. Congratulations!" 
1000 | ] 1001 | }, 1002 | { 1003 | "cell_type": "markdown", 1004 | "metadata": {}, 1005 | "source": [ 1006 | "\n", 1007 | "## 12 - References\n", 1008 | "\n", 1009 | "If you would like to learn more about Deep Q-Learning, we recommend you check out the following papers.\n", 1010 | "\n", 1011 | "\n", 1012 | "* [Human-level Control Through Deep Reinforcement Learning](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf)\n", 1013 | "\n", 1014 | "\n", 1015 | "* [Continuous Control with Deep Reinforcement Learning](https://arxiv.org/pdf/1509.02971.pdf)\n", 1016 | "\n", 1017 | "\n", 1018 | "* [Playing Atari with Deep Reinforcement Learning](https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf)" 1019 | ] 1020 | }, 1021 | { 1022 | "cell_type": "code", 1023 | "execution_count": null, 1024 | "metadata": {}, 1025 | "outputs": [], 1026 | "source": [] 1027 | } 1028 | ], 1029 | "metadata": { 1030 | "accelerator": "GPU", 1031 | "colab": { 1032 | "collapsed_sections": [], 1033 | "name": "TensorFlow - Lunar Lander.ipynb", 1034 | "provenance": [] 1035 | }, 1036 | "kernelspec": { 1037 | "display_name": "Python 3", 1038 | "language": "python", 1039 | "name": "python3" 1040 | }, 1041 | "language_info": { 1042 | "codemirror_mode": { 1043 | "name": "ipython", 1044 | "version": 3 1045 | }, 1046 | "file_extension": ".py", 1047 | "mimetype": "text/x-python", 1048 | "name": "python", 1049 | "nbconvert_exporter": "python", 1050 | "pygments_lexer": "ipython3", 1051 | "version": "3.7.6" 1052 | } 1053 | }, 1054 | "nbformat": 4, 1055 | "nbformat_minor": 1 1056 | } 1057 | -------------------------------------------------------------------------------- /C3_W2_RecSysNN_Assignment.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "Lzk7iX_CodX6", 7 | "tags": [] 8 | }, 9 | "source": [ 10 | "# Practice lab: Deep Learning for Content-Based Filtering\n", 11 | "\n", 12 | "In this exercise, you will implement content-based filtering using a neural network to build a recommender system for movies. \n", 13 | "\n", 14 | "# Outline \n", 15 | "- [ 1 - Packages](#1)\n", 16 | "- [ 2 - Movie ratings dataset](#2)\n", 17 | " - [ 2.1 Content-based filtering with a neural network](#2.1)\n", 18 | " - [ 2.2 Preparing the training data](#2.2)\n", 19 | "- [ 3 - Neural Network for content-based filtering](#3)\n", 20 | " - [ 3.1 Predictions](#3.1)\n", 21 | " - [ Exercise 1](#ex01)\n", 22 | "- [ 4 - Congratulations!](#4)\n" 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "\n", 30 | "## 1 - Packages \n", 31 | "We will use familiar packages, NumPy, TensorFlow and helpful routines from [scikit-learn](https://scikit-learn.org/stable/). We will also use [tabulate](https://pypi.org/project/tabulate/) to neatly print tables and [Pandas](https://pandas.pydata.org/) to organize tabular data." 
32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": 1, 37 | "metadata": { 38 | "id": "Xu-w_RmNwCV5" 39 | }, 40 | "outputs": [], 41 | "source": [ 42 | "import numpy as np\n", 43 | "import numpy.ma as ma\n", 44 | "from numpy import genfromtxt\n", 45 | "from collections import defaultdict\n", 46 | "import pandas as pd\n", 47 | "import tensorflow as tf\n", 48 | "from tensorflow import keras\n", 49 | "from sklearn.preprocessing import StandardScaler, MinMaxScaler\n", 50 | "from sklearn.model_selection import train_test_split\n", 51 | "import tabulate\n", 52 | "from recsysNN_utils import *\n", 53 | "pd.set_option(\"display.precision\", 1)" 54 | ] 55 | }, 56 | { 57 | "cell_type": "markdown", 58 | "metadata": {}, 59 | "source": [ 60 | "\n", 61 | "## 2 - Movie ratings dataset \n", 62 | "The data set is derived from the [MovieLens ml-latest-small](https://grouplens.org/datasets/movielens/latest/) dataset. \n", 63 | "\n", 64 | "[F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19. ]\n", 65 | "\n", 66 | "The original dataset has 9000 movies rated by 600 users with ratings on a scale of 0.5 to 5 in 0.5 step increments. The dataset has been reduced in size to focus on movies from the years since 2000 and popular genres. The reduced dataset has $n_u = 395$ users and $n_m= 694$ movies. For each movie, the dataset provides a movie title, release date, and one or more genres. For example \"Toy Story 3\" was released in 2010 and has several genres: \"Adventure|Animation|Children|Comedy|Fantasy|IMAX\". This dataset contains little information about users other than their ratings. This dataset is used to create training vectors for the neural networks described below. " 67 | ] 68 | }, 69 | { 70 | "cell_type": "markdown", 71 | "metadata": {}, 72 | "source": [ 73 | "\n", 74 | "### 2.1 Content-based filtering with a neural network\n", 75 | "\n", 76 | "In the collaborative filtering lab, you generated two vectors, a user vector and an item/movie vector whose dot product would predict a rating. The vectors were derived solely from the ratings. \n", 77 | "\n", 78 | "Content-based filtering also generates a user and movie feature vector but recognizes there may be other information available about the user and/or movie that may improve the prediction. The additional information is provided to a neural network which then generates the user and movie vector as shown below.\n", 79 | "
\n", 80 | "
\n", 81 | "
\n", 82 | "The movie content provided to the network is a combination of the original data and some 'engineered features'. Recall the feature engineering discussion and lab from Course 1, Week 2, lab 4. The original features are the year the movie was released and the movie's genre presented as a one-hot vector. There are 14 genres. The engineered feature is an average rating derived from the user ratings. Movies with multiple genre have a training vector per genre. \n", 83 | "\n", 84 | "The user content is composed of only engineered features. A per genre average rating is computed per user. Additionally, a user id, rating count and rating average are available, but are not included in the training or prediction content. They are useful in interpreting data.\n", 85 | "\n", 86 | "The training set consists of all the ratings made by the users in the data set. The user and movie/item vectors are presented to the above network together as a training set. The user vector is the same for all the movies rated by the user. \n", 87 | "\n", 88 | "Below, let's load and display some of the data." 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": 2, 94 | "metadata": { 95 | "id": "M5gfMLYgxCD1" 96 | }, 97 | "outputs": [ 98 | { 99 | "name": "stdout", 100 | "output_type": "stream", 101 | "text": [ 102 | "Number of training vectors: 58187\n" 103 | ] 104 | } 105 | ], 106 | "source": [ 107 | "# Load Data, set configuration variables\n", 108 | "item_train, user_train, y_train, item_features, user_features, item_vecs, movie_dict, user_to_genre = load_data()\n", 109 | "\n", 110 | "num_user_features = user_train.shape[1] - 3 # remove userid, rating count and ave rating during training\n", 111 | "num_item_features = item_train.shape[1] - 1 # remove movie id at train time\n", 112 | "uvs = 3 # user genre vector start\n", 113 | "ivs = 3 # item genre vector start\n", 114 | "u_s = 3 # start of columns to use in training, user\n", 115 | "i_s = 1 # start of columns to use in training, items\n", 116 | "scaledata = True # applies the standard scalar to data if true\n", 117 | "print(f\"Number of training vectors: {len(item_train)}\")" 118 | ] 119 | }, 120 | { 121 | "cell_type": "markdown", 122 | "metadata": {}, 123 | "source": [ 124 | "Some of the user and item/movie features are not used in training. Below, the features in brackets \"[]\" such as the \"user id\", \"rating count\" and \"rating ave\" are not included when the model is trained and used. Note, the user vector is the same for all the movies rated." 125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": 3, 130 | "metadata": {}, 131 | "outputs": [ 132 | { 133 | "data": { 134 | "text/html": [ 135 | "\n", 136 | "\n", 137 | "\n", 138 | "\n", 139 | "\n", 140 | "\n", 141 | "\n", 142 | "\n", 143 | "\n", 144 | "\n", 145 | "\n", 146 | "
[user id] [rating count] [rating ave] Act ion Adve nture Anim ation Chil dren Com edy Crime Docum entary Drama Fan tasy Hor ror Mys tery Rom ance Sci -Fi Thri ller
2 16 4.1 3.9 5.0 0.0 0.0 4.0 4.2 4.0 4.0 0.0 3.0 4.0 0.0 4.2 3.9
2 16 4.1 3.9 5.0 0.0 0.0 4.0 4.2 4.0 4.0 0.0 3.0 4.0 0.0 4.2 3.9
2 16 4.1 3.9 5.0 0.0 0.0 4.0 4.2 4.0 4.0 0.0 3.0 4.0 0.0 4.2 3.9
2 16 4.1 3.9 5.0 0.0 0.0 4.0 4.2 4.0 4.0 0.0 3.0 4.0 0.0 4.2 3.9
2 16 4.1 3.9 5.0 0.0 0.0 4.0 4.2 4.0 4.0 0.0 3.0 4.0 0.0 4.2 3.9
" 147 | ], 148 | "text/plain": [ 149 | "'\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n
[user id] [rating count] [rating ave] Act ion Adve nture Anim ation Chil dren Com edy Crime Docum entary Drama Fan tasy Hor ror Mys tery Rom ance Sci -Fi Thri ller
2 16 4.1 3.9 5.0 0.0 0.0 4.0 4.2 4.0 4.0 0.0 3.0 4.0 0.0 4.2 3.9
2 16 4.1 3.9 5.0 0.0 0.0 4.0 4.2 4.0 4.0 0.0 3.0 4.0 0.0 4.2 3.9
2 16 4.1 3.9 5.0 0.0 0.0 4.0 4.2 4.0 4.0 0.0 3.0 4.0 0.0 4.2 3.9
2 16 4.1 3.9 5.0 0.0 0.0 4.0 4.2 4.0 4.0 0.0 3.0 4.0 0.0 4.2 3.9
2 16 4.1 3.9 5.0 0.0 0.0 4.0 4.2 4.0 4.0 0.0 3.0 4.0 0.0 4.2 3.9
'" 150 | ] 151 | }, 152 | "execution_count": 3, 153 | "metadata": {}, 154 | "output_type": "execute_result" 155 | } 156 | ], 157 | "source": [ 158 | "pprint_train(user_train, user_features, uvs, u_s, maxcount=5)" 159 | ] 160 | }, 161 | { 162 | "cell_type": "code", 163 | "execution_count": 4, 164 | "metadata": {}, 165 | "outputs": [ 166 | { 167 | "data": { 168 | "text/html": [ 169 | "\n", 170 | "\n", 171 | "\n", 172 | "\n", 173 | "\n", 174 | "\n", 175 | "\n", 176 | "\n", 177 | "\n", 178 | "\n", 179 | "\n", 180 | "
[movie id] year ave rating Act ion Adve nture Anim ation Chil dren Com edy Crime Docum entary Drama Fan tasy Hor ror Mys tery Rom ance Sci -Fi Thri ller
6874 2003 4.0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
6874 2003 4.0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
6874 2003 4.0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
8798 2004 3.8 1 0 0 0 0 0 0 0 0 0 0 0 0 0
8798 2004 3.8 0 0 0 0 0 1 0 0 0 0 0 0 0 0
" 181 | ], 182 | "text/plain": [ 183 | "'\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n
[movie id] year ave rating Act ion Adve nture Anim ation Chil dren Com edy Crime Docum entary Drama Fan tasy Hor ror Mys tery Rom ance Sci -Fi Thri ller
6874 2003 4.0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
6874 2003 4.0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
6874 2003 4.0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
8798 2004 3.8 1 0 0 0 0 0 0 0 0 0 0 0 0 0
8798 2004 3.8 0 0 0 0 0 1 0 0 0 0 0 0 0 0
'" 184 | ] 185 | }, 186 | "execution_count": 4, 187 | "metadata": {}, 188 | "output_type": "execute_result" 189 | } 190 | ], 191 | "source": [ 192 | "pprint_train(item_train, item_features, ivs, i_s, maxcount=5, user=False)" 193 | ] 194 | }, 195 | { 196 | "cell_type": "code", 197 | "execution_count": 5, 198 | "metadata": {}, 199 | "outputs": [ 200 | { 201 | "name": "stdout", 202 | "output_type": "stream", 203 | "text": [ 204 | "y_train[:5]: [4. 4. 4. 3.5 3.5]\n" 205 | ] 206 | } 207 | ], 208 | "source": [ 209 | "print(f\"y_train[:5]: {y_train[:5]}\")" 210 | ] 211 | }, 212 | { 213 | "cell_type": "markdown", 214 | "metadata": {}, 215 | "source": [ 216 | "Above, we can see that movie 6874 is an action movie released in 2003. User 2 rates action movies as 3.9 on average. Further, movie 6874 was also listed in the Crime and Thriller genre. MovieLens users gave the movie an average rating of 4. A training example consists of a row from both tables and a rating from y_train." 217 | ] 218 | }, 219 | { 220 | "cell_type": "markdown", 221 | "metadata": {}, 222 | "source": [ 223 | "\n", 224 | "### 2.2 Preparing the training data\n", 225 | "Recall in Course 1, Week 2, you explored feature scaling as a means of improving convergence. We'll scale the input features using the [scikit learn StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html). This was used in Course 1, Week 2, Lab 5. Below, the inverse_transform is also shown to produce the original inputs." 226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": 6, 231 | "metadata": {}, 232 | "outputs": [ 233 | { 234 | "name": "stdout", 235 | "output_type": "stream", 236 | "text": [ 237 | "True\n", 238 | "True\n" 239 | ] 240 | } 241 | ], 242 | "source": [ 243 | "# scale training data\n", 244 | "if scaledata:\n", 245 | " item_train_save = item_train\n", 246 | " user_train_save = user_train\n", 247 | "\n", 248 | " scalerItem = StandardScaler()\n", 249 | " scalerItem.fit(item_train)\n", 250 | " item_train = scalerItem.transform(item_train)\n", 251 | "\n", 252 | " scalerUser = StandardScaler()\n", 253 | " scalerUser.fit(user_train)\n", 254 | " user_train = scalerUser.transform(user_train)\n", 255 | "\n", 256 | " print(np.allclose(item_train_save, scalerItem.inverse_transform(item_train)))\n", 257 | " print(np.allclose(user_train_save, scalerUser.inverse_transform(user_train)))" 258 | ] 259 | }, 260 | { 261 | "cell_type": "markdown", 262 | "metadata": {}, 263 | "source": [ 264 | "To allow us to evaluate the results, we will split the data into training and test sets as was discussed in Course 2, Week 3. Here we will use [sklean train_test_split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) to split and shuffle the data. Note that setting the initial random state to the same value ensures item, user, and y are shuffled identically." 
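To see concretely why passing the same `random_state` to each call keeps the arrays aligned, here is a tiny self-contained check on toy arrays (the arrays below are invented; this is not part of the assignment code):

```python
import numpy as np
from sklearn.model_selection import train_test_split

ids = np.arange(10).reshape(-1, 1)   # stands in for the item_train rows
labels = np.arange(10) * 10          # stands in for y_train, row-aligned with ids

ids_tr, ids_te = train_test_split(ids, train_size=0.80, shuffle=True, random_state=1)
y_tr, y_te = train_test_split(labels, train_size=0.80, shuffle=True, random_state=1)

# The same random_state (and the same number of rows) produces the same permutation,
# so each id still lines up with its label after shuffling and splitting.
assert np.all(ids_tr.ravel() * 10 == y_tr)
print(ids_tr.ravel(), y_tr)
```

The cell below relies on exactly this property when it splits `item_train`, `user_train`, and `y_train` in three separate calls.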
265 | ] 266 | }, 267 | { 268 | "cell_type": "code", 269 | "execution_count": 7, 270 | "metadata": {}, 271 | "outputs": [ 272 | { 273 | "name": "stdout", 274 | "output_type": "stream", 275 | "text": [ 276 | "movie/item training data shape: (46549, 17)\n", 277 | "movie/item test data shape: (11638, 17)\n" 278 | ] 279 | } 280 | ], 281 | "source": [ 282 | "item_train, item_test = train_test_split(item_train, train_size=0.80, shuffle=True, random_state=1)\n", 283 | "user_train, user_test = train_test_split(user_train, train_size=0.80, shuffle=True, random_state=1)\n", 284 | "y_train, y_test = train_test_split(y_train, train_size=0.80, shuffle=True, random_state=1)\n", 285 | "print(f\"movie/item training data shape: {item_train.shape}\")\n", 286 | "print(f\"movie/item test data shape: {item_test.shape}\")" 287 | ] 288 | }, 289 | { 290 | "cell_type": "markdown", 291 | "metadata": {}, 292 | "source": [ 293 | "The scaled, shuffled data now has a mean of zero." 294 | ] 295 | }, 296 | { 297 | "cell_type": "code", 298 | "execution_count": 8, 299 | "metadata": {}, 300 | "outputs": [ 301 | { 302 | "data": { 303 | "text/html": [ 304 | "\n", 305 | "\n", 306 | "\n", 307 | "\n", 308 | "\n", 309 | "\n", 310 | "\n", 311 | "\n", 312 | "\n", 313 | "\n", 314 | "\n", 315 | "
[user id] [rating count] [rating ave] Act ion Adve nture Anim ation Chil dren Com edy Crime Docum entary Drama Fan tasy Hor ror Mys tery Rom ance Sci -Fi Thri ller
1 0 0.6 0.7 0.6 0.6 0.7 0.7 0.5 0.7 0.2 0.3 0.3 0.5 0.5 0.8 0.5
0 0 1.6 1.5 1.7 0.9 1.0 1.4 0.8 -1.2 1.2 1.2 1.6 0.9 1.4 1.2 1.0
0 0 0.8 0.6 0.7 0.5 0.6 0.6 0.3 -1.2 0.7 0.8 0.9 0.6 0.2 0.6 0.6
1 0 -0.1 0.2 -0.1 0.3 0.7 0.3 0.2 1.0 -0.5 -0.7 -2.1 0.5 0.7 0.3 0.0
-1 0 -1.3 -0.8 -0.8 0.1 -0.1 -1.1 -0.9 -1.2 -1.5 -0.6 -0.5 -0.6 -0.9 -0.4 -0.9
" 316 | ], 317 | "text/plain": [ 318 | "'\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n
[user id] [rating count] [rating ave] Act ion Adve nture Anim ation Chil dren Com edy Crime Docum entary Drama Fan tasy Hor ror Mys tery Rom ance Sci -Fi Thri ller
1 0 0.6 0.7 0.6 0.6 0.7 0.7 0.5 0.7 0.2 0.3 0.3 0.5 0.5 0.8 0.5
0 0 1.6 1.5 1.7 0.9 1.0 1.4 0.8 -1.2 1.2 1.2 1.6 0.9 1.4 1.2 1.0
0 0 0.8 0.6 0.7 0.5 0.6 0.6 0.3 -1.2 0.7 0.8 0.9 0.6 0.2 0.6 0.6
1 0 -0.1 0.2 -0.1 0.3 0.7 0.3 0.2 1.0 -0.5 -0.7 -2.1 0.5 0.7 0.3 0.0
-1 0 -1.3 -0.8 -0.8 0.1 -0.1 -1.1 -0.9 -1.2 -1.5 -0.6 -0.5 -0.6 -0.9 -0.4 -0.9
'" 319 | ] 320 | }, 321 | "execution_count": 8, 322 | "metadata": {}, 323 | "output_type": "execute_result" 324 | } 325 | ], 326 | "source": [ 327 | "pprint_train(user_train, user_features, uvs, u_s, maxcount=5)" 328 | ] 329 | }, 330 | { 331 | "cell_type": "markdown", 332 | "metadata": { 333 | "id": "KeoAhs95LRop" 334 | }, 335 | "source": [ 336 | "Scale the target ratings using a Min Max Scaler to scale the target to be between -1 and 1. We use scikit-learn because it has an inverse_transform. [scikit learn MinMaxScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html)" 337 | ] 338 | }, 339 | { 340 | "cell_type": "code", 341 | "execution_count": 9, 342 | "metadata": { 343 | "id": "A8myXMxFC8lP" 344 | }, 345 | "outputs": [ 346 | { 347 | "name": "stdout", 348 | "output_type": "stream", 349 | "text": [ 350 | "(46549, 1) (11638, 1)\n" 351 | ] 352 | } 353 | ], 354 | "source": [ 355 | "scaler = MinMaxScaler((-1, 1))\n", 356 | "scaler.fit(y_train.reshape(-1, 1))\n", 357 | "ynorm_train = scaler.transform(y_train.reshape(-1, 1))\n", 358 | "ynorm_test = scaler.transform(y_test.reshape(-1, 1))\n", 359 | "print(ynorm_train.shape, ynorm_test.shape)" 360 | ] 361 | }, 362 | { 363 | "cell_type": "markdown", 364 | "metadata": {}, 365 | "source": [ 366 | "\n", 367 | "## 3 - Neural Network for content-based filtering\n", 368 | "Now, let's construct a neural network as described in the figure above. It will have two networks that are combined by a dot product. You will construct the two networks. In this example, they will be identical. Note that these networks do not need to be the same. If the user content was substantially larger than the movie content, you might elect to increase the complexity of the user network relative to the movie network. In this case, the content is similar, so the networks are the same.\n", 369 | "\n", 370 | "- Use a Keras sequential model\n", 371 | " - The first layer is a dense layer with 256 units and a relu activation.\n", 372 | " - The second layer is a dense layer with 128 units and a relu activation.\n", 373 | " - The third layer is a dense layer with `num_outputs` units and a linear or no activation. \n", 374 | " \n", 375 | "The remainder of the network will be provided. The provided code does not use the Keras sequential model but instead uses the Keras [functional api](https://keras.io/guides/functional_api/). 
This format allows for more flexibility in how components are interconnected.\n" 376 | ] 377 | }, 378 | { 379 | "cell_type": "code", 380 | "execution_count": 10, 381 | "metadata": { 382 | "id": "CBjZ2HhRwpa0" 383 | }, 384 | "outputs": [ 385 | { 386 | "name": "stdout", 387 | "output_type": "stream", 388 | "text": [ 389 | "Model: \"model\"\n", 390 | "__________________________________________________________________________________________________\n", 391 | "Layer (type) Output Shape Param # Connected to \n", 392 | "==================================================================================================\n", 393 | "input_1 (InputLayer) [(None, 14)] 0 \n", 394 | "__________________________________________________________________________________________________\n", 395 | "input_2 (InputLayer) [(None, 16)] 0 \n", 396 | "__________________________________________________________________________________________________\n", 397 | "sequential (Sequential) (None, 32) 40864 input_1[0][0] \n", 398 | "__________________________________________________________________________________________________\n", 399 | "sequential_1 (Sequential) (None, 32) 41376 input_2[0][0] \n", 400 | "__________________________________________________________________________________________________\n", 401 | "tf_op_layer_l2_normalize/Square [(None, 32)] 0 sequential[0][0] \n", 402 | "__________________________________________________________________________________________________\n", 403 | "tf_op_layer_l2_normalize_1/Squa [(None, 32)] 0 sequential_1[0][0] \n", 404 | "__________________________________________________________________________________________________\n", 405 | "tf_op_layer_l2_normalize/Sum (T [(None, 1)] 0 tf_op_layer_l2_normalize/Square[0\n", 406 | "__________________________________________________________________________________________________\n", 407 | "tf_op_layer_l2_normalize_1/Sum [(None, 1)] 0 tf_op_layer_l2_normalize_1/Square\n", 408 | "__________________________________________________________________________________________________\n", 409 | "tf_op_layer_l2_normalize/Maximu [(None, 1)] 0 tf_op_layer_l2_normalize/Sum[0][0\n", 410 | "__________________________________________________________________________________________________\n", 411 | "tf_op_layer_l2_normalize_1/Maxi [(None, 1)] 0 tf_op_layer_l2_normalize_1/Sum[0]\n", 412 | "__________________________________________________________________________________________________\n", 413 | "tf_op_layer_l2_normalize/Rsqrt [(None, 1)] 0 tf_op_layer_l2_normalize/Maximum[\n", 414 | "__________________________________________________________________________________________________\n", 415 | "tf_op_layer_l2_normalize_1/Rsqr [(None, 1)] 0 tf_op_layer_l2_normalize_1/Maximu\n", 416 | "__________________________________________________________________________________________________\n", 417 | "tf_op_layer_l2_normalize (Tenso [(None, 32)] 0 sequential[0][0] \n", 418 | " tf_op_layer_l2_normalize/Rsqrt[0]\n", 419 | "__________________________________________________________________________________________________\n", 420 | "tf_op_layer_l2_normalize_1 (Ten [(None, 32)] 0 sequential_1[0][0] \n", 421 | " tf_op_layer_l2_normalize_1/Rsqrt[\n", 422 | "__________________________________________________________________________________________________\n", 423 | "dot (Dot) (None, 1) 0 tf_op_layer_l2_normalize[0][0] \n", 424 | " tf_op_layer_l2_normalize_1[0][0] \n", 425 | 
"==================================================================================================\n", 426 | "Total params: 82,240\n", 427 | "Trainable params: 82,240\n", 428 | "Non-trainable params: 0\n", 429 | "__________________________________________________________________________________________________\n" 430 | ] 431 | } 432 | ], 433 | "source": [ 434 | "# GRADED_CELL\n", 435 | "# UNQ_C1\n", 436 | "\n", 437 | "num_outputs = 32\n", 438 | "tf.random.set_seed(1)\n", 439 | "user_NN = tf.keras.models.Sequential([\n", 440 | " ### START CODE HERE ### \n", 441 | " tf.keras.layers.Dense(256, activation='relu'),\n", 442 | " tf.keras.layers.Dense(128, activation='relu'),\n", 443 | " tf.keras.layers.Dense(num_outputs),\n", 444 | " ### END CODE HERE ### \n", 445 | "])\n", 446 | "\n", 447 | "item_NN = tf.keras.models.Sequential([\n", 448 | " ### START CODE HERE ### \n", 449 | " tf.keras.layers.Dense(256, activation='relu'),\n", 450 | " tf.keras.layers.Dense(128, activation='relu'),\n", 451 | " tf.keras.layers.Dense(num_outputs),\n", 452 | " ### END CODE HERE ### \n", 453 | "])\n", 454 | "\n", 455 | "# create the user input and point to the base network\n", 456 | "input_user = tf.keras.layers.Input(shape=(num_user_features))\n", 457 | "vu = user_NN(input_user)\n", 458 | "vu = tf.linalg.l2_normalize(vu, axis=1)\n", 459 | "\n", 460 | "# create the item input and point to the base network\n", 461 | "input_item = tf.keras.layers.Input(shape=(num_item_features))\n", 462 | "vm = item_NN(input_item)\n", 463 | "vm = tf.linalg.l2_normalize(vm, axis=1)\n", 464 | "\n", 465 | "# compute the dot product of the two vectors vu and vm\n", 466 | "output = tf.keras.layers.Dot(axes=1)([vu, vm])\n", 467 | "\n", 468 | "# specify the inputs and output of the model\n", 469 | "model = Model([input_user, input_item], output)\n", 470 | "\n", 471 | "model.summary()" 472 | ] 473 | }, 474 | { 475 | "cell_type": "code", 476 | "execution_count": 11, 477 | "metadata": {}, 478 | "outputs": [ 479 | { 480 | "name": "stdout", 481 | "output_type": "stream", 482 | "text": [ 483 | "\u001b[92mAll tests passed!\n", 484 | "\u001b[92mAll tests passed!\n" 485 | ] 486 | } 487 | ], 488 | "source": [ 489 | "# Public tests\n", 490 | "from public_tests import *\n", 491 | "test_tower(user_NN)\n", 492 | "test_tower(item_NN)" 493 | ] 494 | }, 495 | { 496 | "cell_type": "markdown", 497 | "metadata": {}, 498 | "source": [ 499 | "
\n", 500 | " Click for hints\n", 501 | " \n", 502 | " You can create a dense layer with a relu activation as shown.\n", 503 | " \n", 504 | "```python \n", 505 | "user_NN = tf.keras.models.Sequential([\n", 506 | " ### START CODE HERE ### \n", 507 | " tf.keras.layers.Dense(256, activation='relu'),\n", 508 | "\n", 509 | " \n", 510 | " ### END CODE HERE ### \n", 511 | "])\n", 512 | "\n", 513 | "item_NN = tf.keras.models.Sequential([\n", 514 | " ### START CODE HERE ### \n", 515 | " tf.keras.layers.Dense(256, activation='relu'),\n", 516 | "\n", 517 | " \n", 518 | " ### END CODE HERE ### \n", 519 | "])\n", 520 | "``` \n", 521 | "
\n", 522 | " Click for solution\n", 523 | " \n", 524 | "```python \n", 525 | "user_NN = tf.keras.models.Sequential([\n", 526 | " ### START CODE HERE ### \n", 527 | " tf.keras.layers.Dense(256, activation='relu'),\n", 528 | " tf.keras.layers.Dense(128, activation='relu'),\n", 529 | " tf.keras.layers.Dense(num_outputs),\n", 530 | " ### END CODE HERE ### \n", 531 | "])\n", 532 | "\n", 533 | "item_NN = tf.keras.models.Sequential([\n", 534 | " ### START CODE HERE ### \n", 535 | " tf.keras.layers.Dense(256, activation='relu'),\n", 536 | " tf.keras.layers.Dense(128, activation='relu'),\n", 537 | " tf.keras.layers.Dense(num_outputs),\n", 538 | " ### END CODE HERE ### \n", 539 | "])\n", 540 | "```\n", 541 | "
\n", 542 | "
\n", 543 | "\n", 544 | " \n" 545 | ] 546 | }, 547 | { 548 | "cell_type": "markdown", 549 | "metadata": {}, 550 | "source": [ 551 | "We'll use a mean squared error loss and an Adam optimizer." 552 | ] 553 | }, 554 | { 555 | "cell_type": "code", 556 | "execution_count": 12, 557 | "metadata": { 558 | "id": "pGK5MEUowxN4" 559 | }, 560 | "outputs": [], 561 | "source": [ 562 | "tf.random.set_seed(1)\n", 563 | "cost_fn = tf.keras.losses.MeanSquaredError()\n", 564 | "opt = keras.optimizers.Adam(learning_rate=0.01)\n", 565 | "model.compile(optimizer=opt,\n", 566 | " loss=cost_fn)" 567 | ] 568 | }, 569 | { 570 | "cell_type": "code", 571 | "execution_count": 13, 572 | "metadata": { 573 | "id": "6zHf7eASw0tN" 574 | }, 575 | "outputs": [ 576 | { 577 | "name": "stdout", 578 | "output_type": "stream", 579 | "text": [ 580 | "Train on 46549 samples\n", 581 | "Epoch 1/30\n", 582 | "46549/46549 [==============================] - 6s 123us/sample - loss: 0.1254\n", 583 | "Epoch 2/30\n", 584 | "46549/46549 [==============================] - 5s 114us/sample - loss: 0.1187\n", 585 | "Epoch 3/30\n", 586 | "46549/46549 [==============================] - 5s 114us/sample - loss: 0.1169\n", 587 | "Epoch 4/30\n", 588 | "46549/46549 [==============================] - 5s 114us/sample - loss: 0.1154\n", 589 | "Epoch 5/30\n", 590 | "46549/46549 [==============================] - 5s 114us/sample - loss: 0.1142\n", 591 | "Epoch 6/30\n", 592 | "46549/46549 [==============================] - 5s 114us/sample - loss: 0.1130\n", 593 | "Epoch 7/30\n", 594 | "46549/46549 [==============================] - 5s 115us/sample - loss: 0.1119\n", 595 | "Epoch 8/30\n", 596 | "46549/46549 [==============================] - 5s 116us/sample - loss: 0.1110\n", 597 | "Epoch 9/30\n", 598 | "46549/46549 [==============================] - 5s 114us/sample - loss: 0.1095\n", 599 | "Epoch 10/30\n", 600 | "46549/46549 [==============================] - 5s 115us/sample - loss: 0.1083\n", 601 | "Epoch 11/30\n", 602 | "46549/46549 [==============================] - 5s 114us/sample - loss: 0.1073\n", 603 | "Epoch 12/30\n", 604 | "46549/46549 [==============================] - 5s 112us/sample - loss: 0.1066\n", 605 | "Epoch 13/30\n", 606 | "46549/46549 [==============================] - 5s 114us/sample - loss: 0.1059\n", 607 | "Epoch 14/30\n", 608 | "46549/46549 [==============================] - 5s 114us/sample - loss: 0.1054\n", 609 | "Epoch 15/30\n", 610 | "46549/46549 [==============================] - 5s 114us/sample - loss: 0.1047\n", 611 | "Epoch 16/30\n", 612 | "46549/46549 [==============================] - 5s 114us/sample - loss: 0.1041\n", 613 | "Epoch 17/30\n", 614 | "46549/46549 [==============================] - 5s 114us/sample - loss: 0.1036\n", 615 | "Epoch 18/30\n", 616 | "46549/46549 [==============================] - 5s 112us/sample - loss: 0.1030\n", 617 | "Epoch 19/30\n", 618 | "46549/46549 [==============================] - 5s 114us/sample - loss: 0.1027\n", 619 | "Epoch 20/30\n", 620 | "46549/46549 [==============================] - 5s 114us/sample - loss: 0.1021\n", 621 | "Epoch 21/30\n", 622 | "46549/46549 [==============================] - 5s 114us/sample - loss: 0.1018\n", 623 | "Epoch 22/30\n", 624 | "46549/46549 [==============================] - 5s 114us/sample - loss: 0.1014\n", 625 | "Epoch 23/30\n", 626 | "46549/46549 [==============================] - 5s 114us/sample - loss: 0.1010\n", 627 | "Epoch 24/30\n", 628 | "46549/46549 [==============================] - 5s 114us/sample - loss: 0.1006\n", 629 | "Epoch 25/30\n", 630 
| "46549/46549 [==============================] - 5s 114us/sample - loss: 0.1003\n", 631 | "Epoch 26/30\n", 632 | "46549/46549 [==============================] - 5s 112us/sample - loss: 0.0999\n", 633 | "Epoch 27/30\n", 634 | "46549/46549 [==============================] - 5s 112us/sample - loss: 0.0997\n", 635 | "Epoch 28/30\n", 636 | "46549/46549 [==============================] - 5s 114us/sample - loss: 0.0991\n", 637 | "Epoch 29/30\n", 638 | "46549/46549 [==============================] - 5s 115us/sample - loss: 0.0989\n", 639 | "Epoch 30/30\n", 640 | "46549/46549 [==============================] - 5s 114us/sample - loss: 0.0985\n" 641 | ] 642 | }, 643 | { 644 | "data": { 645 | "text/plain": [ 646 | "" 647 | ] 648 | }, 649 | "execution_count": 13, 650 | "metadata": {}, 651 | "output_type": "execute_result" 652 | } 653 | ], 654 | "source": [ 655 | "tf.random.set_seed(1)\n", 656 | "model.fit([user_train[:, u_s:], item_train[:, i_s:]], ynorm_train, epochs=30)" 657 | ] 658 | }, 659 | { 660 | "cell_type": "markdown", 661 | "metadata": {}, 662 | "source": [ 663 | "Evaluate the model to determine loss on the test data. It is comparable to the training loss indicating the model has not substantially overfit the training data." 664 | ] 665 | }, 666 | { 667 | "cell_type": "code", 668 | "execution_count": 14, 669 | "metadata": {}, 670 | "outputs": [ 671 | { 672 | "name": "stdout", 673 | "output_type": "stream", 674 | "text": [ 675 | "11638/11638 [==============================] - 0s 33us/sample - loss: 0.1045\n" 676 | ] 677 | }, 678 | { 679 | "data": { 680 | "text/plain": [ 681 | "0.10449595100221243" 682 | ] 683 | }, 684 | "execution_count": 14, 685 | "metadata": {}, 686 | "output_type": "execute_result" 687 | } 688 | ], 689 | "source": [ 690 | "model.evaluate([user_test[:, u_s:], item_test[:, i_s:]], ynorm_test)" 691 | ] 692 | }, 693 | { 694 | "cell_type": "markdown", 695 | "metadata": { 696 | "id": "Xsre-gquwEls" 697 | }, 698 | "source": [ 699 | "\n", 700 | "### 3.1 Predictions\n", 701 | "Below, you'll use your model to make predictions in a number of circumstances. \n", 702 | "#### Predictions for a new user\n", 703 | "First, we'll create a new user and have the model suggest movies for that user. After you have tried this example on the example user content, feel free to change the user content to match your own preferences and see what the model suggests. Note that ratings are between 0.5 and 5.0, inclusive, in half-step increments." 
704 | ] 705 | }, 706 | { 707 | "cell_type": "code", 708 | "execution_count": 15, 709 | "metadata": { 710 | "id": "4_7nZyPiVJ4r" 711 | }, 712 | "outputs": [], 713 | "source": [ 714 | "new_user_id = 5000\n", 715 | "new_rating_ave = 1.0\n", 716 | "new_action = 1.0\n", 717 | "new_adventure = 1\n", 718 | "new_animation = 1\n", 719 | "new_childrens = 1\n", 720 | "new_comedy = 5\n", 721 | "new_crime = 1\n", 722 | "new_documentary = 1\n", 723 | "new_drama = 1\n", 724 | "new_fantasy = 1\n", 725 | "new_horror = 1\n", 726 | "new_mystery = 1\n", 727 | "new_romance = 5\n", 728 | "new_scifi = 5\n", 729 | "new_thriller = 1\n", 730 | "new_rating_count = 3\n", 731 | "\n", 732 | "user_vec = np.array([[new_user_id, new_rating_count, new_rating_ave,\n", 733 | "                      new_action, new_adventure, new_animation, new_childrens,\n", 734 | "                      new_comedy, new_crime, new_documentary,\n", 735 | "                      new_drama, new_fantasy, new_horror, new_mystery,\n", 736 | "                      new_romance, new_scifi, new_thriller]])" 737 | ] 738 | }, 739 | { 740 | "cell_type": "markdown", 741 | "metadata": {}, 742 | "source": [ 743 | "\n", 744 | "Let's look at the top-rated movies for the new user. Recall that the user vector above favored Comedy, Romance and Sci-Fi.\n", 745 | "Below, we'll use a set of movie/item vectors, `item_vecs`, that contains a vector for each movie in the training/test set. This is matched with the new user vector above, and the scaled vectors are used to predict ratings for all of the movies for this user." 746 | ] 747 | }, 748 | { 749 | "cell_type": "code", 750 | "execution_count": 16, 751 | "metadata": {}, 752 | "outputs": [ 753 | { 754 | "data": { 755 | "text/html": [ 756 | "\n", 757 | "\n", 758 | "\n", 759 | "\n", 760 | "\n", 761 | "\n", 762 | "\n", 763 | "\n", 764 | "\n", 765 | "\n", 766 | "\n", 767 | "\n", 768 | "\n", 769 | "\n", 770 | "\n", 771 | "\n", 772 | "
y_p      movie id  rating ave  title                        genres
4.86762  64969     3.61765     Yes Man (2008)               Comedy
4.86692  69122     3.63158     Hangover, The (2009)         Comedy|Crime
4.86477  63131     3.625       Role Models (2008)           Comedy
4.85853  60756     3.55357     Step Brothers (2008)         Comedy
4.85785  68135     3.55        17 Again (2009)              Comedy|Drama
4.85178  78209     3.55        Get Him to the Greek (2010)  Comedy
4.85138  8622      3.48649     Fahrenheit 9/11 (2004)       Documentary
4.8505   67087     3.52941     I Love You, Man (2009)       Comedy
4.85043  69784     3.65        Brüno (Bruno) (2009)         Comedy
4.84934  89864     3.63158     50/50 (2011)                 Comedy|Drama
" 773 | ], 774 | "text/plain": [ 775 | "'\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n
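The cell below relies on two helper functions provided with the lab's utilities: `gen_user_vecs`, which replicates the single user row once per movie so the user and item inputs line up row-for-row, and `predict_uservec`, which scales the inputs, runs the model, rescales the predictions back to the 0.5–5 rating range, and sorts everything by predicted rating. Their real implementations ship with the lab; the sketch below is only a guess at their behavior, inferred from how they are called here (names ending in `_sketch` are hypothetical):

```python
import numpy as np

def gen_user_vecs_sketch(user_vec, num_items):
    # One copy of the user row per movie, so rows of user and item inputs correspond.
    return np.tile(user_vec, (num_items, 1))

def predict_uservec_sketch(user_vecs, item_vecs, model, u_s, i_s,
                           scaler, scalerUser, scalerItem, scaledata=False):
    # Optionally apply the same feature scaling used during training.
    if scaledata:
        suser_vecs = scalerUser.transform(user_vecs)
        sitem_vecs = scalerItem.transform(item_vecs)
    else:
        suser_vecs, sitem_vecs = user_vecs, item_vecs
    # Predict normalized ratings, then undo the target scaling.
    y_p = model.predict([suser_vecs[:, u_s:], sitem_vecs[:, i_s:]])
    y_pu = scaler.inverse_transform(y_p)
    # Sort by predicted rating, largest first.
    sorted_index = np.argsort(-y_pu.reshape(-1))
    return sorted_index, y_pu[sorted_index], item_vecs[sorted_index], user_vecs[sorted_index]
```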
'" 776 | ] 777 | }, 778 | "execution_count": 16, 779 | "metadata": {}, 780 | "output_type": "execute_result" 781 | } 782 | ], 783 | "source": [ 784 | "# generate and replicate the user vector to match the number movies in the data set.\n", 785 | "user_vecs = gen_user_vecs(user_vec,len(item_vecs))\n", 786 | "\n", 787 | "# scale the vectors and make predictions for all movies. Return results sorted by rating.\n", 788 | "sorted_index, sorted_ypu, sorted_items, sorted_user = predict_uservec(user_vecs, item_vecs, model, u_s, i_s, \n", 789 | " scaler, scalerUser, scalerItem, scaledata=scaledata)\n", 790 | "\n", 791 | "print_pred_movies(sorted_ypu, sorted_user, sorted_items, movie_dict, maxcount = 10)" 792 | ] 793 | }, 794 | { 795 | "cell_type": "markdown", 796 | "metadata": {}, 797 | "source": [ 798 | "If you do create a user above, it is worth noting that the network was trained to predict a user rating given a user vector that includes a **set** of user genre ratings. Simply providing a maximum rating for a single genre and minimum ratings for the rest may not be meaningful to the network if there were no users with similar sets of ratings." 799 | ] 800 | }, 801 | { 802 | "cell_type": "markdown", 803 | "metadata": {}, 804 | "source": [ 805 | "#### Predictions for an existing user.\n", 806 | "Let's look at the predictions for \"user 36\", one of the users in the data set. We can compare the predicted ratings with the model's ratings. Note that movies with multiple genre's show up multiple times in the training data. For example,'The Time Machine' has three genre's: Adventure, Action, Sci-Fi" 807 | ] 808 | }, 809 | { 810 | "cell_type": "code", 811 | "execution_count": 17, 812 | "metadata": {}, 813 | "outputs": [ 814 | { 815 | "data": { 816 | "text/html": [ 817 | "\n", 818 | "\n", 819 | "\n", 820 | "\n", 821 | "\n", 822 | "\n", 823 | "\n", 824 | "\n", 825 | "\n", 826 | "\n", 827 | "\n", 828 | "\n", 829 | "\n", 830 | "\n", 831 | "\n", 832 | "
y_p  y    user  user genre ave  movie rating ave  title                     genres
3.1  3.0  36    3.00            2.86              Time Machine, The (2002)  Adventure
3.0  3.0  36    3.00            2.86              Time Machine, The (2002)  Action
2.8  3.0  36    3.00            2.86              Time Machine, The (2002)  Sci-Fi
2.3  1.0  36    1.00            4.00              Beautiful Mind, A (2001)  Romance
2.2  1.0  36    1.50            4.00              Beautiful Mind, A (2001)  Drama
1.6  1.5  36    1.75            3.52              Road to Perdition (2002)  Crime
1.6  2.0  36    1.75            3.52              Gangs of New York (2002)  Crime
1.5  1.5  36    1.50            3.52              Road to Perdition (2002)  Drama
1.5  2.0  36    1.50            3.52              Gangs of New York (2002)  Drama
" 833 | ], 834 | "text/plain": [ 835 | "'\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n
'" 836 | ] 837 | }, 838 | "execution_count": 17, 839 | "metadata": {}, 840 | "output_type": "execute_result" 841 | } 842 | ], 843 | "source": [ 844 | "uid = 36 \n", 845 | "# form a set of user vectors. This is the same vector, transformed and repeated.\n", 846 | "user_vecs, y_vecs = get_user_vecs(uid, scalerUser.inverse_transform(user_train), item_vecs, user_to_genre)\n", 847 | "\n", 848 | "# scale the vectors and make predictions for all movies. Return results sorted by rating.\n", 849 | "sorted_index, sorted_ypu, sorted_items, sorted_user = predict_uservec(user_vecs, item_vecs, model, u_s, i_s, scaler, \n", 850 | " scalerUser, scalerItem, scaledata=scaledata)\n", 851 | "sorted_y = y_vecs[sorted_index]\n", 852 | "\n", 853 | "#print sorted predictions\n", 854 | "print_existing_user(sorted_ypu, sorted_y.reshape(-1,1), sorted_user, sorted_items, item_features, ivs, uvs, movie_dict, maxcount = 10)" 855 | ] 856 | }, 857 | { 858 | "cell_type": "markdown", 859 | "metadata": {}, 860 | "source": [ 861 | "#### Finding Similar Items\n", 862 | "The neural network above produces two feature vectors, a user feature vector $v_u$, and a movie feature vector, $v_m$. These are 32 entry vectors whose values are difficult to interpret. However, similar items will have similar vectors. This information can be used to make recommendations. For example, if a user has rated \"Toy Story 3\" highly, one could recommend similar movies by selecting movies with similar movie feature vectors.\n", 863 | "\n", 864 | "A similarity measure is the squared distance between the two vectors $ \\mathbf{v_m^{(k)}}$ and $\\mathbf{v_m^{(i)}}$ :\n", 865 | "$$\\left\\Vert \\mathbf{v_m^{(k)}} - \\mathbf{v_m^{(i)}} \\right\\Vert^2 = \\sum_{l=1}^{n}(v_{m_l}^{(k)} - v_{m_l}^{(i)})^2\\tag{1}$$" 866 | ] 867 | }, 868 | { 869 | "cell_type": "markdown", 870 | "metadata": {}, 871 | "source": [ 872 | "\n", 873 | "### Exercise 1\n", 874 | "\n", 875 | "Write a function to compute the square distance." 
876 | ] 877 | }, 878 | { 879 | "cell_type": "code", 880 | "execution_count": 25, 881 | "metadata": {}, 882 | "outputs": [], 883 | "source": [ 884 | "# GRADED_FUNCTION: sq_dist\n", 885 | "# UNQ_C2\n", 886 | "def sq_dist(a,b):\n", 887 | " \"\"\"\n", 888 | " Returns the squared distance between two vectors\n", 889 | " Args:\n", 890 | " a (ndarray (n,)): vector with n features\n", 891 | " b (ndarray (n,)): vector with n features\n", 892 | " Returns:\n", 893 | " d (float) : distance\n", 894 | " \"\"\"\n", 895 | " ### START CODE HERE ###\n", 896 | " d = 0.\n", 897 | " for i in range(len(a)):\n", 898 | " d += (a[i]-b[i])**2\n", 899 | " ### END CODE HERE ### \n", 900 | " return (d)" 901 | ] 902 | }, 903 | { 904 | "cell_type": "code", 905 | "execution_count": 26, 906 | "metadata": {}, 907 | "outputs": [ 908 | { 909 | "name": "stdout", 910 | "output_type": "stream", 911 | "text": [ 912 | "\u001b[92mAll tests passed!\n" 913 | ] 914 | } 915 | ], 916 | "source": [ 917 | "# Public tests\n", 918 | "test_sq_dist(sq_dist)" 919 | ] 920 | }, 921 | { 922 | "cell_type": "code", 923 | "execution_count": 27, 924 | "metadata": {}, 925 | "outputs": [ 926 | { 927 | "name": "stdout", 928 | "output_type": "stream", 929 | "text": [ 930 | "squared distance between a1 and b1: 0.0\n", 931 | "squared distance between a2 and b2: 0.030000000000000054\n", 932 | "squared distance between a3 and b3: 2.0\n" 933 | ] 934 | } 935 | ], 936 | "source": [ 937 | "a1 = np.array([1.0, 2.0, 3.0]); b1 = np.array([1.0, 2.0, 3.0])\n", 938 | "a2 = np.array([1.1, 2.1, 3.1]); b2 = np.array([1.0, 2.0, 3.0])\n", 939 | "a3 = np.array([0, 1, 0]); b3 = np.array([1, 0, 0])\n", 940 | "print(f\"squared distance between a1 and b1: {sq_dist(a1, b1)}\")\n", 941 | "print(f\"squared distance between a2 and b2: {sq_dist(a2, b2)}\")\n", 942 | "print(f\"squared distance between a3 and b3: {sq_dist(a3, b3)}\")" 943 | ] 944 | }, 945 | { 946 | "cell_type": "markdown", 947 | "metadata": {}, 948 | "source": [ 949 | "
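The collapsible hint below points toward a vectorized implementation. For reference, here is a minimal sketch of that version; it computes the same quantity as the loop above using only standard NumPy:

```python
import numpy as np

def sq_dist_vectorized(a, b):
    """Squared Euclidean distance between two equal-length vectors, no explicit loop."""
    return float(np.sum(np.square(a - b)))

# Same result as the loop version, e.g. 2.0 for the a3/b3 example above.
print(sq_dist_vectorized(np.array([0, 1, 0]), np.array([1, 0, 0])))
```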
\n", 950 | " Click for hints\n", 951 | " \n", 952 | " While a summation is often an indication a for loop should be used, here the subtraction can be element-wise in one statement. Further, you can utilized np.square to square, element-wise, the result of the subtraction. np.sum can be used to sum the squared elements.\n", 953 | " \n", 954 | "
\n", 955 | "\n", 956 | " \n" 957 | ] 958 | }, 959 | { 960 | "cell_type": "markdown", 961 | "metadata": {}, 962 | "source": [ 963 | "A matrix of distances between movies can be computed once when the model is trained and then reused for new recommendations without retraining. The first step, once a model is trained, is to obtain the movie feature vector, $v_m$, for each of the movies. To do this, we will use the trained `item_NN` and build a small model to allow us to run the movie vectors through it to generate $v_m$." 964 | ] 965 | }, 966 | { 967 | "cell_type": "code", 968 | "execution_count": 28, 969 | "metadata": {}, 970 | "outputs": [ 971 | { 972 | "name": "stdout", 973 | "output_type": "stream", 974 | "text": [ 975 | "Model: \"model_1\"\n", 976 | "__________________________________________________________________________________________________\n", 977 | "Layer (type) Output Shape Param # Connected to \n", 978 | "==================================================================================================\n", 979 | "input_3 (InputLayer) [(None, 16)] 0 \n", 980 | "__________________________________________________________________________________________________\n", 981 | "sequential_1 (Sequential) (None, 32) 41376 input_3[0][0] \n", 982 | "__________________________________________________________________________________________________\n", 983 | "tf_op_layer_l2_normalize_2/Squa [(None, 32)] 0 sequential_1[1][0] \n", 984 | "__________________________________________________________________________________________________\n", 985 | "tf_op_layer_l2_normalize_2/Sum [(None, 1)] 0 tf_op_layer_l2_normalize_2/Square\n", 986 | "__________________________________________________________________________________________________\n", 987 | "tf_op_layer_l2_normalize_2/Maxi [(None, 1)] 0 tf_op_layer_l2_normalize_2/Sum[0]\n", 988 | "__________________________________________________________________________________________________\n", 989 | "tf_op_layer_l2_normalize_2/Rsqr [(None, 1)] 0 tf_op_layer_l2_normalize_2/Maximu\n", 990 | "__________________________________________________________________________________________________\n", 991 | "tf_op_layer_l2_normalize_2 (Ten [(None, 32)] 0 sequential_1[1][0] \n", 992 | " tf_op_layer_l2_normalize_2/Rsqrt[\n", 993 | "==================================================================================================\n", 994 | "Total params: 41,376\n", 995 | "Trainable params: 41,376\n", 996 | "Non-trainable params: 0\n", 997 | "__________________________________________________________________________________________________\n" 998 | ] 999 | } 1000 | ], 1001 | "source": [ 1002 | "input_item_m = tf.keras.layers.Input(shape=(num_item_features)) # input layer\n", 1003 | "vm_m = item_NN(input_item_m) # use the trained item_NN\n", 1004 | "vm_m = tf.linalg.l2_normalize(vm_m, axis=1) # incorporate normalization as was done in the original model\n", 1005 | "model_m = Model(input_item_m, vm_m) \n", 1006 | "model_m.summary()" 1007 | ] 1008 | }, 1009 | { 1010 | "cell_type": "markdown", 1011 | "metadata": {}, 1012 | "source": [ 1013 | "Once you have a movie model, you can create a set of movie feature vectors by using the model to predict using a set of item/movie vectors as input. `item_vecs` is a set of all of the movie vectors. Recall that the same movie will appear as a separate vector for each of its genres. It must be scaled to use with the trained model. The result of the prediction is a 32 entry feature vector for each movie." 
1014 | ] 1015 | }, 1016 | { 1017 | "cell_type": "code", 1018 | "execution_count": 29, 1019 | "metadata": {}, 1020 | "outputs": [ 1021 | { 1022 | "name": "stdout", 1023 | "output_type": "stream", 1024 | "text": [ 1025 | "size of all predicted movie feature vectors: (1883, 32)\n" 1026 | ] 1027 | } 1028 | ], 1029 | "source": [ 1030 | "scaled_item_vecs = scalerItem.transform(item_vecs)\n", 1031 | "vms = model_m.predict(scaled_item_vecs[:,i_s:])\n", 1032 | "print(f\"size of all predicted movie feature vectors: {vms.shape}\")" 1033 | ] 1034 | }, 1035 | { 1036 | "cell_type": "markdown", 1037 | "metadata": {}, 1038 | "source": [ 1039 | "Let's now compute a matrix of the squared distance between each movie feature vector and all other movie feature vectors:\n", 1040 | "
\n", 1041 | " \n", 1042 | "
" 1043 | ] 1044 | }, 1045 | { 1046 | "cell_type": "markdown", 1047 | "metadata": {}, 1048 | "source": [ 1049 | "We can then find the closest movie by finding the minimum along each row. We will make use of [numpy masked arrays](https://numpy.org/doc/1.21/user/tutorial-ma.html) to avoid selecting the same movie. The masked values along the diagonal won't be included in the computation." 1050 | ] 1051 | }, 1052 | { 1053 | "cell_type": "code", 1054 | "execution_count": null, 1055 | "metadata": {}, 1056 | "outputs": [], 1057 | "source": [ 1058 | "count = 50\n", 1059 | "dim = len(vms)\n", 1060 | "dist = np.zeros((dim,dim))\n", 1061 | "\n", 1062 | "for i in range(dim):\n", 1063 | " for j in range(dim):\n", 1064 | " dist[i,j] = sq_dist(vms[i, :], vms[j, :])\n", 1065 | " \n", 1066 | "m_dist = ma.masked_array(dist, mask=np.identity(dist.shape[0])) # mask the diagonal\n", 1067 | "\n", 1068 | "disp = [[\"movie1\", \"genres\", \"movie2\", \"genres\"]]\n", 1069 | "for i in range(count):\n", 1070 | " min_idx = np.argmin(m_dist[i])\n", 1071 | " movie1_id = int(item_vecs[i,0])\n", 1072 | " movie2_id = int(item_vecs[min_idx,0])\n", 1073 | " genre1,_ = get_item_genre(item_vecs[i,:], ivs, item_features)\n", 1074 | " genre2,_ = get_item_genre(item_vecs[min_idx,:], ivs, item_features)\n", 1075 | "\n", 1076 | " disp.append( [movie_dict[movie1_id]['title'], genre1,\n", 1077 | " movie_dict[movie2_id]['title'], genre2]\n", 1078 | " )\n", 1079 | "table = tabulate.tabulate(disp, tablefmt='html', headers=\"firstrow\", floatfmt=[\".1f\", \".1f\", \".0f\", \".2f\", \".2f\"])\n", 1080 | "table" 1081 | ] 1082 | }, 1083 | { 1084 | "cell_type": "markdown", 1085 | "metadata": {}, 1086 | "source": [ 1087 | "The results show the model will suggest a movie from the same genre." 1088 | ] 1089 | }, 1090 | { 1091 | "cell_type": "markdown", 1092 | "metadata": {}, 1093 | "source": [ 1094 | "\n", 1095 | "## 4 - Congratulations! \n", 1096 | "You have completed a content-based recommender system. \n", 1097 | "\n", 1098 | "This structure is the basis of many commercial recommender systems. The user content can be greatly expanded to incorporate more information about the user if it is available. Items are not limited to movies. This can be used to recommend any item, books, cars or items that are similar to an item in your 'shopping cart'." 1099 | ] 1100 | } 1101 | ], 1102 | "metadata": { 1103 | "colab": { 1104 | "authorship_tag": "ABX9TyOFYdA6zQJ1FpgYwYmRIeXa", 1105 | "collapsed_sections": [], 1106 | "name": "Recsys_NN.ipynb", 1107 | "private_outputs": true, 1108 | "provenance": [ 1109 | { 1110 | "file_id": "1RO0HLb7kRE0Tj_0D4E5I-vQz2QLu3CUm", 1111 | "timestamp": 1655169179306 1112 | } 1113 | ] 1114 | }, 1115 | "gpuClass": "standard", 1116 | "kernelspec": { 1117 | "display_name": "Python 3", 1118 | "language": "python", 1119 | "name": "python3" 1120 | }, 1121 | "language_info": { 1122 | "codemirror_mode": { 1123 | "name": "ipython", 1124 | "version": 3 1125 | }, 1126 | "file_extension": ".py", 1127 | "mimetype": "text/x-python", 1128 | "name": "python", 1129 | "nbconvert_exporter": "python", 1130 | "pygments_lexer": "ipython3", 1131 | "version": "3.7.6" 1132 | } 1133 | }, 1134 | "nbformat": 4, 1135 | "nbformat_minor": 4 1136 | } 1137 | --------------------------------------------------------------------------------