├── ALS_Matrix_Factorization.ipynb
├── Deep_Matrix_Factorization.ipynb
├── Deep_Recommender_Tutorial_Strata_NY_2017.pdf
├── Deep_Wide_Learning.ipynb
├── LICENSE
└── README.md


/ALS_Matrix_Factorization.ipynb:
--------------------------------------------------------------------------------
   1 | {
   2 |  "cells": [
   3 |   {
   4 |    "cell_type": "markdown",
   5 |    "metadata": {},
   6 |    "source": [
   7 |     "In this notebook, we demonstrate how to use the TensorFlow API to design and implement a matrix factorization model for predicting movie ratings on the MovieLens 1M dataset.\n",
   8 |     "\n",
   9 |     "In our dataset, we have user ids, movie ids and the ratings each user gave to different movies. This gives us information about user preferences.\n",
  10 |     "\n",
  11 |     "Using this information, we want to learn movie and user features and use these features to predict the rating, so that we can recommend the potentially highest rated movies to our users. \n",
  12 |     "\n",
  13 |     "![ALS matrix factorization model](Picture1.png)\n",
  14 |     "\n",
  15 |     "- users $\\{1,...,N\\}$ with user features $\\{\\theta_1,...,\\theta_N\\} =: \\theta$\n",
  16 |     "- movies $\\{1,...,M\\}$ with movie features $\\{\\phi_1,...,\\phi_M\\} =: \\phi$\n",
  17 |     "- dataset $D = \\{r_{i,j}: user\\ i\\ has\\ rated\\ movie\\ j\\}$ of all ratings\n",
  18 |     "- predicted rating of user $i$ of movie $j$: $\\hat{r}_{i,j} = \\theta_i^T \\cdot \\phi_j$\n",
  19 |     "- cost function $J(\\theta, \\phi)  = \\sum_{r_{i,j} \\in D} (r_{i,j} - \\hat{r}_{i,j})^2$\n",
  20 |     "- We will use an ALS-style method for matrix factorization. This means we calculate the two gradients of our cost function, $\\nabla_\\theta J$ and $\\nabla_\\phi J$, independently and update our parameters $\\theta$ and $\\phi$ one after another (alternating gradient steps rather than closed-form least-squares solves).\n",
  21 |     " \n",
  22 |     "We will achieve this in the following steps:\n",
  23 |     "1. Load the MovieLens data into a pandas dataframe for preprocessing\n",
  24 |     "2. Set up our parameters: $\\theta$ and $\\phi$\n",
  25 |     "3. Load the data into TensorFlow constants\n",
  26 |     "4. Define the prediction and cost\n",
  27 |     "5. Start the session and run training, evaluation and finally predictions"
  28 |    ]
  29 |   },
  30 |   {
  31 |    "cell_type": "code",
  32 |    "execution_count": 1,
  33 |    "metadata": {
  34 |     "collapsed": true
  35 |    },
  36 |    "outputs": [],
  37 |    "source": [
  38 |     "import tensorflow as tf\n",
  39 |     "import numpy as np\n",
  40 |     "import pandas as pd\n",
  41 |     "import matplotlib.pyplot as plt\n",
  42 |     "%matplotlib inline"
  43 |    ]
  44 |   },
  45 |   {
  46 |    "cell_type": "markdown",
  47 |    "metadata": {},
  48 |    "source": [
  49 |     "### download data"
  50 |    ]
  51 |   },
  52 |   {
  53 |    "cell_type": "markdown",
  54 |    "metadata": {},
  55 |    "source": [
  56 |     "First, download the data from the MovieLens website. Once the archive is unzipped, we use three of its files: users.dat, movies.dat and ratings.dat."
  57 |    ]
  58 |   },
  59 |   {
  60 |    "cell_type": "code",
  61 |    "execution_count": 2,
  62 |    "metadata": {
  63 |     "collapsed": true
  64 |    },
  65 |    "outputs": [],
  66 |    "source": [
  67 |     "#! wget -q http://files.grouplens.org/datasets/movielens/ml-1m.zip\n",
  68 |     "#! unzip ml-1m.zip"
  69 |    ]
  70 |   },
  71 |   {
  72 |    "cell_type": "markdown",
  73 |    "metadata": {},
  74 |    "source": [
  75 |     "### data preprocessing"
  76 |    ]
  77 |   },
  78 |   {
  79 |    "cell_type": "markdown",
  80 |    "metadata": {},
  81 |    "source": [
  82 |     "Step 1: load the data into pandas dataframes and merge them into one table."
  83 |    ]
  84 |   },
  85 |   {
  86 |    "cell_type": "code",
  87 |    "execution_count": 3,
  88 |    "metadata": {
  89 |     "collapsed": true
  90 |    },
  91 |    "outputs": [],
  92 |    "source": [
  93 |     "age_desc = { 1: \"Under 18\", 18: \"18-24\", 25: \"25-34\", 35: \"35-44\", 45: \"45-49\", 50: \"50-55\", 56: \"56+\" }\n",
  94 |     "occupation_desc = { 0: \"other or not specified\", 1: \"academic/educator\", 2: \"artist\", 3: \"clerical/admin\",\n",
  95 |     "                4: \"college/grad student\", 5: \"customer service\", 6: \"doctor/health care\",\n",
  96 |     "                7: \"executive/managerial\", 8: \"farmer\", 9: \"homemaker\", 10: \"K-12 student\", 11: \"lawyer\",\n",
  97 |     "                12: \"programmer\", 13: \"retired\", 14: \"sales/marketing\", 15: \"scientist\", 16: \"self-employed\",\n",
  98 |     "                17: \"technician/engineer\", 18: \"tradesman/craftsman\", 19: \"unemployed\", 20: \"writer\" }\n",
  99 |     "\n",
 100 |     "rating_data = pd.read_csv(\n",
 101 |     "    \"ml-1m/ratings.dat\",\n",
 102 |     "    sep=\"::\",\n",
 103 |     "    engine=\"python\",\n",
 104 |     "    encoding=\"latin-1\",\n",
 105 |     "    names=['userid', 'movieid', 'rating', 'timestamp'])\n",
 106 |     "\n",
 107 |     "user_data = pd.read_csv(\n",
 108 |     "    \"ml-1m/users.dat\", \n",
 109 |     "    sep='::', \n",
 110 |     "    engine='python', \n",
 111 |     "    encoding='latin-1',\n",
 112 |     "    names=['userid', 'gender', 'age', 'occupation', 'zipcode']\n",
 113 |     ")\n",
 114 |     "user_data['age_desc'] = user_data['age'].apply(lambda x: age_desc[x])\n",
 115 |     "user_data['occ_desc'] = user_data['occupation'].apply(lambda x: occupation_desc[x])\n",
 116 |     "\n",
 117 |     "movie_data = pd.read_csv(\n",
 118 |     "    \"ml-1m/movies.dat\",\n",
 119 |     "    sep='::', \n",
 120 |     "    engine='python', \n",
 121 |     "    encoding='latin-1',\n",
 122 |     "    names=['movieid', 'title', 'genre']\n",
 123 |     ")\n",
 124 |     "\n",
 125 |     "dataset = pd.merge(pd.merge(rating_data, movie_data, how=\"left\", on=\"movieid\"), user_data, how=\"left\", on=\"userid\")"
 126 |    ]
 127 |   },
 128 |   {
 129 |    "cell_type": "markdown",
 130 |    "metadata": {},
 131 |    "source": [
 132 |     "Step 2: preprocess the movie id and user id"
 133 |    ]
 134 |   },
 135 |   {
 136 |    "cell_type": "code",
 137 |    "execution_count": 4,
 138 |    "metadata": {
 139 |     "collapsed": true
 140 |    },
 141 |    "outputs": [],
 142 |    "source": [
 143 |     "def check_cols(df, cols):\n",
 144 |     "    \"\"\"\n",
 145 |     "    Check whether the index has gaps and whether it starts from 0.\n",
 146 |     "    \n",
 147 |     "    Arguments:\n",
 148 |     "    df -- dataframe of the dataset\n",
 149 |     "    cols -- dataframe columns that need to be checked, in our case user id and movie id\n",
 150 |     "    \n",
 151 |     "    Returns:\n",
 152 |     "    a list of tuples [('COLUMN_NAME', boolean)]; if True, the column needs to be fixed, if False, the column is ok.\n",
 153 |     "    \"\"\"\n",
 154 |     "    return [(col, False) if len(df[col].unique())-1 == df[col].max() else (col, True) for col in cols]\n",
 155 |     "\n",
 156 |     "def remove_gaps(df, col):\n",
 157 |     "    \"\"\"\n",
 158 |     "    preprocess the index of user id and movie id to start from 0 and eliminate gaps in the index\n",
 159 |     "    \n",
 160 |     "    Arguments:\n",
 161 |     "    df -- dataframe of the dataset\n",
 162 |     "    col -- dataframe column that needs to be adjusted, in our case either user id or movie id\n",
 163 |     "    \n",
 164 |     "    Returns:\n",
 165 |     "    a dataframe with adjusted columns.\n",
 166 |     "    \"\"\"\n",
 167 |     "    adj_col_uni = df[col].sort_values().unique()\n",
 168 |     "    adj_df = pd.DataFrame(adj_col_uni).reset_index().rename(columns = {0: col, 'index': \"adj_%s\"%(col,)})\n",
 169 |     "    return pd.merge(adj_df, df, how=\"right\", on=col)"
 170 |    ]
 171 |   },
 172 |   {
 173 |    "cell_type": "code",
 174 |    "execution_count": 5,
 175 |    "metadata": {},
 176 |    "outputs": [
 177 |     {
 178 |      "name": "stdout",
 179 |      "output_type": "stream",
 180 |      "text": [
 181 |       "before fix:\n",
 182 |       "userid needs fix!\n",
 183 |       "movieid needs fix!\n",
 184 |       "\n",
 185 |       "after fix\n",
 186 |       "adj_userid ok.\n",
 187 |       "adj_movieid ok.\n"
 188 |      ]
 189 |     }
 190 |    ],
 191 |    "source": [
 192 |     "index_cols = [\"userid\", \"movieid\"]\n",
 193 |     "cols_check = check_cols(dataset, index_cols)\n",
 194 |     "print_check = lambda check: print(*[\"%s needs fix!\"%(c,) if f else \"%s ok.\"%(c,) for c, f in check], sep=\"\\n\")\n",
 195 |     "print(\"before fix:\")\n",
 196 |     "print_check(cols_check)\n",
 197 |     "for col, needs_fix in cols_check:\n",
 198 |     "    if needs_fix:\n",
 199 |     "        dataset = remove_gaps(dataset, col)\n",
 200 |     "\n",
 201 |     "print(\"\\nafter fix\")\n",
 202 |     "print_check(check_cols(dataset, [\"adj_userid\", \"adj_movieid\"]))"
 203 |    ]
 204 |   },
 205 |   {
 206 |    "cell_type": "markdown",
 207 |    "metadata": {},
 208 |    "source": [
 209 |     "Step 3: shuffle the data and split it into training and validation sets."
 210 |    ]
 211 |   },
 212 |   {
 213 |    "cell_type": "code",
 214 |    "execution_count": 6,
 215 |    "metadata": {
 216 |     "collapsed": true
 217 |    },
 218 |    "outputs": [],
 219 |    "source": [
 220 |     "dataset = dataset.sample(frac=1, replace=False)\n",
 221 |     "n_split = int(len(dataset)*.7)\n",
 222 |     "trainset = dataset[:n_split]\n",
 223 |     "validset = dataset[n_split:]"
 224 |    ]
 225 |   },
 226 |   {
 227 |    "cell_type": "markdown",
 228 |    "metadata": {},
 229 |    "source": [
 230 |     "### build the model"
 231 |    ]
 232 |   },
 233 |   {
 234 |    "cell_type": "markdown",
 235 |    "metadata": {},
 236 |    "source": [
 237 |     ">Recall that the following is our model:\n",
 238 |     "<div class=\"alert alert-block alert-info\">\n",
 239 |     "- users $\\{1,...,N\\}$ with user features $\\{\\theta_1,...,\\theta_N\\} =: \\theta$<br>\n",
 240 |     "- movies $\\{1,...,M\\}$ with movie features $\\{\\phi_1,...,\\phi_M\\} =: \\phi$<br>\n",
 241 |     "- dataset $D = \\{r_{i,j}: user\\ i\\ has\\ rated\\ movie\\ j\\}$ of all ratings<br>\n",
 242 |     "- predicted rating of user $i$ of movie $j$: $\\hat{r}_{i,j} = \\theta_i^T \\cdot \\phi_j$<br>\n",
 243 |     "- cost function $J(\\theta, \\phi)  = \\sum_{r_{i,j} \\in D} (r_{i,j} - \\hat{r}_{i,j})^2$<br>\n",
 244 |     "- We will use an ALS-style method for matrix factorization. This means we calculate the two gradients of our cost function, $\\nabla_\\theta J$ and $\\nabla_\\phi J$, independently and update our parameters $\\theta$ and $\\phi$ one after another (alternating gradient steps rather than closed-form least-squares solves).</div>"
 245 |    ]
 246 |   },
 247 |   {
 248 |    "cell_type": "code",
 249 |    "execution_count": 10,
 250 |    "metadata": {
 251 |     "collapsed": true
 252 |    },
 253 |    "outputs": [],
 254 |    "source": [
 255 |     "def initialize_features(num_users, num_movies, dim):\n",
 256 |     "    \"\"\"\n",
 257 |     "    Initialize features. User_features and movie_features need to be trained by the matrix factorization model.\n",
 258 |     "    \n",
 259 |     "    Arguments:\n",
 260 |     "    num_users -- number of users\n",
 261 |     "    num_movies -- number of movies\n",
 262 |     "    dim -- dimension of learned user and movie features, it's a hyper-parameter\n",
 263 |     "    \n",
 264 |     "    Returns:\n",
 265 |     "    user_features -- a matrix (variable) of shape [number of users, dim]\n",
 266 |     "    movie_features -- a matrix (variable) of shape [number of movies, dim]\n",
 267 |     "    \"\"\" \n",
 268 |     "    user_features = tf.get_variable(\n",
 269 |     "        \"theta\",\n",
 270 |     "        shape = [num_users, dim],\n",
 271 |     "        dtype = tf.float32,\n",
 272 |     "        initializer = tf.truncated_normal_initializer(mean=0, stddev=.05)\n",
 273 |     "    )\n",
 274 |     "    movie_features = tf.get_variable(\n",
 275 |     "        \"phi\",\n",
 276 |     "        shape = [num_movies, dim],\n",
 277 |     "        dtype = tf.float32,\n",
 278 |     "        initializer = tf.truncated_normal_initializer(mean=0, stddev=.05)\n",
 279 |     "    )\n",
 280 |     "    return user_features, movie_features\n",
 281 |     "\n",
 282 |     "def create_dataset(user_ids, movie_ids, ratings):\n",
 283 |     "    \"\"\"\n",
 284 |     "    Load user id, movie id and rating values. Turn numpy arrays into tensors.\n",
 285 |     "    \n",
 286 |     "    Arguments:\n",
 287 |     "    user_ids -- user index\n",
 288 |     "    movie_ids -- movies index\n",
 289 |     "    ratings -- true rating value\n",
 290 |     "    \n",
 291 |     "    Returns:\n",
 292 |     "    user_id_var -- a constant of shape [number of examples]\n",
 293 |     "    movie_id_var -- a constant of shape [number of examples]\n",
 294 |     "    ratings_var -- a constant of shape [number of examples]\n",
 295 |     "    \"\"\" \n",
 296 |     "    user_id_var = tf.constant(name=\"userid\", value=user_ids)\n",
 297 |     "    movie_id_var = tf.constant(name=\"movieid\", value=movie_ids)\n",
 298 |     "    ratings_var = tf.constant(name=\"ratings\", value=np.asarray(ratings, dtype=np.float32))\n",
 299 |     "    return user_id_var, movie_id_var, ratings_var\n",
 300 |     "   \n",
 301 |     "def lookup_features(user_features, movie_features, user_ids, movie_ids): \n",
 302 |     "    \"\"\"\n",
 303 |     "    Retrieve embeddings based on user ids and movie ids respectively.\n",
 304 |     "    We use tf.gather function for this. tf.gather gathers slices from params according to indices.\n",
 305 |     "    \n",
 306 |     "    Arguments:\n",
 307 |     "    user_features -- shape [number of user ids, dim]\n",
 308 |     "    movie_features -- shape [number of movie ids, dim]\n",
 309 |     "    user_ids -- user id tensor (in our case loaded user ids from dataset)\n",
 310 |     "    movie_ids -- movie id tensor (in our case loaded movie ids from dataset)\n",
 311 |     "    \n",
 312 |     "    Returns:\n",
 313 |     "    selected_user_features -- a tensor of shape [number of examples, dim]\n",
 314 |     "    selected_movie_features -- a tensor of shape [number of examples, dim]\n",
 315 |     "    \"\"\" \n",
 316 |     "    selected_user_features = tf.gather(user_features, user_ids)\n",
 317 |     "    selected_movie_features = tf.gather(movie_features, movie_ids)\n",
 318 |     "    return selected_user_features, selected_movie_features\n",
 319 |     "\n",
 320 |     "def predict(selected_user_features, selected_movie_features):\n",
 321 |     "    \"\"\"\n",
 322 |     "    Calculate predictions. This is the dot product of user features and movie features. \n",
 323 |     "    For each training example, this corresponds to a single number.\n",
 324 |     "    \n",
 325 |     "    Arguments:\n",
 326 |     "    selected_user_features -- matrix of user features for each example -- shape [number of examples, dim]\n",
 327 |     "    selected_movie_features -- matrix of movies features value for each example -- shape [number of examples, dim]\n",
 328 |     "    \n",
 329 |     "    Returns:\n",
 330 |     "    selected_predictions -- a tensor of shape [number of examples]\n",
 331 |     "    \"\"\" \n",
 332 |     "    selected_predictions = tf.reduce_sum(\n",
 333 |     "        selected_user_features * selected_movie_features,\n",
 334 |     "        axis = 1\n",
 335 |     "    )\n",
 336 |     "    ## alternatively: tf.reduce_sum(tf.multiply(selected_user_features, selected_movie_features), axis=1)\n",
 337 |     "    return selected_predictions\n",
 338 |     "\n",
 339 |     "def mean_squared_difference(predictions, ratings):\n",
 340 |     "    \"\"\"\n",
 341 |     "    Calculate cost.\n",
 342 |     "    \n",
 343 |     "    Arguments:\n",
 344 |     "    predictions -- predicted ratings.\n",
 345 |     "    ratings -- true ratings.\n",
 346 |     "    \n",
 347 |     "    Returns:\n",
 348 |     "    difference -- mean squared error. It's a real number. \n",
 349 |     "    \"\"\" \n",
 350 |     "    difference = tf.reduce_mean(tf.squared_difference(predictions, ratings))\n",
 351 |     "    return difference"
 352 |    ]
 353 |   },
 354 |   {
 355 |    "cell_type": "markdown",
 356 |    "metadata": {},
 357 |    "source": [
 358 |     "### set hyper parameters"
 359 |    ]
 360 |   },
 361 |   {
 362 |    "cell_type": "code",
 363 |    "execution_count": 11,
 364 |    "metadata": {
 365 |     "collapsed": true
 366 |    },
 367 |    "outputs": [],
 368 |    "source": [
 369 |     "emb_dim = 8\n",
 370 |     "learning_rate = 50\n",
 371 |     "epochs = 1000"
 372 |    ]
 373 |   },
 374 |   {
 375 |    "cell_type": "markdown",
 376 |    "metadata": {},
 377 |    "source": [
 378 |     "### train model"
 379 |    ]
 380 |   },
 381 |   {
 382 |    "cell_type": "markdown",
 383 |    "metadata": {},
 384 |    "source": [
 385 |     "Here we define the tensorflow graph and create the session to compute the values.\n",
 386 |     "\n",
 387 |     "From the tensorflow documentation:\n",
 388 |     " - A graph defines the computation. It doesn’t compute anything, it doesn’t hold any values, it just defines the operations that you specified in your code.\n",
 389 |     " - A session allows to execute graphs or part of graphs. It allocates resources (on one or more machines) for that and holds the actual values of intermediate results and variables."
 390 |    ]
 391 |   },
 392 |   {
 393 |    "cell_type": "code",
 394 |    "execution_count": 12,
 395 |    "metadata": {
 396 |     "scrolled": false
 397 |    },
 398 |    "outputs": [
 399 |     {
 400 |      "name": "stdout",
 401 |      "output_type": "stream",
 402 |      "text": [
 403 |       "valid loss at step 1: 14.061668\n",
 404 |       "valid loss at step 101: 1.053597\n",
 405 |       "valid loss at step 201: 0.873390\n",
 406 |       "valid loss at step 301: 0.833664\n",
 407 |       "valid loss at step 401: 0.804873\n",
 408 |       "valid loss at step 501: 0.786872\n",
 409 |       "valid loss at step 601: 0.775171\n",
 410 |       "valid loss at step 701: 0.767974\n",
 411 |       "valid loss at step 801: 0.763356\n",
 412 |       "valid loss at step 901: 0.760175\n"
 413 |      ]
 414 |     }
 415 |    ],
 416 |    "source": [
 417 |     "with tf.Graph().as_default():\n",
 418 |     "    with tf.variable_scope(\"features\"):\n",
 419 |     "        usr_embs, mov_embs = initialize_features(len(dataset.adj_userid.unique()), len(dataset.adj_movieid.unique()), emb_dim)\n",
 420 |     "    with tf.variable_scope(\"train_set\"):\n",
 421 |     "        train_data = trainset[[\"adj_userid\", \"adj_movieid\", \"rating\"]].values.T #shape(3, 700146)\n",
 422 |     "        train_usr_ids, train_mov_ids, train_ratings = create_dataset(*train_data) # unpack into 3 arrays\n",
 423 |     "    with tf.variable_scope(\"valid_set\"):\n",
 424 |     "        valid_data = validset[[\"adj_userid\", \"adj_movieid\", \"rating\"]].values.T\n",
 425 |     "        valid_usr_ids, valid_mov_ids, valid_ratings = create_dataset(*valid_data)\n",
 426 |     "    with tf.variable_scope(\"training\"):\n",
 427 |     "        train_sel_usr_emb, train_sel_mov_emb = lookup_features(usr_embs, mov_embs, train_usr_ids, train_mov_ids)\n",
 428 |     "        train_preds = predict(train_sel_usr_emb, train_sel_mov_emb)\n",
 429 |     "        train_loss = mean_squared_difference(train_preds, train_ratings)\n",
 430 |     "        optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)\n",
 431 |     "        train_usr_embs = optimizer.minimize(train_loss, var_list=[usr_embs])\n",
 432 |     "        train_mov_embs = optimizer.minimize(train_loss, var_list=[mov_embs])   \n",
 433 |     "    with tf.variable_scope(\"validation\"):\n",
 434 |     "        valid_sel_usr_emb, valid_sel_mov_emb = lookup_features(usr_embs, mov_embs, valid_usr_ids, valid_mov_ids)\n",
 435 |     "        valid_preds = predict(valid_sel_usr_emb, valid_sel_mov_emb)\n",
 436 |     "        valid_loss = mean_squared_difference(valid_preds, valid_ratings)\n",
 437 |     "    with tf.Session() as sess:\n",
 438 |     "        writer = tf.summary.FileWriter('Graph/MF',sess.graph)\n",
 439 |     "        sess.run(tf.global_variables_initializer())\n",
 440 |     "        train_loss_history = []\n",
 441 |     "        valid_loss_history = []\n",
 442 |     "        for i in range(epochs):\n",
 443 |     "            current_train_loss, _ = sess.run([train_loss, train_usr_embs])\n",
 444 |     "            current_train_loss, _ = sess.run([train_loss, train_mov_embs])\n",
 445 |     "            current_valid_loss = sess.run(valid_loss)\n",
 446 |     "            if i%100 ==0:\n",
 447 |     "                print(\"valid loss at step %i: %f\"%(i+1, current_valid_loss))\n",
 448 |     "            train_loss_history.append(current_train_loss)\n",
 449 |     "            valid_loss_history.append(current_valid_loss)\n",
 450 |     "        final_user_features, final_movie_features = sess.run([usr_embs, mov_embs])\n",
 451 |     "        final_valid_predictions = sess.run(valid_preds) \n",
 452 |     "        writer.close()"
 453 |    ]
 454 |   },
 455 |   {
 456 |    "cell_type": "markdown",
 457 |    "metadata": {},
 458 |    "source": [
 459 |     "### plot losses"
 460 |    ]
 461 |   },
 462 |   {
 463 |    "cell_type": "markdown",
 464 |    "metadata": {},
 465 |    "source": [
 466 |     "Plot the training loss and validation loss."
 467 |    ]
 468 |   },
 469 |   {
 470 |    "cell_type": "code",
 471 |    "execution_count": 10,
 472 |    "metadata": {},
 473 |    "outputs": [
 474 |     {
 475 |      "data": {
 476 |       "image/png": "iVBORw0KGgoAAAANSUhEUgAAAtEAAAJQCAYAAABIJTh6AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzs3Xt43HWd9//XOzPfmSRN2vQQoKWFFtalLbQUCAftTwFB\nbhAVTwjc1NN6y+qigHop6P1bV69dL/Fa9hZxVbarqCsIN1vFw8rKwQXq/gS0xQLFolgotLRAekia\nNJM5fn5/zCRNS9pmkvl+PzPfPB/XNdccMpl5M18Pz3z4zIw55wQAAABg7Jp8DwAAAAA0GiIaAAAA\nqBIRDQAAAFSJiAYAAACqREQDAAAAVSKiAQAAgCoR0QAAAECViGgAAACgSkQ0AAAAUKWk7wHGYtas\nWW7+/Pm+xwAAAEDMrV27drtzrvNQ92uIiJ4/f77WrFnjewwAAADEnJk9P5b7sZ0DAAAAqBIRDQAA\nAFSJiAYAAACq1BB7ogEAAFCWz+e1ZcsWDQ4O+h6loTU3N2vu3LkKgmBcv09EAwAANJAtW7aovb1d\n8+fPl5n5HqchOee0Y8cObdmyRQsWLBjXY7CdAwAAoIEMDg5q5syZBPQEmJlmzpw5odV8IhoAAKDB\nENATN9HXkIgGAAAAqkREAwAAYMx6enr0zW9+c1y/++Y3v1k9PT1jvv8XvvAF3XDDDeN6rrAR0QAA\nABizg0V0oVA46O/efffd6ujoCGOsyBHRAAAAGLPrrrtOGzdu1LJly/TpT39aDz74oF7/+tfrbW97\nmxYvXixJevvb365TTjlFxx9/vFauXDn8u/Pnz9f27du1adMmLVq0SB/+8Id1/PHH67zzzlMmkzno\n865bt05nnHGGli5dqne84x3atWuXJOmmm27S4sWLtXTpUl166aWSpIceekjLli3TsmXLdNJJJ6mv\nr6/mrwMfcQcAANCorrlGWreuto+5bJl0440H/PH111+v9evXa13leR988EE99thjWr9+/fDHxd1y\nyy2aMWOGMpmMTj31VL3rXe/SzJkz93mcZ555Rrfffrv+9V//Ve95z3v0ox/9SCtWrDjg877vfe/T\n17/+dZ155pn6/Oc/ry9+8Yu68cYbdf311+u5555TOp0e3ipyww036Bvf+IaWL1+u/v5+NTc3T/RV\neRVWogEAADAhp5122j6ft3zTTTfpxBNP1BlnnKHNmzfrmWeeedXvLFiwQMuWLZMknXLKKdq0adMB\nH7+3t1c9PT0688wzJUnvf//7tXr1aknS0qVLdfnll+vWW29VMlleH16+fLk++clP6qabblJPT8/w\n7bXESjQAAECjOsiKcZSmTJkyfPnBBx/U/fffr4cfflitra0666yzRv085nQ6PXw5kUgccjvHgfzi\nF7/Q6tWr9fOf/1xf+tKX9OSTT+q6667ThRdeqLvvvlvLly/XPffco4ULF47r8Q+ElWgAAACMWXt7\n+0H3GPf29mr69OlqbW3V008/rUceeWTCzzlt2jRNnz5dv/71ryVJP/jBD3TmmWeqVCpp8+bNOvvs\ns/WVr3xFvb296u/v18aNG7VkyRJde+21OvXUU/X0009PeIb9sRINAACAMZs5c6aWL1+uE044QRdc\ncIEuvPDCfX5+/vnn6+abb9aiRYt03HHH6YwzzqjJ837/+9/XRz7yEQ0MDOiYY47Rd7/7XRWLRa1Y\nsUK9vb1yzumqq65SR0eH/vZv/1YPPPCAmpqadPzxx+uCCy6oyQwjmXOu5g9aa11dXW7NmjW+xwAA\nAPBuw4YNWrRoke8xYmG019LM1jrnug71u2znAAAAAKpERAMAAABVIqIBAACAKhHRAAAAQJWIaAAA\nAKBKoX3EnZndIuktkl5xzp2w388+JekGSZ3Oue1hzTART37tv/TTf3hSyYRTMql9TumU09xZg1p4\nYlrz3rpMOussqYm/RwAAACa
LMD8n+nuS/lnSv4280czmSTpP0gshPveE/f7lOfrb7W88+J0ekJbe\n+Lg+f+Qn9K5f/JV04onRDAcAANBA2tra1N/fr61bt+qqq67SqlWrXnWfs846SzfccIO6urrGdLtv\noS2fOudWS9o5yo++Kukzkur6A6rf+6WFKhSkwUGpv1/atUvq7pa2bZOefVZ68EHpn76clZt3lN79\n4tf0ldfeJR3kO98BAAAmuzlz5owa0I0o0j0IZnaRpBedc49H+bzjYSYlElI6LU2ZInV0SLNmSUcc\nIS1YIJ15pvTJ69Jau3G6Lntrn67LfEG/+Z//7HtsAACAUF133XX6xje+MXz9C1/4gm644Qb19/fr\nnHPO0cknn6wlS5bopz/96at+d9OmTTrhhPIu30wmo0svvVSLFi3SO97xDmUymUM+9+23364lS5bo\nhBNO0LXXXitJKhaL+sAHPqATTjhBS5Ys0Ve/+lVJ0k033aTFixdr6dKluvTSS2vxj76PyL7228xa\nJX1O5a0cY7n/FZKukKSjjjoqxMkmJgiklT9s1+ojd+uzD79VD61bJy1b5nssAAAwCVxzjbRuXW0f\nc9ky6cYbD/zzSy65RNdcc42uvPJKSdKdd96pe+65R83Nzbrrrrs0depUbd++XWeccYbe9ra3ycxG\nfZxvfetbam1t1YYNG/TEE0/o5JNPPuhcW7du1bXXXqu1a9dq+vTpOu+88/STn/xE8+bN04svvqj1\n69dLknp6eiRJ119/vZ577jml0+nh22opypXoYyUtkPS4mW2SNFfSY2Z2xGh3ds6tdM51Oee6Ojs7\nIxyzem1t0tWfTGq1ztRTX7vf9zgAAAChOemkk/TKK69o69atevzxxzV9+nTNmzdPzjl97nOf09Kl\nS3XuuefqxRdf1Msvv3zAx1m9erVWrFghSVq6dKmWLl160Of93e9+p7POOkudnZ1KJpO6/PLLtXr1\nah1zzDF69tln9fGPf1y//OUvNXXq1OHHvPzyy3Xrrbcqmaz9unFkK9HOuSclHTZ0vRLSXfX66RzV\net9HWvWZL0h3/cR0/Hd9TwMAACaDg60Yh+niiy/WqlWr9NJLL+mSSy6RJN12223q7u7W2rVrFQSB\n5s+fr8HBwdBnmT59uh5//HHdc889uvnmm3XnnXfqlltu0S9+8QutXr1aP//5z/WlL31JTz75ZE1j\nOrSVaDO7XdLDko4zsy1m9qGwnqseHH64dMq8l/XLntOlzZt9jwMAABCaSy65RHfccYdWrVqliy++\nWJLU29urww47TEEQ6IEHHtDzzz9/0Md4wxveoB/+8IeSpPXr1+uJJ5446P1PO+00PfTQQ9q+fbuK\nxaJuv/12nXnmmdq+fbtKpZLe9a536R/+4R/02GOPqVQqafPmzTr77LP1la98Rb29verv76/NP3xF\naCvRzrnLDvHz+WE9ty/nvtHpn75/urIP3qX0e+f5HgcAACAUxx9/vPr6+nTkkUdq9uzZkqTLL79c\nb33rW7VkyRJ1dXVp4cKFB32Mj370o/rgBz+oRYsWadGiRTrllFMOev/Zs2fr+uuv19lnny3nnC68\n8EJddNFFevzxx/XBD35QpVJJkvTlL39ZxWJRK1asUG9vr5xzuuqqq9TR0VGbf/gKc66uP2lOktTV\n1eXWrFnje4xDWvV/i7r40oTWXPZPOuWHn/I9DgAAiKENGzZo0aJFvseIhdFeSzNb65w75IdS8zV7\nNXTyqQlJ0mOPJzxPAgAAgDAR0TW0YIHUnszoiRdq+68LAAAAUF+I6Boyk46d1auN/YeVv+YQAAAg\nBI2wHbfeTfQ1JKJr7C/m57VRx0obN/oeBQAAxFBzc7N27NhBSE+Ac047duxQc3PzuB8jss+JniyO\nfU1CP31kgYqb7lXixBN9jwMAAGJm7ty52rJli7q7u32P0tCam5s1d+7ccf8+EV1jxy5pVV4pb
Xly\nl46+yPc0AAAgboIg0IIFC3yPMemxnaPG5i5qlyRt/RN7ogEAAOKKiK6x2XPLH2+37fmc50kAAAAQ\nFiK6xo44ony+bZvfOQAAABAeIrrGOjulJhW1bdf43+0JAACA+kZE11giIR3e2qeX+tt8jwIAAICQ\nENEhOKJ9QNsGp0ulku9RAAAAEAIiOgSHT8/pFXVKPT2+RwEAAEAIiOgQzJjhtEvTJT4EHQAAIJaI\n6BBMn5nQTs2Qtm/3PQoAAABCQESHYMbhgXrUodLLrEQDAADEEREdghmz03JqUu+WPt+jAAAAIARE\ndAimz05Lkna+xLcWAgAAxBERHYIZs8tftLJre9HzJAAAAAgDER2CGZ0JSdLO7XxONAAAQBwR0SGY\nMaN8vnOn3zkAAAAQDiI6BFOnls937/Y7BwAAAMJBRIegvb183seHcwAAAMQSER2Ctrbyed8eXl4A\nAIA4ovJC0NQkTUkOqm8w6XsUAAAAhICIDkl7kFXfYOB7DAAAAISAiA5Je3NOfdm07zEAAAAQAiI6\nJO3NefWVWqUiX7gCAAAQN0R0SNpbiupXm7Rnj+9RAAAAUGNEdEjaWkvqUzsRDQAAEENEdEja24ho\nAACAuCKiQ9LeJiIaAAAgpojokLRPNSIaAAAgpojokLRNbdIetcn1E9EAAABxQ0SHpKW9/G2F2Z6M\n50kAAABQa0R0SFqnliN6YFfW8yQAAACoNSI6JK3Tyl/5PdCT8zwJAAAAao2IDknLtJQkIhoAACCO\niOiQtE5PS5Iyu/OeJwEAAECtEdEhGd4T3Vf0PAkAAABqjYgOSUtL+ZyIBgAAiB8iOiStreXzzIDz\nOwgAAABqjogOyVBEDwz4nQMAAAC1R0SHZHg7R8b8DgIAAICaI6JDMrydI0tEAwAAxA0RHZLhlejB\nhN9BAAAAUHNEdEiGIzpLRAMAAMQNER2SIJACyyuTI6IBAADihogOUUsir4Fc0vcYAAAAqDEiOkQt\nyZwGCoHvMQAAAFBjRHSImpMFZQts5wAAAIgbIjpE6WRR2QLbOQAAAOKGiA5Rc1DUYJHtHAAAAHFD\nRIcoHZSULSUl53yPAgAAgBoiokPUnCopq7SUy/keBQAAADVERIconXIaVLOUyfgeBQAAADVERIco\nnXLllejBQd+jAAAAoIaI6BA1N6u8Ek1EAwAAxAoRHaJ0WuWVaLZzAAAAxAoRHaLmZmNPNAAAQAwR\n0SFKN4tP5wAAAIghIjpEwyvR2azvUQAAAFBDRHSI0s3GSjQAAEAMEdEham5tUl4plTKsRAMAAMRJ\naBFtZreY2Stmtn7Ebf9oZk+b2RNmdpeZdYT1/PUg3VJ+ebP9ec+TAAAAoJbCXIn+nqTz97vtPkkn\nOOeWSvqTpM+G+PzeNbdWInqg6HkSAAAA1FJoEe2cWy1p53633eucK1SuPiJpbljPXw/SrQlJ0uAe\nIhoAACBOfO6J/itJ/3mgH5rZFWa2xszWdHd3RzhW7QxFNCvRAAAA8eIlos3sf0sqSLrtQPdxzq10\nznU557o6OzujG66GmtuSkliJBgAAiJtk1E9oZh+Q9BZJ5zjnXNTPH6X0lPLLm82UPE8CAACAWoo0\nos3sfEmfkXSmc24gyuf2gYgGAACIpzA/4u52SQ9LOs7MtpjZhyT9s6R2SfeZ2Tozuzms568HqdZy\nROcH2c4BAAAQJ6GtRDvnLhvl5u+E9Xz1KEiZJCk3yEo0AABAnPCNhSFKpcrnuSwRDQAAECdEdIiG\nIjqfjfX7JwEAACYdIjpEQVA+zxHRAAAAsUJEh2jvSjTbOQAAAOKEiA7R8Ep0zu8cAAAAqC0iOkTD\nK9E5tnMAAADECREdouGV6LzfOQAAAFBbRHSIhj/iLmd+BwEAAEBNEdEhGt7OwUo0AABArBDRIdq7\nnYOVaAAAgDghokM0vBJd8DsHAAAAaouIDlEiIZlKyhUSv
kcBAABADRHRIUs1FZQv8jIDAADECXUX\nsqCpqBwRDQAAECvUXchSiaJyxaTvMQAAAFBDRHTIUoki2zkAAABihroLWdBUVK7EGwsBAADihIgO\nWSpRUr5IRAMAAMQJER2yIFFSzrEnGgAAIE6I6JClkiXl2c4BAAAQK0R0yIJkSTmlpGLR9ygAAACo\nESI6ZKmhiM7nfY8CAACAGiGiQ5ZKOuUVSLmc71EAAABQI0R0yIKkYyUaAAAgZojokKWCyko0EQ0A\nABAbRHTIgoCVaAAAgLghokOWCsRKNAAAQMwQ0SELArESDQAAEDNEdMhSqUpE8+kcAAAAsUFEhyyV\nYjsHAABA3BDRIQtSbOcAAACIGyI6ZKmUsRINAAAQM0R0yIKUsRINAAAQM0R0yFJpVqIBAADihogO\nWZBuUkkJFTN8OgcAAEBcENEhS6VNkpTLFD1PAgAAgFohokOWai6/xPlBIhoAACAuiOiQBenyS5wb\nLHmeBAAAALVCRIeMlWgAAID4IaJDxko0AABA/BDRIUu1JCSxEg0AABAnRHTIguZyRLMSDQAAEB9E\ndMiGVqJzWed5EgAAANQKER2yVGtSkpTPshINAAAQF0R0yIa3c7ASDQAAEBtEdMiGP+IuR0QDAADE\nBREdsiAon7MSDQAAEB9EdMhSqfI5K9EAAADxQUSHbHglmogGAACIDSI6ZEMr0bmc+R0EAAAANUNE\nh2x4O0eelWgAAIC4IKJDtnc7ByvRAAAAcUFEh2zvSrTfOQAAAFA7RHTIhlei86xEAwAAxAURHbKh\niM4X/M4BAACA2iGiQzYc0axEAwAAxAYRHbK9K9FENAAAQFwQ0SFLJsvn+SIRDQAAEBdEdMjMpIQV\nlS/wUgMAAMQFZReBwAoq8MZCAACA2CCiIxA0FZUv8lIDAADEBWUXASIaAAAgXii7CARNReVLvNQA\nAABxQdlFoLwSnfA9BgAAAGoktIg2s1vM7BUzWz/ithlmdp+ZPVM5nx7W89eTIFFiJRoAACBGwiy7\n70k6f7/brpP0K+fcayT9qnI99oJEiZVoAACAGAktop1zqyXt3O/miyR9v3L5+5LeHtbz15PySjQR\nDQAAEBdR7zE43Dm3rXL5JUmHR/z8XhDRAAAA8eJto65zzklyB/q5mV1hZmvMbE13d3eEk9VekCgp\n74hoAACAuIg6ol82s9mSVDl/5UB3dM6tdM51Oee6Ojs7IxswDEHCKe+Skjvg3wwAAABoIFFH9M8k\nvb9y+f2Sfhrx83sRJJ3yCsR3fwMAAMRDmB9xd7ukhyUdZ2ZbzOxDkq6X9CYze0bSuZXrsZcciuh8\n3vcoAAAAqIFkWA/snLvsAD86J6znrFdB0qmgJBENAAAQE3wDSAQCVqIBAABihYiOQJAUEQ0AABAj\nRHQEgoCIBgAAiBMiOgJ8OgcAAEC8ENERCFKsRAMAAMQJER0BtnMAAADECxEdgSAwIhoAACBGiOgI\nsJ0DAAAgXojoCAyvRPPGQgAAgFggoiMQpNjOAQAAECdEdASSKVNRSbkcEQ0AABAHRHQEglT5ZS4M\nsp0DAAAgDojoCAQpkyTlB4ueJwEAAEAtENERCNLllzmfYSUaAAAgDojoCATpykp0tuR5EgAAANQC\nER2BIJ2QREQDAADEBREdgeHtHOyJBgAAiAUiOgJBcyWis0Q0AABAHBDRERjezjHIdg4AAIA4IKIj\nEDRXIjrnPE8CAACAWiCiIzAc0byxEAAAIBaI6AiwEg0AABAvRHQEWIkGAACIFyI6Ask0K9EAAABx\nQkRHIAjK54UcK9EAAABxQERHYCii83m/cwAAAKA2iOgIDEc02zkAAABigYiOACvRAAAA8UJER4CI\nBgAAiBciOgJ7I5rtHAAAAHFAREdgb0Sb30EAAABQE0R0BNjOAQAAEC9EdASGI7rASjQAAEAcENER\n2BvRfucAAABAbRDRE
WBPNAAAQLwQ0RFgOwcAAEC8ENERSCbL5/kiLzcAAEAcUHURMJMSVlShyEo0\nAABAHBDREQmswHYOAACAmCCiIxI0FdnOAQAAEBNUXUSIaAAAgPig6iISNBWVL/FyAwAAxAFVF5Hy\nSnTC9xgAAACoASI6IkFTiZVoAACAmKDqIhIkWIkGAACICyI6IkGipHyJiAYAAIgDIjoiRDQAAEB8\nENERCRIl5V3S9xgAAACoASI6IkHCKe9YiQYAAIgDIjoiyYQrr0Q753sUAAAATBARHZEg6VRQUioU\nfI8CAACACSKiIxIknfIKpHze9ygAAACYICI6IkQ0AABAfBDRERmOaLZzAAAANDwiOiJBwEo0AABA\nXBDREQkCEdEAAAAxQURHhIgGAACIDyI6IkQ0AABAfBDRERmOaN5YCAAA0PCI6IgEgbESDQAAEBNE\ndESCFBENAAAQF0R0RIIUe6IBAADigoiOSDJoUlFJuRwRDQAA0OiI6IgEKZMkFQZ5YyEAAECjI6Ij\nMhTR+cGi50kAAAAwUV4i2sw+YWZPmdl6M7vdzJp9zBGlIF1+qYloAACAxhd5RJvZkZKuktTlnDtB\nUkLSpVHPETUiGgAAID58bedISmoxs6SkVklbPc0RGSIaAAAgPiKPaOfci5JukPSCpG2Sep1z9+5/\nPzO7wszWmNma7u7uqMesOSIaAAAgPnxs55gu6SJJCyTNkTTFzFbsfz/n3ErnXJdzrquzszPqMWtu\nOKJzzvMkAAAAmCgf2znOlfScc67bOZeX9GNJr/MwR6RYiQYAAIgPHxH9gqQzzKzVzEzSOZI2eJgj\nUkFzQpKUz5Y8TwIAAICJ8rEn+lFJqyQ9JunJygwro54jakQ0AABAfCR9PKlz7u8k/Z2P5/ZlOKJz\nRDQAAECj4xsLI5JsLv+9ks/yxkIAAIBGR0RHZGglupAnogEAABodER0RPuIOAAAgPojoiAQpk0RE\nAwAAxAERHZEgKJ8T0QAAAI2PiI4IEQ0AABAfRHREhiO64HcOAAAATBwRHZG9K9F+5wAAAMDEEdER\nGY7ovN85AAAAMHFEdESIaAAAgPggoiOyd0+0+R0EAAAAEzamiDazq81sqpV9x8weM7Pzwh4uTliJ\nBgAAiI+xrkT/lXNut6TzJE2X9F5J14c2VQwNR3SRlWgAAIBGN9aIHiq/N0v6gXPuqRG3YQySyfI5\n2zkAAAAa31gjeq2Z3atyRN9jZu2SSuGNFT9DK9EFIhoAAKDhJcd4vw9JWibpWefcgJnNkPTB8MaK\nHzMpoSIr0QAAADEw1pXo10r6o3Oux8xWSPp/JfWGN1Y8BU0F9kQDAADEwFgj+luSBszsREmfkrRR\n0r+FNlVMBVZUvpjwPQYAAAAmaKwRXXDOOUkXSfpn59w3JLWHN1Y8sRINAAAQD2PdE91nZp9V+aPt\nXm9mTZKC8MaKp6CpqHyR77cBAABodGMtukskZVX+vOiXJM2V9I+hTRVTQVOJ7RwAAAAxMKaIroTz\nbZKmmdlbJA0659gTXaUgwUo0AABAHIz1a7/fI+m3ki6W9B5Jj5rZu8McLI6CREn5EhENAADQ6Ma6\nJ/p/SzrVOfeKJJlZp6T7Ja0Ka7A4YjsHAABAPIx1WbRpKKArdlTxu6gIkqxEAwAAxMFYV6J/aWb3\nSLq9cv0SSXeHM1J8BQmnfImVaAAAgEY3poh2zn3azN4laXnlppXOubvCGyuekkkiGgAAIA7GuhIt\n59yPJP0oxFliL0g4FZSUCgUpOeaXHgAAAHXmoCVnZn2S3Gg/kuScc1NDmSqmgsApo0DK5YhoAACA\nBnbQknPO8dXeNRQkpd1DEd3a6nscAAAAjBMfFRGhIHDKD0U0AAAAGhYRHaEgUDmis1nfowAAAGAC\niOgIBYGxEg0AABADRHSEUikppxQRDQAA0OCI6Ail01JWaSIaAACgwRHREUqnKyvR7Ik
",
 477 |       "text/plain": [
 478 |        "<matplotlib.figure.Figure at 0x11ee1d518>"
 479 |       ]
 480 |      },
 481 |      "metadata": {},
 482 |      "output_type": "display_data"
 483 |     }
 484 |    ],
 485 |    "source": [
 486 |     "plt.figure(figsize=(12,10))\n",
 487 |     "plt.plot(train_loss_history, color=\"red\", label=\"train loss\")\n",
 488 |     "plt.plot(valid_loss_history, color=\"blue\", label=\"valid loss\")\n",
 489 |     "plt.xlabel(\"epoch\")\n",
 490 |     "plt.ylabel(\"loss\")\n",
 491 |     "plt.legend()\n",
 492 |     "plt.show()"
 493 |    ]
 494 |   },
 495 |   {
 496 |    "cell_type": "code",
 497 |    "execution_count": 23,
 498 |    "metadata": {},
 499 |    "outputs": [
 500 |     {
 501 |      "name": "stdout",
 502 |      "output_type": "stream",
 503 |      "text": [
 504 |       "MF Accuracy: 45.008881%\n"
 505 |      ]
 506 |     }
 507 |    ],
 508 |    "source": [
 509 |     "mf_accuracy = np.mean(np.round(final_valid_predictions) == validset.rating.values)\n",
 510 |     "print(\"MF Accuracy: %f%%\"%(mf_accuracy*100,))"
 511 |    ]
 512 |   },
 513 |   {
 514 |    "cell_type": "markdown",
 515 |    "metadata": {},
 516 |    "source": [
 517 |     "### results on validation set"
 518 |    ]
 519 |   },
 520 |   {
 521 |    "cell_type": "code",
 522 |    "execution_count": 17,
 523 |    "metadata": {},
 524 |    "outputs": [
 525 |     {
 526 |      "data": {
 527 |       "text/html": [
 528 |        "<div>\n",
 529 |        "<style>\n",
 530 |        "    .dataframe thead tr:only-child th {\n",
 531 |        "        text-align: right;\n",
 532 |        "    }\n",
 533 |        "\n",
 534 |        "    .dataframe thead th {\n",
 535 |        "        text-align: left;\n",
 536 |        "    }\n",
 537 |        "\n",
 538 |        "    .dataframe tbody tr th {\n",
 539 |        "        vertical-align: top;\n",
 540 |        "    }\n",
 541 |        "</style>\n",
 542 |        "<table border=\"1\" class=\"dataframe\">\n",
 543 |        "  <thead>\n",
 544 |        "    <tr style=\"text-align: right;\">\n",
 545 |        "      <th></th>\n",
 546 |        "      <th>gender</th>\n",
 547 |        "      <th>userid</th>\n",
 548 |        "      <th>movieid</th>\n",
 549 |        "      <th>age_desc</th>\n",
 550 |        "      <th>occ_desc</th>\n",
 551 |        "      <th>title</th>\n",
 552 |        "      <th>genre</th>\n",
 553 |        "      <th>rating</th>\n",
 554 |        "      <th>prediction (rnd.)</th>\n",
 555 |        "      <th>prediction (prc.)</th>\n",
 556 |        "    </tr>\n",
 557 |        "  </thead>\n",
 558 |        "  <tbody>\n",
 559 |        "    <tr>\n",
 560 |        "      <th>323753</th>\n",
 561 |        "      <td>M</td>\n",
 562 |        "      <td>3432</td>\n",
 563 |        "      <td>1220</td>\n",
 564 |        "      <td>25-34</td>\n",
 565 |        "      <td>programmer</td>\n",
 566 |        "      <td>Blues Brothers, The (1980)</td>\n",
 567 |        "      <td>Action|Comedy|Musical</td>\n",
 568 |        "      <td>4</td>\n",
 569 |        "      <td>4</td>\n",
 570 |        "      <td>3.595490</td>\n",
 571 |        "    </tr>\n",
 572 |        "    <tr>\n",
 573 |        "      <th>572412</th>\n",
 574 |        "      <td>M</td>\n",
 575 |        "      <td>3579</td>\n",
 576 |        "      <td>2089</td>\n",
 577 |        "      <td>18-24</td>\n",
 578 |        "      <td>other or not specified</td>\n",
 579 |        "      <td>Rescuers Down Under, The (1990)</td>\n",
 580 |        "      <td>Animation|Children's</td>\n",
 581 |        "      <td>3</td>\n",
 582 |        "      <td>3</td>\n",
 583 |        "      <td>2.909302</td>\n",
 584 |        "    </tr>\n",
 585 |        "    <tr>\n",
 586 |        "      <th>473636</th>\n",
 587 |        "      <td>M</td>\n",
 588 |        "      <td>3931</td>\n",
 589 |        "      <td>1680</td>\n",
 590 |        "      <td>25-34</td>\n",
 591 |        "      <td>customer service</td>\n",
 592 |        "      <td>Sliding Doors (1998)</td>\n",
 593 |        "      <td>Drama|Romance</td>\n",
 594 |        "      <td>4</td>\n",
 595 |        "      <td>4</td>\n",
 596 |        "      <td>3.858740</td>\n",
 597 |        "    </tr>\n",
 598 |        "    <tr>\n",
 599 |        "      <th>308120</th>\n",
 600 |        "      <td>M</td>\n",
 601 |        "      <td>4378</td>\n",
 602 |        "      <td>1203</td>\n",
 603 |        "      <td>18-24</td>\n",
 604 |        "      <td>technician/engineer</td>\n",
 605 |        "      <td>12 Angry Men (1957)</td>\n",
 606 |        "      <td>Drama</td>\n",
 607 |        "      <td>3</td>\n",
 608 |        "      <td>3</td>\n",
 609 |        "      <td>3.197440</td>\n",
 610 |        "    </tr>\n",
 611 |        "    <tr>\n",
 612 |        "      <th>621235</th>\n",
 613 |        "      <td>M</td>\n",
 614 |        "      <td>4251</td>\n",
 615 |        "      <td>2301</td>\n",
 616 |        "      <td>45-49</td>\n",
 617 |        "      <td>executive/managerial</td>\n",
 618 |        "      <td>History of the World: Part I (1981)</td>\n",
 619 |        "      <td>Comedy</td>\n",
 620 |        "      <td>2</td>\n",
 621 |        "      <td>3</td>\n",
 622 |        "      <td>3.245275</td>\n",
 623 |        "    </tr>\n",
 624 |        "    <tr>\n",
 625 |        "      <th>280736</th>\n",
 626 |        "      <td>M</td>\n",
 627 |        "      <td>1482</td>\n",
 628 |        "      <td>1127</td>\n",
 629 |        "      <td>25-34</td>\n",
 630 |        "      <td>executive/managerial</td>\n",
 631 |        "      <td>Abyss, The (1989)</td>\n",
 632 |        "      <td>Action|Adventure|Sci-Fi|Thriller</td>\n",
 633 |        "      <td>3</td>\n",
 634 |        "      <td>4</td>\n",
 635 |        "      <td>3.565892</td>\n",
 636 |        "    </tr>\n",
 637 |        "    <tr>\n",
 638 |        "      <th>251324</th>\n",
 639 |        "      <td>M</td>\n",
 640 |        "      <td>2414</td>\n",
 641 |        "      <td>1033</td>\n",
 642 |        "      <td>25-34</td>\n",
 643 |        "      <td>academic/educator</td>\n",
 644 |        "      <td>Fox and the Hound, The (1981)</td>\n",
 645 |        "      <td>Animation|Children's</td>\n",
 646 |        "      <td>4</td>\n",
 647 |        "      <td>4</td>\n",
 648 |        "      <td>3.941283</td>\n",
 649 |        "    </tr>\n",
 650 |        "    <tr>\n",
 651 |        "      <th>178892</th>\n",
 652 |        "      <td>M</td>\n",
 653 |        "      <td>4732</td>\n",
 654 |        "      <td>648</td>\n",
 655 |        "      <td>25-34</td>\n",
 656 |        "      <td>sales/marketing</td>\n",
 657 |        "      <td>Mission: Impossible (1996)</td>\n",
 658 |        "      <td>Action|Adventure|Mystery</td>\n",
 659 |        "      <td>3</td>\n",
 660 |        "      <td>3</td>\n",
 661 |        "      <td>3.208387</td>\n",
 662 |        "    </tr>\n",
 663 |        "    <tr>\n",
 664 |        "      <th>723124</th>\n",
 665 |        "      <td>M</td>\n",
 666 |        "      <td>4439</td>\n",
 667 |        "      <td>2694</td>\n",
 668 |        "      <td>35-44</td>\n",
 669 |        "      <td>doctor/health care</td>\n",
 670 |        "      <td>Big Daddy (1999)</td>\n",
 671 |        "      <td>Comedy</td>\n",
 672 |        "      <td>3</td>\n",
 673 |        "      <td>3</td>\n",
 674 |        "      <td>3.135849</td>\n",
 675 |        "    </tr>\n",
 676 |        "    <tr>\n",
 677 |        "      <th>83939</th>\n",
 678 |        "      <td>M</td>\n",
 679 |        "      <td>727</td>\n",
 680 |        "      <td>317</td>\n",
 681 |        "      <td>35-44</td>\n",
 682 |        "      <td>lawyer</td>\n",
 683 |        "      <td>Santa Clause, The (1994)</td>\n",
 684 |        "      <td>Children's|Comedy|Fantasy</td>\n",
 685 |        "      <td>4</td>\n",
 686 |        "      <td>4</td>\n",
 687 |        "      <td>3.934950</td>\n",
 688 |        "    </tr>\n",
 689 |        "    <tr>\n",
 690 |        "      <th>541832</th>\n",
 691 |        "      <td>M</td>\n",
 692 |        "      <td>4098</td>\n",
 693 |        "      <td>2001</td>\n",
 694 |        "      <td>35-44</td>\n",
 695 |        "      <td>executive/managerial</td>\n",
 696 |        "      <td>Lethal Weapon 2 (1989)</td>\n",
 697 |        "      <td>Action|Comedy|Crime|Drama</td>\n",
 698 |        "      <td>4</td>\n",
 699 |        "      <td>4</td>\n",
 700 |        "      <td>4.141999</td>\n",
 701 |        "    </tr>\n",
 702 |        "    <tr>\n",
 703 |        "      <th>915511</th>\n",
 704 |        "      <td>M</td>\n",
 705 |        "      <td>1579</td>\n",
 706 |        "      <td>3499</td>\n",
 707 |        "      <td>25-34</td>\n",
 708 |        "      <td>other or not specified</td>\n",
 709 |        "      <td>Misery (1990)</td>\n",
 710 |        "      <td>Horror</td>\n",
 711 |        "      <td>4</td>\n",
 712 |        "      <td>4</td>\n",
 713 |        "      <td>4.010313</td>\n",
 714 |        "    </tr>\n",
 715 |        "    <tr>\n",
 716 |        "      <th>405738</th>\n",
 717 |        "      <td>M</td>\n",
 718 |        "      <td>3934</td>\n",
 719 |        "      <td>1379</td>\n",
 720 |        "      <td>25-34</td>\n",
 721 |        "      <td>other or not specified</td>\n",
 722 |        "      <td>Young Guns II (1990)</td>\n",
 723 |        "      <td>Action|Comedy|Western</td>\n",
 724 |        "      <td>2</td>\n",
 725 |        "      <td>2</td>\n",
 726 |        "      <td>2.088194</td>\n",
 727 |        "    </tr>\n",
 728 |        "    <tr>\n",
 729 |        "      <th>285899</th>\n",
 730 |        "      <td>F</td>\n",
 731 |        "      <td>1899</td>\n",
 732 |        "      <td>1147</td>\n",
 733 |        "      <td>45-49</td>\n",
 734 |        "      <td>doctor/health care</td>\n",
 735 |        "      <td>When We Were Kings (1996)</td>\n",
 736 |        "      <td>Documentary</td>\n",
 737 |        "      <td>4</td>\n",
 738 |        "      <td>4</td>\n",
 739 |        "      <td>4.463424</td>\n",
 740 |        "    </tr>\n",
 741 |        "    <tr>\n",
 742 |        "      <th>181963</th>\n",
 743 |        "      <td>F</td>\n",
 744 |        "      <td>4048</td>\n",
 745 |        "      <td>673</td>\n",
 746 |        "      <td>35-44</td>\n",
 747 |        "      <td>academic/educator</td>\n",
 748 |        "      <td>Space Jam (1996)</td>\n",
 749 |        "      <td>Adventure|Animation|Children's|Comedy|Fantasy</td>\n",
 750 |        "      <td>4</td>\n",
 751 |        "      <td>4</td>\n",
 752 |        "      <td>4.013864</td>\n",
 753 |        "    </tr>\n",
 754 |        "    <tr>\n",
 755 |        "      <th>931626</th>\n",
 756 |        "      <td>M</td>\n",
 757 |        "      <td>3125</td>\n",
 758 |        "      <td>3555</td>\n",
 759 |        "      <td>25-34</td>\n",
 760 |        "      <td>executive/managerial</td>\n",
 761 |        "      <td>U-571 (2000)</td>\n",
 762 |        "      <td>Action|Thriller</td>\n",
 763 |        "      <td>4</td>\n",
 764 |        "      <td>4</td>\n",
 765 |        "      <td>3.929566</td>\n",
 766 |        "    </tr>\n",
 767 |        "    <tr>\n",
 768 |        "      <th>602570</th>\n",
 769 |        "      <td>M</td>\n",
 770 |        "      <td>5788</td>\n",
 771 |        "      <td>2193</td>\n",
 772 |        "      <td>25-34</td>\n",
 773 |        "      <td>other or not specified</td>\n",
 774 |        "      <td>Willow (1988)</td>\n",
 775 |        "      <td>Action|Adventure|Fantasy</td>\n",
 776 |        "      <td>3</td>\n",
 777 |        "      <td>4</td>\n",
 778 |        "      <td>3.773484</td>\n",
 779 |        "    </tr>\n",
 780 |        "    <tr>\n",
 781 |        "      <th>516839</th>\n",
 782 |        "      <td>F</td>\n",
 783 |        "      <td>5530</td>\n",
 784 |        "      <td>1924</td>\n",
 785 |        "      <td>18-24</td>\n",
 786 |        "      <td>college/grad student</td>\n",
 787 |        "      <td>Plan 9 from Outer Space (1958)</td>\n",
 788 |        "      <td>Horror|Sci-Fi</td>\n",
 789 |        "      <td>3</td>\n",
 790 |        "      <td>3</td>\n",
 791 |        "      <td>2.927153</td>\n",
 792 |        "    </tr>\n",
 793 |        "    <tr>\n",
 794 |        "      <th>596378</th>\n",
 795 |        "      <td>M</td>\n",
 796 |        "      <td>2242</td>\n",
 797 |        "      <td>2162</td>\n",
 798 |        "      <td>18-24</td>\n",
 799 |        "      <td>college/grad student</td>\n",
 800 |        "      <td>NeverEnding Story II: The Next Chapter, The (1...</td>\n",
 801 |        "      <td>Adventure|Children's|Fantasy</td>\n",
 802 |        "      <td>1</td>\n",
 803 |        "      <td>3</td>\n",
 804 |        "      <td>3.229770</td>\n",
 805 |        "    </tr>\n",
 806 |        "    <tr>\n",
 807 |        "      <th>336866</th>\n",
 808 |        "      <td>F</td>\n",
 809 |        "      <td>1516</td>\n",
 810 |        "      <td>1240</td>\n",
 811 |        "      <td>25-34</td>\n",
 812 |        "      <td>programmer</td>\n",
 813 |        "      <td>Terminator, The (1984)</td>\n",
 814 |        "      <td>Action|Sci-Fi|Thriller</td>\n",
 815 |        "      <td>4</td>\n",
 816 |        "      <td>3</td>\n",
 817 |        "      <td>3.214885</td>\n",
 818 |        "    </tr>\n",
 819 |        "    <tr>\n",
 820 |        "      <th>403557</th>\n",
 821 |        "      <td>F</td>\n",
 822 |        "      <td>4208</td>\n",
 823 |        "      <td>1376</td>\n",
 824 |        "      <td>35-44</td>\n",
 825 |        "      <td>other or not specified</td>\n",
 826 |        "      <td>Star Trek IV: The Voyage Home (1986)</td>\n",
 827 |        "      <td>Action|Adventure|Sci-Fi</td>\n",
 828 |        "      <td>5</td>\n",
 829 |        "      <td>4</td>\n",
 830 |        "      <td>3.898657</td>\n",
 831 |        "    </tr>\n",
 832 |        "    <tr>\n",
 833 |        "      <th>532357</th>\n",
 834 |        "      <td>M</td>\n",
 835 |        "      <td>3101</td>\n",
 836 |        "      <td>1968</td>\n",
 837 |        "      <td>18-24</td>\n",
 838 |        "      <td>scientist</td>\n",
 839 |        "      <td>Breakfast Club, The (1985)</td>\n",
 840 |        "      <td>Comedy|Drama</td>\n",
 841 |        "      <td>5</td>\n",
 842 |        "      <td>4</td>\n",
 843 |        "      <td>4.322102</td>\n",
 844 |        "    </tr>\n",
 845 |        "    <tr>\n",
 846 |        "      <th>898240</th>\n",
 847 |        "      <td>M</td>\n",
 848 |        "      <td>5648</td>\n",
 849 |        "      <td>3421</td>\n",
 850 |        "      <td>25-34</td>\n",
 851 |        "      <td>technician/engineer</td>\n",
 852 |        "      <td>Animal House (1978)</td>\n",
 853 |        "      <td>Comedy</td>\n",
 854 |        "      <td>4</td>\n",
 855 |        "      <td>4</td>\n",
 856 |        "      <td>4.268606</td>\n",
 857 |        "    </tr>\n",
 858 |        "    <tr>\n",
 859 |        "      <th>877298</th>\n",
 860 |        "      <td>M</td>\n",
 861 |        "      <td>2095</td>\n",
 862 |        "      <td>3301</td>\n",
 863 |        "      <td>56+</td>\n",
 864 |        "      <td>retired</td>\n",
 865 |        "      <td>Whole Nine Yards, The (2000)</td>\n",
 866 |        "      <td>Comedy|Crime</td>\n",
 867 |        "      <td>3</td>\n",
 868 |        "      <td>4</td>\n",
 869 |        "      <td>3.500426</td>\n",
 870 |        "    </tr>\n",
 871 |        "    <tr>\n",
 872 |        "      <th>796428</th>\n",
 873 |        "      <td>M</td>\n",
 874 |        "      <td>1395</td>\n",
 875 |        "      <td>2968</td>\n",
 876 |        "      <td>25-34</td>\n",
 877 |        "      <td>artist</td>\n",
 878 |        "      <td>Time Bandits (1981)</td>\n",
 879 |        "      <td>Adventure|Fantasy|Sci-Fi</td>\n",
 880 |        "      <td>5</td>\n",
 881 |        "      <td>5</td>\n",
 882 |        "      <td>4.660578</td>\n",
 883 |        "    </tr>\n",
 884 |        "    <tr>\n",
 885 |        "      <th>732618</th>\n",
 886 |        "      <td>M</td>\n",
 887 |        "      <td>523</td>\n",
 888 |        "      <td>2716</td>\n",
 889 |        "      <td>50-55</td>\n",
 890 |        "      <td>executive/managerial</td>\n",
 891 |        "      <td>Ghostbusters (1984)</td>\n",
 892 |        "      <td>Comedy|Horror</td>\n",
 893 |        "      <td>3</td>\n",
 894 |        "      <td>4</td>\n",
 895 |        "      <td>4.430697</td>\n",
 896 |        "    </tr>\n",
 897 |        "    <tr>\n",
 898 |        "      <th>627537</th>\n",
 899 |        "      <td>F</td>\n",
 900 |        "      <td>3713</td>\n",
 901 |        "      <td>2324</td>\n",
 902 |        "      <td>25-34</td>\n",
 903 |        "      <td>executive/managerial</td>\n",
 904 |        "      <td>Life Is Beautiful (La Vita è bella) (1997)</td>\n",
 905 |        "      <td>Comedy|Drama</td>\n",
 906 |        "      <td>5</td>\n",
 907 |        "      <td>5</td>\n",
 908 |        "      <td>4.849105</td>\n",
 909 |        "    </tr>\n",
 910 |        "    <tr>\n",
 911 |        "      <th>40433</th>\n",
 912 |        "      <td>M</td>\n",
 913 |        "      <td>5677</td>\n",
 914 |        "      <td>141</td>\n",
 915 |        "      <td>25-34</td>\n",
 916 |        "      <td>academic/educator</td>\n",
 917 |        "      <td>Birdcage, The (1996)</td>\n",
 918 |        "      <td>Comedy</td>\n",
 919 |        "      <td>5</td>\n",
 920 |        "      <td>4</td>\n",
 921 |        "      <td>3.652851</td>\n",
 922 |        "    </tr>\n",
 923 |        "    <tr>\n",
 924 |        "      <th>621897</th>\n",
 925 |        "      <td>F</td>\n",
 926 |        "      <td>3464</td>\n",
 927 |        "      <td>2302</td>\n",
 928 |        "      <td>25-34</td>\n",
 929 |        "      <td>sales/marketing</td>\n",
 930 |        "      <td>My Cousin Vinny (1992)</td>\n",
 931 |        "      <td>Comedy</td>\n",
 932 |        "      <td>3</td>\n",
 933 |        "      <td>4</td>\n",
 934 |        "      <td>3.703601</td>\n",
 935 |        "    </tr>\n",
 936 |        "    <tr>\n",
 937 |        "      <th>904828</th>\n",
 938 |        "      <td>F</td>\n",
 939 |        "      <td>3205</td>\n",
 940 |        "      <td>3450</td>\n",
 941 |        "      <td>35-44</td>\n",
 942 |        "      <td>self-employed</td>\n",
 943 |        "      <td>Grumpy Old Men (1993)</td>\n",
 944 |        "      <td>Comedy</td>\n",
 945 |        "      <td>4</td>\n",
 946 |        "      <td>3</td>\n",
 947 |        "      <td>3.459073</td>\n",
 948 |        "    </tr>\n",
 949 |        "    <tr>\n",
 950 |        "      <th>387733</th>\n",
 951 |        "      <td>M</td>\n",
 952 |        "      <td>5111</td>\n",
 953 |        "      <td>1334</td>\n",
 954 |        "      <td>35-44</td>\n",
 955 |        "      <td>artist</td>\n",
 956 |        "      <td>Blob, The (1958)</td>\n",
 957 |        "      <td>Horror|Sci-Fi</td>\n",
 958 |        "      <td>4</td>\n",
 959 |        "      <td>4</td>\n",
 960 |        "      <td>3.665181</td>\n",
 961 |        "    </tr>\n",
 962 |        "    <tr>\n",
 963 |        "      <th>482628</th>\n",
 964 |        "      <td>F</td>\n",
 965 |        "      <td>1894</td>\n",
 966 |        "      <td>1721</td>\n",
 967 |        "      <td>35-44</td>\n",
 968 |        "      <td>executive/managerial</td>\n",
 969 |        "      <td>Titanic (1997)</td>\n",
 970 |        "      <td>Drama|Romance</td>\n",
 971 |        "      <td>4</td>\n",
 972 |        "      <td>4</td>\n",
 973 |        "      <td>3.531790</td>\n",
 974 |        "    </tr>\n",
 975 |        "    <tr>\n",
 976 |        "      <th>851826</th>\n",
 977 |        "      <td>M</td>\n",
 978 |        "      <td>4774</td>\n",
 979 |        "      <td>3176</td>\n",
 980 |        "      <td>45-49</td>\n",
 981 |        "      <td>sales/marketing</td>\n",
 982 |        "      <td>Talented Mr. Ripley, The (1999)</td>\n",
 983 |        "      <td>Drama|Mystery|Thriller</td>\n",
 984 |        "      <td>4</td>\n",
 985 |        "      <td>4</td>\n",
 986 |        "      <td>4.008468</td>\n",
 987 |        "    </tr>\n",
 988 |        "    <tr>\n",
 989 |        "      <th>566304</th>\n",
 990 |        "      <td>M</td>\n",
 991 |        "      <td>4957</td>\n",
 992 |        "      <td>2072</td>\n",
 993 |        "      <td>25-34</td>\n",
 994 |        "      <td>artist</td>\n",
 995 |        "      <td>'burbs, The (1989)</td>\n",
 996 |        "      <td>Comedy</td>\n",
 997 |        "      <td>3</td>\n",
 998 |        "      <td>3</td>\n",
 999 |        "      <td>2.680943</td>\n",
1000 |        "    </tr>\n",
1001 |        "    <tr>\n",
1002 |        "      <th>523888</th>\n",
1003 |        "      <td>M</td>\n",
1004 |        "      <td>4533</td>\n",
1005 |        "      <td>1954</td>\n",
1006 |        "      <td>25-34</td>\n",
1007 |        "      <td>programmer</td>\n",
1008 |        "      <td>Rocky (1976)</td>\n",
1009 |        "      <td>Action|Drama</td>\n",
1010 |        "      <td>3</td>\n",
1011 |        "      <td>4</td>\n",
1012 |        "      <td>3.779227</td>\n",
1013 |        "    </tr>\n",
1014 |        "    <tr>\n",
1015 |        "      <th>999132</th>\n",
1016 |        "      <td>M</td>\n",
1017 |        "      <td>2362</td>\n",
1018 |        "      <td>3948</td>\n",
1019 |        "      <td>25-34</td>\n",
1020 |        "      <td>sales/marketing</td>\n",
1021 |        "      <td>Meet the Parents (2000)</td>\n",
1022 |        "      <td>Comedy</td>\n",
1023 |        "      <td>5</td>\n",
1024 |        "      <td>4</td>\n",
1025 |        "      <td>4.275411</td>\n",
1026 |        "    </tr>\n",
1027 |        "    <tr>\n",
1028 |        "      <th>142695</th>\n",
1029 |        "      <td>F</td>\n",
1030 |        "      <td>2657</td>\n",
1031 |        "      <td>527</td>\n",
1032 |        "      <td>Under 18</td>\n",
1033 |        "      <td>K-12 student</td>\n",
1034 |        "      <td>Schindler's List (1993)</td>\n",
1035 |        "      <td>Drama|War</td>\n",
1036 |        "      <td>5</td>\n",
1037 |        "      <td>5</td>\n",
1038 |        "      <td>4.537353</td>\n",
1039 |        "    </tr>\n",
1040 |        "    <tr>\n",
1041 |        "      <th>169208</th>\n",
1042 |        "      <td>F</td>\n",
1043 |        "      <td>2926</td>\n",
1044 |        "      <td>595</td>\n",
1045 |        "      <td>18-24</td>\n",
1046 |        "      <td>artist</td>\n",
1047 |        "      <td>Beauty and the Beast (1991)</td>\n",
1048 |        "      <td>Animation|Children's|Musical</td>\n",
1049 |        "      <td>4</td>\n",
1050 |        "      <td>4</td>\n",
1051 |        "      <td>4.074231</td>\n",
1052 |        "    </tr>\n",
1053 |        "    <tr>\n",
1054 |        "      <th>950386</th>\n",
1055 |        "      <td>M</td>\n",
1056 |        "      <td>3554</td>\n",
1057 |        "      <td>3676</td>\n",
1058 |        "      <td>25-34</td>\n",
1059 |        "      <td>scientist</td>\n",
1060 |        "      <td>Eraserhead (1977)</td>\n",
1061 |        "      <td>Drama|Horror</td>\n",
1062 |        "      <td>4</td>\n",
1063 |        "      <td>3</td>\n",
1064 |        "      <td>2.755942</td>\n",
1065 |        "    </tr>\n",
1066 |        "    <tr>\n",
1067 |        "      <th>830422</th>\n",
1068 |        "      <td>F</td>\n",
1069 |        "      <td>5643</td>\n",
1070 |        "      <td>3098</td>\n",
1071 |        "      <td>35-44</td>\n",
1072 |        "      <td>academic/educator</td>\n",
1073 |        "      <td>Natural, The (1984)</td>\n",
1074 |        "      <td>Drama</td>\n",
1075 |        "      <td>3</td>\n",
1076 |        "      <td>3</td>\n",
1077 |        "      <td>3.336859</td>\n",
1078 |        "    </tr>\n",
1079 |        "    <tr>\n",
1080 |        "      <th>974878</th>\n",
1081 |        "      <td>M</td>\n",
1082 |        "      <td>1820</td>\n",
1083 |        "      <td>3770</td>\n",
1084 |        "      <td>25-34</td>\n",
1085 |        "      <td>writer</td>\n",
1086 |        "      <td>Dreamscape (1984)</td>\n",
1087 |        "      <td>Adventure|Crime|Sci-Fi|Thriller</td>\n",
1088 |        "      <td>4</td>\n",
1089 |        "      <td>4</td>\n",
1090 |        "      <td>3.603387</td>\n",
1091 |        "    </tr>\n",
1092 |        "    <tr>\n",
1093 |        "      <th>129671</th>\n",
1094 |        "      <td>M</td>\n",
1095 |        "      <td>1352</td>\n",
1096 |        "      <td>480</td>\n",
1097 |        "      <td>35-44</td>\n",
1098 |        "      <td>writer</td>\n",
1099 |        "      <td>Jurassic Park (1993)</td>\n",
1100 |        "      <td>Action|Adventure|Sci-Fi</td>\n",
1101 |        "      <td>3</td>\n",
1102 |        "      <td>3</td>\n",
1103 |        "      <td>2.773809</td>\n",
1104 |        "    </tr>\n",
1105 |        "    <tr>\n",
1106 |        "      <th>629803</th>\n",
1107 |        "      <td>M</td>\n",
1108 |        "      <td>5488</td>\n",
1109 |        "      <td>2333</td>\n",
1110 |        "      <td>25-34</td>\n",
1111 |        "      <td>scientist</td>\n",
1112 |        "      <td>Gods and Monsters (1998)</td>\n",
1113 |        "      <td>Drama</td>\n",
1114 |        "      <td>5</td>\n",
1115 |        "      <td>4</td>\n",
1116 |        "      <td>3.892385</td>\n",
1117 |        "    </tr>\n",
1118 |        "    <tr>\n",
1119 |        "      <th>80676</th>\n",
1120 |        "      <td>F</td>\n",
1121 |        "      <td>4208</td>\n",
1122 |        "      <td>302</td>\n",
1123 |        "      <td>35-44</td>\n",
1124 |        "      <td>other or not specified</td>\n",
1125 |        "      <td>Queen Margot (La Reine Margot) (1994)</td>\n",
1126 |        "      <td>Drama|Romance</td>\n",
1127 |        "      <td>2</td>\n",
1128 |        "      <td>4</td>\n",
1129 |        "      <td>4.066545</td>\n",
1130 |        "    </tr>\n",
1131 |        "    <tr>\n",
1132 |        "      <th>851072</th>\n",
1133 |        "      <td>M</td>\n",
1134 |        "      <td>1548</td>\n",
1135 |        "      <td>3176</td>\n",
1136 |        "      <td>35-44</td>\n",
1137 |        "      <td>academic/educator</td>\n",
1138 |        "      <td>Talented Mr. Ripley, The (1999)</td>\n",
1139 |        "      <td>Drama|Mystery|Thriller</td>\n",
1140 |        "      <td>5</td>\n",
1141 |        "      <td>4</td>\n",
1142 |        "      <td>4.229926</td>\n",
1143 |        "    </tr>\n",
1144 |        "    <tr>\n",
1145 |        "      <th>902891</th>\n",
1146 |        "      <td>M</td>\n",
1147 |        "      <td>4030</td>\n",
1148 |        "      <td>3444</td>\n",
1149 |        "      <td>35-44</td>\n",
1150 |        "      <td>executive/managerial</td>\n",
1151 |        "      <td>Bloodsport (1988)</td>\n",
1152 |        "      <td>Action</td>\n",
1153 |        "      <td>4</td>\n",
1154 |        "      <td>3</td>\n",
1155 |        "      <td>3.480523</td>\n",
1156 |        "    </tr>\n",
1157 |        "    <tr>\n",
1158 |        "      <th>516319</th>\n",
1159 |        "      <td>F</td>\n",
1160 |        "      <td>4472</td>\n",
1161 |        "      <td>1923</td>\n",
1162 |        "      <td>45-49</td>\n",
1163 |        "      <td>self-employed</td>\n",
1164 |        "      <td>There's Something About Mary (1998)</td>\n",
1165 |        "      <td>Comedy</td>\n",
1166 |        "      <td>2</td>\n",
1167 |        "      <td>3</td>\n",
1168 |        "      <td>3.330519</td>\n",
1169 |        "    </tr>\n",
1170 |        "    <tr>\n",
1171 |        "      <th>674849</th>\n",
1172 |        "      <td>M</td>\n",
1173 |        "      <td>5102</td>\n",
1174 |        "      <td>2474</td>\n",
1175 |        "      <td>25-34</td>\n",
1176 |        "      <td>programmer</td>\n",
1177 |        "      <td>Color of Money, The (1986)</td>\n",
1178 |        "      <td>Drama</td>\n",
1179 |        "      <td>4</td>\n",
1180 |        "      <td>4</td>\n",
1181 |        "      <td>4.380075</td>\n",
1182 |        "    </tr>\n",
1183 |        "    <tr>\n",
1184 |        "      <th>184396</th>\n",
1185 |        "      <td>F</td>\n",
1186 |        "      <td>1051</td>\n",
1187 |        "      <td>708</td>\n",
1188 |        "      <td>25-34</td>\n",
1189 |        "      <td>other or not specified</td>\n",
1190 |        "      <td>Truth About Cats &amp; Dogs, The (1996)</td>\n",
1191 |        "      <td>Comedy|Romance</td>\n",
1192 |        "      <td>4</td>\n",
1193 |        "      <td>3</td>\n",
1194 |        "      <td>3.434905</td>\n",
1195 |        "    </tr>\n",
1196 |        "    <tr>\n",
1197 |        "      <th>737397</th>\n",
1198 |        "      <td>M</td>\n",
1199 |        "      <td>1854</td>\n",
1200 |        "      <td>2723</td>\n",
1201 |        "      <td>25-34</td>\n",
1202 |        "      <td>other or not specified</td>\n",
1203 |        "      <td>Mystery Men (1999)</td>\n",
1204 |        "      <td>Action|Adventure|Comedy</td>\n",
1205 |        "      <td>2</td>\n",
1206 |        "      <td>2</td>\n",
1207 |        "      <td>2.071134</td>\n",
1208 |        "    </tr>\n",
1209 |        "  </tbody>\n",
1210 |        "</table>\n",
1211 |        "</div>"
1212 |       ],
1213 |       "text/plain": [
1214 |        "       gender  userid  movieid  age_desc                occ_desc  \\\n",
1215 |        "323753      M    3432     1220     25-34              programmer   \n",
1216 |        "572412      M    3579     2089     18-24  other or not specified   \n",
1217 |        "473636      M    3931     1680     25-34        customer service   \n",
1218 |        "308120      M    4378     1203     18-24     technician/engineer   \n",
1219 |        "621235      M    4251     2301     45-49    executive/managerial   \n",
1220 |        "280736      M    1482     1127     25-34    executive/managerial   \n",
1221 |        "251324      M    2414     1033     25-34       academic/educator   \n",
1222 |        "178892      M    4732      648     25-34         sales/marketing   \n",
1223 |        "723124      M    4439     2694     35-44      doctor/health care   \n",
1224 |        "83939       M     727      317     35-44                  lawyer   \n",
1225 |        "541832      M    4098     2001     35-44    executive/managerial   \n",
1226 |        "915511      M    1579     3499     25-34  other or not specified   \n",
1227 |        "405738      M    3934     1379     25-34  other or not specified   \n",
1228 |        "285899      F    1899     1147     45-49      doctor/health care   \n",
1229 |        "181963      F    4048      673     35-44       academic/educator   \n",
1230 |        "931626      M    3125     3555     25-34    executive/managerial   \n",
1231 |        "602570      M    5788     2193     25-34  other or not specified   \n",
1232 |        "516839      F    5530     1924     18-24    college/grad student   \n",
1233 |        "596378      M    2242     2162     18-24    college/grad student   \n",
1234 |        "336866      F    1516     1240     25-34              programmer   \n",
1235 |        "403557      F    4208     1376     35-44  other or not specified   \n",
1236 |        "532357      M    3101     1968     18-24               scientist   \n",
1237 |        "898240      M    5648     3421     25-34     technician/engineer   \n",
1238 |        "877298      M    2095     3301       56+                 retired   \n",
1239 |        "796428      M    1395     2968     25-34                  artist   \n",
1240 |        "732618      M     523     2716     50-55    executive/managerial   \n",
1241 |        "627537      F    3713     2324     25-34    executive/managerial   \n",
1242 |        "40433       M    5677      141     25-34       academic/educator   \n",
1243 |        "621897      F    3464     2302     25-34         sales/marketing   \n",
1244 |        "904828      F    3205     3450     35-44           self-employed   \n",
1245 |        "387733      M    5111     1334     35-44                  artist   \n",
1246 |        "482628      F    1894     1721     35-44    executive/managerial   \n",
1247 |        "851826      M    4774     3176     45-49         sales/marketing   \n",
1248 |        "566304      M    4957     2072     25-34                  artist   \n",
1249 |        "523888      M    4533     1954     25-34              programmer   \n",
1250 |        "999132      M    2362     3948     25-34         sales/marketing   \n",
1251 |        "142695      F    2657      527  Under 18            K-12 student   \n",
1252 |        "169208      F    2926      595     18-24                  artist   \n",
1253 |        "950386      M    3554     3676     25-34               scientist   \n",
1254 |        "830422      F    5643     3098     35-44       academic/educator   \n",
1255 |        "974878      M    1820     3770     25-34                  writer   \n",
1256 |        "129671      M    1352      480     35-44                  writer   \n",
1257 |        "629803      M    5488     2333     25-34               scientist   \n",
1258 |        "80676       F    4208      302     35-44  other or not specified   \n",
1259 |        "851072      M    1548     3176     35-44       academic/educator   \n",
1260 |        "902891      M    4030     3444     35-44    executive/managerial   \n",
1261 |        "516319      F    4472     1923     45-49           self-employed   \n",
1262 |        "674849      M    5102     2474     25-34              programmer   \n",
1263 |        "184396      F    1051      708     25-34  other or not specified   \n",
1264 |        "737397      M    1854     2723     25-34  other or not specified   \n",
1265 |        "\n",
1266 |        "                                                    title  \\\n",
1267 |        "323753                         Blues Brothers, The (1980)   \n",
1268 |        "572412                    Rescuers Down Under, The (1990)   \n",
1269 |        "473636                               Sliding Doors (1998)   \n",
1270 |        "308120                                12 Angry Men (1957)   \n",
1271 |        "621235                History of the World: Part I (1981)   \n",
1272 |        "280736                                  Abyss, The (1989)   \n",
1273 |        "251324                      Fox and the Hound, The (1981)   \n",
1274 |        "178892                         Mission: Impossible (1996)   \n",
1275 |        "723124                                   Big Daddy (1999)   \n",
1276 |        "83939                            Santa Clause, The (1994)   \n",
1277 |        "541832                             Lethal Weapon 2 (1989)   \n",
1278 |        "915511                                      Misery (1990)   \n",
1279 |        "405738                               Young Guns II (1990)   \n",
1280 |        "285899                          When We Were Kings (1996)   \n",
1281 |        "181963                                   Space Jam (1996)   \n",
1282 |        "931626                                       U-571 (2000)   \n",
1283 |        "602570                                      Willow (1988)   \n",
1284 |        "516839                     Plan 9 from Outer Space (1958)   \n",
1285 |        "596378  NeverEnding Story II: The Next Chapter, The (1...   \n",
1286 |        "336866                             Terminator, The (1984)   \n",
1287 |        "403557               Star Trek IV: The Voyage Home (1986)   \n",
1288 |        "532357                         Breakfast Club, The (1985)   \n",
1289 |        "898240                                Animal House (1978)   \n",
1290 |        "877298                       Whole Nine Yards, The (2000)   \n",
1291 |        "796428                                Time Bandits (1981)   \n",
1292 |        "732618                                Ghostbusters (1984)   \n",
1293 |        "627537         Life Is Beautiful (La Vita è bella) (1997)   \n",
1294 |        "40433                                Birdcage, The (1996)   \n",
1295 |        "621897                             My Cousin Vinny (1992)   \n",
1296 |        "904828                              Grumpy Old Men (1993)   \n",
1297 |        "387733                                   Blob, The (1958)   \n",
1298 |        "482628                                     Titanic (1997)   \n",
1299 |        "851826                    Talented Mr. Ripley, The (1999)   \n",
1300 |        "566304                                 'burbs, The (1989)   \n",
1301 |        "523888                                       Rocky (1976)   \n",
1302 |        "999132                            Meet the Parents (2000)   \n",
1303 |        "142695                            Schindler's List (1993)   \n",
1304 |        "169208                        Beauty and the Beast (1991)   \n",
1305 |        "950386                                  Eraserhead (1977)   \n",
1306 |        "830422                                Natural, The (1984)   \n",
1307 |        "974878                                  Dreamscape (1984)   \n",
1308 |        "129671                               Jurassic Park (1993)   \n",
1309 |        "629803                           Gods and Monsters (1998)   \n",
1310 |        "80676               Queen Margot (La Reine Margot) (1994)   \n",
1311 |        "851072                    Talented Mr. Ripley, The (1999)   \n",
1312 |        "902891                                  Bloodsport (1988)   \n",
1313 |        "516319                There's Something About Mary (1998)   \n",
1314 |        "674849                         Color of Money, The (1986)   \n",
1315 |        "184396                Truth About Cats & Dogs, The (1996)   \n",
1316 |        "737397                                 Mystery Men (1999)   \n",
1317 |        "\n",
1318 |        "                                                genre  rating  \\\n",
1319 |        "323753                          Action|Comedy|Musical       4   \n",
1320 |        "572412                           Animation|Children's       3   \n",
1321 |        "473636                                  Drama|Romance       4   \n",
1322 |        "308120                                          Drama       3   \n",
1323 |        "621235                                         Comedy       2   \n",
1324 |        "280736               Action|Adventure|Sci-Fi|Thriller       3   \n",
1325 |        "251324                           Animation|Children's       4   \n",
1326 |        "178892                       Action|Adventure|Mystery       3   \n",
1327 |        "723124                                         Comedy       3   \n",
1328 |        "83939                       Children's|Comedy|Fantasy       4   \n",
1329 |        "541832                      Action|Comedy|Crime|Drama       4   \n",
1330 |        "915511                                         Horror       4   \n",
1331 |        "405738                          Action|Comedy|Western       2   \n",
1332 |        "285899                                    Documentary       4   \n",
1333 |        "181963  Adventure|Animation|Children's|Comedy|Fantasy       4   \n",
1334 |        "931626                                Action|Thriller       4   \n",
1335 |        "602570                       Action|Adventure|Fantasy       3   \n",
1336 |        "516839                                  Horror|Sci-Fi       3   \n",
1337 |        "596378                   Adventure|Children's|Fantasy       1   \n",
1338 |        "336866                         Action|Sci-Fi|Thriller       4   \n",
1339 |        "403557                        Action|Adventure|Sci-Fi       5   \n",
1340 |        "532357                                   Comedy|Drama       5   \n",
1341 |        "898240                                         Comedy       4   \n",
1342 |        "877298                                   Comedy|Crime       3   \n",
1343 |        "796428                       Adventure|Fantasy|Sci-Fi       5   \n",
1344 |        "732618                                  Comedy|Horror       3   \n",
1345 |        "627537                                   Comedy|Drama       5   \n",
1346 |        "40433                                          Comedy       5   \n",
1347 |        "621897                                         Comedy       3   \n",
1348 |        "904828                                         Comedy       4   \n",
1349 |        "387733                                  Horror|Sci-Fi       4   \n",
1350 |        "482628                                  Drama|Romance       4   \n",
1351 |        "851826                         Drama|Mystery|Thriller       4   \n",
1352 |        "566304                                         Comedy       3   \n",
1353 |        "523888                                   Action|Drama       3   \n",
1354 |        "999132                                         Comedy       5   \n",
1355 |        "142695                                      Drama|War       5   \n",
1356 |        "169208                   Animation|Children's|Musical       4   \n",
1357 |        "950386                                   Drama|Horror       4   \n",
1358 |        "830422                                          Drama       3   \n",
1359 |        "974878                Adventure|Crime|Sci-Fi|Thriller       4   \n",
1360 |        "129671                        Action|Adventure|Sci-Fi       3   \n",
1361 |        "629803                                          Drama       5   \n",
1362 |        "80676                                   Drama|Romance       2   \n",
1363 |        "851072                         Drama|Mystery|Thriller       5   \n",
1364 |        "902891                                         Action       4   \n",
1365 |        "516319                                         Comedy       2   \n",
1366 |        "674849                                          Drama       4   \n",
1367 |        "184396                                 Comedy|Romance       4   \n",
1368 |        "737397                        Action|Adventure|Comedy       2   \n",
1369 |        "\n",
1370 |        "        prediction (rnd.)  prediction (prc.)  \n",
1371 |        "323753                  4           3.595490  \n",
1372 |        "572412                  3           2.909302  \n",
1373 |        "473636                  4           3.858740  \n",
1374 |        "308120                  3           3.197440  \n",
1375 |        "621235                  3           3.245275  \n",
1376 |        "280736                  4           3.565892  \n",
1377 |        "251324                  4           3.941283  \n",
1378 |        "178892                  3           3.208387  \n",
1379 |        "723124                  3           3.135849  \n",
1380 |        "83939                   4           3.934950  \n",
1381 |        "541832                  4           4.141999  \n",
1382 |        "915511                  4           4.010313  \n",
1383 |        "405738                  2           2.088194  \n",
1384 |        "285899                  4           4.463424  \n",
1385 |        "181963                  4           4.013864  \n",
1386 |        "931626                  4           3.929566  \n",
1387 |        "602570                  4           3.773484  \n",
1388 |        "516839                  3           2.927153  \n",
1389 |        "596378                  3           3.229770  \n",
1390 |        "336866                  3           3.214885  \n",
1391 |        "403557                  4           3.898657  \n",
1392 |        "532357                  4           4.322102  \n",
1393 |        "898240                  4           4.268606  \n",
1394 |        "877298                  4           3.500426  \n",
1395 |        "796428                  5           4.660578  \n",
1396 |        "732618                  4           4.430697  \n",
1397 |        "627537                  5           4.849105  \n",
1398 |        "40433                   4           3.652851  \n",
1399 |        "621897                  4           3.703601  \n",
1400 |        "904828                  3           3.459073  \n",
1401 |        "387733                  4           3.665181  \n",
1402 |        "482628                  4           3.531790  \n",
1403 |        "851826                  4           4.008468  \n",
1404 |        "566304                  3           2.680943  \n",
1405 |        "523888                  4           3.779227  \n",
1406 |        "999132                  4           4.275411  \n",
1407 |        "142695                  5           4.537353  \n",
1408 |        "169208                  4           4.074231  \n",
1409 |        "950386                  3           2.755942  \n",
1410 |        "830422                  3           3.336859  \n",
1411 |        "974878                  4           3.603387  \n",
1412 |        "129671                  3           2.773809  \n",
1413 |        "629803                  4           3.892385  \n",
1414 |        "80676                   4           4.066545  \n",
1415 |        "851072                  4           4.229926  \n",
1416 |        "902891                  3           3.480523  \n",
1417 |        "516319                  3           3.330519  \n",
1418 |        "674849                  4           4.380075  \n",
1419 |        "184396                  3           3.434905  \n",
1420 |        "737397                  2           2.071134  "
1421 |       ]
1422 |      },
1423 |      "execution_count": 17,
1424 |      "metadata": {},
1425 |      "output_type": "execute_result"
1426 |     }
1427 |    ],
1428 |    "source": [
1429 |     "results = validset[[\"gender\", \"userid\",\"movieid\",\"age_desc\",\"occ_desc\", \"title\", \"genre\", \"rating\"]].copy()\n",
1430 |     "results[\"prediction (rnd.)\"] = np.asarray(np.round(final_valid_predictions), dtype=np.int16)\n",
1431 |     "results[\"prediction (prc.)\"] = final_valid_predictions\n",
1432 |     "results.head(50)"
1433 |    ]
1434 |   },
1435 |   {
1436 |    "cell_type": "markdown",
1437 |    "metadata": {},
1438 |    "source": [
1439 |     "### Measures"
1440 |    ]
1441 |   },
1442 |   {
1443 |    "cell_type": "code",
1444 |    "execution_count": 12,
1445 |    "metadata": {
1446 |     "collapsed": true
1447 |    },
1448 |    "outputs": [],
1449 |    "source": [
1450 |     "# Per-rating precision and recall, plus MAE and RMSE, computed on the global `results` frame.\n",
1451 |     "\n",
1452 |     "def compute_recall(prediction_col, target_col):\n",
1453 |     "    recall = []\n",
1454 |     "    for rating in range(1, 6):  # rating values 1..5\n",
1455 |     "        rating_df = results[results[target_col] == rating]\n",
1456 |     "        num_true = len(rating_df)  # rows whose true rating is `rating`\n",
1457 |     "        num_hit = len(rating_df[rating_df[prediction_col] == rating])\n",
1458 |     "        recall.append(num_hit / num_true if num_true else float('nan'))\n",
1459 |     "    return recall\n",
1460 |     "\n",
1461 |     "def compute_precision(prediction_col, target_col):\n",
1462 |     "    precision = []\n",
1463 |     "    for rating in range(1, 6):  # rating values 1..5\n",
1464 |     "        pred_df = results[results[prediction_col] == rating]\n",
1465 |     "        num_pred = len(pred_df)  # rows whose predicted rating is `rating`\n",
1466 |     "        num_hit = len(pred_df[pred_df[target_col] == rating])\n",
1467 |     "        precision.append(num_hit / num_pred if num_pred else float('nan'))\n",
1468 |     "    return precision\n",
1469 |     "\n",
1470 |     "def compute_mae(prediction_col, target_col):\n",
1471 |     "    return np.mean(np.abs(results[prediction_col] - results[target_col]))\n",
1472 |     "\n",
1473 |     "def compute_rmse(prediction_col, target_col):\n",
1474 |     "    return np.sqrt(1 / len(results) * np.sum((results[prediction_col] - results[target_col]) ** 2))"
1475 |    ]
1476 |   },
1477 |   {
1478 |    "cell_type": "code",
1479 |    "execution_count": 13,
1480 |    "metadata": {},
1481 |    "outputs": [
1482 |     {
1483 |      "data": {
1484 |       "text/plain": [
1485 |        "[0.12242835057676299,\n",
1486 |        " 0.2602769297190613,\n",
1487 |        " 0.502085911485909,\n",
1488 |        " 0.6385712646755134,\n",
1489 |        " 0.2782322852636389]"
1490 |       ]
1491 |      },
1492 |      "execution_count": 13,
1493 |      "metadata": {},
1494 |      "output_type": "execute_result"
1495 |     }
1496 |    ],
1497 |    "source": [
1498 |     "compute_recall('prediction (rnd.)', 'rating')"
1499 |    ]
1500 |   },
1501 |   {
1502 |    "cell_type": "code",
1503 |    "execution_count": 14,
1504 |    "metadata": {},
1505 |    "outputs": [
1506 |     {
1507 |      "data": {
1508 |       "text/plain": [
1509 |        "[0.6914036265950302,\n",
1510 |        " 0.3469681397738952,\n",
1511 |        " 0.4058722824965967,\n",
1512 |        " 0.45429626657053657,\n",
1513 |        " 0.6664430478073582]"
1514 |       ]
1515 |      },
1516 |      "execution_count": 14,
1517 |      "metadata": {},
1518 |      "output_type": "execute_result"
1519 |     }
1520 |    ],
1521 |    "source": [
1522 |     "compute_precision('prediction (rnd.)', 'rating')"
1523 |    ]
1524 |   },
1525 |   {
1526 |    "cell_type": "code",
1527 |    "execution_count": 15,
1528 |    "metadata": {},
1529 |    "outputs": [
1530 |     {
1531 |      "data": {
1532 |       "text/plain": [
1533 |        "0.6391991015220138"
1534 |       ]
1535 |      },
1536 |      "execution_count": 15,
1537 |      "metadata": {},
1538 |      "output_type": "execute_result"
1539 |     }
1540 |    ],
1541 |    "source": [
1542 |     "compute_mae('prediction (rnd.)', 'rating')"
1543 |    ]
1544 |   },
1545 |   {
1546 |    "cell_type": "code",
1547 |    "execution_count": 16,
1548 |    "metadata": {},
1549 |    "outputs": [
1550 |     {
1551 |      "data": {
1552 |       "text/plain": [
1553 |        "0.91609156911837586"
1554 |       ]
1555 |      },
1556 |      "execution_count": 16,
1557 |      "metadata": {},
1558 |      "output_type": "execute_result"
1559 |     }
1560 |    ],
1561 |    "source": [
1562 |     "compute_rmse('prediction (rnd.)', 'rating')"
1563 |    ]
1564 |   },
1565 |   {
1566 |    "cell_type": "code",
1567 |    "execution_count": null,
1568 |    "metadata": {
1569 |     "collapsed": true
1570 |    },
1571 |    "outputs": [],
1572 |    "source": []
1573 |   }
1574 |  ],
1575 |  "metadata": {
1576 |   "kernelspec": {
1577 |    "display_name": "Python 3",
1578 |    "language": "python",
1579 |    "name": "python3"
1580 |   },
1581 |   "language_info": {
1582 |    "codemirror_mode": {
1583 |     "name": "ipython",
1584 |     "version": 3
1585 |    },
1586 |    "file_extension": ".py",
1587 |    "mimetype": "text/x-python",
1588 |    "name": "python",
1589 |    "nbconvert_exporter": "python",
1590 |    "pygments_lexer": "ipython3",
1591 |    "version": "3.6.1"
1592 |   }
1593 |  },
1594 |  "nbformat": 4,
1595 |  "nbformat_minor": 2
1596 | }
1597 | 


--------------------------------------------------------------------------------
/Deep_Recommender_Tutorial_Strata_NY_2017.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datadynamo/strata_ny_2017_recommender_tutorial/db8631fc1892702aa909674a473093ae1796b274/Deep_Recommender_Tutorial_Strata_NY_2017.pdf


--------------------------------------------------------------------------------
/Deep_Wide_Learning.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "In this tutorial we learn how to use the tf.estimator API to jointly train a wide linear model and a deep feed-forward neural network. This approach combines the strengths of memorization and generalization.\n",
  8 |     "\n",
  9 |     "![wide and deep model](wide_n_deep.svg)\n",
 10 |     "\n",
 11 |     "The TensorFlow website explains the model as follows:\n",
 12 |     "\n",
 13 |     "The figure above shows a comparison of a wide model (logistic regression with sparse features and transformations), a deep model (feed-forward neural network with an embedding layer and several hidden layers), and a Wide & Deep model (joint training of both). At a high level, here are the steps using the tf.estimator API:\n",
 14 |     "\n",
 15 |     "1. Preprocess our movielens dataset in pandas.\n",
 16 |     "2. Define features\n",
 17 |     "3. Build inputs from the original dataset \n",
 18 |     "4. Hash string-typed categorical features, and use the values of integer-typed features directly as category ids.\n",
 19 |     "5. Create embeddings of sparse features for the deep model.\n",
 20 |     "6. Define features for both the deep and the wide part of the model.\n",
 21 |     "7. Train and validate the model."
 22 |    ]
 23 |   },
 24 |   {
 25 |    "cell_type": "code",
 26 |    "execution_count": 2,
 27 |    "metadata": {
 28 |     "collapsed": true
 29 |    },
 30 |    "outputs": [],
 31 |    "source": [
 32 |     "import tensorflow as tf\n",
 33 |     "import pandas as pd\n",
 34 |     "import matplotlib.pyplot as plt\n",
 35 |     "import numpy as np\n",
 36 |     "%matplotlib inline"
 37 |    ]
 38 |   },
 39 |   {
 40 |    "cell_type": "markdown",
 41 |    "metadata": {},
 42 |    "source": [
 43 |     "### 1. data preprocessing"
 44 |    ]
 45 |   },
 46 |   {
 47 |    "cell_type": "markdown",
 48 |    "metadata": {},
 49 |    "source": [
 50 |     "Load the dataset and split it into training and validation sets. This is the same step shown in the previous notebook on collaborative filtering."
 51 |    ]
 52 |   },
 53 |   {
 54 |    "cell_type": "code",
 55 |    "execution_count": 3,
 56 |    "metadata": {
 57 |     "collapsed": true
 58 |    },
 59 |    "outputs": [],
 60 |    "source": [
 61 |     "def load_movie_lens():\n",
 62 |     "    age_desc = {\n",
 63 |     "        1: \"Under 18\", 18: \"18-24\", 25: \"25-34\", 35: \"35-44\", 45: \"45-49\", 50: \"50-55\", 56: \"56+\"\n",
 64 |     "    }\n",
 65 |     "    occupation_desc = { \n",
 66 |     "        0: \"other or not specified\", 1: \"academic/educator\", 2: \"artist\", 3: \"clerical/admin\",\n",
 67 |     "        4: \"college/grad student\", 5: \"customer service\", 6: \"doctor/health care\",\n",
 68 |     "        7: \"executive/managerial\", 8: \"farmer\", 9: \"homemaker\", 10: \"K-12 student\", 11: \"lawyer\",\n",
 69 |     "        12: \"programmer\", 13: \"retired\", 14: \"sales/marketing\", 15: \"scientist\", 16: \"self-employed\",\n",
 70 |     "        17: \"technician/engineer\", 18: \"tradesman/craftsman\", 19: \"unemployed\", 20: \"writer\"\n",
 71 |     "    }\n",
 72 |     "    rating_data = pd.read_csv(\n",
 73 |     "        \"ml-1m/ratings.dat\",\n",
 74 |     "        sep=\"::\",\n",
 75 |     "        engine=\"python\",\n",
 76 |     "        encoding=\"latin-1\",\n",
 77 |     "        names=['userid', 'movieid', 'rating', 'timestamp'])\n",
 78 |     "    user_data = pd.read_csv(\n",
 79 |     "        \"ml-1m/users.dat\", \n",
 80 |     "        sep='::', \n",
 81 |     "        engine='python', \n",
 82 |     "        encoding='latin-1',\n",
 83 |     "        names=['userid', 'gender', 'age', 'occupation', 'zipcode']\n",
 84 |     "    )\n",
 85 |     "    user_data['age_desc'] = user_data['age'].apply(lambda x: age_desc[x])\n",
 86 |     "    user_data['occ_desc'] = user_data['occupation'].apply(lambda x: occupation_desc[x])\n",
 87 |     "    movie_data = pd.read_csv(\n",
 88 |     "        \"ml-1m/movies.dat\",\n",
 89 |     "        sep='::', \n",
 90 |     "        engine='python', \n",
 91 |     "        encoding='latin-1',\n",
 92 |     "        names=['movieid', 'title', 'genre']\n",
 93 |     "    )\n",
 94 |     "    dataset = pd.merge(pd.merge(rating_data, movie_data, how=\"left\", on=\"movieid\"), user_data, how=\"left\", on=\"userid\")\n",
 95 |     "    adj_col = dataset['movieid']\n",
 96 |     "    adj_col_uni = adj_col.sort_values().unique()\n",
 97 |     "    adj_df = pd.DataFrame(adj_col_uni).reset_index().rename(columns = {0:'movieid','index':'adj_movieid'})\n",
 98 |     "    dataset = pd.merge(adj_df,dataset,how=\"right\", on=\"movieid\")\n",
 99 |     "    dataset['adj_userid'] = dataset['userid'] - 1\n",
100 |     "    return dataset\n",
101 |     "\n",
102 |     "def split_dataset(dataset, split_frac=.7):\n",
103 |     "    dataset = dataset.sample(frac=1, replace=False)\n",
104 |     "    n_split = int(len(dataset)*split_frac)\n",
105 |     "    trainset = dataset[:n_split]\n",
106 |     "    validset = dataset[n_split:]\n",
107 |     "    return trainset, validset\n",
108 |     "\n",
109 |     "fullset = load_movie_lens()\n",
110 |     "trainset, validset = split_dataset(fullset)"
111 |    ]
112 |   },
113 |   {
114 |    "cell_type": "markdown",
115 |    "metadata": {},
116 |    "source": [
117 |     "### 2. define features"
118 |    ]
119 |   },
120 |   {
121 |    "cell_type": "markdown",
122 |    "metadata": {},
123 |    "source": [
124 |     "By looking at the dataset, we know that we have the following features: \"genre\", \"zipcode\", \"gender\", \"age\", \"occupation\".\n",
125 |     " - \"genre\", \"zipcode\" and \"gender\" are strings, while \"age\" and \"occupation\" are ints, so we group the features into STR and INT groups for encoding.\n",
126 |     " - We select all features for the deep model.\n",
127 |     " - We select some cross-feature transformations for the wide model.<br>\n",
128 |     "\n",
129 |     "<font color=blue>The feature selection for the deep and wide parts of the model is flexible; you can try out different combinations.</font>"
130 |    ]
131 |   },
132 |   {
133 |    "cell_type": "code",
134 |    "execution_count": 4,
135 |    "metadata": {
136 |     "collapsed": true
137 |    },
138 |    "outputs": [],
139 |    "source": [
140 |     "CAT_STR_COLS = [\"genre\", \"zipcode\" ,\"gender\"]\n",
141 |     "CAT_INT_COLS = [\"age\", \"occupation\"]\n",
142 |     "LABEL_COL = \"rating\"\n",
143 |     "DEEP_COLS = CAT_STR_COLS + CAT_INT_COLS\n",
144 |     "WIDE_COL_CROSSES = [[\"gender\", \"age\"], [\"gender\", \"occupation\"]]"
145 |    ]
146 |   },
147 |   {
148 |    "cell_type": "markdown",
149 |    "metadata": {},
150 |    "source": [
151 |     "### 3. build inputs from original dataset"
152 |    ]
153 |   },
154 |   {
155 |    "cell_type": "markdown",
156 |    "metadata": {},
157 |    "source": [
158 |     "Since the Deep and Wide model API expects sparse tensors as inputs, we convert all the feature columns of our original dataset to sparse tensors and the label column to a constant tensor."
159 |    ]
160 |   },
161 |   {
162 |    "cell_type": "code",
163 |    "execution_count": 5,
164 |    "metadata": {
165 |     "collapsed": true
166 |    },
167 |    "outputs": [],
168 |    "source": [
169 |     "def make_inputs(dataframe):\n",
170 |     "    \"\"\"\n",
171 |     "    Creates sparse tensors to hold our feature values and constants to hold our label values.\n",
172 |     "    For each feature we have selected for the deep and wide model, we create a sparse tensor. We use tf.SparseTensor \n",
173 |     "    to create sparse tensors for features, and use tf.constant to create a constant with label values.\n",
174 |     "    \n",
175 |     "    Arguments:\n",
176 |     "    dataframe -- pandas dataframe containing the values of features and labels.\n",
177 |     "    \n",
178 |     "    Returns:\n",
179 |     "    feature_inputs -- a dictionary of sparse tensors of features.\n",
180 |     "    label_input -- a constant of shape [number of examples] holding the ratings shifted from 1-5 to class ids 0-4.\n",
181 |     "    \"\"\" \n",
182 |     "    feature_inputs = {\n",
183 |     "        col_name: tf.SparseTensor(\n",
184 |     "            indices = [[i, 0] for i in range(len(dataframe[col_name]))],\n",
185 |     "            values = dataframe[col_name].values,\n",
186 |     "            dense_shape = [len(dataframe[col_name]), 1]\n",
187 |     "        )\n",
188 |     "        for col_name in CAT_STR_COLS + CAT_INT_COLS\n",
189 |     "    }\n",
190 |     "    label_input = tf.constant(dataframe[LABEL_COL].values-1)\n",
191 |     "    return (feature_inputs, label_input)"
192 |    ]
193 |   },
194 |   {
195 |    "cell_type": "markdown",
196 |    "metadata": {},
197 |    "source": [
198 |     "### 4. create hash buckets for categorical features"
199 |    ]
200 |   },
201 |   {
202 |    "cell_type": "markdown",
203 |    "metadata": {},
204 |    "source": [
205 |     "Here we define two functions to encode string type categorical features and int type categorical features."
206 |    ]
207 |   },
208 |   {
209 |    "cell_type": "code",
210 |    "execution_count": 6,
211 |    "metadata": {
212 |     "collapsed": true
213 |    },
214 |    "outputs": [],
215 |    "source": [
216 |     "def make_hash_columns(CAT_STR_COLS):\n",
217 |     "    \"\"\"\n",
218 |     "    Use tf.feature_column.categorical_column_with_hash_bucket to encode the string type categorical features.\n",
219 |     "    Documentation of this function from tensorflow:\n",
220 |     "        Use this when your sparse features are in string or integer format, and you want to distribute your inputs \n",
221 |     "        into a finite number of buckets by hashing. output_id = Hash(input_feature_string) % bucket_size.\n",
222 |     "    \n",
223 |     "    Arguments:\n",
224 |     "    CAT_STR_COLS -- string type categorical columns.\n",
225 |     "    \n",
226 |     "    Returns:\n",
227 |     "    hashed_columns -- list of hashed categorical columns, one per name in CAT_STR_COLS (3 here).\n",
228 |     "    \n",
229 |     "    \"\"\"\n",
230 |     "    \n",
231 |     "    hashed_columns = [\n",
232 |     "        tf.feature_column.categorical_column_with_hash_bucket(col_name, hash_bucket_size=1000) \n",
233 |     "        for col_name in CAT_STR_COLS\n",
234 |     "    ]\n",
235 |     "    return hashed_columns"
236 |    ]
237 |   },
238 |   {
239 |    "cell_type": "code",
240 |    "execution_count": 7,
241 |    "metadata": {
242 |     "collapsed": true
243 |    },
244 |    "outputs": [],
245 |    "source": [
246 |     "def make_int_columns(CAT_INT_COLS):   \n",
247 |     "    \"\"\"\n",
248 |     "    Use tf.feature_column.categorical_column_with_identity to encode the int type categorical features.\n",
249 |     "    Documentation of this function from tensorflow:\n",
250 |     "        Use this when your inputs are integers in the range [0, num_buckets), and you want to use the \n",
251 |     "        input value itself as the categorical ID.\n",
252 |     "    \n",
253 |     "    Arguments:\n",
254 |     "    CAT_INT_COLS -- int type categorical columns.\n",
255 |     "    \n",
256 |     "    Returns:\n",
257 |     "    int_columns -- list of identity categorical columns, one per name in CAT_INT_COLS (2 here).\n",
258 |     "    \n",
259 |     "    \"\"\"\n",
260 |     "    int_columns = [\n",
261 |     "        tf.feature_column.categorical_column_with_identity(col_name, num_buckets=1000, default_value=0)\n",
262 |     "        for col_name in CAT_INT_COLS\n",
263 |     "    ]\n",
264 |     "    return int_columns"
265 |    ]
266 |   },
267 |   {
268 |    "cell_type": "markdown",
269 |    "metadata": {},
270 |    "source": [
271 |     "The TensorFlow tutorial instead uses tf.feature_column.categorical_column_with_vocabulary_list to create the int type categorical columns:\n",
272 |     "<div class=\"alert alert-block alert-info\">\n",
273 |     "    age = tf.feature_column.categorical_column_with_vocabulary_list(\"age\", [1,18, 25, 35, 45, 50, 56])<br>\n",
274 |     "    occupation = tf.feature_column.categorical_column_with_vocabulary_list(\"occupation\", [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20])<br>\n",
275 |     "    age = tf.feature_column.indicator_column(age)<br>\n",
276 |     "    occupation = tf.feature_column.indicator_column(occupation)<br>\n",
277 |     "</div>"
278 |    ]
279 |   },
280 |   {
281 |    "cell_type": "markdown",
282 |    "metadata": {},
283 |    "source": [
284 |     "### 5. create embedding for sparse features "
285 |    ]
286 |   },
287 |   {
288 |    "cell_type": "markdown",
289 |    "metadata": {},
290 |    "source": [
291 |     "Create dense embeddings for the sparse features to feed into the DNN."
292 |    ]
293 |   },
294 |   {
295 |    "cell_type": "code",
296 |    "execution_count": 8,
297 |    "metadata": {
298 |     "collapsed": true
299 |    },
300 |    "outputs": [],
301 |    "source": [
302 |     "def make_embeddings(hashed_columns, int_columns, dim=6):\n",
303 |     "    \"\"\"\n",
304 |     "    Create embeddings for sparse features in the deep model. We use tf.feature_column.embedding_column to \n",
305 |     "    convert the categorical columns we have created in the above steps to a dense representation.\n",
306 |     "    \n",
307 |     "    Arguments:\n",
308 |     "    hashed_columns -- all categorical columns that came out of the make_hash_columns function \n",
309 |     "                    and are going to be fed into the DNN.\n",
310 |     "    int_columns -- all categorical columns that came out of the make_int_columns function \n",
311 |     "                    and are going to be fed into the DNN.\n",
312 |     "    dim -- embedding dimension for the features, a hyper-parameter (default 6).\n",
313 |     "    \n",
314 |     "    Returns:\n",
315 |     "    embedding_layers -- list of columns with dense (embedded) representations.\n",
316 |     "    \n",
317 |     "    \"\"\" \n",
318 |     "    embedding_layers = [\n",
319 |     "        tf.feature_column.embedding_column(\n",
320 |     "            column,\n",
321 |     "            dimension=dim\n",
322 |     "        )\n",
323 |     "        for column in hashed_columns+int_columns\n",
324 |     "    ]\n",
325 |     "    return embedding_layers"
326 |    ]
327 |   },
328 |   {
329 |    "cell_type": "markdown",
330 |    "metadata": {},
331 |    "source": [
332 |     "### 6. define features for the wide part"
333 |    ]
334 |   },
335 |   {
336 |    "cell_type": "markdown",
337 |    "metadata": {},
338 |    "source": [
339 |     "In our case, all the columns from the embedding layers go into the deep model, so the deep model input is simply embedding_layers and we do not write a function for it. The function below only builds the crossed features for the wide model."
340 |    ]
341 |   },
342 |   {
343 |    "cell_type": "code",
344 |    "execution_count": 9,
345 |    "metadata": {
346 |     "collapsed": true
347 |    },
348 |    "outputs": [],
349 |    "source": [
350 |     "def make_wide_input_layers(WIDE_COL_CROSSES):\n",
351 |     "    \"\"\"\n",
352 |     "    Make input cross features for the wide model. We use the tf.feature_column.crossed_column function to hash the \n",
353 |     "    cross transformation.\n",
354 |     "    \n",
355 |     "    Arguments:\n",
356 |     "    WIDE_COL_CROSSES -- cross feature combinations.\n",
357 |     "    \n",
358 |     "    Returns:\n",
359 |     "    crossed_wide_input_layers -- input cross features for the wide model.\n",
360 |     "    \n",
361 |     "    \"\"\" \n",
362 |     "    crossed_wide_input_layers = [\n",
363 |     "        tf.feature_column.crossed_column([c for c in cs], hash_bucket_size=int(10**(3+len(cs))))\n",
364 |     "        for cs in WIDE_COL_CROSSES\n",
365 |     "    ]\n",
366 |     "    return crossed_wide_input_layers"
367 |    ]
368 |   },
369 |   {
370 |    "cell_type": "markdown",
371 |    "metadata": {},
372 |    "source": [
373 |     "### 7. train and validate the model"
374 |    ]
375 |   },
376 |   {
377 |    "cell_type": "markdown",
378 |    "metadata": {},
379 |    "source": [
380 |     "Here we provide the input features for the deep model and the wide model, define the number of layers and the layer sizes of the DNN, \n",
381 |     "and create the model with tf.contrib.learn.DNNLinearCombinedClassifier. We save the model in the directory ./model/"
382 |    ]
383 |   },
384 |   {
385 |    "cell_type": "code",
386 |    "execution_count": 23,
387 |    "metadata": {
388 |     "scrolled": false
389 |    },
390 |    "outputs": [
391 |     {
392 |      "name": "stdout",
393 |      "output_type": "stream",
394 |      "text": [
395 |       "create input layers...done!\n",
396 |       "create model...INFO:tensorflow:Using config: {'_task_type': None, '_task_id': 0, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fcf41ae5668>, '_master': '', '_num_ps_replicas': 0, '_num_worker_replicas': 0, '_environment': 'local', '_is_chief': True, '_evaluation_master': '', '_tf_config': gpu_options {\n",
397 |       "  per_process_gpu_memory_fraction: 1.0\n",
398 |       "}\n",
399 |       ", '_tf_random_seed': None, '_save_summary_steps': 10, '_save_checkpoints_secs': 600, '_log_step_count_steps': 100, '_session_config': None, '_save_checkpoints_steps': None, '_keep_checkpoint_max': 1, '_keep_checkpoint_every_n_hours': 10000, '_model_dir': './model/'}\n",
400 |       "done!\n",
401 |       "training model...WARNING:tensorflow:From /home/cynthia/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/estimators/head.py:641: scalar_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.\n",
402 |       "Instructions for updating:\n",
403 |       "Please switch to tf.summary.scalar. Note that tf.summary.scalar uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, passing a tensor or list of tags to a scalar summary op is no longer supported.\n",
404 |       "WARNING:tensorflow:From /home/cynthia/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/optimizers.py:160: assert_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.\n",
405 |       "Instructions for updating:\n",
406 |       "Please switch to tf.train.assert_global_step\n",
407 |       "WARNING:tensorflow:From /home/cynthia/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/optimizers.py:160: assert_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.\n",
408 |       "Instructions for updating:\n",
409 |       "Please switch to tf.train.assert_global_step\n",
410 |       "INFO:tensorflow:Create CheckpointSaverHook.\n",
411 |       "INFO:tensorflow:Restoring parameters from ./model/model.ckpt-1000\n",
412 |       "INFO:tensorflow:Saving checkpoints for 1001 into ./model/model.ckpt.\n",
413 |       "INFO:tensorflow:loss = 1.44681, step = 1001\n",
414 |       "INFO:tensorflow:global_step/sec: 0.881767\n",
415 |       "INFO:tensorflow:loss = 1.44639, step = 1101 (113.410 sec)\n",
416 |       "INFO:tensorflow:global_step/sec: 0.919119\n",
417 |       "INFO:tensorflow:loss = 1.44599, step = 1201 (108.799 sec)\n",
418 |       "INFO:tensorflow:global_step/sec: 0.917763\n",
419 |       "INFO:tensorflow:loss = 1.4456, step = 1301 (108.961 sec)\n",
420 |       "INFO:tensorflow:global_step/sec: 0.913875\n",
421 |       "INFO:tensorflow:loss = 1.44523, step = 1401 (109.424 sec)\n",
422 |       "INFO:tensorflow:global_step/sec: 0.917907\n",
423 |       "INFO:tensorflow:loss = 1.44485, step = 1501 (108.944 sec)\n",
424 |       "INFO:tensorflow:Saving checkpoints for 1547 into ./model/model.ckpt.\n",
425 |       "INFO:tensorflow:global_step/sec: 0.907156\n",
426 |       "INFO:tensorflow:loss = 1.44448, step = 1601 (110.235 sec)\n",
427 |       "INFO:tensorflow:global_step/sec: 0.781231\n",
428 |       "INFO:tensorflow:loss = 1.44411, step = 1701 (128.003 sec)\n",
429 |       "INFO:tensorflow:global_step/sec: 0.629334\n",
430 |       "INFO:tensorflow:loss = 1.44374, step = 1801 (158.898 sec)\n",
431 |       "INFO:tensorflow:global_step/sec: 0.627976\n",
432 |       "INFO:tensorflow:loss = 1.44336, step = 1901 (159.244 sec)\n",
433 |       "INFO:tensorflow:Saving checkpoints for 1960 into ./model/model.ckpt.\n",
434 |       "INFO:tensorflow:Saving checkpoints for 2000 into ./model/model.ckpt.\n",
435 |       "INFO:tensorflow:Loss for final step: 1.44299.\n",
436 |       "done!\n",
437 |       "evaluating model...WARNING:tensorflow:From /home/cynthia/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/estimators/head.py:641: scalar_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.\n",
438 |       "Instructions for updating:\n",
439 |       "Please switch to tf.summary.scalar. Note that tf.summary.scalar uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, passing a tensor or list of tags to a scalar summary op is no longer supported.\n",
440 |       "INFO:tensorflow:Starting evaluation at 2017-09-22-18:15:36\n",
441 |       "INFO:tensorflow:Restoring parameters from ./model/model.ckpt-2000\n",
442 |       "INFO:tensorflow:Evaluation [1/1]\n",
443 |       "INFO:tensorflow:Finished evaluation at 2017-09-22-18:15:40\n",
444 |       "INFO:tensorflow:Saving dict for global step 2000: accuracy = 0.34907, global_step = 2000, loss = 1.44281\n",
445 |       "done!\n",
446 |       "calculating predictions...INFO:tensorflow:Restoring parameters from ./model/model.ckpt-2000\n",
447 |       "done!\n",
448 |       "calculating probabilites...INFO:tensorflow:Restoring parameters from ./model/model.ckpt-2000\n",
449 |       "done!\n"
450 |      ]
451 |     }
452 |    ],
453 |    "source": [
454 |     "print(\"create input layers...\", end=\"\")\n",
455 |     "hash_columns = make_hash_columns(CAT_STR_COLS)\n",
456 |     "int_columns = make_int_columns(CAT_INT_COLS)\n",
457 |     "embedding_layers = make_embeddings(hash_columns, int_columns,dim =6)\n",
458 |     "deep_input_layers = embedding_layers\n",
459 |     "wide_input_layers = make_wide_input_layers(WIDE_COL_CROSSES)\n",
460 |     "print(\"done!\")\n",
461 |     "print(\"create model...\", end=\"\")\n",
462 |     "model = tf.contrib.learn.DNNLinearCombinedClassifier(\n",
463 |     "    n_classes=5,\n",
464 |     "    linear_feature_columns = wide_input_layers,\n",
465 |     "    dnn_feature_columns = deep_input_layers,\n",
466 |     "    dnn_hidden_units = [32, 16],\n",
467 |     "    fix_global_step_increment_bug=True,\n",
468 |     "    config = tf.contrib.learn.RunConfig(\n",
469 |     "        keep_checkpoint_max = 1,\n",
470 |     "        save_summary_steps = 10,\n",
471 |     "        model_dir = \"./model/\"\n",
472 |     "    )\n",
473 |     ")\n",
474 |     "print(\"done!\")\n",
475 |     "print(\"training model...\", end=\"\")\n",
476 |     "model.fit(input_fn = lambda: make_inputs(trainset), steps=1000)\n",
477 |     "print(\"done!\")\n",
478 |     "print(\"evaluating model...\", end=\"\")\n",
479 |     "results = model.evaluate(input_fn = lambda: make_inputs(validset), steps=1)\n",
480 |     "print(\"done!\")\n",
481 |     "print(\"calculating predictions...\", end=\"\")\n",
482 |     "predictions = model.predict_classes(input_fn = lambda: make_inputs(validset))\n",
483 |     "print(\"done!\")\n",
484 |     "print(\"calculating probabilites...\", end=\"\")\n",
485 |     "probabilities = model.predict_proba(input_fn = lambda: make_inputs(validset))\n",
486 |     "print(\"done!\")"
487 |    ]
488 |   },
489 |   {
490 |    "cell_type": "code",
491 |    "execution_count": 12,
492 |    "metadata": {},
493 |    "outputs": [
494 |     {
495 |      "name": "stdout",
496 |      "output_type": "stream",
497 |      "text": [
498 |       "loss: 1.4464459\n",
499 |       "accuracy: 0.34885341\n",
500 |       "global_step: 1000\n"
501 |      ]
502 |     }
503 |    ],
504 |    "source": [
505 |     "for n, r in results.items():\n",
506 |     "    print(\"%s: %a\"%(n, r))"
507 |    ]
508 |   },
509 |   {
510 |    "cell_type": "code",
511 |    "execution_count": 24,
512 |    "metadata": {
513 |     "collapsed": true
514 |    },
515 |    "outputs": [],
516 |    "source": [
517 |     "predict = list(predictions)"
518 |    ]
519 |   },
520 |   {
521 |    "cell_type": "code",
522 |    "execution_count": 27,
523 |    "metadata": {
524 |     "collapsed": true
525 |    },
526 |    "outputs": [],
527 |    "source": [
528 |     "prob = list(probabilities)"
529 |    ]
530 |   },
531 |   {
532 |    "cell_type": "code",
533 |    "execution_count": 25,
534 |    "metadata": {},
535 |    "outputs": [
536 |     {
537 |      "name": "stdout",
538 |      "output_type": "stream",
539 |      "text": [
540 |       "DNN Accuracy: 34.907003%\n"
541 |      ]
542 |     }
543 |    ],
544 |    "source": [
545 |     "dnw_accuracy = np.sum(np.asarray(predict)+1 == validset.rating.values) / len(validset)\n",
546 |     "print(\"DNW Accuracy: %f%%\"%(dnw_accuracy*100,))"
547 |    ]
548 |   },
549 |   {
550 |    "cell_type": "code",
551 |    "execution_count": 42,
552 |    "metadata": {},
553 |    "outputs": [
554 |     {
555 |      "data": {
556 |       "text/html": [
557 |        "<div>\n",
558 |        "<style>\n",
559 |        "    .dataframe thead tr:only-child th {\n",
560 |        "        text-align: right;\n",
561 |        "    }\n",
562 |        "\n",
563 |        "    .dataframe thead th {\n",
564 |        "        text-align: left;\n",
565 |        "    }\n",
566 |        "\n",
567 |        "    .dataframe tbody tr th {\n",
568 |        "        vertical-align: top;\n",
569 |        "    }\n",
570 |        "</style>\n",
571 |        "<table border=\"1\" class=\"dataframe\">\n",
572 |        "  <thead>\n",
573 |        "    <tr style=\"text-align: right;\">\n",
574 |        "      <th></th>\n",
575 |        "      <th>gender</th>\n",
576 |        "      <th>age_desc</th>\n",
577 |        "      <th>occ_desc</th>\n",
578 |        "      <th>title</th>\n",
579 |        "      <th>genre</th>\n",
580 |        "      <th>rating</th>\n",
581 |        "      <th>prediction</th>\n",
582 |        "      <th>rating1</th>\n",
583 |        "      <th>rating2</th>\n",
584 |        "      <th>rating3</th>\n",
585 |        "      <th>rating4</th>\n",
586 |        "      <th>rating5</th>\n",
587 |        "    </tr>\n",
588 |        "  </thead>\n",
589 |        "  <tbody>\n",
590 |        "    <tr>\n",
591 |        "      <th>223742</th>\n",
592 |        "      <td>M</td>\n",
593 |        "      <td>35-44</td>\n",
594 |        "      <td>clerical/admin</td>\n",
595 |        "      <td>Little Princess, The (1939)</td>\n",
596 |        "      <td>Children's|Drama</td>\n",
597 |        "      <td>4</td>\n",
598 |        "      <td>4</td>\n",
599 |        "      <td>0.025210</td>\n",
600 |        "      <td>0.068422</td>\n",
601 |        "      <td>0.234015</td>\n",
602 |        "      <td>0.375578</td>\n",
603 |        "      <td>0.296774</td>\n",
604 |        "    </tr>\n",
605 |        "    <tr>\n",
606 |        "      <th>915512</th>\n",
607 |        "      <td>M</td>\n",
608 |        "      <td>25-34</td>\n",
609 |        "      <td>unemployed</td>\n",
610 |        "      <td>Misery (1990)</td>\n",
611 |        "      <td>Horror</td>\n",
612 |        "      <td>4</td>\n",
613 |        "      <td>3</td>\n",
614 |        "      <td>0.105347</td>\n",
615 |        "      <td>0.148656</td>\n",
616 |        "      <td>0.290903</td>\n",
617 |        "      <td>0.287796</td>\n",
618 |        "      <td>0.167298</td>\n",
619 |        "    </tr>\n",
620 |        "    <tr>\n",
621 |        "      <th>209015</th>\n",
622 |        "      <td>M</td>\n",
623 |        "      <td>25-34</td>\n",
624 |        "      <td>technician/engineer</td>\n",
625 |        "      <td>Supercop (1992)</td>\n",
626 |        "      <td>Action|Thriller</td>\n",
627 |        "      <td>3</td>\n",
628 |        "      <td>4</td>\n",
629 |        "      <td>0.050473</td>\n",
630 |        "      <td>0.120541</td>\n",
631 |        "      <td>0.292055</td>\n",
632 |        "      <td>0.350531</td>\n",
633 |        "      <td>0.186400</td>\n",
634 |        "    </tr>\n",
635 |        "    <tr>\n",
636 |        "      <th>719570</th>\n",
637 |        "      <td>M</td>\n",
638 |        "      <td>25-34</td>\n",
639 |        "      <td>other or not specified</td>\n",
640 |        "      <td>Red Violin, The (Le Violon rouge) (1998)</td>\n",
641 |        "      <td>Drama|Mystery</td>\n",
642 |        "      <td>3</td>\n",
643 |        "      <td>4</td>\n",
644 |        "      <td>0.057457</td>\n",
645 |        "      <td>0.115074</td>\n",
646 |        "      <td>0.279706</td>\n",
647 |        "      <td>0.332208</td>\n",
648 |        "      <td>0.215554</td>\n",
649 |        "    </tr>\n",
650 |        "    <tr>\n",
651 |        "      <th>283590</th>\n",
652 |        "      <td>M</td>\n",
653 |        "      <td>35-44</td>\n",
654 |        "      <td>programmer</td>\n",
655 |        "      <td>Manon of the Spring (Manon des sources) (1986)</td>\n",
656 |        "      <td>Drama</td>\n",
657 |        "      <td>4</td>\n",
658 |        "      <td>4</td>\n",
659 |        "      <td>0.025996</td>\n",
660 |        "      <td>0.079624</td>\n",
661 |        "      <td>0.237717</td>\n",
662 |        "      <td>0.393170</td>\n",
663 |        "      <td>0.263494</td>\n",
664 |        "    </tr>\n",
665 |        "  </tbody>\n",
666 |        "</table>\n",
667 |        "</div>"
668 |       ],
669 |       "text/plain": [
670 |        "       gender age_desc                occ_desc  \\\n",
671 |        "223742      M    35-44          clerical/admin   \n",
672 |        "915512      M    25-34              unemployed   \n",
673 |        "209015      M    25-34     technician/engineer   \n",
674 |        "719570      M    25-34  other or not specified   \n",
675 |        "283590      M    35-44              programmer   \n",
676 |        "\n",
677 |        "                                                 title             genre  \\\n",
678 |        "223742                     Little Princess, The (1939)  Children's|Drama   \n",
679 |        "915512                                   Misery (1990)            Horror   \n",
680 |        "209015                                 Supercop (1992)   Action|Thriller   \n",
681 |        "719570        Red Violin, The (Le Violon rouge) (1998)     Drama|Mystery   \n",
682 |        "283590  Manon of the Spring (Manon des sources) (1986)             Drama   \n",
683 |        "\n",
684 |        "        rating  prediction   rating1   rating2   rating3   rating4   rating5  \n",
685 |        "223742       4           4  0.025210  0.068422  0.234015  0.375578  0.296774  \n",
686 |        "915512       4           3  0.105347  0.148656  0.290903  0.287796  0.167298  \n",
687 |        "209015       3           4  0.050473  0.120541  0.292055  0.350531  0.186400  \n",
688 |        "719570       3           4  0.057457  0.115074  0.279706  0.332208  0.215554  \n",
689 |        "283590       4           4  0.025996  0.079624  0.237717  0.393170  0.263494  "
690 |       ]
691 |      },
692 |      "execution_count": 42,
693 |      "metadata": {},
694 |      "output_type": "execute_result"
695 |     }
696 |    ],
697 |    "source": [
698 |     "results = validset[[\"gender\",\"age_desc\",\"occ_desc\", \"title\", \"genre\", \"rating\"]].copy()\n",
699 |     "results[\"prediction\"] = np.asarray(predict)+1\n",
700 |     "# Stack the per-example probability vectors once: shape [n_examples, 5].\n",
701 |     "prob_matrix = np.vstack(prob)\n",
702 |     "# Add one column per rating class (rating1 .. rating5).\n",
703 |     "for k in range(5):\n",
704 |     "    results[\"rating%d\" % (k + 1)] = prob_matrix[:, k]\n",
705 |     "results.head(5)"
706 |    ]
707 |   },
708 |   {
709 |    "cell_type": "code",
710 |    "execution_count": null,
711 |    "metadata": {
712 |     "collapsed": true
713 |    },
714 |    "outputs": [],
715 |    "source": []
716 |   }
717 |  ],
718 |  "metadata": {
719 |   "kernelspec": {
720 |    "display_name": "Python 3",
721 |    "language": "python",
722 |    "name": "python3"
723 |   },
724 |   "language_info": {
725 |    "codemirror_mode": {
726 |     "name": "ipython",
727 |     "version": 3
728 |    },
729 |    "file_extension": ".py",
730 |    "mimetype": "text/x-python",
731 |    "name": "python",
732 |    "nbconvert_exporter": "python",
733 |    "pygments_lexer": "ipython3",
734 |    "version": "3.6.2"
735 |   }
736 |  },
737 |  "nbformat": 4,
738 |  "nbformat_minor": 2
739 | }
740 | 
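
The hash bucketing used in `make_hash_columns` in the notebook above follows the rule quoted in its docstring, output_id = Hash(input_feature_string) % bucket_size. The following is a minimal pure-Python sketch of that rule. It uses `hashlib.md5` as a stand-in hash, which is an assumption made for illustration only: TensorFlow uses its own hash function, so the real bucket ids will differ. `bucket_id` is a hypothetical helper name.

```python
import hashlib


def bucket_id(value, hash_bucket_size=1000):
    """Map a string feature value to a bucket in [0, hash_bucket_size)."""
    # md5 is only a stand-in for TensorFlow's internal hash function.
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % hash_bucket_size


# Distinct values may share a bucket (a hash collision); a larger
# hash_bucket_size makes this less likely.
genres = ["Drama", "Horror", "Action|Thriller", "Children's|Drama"]
buckets = {g: bucket_id(g) for g in genres}
print(buckets)
```

The same value always lands in the same bucket, which is what allows the model to learn a weight or embedding per bucket without a predefined vocabulary.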

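
`make_inputs` in the notebook above represents each feature column as an [n, 1] sparse tensor with one entry per row, and shifts the ratings from 1-5 to class ids 0-4. The sketch below builds the same `indices`, `values` and `dense_shape` triple with plain Python lists, so the layout can be inspected without TensorFlow. `sparse_triple` is a hypothetical helper and the sample values are made up for illustration.

```python
def sparse_triple(column_values):
    """Return (indices, values, dense_shape) for a one-entry-per-row column."""
    n = len(column_values)
    indices = [[i, 0] for i in range(n)]  # row i, column 0
    values = list(column_values)
    dense_shape = [n, 1]
    return indices, values, dense_shape


# Made-up sample data, not taken from the MovieLens files.
genders = ["M", "F", "M"]
ratings = [4, 1, 5]

indices, values, dense_shape = sparse_triple(genders)
labels = [r - 1 for r in ratings]  # class ids 0..4 for a 5-class classifier

print(indices)      # [[0, 0], [1, 0], [2, 0]]
print(dense_shape)  # [3, 1]
print(labels)       # [3, 0, 4]
```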

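
The accuracy cell in the notebook above adds 1 to the predicted class ids before comparing them with the ratings, because the classifier is trained on labels 0-4. A small pure-Python sketch of the same computation follows, with made-up predictions and ratings and a hypothetical `accuracy` helper.

```python
def accuracy(predicted_classes, true_ratings):
    """Fraction of examples where class id + 1 equals the true rating."""
    if len(predicted_classes) != len(true_ratings):
        raise ValueError("inputs must have the same length")
    hits = sum(1 for p, r in zip(predicted_classes, true_ratings) if p + 1 == r)
    return hits / len(true_ratings)


predicted = [3, 2, 3, 3, 3]  # class ids in 0..4 (made up)
ratings = [4, 4, 3, 3, 4]    # ratings in 1..5 (made up)
print(accuracy(predicted, ratings))  # 0.4
```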
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2017 Mo Patel & Junxia Li
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # strata_ny_2017_recommender_tutorial


--------------------------------------------------------------------------------