├── ALS_Matrix_Factorization.ipynb
├── Deep_Matrix_Factorization.ipynb
├── Deep_Recommender_Tutorial_Strata_NY_2017.pdf
├── Deep_Wide_Learning.ipynb
├── LICENSE
└── README.md
/ALS_Matrix_Factorization.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "In this notebook, we demonstrate how to use the TensorFlow API to design and implement a matrix factorization model that predicts movie ratings on the MovieLens 1M dataset.\n",
8 | "\n",
9 | "Our dataset contains user ids, movie ids, and the ratings each user gave to different movies. This tells us about user preferences.\n",
10 | "\n",
11 | "Using this information, we want to learn movie and user features and use them to predict ratings, so that we can recommend to each user the movies they are likely to rate highest.\n",
12 | "\n",
13 | "\n",
14 | "\n",
15 | "- users $\\{1,...,N\\}$ with user features $\\{\\theta_1,...,\\theta_N\\} =: \\theta$\n",
16 | "- movies $\\{1,...,M\\}$ with movie features $\\{\\phi_1,...,\\phi_M\\} =: \\phi$\n",
17 | "- dataset $D = \\{r_{i,j}: user\\ i\\ has\\ rated\\ movie\\ j\\}$ of all ratings\n",
18 | "- predicted rating of user $i$ of movie $j$: $\\hat{r}_{i,j} = \\theta_i^T \\cdot \\phi_j$\n",
19 | "- cost function $J(\\theta, \\phi) = \\sum_{r_{i,j} \\in D} (r_{i,j} - \\hat{r}_{i,j})^2$\n",
20 | "- We will use an ALS-style (alternating) scheme for matrix factorization. Rather than solving each least-squares subproblem exactly, we compute the two gradients of our cost function, $\\nabla_\\theta J$ and $\\nabla_\\phi J$, separately and update our parameters $\\theta$ and $\\phi$ one after the other with gradient descent.\n",
21 | " \n",
22 | "We will achieve this in the following steps:\n",
23 | "1. Load the MovieLens data into a pandas dataframe for preprocessing\n",
24 | "2. Set up our parameters: $\\theta$ and $\\phi$\n",
25 | "3. Load the data into TensorFlow constants\n",
26 | "4. Define the prediction and cost\n",
27 | "5. Start the session and run training, evaluation and finally predictions"
28 | ]
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": 1,
33 | "metadata": {
34 | "collapsed": true
35 | },
36 | "outputs": [],
37 | "source": [
38 | "import tensorflow as tf\n",
39 | "import numpy as np\n",
40 | "import pandas as pd\n",
41 | "import matplotlib.pyplot as plt\n",
42 | "%matplotlib inline"
43 | ]
44 | },
45 | {
46 | "cell_type": "markdown",
47 | "metadata": {},
48 | "source": [
49 | "### download data"
50 | ]
51 | },
52 | {
53 | "cell_type": "markdown",
54 | "metadata": {},
55 | "source": [
56 | "First download the data from the MovieLens website. Once the archive is unzipped, we use three files: users.dat, movies.dat and ratings.dat."
57 | ]
58 | },
59 | {
60 | "cell_type": "code",
61 | "execution_count": 2,
62 | "metadata": {
63 | "collapsed": true
64 | },
65 | "outputs": [],
66 | "source": [
67 | "#! wget -q http://files.grouplens.org/datasets/movielens/ml-1m.zip\n",
68 | "#! unzip ml-1m.zip"
69 | ]
70 | },
71 | {
72 | "cell_type": "markdown",
73 | "metadata": {},
74 | "source": [
75 | "### data preprocessing"
76 | ]
77 | },
78 | {
79 | "cell_type": "markdown",
80 | "metadata": {},
81 | "source": [
82 | "Step 1: load the data into pandas dataframes and merge them into one table."
83 | ]
84 | },
85 | {
86 | "cell_type": "code",
87 | "execution_count": 3,
88 | "metadata": {
89 | "collapsed": true
90 | },
91 | "outputs": [],
92 | "source": [
93 | "age_desc = { 1: \"Under 18\", 18: \"18-24\", 25: \"25-34\", 35: \"35-44\", 45: \"45-49\", 50: \"50-55\", 56: \"56+\" }\n",
94 | "occupation_desc = { 0: \"other or not specified\", 1: \"academic/educator\", 2: \"artist\", 3: \"clerical/admin\",\n",
95 | " 4: \"college/grad student\", 5: \"customer service\", 6: \"doctor/health care\",\n",
96 | " 7: \"executive/managerial\", 8: \"farmer\", 9: \"homemaker\", 10: \"K-12 student\", 11: \"lawyer\",\n",
97 | " 12: \"programmer\", 13: \"retired\", 14: \"sales/marketing\", 15: \"scientist\", 16: \"self-employed\",\n",
98 | " 17: \"technician/engineer\", 18: \"tradesman/craftsman\", 19: \"unemployed\", 20: \"writer\" }\n",
99 | "\n",
100 | "rating_data = pd.read_csv(\n",
101 | " \"ml-1m/ratings.dat\",\n",
102 | " sep=\"::\",\n",
103 | " engine=\"python\",\n",
104 | " encoding=\"latin-1\",\n",
105 | " names=['userid', 'movieid', 'rating', 'timestamp'])\n",
106 | "\n",
107 | "user_data = pd.read_csv(\n",
108 | " \"ml-1m/users.dat\", \n",
109 | " sep='::', \n",
110 | " engine='python', \n",
111 | " encoding='latin-1',\n",
112 | " names=['userid', 'gender', 'age', 'occupation', 'zipcode']\n",
113 | ")\n",
114 | "user_data['age_desc'] = user_data['age'].apply(lambda x: age_desc[x])\n",
115 | "user_data['occ_desc'] = user_data['occupation'].apply(lambda x: occupation_desc[x])\n",
116 | "\n",
117 | "movie_data = pd.read_csv(\n",
118 | " \"ml-1m/movies.dat\",\n",
119 | " sep='::', \n",
120 | " engine='python', \n",
121 | " encoding='latin-1',\n",
122 | " names=['movieid', 'title', 'genre']\n",
123 | ")\n",
124 | "\n",
125 | "dataset = pd.merge(pd.merge(rating_data, movie_data, how=\"left\", on=\"movieid\"), user_data, how=\"left\", on=\"userid\")"
126 | ]
127 | },
128 | {
129 | "cell_type": "markdown",
130 | "metadata": {},
131 | "source": [
132 | "Step 2: preprocess the movie id and user id"
133 | ]
134 | },
135 | {
136 | "cell_type": "code",
137 | "execution_count": 4,
138 | "metadata": {
139 | "collapsed": true
140 | },
141 | "outputs": [],
142 | "source": [
143 | "def check_cols(df, cols):\n",
144 | " \"\"\"\n",
145 | " Check whether the index columns have gaps and whether they start from 0.\n",
146 | " \n",
147 | " Arguments:\n",
148 | " df -- dataframe of the dataset\n",
149 | " cols -- dataframe columns that need to be checked, in our case user id and movie id\n",
150 | " \n",
151 | " Returns:\n",
152 | " a list of tuples [('COLUMN_NAME', boolean)]; True means the column needs to be fixed, False means it is ok.\n",
153 | " \"\"\"\n",
154 | " return [(col, False) if len(df[col].unique())-1 == df[col].max() else (col, True) for col in cols]\n",
155 | "\n",
156 | "def remove_gaps(df, col):\n",
157 | " \"\"\"\n",
158 | " Remap an id column (user id or movie id) so that it starts from 0 and has no gaps.\n",
159 | " \n",
160 | " Arguments:\n",
161 | " df -- dataframe of the dataset\n",
162 | " col -- dataframe column that needs to be adjusted, in our case user id or movie id\n",
163 | " \n",
164 | " Returns:\n",
165 | " a dataframe with adjusted columns.\n",
166 | " \"\"\"\n",
167 | " adj_col_uni = df[col].sort_values().unique()\n",
168 | " adj_df = pd.DataFrame(adj_col_uni).reset_index().rename(columns = {0: col, 'index': \"adj_%s\"%(col,)})\n",
169 | " return pd.merge(adj_df, df, how=\"right\", on=col)"
170 | ]
171 | },
172 | {
173 | "cell_type": "code",
174 | "execution_count": 5,
175 | "metadata": {},
176 | "outputs": [
177 | {
178 | "name": "stdout",
179 | "output_type": "stream",
180 | "text": [
181 | "before fix:\n",
182 | "userid needs fix!\n",
183 | "movieid needs fix!\n",
184 | "\n",
185 | "after fix\n",
186 | "adj_userid ok.\n",
187 | "adj_movieid ok.\n"
188 | ]
189 | }
190 | ],
191 | "source": [
192 | "index_cols = [\"userid\", \"movieid\"]\n",
193 | "cols_check = check_cols(dataset, index_cols)\n",
194 | "print_check = lambda check: print(*[\"%s needs fix!\"%(c,) if f else \"%s ok.\"%(c,) for c, f in check], sep=\"\\n\")\n",
195 | "print(\"before fix:\")\n",
196 | "print_check(cols_check)\n",
197 | "for col, needs_fix in cols_check:\n",
198 | "    if needs_fix:\n",
199 | "        dataset = remove_gaps(dataset, col)\n",
200 | "\n",
201 | "print(\"\\nafter fix\")\n",
202 | "print_check(check_cols(dataset, [\"adj_userid\", \"adj_movieid\"]))"
203 | ]
204 | },
205 | {
206 | "cell_type": "markdown",
207 | "metadata": {},
208 | "source": [
209 | "Step 3: shuffle the data and split it into training and validation sets."
210 | ]
211 | },
212 | {
213 | "cell_type": "code",
214 | "execution_count": 6,
215 | "metadata": {
216 | "collapsed": true
217 | },
218 | "outputs": [],
219 | "source": [
220 | "dataset = dataset.sample(frac=1, replace=False)\n",
221 | "n_split = int(len(dataset)*.7)\n",
222 | "trainset = dataset[:n_split]\n",
223 | "validset = dataset[n_split:]"
224 | ]
225 | },
226 | {
227 | "cell_type": "markdown",
228 | "metadata": {},
229 | "source": [
230 | "### build the model"
231 | ]
232 | },
233 | {
234 | "cell_type": "markdown",
235 | "metadata": {},
236 | "source": [
237 | ">Recall that the following is our model:\n",
238 | "\n",
239 | "- users $\\{1,...,N\\}$ with user features $\\{\\theta_1,...,\\theta_N\\} =: \\theta$\n",
240 | "- movies $\\{1,...,M\\}$ with movie features $\\{\\phi_1,...,\\phi_M\\} =: \\phi$\n",
241 | "- dataset $D = \\{r_{i,j}: user\\ i\\ has\\ rated\\ movie\\ j\\}$ of all ratings\n",
242 | "- predicted rating of user $i$ of movie $j$: $\\hat{r}_{i,j} = \\theta_i^T \\cdot \\phi_j$\n",
243 | "- cost function $J(\\theta, \\phi) = \\sum_{r_{i,j} \\in D} (r_{i,j} - \\hat{r}_{i,j})^2$\n",
244 | "- We will use an ALS-style (alternating) scheme for matrix factorization. Rather than solving each least-squares subproblem exactly, we compute the two gradients of our cost function, $\\nabla_\\theta J$ and $\\nabla_\\phi J$, separately and update our parameters $\\theta$ and $\\phi$ one after the other with gradient descent."
245 | ]
246 | },
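{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a hedged aside, the two gradients mentioned above can be written out explicitly. This is a sketch derived from the cost function $J$ as stated; the step size $\\alpha$ and the index sets $D_i$ (movies rated by user $i$) and $D^j$ (users who rated movie $j$) are notation introduced here for illustration and do not appear elsewhere in the notebook.\n",
"\n",
"$$\\nabla_{\\theta_i} J = -2 \\sum_{j \\in D_i} (r_{i,j} - \\theta_i^T \\phi_j)\\, \\phi_j, \\qquad \\nabla_{\\phi_j} J = -2 \\sum_{i \\in D^j} (r_{i,j} - \\theta_i^T \\phi_j)\\, \\theta_i$$\n",
"\n",
"One alternating round then holds $\\phi$ fixed while updating $\\theta$, and afterwards holds the new $\\theta$ fixed while updating $\\phi$:\n",
"\n",
"$$\\theta_i \\leftarrow \\theta_i - \\alpha \\nabla_{\\theta_i} J, \\qquad \\phi_j \\leftarrow \\phi_j - \\alpha \\nabla_{\\phi_j} J$$\n",
"\n",
"Note that the code in this notebook averages the squared errors with `tf.reduce_mean` instead of summing them, which only rescales both gradients by $1/|D|$."
]
},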
247 | {
248 | "cell_type": "code",
249 | "execution_count": 10,
250 | "metadata": {
251 | "collapsed": true
252 | },
253 | "outputs": [],
254 | "source": [
255 | "def initialize_features(num_users, num_movies, dim):\n",
256 | " \"\"\"\n",
257 | " Initialize features. User_features and movie_features need to be trained by the matrix factorization model.\n",
258 | " \n",
259 | " Arguments:\n",
260 | " num_users -- number of users\n",
261 | " num_movies -- number of movies\n",
262 | " dim -- dimension of learned user and movie features, it's a hyper-parameter\n",
263 | " \n",
264 | " Returns:\n",
265 | " user_features -- a matrix (variable) of shape [number of users, dim]\n",
266 | " movie_features -- a matrix (variable) of shape [number of movies, dim]\n",
267 | " \"\"\" \n",
268 | " user_features = tf.get_variable(\n",
269 | " \"theta\",\n",
270 | " shape = [num_users, dim],\n",
271 | " dtype = tf.float32,\n",
272 | " initializer = tf.truncated_normal_initializer(mean=0, stddev=.05)\n",
273 | " )\n",
274 | " movie_features = tf.get_variable(\n",
275 | " \"phi\",\n",
276 | " shape = [num_movies, dim],\n",
277 | " dtype = tf.float32,\n",
278 | " initializer = tf.truncated_normal_initializer(mean=0, stddev=.05)\n",
279 | " )\n",
280 | " return user_features, movie_features\n",
281 | "\n",
282 | "def create_dataset(user_ids, movie_ids, ratings):\n",
283 | " \"\"\"\n",
284 | " Load user id, movie id and rating values. Turn numpy array to tensors.\n",
285 | " \n",
286 | " Arguments:\n",
287 | " user_ids -- user index\n",
288 | " movie_ids -- movies index\n",
289 | " ratings -- true rating value\n",
290 | " \n",
291 | " Returns:\n",
292 | " user_id_var -- a constant of shape [number of examples]\n",
293 | " movie_id_var -- a constant of shape [number of examples]\n",
294 | " ratings_var -- a constant of shape [number of examples]\n",
295 | " \"\"\" \n",
296 | " user_id_var = tf.constant(name=\"userid\", value=user_ids)\n",
297 | " movie_id_var = tf.constant(name=\"movieid\", value=movie_ids)\n",
298 | " ratings_var = tf.constant(name=\"ratings\", value=np.asarray(ratings, dtype=np.float32))\n",
299 | " return user_id_var, movie_id_var, ratings_var\n",
300 | " \n",
301 | "def lookup_features(user_features, movie_features, user_ids, movie_ids): \n",
302 | " \"\"\"\n",
303 | " Retrieve embeddings based on user ids and movie ids respectively.\n",
304 | " We use tf.gather function for this. tf.gather gathers slices from params according to indices.\n",
305 | " \n",
306 | " Arguments:\n",
307 | " user_features -- shape [number of user ids, dim]\n",
308 | " movie_features -- shape [number of movie ids, dim]\n",
309 | " user_ids -- user id tensor (in our case loaded user ids from dataset)\n",
310 | " movie_ids -- movie id tensor (in our case loaded movie ids from dataset)\n",
311 | " \n",
312 | " Returns:\n",
313 | " selected_user_features -- a tensor of shape [number of examples, dim]\n",
314 | " selected_movie_features -- a tensor of shape [number of examples, dim]\n",
315 | " \"\"\" \n",
316 | " selected_user_features = tf.gather(user_features, user_ids)\n",
317 | " selected_movie_features = tf.gather(movie_features, movie_ids)\n",
318 | " return selected_user_features, selected_movie_features\n",
319 | "\n",
320 | "def predict(selected_user_features, selected_movie_features):\n",
321 | " \"\"\"\n",
322 | " Calculate predictions. This is the dot product of user features and movie features. \n",
323 | " For each training example, this corresponds to a single number.\n",
324 | " \n",
325 | " Arguments:\n",
326 | " selected_user_features -- matrix of user features for each example -- shape [number of examples, dim]\n",
327 | " selected_movie_features -- matrix of movies features value for each example -- shape [number of examples, dim]\n",
328 | " \n",
329 | " Returns:\n",
330 | " selected_predictions -- a tensor of shape [number of examples]\n",
331 | " \"\"\" \n",
332 | " selected_predictions = tf.reduce_sum(\n",
333 | " selected_user_features * selected_movie_features,\n",
334 | " axis = 1\n",
335 | " )\n",
336 | " # equivalently: tf.reduce_sum(tf.multiply(selected_user_features, selected_movie_features), axis=1)\n",
337 | " return selected_predictions\n",
338 | "\n",
339 | "def mean_squared_difference(predictions, ratings):\n",
340 | " \"\"\"\n",
341 | " Calculate cost.\n",
342 | " \n",
343 | " Arguments:\n",
344 | " predictions -- predicted ratings.\n",
345 | " ratings -- true ratings.\n",
346 | " \n",
347 | " Returns:\n",
348 | " difference -- mean squared error. It's a real number. \n",
349 | " \"\"\" \n",
350 | " difference = tf.reduce_mean(tf.squared_difference(predictions, ratings))\n",
351 | " return difference"
352 | ]
353 | },
354 | {
355 | "cell_type": "markdown",
356 | "metadata": {},
357 | "source": [
358 | "### set hyper parameters"
359 | ]
360 | },
361 | {
362 | "cell_type": "code",
363 | "execution_count": 11,
364 | "metadata": {
365 | "collapsed": true
366 | },
367 | "outputs": [],
368 | "source": [
369 | "emb_dim = 8\n",
370 | "learning_rate = 50\n",
371 | "epochs = 1000"
372 | ]
373 | },
374 | {
375 | "cell_type": "markdown",
376 | "metadata": {},
377 | "source": [
378 | "### train model"
379 | ]
380 | },
381 | {
382 | "cell_type": "markdown",
383 | "metadata": {},
384 | "source": [
385 | "Here we define the tensorflow graph and create the session to compute the values.\n",
386 | "\n",
387 | "From the tensorflow documentation:\n",
388 | " - A graph defines the computation. It doesn’t compute anything, it doesn’t hold any values, it just defines the operations that you specified in your code.\n",
389 | " - A session allows to execute graphs or part of graphs. It allocates resources (on one or more machines) for that and holds the actual values of intermediate results and variables."
390 | ]
391 | },
392 | {
393 | "cell_type": "code",
394 | "execution_count": 12,
395 | "metadata": {
396 | "scrolled": false
397 | },
398 | "outputs": [
399 | {
400 | "name": "stdout",
401 | "output_type": "stream",
402 | "text": [
403 | "valid loss at step 1: 14.061668\n",
404 | "valid loss at step 101: 1.053597\n",
405 | "valid loss at step 201: 0.873390\n",
406 | "valid loss at step 301: 0.833664\n",
407 | "valid loss at step 401: 0.804873\n",
408 | "valid loss at step 501: 0.786872\n",
409 | "valid loss at step 601: 0.775171\n",
410 | "valid loss at step 701: 0.767974\n",
411 | "valid loss at step 801: 0.763356\n",
412 | "valid loss at step 901: 0.760175\n"
413 | ]
414 | }
415 | ],
416 | "source": [
417 | "with tf.Graph().as_default():\n",
418 | "    with tf.variable_scope(\"features\"):\n",
419 | "        usr_embs, mov_embs = initialize_features(len(dataset.adj_userid.unique()), len(dataset.adj_movieid.unique()), emb_dim)\n",
420 | "    with tf.variable_scope(\"train_set\"):\n",
421 | "        train_data = trainset[[\"adj_userid\", \"adj_movieid\", \"rating\"]].values.T # shape (3, 700146)\n",
422 | "        train_usr_ids, train_mov_ids, train_ratings = create_dataset(*train_data) # unpack into 3 arrays\n",
423 | "    with tf.variable_scope(\"valid_set\"):\n",
424 | "        valid_data = validset[[\"adj_userid\", \"adj_movieid\", \"rating\"]].values.T\n",
425 | "        valid_usr_ids, valid_mov_ids, valid_ratings = create_dataset(*valid_data)\n",
426 | "    with tf.variable_scope(\"training\"):\n",
427 | "        train_sel_usr_emb, train_sel_mov_emb = lookup_features(usr_embs, mov_embs, train_usr_ids, train_mov_ids)\n",
428 | "        train_preds = predict(train_sel_usr_emb, train_sel_mov_emb)\n",
429 | "        train_loss = mean_squared_difference(train_preds, train_ratings)\n",
430 | "        optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)\n",
431 | "        train_usr_embs = optimizer.minimize(train_loss, var_list=[usr_embs])\n",
432 | "        train_mov_embs = optimizer.minimize(train_loss, var_list=[mov_embs])\n",
433 | "    with tf.variable_scope(\"validation\"):\n",
434 | "        valid_sel_usr_emb, valid_sel_mov_emb = lookup_features(usr_embs, mov_embs, valid_usr_ids, valid_mov_ids)\n",
435 | "        valid_preds = predict(valid_sel_usr_emb, valid_sel_mov_emb)\n",
436 | "        valid_loss = mean_squared_difference(valid_preds, valid_ratings)\n",
437 | "    with tf.Session() as sess:\n",
438 | "        writer = tf.summary.FileWriter('Graph/MF',sess.graph)\n",
439 | "        sess.run(tf.global_variables_initializer())\n",
440 | "        train_loss_history = []\n",
441 | "        valid_loss_history = []\n",
442 | "        for i in range(epochs):\n",
443 | "            current_train_loss, _ = sess.run([train_loss, train_usr_embs])\n",
444 | "            current_train_loss, _ = sess.run([train_loss, train_mov_embs])\n",
445 | "            current_valid_loss = sess.run(valid_loss)\n",
446 | "            if i%100 ==0:\n",
447 | "                print(\"valid loss at step %i: %f\"%(i+1, current_valid_loss))\n",
448 | "            train_loss_history.append(current_train_loss)\n",
449 | "            valid_loss_history.append(current_valid_loss)\n",
450 | "        final_user_features, final_movie_features = sess.run([usr_embs, mov_embs])\n",
451 | "        final_valid_predictions = sess.run(valid_preds)\n",
452 | "        writer.close()"
453 | ]
454 | },
455 | {
456 | "cell_type": "markdown",
457 | "metadata": {},
458 | "source": [
459 | "### plot losses"
460 | ]
461 | },
462 | {
463 | "cell_type": "markdown",
464 | "metadata": {},
465 | "source": [
466 | "plot the training loss and validation loss"
467 | ]
468 | },
469 | {
470 | "cell_type": "code",
471 | "execution_count": 10,
472 | "metadata": {},
473 | "outputs": [
474 | {
475 | "data": {
476 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAtEAAAJQCAYAAABIJTh6AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzs3Xt43HWd9//XOzPfmSRN2vQQoKWFFtalLbQUCAftTwFB\nbhAVTwjc1NN6y+qigHop6P1bV69dL/Fa9hZxVbarqCsIN1vFw8rKwQXq/gS0xQLFolgotLRAekia\nNJM5fn5/zCRNS9pmkvl+PzPfPB/XNdccMpl5M18Pz3z4zIw55wQAAABg7Jp8DwAAAAA0GiIaAAAA\nqBIRDQAAAFSJiAYAAACqREQDAAAAVSKiAQAAgCoR0QAAAECViGgAAACgSkQ0AAAAUKWk7wHGYtas\nWW7+/Pm+xwAAAEDMrV27drtzrvNQ92uIiJ4/f77WrFnjewwAAADEnJk9P5b7sZ0DAAAAqBIRDQAA\nAFSJiAYAAACq1BB7ogEAAFCWz+e1ZcsWDQ4O+h6loTU3N2vu3LkKgmBcv09EAwAANJAtW7aovb1d\n8+fPl5n5HqchOee0Y8cObdmyRQsWLBjXY7CdAwAAoIEMDg5q5syZBPQEmJlmzpw5odV8IhoAAKDB\nENATN9HXkIgGAAAAqkREAwAAYMx6enr0zW9+c1y/++Y3v1k9PT1jvv8XvvAF3XDDDeN6rrAR0QAA\nABizg0V0oVA46O/efffd6ujoCGOsyBHRAAAAGLPrrrtOGzdu1LJly/TpT39aDz74oF7/+tfrbW97\nmxYvXixJevvb365TTjlFxx9/vFauXDn8u/Pnz9f27du1adMmLVq0SB/+8Id1/PHH67zzzlMmkzno\n865bt05nnHGGli5dqne84x3atWuXJOmmm27S4sWLtXTpUl166aWSpIceekjLli3TsmXLdNJJJ6mv\nr6/mrwMfcQcAANCorrlGWreuto+5bJl0440H/PH111+v9evXa13leR988EE99thjWr9+/fDHxd1y\nyy2aMWOGMpmMTj31VL3rXe/SzJkz93mcZ555Rrfffrv+9V//Ve95z3v0ox/9SCtWrDjg877vfe/T\n17/+dZ155pn6/Oc/ry9+8Yu68cYbdf311+u5555TOp0e3ipyww036Bvf+IaWL1+u/v5+NTc3T/RV\neRVWogEAADAhp5122j6ft3zTTTfpxBNP1BlnnKHNmzfrmWeeedXvLFiwQMuWLZMknXLKKdq0adMB\nH7+3t1c9PT0688wzJUnvf//7tXr1aknS0qVLdfnll+vWW29VMlleH16+fLk++clP6qabblJPT8/w\n7bXESjQAAECjOsiKcZSmTJkyfPnBBx/U/fffr4cfflitra0666yzRv085nQ6PXw5kUgccjvHgfzi\nF7/Q6tWr9fOf/1xf+tKX9OSTT+q6667ThRdeqLvvvlvLly/XPffco4ULF47r8Q+ElWgAAACMWXt7\n+0H3GPf29mr69OlqbW3V008/rUceeWTCzzlt2jRNnz5dv/71ryVJP/jBD3TmmWeqVCpp8+bNOvvs\ns/WVr3xFvb296u/v18aNG7VkyRJde+21OvXUU/X0009PeIb9sRINAACAMZs5c6aWL1+uE044QRdc\ncIEuvPDCfX5+/vnn6+abb9aiRYt03HHH6YwzzqjJ837/+9/XRz7yEQ0MDOiYY47Rd7/7XRWLRa1Y\nsUK9vb1yzumqq65SR0eH/vZv/1YPPPCAmpqadPzxx+uCCy6oyQwjmXOu5g9aa11dXW7NmjW+xwAA\nAPBuw4YNWrRoke8xYmG019LM1jrnug71u2znAAAAAKpERAMAAABVIqIBAACAKhHRAAAAQJWIaAAA\nAKBKoX3EnZndIuktkl5xzp2w388+JekGSZ3Oue1hzTART37tv/TTf3hSyYRTMql9TumU09xZg1p4\nYlrz3rpMOussqYm/RwAAACaLMD8n+n
uS/lnSv4280czmSTpP0gshPveE/f7lOfrb7W88+J0ekJbe\n+Lg+f+Qn9K5f/JV04onRDAcAANBA2tra1N/fr61bt+qqq67SqlWrXnWfs846SzfccIO6urrGdLtv\noS2fOudWS9o5yo++Kukzkur6A6rf+6WFKhSkwUGpv1/atUvq7pa2bZOefVZ68EHpn76clZt3lN79\n4tf0ldfeJR3kO98BAAAmuzlz5owa0I0o0j0IZnaRpBedc49H+bzjYSYlElI6LU2ZInV0SLNmSUcc\nIS1YIJ15pvTJ69Jau3G6Lntrn67LfEG/+Z//7HtsAACAUF133XX6xje+MXz9C1/4gm644Qb19/fr\nnHPO0cknn6wlS5bopz/96at+d9OmTTrhhPIu30wmo0svvVSLFi3SO97xDmUymUM+9+23364lS5bo\nhBNO0LXXXitJKhaL+sAHPqATTjhBS5Ys0Ve/+lVJ0k033aTFixdr6dKluvTSS2vxj76PyL7228xa\nJX1O5a0cY7n/FZKukKSjjjoqxMkmJgiklT9s1+ojd+uzD79VD61bJy1b5nssAAAwCVxzjbRuXW0f\nc9ky6cYbD/zzSy65RNdcc42uvPJKSdKdd96pe+65R83Nzbrrrrs0depUbd++XWeccYbe9ra3ycxG\nfZxvfetbam1t1YYNG/TEE0/o5JNPPuhcW7du1bXXXqu1a9dq+vTpOu+88/STn/xE8+bN04svvqj1\n69dLknp6eiRJ119/vZ577jml0+nh22opypXoYyUtkPS4mW2SNFfSY2Z2xGh3ds6tdM51Oee6Ojs7\nIxyzem1t0tWfTGq1ztRTX7vf9zgAAAChOemkk/TKK69o69atevzxxzV9+nTNmzdPzjl97nOf09Kl\nS3XuuefqxRdf1Msvv3zAx1m9erVWrFghSVq6dKmWLl160Of93e9+p7POOkudnZ1KJpO6/PLLtXr1\nah1zzDF69tln9fGPf1y//OUvNXXq1OHHvPzyy3Xrrbcqmaz9unFkK9HOuSclHTZ0vRLSXfX66RzV\net9HWvWZL0h3/cR0/Hd9TwMAACaDg60Yh+niiy/WqlWr9NJLL+mSSy6RJN12223q7u7W2rVrFQSB\n5s+fr8HBwdBnmT59uh5//HHdc889uvnmm3XnnXfqlltu0S9+8QutXr1aP//5z/WlL31JTz75ZE1j\nOrSVaDO7XdLDko4zsy1m9qGwnqseHH64dMq8l/XLntOlzZt9jwMAABCaSy65RHfccYdWrVqliy++\nWJLU29urww47TEEQ6IEHHtDzzz9/0Md4wxveoB/+8IeSpPXr1+uJJ5446P1PO+00PfTQQ9q+fbuK\nxaJuv/12nXnmmdq+fbtKpZLe9a536R/+4R/02GOPqVQqafPmzTr77LP1la98Rb29verv76/NP3xF\naCvRzrnLDvHz+WE9ty/nvtHpn75/urIP3qX0e+f5HgcAACAUxx9/vPr6+nTkkUdq9uzZkqTLL79c\nb33rW7VkyRJ1dXVp4cKFB32Mj370o/rgBz+oRYsWadGiRTrllFMOev/Zs2fr+uuv19lnny3nnC68\n8EJddNFFevzxx/XBD35QpVJJkvTlL39ZxWJRK1asUG9vr5xzuuqqq9TR0VGbf/gKc66uP2lOktTV\n1eXWrFnje4xDWvV/i7r40oTWXPZPOuWHn/I9DgAAiKENGzZo0aJFvseIhdFeSzNb65w75IdS8zV7\nNXTyqQlJ0mOPJzxPAgAAgDAR0TW0YIHUnszoiRdq+68LAAAAUF+I6Boyk46d1auN/YeVv+YQAAAg\nBI2wHbfeTfQ1JKJr7C/m57VRx0obN/oeBQAAxFBzc7N27NhBSE+Ac047duxQc3PzuB8jss+JniyO\nfU1CP31kgYqb7lXixBN9jwMAAGJm7ty52rJli7q7u32P0tCam5s1d+7ccf8+EV1jxy5pVV4pbXly\nl4
6+yPc0AAAgboIg0IIFC3yPMemxnaPG5i5qlyRt/RN7ogEAAOKKiK6x2XPLH2+37fmc50kAAAAQ\nFiK6xo44ony+bZvfOQAAABAeIrrGOjulJhW1bdf43+0JAACA+kZE11giIR3e2qeX+tt8jwIAAICQ\nENEhOKJ9QNsGp0ulku9RAAAAEAIiOgSHT8/pFXVKPT2+RwEAAEAIiOgQzJjhtEvTJT4EHQAAIJaI\n6BBMn5nQTs2Qtm/3PQoAAABCQESHYMbhgXrUodLLrEQDAADEEREdghmz03JqUu+WPt+jAAAAIARE\ndAimz05Lkna+xLcWAgAAxBERHYIZs8tftLJre9HzJAAAAAgDER2CGZ0JSdLO7XxONAAAQBwR0SGY\nMaN8vnOn3zkAAAAQDiI6BFOnls937/Y7BwAAAMJBRIegvb183seHcwAAAMQSER2Ctrbyed8eXl4A\nAIA4ovJC0NQkTUkOqm8w6XsUAAAAhICIDkl7kFXfYOB7DAAAAISAiA5Je3NOfdm07zEAAAAQAiI6\nJO3NefWVWqUiX7gCAAAQN0R0SNpbiupXm7Rnj+9RAAAAUGNEdEjaWkvqUzsRDQAAEENEdEja24ho\nAACAuCKiQ9LeJiIaAAAgpojokLRPNSIaAAAgpojokLRNbdIetcn1E9EAAABxQ0SHpKW9/G2F2Z6M\n50kAAABQa0R0SFqnliN6YFfW8yQAAACoNSI6JK3Tyl/5PdCT8zwJAAAAao2IDknLtJQkIhoAACCO\niOiQtE5PS5Iyu/OeJwEAAECtEdEhGd4T3Vf0PAkAAABqjYgOSUtL+ZyIBgAAiB8iOiStreXzzIDz\nOwgAAABqjogOyVBEDwz4nQMAAAC1R0SHZHg7R8b8DgIAAICaI6JDMrydI0tEAwAAxA0RHZLhlejB\nhN9BAAAAUHNEdEiGIzpLRAMAAMQNER2SIJACyyuTI6IBAADihogOUUsir4Fc0vcYAAAAqDEiOkQt\nyZwGCoHvMQAAAFBjRHSImpMFZQts5wAAAIgbIjpE6WRR2QLbOQAAAOKGiA5Rc1DUYJHtHAAAAHFD\nRIcoHZSULSUl53yPAgAAgBoiokPUnCopq7SUy/keBQAAADVERIconXIaVLOUyfgeBQAAADVERIco\nnXLllejBQd+jAAAAoIaI6BA1N6u8Ek1EAwAAxAoRHaJ0WuWVaLZzAAAAxAoRHaLmZmNPNAAAQAwR\n0SFKN4tP5wAAAIghIjpEwyvR2azvUQAAAFBDRHSI0s3GSjQAAEAMEdEham5tUl4plTKsRAMAAMRJ\naBFtZreY2Stmtn7Ebf9oZk+b2RNmdpeZdYT1/PUg3VJ+ebP9ec+TAAAAoJbCXIn+nqTz97vtPkkn\nOOeWSvqTpM+G+PzeNbdWInqg6HkSAAAA1FJoEe2cWy1p53633eucK1SuPiJpbljPXw/SrQlJ0uAe\nIhoAACBOfO6J/itJ/3mgH5rZFWa2xszWdHd3RzhW7QxFNCvRAAAA8eIlos3sf0sqSLrtQPdxzq10\nznU557o6OzujG66GmtuSkliJBgAAiJtk1E9oZh+Q9BZJ5zjnXNTPH6X0lPLLm82UPE8CAACAWoo0\nos3sfEmfkXSmc24gyuf2gYgGAACIpzA/4u52SQ9LOs7MtpjZhyT9s6R2SfeZ2Tozuzms568HqdZy\nROcH2c4BAAAQJ6GtRDvnLhvl5u+E9Xz1KEiZJCk3yEo0AABAnPCNhSFKpcrnuSwRDQAAECdEdIiG\nIjqfjfX7JwEAACYdIjpEQVA+zxHRAAAAsUJEh2jvSjTbOQAAAOKEiA7R8Ep0zu8cAAAAqC0iOkTD\nK9E5tnMAAADECREdouGV6LzfOQAAAFBbRHSIhj/iLmd+BwEAAEBNEdEhGt7OwUo0AABArBDRIdq7\nnYOVaAAAgDghokM0vBJd8DsHAAAAaouIDlEiIZlKyhUSvkcBAABA
DRHRIUs1FZQv8jIDAADECXUX\nsqCpqBwRDQAAECvUXchSiaJyxaTvMQAAAFBDRHTIUoki2zkAAABihroLWdBUVK7EGwsBAADihIgO\nWSpRUr5IRAMAAMQJER2yIFFSzrEnGgAAIE6I6JClkiXl2c4BAAAQK0R0yIJkSTmlpGLR9ygAAACo\nESI6ZKmhiM7nfY8CAACAGiGiQ5ZKOuUVSLmc71EAAABQI0R0yIKkYyUaAAAgZojokKWCyko0EQ0A\nABAbRHTIgoCVaAAAgLghokOWCsRKNAAAQMwQ0SELArESDQAAEDNEdMhSqUpE8+kcAAAAsUFEhyyV\nYjsHAABA3BDRIQtSbOcAAACIGyI6ZKmUsRINAAAQM0R0yIKUsRINAAAQM0R0yFJpVqIBAADihogO\nWZBuUkkJFTN8OgcAAEBcENEhS6VNkpTLFD1PAgAAgFohokOWai6/xPlBIhoAACAuiOiQBenyS5wb\nLHmeBAAAALVCRIeMlWgAAID4IaJDxko0AABA/BDRIUu1JCSxEg0AABAnRHTIguZyRLMSDQAAEB9E\ndMiGVqJzWed5EgAAANQKER2yVGtSkpTPshINAAAQF0R0yIa3c7ASDQAAEBtEdMiGP+IuR0QDAADE\nBREdsiAon7MSDQAAEB9EdMhSqfI5K9EAAADxQUSHbHglmogGAACIDSI6ZEMr0bmc+R0EAAAANUNE\nh2x4O0eelWgAAIC4IKJDtnc7ByvRAAAAcUFEh2zvSrTfOQAAAFA7RHTIhlei86xEAwAAxAURHbKh\niM4X/M4BAACA2iGiQzYc0axEAwAAxAYRHbK9K9FENAAAQFwQ0SFLJsvn+SIRDQAAEBdEdMjMpIQV\nlS/wUgMAAMQFZReBwAoq8MZCAACA2CCiIxA0FZUv8lIDAADEBWUXASIaAAAgXii7CARNReVLvNQA\nAABxQdlFoLwSnfA9BgAAAGoktIg2s1vM7BUzWz/ithlmdp+ZPVM5nx7W89eTIFFiJRoAACBGwiy7\n70k6f7/brpP0K+fcayT9qnI99oJEiZVoAACAGAktop1zqyXt3O/miyR9v3L5+5LeHtbz15PySjQR\nDQAAEBdR7zE43Dm3rXL5JUmHR/z8XhDRAAAA8eJto65zzklyB/q5mV1hZmvMbE13d3eEk9VekCgp\n74hoAACAuIg6ol82s9mSVDl/5UB3dM6tdM51Oee6Ojs7IxswDEHCKe+Skjvg3wwAAABoIFFH9M8k\nvb9y+f2Sfhrx83sRJJ3yCsR3fwMAAMRDmB9xd7ukhyUdZ2ZbzOxDkq6X9CYze0bSuZXrsZcciuh8\n3vcoAAAAqIFkWA/snLvsAD86J6znrFdB0qmgJBENAAAQE3wDSAQCVqIBAABihYiOQJAUEQ0AABAj\nRHQEgoCIBgAAiBMiOgJ8OgcAAEC8ENERCFKsRAMAAMQJER0BtnMAAADECxEdgSAwIhoAACBGiOgI\nsJ0DAAAgXojoCAyvRPPGQgAAgFggoiMQpNjOAQAAECdEdASSKVNRSbkcEQ0AABAHRHQEglT5ZS4M\nsp0DAAAgDojoCAQpkyTlB4ueJwEAAEAtENERCNLllzmfYSUaAAAgDojoCATpykp0tuR5EgAAANQC\nER2BIJ2QREQDAADEBREdgeHtHOyJBgAAiAUiOgJBcyWis0Q0AABAHBDRERjezjHIdg4AAIA4IKIj\nEDRXIjrnPE8CAACAWiCiIzAc0byxEAAAIBaI6AiwEg0AABAvRHQEWIkGAACIFyI6Ask0K9EAAABx\nQkRHIAjK54UcK9EAAABxQERHYCii83m/cwAAAKA2iOgIDEc02zkAAABigYiOACvRAAAA8UJER4CI\nBgAAiBciOgJ7I5rtHAAAAHFAREdgb0Sb30EAAABQE0R0BNjOAQAAEC9EdASGI7rASjQAAEAcENER\n2BvRfucAAABAbRDREWBPNAAA
QLwQ0RFgOwcAAEC8ENERSCbL5/kiLzcAAEAcUHURMJMSVlShyEo0\nAABAHBDREQmswHYOAACAmCCiIxI0FdnOAQAAEBNUXUSIaAAAgPig6iISNBWVL/FyAwAAxAFVF5Hy\nSnTC9xgAAACoASI6IkFTiZVoAACAmKDqIhIkWIkGAACICyI6IkGipHyJiAYAAIgDIjoiRDQAAEB8\nENERCRIl5V3S9xgAAACoASI6IkHCKe9YiQYAAIgDIjoiyYQrr0Q753sUAAAATBARHZEg6VRQUioU\nfI8CAACACSKiIxIknfIKpHze9ygAAACYICI6IkQ0AABAfBDRERmOaLZzAAAANDwiOiJBwEo0AABA\nXBDREQkCEdEAAAAxQURHhIgGAACIDyI6IkQ0AABAfBDRERmOaN5YCAAA0PCI6IgEgbESDQAAEBNE\ndESCFBENAAAQF0R0RIIUe6IBAADigoiOSDJoUlFJuRwRDQAA0OiI6IgEKZMkFQZ5YyEAAECjI6Ij\nMhTR+cGi50kAAAAwUV4i2sw+YWZPmdl6M7vdzJp9zBGlIF1+qYloAACAxhd5RJvZkZKuktTlnDtB\nUkLSpVHPETUiGgAAID58bedISmoxs6SkVklbPc0RGSIaAAAgPiKPaOfci5JukPSCpG2Sep1z9+5/\nPzO7wszWmNma7u7uqMesOSIaAAAgPnxs55gu6SJJCyTNkTTFzFbsfz/n3ErnXJdzrquzszPqMWtu\nOKJzzvMkAAAAmCgf2znOlfScc67bOZeX9GNJr/MwR6RYiQYAAIgPHxH9gqQzzKzVzEzSOZI2eJgj\nUkFzQpKUz5Y8TwIAAICJ8rEn+lFJqyQ9JunJygwro54jakQ0AABAfCR9PKlz7u8k/Z2P5/ZlOKJz\nRDQAAECj4xsLI5JsLv+9ks/yxkIAAIBGR0RHZGglupAnogEAABodER0RPuIOAAAgPojoiAQpk0RE\nAwAAxAERHZEgKJ8T0QAAAI2PiI4IEQ0AABAfRHREhiO64HcOAAAATBwRHZG9K9F+5wAAAMDEEdER\nGY7ovN85AAAAMHFEdESIaAAAgPggoiOyd0+0+R0EAAAAEzamiDazq81sqpV9x8weM7Pzwh4uTliJ\nBgAAiI+xrkT/lXNut6TzJE2X9F5J14c2VQwNR3SRlWgAAIBGN9aIHiq/N0v6gXPuqRG3YQySyfI5\n2zkAAAAa31gjeq2Z3atyRN9jZu2SSuGNFT9DK9EFIhoAAKDhJcd4vw9JWibpWefcgJnNkPTB8MaK\nHzMpoSIr0QAAADEw1pXo10r6o3Oux8xWSPp/JfWGN1Y8BU0F9kQDAADEwFgj+luSBszsREmfkrRR\n0r+FNlVMBVZUvpjwPQYAAAAmaKwRXXDOOUkXSfpn59w3JLWHN1Y8sRINAAAQD2PdE91nZp9V+aPt\nXm9mTZKC8MaKp6CpqHyR77cBAABodGMtukskZVX+vOiXJM2V9I+hTRVTQVOJ7RwAAAAxMKaIroTz\nbZKmmdlbJA0659gTXaUgwUo0AABAHIz1a7/fI+m3ki6W9B5Jj5rZu8McLI6CREn5EhENAADQ6Ma6\nJ/p/SzrVOfeKJJlZp6T7Ja0Ka7A4YjsHAABAPIx1WbRpKKArdlTxu6gIkqxEAwAAxMFYV6J/aWb3\nSLq9cv0SSXeHM1J8BQmnfImVaAAAgEY3poh2zn3azN4laXnlppXOubvCGyuekkkiGgAAIA7GuhIt\n59yPJP0oxFliL0g4FZSUCgUpOeaXHgAAAHXmoCVnZn2S3Gg/kuScc1NDmSqmgsApo0DK5YhoAACA\nBnbQknPO8dXeNRQkpd1DEd3a6nscAAAAjBMfFRGhIHDKD0U0AAAAGhYRHaEgUDmis1nfowAAAGAC\niOgIBYGxEg0AABADRHSEUikppxQRDQAA0OCI6Ail01JWaSIaAACgwRHREUqnKyvR7IkGAABoaE
R0\nhFLNxko0AABADBDREUoT0QAAALFAREco3dykvFIqDRLRAAAAjYyIjlC62SRJ+YG850kAAAAwEUR0\nhFItCUlSdk/B8yQAAACYCCI6QumW8sudHSh6ngQAAAATQURHKN1aWYkmogEAABoaER2hoYjODbCd\nAwAAoJER0RFKtSYlSdlMyfMkAAAAmAgiOkLD2zmIaAAAgIZGREcoPaWyEj3oPE8CAACAiSCiIzQU\n0bkMbywEAABoZER0hFLp8petZLOsRAMAADQyIjpC6XT5PDvodw4AAABMDBEdoeGIzvqdAwAAABND\nREdoKKJzbOcAAABoaER0hFKp8jkr0QAAAI2NiI7Q8HaOnPkdBAAAABNCREdoeDtHzu8cAAAAmBgi\nOkLDK9F5XnYAAIBGRs1FaHhPNNs5AAAAGhoRHaHhiGYlGgAAoKFRcxEyk1KWU67Ayw4AANDIqLmI\npRMFZYloAACAhkbNRSzVVFA2n/A9BgAAACaAiI5YOlFQtkhEAwAANDIiOmLpREE5IhoAAKChEdER\nSyeKyhaTvscAAADABHiJaDPrMLNVZva0mW0ws9f6mMOHVJKIBgAAaHS+au5rkn7pnHu3maUktXqa\nI3LpZFHZYuB7DAAAAExA5BFtZtMkvUHSByTJOZeTlIt6Dl/SQUk5x0o0AABAI/OxnWOBpG5J3zWz\n35vZt81sioc5vEgHJWVLrEQDAAA0Mh8RnZR0sqRvOedOkrRH0nX738nMrjCzNWa2pru7O+oZQ5MO\nSsoqLRUKvkcBAADAOPmI6C2StjjnHq1cX6VyVO/DObfSOdflnOvq7OyMdMAwpQJXjujcpNnBAgAA\nEDuRR7Rz7iVJm83suMpN50j6Q9Rz+JJOOeWUkrJZ36MAAABgnHy9w+3jkm6rfDLHs5I+6GmOyKVT\nKq9EE9EAAAANy0tEO+fWSery8dy+pZutHNGZjO9RAAAAME58Y2HEUmkrb+cYHPQ9CgAAAMaJiI5Y\nuqWJlWgAAIAGR0RHbHg7ByvRAAAADYuIjli6NaG8UirtYSUaAACgURHREUu1JCRJ+T18TjQAAECj\nIqIjlm4tR3R2Nx9xBwAA0KiI6Ig1t5U/VXCwL+95EgAAAIwXER2xlvZyRGf6Cp4nAQAAwHgR0RFr\nmRpIIqIBAAAaGREdsaGIHugveZ4EAAAA40VER6x1WmUlur/oeRIAAACMFxEdsZYp5Zc8M8BKNAAA\nQKMioiPW0lI+z+xxfgcBAADAuBHRERuOaL6wEAAAoGER0REbiuiBAb9zAAAAYPyI6IgNr0QPmt9B\nAAAAMG6w1Y5QAAAb9UlEQVREdMRaW8vnRDQAAEDjIqIjNrwSneWlBwAAaFSUXMSCQEpYUZkcLz0A\nAECjouQ8aGnKaiCX9D0GAAAAxomI9qAlmVeGiAYAAGhYRLQHrcmcMgUiGgAAoFER0R60JAvK5APf\nYwAAAGCciGgPWlIFZYpENAAAQKMioj1oCYoaKKZ9jwEAAIBxIqI9aEkXlSmlJed8jwIAAIBxIKI9\naE2XlFGLlMv5HgUAAADjQER70NLsyhGdyfgeBQAAAONARHswHNGDg75HAQAAwDgQ0R60tEgDamUl\nGgAAoEER0R60tIiVaAAAgAZGRHvQOsXYEw0AANDAiGgPWlpNRSWV78/6HgUAAADjQER70NKWkCRl\neohoAACARkREezAU0QO9ec+TAAAAYDyIaA9apgaSWIkGAABoVES0B60dKUlENAAAQKMioj1o6UhL\nkjK7+dpvAACARkREezAc0eyJBgAAaEhEtAdDe6IH+oqeJwEAAMB4ENEeTGkzSdJAf8nzJAAAABgP\nItqDtrbyeX+f8zsIAAAAxoWI9mAoovv6ze8gAAAAGBci2oP29vJ5/wARDQAA0IiIaA9aWiRTSf2Z\nhO9RAAAAMA5EtAdNTdKUxKD6M0nfowAAAGAciGhP2oKs+r
Ip32MAAABgHIhoT9pTWfXniGgAAIBG\nRER70pbKqz+f9j0GAAAAxoGI9qStOa/+YrPvMQAAADAORLQn7S1F9ZWmSEW++hsAAKDRENGetLWW\n1K82ac8e36MAAACgSkS0J21TXDmi+/t9jwIAAIAqEdGetLWJlWgAAIAGRUR70j7V1Kd2uT5WogEA\nABoNEe1J27QmFZVUdteA71EAAABQJSLak7Zp5a/87t8+6HkSAAAAVIuI9qRteiWid+Y8TwIAAIBq\nEdGetM8of+V3386850kAAABQLSLak7ZKRPfvIqIBAAAaDRHtSdus8ld+9/fyjYUAAACNhoj2pG1m\nWpLU10NEAwAANBoi2pP2aeWXvn93yfMkAAAAqBYR7UlbW/m8fzcr0QAAAI2GiPZkOKL7/M4BAACA\n6nmLaDNLmNnvzew/fM3gU0uL1KSidvfzdwwAAECj8VlwV0va4PH5vTKTpgUD6h0IfI8CAACAKnmJ\naDObK+lCSd/28fz1oiOVUc9g2vcYAAAAqJKvlegbJX1G0gE/msLMrjCzNWa2pru7O7rJItTRMqie\nbIvvMQAAAFClyCPazN4i6RXn3NqD3c85t9I51+Wc6+rs7Ixoumh1tObVk5/iewwAAABUycdK9HJJ\nbzOzTZLukPRGM7vVwxzedbQV1OOmSYODvkcBAABAFSKPaOfcZ51zc51z8yVdKum/nHMrop6jHnRM\nLalHHVJvr+9RAAAAUAU+X82jjg4R0QAAAA3Ia0Q75x50zr3F5ww+dcxoUr/aVdhBRAMAADQSVqI9\n6piZkCT1bhvwPAkAAACqQUR71HFYSpLUsy3jeRIAAABUg4j2qOPw8het9Lyc9TwJAAAAqkFEe9Qx\np1WS1NOd9zwJAAAAqkFEezQc0TuKnicBAABANYhoj4beWNizy3meBAAAANUgoj3q6Cif9/Sa30EA\nAABQFSLao7Y2qUlF9ezmMAAAADQS6s2jpiZpWnKPevYkfY8CAACAKhDRnnWkBtSzJ/A9BgAAAKpA\nRHvW0ZzVrkyz7zEAAABQBSLasxltOe3MtfkeAwAAAFUgoj3r7Mhre2mGNDDgexQAAACMERHt2ayZ\nTt3qlHbs8D0KAAAAxoiI9qzzcFOPpiu/bbvvUQAAADBGRLRns44ofzLHjk19nicBAADAWBHRnnXO\nK38yx/YX2BMNAADQKIhoz2YdPUWS1L150PMkAAAAGCsi2rPOY9olSd3bCp4nAQAAwFgR0Z7Nml3e\nE72923meBAAAAGNFRHs2c2b5vHsHhwIAAKBRUG6eBYHUkdit7b1J36MAAABgjIjoOtCZ3q3uvhbf\nYwAAAGCMiOg6MKs1o+2ZVt9jAAAAYIyI6DrQOXVQ3blpvscAAADAGBHRdWDWjJK2l2ZIe/b4HgUA\nAABjQETXgcMOb9IrOkylrS/5HgUAAABjQETXgTlHJVVQoO1Pb/c9CgAAAMaAiK4Dc/6i/MkcW5/e\n7XkSAAAAjAURXQfmHDdVkrTt2YznSQAAADAWRHQdmLO4Q5K09YWC50kAAAAwFkR0HThiTvkwbN1m\nnicBAADAWBDRdSCdlmYld2nr9pTvUQAAADAGRHSdmNPSo629U3yPAQAAgDEgouvEnGl7tHWgw/cY\nAAAAGAMiuk7MmZXV1kKnVCz6HgUAAACHQETXiTmznV7SESq+1O17FAAAABwCEV0n5hwVqKSEXln/\nsu9RAAAAcAhEdJ2Yu7BNkrR53U7PkwAAAOBQiOg6Mb9rliRp0x8GPE8CAACAQyGi68TRS8pf/b3p\n2ZLnSQAAAHAoRHSdmDrNNCPRo+deDHyPAgAAgEMgouvIgind2rSj3fcYAAAAOAQiuo7Mn9mnTXs6\nfY8BAACAQyCi68j8I/PaVJwn17/H9ygAAAA4CCK6jsw/NqFBtejlx170PQoAAAAOgoiuIwsWt0iS\nNq3d4XkSAAAAHAwRXU
fmnzJTkvTsk2znAAAAqGdEdB055ozDZCrpmQ0F36MAAADgIIjoOtIypUlH\nB9v0xxeafY8CAACAgyCi68zCGa/o6R18zB0AAEA9I6LrzMKjBvTH7HyVsnnfowAAAOAAiOg6c9yi\nJg1oirY8vNn3KAAAADgAIrrOLDxtqiTpj79+xfMkAAAAOBAius4sPHu2JOnp32c8TwIAAIADIaLr\nzOGLZmia9WrDH833KAAAADgAIrrOmElL2p/XE5un+x4FAAAAB0BE16FlR+/S433HqFQo+R4FAAAA\noyCi69BJJ5v61a6ND23xPQoAAABGQUTXoWVvnCFJWvfLlzxPAgAAgNEQ0XVo8YULlFRe6x7N+h4F\nAAAAoyCi61DzzClalNqodX9q9T0KAAAARkFE16mTZ7+k33UfLed8TwIAAID9EdF16nVdOXWXZmnj\nb3f4HgUAAAD7iTyizWyemT1gZn8ws6fM7OqoZ2gEr3vbLEnSb+54wfMkAAAA2J+PleiCpE855xZL\nOkPSlWa22MMcdW3xOxdqqnr1m4fyvkcBAADAfiKPaOfcNufcY5XLfZI2SDoy6jnqXVNbq1479Sn9\n5plZvkcBAADAfrzuiTaz+ZJOkvSozznq1euO26n1/fO1a3vR9ygAAAAYwVtEm1mbpB9JusY5t3uU\nn19hZmvMbE13d3f0A9aBN16QllOT/uuWTb5HAQAAwAheItrMApUD+jbn3I9Hu49zbqVzrss519XZ\n2RntgHXi9P+1RO3arfvv6vM9CgAAAEbw8ekcJuk7kjY45/5P1M/fSIJ5R+istrW674nDfI8CAACA\nEXysRC+X9F5JbzSzdZXTmz3M0RDOPWmHNg7M0XN/zPkeBQAAABU+Pp3jv51z5pxb6pxbVjndHfUc\njeL8S6ZJkv7jm3xeNAAAQL3gGwvr3F++93Qt0gbddZfvSQAAADCEiK53U6fqHQvWafXm+drBN4AD\nAADUBSK6Abzj4qSKSupn/7LN9ygAAAAQEd0QTrnyDB2jjfrhdwd9jwIAAAAR0Q3Bjpqn9859UL/6\n89HassX3NAAAACCiG8R7P5SSU5Nu/Ue2dAAAAPhGRDeIY688X6/Xr/XtfwtUKvmeBgAAYHIjohtF\nZ6euPOURbeyZpf/8Wd73NAAAAJMaEd1A3vmFpZqjF/X1v9vuexQAAIBJjYhuIMGb36QrZ9yhe56Y\nrTVrfE8DAAAweRHRjaSpSR/7TKtmaIf+7uN88woAAIAvRHSDmfrx9+vTbd/S3Y/M1KOP+p4GAABg\nciKiG01rqz722amapW599m965JzvgQAAACYfIroBtV39IX2x7QY98FiH7ridigYAAIgaEd2IpkzR\nX1+/QF36nT7xN4Pq6fE9EAAAwORCRDeoxEc+rJsX3qju3pQ+dVXO9zgAAACTChHdqBIJnfK9q3St\nvqJbfpDSHXf4HggAAGDyIKIb2emn64tXduu1+o2u+Ku8nnnG90AAAACTAxHd4IIbvqzbF/29UoO7\ndeH/KGg7X2YIAAAQOiK60TU36+gff1U/TV+iFzYVddFbixoY8D0UAABAvBHRcbBwoZb/+zW6TSv0\nyCPSmy9w6u/3PRQAAEB8EdFx8Za36F03nalbtUL//euSznuTY2sHAABASIjoOPnYx3TZ9ct0p7tY\nj/02r9NOdXrqKd9DAQAAxA8RHTfXXqt3/tP/o4dKr1fmxR067TSnlSvF14MDAADUEBEdR5/8pE7/\nv5/SmqbT9brif+uv/1q66CLpxRd9DwYAABAPRHRcvec9OvL/u1P3zPtf+qo+oXvvzuu445y+/GUp\nm/U9HAAAQGMjouPslFPU9Pu1uubDe/RUcaHOdffrc5+T/vIvnW6+mZgGAAAYLyI67trapJUrdey9\nN+snR1+t+3Su5vRs0Ec/Kh1zjPT3fy9t2+Z7SAAAgMZCRE8Wb3qT9PjjOvfGt+o36bN1
n87V8Znf\n6fOfl446yund75buukvKZHwPCgAAUP+I6MkkCKSrr5Ztek7nfvUtunfKO/UnvUZXp/9FD969R+98\np3TYYdJll0n//u/Sjh2+BwYAAKhP5hrgs8+6urrcmjVrfI8RP4WCdPfd0s03K/+f9+tBnal/n/HX\numvwAm0fmCIzp5NPNr3pTdLrXy+ddpo0a5bvoQEAAMJjZmudc12HvB8RDUnlz79btUq6804VfvOo\nfqdTdd+Ut+v+1ov08M6/VKFY/pcWxx4rnX661NUlHX+8dMIJ0uzZkpnn+QEAAGqAiMb4bdsm3Xef\ndO+90r33ak/3Hq1Rlx5tOUuPTn2THhlYqq19U4fvPn16OaiPO678ZsWRp5kzCWwAANA4iGjURqkk\nbdggPfqo9Mgj0sMPS089pW43U0/peK0PTtb6acu13pboz5k5erm/bZ9fb2uTjj66vFo9dJozZ+/l\nI44oh3ZHh9TEDn0AAOAZEY3w7Nkj/eEP0vr15dOTT5ZDe8sW7VGrntMCPacFejZYqGfbT9Tm5Hxt\nc0doW3amtu6Zqnwx8aqHNCuvaM+cKc2YUT4fujxjhjR1qtTefvBTOs2qNwAAmBgiGtEbHJSef17a\nuFF69tny6bnnyttDtm6VXnpJLp/XTs3QVs3RNs3WSzpCOxOHaUfzkdqZOkI7kodpp2Zqp+vQjvxU\n7Rycot3Z9JiePpksx3Rbm9TSUv2ptVVqbi7HeCr16vPRbhv5MwIeAIDGN9aITkYxDCaJ5ubyxujj\njhv956WSbMcOzdy2TTO3btWSbduk7dvLn6W34w/Szv+uXN4h7dwp9e2QslkVlFCf2g942m3T1Jea\npb7UDPVpuvYMtCuTmaKMtSijVmVcs3a7Zg2U0soU08oUA2UK5dNoq+LjFQQHD+10eu99gmDf01hu\nC+v3kkn+AAAAoFpENKLT1CR1dpZPS5ce+v7OSQMDSvb2anpfn6b39Um7d0t9fXtPw9d3SLufK1/O\nZMqngYFXX85n9vlGmaKalFHLPqecUsOnrNKjn1uLcslW5ZKtyiZalUu0KNvUolyiufwz16xcNq1s\nNq1cf1rZocd0KQ0oqZwLlHfJ8qmUUK5UPs8XE8qXmpQvNilXSChfjGajeDI5vmgPI+yTyfGfhn4/\nkeAPAwBAuIho1C8zacqU8qmWnCtvPclklBgYUFsmo7aRoZ3LSdnsoU+5nJTtkbIvj/G+o9yWzx98\nVElFJZRXoLwC5ZQavjza9eHbkq3KBy3KJ8qxn08073PKJZqVb0orb+nyeVNKOWtW3gLllVLe9nvs\nXKB8NqmcSypfSirjEtpdCf/ccPgnKvHfpHzRlC+UT7m8n3eMJhITC/GxnhKJ0U9NTdXd7vN3mprK\nJ7N9L/OHCAAcGBGNycds70boGTP8zuJcOaSHIjuX2+dkuZyS2aySuZxa9vvZaPc/8O27xnb/7Ci3\nZ7PlOcf7j6h9/xAYjv2mFuWDVuVTU5QPyqv6hWTz3lPQokIiXT4l0yo0pfdeT6RVaEqp0JRS3lLD\nlwsW7D0pWbmcLF9WUgWXVEGJveelhAouoYIr/xFQcE0qlJpUyDdpIGsqFE2FgqlQ0EFPxeK+p1Kp\nfN4Abzk5qKGQHgrrkafRbo/yvtX+flinsB8/itPQP8PQMR95/DmP//mQaq6Hdd+R15uapGnTVNeI\naMAns73vTGxrO/T9fSkWxxnvOVnlj4BX/SFw0MfZOfrtmYM87yFW9cdt/3eQplJSOiW1j3L7fu9A\ndUFKxaBZxWS6fB40qxSky9eHTonyfUrJVPnyq06BSomUik3B3lMipaIlVUoEKlpSRdd0wJAfy+3O\nlW8vlfa9PPI02u1jva0W9y2Vyn+wTOS5wjqN9fEBjN0xx5Q/p6CeEdEADi2R2Lt6X68Osaof+u0D\nA9KuXa/+NwmVf5tQi1X9A0okRo/9/T9WZui0/76V
Q11OjfF+Y7081vslavfG33oRZsxP5I+AodlG\nzsl5/M+HVHM9rPvuf33qVNU9IhpAPEyCVf2a3L5nz757UfL5Q18uFv28VmZji/Ch4B65SX3/DesH\n+tlELo/jdyyZlE34sZr23dw+dL7/vxcHECoiGgCi1Air+vsb2gNysNAea5BP9HcO9PtDe1P2v5zN\njn77yP0sB/v9kfdrBCPfKTpaaI92Xqv7RH3foQ3dB9oQX+31WjxGFI/JH0t1g4gGABzc0P95B0Fj\nxX8tDe17qEWQjzXaR/vZyI3hIze5jzw/2M9qdd98PvwZcGC1DPOR7y6t9vJ4f28sj3HEEdLXv+77\nlT4oIhoAgEMx27ulIpXyPc3k4NzBg3u0d5FO9Ppke8yhdxYP/Ww8lw/080M99qEet7fX938CD4mI\nBgAA9WdoTzxQp5p8DwAAAAA0GiIaAAAAqBIRDQAAAFSJiAYAAACqREQDAAAAVSKiAQAAgCoR0QAA\nAECViGgAAACgSkQ0AAAAUCUiGgAAAKgSEQ0AAABUiYgGAAAAqkREAwAAAFUiogEAAIAqEdEAAABA\nlYhoAAAAoEpENAAAAFAlIhoAAACokpeINrPzzeyPZvZnM7vOxwwAAADAeEUe0WaWkPQNSRdIWizp\nMjNbHPUcAAAAwHj5WIk+TdKfnXPPOudyku6QdJGHOQAAAIBx8RHRR0raPOL6lspt+zCzK8xsjZmt\n6e7ujmw4AAAA4FDq9o2FzrmVzrku51xXZ2en73EAAACAYUkPz/mipHkjrs+t3HZAa9eu3W5mz4c6\n1ehmSdru4XkRLY7z5MBxnhw4zpMDx3ly8HWcjx7Lncw5F/Yg+z6hWVLSnySdo3I8/07S/3TOPRXp\nIGNgZmucc12+50C4OM6TA8d5cuA4Tw4c58mh3o9z5CvRzrmCmX1M0j2SEpJuqceABgAAAA7Ex3YO\nOefulnS3j+cGAAAAJqpu31hYJ1b6HgCR4DhPDhznyYHjPDlwnCeHuj7Oke+JBgAAABodK9EAAABA\nlYjoAzCz883sj2b2ZzO7zvc8GB8zm2dmD5jZH8zsKTO7unL7DDO7z8yeqZxPH/E7n60c9z+a2f/w\nNz2qZWYJM/u9mf1H5TrHOWbMrMPMVpnZ02a2wcxey3GOHzP7ROV/s9eb2e1m1sxxbnxmdouZvWJm\n60fcVvVxNbNTzOzJys9uMjOL+p9FIqJHZWYJSd+QdIGkxZIuM7PFfqfCOBUkfco5t1jSGZKurBzL\n6yT9yjn3Gkm/qlxX5WeXSjpe0vmSvln5zwMaw9WSNoy4znGOn69J+qVzbqGkE1U+3hznGDGzIyVd\nJanLOXeCyp/kdak4znHwPZWP0UjjOa7fkvRhSa+pnPZ/zEgQ0aM7TdKfnXPPOudyku6QdJHnmTAO\nzrltzrnHKpf7VP4/3CNVPp7fr9zt+5LeXrl8kaQ7nHNZ59xzkv6s8n8eUOfMbK6kCyV9e8TNHOcY\nMbNpkt4g6TuS5JzLOed6xHGOo6Sklsp3S7RK2iqOc8Nzzq2WtHO/m6s6rmY2W9JU59wjrvzGvn8b\n8TuRIqJHd6SkzSOub6nchgZmZvMlnSTpUUmHO+e2VX70kqTDK5c59o3rRkmfkVQacRvHOV4WSOqW\n9N3Ktp1vm9kUcZxjxTn3oqQbJL0gaZukXufcveI4x1W1x/XIyuX9b48cEY1JwczaJP1I0jXOud0j\nf1b5S5aPqWlgZvYWSa8459Ye6D4c51hISjpZ0reccydJ2qPKv/odwnFufJU9sRep/EfTHElTzGzF\nyPtwnOOp0Y4rET26FyXNG3F9buU2NCAzC1QO6Nuccz+u3Pxy5V8JqXL+SuV2jn1jWi7pbWa2SeXt\nV280s1vFcY6bLZK2OOcerVxfpXJUc5zj5VxJzznnup1zeUk/lvQ6cZzjqtrj+mLl8v63R46IHt3v\nJL3GzBaYWUrl
je0/8zwTxqHyjt3vSNrgnPs/I370M0nvr1x+v6Sfjrj9UjNLm9kCld+w8Nuo5sX4\nOOc+65yb65ybr/J/X//LObdCHOdYcc69JGmzmR1XuekcSX8QxzluXpB0hpm1Vv43/ByV38/CcY6n\nqo5rZevHbjM7o/Kfj/eN+J1Iefna73rnnCuY2cck3aPyu4Jvcc495XksjM9ySe+V9KSZravc9jlJ\n10u608w+JOl5Se+RJOfcU2Z2p8r/x1yQdKVzrhj92KgRjnP8fFzSbZUFjmclfVDlBSGOc0w45x41\ns1WSHlP5uP1e5W+uaxPHuaGZ2e2SzpI0y8y2SPo7je9/p/9G5U/6aJH0n5VT5PjGQgAAAKBKbOcA\nAAAAqkREAwAAAFUiogEAAIAqEdEAAABAlYhoAAAAoEpENABMUmZ2lpn9h+85AKAREdEAAABAlYho\nAKhzZrbCzH5rZuvM7F/MLGFm/Wb2VTN7ysx+ZWadlfsuM7NHzOwJM7vLzKZXbv8LM7vfzB43s8fM\n7NjKw7eZ2Soze9rMbqt8AxgA4BCIaACoY2a2SNIlkpY79/+3d/+uPsVxHMefL4v8igwWAxkpP1IG\nZfIPGK6FbjJbbFIs/gfFeMUgxa4M37oTBqWMplvqLhKKdL0M3zNcFs4t935dz8d0zvt8evf5LOe8\ne59PfXocWAEuAjuAl22PABOmJ38B3AOutT0KvF4VfwDcbnsMOA28G+IngKvAYeAQ01M+JUm/4bHf\nkjTbzgIngRdDk3gbsAx8Bx4OY+4Dj5PsBva0nQzxBeBRkl3A/rZPANp+ARjyPW+7NNy/Ag4Ci39/\nWZL0b7OIlqTZFmCh7fWfgsnNX8Z1jfm/rrpewe+CJP0Rt3NI0mx7Bswl2QeQZG+SA0zf33PDmAvA\nYtsPwPskZ4b4PDBp+xFYSnJuyLE1yfZ1XYUkbTJ2HCRphrV9k+QG8DTJFuAbcAX4DJwani0z3TcN\ncAm4MxTJb4HLQ3weuJvk1pDj/DouQ5I2nbRr/QMoSdooST613bnR85Ck/5XbOSRJkqSR7ERLkiRJ\nI9mJliRJkkayiJYkSZJGsoiWJEmSRrKIliRJkkayiJYkSZJGsoiWJEmSRvoBrU2O1eukuE4AAAAA\nSUVORK5CYII=\n",
477 | "text/plain": [
478 | ""
479 | ]
480 | },
481 | "metadata": {},
482 | "output_type": "display_data"
483 | }
484 | ],
485 | "source": [
486 | "plt.figure(figsize=(12,10))\n",
487 | "plt.plot(train_loss_history, color=\"red\", label=\"train loss\")\n",
488 | "plt.plot(valid_loss_history, color=\"blue\", label=\"valid loss\")\n",
489 | "plt.xlabel(\"epoch\")\n",
490 | "plt.ylabel(\"loss\")\n",
491 | "plt.legend()\n",
492 | "plt.show()"
493 | ]
494 | },
495 | {
496 | "cell_type": "code",
497 | "execution_count": 23,
498 | "metadata": {},
499 | "outputs": [
500 | {
501 | "name": "stdout",
502 | "output_type": "stream",
503 | "text": [
504 | "MF Accuracy: 45.008881%\n"
505 | ]
506 | }
507 | ],
508 | "source": [
509 | "mf_accuracy = np.sum(np.round(final_valid_predictions) == validset.rating.values) / len(final_valid_predictions)\n",
510 | "print(\"MF Accuracy: %f%%\"%(mf_accuracy*100,))"
511 | ]
512 | },
513 | {
514 | "cell_type": "markdown",
515 | "metadata": {},
516 | "source": [
517 | "### Results on validation set"
518 | ]
519 | },
520 | {
521 | "cell_type": "code",
522 | "execution_count": 17,
523 | "metadata": {},
524 | "outputs": [
525 | {
526 | "data": {
1213 | "text/plain": [
1214 | " gender userid movieid age_desc occ_desc \\\n",
1215 | "323753 M 3432 1220 25-34 programmer \n",
1216 | "572412 M 3579 2089 18-24 other or not specified \n",
1217 | "473636 M 3931 1680 25-34 customer service \n",
1218 | "308120 M 4378 1203 18-24 technician/engineer \n",
1219 | "621235 M 4251 2301 45-49 executive/managerial \n",
1220 | "280736 M 1482 1127 25-34 executive/managerial \n",
1221 | "251324 M 2414 1033 25-34 academic/educator \n",
1222 | "178892 M 4732 648 25-34 sales/marketing \n",
1223 | "723124 M 4439 2694 35-44 doctor/health care \n",
1224 | "83939 M 727 317 35-44 lawyer \n",
1225 | "541832 M 4098 2001 35-44 executive/managerial \n",
1226 | "915511 M 1579 3499 25-34 other or not specified \n",
1227 | "405738 M 3934 1379 25-34 other or not specified \n",
1228 | "285899 F 1899 1147 45-49 doctor/health care \n",
1229 | "181963 F 4048 673 35-44 academic/educator \n",
1230 | "931626 M 3125 3555 25-34 executive/managerial \n",
1231 | "602570 M 5788 2193 25-34 other or not specified \n",
1232 | "516839 F 5530 1924 18-24 college/grad student \n",
1233 | "596378 M 2242 2162 18-24 college/grad student \n",
1234 | "336866 F 1516 1240 25-34 programmer \n",
1235 | "403557 F 4208 1376 35-44 other or not specified \n",
1236 | "532357 M 3101 1968 18-24 scientist \n",
1237 | "898240 M 5648 3421 25-34 technician/engineer \n",
1238 | "877298 M 2095 3301 56+ retired \n",
1239 | "796428 M 1395 2968 25-34 artist \n",
1240 | "732618 M 523 2716 50-55 executive/managerial \n",
1241 | "627537 F 3713 2324 25-34 executive/managerial \n",
1242 | "40433 M 5677 141 25-34 academic/educator \n",
1243 | "621897 F 3464 2302 25-34 sales/marketing \n",
1244 | "904828 F 3205 3450 35-44 self-employed \n",
1245 | "387733 M 5111 1334 35-44 artist \n",
1246 | "482628 F 1894 1721 35-44 executive/managerial \n",
1247 | "851826 M 4774 3176 45-49 sales/marketing \n",
1248 | "566304 M 4957 2072 25-34 artist \n",
1249 | "523888 M 4533 1954 25-34 programmer \n",
1250 | "999132 M 2362 3948 25-34 sales/marketing \n",
1251 | "142695 F 2657 527 Under 18 K-12 student \n",
1252 | "169208 F 2926 595 18-24 artist \n",
1253 | "950386 M 3554 3676 25-34 scientist \n",
1254 | "830422 F 5643 3098 35-44 academic/educator \n",
1255 | "974878 M 1820 3770 25-34 writer \n",
1256 | "129671 M 1352 480 35-44 writer \n",
1257 | "629803 M 5488 2333 25-34 scientist \n",
1258 | "80676 F 4208 302 35-44 other or not specified \n",
1259 | "851072 M 1548 3176 35-44 academic/educator \n",
1260 | "902891 M 4030 3444 35-44 executive/managerial \n",
1261 | "516319 F 4472 1923 45-49 self-employed \n",
1262 | "674849 M 5102 2474 25-34 programmer \n",
1263 | "184396 F 1051 708 25-34 other or not specified \n",
1264 | "737397 M 1854 2723 25-34 other or not specified \n",
1265 | "\n",
1266 | " title \\\n",
1267 | "323753 Blues Brothers, The (1980) \n",
1268 | "572412 Rescuers Down Under, The (1990) \n",
1269 | "473636 Sliding Doors (1998) \n",
1270 | "308120 12 Angry Men (1957) \n",
1271 | "621235 History of the World: Part I (1981) \n",
1272 | "280736 Abyss, The (1989) \n",
1273 | "251324 Fox and the Hound, The (1981) \n",
1274 | "178892 Mission: Impossible (1996) \n",
1275 | "723124 Big Daddy (1999) \n",
1276 | "83939 Santa Clause, The (1994) \n",
1277 | "541832 Lethal Weapon 2 (1989) \n",
1278 | "915511 Misery (1990) \n",
1279 | "405738 Young Guns II (1990) \n",
1280 | "285899 When We Were Kings (1996) \n",
1281 | "181963 Space Jam (1996) \n",
1282 | "931626 U-571 (2000) \n",
1283 | "602570 Willow (1988) \n",
1284 | "516839 Plan 9 from Outer Space (1958) \n",
1285 | "596378 NeverEnding Story II: The Next Chapter, The (1... \n",
1286 | "336866 Terminator, The (1984) \n",
1287 | "403557 Star Trek IV: The Voyage Home (1986) \n",
1288 | "532357 Breakfast Club, The (1985) \n",
1289 | "898240 Animal House (1978) \n",
1290 | "877298 Whole Nine Yards, The (2000) \n",
1291 | "796428 Time Bandits (1981) \n",
1292 | "732618 Ghostbusters (1984) \n",
1293 | "627537 Life Is Beautiful (La Vita è bella) (1997) \n",
1294 | "40433 Birdcage, The (1996) \n",
1295 | "621897 My Cousin Vinny (1992) \n",
1296 | "904828 Grumpy Old Men (1993) \n",
1297 | "387733 Blob, The (1958) \n",
1298 | "482628 Titanic (1997) \n",
1299 | "851826 Talented Mr. Ripley, The (1999) \n",
1300 | "566304 'burbs, The (1989) \n",
1301 | "523888 Rocky (1976) \n",
1302 | "999132 Meet the Parents (2000) \n",
1303 | "142695 Schindler's List (1993) \n",
1304 | "169208 Beauty and the Beast (1991) \n",
1305 | "950386 Eraserhead (1977) \n",
1306 | "830422 Natural, The (1984) \n",
1307 | "974878 Dreamscape (1984) \n",
1308 | "129671 Jurassic Park (1993) \n",
1309 | "629803 Gods and Monsters (1998) \n",
1310 | "80676 Queen Margot (La Reine Margot) (1994) \n",
1311 | "851072 Talented Mr. Ripley, The (1999) \n",
1312 | "902891 Bloodsport (1988) \n",
1313 | "516319 There's Something About Mary (1998) \n",
1314 | "674849 Color of Money, The (1986) \n",
1315 | "184396 Truth About Cats & Dogs, The (1996) \n",
1316 | "737397 Mystery Men (1999) \n",
1317 | "\n",
1318 | " genre rating \\\n",
1319 | "323753 Action|Comedy|Musical 4 \n",
1320 | "572412 Animation|Children's 3 \n",
1321 | "473636 Drama|Romance 4 \n",
1322 | "308120 Drama 3 \n",
1323 | "621235 Comedy 2 \n",
1324 | "280736 Action|Adventure|Sci-Fi|Thriller 3 \n",
1325 | "251324 Animation|Children's 4 \n",
1326 | "178892 Action|Adventure|Mystery 3 \n",
1327 | "723124 Comedy 3 \n",
1328 | "83939 Children's|Comedy|Fantasy 4 \n",
1329 | "541832 Action|Comedy|Crime|Drama 4 \n",
1330 | "915511 Horror 4 \n",
1331 | "405738 Action|Comedy|Western 2 \n",
1332 | "285899 Documentary 4 \n",
1333 | "181963 Adventure|Animation|Children's|Comedy|Fantasy 4 \n",
1334 | "931626 Action|Thriller 4 \n",
1335 | "602570 Action|Adventure|Fantasy 3 \n",
1336 | "516839 Horror|Sci-Fi 3 \n",
1337 | "596378 Adventure|Children's|Fantasy 1 \n",
1338 | "336866 Action|Sci-Fi|Thriller 4 \n",
1339 | "403557 Action|Adventure|Sci-Fi 5 \n",
1340 | "532357 Comedy|Drama 5 \n",
1341 | "898240 Comedy 4 \n",
1342 | "877298 Comedy|Crime 3 \n",
1343 | "796428 Adventure|Fantasy|Sci-Fi 5 \n",
1344 | "732618 Comedy|Horror 3 \n",
1345 | "627537 Comedy|Drama 5 \n",
1346 | "40433 Comedy 5 \n",
1347 | "621897 Comedy 3 \n",
1348 | "904828 Comedy 4 \n",
1349 | "387733 Horror|Sci-Fi 4 \n",
1350 | "482628 Drama|Romance 4 \n",
1351 | "851826 Drama|Mystery|Thriller 4 \n",
1352 | "566304 Comedy 3 \n",
1353 | "523888 Action|Drama 3 \n",
1354 | "999132 Comedy 5 \n",
1355 | "142695 Drama|War 5 \n",
1356 | "169208 Animation|Children's|Musical 4 \n",
1357 | "950386 Drama|Horror 4 \n",
1358 | "830422 Drama 3 \n",
1359 | "974878 Adventure|Crime|Sci-Fi|Thriller 4 \n",
1360 | "129671 Action|Adventure|Sci-Fi 3 \n",
1361 | "629803 Drama 5 \n",
1362 | "80676 Drama|Romance 2 \n",
1363 | "851072 Drama|Mystery|Thriller 5 \n",
1364 | "902891 Action 4 \n",
1365 | "516319 Comedy 2 \n",
1366 | "674849 Drama 4 \n",
1367 | "184396 Comedy|Romance 4 \n",
1368 | "737397 Action|Adventure|Comedy 2 \n",
1369 | "\n",
1370 | " prediction (rnd.) prediction (prc.) \n",
1371 | "323753 4 3.595490 \n",
1372 | "572412 3 2.909302 \n",
1373 | "473636 4 3.858740 \n",
1374 | "308120 3 3.197440 \n",
1375 | "621235 3 3.245275 \n",
1376 | "280736 4 3.565892 \n",
1377 | "251324 4 3.941283 \n",
1378 | "178892 3 3.208387 \n",
1379 | "723124 3 3.135849 \n",
1380 | "83939 4 3.934950 \n",
1381 | "541832 4 4.141999 \n",
1382 | "915511 4 4.010313 \n",
1383 | "405738 2 2.088194 \n",
1384 | "285899 4 4.463424 \n",
1385 | "181963 4 4.013864 \n",
1386 | "931626 4 3.929566 \n",
1387 | "602570 4 3.773484 \n",
1388 | "516839 3 2.927153 \n",
1389 | "596378 3 3.229770 \n",
1390 | "336866 3 3.214885 \n",
1391 | "403557 4 3.898657 \n",
1392 | "532357 4 4.322102 \n",
1393 | "898240 4 4.268606 \n",
1394 | "877298 4 3.500426 \n",
1395 | "796428 5 4.660578 \n",
1396 | "732618 4 4.430697 \n",
1397 | "627537 5 4.849105 \n",
1398 | "40433 4 3.652851 \n",
1399 | "621897 4 3.703601 \n",
1400 | "904828 3 3.459073 \n",
1401 | "387733 4 3.665181 \n",
1402 | "482628 4 3.531790 \n",
1403 | "851826 4 4.008468 \n",
1404 | "566304 3 2.680943 \n",
1405 | "523888 4 3.779227 \n",
1406 | "999132 4 4.275411 \n",
1407 | "142695 5 4.537353 \n",
1408 | "169208 4 4.074231 \n",
1409 | "950386 3 2.755942 \n",
1410 | "830422 3 3.336859 \n",
1411 | "974878 4 3.603387 \n",
1412 | "129671 3 2.773809 \n",
1413 | "629803 4 3.892385 \n",
1414 | "80676 4 4.066545 \n",
1415 | "851072 4 4.229926 \n",
1416 | "902891 3 3.480523 \n",
1417 | "516319 3 3.330519 \n",
1418 | "674849 4 4.380075 \n",
1419 | "184396 3 3.434905 \n",
1420 | "737397 2 2.071134 "
1421 | ]
1422 | },
1423 | "execution_count": 17,
1424 | "metadata": {},
1425 | "output_type": "execute_result"
1426 | }
1427 | ],
1428 | "source": [
1429 | "results = validset[[\"gender\", \"userid\",\"movieid\",\"age_desc\",\"occ_desc\", \"title\", \"genre\", \"rating\"]].copy()\n",
1430 | "results[\"prediction (rnd.)\"] = np.asarray(np.round(final_valid_predictions), dtype=np.int16)\n",
1431 | "results[\"prediction (prc.)\"] = final_valid_predictions\n",
1432 | "results.head(50)"
1433 | ]
1434 | },
1435 | {
1436 | "cell_type": "markdown",
1437 | "metadata": {},
1438 | "source": [
1439 | "### Measures"
1440 | ]
1441 | },
1442 | {
1443 | "cell_type": "code",
1444 | "execution_count": 12,
1445 | "metadata": {
1446 | "collapsed": true
1447 | },
1448 | "outputs": [],
1449 | "source": [
1450 | "### Precision, Recall, MAE, RMSE measures (each function reads the global `results` dataframe):\n",
1451 | "\n",
1452 | "def compute_recall(prediction_col, target_col):\n",
1453 | "    recall=[]\n",
1454 | "    for i in range(5):\n",
1455 | "        rating_df = results[results[target_col]==i+1]\n",
1456 | "        num_true_rating = len(rating_df)+0.0\n",
1457 | "        current_recall = (len(rating_df[rating_df[prediction_col]==i+1]))/num_true_rating if num_true_rating else float('nan')\n",
1458 | "        recall.append(current_recall)\n",
1459 | "    return recall\n",
1460 | "\n",
1461 | "def compute_precision(prediction_col, target_col):\n",
1462 | "    precision=[]\n",
1463 | "    for i in range(5):\n",
1464 | "        pred_df = results[results[prediction_col]==i+1]\n",
1465 | "        pred_rating = len(pred_df)+0.0\n",
1466 | "        current_precision = (len(pred_df[pred_df[target_col]==i+1]))/pred_rating if pred_rating else float('nan')\n",
1467 | "        precision.append(current_precision)\n",
1468 | "    return precision\n",
1469 | "\n",
1470 | "def compute_mae(prediction_col, target_col):\n",
1471 | "    return np.mean(np.abs(results[prediction_col]-results[target_col]))\n",
1472 | "\n",
1473 | "def compute_rmse(prediction_col, target_col):\n",
1474 | "    return np.sqrt(np.mean((results[prediction_col]-results[target_col])**2))"
1475 | ]
1476 | },
1477 | {
1478 | "cell_type": "code",
1479 | "execution_count": 13,
1480 | "metadata": {},
1481 | "outputs": [
1482 | {
1483 | "data": {
1484 | "text/plain": [
1485 | "[0.12242835057676299,\n",
1486 | " 0.2602769297190613,\n",
1487 | " 0.502085911485909,\n",
1488 | " 0.6385712646755134,\n",
1489 | " 0.2782322852636389]"
1490 | ]
1491 | },
1492 | "execution_count": 13,
1493 | "metadata": {},
1494 | "output_type": "execute_result"
1495 | }
1496 | ],
1497 | "source": [
1498 | "compute_recall('prediction (rnd.)', 'rating')"
1499 | ]
1500 | },
1501 | {
1502 | "cell_type": "code",
1503 | "execution_count": 14,
1504 | "metadata": {},
1505 | "outputs": [
1506 | {
1507 | "data": {
1508 | "text/plain": [
1509 | "[0.6914036265950302,\n",
1510 | " 0.3469681397738952,\n",
1511 | " 0.4058722824965967,\n",
1512 | " 0.45429626657053657,\n",
1513 | " 0.6664430478073582]"
1514 | ]
1515 | },
1516 | "execution_count": 14,
1517 | "metadata": {},
1518 | "output_type": "execute_result"
1519 | }
1520 | ],
1521 | "source": [
1522 | "compute_precision('prediction (rnd.)', 'rating')"
1523 | ]
1524 | },
1525 | {
1526 | "cell_type": "code",
1527 | "execution_count": 15,
1528 | "metadata": {},
1529 | "outputs": [
1530 | {
1531 | "data": {
1532 | "text/plain": [
1533 | "0.6391991015220138"
1534 | ]
1535 | },
1536 | "execution_count": 15,
1537 | "metadata": {},
1538 | "output_type": "execute_result"
1539 | }
1540 | ],
1541 | "source": [
1542 | "compute_mae('prediction (rnd.)', 'rating')"
1543 | ]
1544 | },
1545 | {
1546 | "cell_type": "code",
1547 | "execution_count": 16,
1548 | "metadata": {},
1549 | "outputs": [
1550 | {
1551 | "data": {
1552 | "text/plain": [
1553 | "0.91609156911837586"
1554 | ]
1555 | },
1556 | "execution_count": 16,
1557 | "metadata": {},
1558 | "output_type": "execute_result"
1559 | }
1560 | ],
1561 | "source": [
1562 | "compute_rmse('prediction (rnd.)', 'rating')"
1563 | ]
1564 | },
1565 | {
1566 | "cell_type": "code",
1567 | "execution_count": null,
1568 | "metadata": {
1569 | "collapsed": true
1570 | },
1571 | "outputs": [],
1572 | "source": []
1573 | }
1574 | ],
1575 | "metadata": {
1576 | "kernelspec": {
1577 | "display_name": "Python 3",
1578 | "language": "python",
1579 | "name": "python3"
1580 | },
1581 | "language_info": {
1582 | "codemirror_mode": {
1583 | "name": "ipython",
1584 | "version": 3
1585 | },
1586 | "file_extension": ".py",
1587 | "mimetype": "text/x-python",
1588 | "name": "python",
1589 | "nbconvert_exporter": "python",
1590 | "pygments_lexer": "ipython3",
1591 | "version": "3.6.1"
1592 | }
1593 | },
1594 | "nbformat": 4,
1595 | "nbformat_minor": 2
1596 | }
1597 |
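The measures defined in the "Measures" cells of the ALS notebook all read a global `results` dataframe. As a rough sketch, the same four measures can be restated as standalone NumPy functions and checked on toy data. The function names (`per_class_precision_recall`, `mae`, `rmse`) and the toy ratings are assumptions made for illustration and are not part of the notebook. Ratios that are undefined (a rating that never occurs, or is never predicted) come back as NaN.

```python
import numpy as np


def per_class_precision_recall(y_true, y_pred, classes=(1, 2, 3, 4, 5)):
    """Return {rating: (precision, recall)}; NaN where a ratio is undefined."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = {}
    for c in classes:
        is_true = y_true == c          # rows whose actual rating is c
        is_pred = y_pred == c          # rows whose predicted rating is c
        hits = int(np.sum(is_true & is_pred))
        n_pred = int(is_pred.sum())
        n_true = int(is_true.sum())
        precision = hits / n_pred if n_pred else float("nan")
        recall = hits / n_true if n_true else float("nan")
        scores[c] = (precision, recall)
    return scores


def mae(y_true, y_pred):
    """Mean absolute error between actual and predicted ratings."""
    diff = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.mean(np.abs(diff)))


def rmse(y_true, y_pred):
    """Root mean squared error between actual and predicted ratings."""
    diff = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Toy data (hypothetical): five actual ratings and five rounded predictions.
toy_true = [5, 4, 4, 3, 1]
toy_pred = [4, 4, 3, 3, 2]

toy_scores = per_class_precision_recall(toy_true, toy_pred)
toy_mae = mae(toy_true, toy_pred)
toy_rmse = rmse(toy_true, toy_pred)
print(toy_scores[4], toy_mae, toy_rmse)
```

Passing the columns in as arrays, rather than reading a global dataframe, makes it possible to reuse the same functions for the predictions of the other notebooks.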
--------------------------------------------------------------------------------
/Deep_Recommender_Tutorial_Strata_NY_2017.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datadynamo/strata_ny_2017_recommender_tutorial/db8631fc1892702aa909674a473093ae1796b274/Deep_Recommender_Tutorial_Strata_NY_2017.pdf
--------------------------------------------------------------------------------
/Deep_Wide_Learning.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "In this tutorial we learn how to use the tf.estimator API to train a wide linear model and a deep feed-forward neural network. This approach combines the strengths of memorization and generalization.\n",
8 | "\n",
9 | "\n",
10 | "\n",
11 | "The TensorFlow website explains the model as follows:\n",
12 | "\n",
13 | "The figure above shows a comparison of a wide model (logistic regression with sparse features and transformations), a deep model (feed-forward neural network with an embedding layer and several hidden layers), and a Wide & Deep model (joint training of both). At a high level, here are the steps using the tf.estimator API:\n",
14 | "\n",
15 | "1. Preprocess our movielens dataset in pandas.\n",
16 | "2. Define features\n",
17 | "3. Build inputs from the original dataset \n",
18 | "4. Hash string-type categorical features and use the values of int-type features directly as category ids.\n",
19 | "5. Create embeddings of sparse features for the deep model.\n",
20 | "6. Define features for both the deep and the wide part of the model.\n",
21 | "7. Train and validate the model."
22 | ]
23 | },
24 | {
25 | "cell_type": "code",
26 | "execution_count": 2,
27 | "metadata": {
28 | "collapsed": true
29 | },
30 | "outputs": [],
31 | "source": [
32 | "import tensorflow as tf\n",
33 | "import pandas as pd\n",
34 | "import matplotlib.pyplot as plt\n",
35 | "import numpy as np\n",
36 | "%matplotlib inline"
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "metadata": {},
42 | "source": [
43 | "### 1. data preprocessing"
44 | ]
45 | },
46 | {
47 | "cell_type": "markdown",
48 | "metadata": {},
49 | "source": [
50 | "Load the dataset and split it into a training set and a validation set. This is the same step shown in the previous notebook for collaborative filtering."
51 | ]
52 | },
53 | {
54 | "cell_type": "code",
55 | "execution_count": 3,
56 | "metadata": {
57 | "collapsed": true
58 | },
59 | "outputs": [],
60 | "source": [
61 | "def load_movie_lens():\n",
62 | "    age_desc = {\n",
63 | "        1: \"Under 18\", 18: \"18-24\", 25: \"25-34\", 35: \"35-44\", 45: \"45-49\", 50: \"50-55\", 56: \"56+\"\n",
64 | "    }\n",
65 | "    occupation_desc = {\n",
66 | "        0: \"other or not specified\", 1: \"academic/educator\", 2: \"artist\", 3: \"clerical/admin\",\n",
67 | "        4: \"college/grad student\", 5: \"customer service\", 6: \"doctor/health care\",\n",
68 | "        7: \"executive/managerial\", 8: \"farmer\", 9: \"homemaker\", 10: \"K-12 student\", 11: \"lawyer\",\n",
69 | "        12: \"programmer\", 13: \"retired\", 14: \"sales/marketing\", 15: \"scientist\", 16: \"self-employed\",\n",
70 | "        17: \"technician/engineer\", 18: \"tradesman/craftsman\", 19: \"unemployed\", 20: \"writer\"\n",
71 | "    }\n",
72 | "    rating_data = pd.read_csv(\n",
73 | "        \"ml-1m/ratings.dat\",\n",
74 | "        sep=\"::\",\n",
75 | "        engine=\"python\",\n",
76 | "        encoding=\"latin-1\",\n",
77 | "        names=['userid', 'movieid', 'rating', 'timestamp'])\n",
78 | "    user_data = pd.read_csv(\n",
79 | "        \"ml-1m/users.dat\",\n",
80 | "        sep='::',\n",
81 | "        engine='python',\n",
82 | "        encoding='latin-1',\n",
83 | "        names=['userid', 'gender', 'age', 'occupation', 'zipcode']\n",
84 | "    )\n",
85 | "    user_data['age_desc'] = user_data['age'].apply(lambda x: age_desc[x])\n",
86 | "    user_data['occ_desc'] = user_data['occupation'].apply(lambda x: occupation_desc[x])\n",
87 | "    movie_data = pd.read_csv(\n",
88 | "        \"ml-1m/movies.dat\",\n",
89 | "        sep='::',\n",
90 | "        engine='python',\n",
91 | "        encoding='latin-1',\n",
92 | "        names=['movieid', 'title', 'genre']\n",
93 | "    )\n",
94 | "    dataset = pd.merge(pd.merge(rating_data, movie_data, how=\"left\", on=\"movieid\"), user_data, how=\"left\", on=\"userid\")\n",
95 | "    adj_col = dataset['movieid']\n",
96 | "    adj_col_uni = adj_col.sort_values().unique()\n",
97 | "    adj_df = pd.DataFrame(adj_col_uni).reset_index().rename(columns = {0:'movieid','index':'adj_movieid'})\n",
98 | "    dataset = pd.merge(adj_df,dataset,how=\"right\", on=\"movieid\")\n",
99 | "    dataset['adj_userid'] = dataset['userid'] - 1\n",
100 | "    return dataset\n",
101 | "\n",
102 | "def split_dataset(dataset, split_frac=.7):\n",
103 | "    dataset = dataset.sample(frac=1, replace=False)\n",
104 | "    n_split = int(len(dataset)*split_frac)\n",
105 | "    trainset = dataset[:n_split]\n",
106 | "    validset = dataset[n_split:]\n",
107 | "    return trainset, validset\n",
108 | "\n",
109 | "fullset = load_movie_lens()\n",
110 | "trainset, validset = split_dataset(fullset)"
111 | ]
112 | },
113 | {
114 | "cell_type": "markdown",
115 | "metadata": {},
116 | "source": [
117 | "### 2. define features"
118 | ]
119 | },
120 | {
121 | "cell_type": "markdown",
122 | "metadata": {},
123 | "source": [
124 | "By looking at the dataset, we know that we have the following features: \"genre\", \"zipcode\", \"gender\", \"age\", \"occupation\".\n",
125 | " - \"genre\", \"zipcode\" and \"gender\" are strings; \"age\" and \"occupation\" are ints. We therefore group the features into STR and INT groups for further encoding.\n",
126 | " - We select all features for the deep model.\n",
127 | " - We select some feature transformations (crosses) for the wide model.\n",
128 | "\n",
129 | "The feature selection for the deep and wide parts of the model is flexible; you can try out different combinations."
130 | ]
131 | },
132 | {
133 | "cell_type": "code",
134 | "execution_count": 4,
135 | "metadata": {
136 | "collapsed": true
137 | },
138 | "outputs": [],
139 | "source": [
140 | "CAT_STR_COLS = [\"genre\", \"zipcode\" ,\"gender\"]\n",
141 | "CAT_INT_COLS = [\"age\", \"occupation\"]\n",
142 | "LABEL_COL = \"rating\"\n",
143 | "DEEP_COLS = CAT_STR_COLS + CAT_INT_COLS\n",
144 | "WIDE_COL_CROSSES = [[\"gender\", \"age\"], [\"gender\", \"occupation\"]]"
145 | ]
146 | },
147 | {
148 | "cell_type": "markdown",
149 | "metadata": {},
150 | "source": [
151 | "### 3. build inputs from original dataset"
152 | ]
153 | },
154 | {
155 | "cell_type": "markdown",
156 | "metadata": {},
157 | "source": [
158 | "Since this Wide & Deep model API expects sparse tensors as inputs, we convert all the feature columns from our original dataset to sparse tensors and the label column to a constant tensor."
159 | ]
160 | },
161 | {
162 | "cell_type": "code",
163 | "execution_count": 5,
164 | "metadata": {
165 | "collapsed": true
166 | },
167 | "outputs": [],
168 | "source": [
169 | "def make_inputs(dataframe):\n",
170 | "    \"\"\"\n",
171 | "    Creates sparse tensors to hold our feature values and a constant to hold our label values.\n",
172 | "    For each feature we have selected for the deep and wide model, we create a sparse tensor. We use tf.SparseTensor\n",
173 | "    to create sparse tensors for features, and use tf.constant to create a constant with label values.\n",
174 | "\n",
175 | "    Arguments:\n",
176 | "    dataframe -- pandas dataframe containing the values of features and labels.\n",
177 | "\n",
178 | "    Returns:\n",
179 | "    feature_inputs -- a dictionary of sparse tensors of features.\n",
180 | "    label_input -- a constant of shape [number of examples] holding rating - 1, i.e. class ids 0 to 4.\n",
181 | "    \"\"\"\n",
182 | "    feature_inputs = {\n",
183 | "        col_name: tf.SparseTensor(\n",
184 | "            indices = [[i, 0] for i in range(len(dataframe[col_name]))],\n",
185 | "            values = dataframe[col_name].values,\n",
186 | "            dense_shape = [len(dataframe[col_name]), 1]\n",
187 | "        )\n",
188 | "        for col_name in CAT_STR_COLS + CAT_INT_COLS\n",
189 | "    }\n",
190 | "    label_input = tf.constant(dataframe[LABEL_COL].values-1)\n",
191 | "    return (feature_inputs, label_input)"
192 | ]
193 | },
194 | {
195 | "cell_type": "markdown",
196 | "metadata": {},
197 | "source": [
198 | "### 4. create hash buckets for categorical features"
199 | ]
200 | },
201 | {
202 | "cell_type": "markdown",
203 | "metadata": {},
204 | "source": [
205 | "Here we define two functions to encode string type categorical features and int type categorical features."
206 | ]
207 | },
208 | {
209 | "cell_type": "code",
210 | "execution_count": 6,
211 | "metadata": {
212 | "collapsed": true
213 | },
214 | "outputs": [],
215 | "source": [
216 | "def make_hash_columns(CAT_STR_COLS):\n",
217 | "    \"\"\"\n",
218 | "    Use tf.feature_column.categorical_column_with_hash_bucket to encode the string type categorical features.\n",
219 | "    Documentation of this function from tensorflow:\n",
220 | "    Use this when your sparse features are in string or integer format, and you want to distribute your inputs\n",
221 | "    into a finite number of buckets by hashing. output_id = Hash(input_feature_string) % bucket_size.\n",
222 | "\n",
223 | "    Arguments:\n",
224 | "    CAT_STR_COLS -- string type categorical columns.\n",
225 | "\n",
226 | "    Returns:\n",
227 | "    hashed_columns -- list with one hashed categorical column per entry of CAT_STR_COLS (3 here).\n",
228 | "\n",
229 | "    \"\"\"\n",
230 | "\n",
231 | "    hashed_columns = [\n",
232 | "        tf.feature_column.categorical_column_with_hash_bucket(col_name, hash_bucket_size=1000)\n",
233 | "        for col_name in CAT_STR_COLS\n",
234 | "    ]\n",
235 | "    return hashed_columns"
236 | ]
237 | },
238 | {
239 | "cell_type": "code",
240 | "execution_count": 7,
241 | "metadata": {
242 | "collapsed": true
243 | },
244 | "outputs": [],
245 | "source": [
246 | "def make_int_columns(CAT_INT_COLS):\n",
247 | "    \"\"\"\n",
248 | "    Use tf.feature_column.categorical_column_with_identity to encode the int type categorical features.\n",
249 | "    Documentation of this function from tensorflow:\n",
250 | "    Use this when your inputs are integers in the range [0, num_buckets), and you want to use the\n",
251 | "    input value itself as the categorical ID.\n",
252 | "\n",
253 | "    Arguments:\n",
254 | "    CAT_INT_COLS -- int type categorical columns.\n",
255 | "\n",
256 | "    Returns:\n",
257 | "    int_columns -- list with one identity categorical column per entry of CAT_INT_COLS (2 here).\n",
258 | "\n",
259 | "    \"\"\"\n",
260 | "    int_columns = [\n",
261 | "        tf.feature_column.categorical_column_with_identity(col_name, num_buckets=1000, default_value=0)\n",
262 | "        for col_name in CAT_INT_COLS\n",
263 | "    ]\n",
264 | "    return int_columns"
265 | ]
266 | },
267 | {
268 | "cell_type": "markdown",
269 | "metadata": {},
270 | "source": [
271 | "The TensorFlow tutorial instead uses tf.feature_column.categorical_column_with_vocabulary_list to create the int type categorical columns:\n",
272 | "\n",
273 | "    age = tf.feature_column.categorical_column_with_vocabulary_list(\"age\", [1,18, 25, 35, 45, 50, 56])\n",
274 | "    occupation = tf.feature_column.categorical_column_with_vocabulary_list(\"occupation\", [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20])\n",
275 | "    age = tf.feature_column.indicator_column(age)\n",
276 | "    occupation = tf.feature_column.indicator_column(occupation)\n"
278 | ]
279 | },
280 | {
281 | "cell_type": "markdown",
282 | "metadata": {},
283 | "source": [
284 | "### 5. create embedding for sparse features "
285 | ]
286 | },
287 | {
288 | "cell_type": "markdown",
289 | "metadata": {},
290 | "source": [
291 | "Create dense embeddings for the sparse features to feed into DNN."
292 | ]
293 | },
294 | {
295 | "cell_type": "code",
296 | "execution_count": 8,
297 | "metadata": {
298 | "collapsed": true
299 | },
300 | "outputs": [],
301 | "source": [
302 | "def make_embeddings(hashed_columns, int_columns, dim=6):\n",
303 | "    \"\"\"\n",
304 | "    Create embeddings for sparse features in the deep model. We use the function tf.feature_column.embedding_column to\n",
305 | "    convert the categorical columns we have created in the above steps to a dense representation.\n",
306 | "\n",
307 | "    Arguments:\n",
308 | "    hashed_columns -- all categorical columns that came out of the make_hash_columns function\n",
309 | "                      and are going to be fed into the DNN.\n",
310 | "    int_columns -- all categorical columns that came out of the make_int_columns function\n",
311 | "                   and are going to be fed into the DNN.\n",
312 | "    dim -- dimension of the feature embeddings (hyper-parameter, default 6).\n",
313 | "\n",
314 | "    Returns:\n",
315 | "    embedding_layers -- list of columns with dense (embedded) representations.\n",
316 | "\n",
317 | "    \"\"\"\n",
318 | "    embedding_layers = [\n",
319 | "        tf.feature_column.embedding_column(\n",
320 | "            column,\n",
321 | "            dimension=dim\n",
322 | "        )\n",
323 | "        for column in hashed_columns+int_columns\n",
324 | "    ]\n",
325 | "    return embedding_layers"
326 | ]
327 | },
328 | {
329 | "cell_type": "markdown",
330 | "metadata": {},
331 | "source": [
332 | "### 6. define features for the wide part"
333 | ]
334 | },
335 | {
336 | "cell_type": "markdown",
337 | "metadata": {},
338 | "source": [
339 | "In our case, all the columns from the embedding layers go into the deep model, so the deep model input is simply embedding_layers and we do not write a separate function for it. Only the wide part needs its own input function."
340 | ]
341 | },
342 | {
343 | "cell_type": "code",
344 | "execution_count": 9,
345 | "metadata": {
346 | "collapsed": true
347 | },
348 | "outputs": [],
349 | "source": [
350 | "def make_wide_input_layers(WIDE_COL_CROSSES):\n",
351 | "    \"\"\"\n",
352 | "    Make input cross features for the wide model. We use the tf.feature_column.crossed_column function to hash the\n",
353 | "    cross transformation.\n",
354 | "\n",
355 | "    Arguments:\n",
356 | "    WIDE_COL_CROSSES -- cross feature combinations.\n",
357 | "\n",
358 | "    Returns:\n",
359 | "    crossed_wide_input_layers -- input cross features for the wide model.\n",
360 | "\n",
361 | "    \"\"\"\n",
362 | "    crossed_wide_input_layers = [\n",
363 | "        tf.feature_column.crossed_column([c for c in cs], hash_bucket_size=int(10**(3+len(cs))))\n",
364 | "        for cs in WIDE_COL_CROSSES\n",
365 | "    ]\n",
366 | "    return crossed_wide_input_layers"
367 | ]
368 | },
369 | {
370 | "cell_type": "markdown",
371 | "metadata": {},
372 | "source": [
373 | "### 7. train and validate the model"
374 | ]
375 | },
376 | {
377 | "cell_type": "markdown",
378 | "metadata": {},
379 | "source": [
380 | "Here we provide the input features for the deep model and the wide model, define the number of layers and the layer sizes of the DNN,\n",
381 | "and create the model with tf.contrib.learn.DNNLinearCombinedClassifier. The model is saved in the directory ./model/."
382 | ]
383 | },
384 | {
385 | "cell_type": "code",
386 | "execution_count": 23,
387 | "metadata": {
388 | "scrolled": false
389 | },
390 | "outputs": [
391 | {
392 | "name": "stdout",
393 | "output_type": "stream",
394 | "text": [
395 | "create input layers...done!\n",
396 | "create model...INFO:tensorflow:Using config: {'_task_type': None, '_task_id': 0, '_cluster_spec': , '_master': '', '_num_ps_replicas': 0, '_num_worker_replicas': 0, '_environment': 'local', '_is_chief': True, '_evaluation_master': '', '_tf_config': gpu_options {\n",
397 | " per_process_gpu_memory_fraction: 1.0\n",
398 | "}\n",
399 | ", '_tf_random_seed': None, '_save_summary_steps': 10, '_save_checkpoints_secs': 600, '_log_step_count_steps': 100, '_session_config': None, '_save_checkpoints_steps': None, '_keep_checkpoint_max': 1, '_keep_checkpoint_every_n_hours': 10000, '_model_dir': './model/'}\n",
400 | "done!\n",
401 | "training model...WARNING:tensorflow:From /home/cynthia/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/estimators/head.py:641: scalar_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.\n",
402 | "Instructions for updating:\n",
403 | "Please switch to tf.summary.scalar. Note that tf.summary.scalar uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, passing a tensor or list of tags to a scalar summary op is no longer supported.\n",
404 | "WARNING:tensorflow:From /home/cynthia/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/optimizers.py:160: assert_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.\n",
405 | "Instructions for updating:\n",
406 | "Please switch to tf.train.assert_global_step\n",
407 | "WARNING:tensorflow:From /home/cynthia/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/optimizers.py:160: assert_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.\n",
408 | "Instructions for updating:\n",
409 | "Please switch to tf.train.assert_global_step\n",
410 | "INFO:tensorflow:Create CheckpointSaverHook.\n",
411 | "INFO:tensorflow:Restoring parameters from ./model/model.ckpt-1000\n",
412 | "INFO:tensorflow:Saving checkpoints for 1001 into ./model/model.ckpt.\n",
413 | "INFO:tensorflow:loss = 1.44681, step = 1001\n",
414 | "INFO:tensorflow:global_step/sec: 0.881767\n",
415 | "INFO:tensorflow:loss = 1.44639, step = 1101 (113.410 sec)\n",
416 | "INFO:tensorflow:global_step/sec: 0.919119\n",
417 | "INFO:tensorflow:loss = 1.44599, step = 1201 (108.799 sec)\n",
418 | "INFO:tensorflow:global_step/sec: 0.917763\n",
419 | "INFO:tensorflow:loss = 1.4456, step = 1301 (108.961 sec)\n",
420 | "INFO:tensorflow:global_step/sec: 0.913875\n",
421 | "INFO:tensorflow:loss = 1.44523, step = 1401 (109.424 sec)\n",
422 | "INFO:tensorflow:global_step/sec: 0.917907\n",
423 | "INFO:tensorflow:loss = 1.44485, step = 1501 (108.944 sec)\n",
424 | "INFO:tensorflow:Saving checkpoints for 1547 into ./model/model.ckpt.\n",
425 | "INFO:tensorflow:global_step/sec: 0.907156\n",
426 | "INFO:tensorflow:loss = 1.44448, step = 1601 (110.235 sec)\n",
427 | "INFO:tensorflow:global_step/sec: 0.781231\n",
428 | "INFO:tensorflow:loss = 1.44411, step = 1701 (128.003 sec)\n",
429 | "INFO:tensorflow:global_step/sec: 0.629334\n",
430 | "INFO:tensorflow:loss = 1.44374, step = 1801 (158.898 sec)\n",
431 | "INFO:tensorflow:global_step/sec: 0.627976\n",
432 | "INFO:tensorflow:loss = 1.44336, step = 1901 (159.244 sec)\n",
433 | "INFO:tensorflow:Saving checkpoints for 1960 into ./model/model.ckpt.\n",
434 | "INFO:tensorflow:Saving checkpoints for 2000 into ./model/model.ckpt.\n",
435 | "INFO:tensorflow:Loss for final step: 1.44299.\n",
436 | "done!\n",
437 | "evaluating model...WARNING:tensorflow:From /home/cynthia/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/estimators/head.py:641: scalar_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.\n",
438 | "Instructions for updating:\n",
439 | "Please switch to tf.summary.scalar. Note that tf.summary.scalar uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, passing a tensor or list of tags to a scalar summary op is no longer supported.\n",
440 | "INFO:tensorflow:Starting evaluation at 2017-09-22-18:15:36\n",
441 | "INFO:tensorflow:Restoring parameters from ./model/model.ckpt-2000\n",
442 | "INFO:tensorflow:Evaluation [1/1]\n",
443 | "INFO:tensorflow:Finished evaluation at 2017-09-22-18:15:40\n",
444 | "INFO:tensorflow:Saving dict for global step 2000: accuracy = 0.34907, global_step = 2000, loss = 1.44281\n",
445 | "done!\n",
446 | "calculating predictions...INFO:tensorflow:Restoring parameters from ./model/model.ckpt-2000\n",
447 | "done!\n",
448 | "calculating probabilites...INFO:tensorflow:Restoring parameters from ./model/model.ckpt-2000\n",
449 | "done!\n"
450 | ]
451 | }
452 | ],
453 | "source": [
454 | "print(\"create input layers...\", end=\"\")\n",
455 | "hash_columns = make_hash_columns(CAT_STR_COLS)\n",
456 | "int_columns = make_int_columns(CAT_INT_COLS)\n",
457 | "embedding_layers = make_embeddings(hash_columns, int_columns,dim =6)\n",
458 | "deep_input_layers = embedding_layers\n",
459 | "wide_input_layers = make_wide_input_layers(WIDE_COL_CROSSES)\n",
460 | "print(\"done!\")\n",
461 | "print(\"create model...\", end=\"\")\n",
462 | "model = tf.contrib.learn.DNNLinearCombinedClassifier(\n",
463 | "    n_classes=5,\n",
464 | "    linear_feature_columns = wide_input_layers,\n",
465 | "    dnn_feature_columns = deep_input_layers,\n",
466 | "    dnn_hidden_units = [32, 16],\n",
467 | "    fix_global_step_increment_bug=True,\n",
468 | "    config = tf.contrib.learn.RunConfig(\n",
469 | "        keep_checkpoint_max = 1,\n",
470 | "        save_summary_steps = 10,\n",
471 | "        model_dir = \"./model/\"\n",
472 | "    )\n",
473 | ")\n",
474 | "print(\"done!\")\n",
475 | "print(\"training model...\", end=\"\")\n",
476 | "model.fit(input_fn = lambda: make_inputs(trainset), steps=1000)\n",
477 | "print(\"done!\")\n",
478 | "print(\"evaluating model...\", end=\"\")\n",
479 | "results = model.evaluate(input_fn = lambda: make_inputs(validset), steps=1)\n",
480 | "print(\"done!\")\n",
481 | "print(\"calculating predictions...\", end=\"\")\n",
482 | "predictions = model.predict_classes(input_fn = lambda: make_inputs(validset))\n",
483 | "print(\"done!\")\n",
484 | "print(\"calculating probabilites...\", end=\"\")\n",
485 | "probabilities = model.predict_proba(input_fn = lambda: make_inputs(validset))\n",
486 | "print(\"done!\")"
487 | ]
488 | },
489 | {
490 | "cell_type": "code",
491 | "execution_count": 12,
492 | "metadata": {},
493 | "outputs": [
494 | {
495 | "name": "stdout",
496 | "output_type": "stream",
497 | "text": [
498 | "loss: 1.4464459\n",
499 | "accuracy: 0.34885341\n",
500 | "global_step: 1000\n"
501 | ]
502 | }
503 | ],
504 | "source": [
505 | "for n, r in results.items():\n",
506 | "    print(\"%s: %a\"%(n, r))"
507 | ]
508 | },
509 | {
510 | "cell_type": "code",
511 | "execution_count": 24,
512 | "metadata": {
513 | "collapsed": true
514 | },
515 | "outputs": [],
516 | "source": [
517 | "predict = list(predictions)"
518 | ]
519 | },
520 | {
521 | "cell_type": "code",
522 | "execution_count": 27,
523 | "metadata": {
524 | "collapsed": true
525 | },
526 | "outputs": [],
527 | "source": [
528 | "prob = list(probabilities)"
529 | ]
530 | },
531 | {
532 | "cell_type": "code",
533 | "execution_count": 25,
534 | "metadata": {},
535 | "outputs": [
536 | {
537 | "name": "stdout",
538 | "output_type": "stream",
539 | "text": [
540 | "DNN Accuracy: 34.907003%\n"
541 | ]
542 | }
543 | ],
544 | "source": [
545 | "dnw_accuracy = np.sum(np.asarray(predict)+1 == validset.rating.values) / len(validset)\n",
546 | "print(\"DNW Accuracy: %f%%\"%(dnw_accuracy*100,))"
547 | ]
548 | },
549 | {
550 | "cell_type": "code",
551 | "execution_count": 42,
552 | "metadata": {},
553 | "outputs": [
554 | {
555 | "data": {
556 | "text/html": [
557 | "<div>\n",
571 | "<table border=\"1\" class=\"dataframe\">\n",
572 | "  <thead>\n",
573 | "    <tr style=\"text-align: right;\">\n",
574 | "      <th></th>\n",
575 | "      <th>gender</th>\n",
576 | "      <th>age_desc</th>\n",
577 | "      <th>occ_desc</th>\n",
578 | "      <th>title</th>\n",
579 | "      <th>genre</th>\n",
580 | "      <th>rating</th>\n",
581 | "      <th>prediction</th>\n",
582 | "      <th>rating1</th>\n",
583 | "      <th>rating2</th>\n",
584 | "      <th>rating3</th>\n",
585 | "      <th>rating4</th>\n",
586 | "      <th>rating5</th>\n",
587 | "    </tr>\n",
588 | "  </thead>\n",
589 | "  <tbody>\n",
590 | "    <tr>\n",
591 | "      <th>223742</th>\n",
592 | "      <td>M</td>\n",
593 | "      <td>35-44</td>\n",
594 | "      <td>clerical/admin</td>\n",
595 | "      <td>Little Princess, The (1939)</td>\n",
596 | "      <td>Children's|Drama</td>\n",
597 | "      <td>4</td>\n",
598 | "      <td>4</td>\n",
599 | "      <td>0.025210</td>\n",
600 | "      <td>0.068422</td>\n",
601 | "      <td>0.234015</td>\n",
602 | "      <td>0.375578</td>\n",
603 | "      <td>0.296774</td>\n",
604 | "    </tr>\n",
605 | "    <tr>\n",
606 | "      <th>915512</th>\n",
607 | "      <td>M</td>\n",
608 | "      <td>25-34</td>\n",
609 | "      <td>unemployed</td>\n",
610 | "      <td>Misery (1990)</td>\n",
611 | "      <td>Horror</td>\n",
612 | "      <td>4</td>\n",
613 | "      <td>3</td>\n",
614 | "      <td>0.105347</td>\n",
615 | "      <td>0.148656</td>\n",
616 | "      <td>0.290903</td>\n",
617 | "      <td>0.287796</td>\n",
618 | "      <td>0.167298</td>\n",
619 | "    </tr>\n",
620 | "    <tr>\n",
621 | "      <th>209015</th>\n",
622 | "      <td>M</td>\n",
623 | "      <td>25-34</td>\n",
624 | "      <td>technician/engineer</td>\n",
625 | "      <td>Supercop (1992)</td>\n",
626 | "      <td>Action|Thriller</td>\n",
627 | "      <td>3</td>\n",
628 | "      <td>4</td>\n",
629 | "      <td>0.050473</td>\n",
630 | "      <td>0.120541</td>\n",
631 | "      <td>0.292055</td>\n",
632 | "      <td>0.350531</td>\n",
633 | "      <td>0.186400</td>\n",
634 | "    </tr>\n",
635 | "    <tr>\n",
636 | "      <th>719570</th>\n",
637 | "      <td>M</td>\n",
638 | "      <td>25-34</td>\n",
639 | "      <td>other or not specified</td>\n",
640 | "      <td>Red Violin, The (Le Violon rouge) (1998)</td>\n",
641 | "      <td>Drama|Mystery</td>\n",
642 | "      <td>3</td>\n",
643 | "      <td>4</td>\n",
644 | "      <td>0.057457</td>\n",
645 | "      <td>0.115074</td>\n",
646 | "      <td>0.279706</td>\n",
647 | "      <td>0.332208</td>\n",
648 | "      <td>0.215554</td>\n",
649 | "    </tr>\n",
650 | "    <tr>\n",
651 | "      <th>283590</th>\n",
652 | "      <td>M</td>\n",
653 | "      <td>35-44</td>\n",
654 | "      <td>programmer</td>\n",
655 | "      <td>Manon of the Spring (Manon des sources) (1986)</td>\n",
656 | "      <td>Drama</td>\n",
657 | "      <td>4</td>\n",
658 | "      <td>4</td>\n",
659 | "      <td>0.025996</td>\n",
660 | "      <td>0.079624</td>\n",
661 | "      <td>0.237717</td>\n",
662 | "      <td>0.393170</td>\n",
663 | "      <td>0.263494</td>\n",
664 | "    </tr>\n",
665 | "  </tbody>\n",
666 | "</table>\n",
667 | "</div>"
668 | ],
669 | "text/plain": [
670 | " gender age_desc occ_desc \\\n",
671 | "223742 M 35-44 clerical/admin \n",
672 | "915512 M 25-34 unemployed \n",
673 | "209015 M 25-34 technician/engineer \n",
674 | "719570 M 25-34 other or not specified \n",
675 | "283590 M 35-44 programmer \n",
676 | "\n",
677 | " title genre \\\n",
678 | "223742 Little Princess, The (1939) Children's|Drama \n",
679 | "915512 Misery (1990) Horror \n",
680 | "209015 Supercop (1992) Action|Thriller \n",
681 | "719570 Red Violin, The (Le Violon rouge) (1998) Drama|Mystery \n",
682 | "283590 Manon of the Spring (Manon des sources) (1986) Drama \n",
683 | "\n",
684 | " rating prediction rating1 rating2 rating3 rating4 rating5 \n",
685 | "223742 4 4 0.025210 0.068422 0.234015 0.375578 0.296774 \n",
686 | "915512 4 3 0.105347 0.148656 0.290903 0.287796 0.167298 \n",
687 | "209015 3 4 0.050473 0.120541 0.292055 0.350531 0.186400 \n",
688 | "719570 3 4 0.057457 0.115074 0.279706 0.332208 0.215554 \n",
689 | "283590 4 4 0.025996 0.079624 0.237717 0.393170 0.263494 "
690 | ]
691 | },
692 | "execution_count": 42,
693 | "metadata": {},
694 | "output_type": "execute_result"
695 | }
696 | ],
697 | "source": [
698 | "results = validset[[\"gender\",\"age_desc\",\"occ_desc\", \"title\", \"genre\", \"rating\"]].copy()\n",
699 | "results[\"prediction\"] = np.asarray(predict)+1\n",
700 | "results[\"rating1\"] = np.vstack(prob)[:,0]\n",
701 | "results[\"rating2\"] = np.vstack(prob)[:,1]\n",
702 | "results[\"rating3\"] = np.vstack(prob)[:,2]\n",
703 | "results[\"rating4\"] = np.vstack(prob)[:,3]\n",
704 | "results[\"rating5\"] = np.vstack(prob)[:,4]\n",
705 | "results.head(5)"
706 | ]
707 | },
708 | {
709 | "cell_type": "code",
710 | "execution_count": null,
711 | "metadata": {
712 | "collapsed": true
713 | },
714 | "outputs": [],
715 | "source": []
716 | }
717 | ],
718 | "metadata": {
719 | "kernelspec": {
720 | "display_name": "Python 3",
721 | "language": "python",
722 | "name": "python3"
723 | },
724 | "language_info": {
725 | "codemirror_mode": {
726 | "name": "ipython",
727 | "version": 3
728 | },
729 | "file_extension": ".py",
730 | "mimetype": "text/x-python",
731 | "name": "python",
732 | "nbconvert_exporter": "python",
733 | "pygments_lexer": "ipython3",
734 | "version": "3.6.2"
735 | }
736 | },
737 | "nbformat": 4,
738 | "nbformat_minor": 2
739 | }
740 |
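Steps 4 and 6 of the Wide & Deep notebook rely on hashing: string features are mapped to one of 1000 buckets, and each feature cross is mapped to one of `10**(3+len(cs))` buckets. The following is a minimal pure-Python sketch of that idea, `output_id = Hash(input) % bucket_size`. It is not TensorFlow's implementation: `hashlib.md5` is a stand-in for TensorFlow's internal hash, so the bucket ids differ from the ones the feature columns produce, and the helper names `hash_bucket` and `cross_bucket` as well as the `_X_` separator are hypothetical.

```python
import hashlib


def hash_bucket(value, bucket_size):
    """Map any value to a stable bucket id in [0, bucket_size)."""
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % bucket_size


def cross_bucket(values, bucket_size):
    """Hash a combination of feature values (a feature cross) into one bucket id."""
    joined = "_X_".join(str(v) for v in values)  # separator is an arbitrary choice
    return hash_bucket(joined, bucket_size)


# A string feature such as "genre" goes into one of 1000 buckets.
genre_id = hash_bucket("Drama|Romance", 1000)

# A cross of two features, e.g. ["gender", "age"], gets 10**(3+2) buckets.
cross_id = cross_bucket(["F", 25], 10 ** 5)

print(genre_id, cross_id)
```

Distinct values can collide in the same bucket. A larger bucket count lowers the collision rate at the cost of more weights in the wide part, which is presumably why the notebook grows the bucket count with the number of crossed columns.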
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2017 Mo Patel & Junxia Li
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # strata_ny_2017_recommender_tutorial
--------------------------------------------------------------------------------