├── .gitignore
├── README.md
├── data
│   ├── ua.base
│   └── ua.test
└── fm_tensorflow.ipynb

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
.DS_Store
._.DS_Store
.ipynb_checkpoints/

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Factorization Machines with Tensorflow Tutorial

Click [here](https://github.com/babakx/fm_tensorflow/blob/master/fm_tensorflow.ipynb) to see the IPython notebook tutorial.

## Prerequisites

The following Python packages should be installed:
- pandas
- numpy
- scipy
- sklearn
- tensorflow
- tqdm

--------------------------------------------------------------------------------
/fm_tensorflow.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Factorization Machines with tensorflow tutorial - Python 3\n",
    "\n",
    "In this tutorial we demonstrate how to build an FM model with tensorflow, step by step.\n",
    "\n",
    "### References:\n",
    "\n",
    "Blog post by Gabriele Modena: [Factorization Machines with Tensorflow](http://nowave.it/factorization-machines-with-tensorflow.html)\n",
    "\n",
    "Factorization Machines paper: [Factorization Machines with libFM (pdf)](http://www.csie.ntu.edu.tw/~b97053/paper/Factorization%20Machines%20with%20libFM.pdf)\n",
    "\n",
    "\n",
    "### Relevant repository:\n",
    "\n",
    "[tffm library](https://github.com/geffy/tffm)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Utility function to convert lists to a sparse matrix\n",
    "\n",
    "Here we define a utility function that creates a sparse matrix (as needed by factorization machines) from lists of user/item ids.\n",
    "\n",
    "Check [this gist](https://gist.github.com/babakx/7a3fc9739b7778f6673a458605e18963) for more details about this utility function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from itertools import count\n",
    "from collections import defaultdict\n",
    "from scipy.sparse import csr_matrix\n",
    "\n",
    "def vectorize_dic(dic, ix=None, p=None):\n",
    "    \"\"\"\n",
    "    Creates a scipy csr matrix from a dictionary of lists (each inner list holds the values of one feature)\n",
    "\n",
    "    parameters:\n",
    "    -----------\n",
    "    dic -- dictionary of feature lists. Keys are the names of the features\n",
    "    ix -- index generator (default None)\n",
    "    p -- dimension of the feature space (number of columns in the sparse matrix) (default None)\n",
    "    \"\"\"\n",
    "    if ix is None:\n",
    "        d = count(0)\n",
    "        ix = defaultdict(lambda: next(d))\n",
    "\n",
    "    n = len(list(dic.values())[0])  # num samples\n",
    "    g = len(list(dic.keys()))       # num groups\n",
    "    nz = n * g                      # number of non-zeros\n",
    "\n",
    "    col_ix = np.empty(nz, dtype=int)\n",
    "\n",
    "    i = 0\n",
    "    for k, lis in dic.items():\n",
    "        # append the feature name k to each value el so that equal ids in\n",
    "        # different feature groups are not mapped to the same column index\n",
    "        col_ix[i::g] = [ix[str(el) + str(k)] for el in lis]\n",
    "        i += 1\n",
    "\n",
    "    row_ix = np.repeat(np.arange(0, n), g)\n",
    "    data = np.ones(nz)\n",
    "\n",
    "    if p is None:\n",
    "        p = len(ix)\n",
    "\n",
    "    ixx = np.where(col_ix < p)\n",
    "\n",
    "    return csr_matrix((data[ixx], (row_ix[ixx], col_ix[ixx])), shape=(n, p)), ix"
   ]
  },
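  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick sanity check (a toy example, not part of the MovieLens pipeline below): `vectorize_dic` one-hot encodes each feature group and concatenates the groups column-wise, so every row contains exactly one non-zero per group."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# toy data: 3 samples, two feature groups (users and items)\n",
    "X_toy, ix_toy = vectorize_dic({'users': [1, 1, 2], 'items': [10, 20, 10]})\n",
    "\n",
    "# 3x4 matrix: columns 0-1 encode users 1 and 2, columns 2-3 encode items 10 and 20\n",
    "print(X_toy.todense())"
   ]
  },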
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading data\n",
    "\n",
    "In this tutorial we use the [MovieLens100k Dataset](https://grouplens.org/datasets/movielens/100k/). Here we convert the data to the scipy csr (sparse) matrix format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "# load data with pandas\n",
    "cols = ['user', 'item', 'rating', 'timestamp']\n",
    "train = pd.read_csv('data/ua.base', delimiter='\\\\t', names=cols)\n",
    "test = pd.read_csv('data/ua.test', delimiter='\\\\t', names=cols)\n",
    "\n",
    "# vectorize data and convert them to csr matrix\n",
    "X_train, ix = vectorize_dic({'users': train.user.values, 'items': train.item.values})\n",
    "X_test, ix = vectorize_dic({'users': test.user.values, 'items': test.item.values}, ix, X_train.shape[1])\n",
    "y_train = train.rating.values\n",
    "y_test = test.rating.values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Densifying the input matrices\n",
    "Here we convert the `X_train` and `X_test` matrices to dense format so that we can feed them to the tf model. For large datasets this trick is not recommended; use `tf.SparseTensor` for large sparse datasets instead. Check [this file from the tffm library](https://github.com/geffy/tffm/blob/a98c786917f5ca74a249748ddef8b694b7f823c9/tffm/core.py#L127) to see how a sparse tensor can be defined; a minimal sketch also follows the next cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(90570, 2623)\n",
      "(9430, 2623)\n"
     ]
    }
   ],
   "source": [
    "X_train = X_train.todense()\n",
    "X_test = X_test.todense()\n",
    "\n",
    "# print shape of data\n",
    "print(X_train.shape)\n",
    "print(X_test.shape)"
   ]
  },
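  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The promised sketch of the sparse alternative (a hypothetical helper, not used in the rest of this notebook): a scipy sparse matrix can be converted to a `tf.SparseTensor` through its COO representation, and `tf.sparse_tensor_dense_matmul` can then stand in for the dense matmuls used below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import tensorflow as tf\n",
    "\n",
    "def to_sparse_tensor(m):\n",
    "    \"\"\"Sketch: convert a scipy sparse matrix to a tf.SparseTensor (TF 1.x).\"\"\"\n",
    "    coo = m.tocoo()\n",
    "    # indices must be an [nnz, 2] int64 array of (row, col) pairs\n",
    "    indices = np.vstack([coo.row, coo.col]).transpose().astype(np.int64)\n",
    "    return tf.SparseTensor(indices, coo.data.astype(np.float32), coo.shape)\n",
    "\n",
    "# usage would look like (X_train_csr being the matrix before .todense()):\n",
    "#   X_sp = to_sparse_tensor(X_train_csr)\n",
    "#   tf.sparse_tensor_dense_matmul(X_sp, tf.transpose(V))"
   ]
  },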
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Defining the FM model with tensorflow\n",
    "\n",
    "We first initialize the parameters of the model as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "\n",
    "n, p = X_train.shape\n",
    "\n",
    "# number of latent factors\n",
    "k = 10\n",
    "\n",
    "# design matrix\n",
    "X = tf.placeholder('float', shape=[None, p])\n",
    "# target vector\n",
    "y = tf.placeholder('float', shape=[None, 1])\n",
    "\n",
    "# bias and weights\n",
    "w0 = tf.Variable(tf.zeros([1]))\n",
    "W = tf.Variable(tf.zeros([p]))\n",
    "\n",
    "# interaction factors, randomly initialized\n",
    "V = tf.Variable(tf.random_normal([k, p], stddev=0.01))\n",
    "\n",
    "# estimate of y, initialized to 0 (overwritten by the FM equation below)\n",
    "y_hat = tf.Variable(tf.zeros([n, 1]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Now we define how the output value y should be calculated\n",
    "Using the trick from Rendle's paper, the output for a given feature vector `x` can be calculated with the following equation, which reduces the cost of the pairwise-interaction term from O(kp^2) to O(kp). The next cells display this equation and implement it with tensorflow operations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/latex": [
       "$$\\hat{y}(\\mathbf{x}) = w_0 + \\sum_{j=1}^{p}w_jx_j + \\frac{1}{2} \\sum_{f=1}^{k} ((\\sum_{j=1}^{p}v_{j,f}x_j)^2-\\sum_{j=1}^{p}v_{j,f}^2 x_j^2)$$"
      ],
      "text/plain": [
       "<IPython.core.display.Math object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from IPython.display import display, Math\n",
    "display(Math(r'\\hat{y}(\\mathbf{x}) = w_0 + \\sum_{j=1}^{p}w_jx_j + \\frac{1}{2} \\sum_{f=1}^{k} ((\\sum_{j=1}^{p}v_{j,f}x_j)^2-\\sum_{j=1}^{p}v_{j,f}^2 x_j^2)'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Calculate output with the FM equation\n",
    "linear_terms = tf.add(w0, tf.reduce_sum(tf.multiply(W, X), 1, keep_dims=True))\n",
    "pair_interactions = tf.multiply(0.5,\n",
    "    tf.reduce_sum(\n",
    "        tf.subtract(\n",
    "            tf.pow(tf.matmul(X, tf.transpose(V)), 2),\n",
    "            tf.matmul(tf.pow(X, 2), tf.transpose(tf.pow(V, 2)))),\n",
    "        1, keep_dims=True))\n",
    "y_hat = tf.add(linear_terms, pair_interactions)"
   ]
  },
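  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick numerical check (an illustration with made-up dimensions, not part of the model), the reformulated pairwise term above agrees with the naive sum over all feature pairs:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "rng = np.random.RandomState(0)\n",
    "x_chk = rng.rand(6)       # one feature vector, p = 6\n",
    "V_chk = rng.randn(3, 6)   # k = 3 latent factors, same k x p layout as V\n",
    "\n",
    "# naive double sum over all feature pairs a < b\n",
    "naive = sum(np.dot(V_chk[:, a], V_chk[:, b]) * x_chk[a] * x_chk[b]\n",
    "            for a in range(6) for b in range(a + 1, 6))\n",
    "\n",
    "# Rendle's O(kp) reformulation from the equation above\n",
    "fast = 0.5 * np.sum(np.dot(V_chk, x_chk) ** 2 - np.dot(V_chk ** 2, x_chk ** 2))\n",
    "\n",
    "print(np.allclose(naive, fast))  # True"
   ]
  },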
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loss function\n",
    "\n",
    "Here we implement the FM point-wise loss function with tensorflow operations. The loss is defined as:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/latex": [
       "$$L = \\sum_{i=1}^{n} (y_i - \\hat{y}_i)^2 + \\lambda_w ||W||^2 + \\lambda_v ||V||^2$$"
      ],
      "text/plain": [
       "<IPython.core.display.Math object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display(Math(r'L = \\sum_{i=1}^{n} (y_i - \\hat{y}_i)^2 + \\lambda_w ||W||^2 + \\lambda_v ||V||^2'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [],
   "source": [
    "# L2 regularized sum of squares loss function over W and V\n",
    "lambda_w = tf.constant(0.001, name='lambda_w')\n",
    "lambda_v = tf.constant(0.001, name='lambda_v')\n",
    "\n",
    "# sum each penalty separately so that broadcasting does not\n",
    "# count the W penalty once per row of V\n",
    "l2_norm = tf.add(\n",
    "    tf.multiply(lambda_w, tf.reduce_sum(tf.pow(W, 2))),\n",
    "    tf.multiply(lambda_v, tf.reduce_sum(tf.pow(V, 2))))\n",
    "\n",
    "error = tf.reduce_mean(tf.square(tf.subtract(y, y_hat)))\n",
    "loss = tf.add(error, l2_norm)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Optimization\n",
    "\n",
    "Given a loss function, tensorflow can automatically calculate its derivatives with respect to the model 'variables' and optimize them. Under the hood, the gradient descent optimizer updates the model parameters iteratively with the following update rule:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/latex": [
       "$$\\Theta_{i+1} = \\Theta_{i} - \\eta \\frac{\\partial L}{\\partial \\Theta}$$"
      ],
      "text/plain": [
       "<IPython.core.display.Math object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display(Math(r'\\Theta_{i+1} = \\Theta_{i} - \\eta \\frac{\\partial L}{\\partial \\Theta}'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {},
   "outputs": [],
   "source": [
    "optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(loss)"
   ]
  },
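  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The plain gradient descent optimizer can be swapped for any other first-order optimizer in `tf.train` with a one-line change. As a sketch (not used by the training loop below, which runs the `optimizer` defined above), Adam adapts a per-parameter learning rate and usually needs less tuning:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# optional alternative to plain gradient descent (sketch only)\n",
    "optimizer_adam = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)"
   ]
  },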
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Preparing Batches\n",
    "\n",
    "With SGD or Adam, model parameters are updated in mini-batches. Larger mini-batches make each epoch faster, but can make it harder to reach well-optimized parameters; the mini-batch size is thus a trade-off between speed and accuracy. Here we implement a method that yields mini-batches from the input data. Check this great weblog for an [overview of gradient descent optimization methods](http://ruder.io/optimizing-gradient-descent/index.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {},
   "outputs": [],
   "source": [
    "def batcher(X_, y_=None, batch_size=-1):\n",
    "    n_samples = X_.shape[0]\n",
    "\n",
    "    if batch_size == -1:\n",
    "        batch_size = n_samples\n",
    "    if batch_size < 1:\n",
    "        raise ValueError('Parameter batch_size={} is unsupported'.format(batch_size))\n",
    "\n",
    "    for i in range(0, n_samples, batch_size):\n",
    "        upper_bound = min(i + batch_size, n_samples)\n",
    "        ret_x = X_[i:upper_bound]\n",
    "        ret_y = None\n",
    "        if y_ is not None:\n",
    "            ret_y = y_[i:upper_bound]\n",
    "        yield (ret_x, ret_y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Launching the tensorflow graph and training the model\n",
    "Finally we can start a tensorflow session to initialize the variables and optimize the model parameters. Training consists of running the `optimizer` operation while feeding it the mini-batches."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "983ce6eb5e604ee5aee7c85314180d77",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(IntProgress(value=0, max=10), HTML(value='')))"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "from tqdm import tqdm_notebook as tqdm\n",
    "\n",
    "epochs = 10\n",
    "batch_size = 1000\n",
    "\n",
    "# Launch the graph\n",
    "init = tf.global_variables_initializer()\n",
    "sess = tf.Session()\n",
    "\n",
    "sess.run(init)\n",
    "\n",
    "for epoch in tqdm(range(epochs), unit='epoch'):\n",
    "    # shuffle the training data each epoch\n",
    "    perm = np.random.permutation(X_train.shape[0])\n",
    "    # iterate over batches\n",
    "    for bX, bY in batcher(X_train[perm], y_train[perm], batch_size):\n",
    "        sess.run(optimizer, feed_dict={X: bX.reshape(-1, p), y: bY.reshape(-1, 1)})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Evaluating the model\n",
    "We can now evaluate the trained model on our test set. We use RMSE to measure the prediction error. Note that here we run the `error` operation (the mean squared error without the regularization term) and take the square root of its mean over batches."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1.2851532\n"
     ]
    }
   ],
   "source": [
    "errors = []\n",
    "for bX, bY in batcher(X_test, y_test):\n",
    "    errors.append(sess.run(error, feed_dict={X: bX.reshape(-1, p), y: bY.reshape(-1, 1)}))\n",
    "\n",
    "RMSE = np.sqrt(np.array(errors).mean())\n",
    "print(RMSE)"
   ]
  },
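  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before closing the session, a quick illustration (not part of the original notebook flow) of using the trained graph for prediction: feed any rows of the design matrix to the `y_hat` operation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# predict the rating of the first test example (illustration only)\n",
    "pred = sess.run(y_hat, feed_dict={X: X_test[0].reshape(-1, p)})\n",
    "print('predicted: {:.3f}, actual: {}'.format(pred[0, 0], y_test[0]))"
   ]
  },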
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Closing the tensorflow session\n",
    "After finishing your experiments, make sure you close the tf session that you created, to free the memory it uses."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {},
   "outputs": [],
   "source": [
    "sess.close()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
--------------------------------------------------------------------------------