├── .DS_Store
├── Data
    ├── .DS_Store
    └── ml-1m
    │   ├── README
    │   ├── movies.dat
    │   ├── ratings.dat
    │   └── users.dat
├── Like2Vec.py
├── Like2Vec_Example.ipynb
├── README.md
├── images
    └── tsne.png
└── tsne.py


/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jacobBaumbach/Like2Vec_TensorFlow/d4355f93dcf413737faf46dcad81bc5fcf138c7e/.DS_Store


--------------------------------------------------------------------------------
/Data/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jacobBaumbach/Like2Vec_TensorFlow/d4355f93dcf413737faf46dcad81bc5fcf138c7e/Data/.DS_Store


--------------------------------------------------------------------------------
/Data/ml-1m/README:
--------------------------------------------------------------------------------
SUMMARY
================================================================================

These files contain 1,000,209 anonymous ratings of approximately 3,900 movies
made by 6,040 MovieLens users who joined MovieLens in 2000.

USAGE LICENSE
================================================================================

Neither the University of Minnesota nor any of the researchers
involved can guarantee the correctness of the data, its suitability
for any particular purpose, or the validity of results based on the
use of the data set. The data set may be used for any research
purposes under the following conditions:

     * The user may not state or imply any endorsement from the
       University of Minnesota or the GroupLens Research Group.

     * The user must acknowledge the use of the data set in
       publications resulting from the use of the data set
       (see below for citation information).

     * The user may not redistribute the data without separate
       permission.

     * The user may not use this information for any commercial or
       revenue-bearing purposes without first obtaining permission
       from a faculty member of the GroupLens Research Project at the
       University of Minnesota.

If you have any further questions or comments, please contact GroupLens
.

CITATION
================================================================================

To acknowledge use of the dataset in publications, please cite the following
paper:

F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History
and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4,
Article 19 (December 2015), 19 pages. DOI=http://dx.doi.org/10.1145/2827872


ACKNOWLEDGEMENTS
================================================================================

Thanks to Shyong Lam and Jon Herlocker for cleaning up and generating the data
set.

FURTHER INFORMATION ABOUT THE GROUPLENS RESEARCH PROJECT
================================================================================

The GroupLens Research Project is a research group in the Department of
Computer Science and Engineering at the University of Minnesota. Members of
the GroupLens Research Project are involved in many research projects related
to the fields of information filtering, collaborative filtering, and
recommender systems. The project is led by professors John Riedl and Joseph
Konstan. The project began to explore automated collaborative filtering in
1992, but is most well known for its world wide trial of an automated
collaborative filtering system for Usenet news in 1996. Since then the project
has expanded its scope to research overall information filtering solutions,
integrating in content-based methods as well as improving current collaborative
filtering technology.

Further information on the GroupLens Research project, including research
publications, can be found at the following web site:

        http://www.grouplens.org/

GroupLens Research currently operates a movie recommender based on
collaborative filtering:

        http://www.movielens.org/

RATINGS FILE DESCRIPTION
================================================================================

All ratings are contained in the file "ratings.dat" and are in the
following format:

UserID::MovieID::Rating::Timestamp

- UserIDs range between 1 and 6040
- MovieIDs range between 1 and 3952
- Ratings are made on a 5-star scale (whole-star ratings only)
- Timestamp is represented in seconds since the epoch as returned by time(2)
- Each user has at least 20 ratings

USERS FILE DESCRIPTION
================================================================================

User information is in the file "users.dat" and is in the following
format:

UserID::Gender::Age::Occupation::Zip-code

All demographic information is provided voluntarily by the users and is
not checked for accuracy. Only users who have provided some demographic
information are included in this data set.

- Gender is denoted by a "M" for male and "F" for female
- Age is chosen from the following ranges:

    *  1:  "Under 18"
    * 18:  "18-24"
    * 25:  "25-34"
    * 35:  "35-44"
    * 45:  "45-49"
    * 50:  "50-55"
    * 56:  "56+"

- Occupation is chosen from the following choices:

    *  0:  "other" or not specified
    *  1:  "academic/educator"
    *  2:  "artist"
    *  3:  "clerical/admin"
    *  4:  "college/grad student"
    *  5:  "customer service"
    *  6:  "doctor/health care"
    *  7:  "executive/managerial"
    *  8:  "farmer"
    *  9:  "homemaker"
    * 10:  "K-12 student"
    * 11:  "lawyer"
    * 12:  "programmer"
    * 13:  "retired"
    * 14:  "sales/marketing"
    * 15:  "scientist"
    * 16:  "self-employed"
    * 17:  "technician/engineer"
    * 18:  "tradesman/craftsman"
    * 19:  "unemployed"
    * 20:  "writer"

MOVIES FILE DESCRIPTION
================================================================================

Movie information is in the file "movies.dat" and is in the following
format:

MovieID::Title::Genres

- Titles are identical to titles provided by the IMDB (including
  year of release)
- Genres are pipe-separated and are selected from the following genres:

    * Action
    * Adventure
    * Animation
    * Children's
    * Comedy
    * Crime
    * Documentary
    * Drama
    * Fantasy
    * Film-Noir
    * Horror
    * Musical
    * Mystery
    * Romance
    * Sci-Fi
    * Thriller
    * War
    * Western

- Some MovieIDs do not correspond to a movie due to accidental duplicate
  entries and/or test entries
- Movies are mostly entered by hand, so errors and inconsistencies may exist


--------------------------------------------------------------------------------
/Data/ml-1m/movies.dat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jacobBaumbach/Like2Vec_TensorFlow/d4355f93dcf413737faf46dcad81bc5fcf138c7e/Data/ml-1m/movies.dat


--------------------------------------------------------------------------------
/Like2Vec.py:
--------------------------------------------------------------------------------
from __future__ import absolute_import
from __future__ import print_function

import collections
import math
import os
import random
import zipfile

import numpy as np
from six.moves import urllib
from six.moves import xrange  # pylint: disable=redefined-builtin
import tensorflow as tf
from tsne import tsne
import matplotlib.pyplot as plt


class Like2Vec(object):
    def __init__(self,user_item,name_index_dict,wpv,wl,add_coeff = .01,axis = 0):
        """
        Initialize an instance of Like2Vec

        INPUT:
            user_item : user(rows) item(columns) matrix, 2 dimensional numpy array
            name_index_dict : label (key) index (value), dictionary(string:int)
            wpv : random walks per value, int
            wl : length of random walk, int
            add_coeff : laplace smoothing coefficient (amount you will add to each numerator of proportion matrix), double
            axis : when 0, embeddings will be generated for users, else embeddings will be generated for items
        """
        self.user_item = user_item
        # user x user co-occurrence when axis == 0, otherwise item x item
        self.axis_matrix = self.user_item.dot(self.user_item.T) if axis==0 else self.user_item.T.dot(self.user_item)
        self.final_matrix = self.laplace_smoothing(add_coeff)
        self.name_index_dict = name_index_dict
        self.index_name_dict = dict(zip(self.name_index_dict.values(),self.name_index_dict.keys()))
        # Order the labels by index so that self.labels[i] names row i of the embeddings
        # (dictionary key order is not guaranteed to follow the indices)
        self.labels = [i.decode('ascii',errors="ignore")
                       for i in sorted(self.name_index_dict,key=self.name_index_dict.get)]
        self.data = self.rando_walks(wpv,wl)
        self.data_index = 0
        self.graph = tf.Graph()

    def laplace_smoothing(self,add_coeff):
        """
        Laplace Smoothing is performed on the proportion matrix used to generate random walks

        INPUT:
            add_coeff : the coefficient that will be added to the numerator so for each given user or item
                        all the other users or items will have nonzero proportions, double
        OUTPUT:
            2 dimensional array of size either user x user or item x item containing the laplace smoothed proportions
        """
        # keepdims=True keeps the row totals as a column vector, so every row is divided by its own total
        row_totals = np.sum(self.axis_matrix,axis=1,keepdims=True)
        return (self.axis_matrix+add_coeff)/(row_totals+add_coeff*len(self.axis_matrix))

    def rando_walks(self,wpv,wl):
        """
        wpv random walks are generated for every user or item, each starting at that user or item and
        taking wl steps. These random walks are used to generate the embeddings.

        INPUT:
            wpv : random walks per value, int
            wl : length of random walk, int

        OUTPUT:
            list of lists of lists, the outermost list contains a list for every user or item and that list contains
            wpv lists, where each of those lists holds the starting index followed by the wl indices of the
            users or items visited in the random walk
        """
        n = len(self.final_matrix)
        def repeatRw(idx):
            rwLst = [idx]
            for _ in xrange(wl):
                # step from the most recently visited index, using its row as the transition probabilities
                row = self.final_matrix[rwLst[-1]]
                rwLst.append(np.random.choice(n,p=row/np.sum(row)))
            return rwLst
        return [[repeatRw(idx) for _ in xrange(wpv)] for idx in xrange(n)]

    def _generate_batch(self,batch_size, num_skips, skip_window):
        """
        Creates the minibatch that embeddings will be trained on for a given iteration

        INPUT:
            batch_size : the number of samples the embeddings will be trained on for a given iteration
            num_skips : How many times to reuse an input to generate a label.
            skip_window : How many words to consider left and right

        OUTPUT:
            batch : 1 dimensional array of length batch_size that will be the input to train the embeddings
            labels : 2 dimensional array of shape (batch_size, 1) that will be the output the embeddings try to match
                     during training
        """
        assert batch_size % num_skips == 0
        assert num_skips <= 2 * skip_window
        batch = np.ndarray(shape=(batch_size), dtype=np.int32)
        labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32)
        span = 2 * skip_window + 1  # [ skip_window target skip_window ]
        buffer = collections.deque(maxlen=span)
        for _ in range(span):
            # unlike the word2vec example, each buffer entry is a whole random walk,
            # drawn at random from the walks of the current user or item
            buffer.append(random.choice(self.data[self.data_index]))
            self.data_index = (self.data_index + 1) % len(self.final_matrix)

        for i in range(batch_size // num_skips):
            target = skip_window  # target label at the center of the buffer
            targets_to_avoid = [ skip_window ]
            for j in range(num_skips):
                while target in targets_to_avoid:
                    target = random.randint(0, span - 1)
                targets_to_avoid.append(target)

                # input: the index at position skip_window of the chosen walk; label: that walk's starting index
                batch[i * num_skips + j] = buffer[target][skip_window]
                labels[i * num_skips + j, 0] = buffer[target][0]
            buffer.append(random.choice(self.data[self.data_index]))
            self.data_index = (self.data_index + 1) % len(self.final_matrix)
        return batch, labels

    def _build_skip_gram(self,batch_size = 128,embedding_size = 128,learn_rate=1.0,num_sampled = 64,num_skips = 2,
                         skip_window = 1,valid_size = 16 ,valid_window = 100 ):
        """
        Creates the inputs needed to begin training the embeddings

        INPUT:
            batch_size : the number of samples the embeddings will be trained on for a given iteration, int
            embedding_size : Dimension of the embedding vector, int
            learn_rate : the learning rate used during optimization, double
            num_sampled : Number of negative examples to sample, int
            num_skips : How many times to reuse an input to generate a label, int
            skip_window : How many words to consider left and right, int
            valid_size : Random set of users or items to evaluate similarity on, int
            valid_window : Only pick dev samples in the head of the distribution, int

        OUTPUT:
            loss : Compute the average NCE loss for the batch.
                   tf.nce_loss automatically draws a new sample of the negative labels each time we evaluate the loss.
            normalized_embeddings : tensorflow object that will hold the value of either the user or item embeddings.
            optimizer : Construct the SGD optimizer using a learning rate of learn_rate
            similarity : Computes the cosine similarity between minibatch examples and all embeddings
            train_inputs : placeholder for input data used to train embeddings
            train_labels : placeholder for output the embeddings try to match during training
            valid_examples : valid_size indices chosen at random from the numbers between 0 and valid_window,
                             used to evaluate similarity
        """

        valid_examples = np.random.choice(valid_window, valid_size, replace=False)

        with self.graph.as_default():
            vocabulary_size = len(self.final_matrix)
            # Input data.
            train_inputs = tf.placeholder(tf.int32, shape=[batch_size])
            train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])
            valid_dataset = tf.constant(valid_examples, dtype=tf.int32)

            # Ops and variables pinned to the CPU because of missing GPU implementation
            with tf.device('/cpu:0'):
                # Look up embeddings for inputs.
                embeddings = tf.Variable(
                    tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
                embed = tf.nn.embedding_lookup(embeddings, train_inputs)

                # Construct the variables for the NCE loss
                nce_weights = tf.Variable(
                    tf.truncated_normal([vocabulary_size, embedding_size],
                                        stddev=1.0 / math.sqrt(embedding_size)))
                nce_biases = tf.Variable(tf.zeros([vocabulary_size]))

            # Compute the average NCE loss for the batch.
            # tf.nce_loss automatically draws a new sample of the negative labels each
            # time we evaluate the loss.
            # The positional argument order (inputs before labels) follows TensorFlow r0.8.
            loss = tf.reduce_mean(
                tf.nn.nce_loss(nce_weights, nce_biases, embed, train_labels,
                               num_sampled, vocabulary_size))

            # Construct the SGD optimizer using a learning rate of learn_rate.
            optimizer = tf.train.GradientDescentOptimizer(learn_rate).minimize(loss)

            # Compute the cosine similarity between minibatch examples and all embeddings.
            norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keep_dims=True))
            normalized_embeddings = embeddings / norm
            valid_embeddings = tf.nn.embedding_lookup(
                normalized_embeddings, valid_dataset)
            similarity = tf.matmul(
                valid_embeddings, normalized_embeddings, transpose_b=True)
        return loss,normalized_embeddings,optimizer,similarity,train_inputs,train_labels,valid_examples

    def fit(self,batch_size = 128,embedding_size = 128,learn_rate=1.0,num_sampled = 64,num_steps = 100001,
            num_skips = 2,skip_window = 1,valid_size = 16 ,valid_window = 100,
            verbose=False):
        """
        Generates embeddings

        INPUT:
            batch_size : the number of samples the embeddings will be trained on for a given iteration, int
            embedding_size : Dimension of the embedding vector, int
            learn_rate : the learning rate used during optimization, double
            num_sampled : Number of negative examples to sample, int
            num_steps : number of iterations to train the embeddings, int
            num_skips : How many times to reuse an input to generate a label, int
            skip_window : How many words to consider left and right, int
            valid_size : Random set of users or items to evaluate similarity on, int
            valid_window : Only pick dev samples in the head of the distribution, int
            verbose : True will print progress, else progress will not be printed, boolean

        OUTPUT:
            final_embeddings : either user x embedding_size or item x embedding_size 2 dimensional numpy array
                               containing the final embeddings for all users or items
        """

        loss,normalized_embeddings,optimizer,similarity,train_inputs,train_labels,valid_examples = self._build_skip_gram(batch_size,embedding_size,learn_rate,num_sampled,num_skips,
                                                                                                                        skip_window,valid_size,valid_window)

        with tf.Session(graph=self.graph) as session:
            # We must initialize all variables before we use them.
            tf.initialize_all_variables().run()

            average_loss = 0
            for step in xrange(num_steps):
                batch_inputs, batch_labels = self._generate_batch(
                    batch_size, num_skips, skip_window)
                feed_dict = {train_inputs : batch_inputs, train_labels : batch_labels}

                # We perform one update step by evaluating the optimizer op (including it
                # in the list of returned values for session.run()
                _, loss_val = session.run([optimizer, loss], feed_dict=feed_dict)
                if verbose:
                    average_loss += loss_val
                    if step % 2000 == 0:
                        if step > 0:
                            average_loss /= 2000
                        # The average loss is an estimate of the loss over the last 2000 batches.
                        print("Average loss at step ", step, ": ", average_loss)
                        average_loss = 0
                    if step % 10000 == 0:  # Note that this is expensive (~20% slowdown if computed every 500 steps)
                        sim = similarity.eval()
                        for i in xrange(valid_size):
                            valid_word = self.index_name_dict[valid_examples[i]]
                            top_k = 8  # number of nearest neighbors
                            nearest = (-sim[i, :]).argsort()[1:top_k+1]
                            log_str = "Nearest to %s:" % valid_word
                            for k in xrange(top_k):
                                close_word = self.index_name_dict[nearest[k]]
                                log_str = "%s %s," % (log_str, close_word)
                            print(log_str)
                        print("")
            final_embeddings = normalized_embeddings.eval()
        self.final_embeddings = final_embeddings
        return self.final_embeddings

    def plot_with_labels(self,plot_only = 100, title="Like2Vec meets TensorFlow", filename='tsne.png',
                         num_tsne_dims = 2, perplexity = 5.0,verbose=False):
        """
        randomly chooses some of the users or items and plots them using tsne

        INPUT:
            plot_only : the number of users or items you would like plotted, int
            title : the title of the plot generated, str
            filename : the name you would like the file saved under, str
            num_tsne_dims : number of dimensions, int
            perplexity : the perplexity used in generating tsne, recommended to be between 5.0-50.0, double
            verbose : whether or not to print the progress of tsne, boolean

        OUTPUT:
            Your plot will be saved under the name and in the location you passed in filename
        """
        selected_rows = np.sort(np.random.choice(range(self.final_embeddings.shape[0]),plot_only,replace=False))
        labels = [self.labels[i] for i in selected_rows]
        low_dim_embs = tsne(self.final_embeddings[selected_rows], num_tsne_dims,
                            self.final_embeddings.shape[1], perplexity,verbose)
        assert low_dim_embs.shape[0] >= len(labels), "More labels than embeddings"
        plt.figure(figsize=(18, 18))  # in inches
        for i, label in enumerate(labels):
            x, y = low_dim_embs[i,:]
            plt.scatter(x, y)
            plt.annotate(label,
                         xy=(x, y),
                         xytext=(5, 2),
                         textcoords='offset points',
                         ha='right',
                         va='bottom')
        plt.title(title)
        plt.savefig(filename)


--------------------------------------------------------------------------------
/Like2Vec_Example.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "# Like2Vec\n",
    "\n",
    "## Word2Vec for users or items, via TensorFlow\n",
    "\n",
    ""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is an implementation of Like2Vec using TensorFlow. The class lets you generate word2vec-like embeddings for users or items. The theory behind Like2Vec can be found here: http://www.perozzi.net/publications/14_kdd_deepwalk.pdf. I implemented Like2Vec by adapting TensorFlow's word2vec example, which can be found here: https://github.com/tensorflow/tensorflow/blob/r0.8/tensorflow/examples/tutorials/word2vec/word2vec_basic.py. I attempted to use sklearn's TSNE class, as the TensorFlow example does, but the gradient used to train TSNE kept generating inf/nan values. 
I decided to use a different TSNE function to plot my embeddings, which can be found here : https://lvdmaaten.github.io/tsne/.\n", 21 | "\n", 22 | "Below is an example of how to use the Like2Vec class using the 1 million example movielens dataset, found here : http://grouplens.org/datasets/movielens/.\n", 23 | "\n", 24 | "\n", 25 | "### Resources:\n", 26 | "Data : http://grouplens.org/datasets/movielens/\n", 27 | "\n", 28 | "TSNE code : https://lvdmaaten.github.io/tsne/\n", 29 | "\n", 30 | "Like2Vec Theory : http://www.perozzi.net/publications/14_kdd_deepwalk.pdf\n", 31 | "\n", 32 | "Word2Vec Example Code : https://github.com/tensorflow/tensorflow/blob/r0.8/tensorflow/examples/tutorials/word2vec/word2vec_basic.py" 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": 1, 38 | "metadata": { 39 | "collapsed": false 40 | }, 41 | "outputs": [], 42 | "source": [ 43 | "from __future__ import absolute_import\n", 44 | "from __future__ import print_function\n", 45 | "\n", 46 | "import collections\n", 47 | "import math\n", 48 | "import os\n", 49 | "import random\n", 50 | "import zipfile\n", 51 | "\n", 52 | "import numpy as np\n", 53 | "from six.moves import urllib\n", 54 | "from six.moves import xrange # pylint: disable=redefined-builtin\n", 55 | "import tensorflow as tf\n", 56 | "from tsne import tsne\n", 57 | "import matplotlib.pyplot as plt\n", 58 | "\n", 59 | "\n", 60 | "class Like2Vec(object):\n", 61 | " def __init__(self,user_item,name_index_dict,wpv,wl,add_coeff = .01,axis = 0):\n", 62 | " \"\"\"\n", 63 | " Initialize an instance of Like2Vec\n", 64 | " \n", 65 | " INPUT:\n", 66 | " user_item : user(rows) item(columns) matrix, 2 dimensional numpy array \n", 67 | " name_index_dict : label (key) index (value), dictonary(string:int)\n", 68 | " wpv : random walks per value, int\n", 69 | " wl : length of random walk, int\n", 70 | " add_coeff : laplace smoothing coefficient (amount you will add to each numerator of proportion matrix), double\n", 71 | " axis 
: when 0 you embeddings will be generated for users, else embeddings will be generated for items\n", 72 | " \"\"\"\n", 73 | " self.user_item = user_item\n", 74 | " self.axis_matrix = self.user_item.dot(self.user_item.T) if axis==0 else self.user_item.T.dot(self.user_item)\n", 75 | " self.final_matrix = self.laplace_smoothing(add_coeff)\n", 76 | " self.name_index_dict = name_index_dict\n", 77 | " self.index_name_dict = dict(zip(self.name_index_dict.values(),self.name_index_dict.keys()))\n", 78 | " self.labels = [i.decode('ascii',errors=\"ignore\") for i in self.name_index_dict.keys()]\n", 79 | " self.data = self.rando_walks(wpv,wl)\n", 80 | " self.data_index = 0\n", 81 | " self.graph = tf.Graph()\n", 82 | " \n", 83 | " def laplace_smoothing(self,add_coeff):\n", 84 | " \"\"\"\n", 85 | " Laplace Smoothing is performed on the proportion matrix used to generate random walks\n", 86 | " \n", 87 | " INPUT:\n", 88 | " add_coeff : the coefficient that will be added to the numerator so for each given user or item\n", 89 | " all the other users or items will have nonzero proportions, double\n", 90 | " OUTPUT:\n", 91 | " 2 dimensional array of size either user x user or item x item containing the laplace smoothed proportions\n", 92 | " \"\"\"\n", 93 | " return (self.axis_matrix+add_coeff)/(np.sum(self.axis_matrix,axis=1)+add_coeff*len(self.axis_matrix))\n", 94 | " \n", 95 | " def rando_walks(self,wpv,wl):\n", 96 | " \"\"\"\n", 97 | " For each either user or item, wpv random walks will be generated for each user and item and those random walks\n", 98 | " will all be of length wl. 
These random walks are used to generate the embeddings.\n", 99 | " \n", 100 | " INPUT:\n", 101 | " wpv : random walks per value, int\n", 102 | " wl : length of random walk, int\n", 103 | " \n", 104 | " OUTPUT:\n", 105 | " list of lists of lists, the outmost list contains a list for every user or item and that list contains\n", 106 | " wpv lists where each of those lists contains wl integers of the either users or items visited in the\n", 107 | " random walk\n", 108 | " \"\"\"\n", 109 | " rng = xrange(len(self.final_matrix))\n", 110 | " wpvrng = xrange(wpv)\n", 111 | " wlrng = xrange(wl)\n", 112 | " def rw(idx):\n", 113 | " def repeatRw():\n", 114 | " rwLst = [idx]\n", 115 | " def w(nuIdx):\n", 116 | " rwLst.append(np.random.choice(rng,p=self.final_matrix[nuIdx]/np.sum(self.final_matrix[nuIdx])))\n", 117 | " map(w,wlrng)\n", 118 | " return rwLst\n", 119 | " return [repeatRw() for _ in wpvrng]\n", 120 | " return list(map(rw,rng))\n", 121 | "\n", 122 | " def _generate_batch(self,batch_size, num_skips, skip_window):\n", 123 | " \"\"\"\n", 124 | " Creates the minibatch that embeddings will be trained on for a given iteration\n", 125 | " \n", 126 | " INPUT:\n", 127 | " batch_size : the number of samples the embeddings will be trained on for a given iteration\n", 128 | " num_skips : How many times to reuse an input to generate a label.\n", 129 | " skip_window : How many words to consider left and right\n", 130 | " \n", 131 | " OUTPUT:\n", 132 | " batch : 1 dimensional array of length batch_size that will be the input to train the embeddings\n", 133 | " labels : 1 dimensional array of length batch_size that will be the output the embeddings try to match\n", 134 | " during training\n", 135 | " \"\"\"\n", 136 | " assert batch_size % num_skips == 0\n", 137 | " assert num_skips <= 2 * skip_window\n", 138 | " batch = np.ndarray(shape=(batch_size), dtype=np.int32)\n", 139 | " labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32)\n", 140 | " span = 2 * skip_window + 1 # [ 
skip_window target skip_window ]\n", 141 | " buffer = collections.deque(maxlen=span)\n", 142 | " for _ in range(span):\n", 143 | " buffer.append(random.choice(self.data[self.data_index]))#CHANGE!\n", 144 | " self.data_index = (self.data_index + 1) % len(self.final_matrix)\n", 145 | "\n", 146 | " for i in range(batch_size // num_skips):\n", 147 | " target = skip_window # target label at the center of the buffer\n", 148 | " targets_to_avoid = [ skip_window ]\n", 149 | " for j in range(num_skips):\n", 150 | " while target in targets_to_avoid:\n", 151 | " target = random.randint(0, span - 1)\n", 152 | " targets_to_avoid.append(target)\n", 153 | "\n", 154 | " batch[i * num_skips + j] = buffer[target][skip_window]\n", 155 | " labels[i * num_skips + j, 0] = buffer[target][0]\n", 156 | " buffer.append(random.choice(self.data[self.data_index]))\n", 157 | " self.data_index = (self.data_index + 1) % len(self.final_matrix)\n", 158 | " return batch, labels\n", 159 | "\n", 160 | " def _build_skip_gram(self,batch_size = 128,embedding_size = 128,learn_rate=1.0,num_sampled = 64,num_skips = 2,\n", 161 | " skip_window = 1,valid_size = 16 ,valid_window = 100 ):\n", 162 | " \"\"\"\n", 163 | " Creates the inputs needed to begin training the embeddings\n", 164 | " \n", 165 | " INPUT:\n", 166 | " batch_size : the number of samples the embeddings will be trained on for a given iteration, int\n", 167 | " embedding_size : Dimension of the embedding vector, int\n", 168 | " learn_rate : the learning rate used during optimization, double\n", 169 | " num_sampled : Number of negative examples to sample, int\n", 170 | " num_skips : How many times to reuse an input to generate a label, int\n", 171 | " skip_window : How many words to consider left and right, int\n", 172 | " valid_size : Random set of users or items to evaluate similarity on, int\n", 173 | " valid_window : Only pick dev samples in the head of the distribution, int\n", 174 | " \n", 175 | " OUTPUT:\n", 176 | " loss : Compute the 
average NCE loss for the batch. \n", 177 | " tf.nce_loss automatically draws a new sample of the negative labels each time we evaluate the loss.\n", 178 | " normalized_embeddings : tensorflow object that will hold the value of either the user or item embeddings.\n", 179 | " optimizer : Construct the SGD optimizer using a learning rate of learn_rate\n", 180 | " similarity : Computes the cosine similarity between minibatch examples and all embeddings\n", 181 | " train_inputs : placeholder for input data used to train embeddings\n", 182 | " train_labels : placeholder for output the embeddings try to match during training\n", 183 | " valid_examples : randomly chooses valid_size out of the numbers between 0 and valid_window to \n", 184 | " \"\"\"\n", 185 | " \n", 186 | " valid_examples = np.random.choice(valid_window, valid_size, replace=False)\n", 187 | "\n", 188 | " with self.graph.as_default():\n", 189 | " vocabulary_size = len(self.final_matrix)\n", 190 | " # Input data.\n", 191 | " train_inputs = tf.placeholder(tf.int32, shape=[batch_size])\n", 192 | " train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])\n", 193 | " valid_dataset = tf.constant(valid_examples, dtype=tf.int32)\n", 194 | "\n", 195 | " # Ops and variables pinned to the CPU because of missing GPU implementation\n", 196 | " with tf.device('/cpu:0'):\n", 197 | " # Look up embeddings for inputs.\n", 198 | " embeddings = tf.Variable(\n", 199 | " tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))\n", 200 | " embed = tf.nn.embedding_lookup(embeddings, train_inputs)\n", 201 | "\n", 202 | " # Construct the variables for the NCE loss\n", 203 | " nce_weights = tf.Variable(\n", 204 | " tf.truncated_normal([vocabulary_size, embedding_size],\n", 205 | " stddev=1.0 / math.sqrt(embedding_size)))\n", 206 | " nce_biases = tf.Variable(tf.zeros([vocabulary_size]))\n", 207 | "\n", 208 | " # Compute the average NCE loss for the batch.\n", 209 | " # tf.nce_loss automatically draws a new sample of the 
negative labels each\n", 210 | " # time we evaluate the loss.\n", 211 | " loss = tf.reduce_mean(\n", 212 | " tf.nn.nce_loss(nce_weights, nce_biases, embed, train_labels,\n", 213 | " num_sampled, vocabulary_size))\n", 214 | "\n", 215 | " # Construct the SGD optimizer using a learning rate of 1.0.\n", 216 | " optimizer = tf.train.GradientDescentOptimizer(learn_rate).minimize(loss)\n", 217 | "\n", 218 | " # Compute the cosine similarity between minibatch examples and all embeddings.\n", 219 | " norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keep_dims=True))\n", 220 | " normalized_embeddings = embeddings / norm\n", 221 | " valid_embeddings = tf.nn.embedding_lookup(\n", 222 | " normalized_embeddings, valid_dataset)\n", 223 | " similarity = tf.matmul(\n", 224 | " valid_embeddings, normalized_embeddings, transpose_b=True)\n", 225 | " return loss,normalized_embeddings,optimizer,similarity,train_inputs,train_labels,valid_examples\n", 226 | " \n", 227 | " def fit(self,batch_size = 128,embedding_size = 128,learn_rate=1.0,num_sampled = 64,num_steps = 100001,\n", 228 | " num_skips = 2,skip_window = 1,valid_size = 16 ,valid_window = 100,\n", 229 | " verbose=False):\n", 230 | " \"\"\"\n", 231 | " Generates embeddings\n", 232 | " \n", 233 | " INPUT:\n", 234 | " batch_size : the number of samples the embeddings will be trained on for a given iteration, int\n", 235 | " embedding_size : Dimension of the embedding vector, int\n", 236 | " learn_rate : the learning rate used during optimization, double\n", 237 | " num_sampled : Number of negative examples to sample, int\n", 238 | " num_steps : number of iterations to train the embeddings, int\n", 239 | " num_skips : How many times to reuse an input to generate a label, int\n", 240 | " skip_window : How many words to consider left and right, int\n", 241 | " valid_size : Random set of users or items to evaluate similarity on, int\n", 242 | " valid_window : Only pick dev samples in the head of the distribution, int\n", 243 | " 
verbose : True will print progress, else progress will not be printed, boolean\n", 244 | " \n", 245 | " OUTPUT:\n", 246 | " final_embeddings : either user x embedding_size or item x embedding_size 2 dimensional numpy array\n", 247 | " containing the final embeddings for all users or items\n", 248 | " \"\"\"\n", 249 | " \n", 250 | " loss,normalized_embeddings,optimizer,similarity,train_inputs,train_labels,valid_examples = self._build_skip_gram(batch_size,embedding_size,learn_rate,num_sampled,num_skips,\n", 251 | " skip_window,valid_size,valid_window)\n", 252 | " \n", 253 | " with tf.Session(graph=self.graph) as session:\n", 254 | " # We must initialize all variables before we use them.\n", 255 | " tf.initialize_all_variables().run()\n", 256 | "\n", 257 | " average_loss = 0\n", 258 | " for step in xrange(num_steps):\n", 259 | " batch_inputs, batch_labels = self._generate_batch(\n", 260 | " batch_size, num_skips, skip_window)\n", 261 | " feed_dict = {train_inputs : batch_inputs, train_labels : batch_labels}\n", 262 | "\n", 263 | " # We perform one update step by evaluating the optimizer op (including it\n", 264 | " # in the list of returned values for session.run())\n", 265 | " _, loss_val = session.run([optimizer, loss], feed_dict=feed_dict)\n", 266 | " if verbose:\n", 267 | " average_loss += loss_val\n", 268 | " if step % 2000 == 0:\n", 269 | " if step > 0:\n", 270 | " average_loss /= 2000\n", 271 | " # The average loss is an estimate of the loss over the last 2000 batches.\n", 272 | " print(\"Average loss at step \", step, \": \", average_loss)\n", 273 | " average_loss = 0\n", 274 | " if step % 10000 == 0: # Note that this is expensive (~20% slowdown if computed every 500 steps)\n", 275 | " sim = similarity.eval()\n", 276 | " for i in xrange(valid_size):\n", 277 | " valid_word = self.index_name_dict[valid_examples[i]]\n", 278 | " top_k = 8 # number of nearest neighbors\n", 279 | " nearest = (-sim[i, :]).argsort()[1:top_k+1]\n", 280 | " log_str = \"Nearest to %s:\" % 
valid_word\n", 281 | " for k in xrange(top_k):\n", 282 | " close_word = self.index_name_dict[nearest[k]]\n", 283 | " log_str = \"%s %s,\" % (log_str, close_word)\n", 284 | " print(log_str)\n", 285 | " print(\"\")\n", 286 | " final_embeddings = normalized_embeddings.eval()\n", 287 | " self.final_embeddings = final_embeddings\n", 288 | " return self.final_embeddings\n", 289 | " \n", 290 | " def plot_with_labels(self,plot_only = 100, title=\"Like2Vec meets TensorFlow\", filename='tsne.png',\n", 291 | " num_tsne_dims = 2, perplexity = 5.0,verbose=False):\n", 292 | " \"\"\"\n", 293 | " randomly chooses some of the users or items and plots them using tsne\n", 294 | " \n", 295 | " INPUT:\n", 296 | " plot_only : the number of users or items you would like plotted,int\n", 297 | " title : the title of the plot generated, str\n", 298 | " filename : the name you would like the file saved under, str\n", 299 | " num_tsne_dims : number of dimensions, int\n", 300 | " perplexity : the perplexity used in generating tsne, recommended to be between 5.0-50.0, double\n", 301 | " verbose : whether or not to print the progress of tsne\n", 302 | " \n", 303 | " OUTPUT:\n", 304 | " Your plot will be saved under the name and in the location you passed in filename\n", 305 | " \"\"\"\n", 306 | " selected_rows = np.sort(np.random.choice(range(self.final_embeddings.shape[0]),plot_only,replace=False))\n", 307 | " labels = [self.labels[i] for i in selected_rows]\n", 308 | " low_dim_embs = tsne(self.final_embeddings[selected_rows], num_tsne_dims,\n", 309 | " self.final_embeddings.shape[1], perplexity,verbose)\n", 310 | " assert low_dim_embs.shape[0] >= len(labels), \"More labels than embeddings\"\n", 311 | " plt.figure(figsize=(18, 18)) #in inches\n", 312 | " for i, label in enumerate(labels):\n", 313 | " x, y = low_dim_embs[i,:]\n", 314 | " plt.scatter(x, y)\n", 315 | " plt.annotate(label,\n", 316 | " xy=(x, y),\n", 317 | " xytext=(5, 2),\n", 318 | " textcoords='offset points',\n", 319 | " 
ha='right',\n", 320 | " va='bottom')\n", 321 | " plt.title(title)\n", 322 | " plt.savefig(filename)" 323 | ] 324 | }, 325 | { 326 | "cell_type": "code", 327 | "execution_count": 2, 328 | "metadata": { 329 | "collapsed": false 330 | }, 331 | "outputs": [ 332 | { 333 | "name": "stderr", 334 | "output_type": "stream", 335 | "text": [ 336 | "/Users/jacobbaumbach/anaconda/lib/python2.7/site-packages/ipykernel/__main__.py:2: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators; you can avoid this warning by specifying engine='python'.\n", 337 | " from ipykernel import kernelapp as app\n" 338 | ] 339 | } 340 | ], 341 | "source": [ 342 | "import pandas as pd\n", 343 | "df1m = pd.read_table(\"Data/ml-1m/ratings.dat\",sep=\"::\",header=None)\n", 344 | "\n", 345 | "mv1m=[]\n", 346 | "with open(\"Data/ml-1m/movies.dat\") as f:\n", 347 | " for i in f.readlines():\n", 348 | " mv1m.append(i.split(\"::\")[:2])\n", 349 | " \n", 350 | "mv1mDict = {int(i[0]):i[1] for i in mv1m}\n", 351 | "\n", 352 | "newMovDict = { i:(mv1mDict[j],j) for i,j in enumerate(sorted(list(set(df1m[1]))))}\n", 353 | "revKey = {j[1]:i for i,j in zip(newMovDict.keys(),newMovDict.values())}\n", 354 | "ui = np.zeros((len(set(df1m[0])),len(set(df1m[1]))))\n", 355 | "def fillMat(i,j,k):\n", 356 | " ui[i-1,revKey[j]]=float(df1m[2][k])\n", 357 | "_=map(fillMat,df1m[0],df1m[1],df1m.index)" 358 | ] 359 | }, 360 | { 361 | "cell_type": "code", 362 | "execution_count": 3, 363 | "metadata": { 364 | "collapsed": false 365 | }, 366 | "outputs": [], 367 | "source": [ 368 | "movie_index_dict = {j[0]:i for i,j in newMovDict.items()}" 369 | ] 370 | }, 371 | { 372 | "cell_type": "code", 373 | "execution_count": 4, 374 | "metadata": { 375 | "collapsed": false 376 | }, 377 | "outputs": [], 378 | "source": [ 379 | "l2v = Like2Vec(ui,movie_index_dict,5,20,add_coeff = 1.0,axis = 1)" 380 | ] 381 | }, 382 | { 383 | "cell_type": "code", 384 | "execution_count": 5, 385 | "metadata": 
{ 386 | "collapsed": false, 387 | "scrolled": true 388 | }, 389 | "outputs": [ 390 | { 391 | "name": "stdout", 392 | "output_type": "stream", 393 | "text": [ 394 | "Average loss at step 0 : 189.340576172\n", 395 | "Nearest to Mis�rables, Les (1995): Oliver! (1968), Single Girl, A (La Fille Seule) (1995), Jude (1996), Soapdish (1991), When the Cats Away (Chacun cherche son chat) (1996), Puppet Master III: Toulon's Revenge (1991), Anna (1996), Three Wishes (1995),\n", 396 | "\n", 397 | "Nearest to Money Train (1995): Star Wars: Episode V - The Empire Strikes Back (1980), Edge of Seventeen (1998), Carrie (1976), Freedom for Us (� nous la libert� ) (1931), Man on the Moon (1999), House Party 2 (1991), Carnosaur (1993), Dead Poets Society (1989),\n", 398 | "\n", 399 | "Nearest to Kicking and Screaming (1995): Forrest Gump (1994), M (1931), Urbania (2000), Mansfield Park (1999), Careful (1992), Emperor and the Assassin, The (Jing ke ci qin wang) (1999), Mummy's Hand, The (1940), Tequila Sunrise (1988),\n", 400 | "\n", 401 | "Nearest to Postino, Il (The Postman) (1994): Cat People (1982), 8 Heads in a Duffel Bag (1997), Secret Agent, The (1996), Ruby in Paradise (1993), Butterfly (La Lengua de las Mariposas) (2000), Agnes of God (1985), Devil's Advocate, The (1997), Maltese Falcon, The (1941),\n", 402 | "\n", 403 | "Nearest to Four Rooms (1995): Indian in the Cupboard, The (1995), I'll Be Home For Christmas (1998), Mr. Saturday Night (1992), Desert Bloom (1986), Great White Hype, The (1996), Alice and Martin (Alice et Martin) (1998), Space Cowboys (2000), Destiny Turns on the Radio (1995),\n", 404 | "\n", 405 | "Nearest to Balto (1995): Pyromaniac's Love Story, A (1995), Bridge on the River Kwai, The (1957), Third Miracle, The (1999), Soldier (1998), Lifeboat (1944), Last of the High Kings, The (a.k.a. Summer Fling) (1996), Shadow, The (1994), Barcelona (1994),\n", 406 | "\n", 407 | "Nearest to Once Upon a Time... 
When We Were Colored (1995): Three Wishes (1995), Name of the Rose, The (1986), Ravenous (1999), Say Anything... (1989), Evita (1996), Great Day in Harlem, A (1994), Dances with Wolves (1990), My Crazy Life (Mi vida loca) (1993),\n", 408 | "\n", 409 | "Nearest to Seven (Se7en) (1995): King Kong Lives (1986), Lady of Burlesque (1943), Five Easy Pieces (1970), Get Carter (1971), Gulliver's Travels (1939), Manchurian Candidate, The (1962), Sound of Music, The (1965), Species (1995),\n", 410 | "\n", 411 | "Nearest to Confessional, The (Le Confessionnal) (1995): Shadow Conspiracy (1997), Children of the Corn III (1994), Robin Hood: Men in Tights (1993), Highlander: Endgame (2000), Muppets Take Manhattan, The (1984), Metroland (1997), Big Hit, The (1998), Blue in the Face (1995),\n", 412 | "\n", 413 | "Nearest to Kids of the Round Table (1995): Jar, The (Khomreh) (1992), Live and Let Die (1973), Holy Smoke (1999), Messenger: The Story of Joan of Arc, The (1999), Carnosaur 3: Primal Species (1996), Haunted Honeymoon (1986), One Little Indian (1973), 8 Seconds (1994),\n", 414 | "\n", 415 | "Nearest to Usual Suspects, The (1995): Perils of Pauline, The (1947), Crimes and Misdemeanors (1989), King in New York, A (1957), Patton (1970), Jungle Book, The (1967), Othello (1995), Flying Tigers (1942), Beans of Egypt, Maine, The (1994),\n", 416 | "\n", 417 | "Nearest to Now and Then (1995): 8 1/2 Women (1999), Battle of the Sexes, The (1959), They Might Be Giants (1971), Hamlet (2000), Shaggy D.A., The (1976), Leaving Las Vegas (1995), Aladdin (1992), Couch in New York, A (1996),\n", 418 | "\n", 419 | "Nearest to Restoration (1995): Operation Condor (Feiying gaiwak) (1990), Carrie (1976), Santa Fe Trail (1940), Ulysses (Ulisse) (1954), Steal This Movie! 
(2000), Man of the Year (1995), Golden Child, The (1986), Wedding Gift, The (1994),\n", 420 | "\n", 421 | "Nearest to Angels and Insects (1995): Jeremiah Johnson (1972), Foreign Correspondent (1940), Slaves to the Underground (1997), Barbarella (1968), Jackie Chan's First Strike (1996), Jason's Lyric (1994), Giant (1956), Wilde (1997),\n", 422 | "\n", 423 | "Nearest to American President, The (1995): Drop Dead Fred (1991), Monster, The (Il Mostro) (1994), Hollow Man (2000), Star Wars: Episode VI - Return of the Jedi (1983), I'll Be Home For Christmas (1998), Black Sabbath (Tre Volti Della Paura, I) (1963), Local Hero (1983), Poltergeist III (1988),\n", 424 | "\n", 425 | "Nearest to Two Bits (1995): Maybe, Maybe Not (Bewegte Mann, Der) (1994), Poison Ivy (1992), Golden Voyage of Sinbad, The (1974), Crimson Tide (1995), Adventures of Rocky and Bullwinkle, The (2000), Tarantula (1955), What Ever Happened to Baby Jane? (1962), Gone in 60 Seconds (2000),\n", 426 | "\n", 427 | "Average loss at step 2000 : 22.7874141488\n", 428 | "Average loss at step 4000 : 1.77183082157\n", 429 | "Average loss at step 6000 : 0.97549440974\n", 430 | "Average loss at step 8000 : 0.693439252928\n", 431 | "Average loss at step 10000 : 0.562354820162\n", 432 | "Nearest to Mis�rables, Les (1995): Single Girl, A (La Fille Seule) (1995), Oliver! 
(1968), Jude (1996), Dear Diary (Caro Diario) (1994), Soapdish (1991), When the Cats Away (Chacun cherche son chat) (1996), Bicentennial Man (1999), Puppet Master III: Toulon's Revenge (1991),\n", 433 | "\n", 434 | "Nearest to Money Train (1995): Edge of Seventeen (1998), Star Wars: Episode V - The Empire Strikes Back (1980), Carrie (1976), Man on the Moon (1999), Audrey Rose (1977), Freedom for Us (� nous la libert� ) (1931), Dead Poets Society (1989), Get Shorty (1995),\n", 435 | "\n", 436 | "Nearest to Kicking and Screaming (1995): Forrest Gump (1994), M (1931), Urbania (2000), Mansfield Park (1999), Careful (1992), Emperor and the Assassin, The (Jing ke ci qin wang) (1999), Rocky Horror Picture Show, The (1975), Mummy's Hand, The (1940),\n", 437 | "\n", 438 | "Nearest to Postino, Il (The Postman) (1994): Secret Agent, The (1996), Cat People (1982), 8 Heads in a Duffel Bag (1997), Kiss of Death (1995), Devil's Advocate, The (1997), Things Change (1988), Butterfly (La Lengua de las Mariposas) (2000), Shadowlands (1993),\n", 439 | "\n", 440 | "Nearest to Four Rooms (1995): Indian in the Cupboard, The (1995), Mr. Saturday Night (1992), I'll Be Home For Christmas (1998), Great White Hype, The (1996), Juror, The (1996), Alice and Martin (Alice et Martin) (1998), Space Cowboys (2000), Bambi (1942),\n", 441 | "\n", 442 | "Nearest to Balto (1995): Pyromaniac's Love Story, A (1995), Lifeboat (1944), Third Miracle, The (1999), Shadow, The (1994), Bridge on the River Kwai, The (1957), Rain Man (1988), Seven (Se7en) (1995), Barcelona (1994),\n", 443 | "\n", 444 | "Nearest to Once Upon a Time... 
When We Were Colored (1995): Towering Inferno, The (1974), Great Day in Harlem, A (1994), Three Wishes (1995), Evita (1996), Dances with Wolves (1990), Parenthood (1989), Dream for an Insomniac (1996), Daddy Long Legs (1919),\n", 445 | "\n", 446 | "Nearest to Seven (Se7en) (1995): King Kong Lives (1986), Better Than Chocolate (1999), Alien Nation (1988), Balto (1995), Five Easy Pieces (1970), Manchurian Candidate, The (1962), Patch Adams (1998), Crime and Punishment in Suburbia (2000),\n", 447 | "\n", 448 | "Nearest to Confessional, The (Le Confessionnal) (1995): Shadow Conspiracy (1997), Highlander: Endgame (2000), Mumford (1999), Metroland (1997), Muppets Take Manhattan, The (1984), Children of the Corn III (1994), Robin Hood: Men in Tights (1993), Moll Flanders (1996),\n", 449 | "\n", 450 | "Nearest to Kids of the Round Table (1995): Live and Let Die (1973), Carnosaur 3: Primal Species (1996), Holy Smoke (1999), Messenger: The Story of Joan of Arc, The (1999), Haunted Honeymoon (1986), Days of Thunder (1990), Dreamlife of Angels, The (La Vie r�v�e des anges) (1998), Get Shorty (1995),\n", 451 | "\n", 452 | "Nearest to Usual Suspects, The (1995): Perils of Pauline, The (1947), Crimes and Misdemeanors (1989), King in New York, A (1957), Patton (1970), Firestarter (1984), Beans of Egypt, Maine, The (1994), Love Is the Devil (1998), Jungle Book, The (1967),\n", 453 | "\n", 454 | "Nearest to Now and Then (1995): Hamlet (2000), 8 1/2 Women (1999), Shaggy D.A., The (1976), Leaving Las Vegas (1995), Ballad of Narayama, The (Narayama Bushiko) (1982), Couch in New York, A (1996), They Might Be Giants (1971), Aladdin (1992),\n", 455 | "\n", 456 | "Nearest to Restoration (1995): Ulysses (Ulisse) (1954), Operation Condor (Feiying gaiwak) (1990), Carrie (1976), Santa Fe Trail (1940), Sanjuro (1962), Steal This Movie! 
(2000), Blue Sky (1994), Golden Child, The (1986),\n", 457 | "\n", 458 | "Nearest to Angels and Insects (1995): Jeremiah Johnson (1972), Foreign Correspondent (1940), Jason's Lyric (1994), Giant (1956), Slaves to the Underground (1997), Jackie Chan's First Strike (1996), Wilde (1997), Heaven's Burning (1997),\n", 459 | "\n", 460 | "Nearest to American President, The (1995): Drop Dead Fred (1991), Monster, The (Il Mostro) (1994), Star Wars: Episode VI - Return of the Jedi (1983), I'll Be Home For Christmas (1998), Mister Roberts (1955), Class Reunion (1982), House on Haunted Hill (1958), Back to the Future Part II (1989),\n", 461 | "\n", 462 | "Nearest to Two Bits (1995): Maybe, Maybe Not (Bewegte Mann, Der) (1994), Poison Ivy (1992), Crimson Tide (1995), Tarantula (1955), What Ever Happened to Baby Jane? (1962), Golden Voyage of Sinbad, The (1974), Afterglow (1997), Adventures of Rocky and Bullwinkle, The (2000),\n", 463 | "\n", 464 | "Average loss at step 12000 : 0.485157744862\n", 465 | "Average loss at step 14000 : 0.444180089496\n", 466 | "Average loss at step 16000 : 0.411198226966\n", 467 | "Average loss at step 18000 : 0.389385614231\n", 468 | "Average loss at step 20000 : 0.375010608561\n", 469 | "Nearest to Mis�rables, Les (1995): Single Girl, A (La Fille Seule) (1995), Jude (1996), Oliver! 
(1968), When the Cats Away (Chacun cherche son chat) (1996), Dear Diary (Caro Diario) (1994), Soapdish (1991), And God Created Woman (1988), Bicentennial Man (1999),\n", 470 | "\n", 471 | "Nearest to Money Train (1995): Edge of Seventeen (1998), Star Wars: Episode V - The Empire Strikes Back (1980), Carrie (1976), Man on the Moon (1999), Audrey Rose (1977), Perfect Candidate, A (1996), Carnosaur (1993), Freedom for Us (� nous la libert� ) (1931),\n", 472 | "\n", 473 | "Nearest to Kicking and Screaming (1995): Forrest Gump (1994), M (1931), Urbania (2000), Careful (1992), Mansfield Park (1999), Emperor and the Assassin, The (Jing ke ci qin wang) (1999), Rocky Horror Picture Show, The (1975), Mummy's Hand, The (1940),\n", 474 | "\n", 475 | "Nearest to Postino, Il (The Postman) (1994): Secret Agent, The (1996), Cat People (1982), 8 Heads in a Duffel Bag (1997), Kiss of Death (1995), Devil's Advocate, The (1997), Things Change (1988), Butterfly (La Lengua de las Mariposas) (2000), Shadowlands (1993),\n", 476 | "\n", 477 | "Nearest to Four Rooms (1995): Indian in the Cupboard, The (1995), Mr. Saturday Night (1992), I'll Be Home For Christmas (1998), Great White Hype, The (1996), Juror, The (1996), Alice and Martin (Alice et Martin) (1998), Space Cowboys (2000), Bambi (1942),\n", 478 | "\n", 479 | "Nearest to Balto (1995): Pyromaniac's Love Story, A (1995), Lifeboat (1944), Third Miracle, The (1999), Rain Man (1988), Bridge on the River Kwai, The (1957), Shadow, The (1994), Seven (Se7en) (1995), Barcelona (1994),\n", 480 | "\n", 481 | "Nearest to Once Upon a Time... 
When We Were Colored (1995): Towering Inferno, The (1974), Daddy Long Legs (1919), Great Day in Harlem, A (1994), Dances with Wolves (1990), Three Wishes (1995), Dream for an Insomniac (1996), Parenthood (1989), Evita (1996),\n", 482 | "\n", 483 | "Nearest to Seven (Se7en) (1995): King Kong Lives (1986), Better Than Chocolate (1999), Balto (1995), Alien Nation (1988), Patch Adams (1998), Five Easy Pieces (1970), Manchurian Candidate, The (1962), Crime and Punishment in Suburbia (2000),\n", 484 | "\n", 485 | "Nearest to Confessional, The (Le Confessionnal) (1995): Highlander: Endgame (2000), Shadow Conspiracy (1997), Mumford (1999), Metroland (1997), Chain Reaction (1996), Children of the Corn III (1994), Muppets Take Manhattan, The (1984), Moll Flanders (1996),\n", 486 | "\n", 487 | "Nearest to Kids of the Round Table (1995): Live and Let Die (1973), Messenger: The Story of Joan of Arc, The (1999), Holy Smoke (1999), Carnosaur 3: Primal Species (1996), Haunted Honeymoon (1986), Hard Target (1993), Dreamlife of Angels, The (La Vie r�v�e des anges) (1998), Brother from Another Planet, The (1984),\n", 488 | "\n", 489 | "Nearest to Usual Suspects, The (1995): Perils of Pauline, The (1947), Crimes and Misdemeanors (1989), King in New York, A (1957), Patton (1970), Beans of Egypt, Maine, The (1994), Love Is the Devil (1998), Firestarter (1984), Othello (1995),\n", 490 | "\n", 491 | "Nearest to Now and Then (1995): Hamlet (2000), 8 1/2 Women (1999), Shaggy D.A., The (1976), Leaving Las Vegas (1995), Ballad of Narayama, The (Narayama Bushiko) (1982), Couch in New York, A (1996), I Can't Sleep (J'ai pas sommeil) (1994), Postman, The (1997),\n", 492 | "\n", 493 | "Nearest to Restoration (1995): Ulysses (Ulisse) (1954), Operation Condor (Feiying gaiwak) (1990), Carrie (1976), Santa Fe Trail (1940), Sanjuro (1962), Golden Child, The (1986), Blue Sky (1994), Steal This Movie! 
(2000),\n", 494 | "\n", 495 | "Nearest to Angels and Insects (1995): Jeremiah Johnson (1972), Foreign Correspondent (1940), Jason's Lyric (1994), Giant (1956), Jackie Chan's First Strike (1996), Slaves to the Underground (1997), Heaven's Burning (1997), Wilde (1997),\n", 496 | "\n", 497 | "Nearest to American President, The (1995): Drop Dead Fred (1991), I'll Be Home For Christmas (1998), Monster, The (Il Mostro) (1994), Star Wars: Episode VI - Return of the Jedi (1983), Mister Roberts (1955), And God Created Woman (1988), Class Reunion (1982), Conversation, The (1974),\n", 498 | "\n", 499 | "Nearest to Two Bits (1995): Maybe, Maybe Not (Bewegte Mann, Der) (1994), Poison Ivy (1992), Crimson Tide (1995), Tarantula (1955), Afterglow (1997), Faraway, So Close (In Weiter Ferne, So Nah!) (1993), What Ever Happened to Baby Jane? (1962), Crocodile Dundee II (1988),\n", 500 | "\n", 501 | "Average loss at step 22000 : 0.360277223416\n", 502 | "Average loss at step 24000 : 0.352322299626\n", 503 | "Average loss at step 26000 : 0.344590137996\n", 504 | "Average loss at step 28000 : 0.3385856851\n", 505 | "Average loss at step 30000 : 0.331178202249\n", 506 | "Nearest to Mis�rables, Les (1995): Single Girl, A (La Fille Seule) (1995), Dear Diary (Caro Diario) (1994), Jude (1996), Oliver! 
(1968), When the Cats Away (Chacun cherche son chat) (1996), Under Siege (1992), And God Created Woman (1988), Soapdish (1991),\n", 507 | "\n", 508 | "Nearest to Money Train (1995): Edge of Seventeen (1998), Star Wars: Episode V - The Empire Strikes Back (1980), Carrie (1976), Man on the Moon (1999), Audrey Rose (1977), Carnosaur (1993), Perfect Candidate, A (1996), Get Shorty (1995),\n", 509 | "\n", 510 | "Nearest to Kicking and Screaming (1995): Forrest Gump (1994), M (1931), Urbania (2000), Careful (1992), Mansfield Park (1999), Emperor and the Assassin, The (Jing ke ci qin wang) (1999), Rocky Horror Picture Show, The (1975), Mummy's Hand, The (1940),\n", 511 | "\n", 512 | "Nearest to Postino, Il (The Postman) (1994): Secret Agent, The (1996), 8 Heads in a Duffel Bag (1997), Cat People (1982), Devil's Advocate, The (1997), Kiss of Death (1995), Things Change (1988), Shadowlands (1993), Butterfly (La Lengua de las Mariposas) (2000),\n", 513 | "\n", 514 | "Nearest to Four Rooms (1995): Indian in the Cupboard, The (1995), Mr. Saturday Night (1992), Alice and Martin (Alice et Martin) (1998), I'll Be Home For Christmas (1998), Great White Hype, The (1996), Juror, The (1996), Space Cowboys (2000), Bambi (1942),\n", 515 | "\n", 516 | "Nearest to Balto (1995): Pyromaniac's Love Story, A (1995), Lifeboat (1944), Third Miracle, The (1999), Rain Man (1988), Shadow, The (1994), Bridge on the River Kwai, The (1957), Seven (Se7en) (1995), Barcelona (1994),\n", 517 | "\n", 518 | "Nearest to Once Upon a Time... 
When We Were Colored (1995): Daddy Long Legs (1919), Dances with Wolves (1990), Towering Inferno, The (1974), Great Day in Harlem, A (1994), Dream for an Insomniac (1996), Parenthood (1989), Three Wishes (1995), Phantasm II (1988),\n", 519 | "\n", 520 | "Nearest to Seven (Se7en) (1995): King Kong Lives (1986), Better Than Chocolate (1999), Balto (1995), Alien Nation (1988), Five Easy Pieces (1970), Patch Adams (1998), Crime and Punishment in Suburbia (2000), Manchurian Candidate, The (1962),\n", 521 | "\n", 522 | "Nearest to Confessional, The (Le Confessionnal) (1995): Highlander: Endgame (2000), Mumford (1999), Shadow Conspiracy (1997), Metroland (1997), Chain Reaction (1996), Moll Flanders (1996), Muppets Take Manhattan, The (1984), Children of the Corn III (1994),\n", 523 | "\n", 524 | "Nearest to Kids of the Round Table (1995): Live and Let Die (1973), Carnosaur 3: Primal Species (1996), Messenger: The Story of Joan of Arc, The (1999), Holy Smoke (1999), Dreamlife of Angels, The (La Vie r�v�e des anges) (1998), Haunted Honeymoon (1986), Hard Target (1993), Brother from Another Planet, The (1984),\n", 525 | "\n", 526 | "Nearest to Usual Suspects, The (1995): Perils of Pauline, The (1947), Crimes and Misdemeanors (1989), King in New York, A (1957), Patton (1970), Beans of Egypt, Maine, The (1994), Love Is the Devil (1998), Firestarter (1984), Othello (1995),\n", 527 | "\n", 528 | "Nearest to Now and Then (1995): Hamlet (2000), 8 1/2 Women (1999), Shaggy D.A., The (1976), Leaving Las Vegas (1995), Ballad of Narayama, The (Narayama Bushiko) (1982), I Can't Sleep (J'ai pas sommeil) (1994), Postman, The (1997), Aladdin (1992),\n", 529 | "\n", 530 | "Nearest to Restoration (1995): Ulysses (Ulisse) (1954), Operation Condor (Feiying gaiwak) (1990), Santa Fe Trail (1940), Carrie (1976), Sanjuro (1962), Blue Sky (1994), Golden Child, The (1986), Steal This Movie! 
(2000),\n", 531 | "\n", 532 | "Nearest to Angels and Insects (1995): Foreign Correspondent (1940), Jason's Lyric (1994), Jeremiah Johnson (1972), Giant (1956), Jackie Chan's First Strike (1996), Heaven's Burning (1997), Slaves to the Underground (1997), Wilde (1997),\n", 533 | "\n", 534 | "Nearest to American President, The (1995): Drop Dead Fred (1991), I'll Be Home For Christmas (1998), Monster, The (Il Mostro) (1994), Mister Roberts (1955), Star Wars: Episode VI - Return of the Jedi (1983), And God Created Woman (1988), Conversation, The (1974), Class Reunion (1982),\n", 535 | "\n", 536 | "Nearest to Two Bits (1995): Maybe, Maybe Not (Bewegte Mann, Der) (1994), Poison Ivy (1992), Crimson Tide (1995), Afterglow (1997), Faraway, So Close (In Weiter Ferne, So Nah!) (1993), Crocodile Dundee II (1988), Timecop (1994), Tarantula (1955),\n", 537 | "\n", 538 | "Average loss at step 32000 : 0.324940440048\n", 539 | "Average loss at step 34000 : 0.32196463858\n", 540 | "Average loss at step 36000 : 0.319042801075\n", 541 | "Average loss at step 38000 : 0.319743265688\n", 542 | "Average loss at step 40000 : 0.313547527481\n", 543 | "Nearest to Mis�rables, Les (1995): Single Girl, A (La Fille Seule) (1995), Jude (1996), Dear Diary (Caro Diario) (1994), Oliver! 
(1968), When the Cats Away (Chacun cherche son chat) (1996), And God Created Woman (1988), Under Siege (1992), Hellraiser (1987),\n", 544 | "\n", 545 | "Nearest to Money Train (1995): Edge of Seventeen (1998), Star Wars: Episode V - The Empire Strikes Back (1980), Carrie (1976), Man on the Moon (1999), Audrey Rose (1977), Carnosaur (1993), Perfect Candidate, A (1996), Get Shorty (1995),\n", 546 | "\n", 547 | "Nearest to Kicking and Screaming (1995): Forrest Gump (1994), M (1931), Urbania (2000), Careful (1992), Mansfield Park (1999), Emperor and the Assassin, The (Jing ke ci qin wang) (1999), Rocky Horror Picture Show, The (1975), Mummy's Hand, The (1940),\n", 548 | "\n", 549 | "Nearest to Postino, Il (The Postman) (1994): Secret Agent, The (1996), 8 Heads in a Duffel Bag (1997), Devil's Advocate, The (1997), Cat People (1982), Kiss of Death (1995), Grumpier Old Men (1995), Shadowlands (1993), Butterfly (La Lengua de las Mariposas) (2000),\n", 550 | "\n", 551 | "Nearest to Four Rooms (1995): Indian in the Cupboard, The (1995), Mr. Saturday Night (1992), Alice and Martin (Alice et Martin) (1998), I'll Be Home For Christmas (1998), Great White Hype, The (1996), Juror, The (1996), Space Cowboys (2000), Bambi (1942),\n", 552 | "\n", 553 | "Nearest to Balto (1995): Pyromaniac's Love Story, A (1995), Lifeboat (1944), Third Miracle, The (1999), Rain Man (1988), Seven (Se7en) (1995), Shadow, The (1994), Bridge on the River Kwai, The (1957), Barcelona (1994),\n", 554 | "\n", 555 | "Nearest to Once Upon a Time... 
When We Were Colored (1995): Daddy Long Legs (1919), Towering Inferno, The (1974), Dances with Wolves (1990), Dream for an Insomniac (1996), Great Day in Harlem, A (1994), Parenthood (1989), Evita (1996), Phantasm II (1988),\n", 556 | "\n", 557 | "Nearest to Seven (Se7en) (1995): King Kong Lives (1986), Better Than Chocolate (1999), Balto (1995), Five Easy Pieces (1970), Crime and Punishment in Suburbia (2000), Alien Nation (1988), Patch Adams (1998), Manchurian Candidate, The (1962),\n", 558 | "\n", 559 | "Nearest to Confessional, The (Le Confessionnal) (1995): Highlander: Endgame (2000), Mumford (1999), Metroland (1997), Shadow Conspiracy (1997), Chain Reaction (1996), Moll Flanders (1996), Muppets Take Manhattan, The (1984), Thin Red Line, The (1998),\n", 560 | "\n", 561 | "Nearest to Kids of the Round Table (1995): Live and Let Die (1973), Holy Smoke (1999), Messenger: The Story of Joan of Arc, The (1999), Carnosaur 3: Primal Species (1996), Dreamlife of Angels, The (La Vie r�v�e des anges) (1998), Hard Target (1993), Brother from Another Planet, The (1984), Haunted Honeymoon (1986),\n", 562 | "\n", 563 | "Nearest to Usual Suspects, The (1995): Perils of Pauline, The (1947), Crimes and Misdemeanors (1989), King in New York, A (1957), Patton (1970), Beans of Egypt, Maine, The (1994), Othello (1995), Love Is the Devil (1998), Jungle Book, The (1967),\n", 564 | "\n", 565 | "Nearest to Now and Then (1995): Hamlet (2000), 8 1/2 Women (1999), Shaggy D.A., The (1976), Leaving Las Vegas (1995), Ballad of Narayama, The (Narayama Bushiko) (1982), Postman, The (1997), I Can't Sleep (J'ai pas sommeil) (1994), Couch in New York, A (1996),\n", 566 | "\n", 567 | "Nearest to Restoration (1995): Operation Condor (Feiying gaiwak) (1990), Ulysses (Ulisse) (1954), Santa Fe Trail (1940), Carrie (1976), Sanjuro (1962), Blue Sky (1994), Golden Child, The (1986), Alarmist, The (1997),\n", 568 | "\n", 569 | "Nearest to Angels and Insects (1995): Foreign Correspondent (1940), Jason's 
Lyric (1994), Jeremiah Johnson (1972), Giant (1956), Jackie Chan's First Strike (1996), Heaven's Burning (1997), Wilde (1997), Slaves to the Underground (1997),\n", 570 | "\n", 571 | "Nearest to American President, The (1995): Drop Dead Fred (1991), I'll Be Home For Christmas (1998), Mister Roberts (1955), Monster, The (Il Mostro) (1994), Star Wars: Episode VI - Return of the Jedi (1983), And God Created Woman (1988), Class Reunion (1982), Conversation, The (1974),\n", 572 | "\n", 573 | "Nearest to Two Bits (1995): Maybe, Maybe Not (Bewegte Mann, Der) (1994), Poison Ivy (1992), Crimson Tide (1995), Afterglow (1997), Faraway, So Close (In Weiter Ferne, So Nah!) (1993), Timecop (1994), Tarantula (1955), Madness of King George, The (1994),\n", 574 | "\n", 575 | "Average loss at step 42000 : 0.310328738891\n", 576 | "Average loss at step 44000 : 0.307768616728\n", 577 | "Average loss at step 46000 : 0.307188585833\n", 578 | "Average loss at step 48000 : 0.307109602977\n", 579 | "Average loss at step 50000 : 0.305754399396\n", 580 | "Nearest to Mis�rables, Les (1995): Single Girl, A (La Fille Seule) (1995), Dear Diary (Caro Diario) (1994), Jude (1996), When the Cats Away (Chacun cherche son chat) (1996), Oliver! 
(1968), And God Created Woman (1988), Bicentennial Man (1999), Under Siege (1992),\n", 581 | "\n", 582 | "Nearest to Money Train (1995): Edge of Seventeen (1998), Star Wars: Episode V - The Empire Strikes Back (1980), Carrie (1976), Man on the Moon (1999), Audrey Rose (1977), Carnosaur (1993), Perfect Candidate, A (1996), Get Shorty (1995),\n", 583 | "\n", 584 | "Nearest to Kicking and Screaming (1995): Forrest Gump (1994), M (1931), Urbania (2000), Careful (1992), Mansfield Park (1999), Emperor and the Assassin, The (Jing ke ci qin wang) (1999), Mummy's Hand, The (1940), Rocky Horror Picture Show, The (1975),\n", 585 | "\n", 586 | "Nearest to Postino, Il (The Postman) (1994): Secret Agent, The (1996), Kiss of Death (1995), Devil's Advocate, The (1997), 8 Heads in a Duffel Bag (1997), Cat People (1982), Things Change (1988), Shadowlands (1993), Grumpier Old Men (1995),\n", 587 | "\n", 588 | "Nearest to Four Rooms (1995): Mr. Saturday Night (1992), Indian in the Cupboard, The (1995), I'll Be Home For Christmas (1998), Alice and Martin (Alice et Martin) (1998), Great White Hype, The (1996), Juror, The (1996), Space Cowboys (2000), Bambi (1942),\n", 589 | "\n", 590 | "Nearest to Balto (1995): Pyromaniac's Love Story, A (1995), Lifeboat (1944), Third Miracle, The (1999), Rain Man (1988), Seven (Se7en) (1995), Shadow, The (1994), Bridge on the River Kwai, The (1957), Barcelona (1994),\n", 591 | "\n", 592 | "Nearest to Once Upon a Time... 
When We Were Colored (1995): Daddy Long Legs (1919), Dances with Wolves (1990), Towering Inferno, The (1974), Dream for an Insomniac (1996), Great Day in Harlem, A (1994), Parenthood (1989), Phantasm II (1988), Three Wishes (1995),\n", 593 | "\n", 594 | "Nearest to Seven (Se7en) (1995): King Kong Lives (1986), Better Than Chocolate (1999), Balto (1995), Five Easy Pieces (1970), Patch Adams (1998), Alien Nation (1988), Manchurian Candidate, The (1962), Crime and Punishment in Suburbia (2000),\n", 595 | "\n", 596 | "Nearest to Confessional, The (Le Confessionnal) (1995): Mumford (1999), Highlander: Endgame (2000), Metroland (1997), Shadow Conspiracy (1997), Chain Reaction (1996), Moll Flanders (1996), Thin Red Line, The (1998), Muppets Take Manhattan, The (1984),\n", 597 | "\n", 598 | "Nearest to Kids of the Round Table (1995): Live and Let Die (1973), Messenger: The Story of Joan of Arc, The (1999), Carnosaur 3: Primal Species (1996), Holy Smoke (1999), Dreamlife of Angels, The (La Vie rêvée des anges) (1998), Hard Target (1993), Brother from Another Planet, The (1984), Without Limits (1998),\n", 599 | "\n", 600 | "Nearest to Usual Suspects, The (1995): Perils of Pauline, The (1947), Crimes and Misdemeanors (1989), King in New York, A (1957), Patton (1970), Love Is the Devil (1998), Othello (1995), Beans of Egypt, Maine, The (1994), Firestarter (1984),\n", 601 | "\n", 602 | "Nearest to Now and Then (1995): Hamlet (2000), 8 1/2 Women (1999), Shaggy D.A., The (1976), Ballad of Narayama, The (Narayama Bushiko) (1982), Leaving Las Vegas (1995), Postman, The (1997), I Can't Sleep (J'ai pas sommeil) (1994), Couch in New York, A (1996),\n", 603 | "\n", 604 | "Nearest to Restoration (1995): Operation Condor (Feiying gaiwak) (1990), Ulysses (Ulisse) (1954), Santa Fe Trail (1940), Carrie (1976), Sanjuro (1962), Blue Sky (1994), Golden Child, The (1986), Alarmist, The (1997),\n", 605 | "\n", 606 | "Nearest to Angels and Insects (1995): Foreign Correspondent (1940), Jason's 
Lyric (1994), Jeremiah Johnson (1972), Giant (1956), Jackie Chan's First Strike (1996), Heaven's Burning (1997), Wilde (1997), Slaves to the Underground (1997),\n", 607 | "\n", 608 | "Nearest to American President, The (1995): Drop Dead Fred (1991), I'll Be Home For Christmas (1998), Mister Roberts (1955), Star Wars: Episode VI - Return of the Jedi (1983), Monster, The (Il Mostro) (1994), And God Created Woman (1988), Conversation, The (1974), Class Reunion (1982),\n", 609 | "\n", 610 | "Nearest to Two Bits (1995): Maybe, Maybe Not (Bewegte Mann, Der) (1994), Poison Ivy (1992), Crimson Tide (1995), Afterglow (1997), Faraway, So Close (In Weiter Ferne, So Nah!) (1993), Timecop (1994), Madness of King George, The (1994), Crocodile Dundee II (1988),\n", 611 | "\n", 612 | "Average loss at step 52000 : 0.30643443419\n", 613 | "Average loss at step 54000 : 0.301461914234\n", 614 | "Average loss at step 56000 : 0.305708269566\n", 615 | "Average loss at step 58000 : 0.299068108898\n", 616 | "Average loss at step 60000 : 0.299301321991\n", 617 | "Nearest to Misérables, Les (1995): Single Girl, A (La Fille Seule) (1995), Dear Diary (Caro Diario) (1994), Jude (1996), When the Cats Away (Chacun cherche son chat) (1996), Oliver! 
(1968), And God Created Woman (1988), Hellraiser (1987), Under Siege (1992),\n", 618 | "\n", 619 | "Nearest to Money Train (1995): Edge of Seventeen (1998), Star Wars: Episode V - The Empire Strikes Back (1980), Carrie (1976), Man on the Moon (1999), Audrey Rose (1977), Carnosaur (1993), Perfect Candidate, A (1996), Sabrina (1954),\n", 620 | "\n", 621 | "Nearest to Kicking and Screaming (1995): Forrest Gump (1994), Urbania (2000), Careful (1992), M (1931), Mansfield Park (1999), Emperor and the Assassin, The (Jing ke ci qin wang) (1999), Mummy's Hand, The (1940), Rocky Horror Picture Show, The (1975),\n", 622 | "\n", 623 | "Nearest to Postino, Il (The Postman) (1994): Secret Agent, The (1996), 8 Heads in a Duffel Bag (1997), Kiss of Death (1995), Devil's Advocate, The (1997), Cat People (1982), Shadowlands (1993), Things Change (1988), Grumpier Old Men (1995),\n", 624 | "\n", 625 | "Nearest to Four Rooms (1995): Indian in the Cupboard, The (1995), Mr. Saturday Night (1992), I'll Be Home For Christmas (1998), Alice and Martin (Alice et Martin) (1998), Great White Hype, The (1996), Juror, The (1996), Space Cowboys (2000), Bambi (1942),\n", 626 | "\n", 627 | "Nearest to Balto (1995): Pyromaniac's Love Story, A (1995), Lifeboat (1944), Third Miracle, The (1999), Rain Man (1988), Seven (Se7en) (1995), Shadow, The (1994), Bridge on the River Kwai, The (1957), Barcelona (1994),\n", 628 | "\n", 629 | "Nearest to Once Upon a Time... 
When We Were Colored (1995): Daddy Long Legs (1919), Dances with Wolves (1990), Dream for an Insomniac (1996), Towering Inferno, The (1974), Great Day in Harlem, A (1994), Phantasm II (1988), Parenthood (1989), Three Wishes (1995),\n", 630 | "\n", 631 | "Nearest to Seven (Se7en) (1995): King Kong Lives (1986), Better Than Chocolate (1999), Balto (1995), Five Easy Pieces (1970), Patch Adams (1998), Crime and Punishment in Suburbia (2000), Alien Nation (1988), Manchurian Candidate, The (1962),\n", 632 | "\n", 633 | "Nearest to Confessional, The (Le Confessionnal) (1995): Mumford (1999), Highlander: Endgame (2000), Metroland (1997), Shadow Conspiracy (1997), Chain Reaction (1996), Moll Flanders (1996), Thin Red Line, The (1998), Muppets Take Manhattan, The (1984),\n", 634 | "\n", 635 | "Nearest to Kids of the Round Table (1995): Live and Let Die (1973), Messenger: The Story of Joan of Arc, The (1999), Carnosaur 3: Primal Species (1996), Holy Smoke (1999), Hard Target (1993), Dreamlife of Angels, The (La Vie rêvée des anges) (1998), Brother from Another Planet, The (1984), Without Limits (1998),\n", 636 | "\n", 637 | "Nearest to Usual Suspects, The (1995): Perils of Pauline, The (1947), Crimes and Misdemeanors (1989), King in New York, A (1957), Patton (1970), Othello (1995), Love Is the Devil (1998), Beans of Egypt, Maine, The (1994), Firestarter (1984),\n", 638 | "\n", 639 | "Nearest to Now and Then (1995): Hamlet (2000), 8 1/2 Women (1999), Shaggy D.A., The (1976), Leaving Las Vegas (1995), Ballad of Narayama, The (Narayama Bushiko) (1982), Postman, The (1997), I Can't Sleep (J'ai pas sommeil) (1994), Soapdish (1991),\n", 640 | "\n", 641 | "Nearest to Restoration (1995): Operation Condor (Feiying gaiwak) (1990), Ulysses (Ulisse) (1954), Santa Fe Trail (1940), Carrie (1976), Sanjuro (1962), Golden Child, The (1986), Blue Sky (1994), Alarmist, The (1997),\n", 642 | "\n", 643 | "Nearest to Angels and Insects (1995): Foreign Correspondent (1940), Jason's Lyric (1994), 
Jeremiah Johnson (1972), Giant (1956), Heaven's Burning (1997), Jackie Chan's First Strike (1996), Wilde (1997), Slaves to the Underground (1997),\n", 644 | "\n", 645 | "Nearest to American President, The (1995): Drop Dead Fred (1991), I'll Be Home For Christmas (1998), Mister Roberts (1955), Star Wars: Episode VI - Return of the Jedi (1983), And God Created Woman (1988), Monster, The (Il Mostro) (1994), Conversation, The (1974), Class Reunion (1982),\n", 646 | "\n", 647 | "Nearest to Two Bits (1995): Maybe, Maybe Not (Bewegte Mann, Der) (1994), Poison Ivy (1992), Crimson Tide (1995), Afterglow (1997), Timecop (1994), Faraway, So Close (In Weiter Ferne, So Nah!) (1993), Ideal Husband, An (1999), Crocodile Dundee II (1988),\n", 648 | "\n", 649 | "Average loss at step 62000 : 0.295689028909\n", 650 | "Average loss at step 64000 : 0.298611372618\n", 651 | "Average loss at step 66000 : 0.294667745505\n", 652 | "Average loss at step 68000 : 0.29473749526\n", 653 | "Average loss at step 70000 : 0.2942708421\n", 654 | "Nearest to Misérables, Les (1995): Single Girl, A (La Fille Seule) (1995), Dear Diary (Caro Diario) (1994), Jude (1996), When the Cats Away (Chacun cherche son chat) (1996), And God Created Woman (1988), Oliver! 
(1968), Hellraiser (1987), Under Siege (1992),\n", 655 | "\n", 656 | "Nearest to Money Train (1995): Edge of Seventeen (1998), Star Wars: Episode V - The Empire Strikes Back (1980), Carrie (1976), Man on the Moon (1999), Audrey Rose (1977), Carnosaur (1993), Perfect Candidate, A (1996), Penny Serenade (1941),\n", 657 | "\n", 658 | "Nearest to Kicking and Screaming (1995): Forrest Gump (1994), Urbania (2000), Careful (1992), M (1931), Emperor and the Assassin, The (Jing ke ci qin wang) (1999), Mansfield Park (1999), Mummy's Hand, The (1940), Rocky Horror Picture Show, The (1975),\n", 659 | "\n", 660 | "Nearest to Postino, Il (The Postman) (1994): Secret Agent, The (1996), 8 Heads in a Duffel Bag (1997), Kiss of Death (1995), Devil's Advocate, The (1997), Cat People (1982), Things Change (1988), Shadowlands (1993), Grumpier Old Men (1995),\n", 661 | "\n", 662 | "Nearest to Four Rooms (1995): Mr. Saturday Night (1992), Indian in the Cupboard, The (1995), Alice and Martin (Alice et Martin) (1998), I'll Be Home For Christmas (1998), Great White Hype, The (1996), Juror, The (1996), Space Cowboys (2000), Bambi (1942),\n", 663 | "\n", 664 | "Nearest to Balto (1995): Pyromaniac's Love Story, A (1995), Lifeboat (1944), Third Miracle, The (1999), Rain Man (1988), Seven (Se7en) (1995), Shadow, The (1994), Bridge on the River Kwai, The (1957), Cape Fear (1991),\n", 665 | "\n", 666 | "Nearest to Once Upon a Time... 
When We Were Colored (1995): Daddy Long Legs (1919), Dances with Wolves (1990), Towering Inferno, The (1974), Dream for an Insomniac (1996), Great Day in Harlem, A (1994), Phantasm II (1988), Parenthood (1989), Highlander III: The Sorcerer (1994),\n", 667 | "\n", 668 | "Nearest to Seven (Se7en) (1995): King Kong Lives (1986), Better Than Chocolate (1999), Balto (1995), Five Easy Pieces (1970), Alien Nation (1988), Patch Adams (1998), Crime and Punishment in Suburbia (2000), Manchurian Candidate, The (1962),\n", 669 | "\n", 670 | "Nearest to Confessional, The (Le Confessionnal) (1995): Highlander: Endgame (2000), Mumford (1999), Metroland (1997), Shadow Conspiracy (1997), Chain Reaction (1996), Moll Flanders (1996), Thin Red Line, The (1998), Bachelor, The (1999),\n", 671 | "\n", 672 | "Nearest to Kids of the Round Table (1995): Live and Let Die (1973), Messenger: The Story of Joan of Arc, The (1999), Holy Smoke (1999), Carnosaur 3: Primal Species (1996), Dreamlife of Angels, The (La Vie rêvée des anges) (1998), Brother from Another Planet, The (1984), Hard Target (1993), Delicatessen (1991),\n", 673 | "\n", 674 | "Nearest to Usual Suspects, The (1995): Perils of Pauline, The (1947), Crimes and Misdemeanors (1989), King in New York, A (1957), Patton (1970), Othello (1995), Love Is the Devil (1998), Beans of Egypt, Maine, The (1994), Firestarter (1984),\n", 675 | "\n", 676 | "Nearest to Now and Then (1995): Hamlet (2000), 8 1/2 Women (1999), Shaggy D.A., The (1976), Leaving Las Vegas (1995), Ballad of Narayama, The (Narayama Bushiko) (1982), Postman, The (1997), I Can't Sleep (J'ai pas sommeil) (1994), Grand Day Out, A (1992),\n", 677 | "\n", 678 | "Nearest to Restoration (1995): Operation Condor (Feiying gaiwak) (1990), Santa Fe Trail (1940), Ulysses (Ulisse) (1954), Carrie (1976), Sanjuro (1962), Golden Child, The (1986), Blue Sky (1994), Alarmist, The (1997),\n", 679 | "\n", 680 | "Nearest to Angels and Insects (1995): Foreign Correspondent (1940), Jason's Lyric 
(1994), Jeremiah Johnson (1972), Giant (1956), Heaven's Burning (1997), Jackie Chan's First Strike (1996), Wilde (1997), Slaves to the Underground (1997),\n", 681 | "\n", 682 | "Nearest to American President, The (1995): Drop Dead Fred (1991), Mister Roberts (1955), I'll Be Home For Christmas (1998), And God Created Woman (1988), Star Wars: Episode VI - Return of the Jedi (1983), Monster, The (Il Mostro) (1994), Class Reunion (1982), Courage Under Fire (1996),\n", 683 | "\n", 684 | "Nearest to Two Bits (1995): Maybe, Maybe Not (Bewegte Mann, Der) (1994), Poison Ivy (1992), Crimson Tide (1995), Afterglow (1997), Timecop (1994), Ideal Husband, An (1999), Faraway, So Close (In Weiter Ferne, So Nah!) (1993), Madness of King George, The (1994),\n", 685 | "\n", 686 | "Average loss at step 72000 : 0.296071705973\n", 687 | "Average loss at step 74000 : 0.292196372101\n", 688 | "Average loss at step 76000 : 0.291653270395\n", 689 | "Average loss at step 78000 : 0.292075764045\n", 690 | "Average loss at step 80000 : 0.292625549449\n", 691 | "Nearest to Misérables, Les (1995): Single Girl, A (La Fille Seule) (1995), Dear Diary (Caro Diario) (1994), Jude (1996), When the Cats Away (Chacun cherche son chat) (1996), And God Created Woman (1988), Oliver! 
(1968), Hellraiser (1987), Bicentennial Man (1999),\n", 692 | "\n", 693 | "Nearest to Money Train (1995): Edge of Seventeen (1998), Carrie (1976), Star Wars: Episode V - The Empire Strikes Back (1980), Man on the Moon (1999), Audrey Rose (1977), Carnosaur (1993), Perfect Candidate, A (1996), Penny Serenade (1941),\n", 694 | "\n", 695 | "Nearest to Kicking and Screaming (1995): Forrest Gump (1994), Urbania (2000), M (1931), Careful (1992), Emperor and the Assassin, The (Jing ke ci qin wang) (1999), Mansfield Park (1999), Mummy's Hand, The (1940), Rocky Horror Picture Show, The (1975),\n", 696 | "\n", 697 | "Nearest to Postino, Il (The Postman) (1994): Secret Agent, The (1996), Kiss of Death (1995), Devil's Advocate, The (1997), 8 Heads in a Duffel Bag (1997), Things Change (1988), Cat People (1982), Grumpier Old Men (1995), Shadowlands (1993),\n", 698 | "\n", 699 | "Nearest to Four Rooms (1995): Mr. Saturday Night (1992), Indian in the Cupboard, The (1995), Alice and Martin (Alice et Martin) (1998), Juror, The (1996), Great White Hype, The (1996), I'll Be Home For Christmas (1998), Space Cowboys (2000), Bambi (1942),\n", 700 | "\n", 701 | "Nearest to Balto (1995): Pyromaniac's Love Story, A (1995), Lifeboat (1944), Third Miracle, The (1999), Seven (Se7en) (1995), Rain Man (1988), Shadow, The (1994), Barcelona (1994), Bridge on the River Kwai, The (1957),\n", 702 | "\n", 703 | "Nearest to Once Upon a Time... 
When We Were Colored (1995): Daddy Long Legs (1919), Dances with Wolves (1990), Dream for an Insomniac (1996), Towering Inferno, The (1974), Great Day in Harlem, A (1994), Phantasm II (1988), Highlander III: The Sorcerer (1994), Parenthood (1989),\n", 704 | "\n", 705 | "Nearest to Seven (Se7en) (1995): King Kong Lives (1986), Better Than Chocolate (1999), Balto (1995), Five Easy Pieces (1970), Patch Adams (1998), Alien Nation (1988), Crime and Punishment in Suburbia (2000), Manchurian Candidate, The (1962),\n", 706 | "\n", 707 | "Nearest to Confessional, The (Le Confessionnal) (1995): Mumford (1999), Highlander: Endgame (2000), Metroland (1997), Chain Reaction (1996), Shadow Conspiracy (1997), Moll Flanders (1996), Thin Red Line, The (1998), Bicentennial Man (1999),\n", 708 | "\n", 709 | "Nearest to Kids of the Round Table (1995): Live and Let Die (1973), Holy Smoke (1999), Messenger: The Story of Joan of Arc, The (1999), Carnosaur 3: Primal Species (1996), Dreamlife of Angels, The (La Vie rêvée des anges) (1998), Hard Target (1993), Brother from Another Planet, The (1984), Without Limits (1998),\n", 710 | "\n", 711 | "Nearest to Usual Suspects, The (1995): Perils of Pauline, The (1947), Crimes and Misdemeanors (1989), King in New York, A (1957), Patton (1970), Othello (1995), Love Is the Devil (1998), Beans of Egypt, Maine, The (1994), Firestarter (1984),\n", 712 | "\n", 713 | "Nearest to Now and Then (1995): Hamlet (2000), 8 1/2 Women (1999), Shaggy D.A., The (1976), Leaving Las Vegas (1995), Ballad of Narayama, The (Narayama Bushiko) (1982), I Can't Sleep (J'ai pas sommeil) (1994), Postman, The (1997), Grand Day Out, A (1992),\n", 714 | "\n", 715 | "Nearest to Restoration (1995): Operation Condor (Feiying gaiwak) (1990), Ulysses (Ulisse) (1954), Santa Fe Trail (1940), Carrie (1976), Sanjuro (1962), Blue Sky (1994), Golden Child, The (1986), Alarmist, The (1997),\n", 716 | "\n", 717 | "Nearest to Angels and Insects (1995): Foreign Correspondent (1940), Jason's 
Lyric (1994), Giant (1956), Jeremiah Johnson (1972), Heaven's Burning (1997), Wilde (1997), Jackie Chan's First Strike (1996), Heavy (1995),\n", 718 | "\n", 719 | "Nearest to American President, The (1995): Drop Dead Fred (1991), Mister Roberts (1955), I'll Be Home For Christmas (1998), And God Created Woman (1988), Star Wars: Episode VI - Return of the Jedi (1983), Monster, The (Il Mostro) (1994), Conversation, The (1974), Courage Under Fire (1996),\n", 720 | "\n", 721 | "Nearest to Two Bits (1995): Maybe, Maybe Not (Bewegte Mann, Der) (1994), Poison Ivy (1992), Crimson Tide (1995), Afterglow (1997), Faraway, So Close (In Weiter Ferne, So Nah!) (1993), Madness of King George, The (1994), Crocodile Dundee II (1988), Timecop (1994),\n", 722 | "\n", 723 | "Average loss at step 82000 : 0.294038546681\n", 724 | "Average loss at step 84000 : 0.293429126255\n", 725 | "Average loss at step 86000 : 0.292010341024\n", 726 | "Average loss at step 88000 : 0.294161101468\n", 727 | "Average loss at step 90000 : 0.291595101463\n", 728 | "Nearest to Misérables, Les (1995): Single Girl, A (La Fille Seule) (1995), Dear Diary (Caro Diario) (1994), Jude (1996), When the Cats Away (Chacun cherche son chat) (1996), And God Created Woman (1988), Oliver! 
(1968), Hellraiser (1987), Under Siege (1992),\n", 729 | "\n", 730 | "Nearest to Money Train (1995): Edge of Seventeen (1998), Carrie (1976), Star Wars: Episode V - The Empire Strikes Back (1980), Man on the Moon (1999), Audrey Rose (1977), Carnosaur (1993), Perfect Candidate, A (1996), Penny Serenade (1941),\n", 731 | "\n", 732 | "Nearest to Kicking and Screaming (1995): Forrest Gump (1994), Careful (1992), Urbania (2000), M (1931), Emperor and the Assassin, The (Jing ke ci qin wang) (1999), Mansfield Park (1999), Mummy's Hand, The (1940), Rocky Horror Picture Show, The (1975),\n", 733 | "\n", 734 | "Nearest to Postino, Il (The Postman) (1994): Secret Agent, The (1996), Kiss of Death (1995), 8 Heads in a Duffel Bag (1997), Devil's Advocate, The (1997), Cat People (1982), Grumpier Old Men (1995), Shadowlands (1993), Things Change (1988),\n", 735 | "\n", 736 | "Nearest to Four Rooms (1995): Mr. Saturday Night (1992), Indian in the Cupboard, The (1995), Alice and Martin (Alice et Martin) (1998), I'll Be Home For Christmas (1998), Juror, The (1996), Great White Hype, The (1996), Space Cowboys (2000), Bambi (1942),\n", 737 | "\n", 738 | "Nearest to Balto (1995): Pyromaniac's Love Story, A (1995), Lifeboat (1944), Third Miracle, The (1999), Rain Man (1988), Seven (Se7en) (1995), Shadow, The (1994), Bridge on the River Kwai, The (1957), Barcelona (1994),\n", 739 | "\n", 740 | "Nearest to Once Upon a Time... When We Were Colored (1995): Daddy Long Legs (1919), Dances with Wolves (1990), Dream for an Insomniac (1996), Towering Inferno, The (1974), Phantasm II (1988), Great Day in Harlem, A (1994), Highlander III: The Sorcerer (1994), Mad Max 2 (a.k.a. 
The Road Warrior) (1981),\n", 741 | "\n", 742 | "Nearest to Seven (Se7en) (1995): King Kong Lives (1986), Better Than Chocolate (1999), Balto (1995), Five Easy Pieces (1970), Crime and Punishment in Suburbia (2000), Alien Nation (1988), Patch Adams (1998), Manchurian Candidate, The (1962),\n", 743 | "\n", 744 | "Nearest to Confessional, The (Le Confessionnal) (1995): Mumford (1999), Highlander: Endgame (2000), Metroland (1997), Chain Reaction (1996), Shadow Conspiracy (1997), Moll Flanders (1996), Thin Red Line, The (1998), Picnic (1955),\n", 745 | "\n", 746 | "Nearest to Kids of the Round Table (1995): Live and Let Die (1973), Holy Smoke (1999), Messenger: The Story of Joan of Arc, The (1999), Carnosaur 3: Primal Species (1996), Hard Target (1993), Dreamlife of Angels, The (La Vie rêvée des anges) (1998), Brother from Another Planet, The (1984), Without Limits (1998),\n", 747 | "\n", 748 | "Nearest to Usual Suspects, The (1995): Perils of Pauline, The (1947), Crimes and Misdemeanors (1989), King in New York, A (1957), Patton (1970), Othello (1995), Beans of Egypt, Maine, The (1994), Love Is the Devil (1998), Firestarter (1984),\n", 749 | "\n", 750 | "Nearest to Now and Then (1995): Hamlet (2000), 8 1/2 Women (1999), Shaggy D.A., The (1976), Leaving Las Vegas (1995), Ballad of Narayama, The (Narayama Bushiko) (1982), I Can't Sleep (J'ai pas sommeil) (1994), Postman, The (1997), Grand Day Out, A (1992),\n", 751 | "\n", 752 | "Nearest to Restoration (1995): Operation Condor (Feiying gaiwak) (1990), Ulysses (Ulisse) (1954), Santa Fe Trail (1940), Carrie (1976), Sanjuro (1962), Golden Child, The (1986), Blue Sky (1994), Alarmist, The (1997),\n", 753 | "\n", 754 | "Nearest to Angels and Insects (1995): Foreign Correspondent (1940), Jason's Lyric (1994), Giant (1956), Jeremiah Johnson (1972), Heaven's Burning (1997), Jackie Chan's First Strike (1996), Wilde (1997), Slaves to the Underground (1997),\n", 755 | "\n", 756 | "Nearest to American President, The (1995): Drop 
Dead Fred (1991), Mister Roberts (1955), I'll Be Home For Christmas (1998), And God Created Woman (1988), Star Wars: Episode VI - Return of the Jedi (1983), Monster, The (Il Mostro) (1994), Conversation, The (1974), Class Reunion (1982),\n", 757 | "\n", 758 | "Nearest to Two Bits (1995): Maybe, Maybe Not (Bewegte Mann, Der) (1994), Poison Ivy (1992), Crimson Tide (1995), Afterglow (1997), Timecop (1994), Crocodile Dundee II (1988), Faraway, So Close (In Weiter Ferne, So Nah!) (1993), Tarantula (1955),\n", 759 | "\n", 760 | "Average loss at step 92000 : 0.291640399961\n", 761 | "Average loss at step 94000 : 0.289748490643\n", 762 | "Average loss at step 96000 : 0.287572378013\n", 763 | "Average loss at step 98000 : 0.291018143699\n", 764 | "Average loss at step 100000 : 0.290500450999\n", 765 | "Nearest to Misérables, Les (1995): Single Girl, A (La Fille Seule) (1995), Dear Diary (Caro Diario) (1994), Jude (1996), When the Cats Away (Chacun cherche son chat) (1996), Oliver! (1968), And God Created Woman (1988), Hellraiser (1987), Under Siege (1992),\n", 766 | "\n", 767 | "Nearest to Money Train (1995): Edge of Seventeen (1998), Star Wars: Episode V - The Empire Strikes Back (1980), Carrie (1976), Man on the Moon (1999), Audrey Rose (1977), Carnosaur (1993), Perfect Candidate, A (1996), Get Shorty (1995),\n", 768 | "\n", 769 | "Nearest to Kicking and Screaming (1995): Forrest Gump (1994), Careful (1992), Urbania (2000), M (1931), Emperor and the Assassin, The (Jing ke ci qin wang) (1999), Mansfield Park (1999), Mummy's Hand, The (1940), Rocky Horror Picture Show, The (1975),\n", 770 | "\n", 771 | "Nearest to Postino, Il (The Postman) (1994): Secret Agent, The (1996), Kiss of Death (1995), Devil's Advocate, The (1997), 8 Heads in a Duffel Bag (1997), Cat People (1982), Grumpier Old Men (1995), Shadowlands (1993), Things Change (1988),\n", 772 | "\n", 773 | "Nearest to Four Rooms (1995): Mr. 
Saturday Night (1992), Indian in the Cupboard, The (1995), Alice and Martin (Alice et Martin) (1998), Juror, The (1996), Great White Hype, The (1996), I'll Be Home For Christmas (1998), Space Cowboys (2000), Bambi (1942),\n", 774 | "\n", 775 | "Nearest to Balto (1995): Pyromaniac's Love Story, A (1995), Lifeboat (1944), Third Miracle, The (1999), Seven (Se7en) (1995), Rain Man (1988), Shadow, The (1994), Barcelona (1994), Cape Fear (1991),\n", 776 | "\n", 777 | "Nearest to Once Upon a Time... When We Were Colored (1995): Daddy Long Legs (1919), Dances with Wolves (1990), Dream for an Insomniac (1996), Towering Inferno, The (1974), Great Day in Harlem, A (1994), Phantasm II (1988), Parenthood (1989), Extremities (1986),\n", 778 | "\n", 779 | "Nearest to Seven (Se7en) (1995): King Kong Lives (1986), Better Than Chocolate (1999), Balto (1995), Five Easy Pieces (1970), Patch Adams (1998), Crime and Punishment in Suburbia (2000), Alien Nation (1988), Them! (1954),\n", 780 | "\n", 781 | "Nearest to Confessional, The (Le Confessionnal) (1995): Mumford (1999), Highlander: Endgame (2000), Metroland (1997), Chain Reaction (1996), Shadow Conspiracy (1997), Moll Flanders (1996), Thin Red Line, The (1998), Bachelor, The (1999),\n", 782 | "\n", 783 | "Nearest to Kids of the Round Table (1995): Live and Let Die (1973), Holy Smoke (1999), Messenger: The Story of Joan of Arc, The (1999), Carnosaur 3: Primal Species (1996), Hard Target (1993), Dreamlife of Angels, The (La Vie rêvée des anges) (1998), Brother from Another Planet, The (1984), Star Trek: Insurrection (1998),\n", 784 | "\n", 785 | "Nearest to Usual Suspects, The (1995): Perils of Pauline, The (1947), Crimes and Misdemeanors (1989), King in New York, A (1957), Patton (1970), Othello (1995), Beans of Egypt, Maine, The (1994), Love Is the Devil (1998), Jonah Who Will Be 25 in the Year 2000 (1976),\n", 786 | "\n", 787 | "Nearest to Now and Then (1995): Hamlet (2000), 8 1/2 Women (1999), Shaggy D.A., The (1976), Leaving Las 
Vegas (1995), Ballad of Narayama, The (Narayama Bushiko) (1982), Postman, The (1997), I Can't Sleep (J'ai pas sommeil) (1994), Grand Day Out, A (1992),\n", 788 | "\n", 789 | "Nearest to Restoration (1995): Operation Condor (Feiying gaiwak) (1990), Ulysses (Ulisse) (1954), Santa Fe Trail (1940), Carrie (1976), Sanjuro (1962), Golden Child, The (1986), Blue Sky (1994), Alarmist, The (1997),\n", 790 | "\n", 791 | "Nearest to Angels and Insects (1995): Jason's Lyric (1994), Foreign Correspondent (1940), Giant (1956), Jeremiah Johnson (1972), Heaven's Burning (1997), Jackie Chan's First Strike (1996), Wilde (1997), Heavy (1995),\n", 792 | "\n", 793 | "Nearest to American President, The (1995): Drop Dead Fred (1991), I'll Be Home For Christmas (1998), Mister Roberts (1955), And God Created Woman (1988), Star Wars: Episode VI - Return of the Jedi (1983), Monster, The (Il Mostro) (1994), Class Reunion (1982), Courage Under Fire (1996),\n", 794 | "\n", 795 | "Nearest to Two Bits (1995): Maybe, Maybe Not (Bewegte Mann, Der) (1994), Poison Ivy (1992), Crimson Tide (1995), Afterglow (1997), Faraway, So Close (In Weiter Ferne, So Nah!) 
(1993), Timecop (1994), Madness of King George, The (1994), Crocodile Dundee II (1988),\n", 796 | "\n" 797 | ] 798 | } 799 | ], 800 | "source": [ 801 | "final_embeddings = l2v.fit(verbose=True)" 802 | ] 803 | }, 804 | { 805 | "cell_type": "code", 806 | "execution_count": 9, 807 | "metadata": { 808 | "collapsed": false 809 | }, 810 | "outputs": [ 811 | { 812 | "name": "stdout", 813 | "output_type": "stream", 814 | "text": [ 815 | "Preprocessing the data using PCA...\n", 816 | "Computing pairwise distances...\n", 817 | "Computing P-values for point 0 of 100 ...\n", 818 | "Mean value of sigma: 0.261786445009\n", 819 | "Iteration 10 : error is 13.9306964931\n", 820 | "Iteration 20 : error is 13.4528682686\n", 821 | "Iteration 30 : error is 13.9035123528\n", 822 | "Iteration 40 : error is 13.7435912217\n", 823 | "Iteration 50 : error is 13.6047532078\n", 824 | "Iteration 60 : error is 13.8736701425\n", 825 | "Iteration 70 : error is 13.4797018902\n", 826 | "Iteration 80 : error is 13.9096716619\n", 827 | "Iteration 90 : error is 13.7337567976\n", 828 | "Iteration 100 : error is 13.9217889952\n", 829 | "Iteration 110 : error is 2.17130003935\n", 830 | "Iteration 120 : error is 2.01289632975\n", 831 | "Iteration 130 : error is 1.8872638998\n", 832 | "Iteration 140 : error is 1.79649559208\n", 833 | "Iteration 150 : error is 1.72696517117\n", 834 | "Iteration 160 : error is 1.67048000619\n", 835 | "Iteration 170 : error is 1.58105398966\n", 836 | "Iteration 180 : error is 1.53024362161\n", 837 | "Iteration 190 : error is 1.48044631182\n", 838 | "Iteration 200 : error is 1.43464807549\n", 839 | "Iteration 210 : error is 1.38630805719\n", 840 | "Iteration 220 : error is 1.35730284383\n", 841 | "Iteration 230 : error is 1.34014513334\n", 842 | "Iteration 240 : error is 1.31788034473\n", 843 | "Iteration 250 : error is 1.30451068487\n", 844 | "Iteration 260 : error is 1.28525996623\n", 845 | "Iteration 270 : error is 1.27146738341\n", 846 | "Iteration 280 : error is 
1.26186448404\n", 847 | "Iteration 290 : error is 1.25037144349\n", 848 | "Iteration 300 : error is 1.23741302131\n", 849 | "Iteration 310 : error is 1.22070156384\n", 850 | "Iteration 320 : error is 1.19466489817\n", 851 | "Iteration 330 : error is 1.17897212436\n", 852 | "Iteration 340 : error is 1.17061972909\n", 853 | "Iteration 350 : error is 1.15996210637\n", 854 | "Iteration 360 : error is 1.15055641062\n", 855 | "Iteration 370 : error is 1.1445069762\n", 856 | "Iteration 380 : error is 1.13045611783\n", 857 | "Iteration 390 : error is 1.12114592264\n", 858 | "Iteration 400 : error is 1.11499833764\n", 859 | "Iteration 410 : error is 1.10866469308\n", 860 | "Iteration 420 : error is 1.10262625828\n", 861 | "Iteration 430 : error is 1.09755270426\n", 862 | "Iteration 440 : error is 1.09431275896\n", 863 | "Iteration 450 : error is 1.09084672124\n", 864 | "Iteration 460 : error is 1.08806499752\n", 865 | "Iteration 470 : error is 1.08553769328\n", 866 | "Iteration 480 : error is 1.08286669014\n", 867 | "Iteration 490 : error is 1.08222025864\n", 868 | "Iteration 500 : error is 1.08195266589\n", 869 | "Iteration 510 : error is 1.08175068791\n", 870 | "Iteration 520 : error is 1.08138809898\n", 871 | "Iteration 530 : error is 1.08099217724\n", 872 | "Iteration 540 : error is 1.07942412455\n", 873 | "Iteration 550 : error is 1.07401188209\n", 874 | "Iteration 560 : error is 1.07071862003\n", 875 | "Iteration 570 : error is 1.06801266902\n", 876 | "Iteration 580 : error is 1.06756061634\n", 877 | "Iteration 590 : error is 1.067342358\n", 878 | "Iteration 600 : error is 1.06709781241\n", 879 | "Iteration 610 : error is 1.06702284941\n", 880 | "Iteration 620 : error is 1.06691426058\n", 881 | "Iteration 630 : error is 1.06656377984\n", 882 | "Iteration 640 : error is 1.06631384981\n", 883 | "Iteration 650 : error is 1.06628706813\n", 884 | "Iteration 660 : error is 1.0662829207\n", 885 | "Iteration 670 : error is 1.06628187566\n", 886 | "Iteration 680 : error is 
1.06628129931\n", 887 | "Iteration 690 : error is 1.06628086557\n", 888 | "Iteration 700 : error is 1.06628050412\n", 889 | "Iteration 710 : error is 1.06628026599\n", 890 | "Iteration 720 : error is 1.06628012064\n", 891 | "Iteration 730 : error is 1.0662800258\n", 892 | "Iteration 740 : error is 1.06627996618\n", 893 | "Iteration 750 : error is 1.06627993101\n", 894 | "Iteration 760 : error is 1.06627990915\n", 895 | "Iteration 770 : error is 1.06627989527\n", 896 | "Iteration 780 : error is 1.0662798876\n", 897 | "Iteration 790 : error is 1.06627988335\n", 898 | "Iteration 800 : error is 1.06627988072\n", 899 | "Iteration 810 : error is 1.06627987911\n", 900 | "Iteration 820 : error is 1.06627987822\n", 901 | "Iteration 830 : error is 1.0662798777\n", 902 | "Iteration 840 : error is 1.06627987737\n", 903 | "Iteration 850 : error is 1.06627987716\n", 904 | "Iteration 860 : error is 1.06627987705\n", 905 | "Iteration 870 : error is 1.06627987698\n", 906 | "Iteration 880 : error is 1.06627987694\n", 907 | "Iteration 890 : error is 1.06627987691\n", 908 | "Iteration 900 : error is 1.0662798769\n", 909 | "Iteration 910 : error is 1.06627987689\n", 910 | "Iteration 920 : error is 1.06627987689\n", 911 | "Iteration 930 : error is 1.06627987688\n", 912 | "Iteration 940 : error is 1.06627987688\n", 913 | "Iteration 950 : error is 1.06627987688\n", 914 | "Iteration 960 : error is 1.06627987688\n", 915 | "Iteration 970 : error is 1.06627987688\n", 916 | "Iteration 980 : error is 1.06627987688\n", 917 | "Iteration 990 : error is 1.06627987688\n", 918 | "Iteration 1000 : error is 1.06627987688\n" 919 | ] 920 | } 921 | ], 922 | "source": [ 923 | "l2v.plot_with_labels(perplexity=20.0,verbose=True)" 924 | ] 925 | }, 926 | { 927 | "cell_type": "code", 928 | "execution_count": null, 929 | "metadata": { 930 | "collapsed": false 931 | }, 932 | "outputs": [], 933 | "source": [] 934 | }, 935 | { 936 | "cell_type": "code", 937 | "execution_count": null, 938 | "metadata": { 939 | 
"collapsed": true 940 | }, 941 | "outputs": [], 942 | "source": [] 943 | }, 944 | { 945 | "cell_type": "code", 946 | "execution_count": null, 947 | "metadata": { 948 | "collapsed": true 949 | }, 950 | "outputs": [], 951 | "source": [] 952 | }, 953 | { 954 | "cell_type": "code", 955 | "execution_count": null, 956 | "metadata": { 957 | "collapsed": false 958 | }, 959 | "outputs": [], 960 | "source": [] 961 | }, 962 | { 963 | "cell_type": "code", 964 | "execution_count": null, 965 | "metadata": { 966 | "collapsed": false 967 | }, 968 | "outputs": [], 969 | "source": [] 970 | }, 971 | { 972 | "cell_type": "code", 973 | "execution_count": null, 974 | "metadata": { 975 | "collapsed": true 976 | }, 977 | "outputs": [], 978 | "source": [] 979 | }, 980 | { 981 | "cell_type": "code", 982 | "execution_count": null, 983 | "metadata": { 984 | "collapsed": false 985 | }, 986 | "outputs": [], 987 | "source": [] 988 | }, 989 | { 990 | "cell_type": "code", 991 | "execution_count": null, 992 | "metadata": { 993 | "collapsed": true 994 | }, 995 | "outputs": [], 996 | "source": [] 997 | } 998 | ], 999 | "metadata": { 1000 | "kernelspec": { 1001 | "display_name": "Python 2", 1002 | "language": "python", 1003 | "name": "python2" 1004 | }, 1005 | "language_info": { 1006 | "codemirror_mode": { 1007 | "name": "ipython", 1008 | "version": 2 1009 | }, 1010 | "file_extension": ".py", 1011 | "mimetype": "text/x-python", 1012 | "name": "python", 1013 | "nbconvert_exporter": "python", 1014 | "pygments_lexer": "ipython2", 1015 | "version": "2.7.11" 1016 | } 1017 | }, 1018 | "nbformat": 4, 1019 | "nbformat_minor": 0 1020 | } 1021 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Like2Vec_TensorFlow 2 | Implementing Like2Vec (Word2Vec for users or items) using TensorFlow 3 | 4 | Like2Vec.py : File containing the class used to generate Like2Vec using TensorFlow 5 | 6 | 
Like2Vec_Example.ipynb : Jupyter Notebook showing how to use the Like2Vec class

tsne.py : File used to generate my t-SNE plot, acquired from https://lvdmaaten.github.io/tsne/

### Inspiration:
I was inspired to do this project by work my fellow classmates have done on Like2Vec: bit.ly/1Oz9V50

### Resources:
Data : http://grouplens.org/datasets/movielens/

TSNE code : https://lvdmaaten.github.io/tsne/

Like2Vec Theory : http://www.perozzi.net/publications/14_kdd_deepwalk.pdf

Word2Vec Example Code : https://github.com/tensorflow/tensorflow/blob/r0.8/tensorflow/examples/tutorials/word2vec/word2vec_basic.py


--------------------------------------------------------------------------------
/images/tsne.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jacobBaumbach/Like2Vec_TensorFlow/d4355f93dcf413737faf46dcad81bc5fcf138c7e/images/tsne.png
--------------------------------------------------------------------------------
/tsne.py:
--------------------------------------------------------------------------------
#
# tsne.py
#
# Implementation of t-SNE in Python. The implementation was tested on Python 2.7.10, and it requires a working
# installation of NumPy. The implementation comes with an example on the MNIST dataset. In order to plot the
# results of this example, a working installation of matplotlib is required.
#
# The example can be run by executing: `ipython tsne.py`
#
#
# Created by Laurens van der Maaten on 20-12-08.
# Copyright (c) 2008 Tilburg University. All rights reserved.

import numpy as Math
import pylab as Plot

def Hbeta(D = Math.array([]), beta = 1.0):
    """Compute the perplexity and the P-row for a specific value of the precision of a Gaussian distribution."""

    # Compute P-row and corresponding perplexity
    P = Math.exp(-D.copy() * beta);
    sumP = sum(P);
    H = Math.log(sumP) + beta * Math.sum(D * P) / sumP;
    P = P / sumP;
    return H, P;


def x2p(X = Math.array([]), tol = 1e-5, perplexity = 30.0):
    """Performs a binary search to get P-values in such a way that each conditional Gaussian has the same perplexity."""

    # Initialize some variables
    print "Computing pairwise distances..."
    (n, d) = X.shape;
    sum_X = Math.sum(Math.square(X), 1);
    D = Math.add(Math.add(-2 * Math.dot(X, X.T), sum_X).T, sum_X);
    P = Math.zeros((n, n));
    beta = Math.ones((n, 1));
    logU = Math.log(perplexity);

    # Loop over all datapoints
    for i in range(n):

        # Print progress
        if i % 500 == 0:
            print "Computing P-values for point ", i, " of ", n, "..."

        # Compute the Gaussian kernel and entropy for the current precision
        betamin = -Math.inf;
        betamax = Math.inf;
        Di = D[i, Math.concatenate((Math.r_[0:i], Math.r_[i+1:n]))];
        (H, thisP) = Hbeta(Di, beta[i]);

        # Evaluate whether the perplexity is within tolerance
        Hdiff = H - logU;
        tries = 0;
        while Math.abs(Hdiff) > tol and tries < 50:

            # If not, increase or decrease precision
            if Hdiff > 0:
                betamin = beta[i].copy();
                if betamax == Math.inf or betamax == -Math.inf:
                    beta[i] = beta[i] * 2;
                else:
                    beta[i] = (beta[i] + betamax) / 2;
            else:
                betamax = beta[i].copy();
                if betamin == Math.inf or betamin == -Math.inf:
                    beta[i] = beta[i] / 2;
                else:
                    beta[i] = (beta[i] + betamin) / 2;

            # Recompute the values
            (H, thisP) = Hbeta(Di, beta[i]);
            Hdiff = H - logU;
            tries = tries + 1;

        # Set the final row of P
        P[i, Math.concatenate((Math.r_[0:i], Math.r_[i+1:n]))] = thisP;

    # Return final P-matrix
    print "Mean value of sigma: ", Math.mean(Math.sqrt(1 / beta));
    return P;


def pca(X = Math.array([]), no_dims = 50):
    """Runs PCA on the NxD array X in order to reduce its dimensionality to no_dims dimensions."""

    print "Preprocessing the data using PCA..."
    (n, d) = X.shape;
    X = X - Math.tile(Math.mean(X, 0), (n, 1));
    (l, M) = Math.linalg.eig(Math.dot(X.T, X));
    Y = Math.dot(X, M[:,0:no_dims]);
    return Y;


def tsne(X = Math.array([]), no_dims = 2, initial_dims = 50, perplexity = 30.0,verbose=False):
    """Runs t-SNE on the dataset in the NxD array X to reduce its dimensionality to no_dims dimensions.
    The syntax of the function is Y = tsne.tsne(X, no_dims, initial_dims, perplexity, verbose), where X is an NxD NumPy array."""

    # Check inputs: no_dims must be an integer (a float such as 2.0 is rejected as well)
    if isinstance(no_dims, float) or round(no_dims) != no_dims:
        print "Error: number of dimensions should be an integer.";
        return -1;

    # Initialize variables
    X = pca(X, initial_dims).real;
    (n, d) = X.shape;
    max_iter = 1000;
    initial_momentum = 0.5;
    final_momentum = 0.8;
    eta = 500;
    min_gain = 0.01;
    Y = Math.random.randn(n, no_dims);
    dY = Math.zeros((n, no_dims));
    iY = Math.zeros((n, no_dims));
    gains = Math.ones((n, no_dims));

    # Compute P-values
    P = x2p(X, 1e-5, perplexity);
    P = P + Math.transpose(P);
    P = P / Math.sum(P);
    P = P * 4;  # early exaggeration
    P = Math.maximum(P, 1e-12);

    # Run iterations
    for iter in range(max_iter):

        # Compute pairwise affinities
        sum_Y = Math.sum(Math.square(Y), 1);
        num = 1 / (1 + Math.add(Math.add(-2 * Math.dot(Y, Y.T), sum_Y).T, sum_Y));
        num[range(n), range(n)] = 0;
        Q = num / Math.sum(num);
        Q = Math.maximum(Q, 1e-12);

        # Compute gradient
        PQ = P - Q;
        for i in range(n):
            dY[i,:] = Math.sum(Math.tile(PQ[:,i] * num[:,i], (no_dims, 1)).T * (Y[i,:] - Y), 0);

        # Perform the update
        if iter < 20:
            momentum = initial_momentum
        else:
            momentum = final_momentum
        gains = (gains + 0.2) * ((dY > 0) != (iY > 0)) + (gains * 0.8) * ((dY > 0) == (iY > 0));
        gains[gains < min_gain] = min_gain;
        iY = momentum * iY - eta * (gains * dY);
        Y = Y + iY;
        Y = Y - Math.tile(Math.mean(Y, 0), (n, 1));

        # Compute current value of cost function
        if (iter + 1) % 10 == 0:
            C = Math.sum(P * Math.log(P / Q));
            if verbose:
                print "Iteration ", (iter + 1), ": error is ", C

        # Stop lying about P-values
        if iter == 100:
            P = P / 4;

    # Return solution
    return Y;


--------------------------------------------------------------------------------
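The per-row perplexity search that `x2p` performs in tsne.py can be easier to follow in isolation. The following is a minimal sketch, not part of the repository, that repeats the same bisection on the precision `beta` for a single row of squared distances using only the standard library. The function names `entropy_and_row` and `search_beta` and the sample distances are hypothetical, chosen for illustration. It avoids `print`, so it should run under either Python 2 or Python 3.

```python
import math


def entropy_and_row(distances, beta):
    """Return (H, P): entropy in nats and the normalized Gaussian row for precision beta."""
    weights = [math.exp(-d * beta) for d in distances]
    total = sum(weights)
    # H = log(sum_j w_j) + beta * sum_j(d_j * w_j) / sum_j w_j, the same formula as Hbeta.
    entropy = math.log(total) + beta * sum(d * w for d, w in zip(distances, weights)) / total
    return entropy, [w / total for w in weights]


def search_beta(distances, perplexity, tol=1e-5, max_tries=50):
    """Bisect on beta until exp(H) is close to the requested perplexity."""
    target = math.log(perplexity)
    beta, beta_min, beta_max = 1.0, None, None
    entropy, row = entropy_and_row(distances, beta)
    tries = 0
    while abs(entropy - target) > tol and tries < max_tries:
        if entropy > target:
            # Row is too flat: raise the precision (double until an upper bound is known).
            beta_min = beta
            beta = beta * 2.0 if beta_max is None else (beta + beta_max) / 2.0
        else:
            # Row is too peaked: lower the precision (halve until a lower bound is known).
            beta_max = beta
            beta = beta / 2.0 if beta_min is None else (beta + beta_min) / 2.0
        entropy, row = entropy_and_row(distances, beta)
        tries += 1
    return beta, row


# Hypothetical squared distances from one point to its five neighbours.
squared_distances = [0.5, 1.0, 2.0, 4.0, 8.0]
beta, row = search_beta(squared_distances, perplexity=3.0)
achieved_perplexity = math.exp(entropy_and_row(squared_distances, beta)[0])
```

Using `None` for the unknown bounds plays the role of the `Math.inf` sentinels in `x2p`. The requested perplexity has to lie between 1 and the number of neighbours, otherwise the search simply stops after `max_tries` iterations without meeting the tolerance.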