├── .gitignore ├── Archive.zip ├── README.html ├── README.md ├── README.pdf ├── algos.py ├── data.py ├── data ├── README.txt ├── links.csv ├── ml-latest-small.zip ├── movies.csv ├── ratings.csv └── tags.csv └── docs ├── biparte (1).xml ├── biparte.png ├── biparte.xml ├── graph_based_recommendation system.Rmd ├── graph_based_recommendation_system.pdf └── report.pages /.gitignore: -------------------------------------------------------------------------------- 1 | *.pyc 2 | -------------------------------------------------------------------------------- /Archive.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/chandan-u/graph-based-recommendation-system/897c6419d922fcbfe7f110522e530f3eeeceaf17/Archive.zip -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | # Graph-search based Recommendation system 3 | 4 | 5 | This is project is about building a recommendation system using graph search methodologies. We will be comparing these different approaches and closely observe the limitations of each. 6 | 7 | 8 | 9 | 10 | 11 | **Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)* 12 | 13 | 14 | - [Abstract:](#abstract) 15 | - [Introduction:](#introduction) 16 | - [Data:](#data) 17 | - [Representation of data:](#representation-of-data) 18 | - [Related work:](#related-work) 19 | - [1. Content based approach:](#1-content-based-approach) 20 | - [2. Collaborative filtering:](#2-collaborative-filtering) 21 | - [3. A hybrid of collaborative and content based approach:](#3-a-hybrid-of-collaborative-and-content-based-approach) 22 | - [Recommendation Algorithms:](#recommendation-algorithms) 23 | - [1. The content based Filtering:](#1-the-content-based-filtering) 24 | - [2. 
Collaborative Filtering:](#2-collaborative-filtering) 25 | - [Euclidian distance Similarity:](#euclidian-distance-similarity) 26 | - [graph based recommendation :](#graph-based-recommendation-) 27 | - [Experimental Evaluation:](#experimental-evaluation) 28 | - [For collabertive:](#for-collabertive) 29 | - [For content based:](#for-content-based) 30 | - [Conclusion:](#conclusion) 31 | - [References:](#references) 32 | 33 | 34 | 35 | 36 | 37 | ## Abstract: 38 | 39 | Implemented a movie recommendation system using the movielens dataset from the grouplens site. This dataset is transformed to a bipartite graph which allowed to address the problem using graph based traversal algorithms instead of usual approaches that are used by recommendation systems. The goal is to implement collaborative filtering technique as well as content based recommendation using the graph traversal algorithms. We will evaluate the advantages and shortcomings and then also discuss how we can improve on this approach. 40 | 41 | ## Introduction: 42 | 43 | The amount of content that is being generated by social media sites, movies, tv shows etc is increasing tremendously and its very hard for a user or person to choose from such a huge pool of content. There are endless choices. Hence we need to filter out most of these content and give suggestions to user. 44 | Recommendation systems are designed to solve this very problem to give users best suggestions based on existing data and the user preferences. 45 | 46 | Recommendation systems are widely used in e-commerce sites such as Netflix to suggest movies , amazon to suggest products, music application such as iTunes and spotify to suggest next songs that the user may like to hear. It can be applied to even domains such as social networking. Facebook uses it for suggesting friends. 47 | 48 | In this project we implement a collaberative filtering recommendation system that uses existing data to give better suggestions. 
We build a bipartite graph from the data set to support graph traversal for the collaborative filtering system. 49 | 50 | 51 | ## Data: 52 | 53 | The data set is obtained from http://grouplens.org/ , which hosts a collection of movie ratings from the MovieLens website. This data set covers about 100,000 ratings and 1,300 tag applications applied to about 9,000 movies by 671 users. To implement the collaborative filtering system we use all the data except the tags. The data set has two main files: movies.csv and ratings.csv. 54 | 55 | Number of users: 671 56 | Number of movies: >9000 57 | Number of ratings: 100,000 58 | 59 | Files: movies.csv, ratings.csv 60 | 61 | 62 | 63 | 64 | ## Representation of data: 65 | 66 | To facilitate graph traversal techniques and collaborative filtering, the data is transformed into a bipartite graph representation. In a bipartite graph, nodes are divided into two distinct sets. Links between pairs of nodes from different node sets are admissible, while links between nodes from the same node set are not allowed. In our case the information is whether or not a person (customer) has watched a movie (product) and what rating the customer has given the movie. Such information can be represented as shown in the table below: 67 | 68 | 69 | customer/movie | movie 1 | movie 2 | movie 3 | movie 4 70 | --------------- | -------- | -------- | -------- | ------- 71 | customer1 | 0 | 1 | 0 | 1 72 | customer2 | 0 | 1 | 1 | 1 73 | customer3 | 1 | 0 | 1 | 0 74 | 75 | 76 | 77 | 78 | A zero in the above table means that the customer has not watched the movie. A nonzero entry means that the customer has watched the movie, and its value is the rating he/she has given that movie. You can traverse from customer to movie but you cannot traverse from customer to customer directly. Likewise you cannot directly traverse from movie to movie either. 
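
As a minimal sketch of this representation, the toy table above can be held as a list of rows and traversed in both directions. The names `RATINGS`, `movies_of` and `customers_of` are illustrative only and are not part of this repository.

```python
# Toy ratings table from above: rows are customers, columns are movies.
RATINGS = [
    [0, 1, 0, 1],  # customer1
    [0, 1, 1, 1],  # customer2
    [1, 0, 1, 0],  # customer3
]


def movies_of(customer, ratings=RATINGS):
    """Column indices of the movies this customer has rated (nonzero entries)."""
    return [m for m, r in enumerate(ratings[customer]) if r != 0]


def customers_of(movie, ratings=RATINGS):
    """Row indices of the customers who have rated this movie."""
    return [c for c, row in enumerate(ratings) if row[movie] != 0]


# Every edge joins a customer to a movie, so a walk must alternate between
# the two node sets: customer -> movie -> customer -> movie ...
for movie in movies_of(0):
    print("customer1 -> movie%d -> customers %s" % (movie + 1, customers_of(movie)))
```

Indices are zero-based here, so `movies_of(0)` lists the movies of customer1.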
79 | 80 | 81 | Bipartite matrix translation to graph: 82 | 83 | 84 | ![](./docs/biparte.png) 85 | 86 | 87 | 88 | 89 | 90 | The actual data set has 9123 movies and 671 customers. So the bipartite graph matrix is of size (671 * 9123), with one row per customer and one column per movie. 91 | 92 | 93 | 94 | 95 | 96 | ## Related work: 97 | In general, recommendation systems are implemented in three ways: 98 | 99 | ### 1. Content based approach: 100 | Content-based filtering methods are based on a description of the item or product and a profile of the user’s preferences. 101 | 102 | ### 2. Collaborative filtering: 103 | Collaborative filtering methods are based on collecting and analyzing a large amount of information on users’ behaviors, activities or preferences and predicting what users will like based on their similarity to other users. A key advantage of the collaborative filtering approach is that it does not rely on machine analyzable content and therefore it is capable of accurately recommending complex items such as movies without requiring an "understanding" of the item itself. 104 | ### 3. A hybrid of collaborative and content based approach: 105 | This approach combines the collaborative and content based approaches to come up with recommendations. 106 | 107 | Our main focus in this project is to implement collaborative filtering as well as content based filtering. 108 | 109 | 110 | ## Recommendation Algorithms: 111 | 112 | ### 1. The content based Filtering: 113 | The idea behind content based filtering is that when a user likes/watches certain movies, we use the meta information of those movies to suggest similar movies that share the same properties. For example, the genre of a movie can be used to suggest other movies that belong to the same genre. We can also couple this with the ratings that the user has given to these movies earlier. 
114 | meta-properties: genre 115 | user-given-properties: rating 116 | 117 | Here is a simple algorithm: 118 | 119 | ```{p} 120 | Algorithm: 121 | Step 1: choose all the movies that the user watched. 122 | Step 2: obtain the genres of all the movies that the user watched 123 | 124 | Step 3: for each genre, sum all the ratings given by the target user 125 | 126 | Step 4: Divide the cumulative rating of each genre by 127 | the number of movies the user watched in that genre. 128 | 129 | Step 5: Now pick the top three genres using the above computation 130 | and recommend movies that belong to those genres. 131 | 132 | ``` 133 | 134 | This may not be the best approach, but it accounts for two cases: the user may like a particular genre and still be looking for a good movie in it, not having found one so far, or the user may in general like movies from certain genres more than others. 135 | 136 | 137 | ### 2. Collaborative Filtering: 138 | Collaborative filtering can be implemented in two ways: user based collaborative filtering and item based collaborative filtering. In this project we focus on user-user collaborative filtering. In user-user collaborative filtering, when we recommend to a user, we try to find other similar users who have watched almost the same movies as the current user. We use similarity metrics such as Euclidean distance, Manhattan distance, Pearson correlation etc. to find such similar users. In this project we implemented Euclidean distance to find similar users. We take the example of the table in the Representation of data section and try to recommend movies for customer 1 in the table. 139 | 140 | #### Euclidian distance Similarity: 141 | Using that table, let’s assume we are trying to recommend movies for customer 1. Our goal is to find similar users. To do this we compute the Euclidean distance from customer 1 to each of the other customers. 
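
A minimal sketch of this distance step on the toy table, using only the standard library (the project code itself calls `np.linalg.norm`). The names `TABLE`, `euclidean` and `neighbours` are illustrative and do not appear in the repository.

```python
import math

# Rows of the toy customer/movie table.
TABLE = {
    "customer1": (0, 1, 0, 1),
    "customer2": (0, 1, 1, 1),
    "customer3": (1, 0, 1, 0),
}


def euclidean(p, q):
    """Square root of the summed squared differences of two rating vectors."""
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))


target = TABLE["customer1"]
distances = {name: euclidean(target, row) for name, row in TABLE.items()}

# Drop users at distance zero (this also drops the target itself),
# then order the remaining users from closest to farthest.
neighbours = sorted((d, name) for name, d in distances.items() if d != 0)
print(neighbours)
```

Sorting the `(distance, name)` pairs puts the most similar user first, so a neighbourhood of size k is simply the first k entries.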
142 | 143 | If p = (p1, p2,..., pn) and q = (q1, q2,..., qn) are two points in Euclidean n-space, the Euclidean distance (d) from p to q, or from q to p, is given by the Pythagorean formula: 144 | 145 | $d(p, q) = \sqrt{\sum_{i=1}^{n} (q_{i} - p_{i})^{2}}$ 146 | 147 | Here the vectors are the rows of the bipartite matrix shown in table 1, i.e. the customer 1 row is vector p and any other customer's row, such as customer 2's, is vector q. The distance between customer 1 and customer 2 is computed as follows: 148 | 149 | customer 1, p = (0,1,0,1) 150 | customer 2, q = (0,1,1,1) 151 | 152 | $\sqrt{(0-0)^{2} + (1-1)^{2} + (0-1)^{2} + (1-1)^{2}}$ = 1 153 | 154 | 155 | As you can see the distance is one. We obtain the distances of customer 1 w.r.t. all other users. We choose the first few users (at least three) who are closest to customer 1. Most importantly, we ignore all users who have a Euclidean distance of zero w.r.t. customer 1. This is because those users have watched the same set of movies as customer 1 and have no other information that would help in recommending to customer 1. 156 | 157 | We compute the distances as follows: 158 | 159 | d | customer1 | customer 2 | customer 3 160 | ------ | ---------- | ----------- | ---------- 161 | distance from customer1 | 0 | 1 | 2 162 | 163 | So customer 3 is not very similar to customer 1. But customer 2 is similar and may have some interesting information that we can use to suggest movies to customer 1. 164 | 165 | 166 | #### graph based recommendation : 167 | 168 | Now that we have the list of users (neighborhood users) similar to the target user (to whom we are recommending), we use the bipartite graph matrix to search for movies that can be recommended to the target user. 169 | 170 | The similarity between customer 1 and customer 2 is obvious because they have both watched "movie2" and "movie4".
As a result, "movie3" is recommended to customer 1 because customer 2 has watched it too. From the distances we know that customer 1 and customer 3 are not very similar. Therefore "movie1", which has been watched by customer 3, will not be recommended to customer 1. 171 | 172 | The above recommendation approach can be easily implemented in a graph-based model by computing the associations between movie nodes and customer nodes. In this context, the association between two nodes is determined by the existence and length of the path(s) connecting them. Standard collaborative filtering approaches, including both the user-based and item-based approaches, consider only paths with length equal to 3. For instance, the 173 | association between customer 1 and movie3 is determined by all paths of length 3 connecting customer 1 and movie3. It is easy to see from Figure 1 that there exist two paths connecting customer1 and movie3: 174 | customer1—movie2—customer2-movie3 175 | and customer1—movie4—customer2—movie3. 176 | 177 | This strong association leads to the recommendation of movie3 to customer1. Intuitively, the higher the number of distinct paths connecting a movie node to a customer node, the higher the association between these two nodes. The movie is therefore more likely to be recommended to the customer. 178 | Extending the above approach to explore and incorporate transitive associations is straightforward in a graph-based model. By considering paths whose length exceeds 3, the model will be able to explore transitive associations. 179 | 180 | 181 | 182 | So we can formalize this as follows: 183 | If there are n paths between (customer i, movie j), then the weight of the pair is the sum of the weights of these paths, computed as follows: 184 | 185 | Algorithm: 186 | 187 | ``` 188 | Take constant alpha in (0,1) 189 | weights = 0 190 | For each path between (customer i and movie j): 191 | compute the depth of the path. 
192 | weights = weights + (alpha)^depth 193 | 194 | ``` 195 | 196 | 197 | $weights(customer1, movie3) = (0.5)^3 + (0.5)^3 = 0.25$, and $weights(customer1, movie1) = 0$ 198 | 199 | The second weight is zero because there is no path of length 3 from customer 1 to movie1. Hence we will recommend movie 3 to customer 1. 200 | 201 | 202 | 203 | ## Experimental Evaluation: 204 | 205 | One way to evaluate the content based and collaborative techniques is to use the similarity metric as follows: 206 | 207 | 1.) Compute the Euclidean distances of all users with respect to the target user (who is to be recommended to). Obtain all the similar users, i.e. users whose distance to the target user is small. 208 | 209 | 2.) Now run the recommendation algorithm and recommend the movie to the target user. Update the movie list of the target user. 210 | 211 | 3.) Now compute the Euclidean distances of all the previous similar users w.r.t. the updated target user, and see how much the cumulative distance has changed. 212 | 213 | 4.) We compute this change in cumulative distance for both collaborative and content based recommendation. The better algorithm is the one whose cumulative distance has decreased the most. 
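
The steps above can be sketched on the toy table from the Representation of data section. This is an illustration only: it assumes that "updating the movie list" means setting the recommended movie's entry to 1, and the helpers `cumulative_distance` and `with_movie` are hypothetical names that do not exist in `algos.py`.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two rating vectors of equal length."""
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))


def cumulative_distance(target, neighbours):
    """Sum of the distances from the target vector to each similar user."""
    return sum(euclidean(target, n) for n in neighbours)


def with_movie(vector, movie_index, rating=1):
    """Copy of the vector with one movie marked as watched (assumed rating)."""
    updated = list(vector)
    updated[movie_index] = rating
    return updated


customer1 = [0, 1, 0, 1]
similar_users = [[0, 1, 1, 1]]  # customer2, the only close neighbour here

before = cumulative_distance(customer1, similar_users)
after_movie3 = cumulative_distance(with_movie(customer1, 2), similar_users)
after_movie1 = cumulative_distance(with_movie(customer1, 0), similar_users)

print(before, after_movie3, after_movie1)
```

In this toy case the cumulative distance falls from 1 to 0 when movie3 is added and rises to about 1.41 when movie1 is added, so the metric favours whichever algorithm recommends movie3.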
214 | 215 | 216 | Let's say the collaborative algorithm recommends "movie3" to customer 1 217 | and the content based algorithm recommends "movie1" to customer 1. 218 | 219 | 220 | So now the updated vectors are: 221 | 222 | 223 | ### For collabertive: 224 | 225 | Target User: 226 | customer 1, p = (0,1,1,1) 227 | Similar Users: 228 | customer 2, q = (0,1,1,1) 229 | 230 | Total distance = $\sqrt{(0-0)^{2} + (1-1)^{2} + (1-1)^{2} + (1-1)^{2}}$ = 0 231 | 232 | ### For content based: 233 | 234 | Target User: 235 | customer 1, p = (1,1,0,1) 236 | Similar Users: 237 | customer 2, q = (0,1,1,1) 238 | 239 | Total distance = $\sqrt{(1-0)^{2} + (1-1)^{2} + (0-1)^{2} + (1-1)^{2}}$ = $\sqrt{2}$ 240 | 241 | It seems that the content based algorithm has not fared well, as it failed to give the most similar recommendation. But this is expected behaviour, since content based filtering does not use the similar users at all. This evaluation is more helpful when comparing one collaborative filtering algorithm with another collaborative filtering algorithm. 242 | 243 | In this program we have produced only one movie recommendation, for the target user customer 1, because the program took a long time to execute even for one target user. With a single recommendation, the Euclidean distances are not good enough to compare the algorithms. 244 | 245 | ``` 246 | The content Based recommendatin is: 247 | 284 318 Shawshank Redemption, The (1994) Crime|Drama 248 | 249 | The collaberative based recommendatoin is : 250 | 15 16 Casino (1995) Crime 251 | ``` 252 | 253 | 254 | 255 | ## Conclusion: 256 | 257 | 1.) There are many ways to implement collaborative as well as content based filtering. What we have implemented in this project is not the best approach, and it can be improved a lot. 258 | 259 | 2.) 
In this experiment we have used the Euclidean distance to get similar users, but there are better approaches to get similar / neighbouring users, such as the Manhattan distance or the Pearson correlation, which is often considered one of the best for the collaborative filtering problem. 260 | 261 | 3.) In the project we have implemented BFS to obtain the paths from customer to movie in the bipartite graph. But as the number of movies increases this might become a problem, as we have to compute more paths. We have to explore other ways, such as greedy algorithms, which are faster and take less computation but may not give an optimal solution. Or we can do iterative deepening search to limit the depth of the graph search. This would be efficient and interesting to try. 262 | 263 | 4.) As for the testing, it would be more realistic to have data about which movies the customer/user chose after the recommendation. This way we could compute precision and recall metrics easily. 264 | 265 | 266 | 267 | 268 | 269 | 270 | 271 | 272 | 273 | 274 | ## References: 275 | 276 | 1.) https://en.wikipedia.org/wiki/Recommender_system 277 | 2.) https://en.wikipedia.org/wiki/Euclidean_distance 278 | 3.) Graph-Based Analysis for E-Commerce Recommendation [http://arizona.openrepository.com/arizona/bitstream/10150/196109/1/azu_etd_1167_sip1_m.pdf] 279 | 4.) Collaborative Filtering using Weighted BiPartite Graph Projection [http://snap.stanford.edu/class/cs224w-2013/projects2013/cs224w-038-final.pdf] 280 | 281 | 5.) Movie Recommendation based on graph traversal Algorithms [http://www2.fiit.stuba.sk/~bielik/publ/abstracts/2013/televido-dexa2013.pdf] 282 | 6.) 
http://grouplens.org/blog/ 283 | 284 | 285 | -------------------------------------------------------------------------------- /README.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/chandan-u/graph-based-recommendation-system/897c6419d922fcbfe7f110522e530f3eeeceaf17/README.pdf -------------------------------------------------------------------------------- /algos.py: -------------------------------------------------------------------------------- 1 | import operator 2 | 3 | from data import load, load_movielens 4 | 5 | import numpy as np 6 | 7 | # load the biparte graph: np matrix 8 | user_movie_matrix = load() 9 | 10 | 11 | 12 | 13 | 14 | def greedy(): 15 | 16 | 17 | """ 18 | 19 | This uses greedy approach for the recommendation 20 | It traverses the graph based on the greedy approach : pick highest rating while traversing from node to node 21 | """ 22 | 23 | 24 | 25 | for row in user_movie_matrix: 26 | # compute for each user 27 | 28 | row = list(row) 29 | maxrating = max(row) 30 | print user_movie_matrix[user_movie_matrix[row] == maxrating] 31 | 32 | pass 33 | 34 | 35 | 36 | 37 | 38 | def bfs_paths(graph, start, goal): 39 | flag = "product" 40 | queue = [(start, [start])] 41 | nonzero_indices = [] 42 | while queue: 43 | (vertex, path) = queue.pop(0) 44 | if flag == "product": 45 | # the children nodes are the vertical of biparte matrix(nonzero) 46 | column = graph[:, [vertex]] 47 | nonzero_indices = column.nonzero() 48 | nonzero_indices = nonzero_indices[0] 49 | for child_index in nonzero_indices: 50 | row = graph[child_index] 51 | nonzero_row_indices = row.nonzero()[0] 52 | 53 | for row_index in nonzero_row_indices: 54 | if goal == row_index: 55 | yield path + [child_index] 56 | else: 57 | queue.append((row_index, path + [row_index])) 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 | def graph_search(biparte_matrix): 66 | 67 | """ 68 | the first row of the biparte matrix is : targetUser 69 | 70 | the 
second to last row is the closest neighbors 71 | 72 | Now using graph search we have to find recommend a movie 73 | 74 | Since its a biparte graph: 75 | 1. First move from customer to movie 76 | 2. move from movie to customer 77 | 78 | 79 | """ 80 | 81 | 82 | 83 | # the user who is being recommeneded 84 | target_vector = biparte_matrix[0] 85 | 86 | # get indices of all /some of the indices of the watched movies of the target user 87 | 88 | #bfs_roots = [i for i, e in enumerate(target_vector) if e != 0] 89 | 90 | bfs_all_roots = np.argpartition(target_vector, -10 )[-10:] 91 | bfs_roots = [] 92 | 93 | for root in bfs_all_roots: 94 | if target_vector[root] != 0 : 95 | bfs_roots.append(root) 96 | 97 | 98 | path =[] 99 | data = {} 100 | for item_index in range(biparte_matrix.shape[0]): 101 | 102 | if target_vector[item_index] == 0: 103 | 104 | for root in bfs_roots: 105 | path.append( bfs_paths(biparte_matrix, root, item_index )) 106 | data[item_index] = path 107 | 108 | 109 | return data 110 | def user_base_collabertive_filtering(): 111 | 112 | 113 | # find euclidian distance of first two users w.r.t all users 114 | # note: distance betwee two same vecotrs is zero 115 | 116 | for user in range(2): 117 | distances = [] 118 | for anotheruser in range(user_movie_matrix.shape[0]): 119 | 120 | distance = np.linalg.norm(user_movie_matrix[user] - user_movie_matrix[anotheruser] ) 121 | distances.append(distance) 122 | 123 | 124 | # get the similar users 125 | # indices of the similar users: closest 20 126 | # non zero too 127 | closest_all_indices=np.argpartition(distances, -20)[-20:] 128 | closest_indices = [] 129 | for index in closest_all_indices: 130 | if distances[index] != 0: 131 | closest_indices.append(index) 132 | 133 | 134 | 135 | 136 | # consider the first five closest neighbors for the recommendation 137 | 138 | closest_indices.insert(0, user) 139 | biparte_matrix = user_movie_matrix[closest_indices[0:4]] 140 | # now execute the graph search for recommendation 141 
| 142 | paths = graph_search(biparte_matrix) 143 | 144 | 145 | 146 | 147 | # compute the weights of paths 148 | data= {} 149 | for item in paths.keys(): 150 | print item 151 | weight = 0 152 | allpaths = paths[item] 153 | for path in allpaths: 154 | depth = len(path) 155 | weight = weight + (0.5)**depth 156 | data[item] = weight 157 | 158 | # find the which movie has great weight: 159 | fav_movie = max(data.iteritems(), key=operator.itemgetter(1))[0] 160 | 161 | 162 | def get_movie_avg_rating(id, ratings): 163 | """ 164 | return avg rating of the movie 165 | 166 | """ 167 | 168 | 169 | #df = ratings.movieId == id 170 | 171 | rating = 0 172 | for index, row in ratings.iterrows(): 173 | if row["movieId"] == id: 174 | rating = rating + row["rating"] 175 | return rating 176 | 177 | 178 | 179 | def get_user_movie_rating( id, ratings, target_user): 180 | """ 181 | return user rating of the movie 182 | 183 | """ 184 | 185 | 186 | #df = ratings.movieId == id 187 | 188 | rating = 0 189 | df = ratings[ratings.userId == 1] 190 | for index, row in df.iterrows(): 191 | if row["movieId"] == id: 192 | rating = rating + row["rating"] 193 | return rating 194 | 195 | 196 | 197 | 198 | 199 | def content_based_filtering(): 200 | """ 201 | For content based filtering we wont directly use the biparte graph. We will be directly querying the file that is loaded 202 | using the pandas dataframe. 203 | 204 | The idea behind content based filtering is : when a user likes certain movies, using the meta informaton of the movies that the user watched we 205 | will suggest similar movies which may have the same properties. 
206 | 207 | 208 | meta-properties: genre 209 | user-given-properties: rating 210 | Algorithm: 211 | choose all the movies that the user watched 212 | obtain genre of all the movies that user watched 213 | 214 | average the user's ratings for each genre 215 | Now pick the top genre and suggest the best rated movie from it 216 | 217 | 218 | """ 219 | 220 | 221 | 222 | movies, ratings = load_movielens() 223 | 224 | 225 | # considering customer 1 226 | target_user = 1 227 | 228 | movie_ids = [] 229 | # list of movies watched by user 230 | for index, row in ratings.iterrows(): 231 | if row["userId"] == target_user : 232 | movie_ids.append(row["movieId"]) 233 | 234 | 235 | # compute the average of the ratings for each genre watched by the user 236 | genre_dict = {} 237 | genre_count_dict = {} 238 | genre_ratio = {} 239 | for id in movie_ids: 240 | df = movies[movies.movieId == id] 241 | genres = [] 242 | for index, row in df.iterrows(): 243 | genres = row["genres"] 244 | genres = genres.lower() 245 | genres = genres.split('|') 246 | 247 | rating = get_user_movie_rating(id, ratings, target_user=target_user ) 248 | 249 | for genre in genres: 250 | if genre in genre_dict.keys(): 251 | genre_dict[genre] = genre_dict[genre] + rating 252 | genre_count_dict[genre] = genre_count_dict[genre] + 1 253 | else: 254 | genre_dict[genre] = rating 255 | genre_count_dict[genre] = 1 256 | 257 | for key in genre_dict.keys(): 258 | ratio = genre_dict[key] / float(genre_count_dict[key]) 259 | genre_dict[key] = ratio 260 | 261 | fav_genre = max(genre_dict.iteritems(), key=operator.itemgetter(1))[0] 262 | 263 | # get the best movies from that genre: 264 | 265 | rating = 0 # best movie rating seen so far 266 | fav_movie_id = 0 267 | for index, row in movies.iterrows(): 268 | genres = row['genres'] 269 | genres = genres.lower() 270 | genres = genres.split('|') 271 | if fav_genre in genres: 272 | movie_rating = get_movie_avg_rating(row["movieId"], ratings ) 273 | if movie_rating > rating: 274 | fav_movie_id = 
row["movieId"] 275 | rating = movie_rating 276 | 277 | print "content based recommended movie is:" 278 | print movies[movies.movieId == fav_movie_id] 279 | 280 | 281 | return fav_movie_id 282 | 283 | 284 | def evaluation(): 285 | 286 | """ 287 | lets only compute for one target user: As it is taking too long 288 | """ 289 | 290 | fav_mov_id_con = content_based_filtering() 291 | 292 | fav_mov_id_col = user_base_collabertive_filtering() 293 | 294 | 295 | for user in range(2): 296 | distances = [] 297 | for anotheruser in range(user_movie_matrix.shape[0]): 298 | 299 | distance = np.linalg.norm(user_movie_matrix[user] - user_movie_matrix[anotheruser] ) 300 | distances.append(distance) 301 | 302 | 303 | # get the similar users 304 | # indices of the similar users: the 20 smallest distances 305 | # non zero too 306 | closest_all_indices=np.argpartition(distances, 20)[:20] 307 | closest_indices = [] 308 | for index in closest_all_indices: 309 | if distances[index] != 0: 310 | closest_indices.append(index) 311 | 312 | 313 | 314 | 315 | # keep the target user plus three of the closest neighbors 316 | 317 | closest_indices.insert(0, user) 318 | biparte_matrix = user_movie_matrix[closest_indices[0:4]] 319 | 320 | # NOTE: both target vectors are still the original row; the recommended movies are not added yet 321 | target_col_vector =user_movie_matrix[0] 322 | target_con_vector = user_movie_matrix[0] 323 | 324 | score_col = 0 325 | score_con = 0 326 | for row in biparte_matrix: 327 | score_col = score_col + np.linalg.norm(row - target_col_vector) 328 | score_con = score_con + np.linalg.norm(row - target_con_vector) 329 | 330 | 331 | print "The collaborative filtering sum of distances is: ", score_col 332 | print "The content based filtering sum of distances is: ", score_con 333 | 334 | 335 | 336 | evaluation() 337 | -------------------------------------------------------------------------------- /data.py: -------------------------------------------------------------------------------- 1 | 2 | import pandas as pd 3 | import numpy as np 4 | 5 | def load_movielens(): 6 | 7 | """ 8 | 
load the three csv files: 9 | 1. movies.csv: movieId,title,genres 10 | 2. ratings.csv: userId,movieId,rating,timestamp 11 | 3. tags.csv: userId,movieId,tag,timestamp ( This is not needed for now) 12 | 13 | """ 14 | #movies_csv = np.genfromtxt('./data/movies.csv', delimeter=',') 15 | 16 | #ratings_csv = np.genfromtxt('./data/ratings.csv',delimeter=',') 17 | 18 | #tags_csv = np.genfromtxt('./data/tags.csv', delimeter=',') 19 | 20 | movies = pd.read_csv('./data/movies.csv', sep=',') 21 | 22 | ratings = pd.read_csv('./data/ratings.csv', sep=',') 23 | 24 | 25 | 26 | return movies, ratings 27 | 28 | 29 | def biparteMatrix(movies_frame, ratings_frame): 30 | 31 | """ 32 | 33 | convert the movies data frame into userid-movies biparte adjacency graph matrix 34 | 35 | """ 36 | 37 | 38 | user_ids = list(ratings_frame.userId.unique()) 39 | movie_ids = list(movies_frame.movieId.unique()) 40 | 41 | numberOfUsers = len(user_ids) 42 | 43 | numberOfMovies = len(movie_ids) 44 | 45 | 46 | 47 | # initialize a numpy matrix of of numberOfUsers * numberOfMovies 48 | 49 | user_movie_biparte = np.zeros((numberOfUsers, numberOfMovies)) 50 | 51 | 52 | for name, group in ratings_frame.groupby(["userId", "movieId"]): 53 | 54 | #print name 55 | #print group 56 | 57 | # name is a tuple (userId, movieId) 58 | 59 | userId, movieId = name 60 | 61 | user_index = user_ids.index(userId) 62 | movie_index = movie_ids.index(movieId) 63 | user_movie_biparte[user_index, movie_index] = group[["rating"]].values[0,0] 64 | 65 | return user_movie_biparte 66 | 67 | 68 | def load(): 69 | 70 | 71 | """ 72 | convert the csv flies into required dataformats 73 | 74 | ratings_csv : convert into user-moveID biparte sparse adjacency graph matrix 75 | 76 | tags_csv: not requried currently 77 | 78 | movies_csv: convert into clusters of data 79 | """ 80 | 81 | # load csv into dataframes 82 | movies, ratings = load_movielens() 83 | 84 | 85 | #convet the ratings datafrom into user-movieId biparte adjacency matrix 86 | matrix 
= biparteMatrix(movies, ratings) 87 | 88 | return matrix 89 | 90 | 91 | 92 | 93 | 94 | 95 | 96 | load() 97 | -------------------------------------------------------------------------------- /data/README.txt: -------------------------------------------------------------------------------- 1 | Summary 2 | ======= 3 | 4 | This dataset (ml-latest-small) describes 5-star rating and free-text tagging activity from [MovieLens](http://movielens.org), a movie recommendation service. It contains 100004 ratings and 1296 tag applications across 9125 movies. These data were created by 671 users between January 09, 1995 and October 16, 2016. This dataset was generated on October 17, 2016. 5 | 6 | Users were selected at random for inclusion. All selected users had rated at least 20 movies. No demographic information is included. Each user is represented by an id, and no other information is provided. 7 | 8 | The data are contained in the files `links.csv`, `movies.csv`, `ratings.csv` and `tags.csv`. More details about the contents and use of all these files follows. 9 | 10 | This is a *development* dataset. As such, it may change over time and is not an appropriate dataset for shared research results. See available *benchmark* datasets if that is your intent. 11 | 12 | This and other GroupLens data sets are publicly available for download at . 13 | 14 | 15 | Usage License 16 | ============= 17 | 18 | Neither the University of Minnesota nor any of the researchers involved can guarantee the correctness of the data, its suitability for any particular purpose, or the validity of results based on the use of the data set. The data set may be used for any research purposes under the following conditions: 19 | 20 | * The user may not state or imply any endorsement from the University of Minnesota or the GroupLens Research Group. 21 | * The user must acknowledge the use of the data set in publications resulting from the use of the data set (see below for citation information). 
22 | * The user may redistribute the data set, including transformations, so long as it is distributed under these same license conditions. 23 | * The user may not use this information for any commercial or revenue-bearing purposes without first obtaining permission from a faculty member of the GroupLens Research Project at the University of Minnesota. 24 | * The executable software scripts are provided "as is" without warranty of any kind, either expressed or implied, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. The entire risk as to the quality and performance of them is with you. Should the program prove defective, you assume the cost of all necessary servicing, repair or correction. 25 | 26 | In no event shall the University of Minnesota, its affiliates or employees be liable to you for any damages arising out of the use or inability to use these programs (including but not limited to loss of data or data being rendered inaccurate). 27 | 28 | If you have any further questions or comments, please email 29 | 30 | 31 | Citation 32 | ======== 33 | 34 | To acknowledge use of the dataset in publications, please cite the following paper: 35 | 36 | > F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4, Article 19 (December 2015), 19 pages. DOI= 37 | 38 | 39 | Further Information About GroupLens 40 | =================================== 41 | 42 | GroupLens is a research group in the Department of Computer Science and Engineering at the University of Minnesota. 
Since its inception in 1992, GroupLens's research projects have explored a variety of fields including: 43 | 44 | * recommender systems 45 | * online communities 46 | * mobile and ubiquitous technologies 47 | * digital libraries 48 | * local geographic information systems 49 | 50 | GroupLens Research operates a movie recommender based on collaborative filtering, MovieLens, which is the source of these data. We encourage you to visit to try it out! If you have exciting ideas for experimental work to conduct on MovieLens, send us an email at - we are always interested in working with external collaborators. 51 | 52 | 53 | Content and Use of Files 54 | ======================== 55 | 56 | Formatting and Encoding 57 | ----------------------- 58 | 59 | The dataset files are written as [comma-separated values](http://en.wikipedia.org/wiki/Comma-separated_values) files with a single header row. Columns that contain commas (`,`) are escaped using double-quotes (`"`). These files are encoded as UTF-8. If accented characters in movie titles or tag values (e.g. Misérables, Les (1995)) display incorrectly, make sure that any program reading the data, such as a text editor, terminal, or script, is configured for UTF-8. 60 | 61 | User Ids 62 | -------- 63 | 64 | MovieLens users were selected at random for inclusion. Their ids have been anonymized. User ids are consistent between `ratings.csv` and `tags.csv` (i.e., the same id refers to the same user across the two files). 65 | 66 | Movie Ids 67 | --------- 68 | 69 | Only movies with at least one rating or tag are included in the dataset. These movie ids are consistent with those used on the MovieLens web site (e.g., id `1` corresponds to the URL ). Movie ids are consistent between `ratings.csv`, `tags.csv`, `movies.csv`, and `links.csv` (i.e., the same id refers to the same movie across these four data files).
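
The formatting rules above (single header row, double-quoted fields for values containing commas, UTF-8 encoding) can be sketched with Python's standard `csv` module. This is a minimal illustration, not the loader used in `data.py`; the helper name `read_csv_rows`, the sample row, and the commented file path are assumptions made for the example.

```python
import csv
import io


def read_csv_rows(stream):
    """Return a list of dicts, one per data row, keyed by the header columns."""
    # csv.DictReader consumes the single header row and undoes the
    # double-quote escaping used for fields that contain commas.
    return list(csv.DictReader(stream))


# Illustrative in-memory sample in the movies.csv layout. The id and genre
# values are placeholders, not rows copied from the dataset.
SAMPLE = 'movieId,title,genres\n999001,"Misérables, Les (1995)",Drama\n'

rows = read_csv_rows(io.StringIO(SAMPLE))
print(rows[0]["title"])

# For a file on disk (the path is an assumption about the checkout layout),
# pass the encoding explicitly so accented titles decode correctly:
# with open("data/movies.csv", encoding="utf-8", newline="") as f:
#     rows = read_csv_rows(f)
```

Opening the file with an explicit `encoding="utf-8"` avoids depending on the platform default, which is the usual cause of the garbled accented characters the README warns about.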
70 | 71 | 72 | Ratings Data File Structure (ratings.csv) 73 | ----------------------------------------- 74 | 75 | All ratings are contained in the file `ratings.csv`. Each line of this file after the header row represents one rating of one movie by one user, and has the following format: 76 | 77 | userId,movieId,rating,timestamp 78 | 79 | The lines within this file are ordered first by userId, then, within user, by movieId. 80 | 81 | Ratings are made on a 5-star scale, with half-star increments (0.5 stars - 5.0 stars). 82 | 83 | Timestamps represent seconds since midnight Coordinated Universal Time (UTC) of January 1, 1970. 84 | 85 | Tags Data File Structure (tags.csv) 86 | ----------------------------------- 87 | 88 | All tags are contained in the file `tags.csv`. Each line of this file after the header row represents one tag applied to one movie by one user, and has the following format: 89 | 90 | userId,movieId,tag,timestamp 91 | 92 | The lines within this file are ordered first by userId, then, within user, by movieId. 93 | 94 | Tags are user-generated metadata about movies. Each tag is typically a single word or short phrase. The meaning, value, and purpose of a particular tag is determined by each user. 95 | 96 | Timestamps represent seconds since midnight Coordinated Universal Time (UTC) of January 1, 1970. 97 | 98 | Movies Data File Structure (movies.csv) 99 | --------------------------------------- 100 | 101 | Movie information is contained in the file `movies.csv`. Each line of this file after the header row represents one movie, and has the following format: 102 | 103 | movieId,title,genres 104 | 105 | Movie titles are entered manually or imported from , and include the year of release in parentheses. Errors and inconsistencies may exist in these titles. 
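
As a rough sketch of how the `userId,movieId,rating,timestamp` layout described above might be parsed, the snippet below converts each row to typed values and turns the Unix timestamp into a timezone-aware UTC datetime. The function name `parse_ratings` and the sample rows are hypothetical; the range check simply restates the 0.5 to 5.0 half-star scale from the README.

```python
import csv
import io
from datetime import datetime, timezone


def parse_ratings(stream):
    """Return a list of (user_id, movie_id, rating, rated_at) tuples."""
    parsed = []
    for row in csv.DictReader(stream):
        rating = float(row["rating"])
        # The README states a 0.5 to 5.0 scale in half-star increments.
        if not (0.5 <= rating <= 5.0 and (rating * 2).is_integer()):
            raise ValueError(f"unexpected rating value: {row['rating']}")
        # Timestamps are seconds since 1970-01-01 00:00 UTC.
        rated_at = datetime.fromtimestamp(int(row["timestamp"]), tz=timezone.utc)
        parsed.append((int(row["userId"]), int(row["movieId"]), rating, rated_at))
    return parsed


# Hypothetical rows in the ratings.csv layout, not taken from the dataset.
SAMPLE = "userId,movieId,rating,timestamp\n1,10,4.5,0\n1,20,3.0,86400\n"
ratings = parse_ratings(io.StringIO(SAMPLE))
```

The same pattern should carry over to `tags.csv` by replacing the `rating` column with the free-text `tag` column and dropping the range check.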
106 | 107 | Genres are a pipe-separated list, and are selected from the following: 108 | 109 | * Action 110 | * Adventure 111 | * Animation 112 | * Children's 113 | * Comedy 114 | * Crime 115 | * Documentary 116 | * Drama 117 | * Fantasy 118 | * Film-Noir 119 | * Horror 120 | * Musical 121 | * Mystery 122 | * Romance 123 | * Sci-Fi 124 | * Thriller 125 | * War 126 | * Western 127 | * (no genres listed) 128 | 129 | Links Data File Structure (links.csv) 130 | --------------------------------------- 131 | 132 | Identifiers that can be used to link to other sources of movie data are contained in the file `links.csv`. Each line of this file after the header row represents one movie, and has the following format: 133 | 134 | movieId,imdbId,tmdbId 135 | 136 | movieId is an identifier for movies used by . E.g., the movie Toy Story has the link . 137 | 138 | imdbId is an identifier for movies used by . E.g., the movie Toy Story has the link . 139 | 140 | tmdbId is an identifier for movies used by . E.g., the movie Toy Story has the link . 141 | 142 | Use of the resources listed above is subject to the terms of each provider. 143 | 144 | Cross-Validation 145 | ---------------- 146 | 147 | Prior versions of the MovieLens dataset included either pre-computed cross-folds or scripts to perform this computation. We no longer bundle either of these features with the dataset, since most modern toolkits provide this as a built-in feature. If you wish to learn about standard approaches to cross-fold computation in the context of recommender systems evaluation, see [LensKit](http://lenskit.org) for tools, documentation, and open-source code examples. 
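
Since the dataset no longer ships pre-computed folds, a k-fold split has to be produced by the consumer. The following is a minimal sketch of one plain random k-fold split in pure Python; it is not LensKit's procedure, and the function name `k_fold_splits`, the seed, and the toy rating pairs are assumptions for illustration only.

```python
import random


def k_fold_splits(items, k=5, seed=0):
    """Yield (train, test) lists for k folds over a shuffled copy of items."""
    if not 2 <= k <= len(items):
        raise ValueError("k must be between 2 and len(items)")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Fold i takes every k-th element starting at offset i, so fold sizes
    # differ by at most one.
    folds = [shuffled[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# 20 hypothetical (userId, movieId) pairs standing in for rating rows.
ratings = [(user, movie) for user in range(1, 5) for movie in range(1, 6)]
splits = list(k_fold_splits(ratings, k=5))
```

This splits individual ratings at random. Recommender evaluations often split per user instead, so that every user appears in both the training and the test portion; that is a separate design choice this sketch does not make.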
148 | -------------------------------------------------------------------------------- /data/ml-latest-small.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/chandan-u/graph-based-recommendation-system/897c6419d922fcbfe7f110522e530f3eeeceaf17/data/ml-latest-small.zip -------------------------------------------------------------------------------- /data/tags.csv: -------------------------------------------------------------------------------- 1 | userId,movieId,tag,timestamp 2 | 15,339,sandra 'boring' bullock,1138537770 3 | 15,1955,dentist,1193435061 4 | 15,7478,Cambodia,1170560997 5 | 15,32892,Russian,1170626366 6 | 15,34162,forgettable,1141391765 7 | 15,35957,short,1141391873 8 | 15,37729,dull story,1141391806 9 | 15,45950,powerpoint,1169616291 10 | 15,100365,activist,1425876220 11 | 15,100365,documentary,1425876220 12 | 15,100365,uganda,1425876220 13 | 23,150,Ron Howard,1148672905 14 | 68,2174,music,1249808064 15 | 68,2174,weird,1249808102 16 | 68,8623,Steve Martin,1249808497 17 | 73,107999,action,1430799184 18 | 73,107999,anime,1430799184 19 | 73,107999,kung fu,1430799184 20 | 73,111624,drama,1431584497 21 | 73,111624,indie,1431584497 22 | 73,111624,love,1431584497 23 | 73,130682,b movie,1432523704 24 | 73,130682,comedt,1432523704 25 | 73,130682,horror,1432523704 26 | 77,1199,Trilogy of the Imagination,1163220043 27 | 77,2968,Gilliam,1163220138 28 | 77,2968,Trilogy of the Imagination,1163220039 29 | 77,4467,Trilogy of the Imagination,1163220065 30 | 77,4911,Gilliam,1163220167 31 | 77,5909,Takashi Miike,1163219591 32 | 77,47465,Gilliam,1163220186 33 | 84,296,intense,1429911417 34 | 84,296,r:violence,1429911417 35 | 84,296,tarantino,1429911417 36 | 91,4388,parody,1448813502 37 | 94,1131,emotional,1291781542 38 | 94,1131,tragedy,1291781538 39 | 94,64957,original plot,1291781246 40 | 94,74458,Predictable,1291780920 41 | 106,48711,CHRISTIAN,1215923364 42 | 132,4189,jesus,1367909949 43 | 
132,4612,jesus,1367909949 44 | 132,6683,bollywood,1367909913 45 | 132,6986,jesus,1367909949 46 | 132,27255,No progress,1283581045 47 | 132,27255,Too slow,1283581045 48 | 132,27255,Views,1283581045 49 | 138,260,cult classic,1440379022 50 | 138,260,Science Fiction,1440379018 51 | 138,1258,cult film,1440380361 52 | 138,1258,jack nicholson,1440380355 53 | 138,1258,psychological,1440380357 54 | 138,1258,Stanley Kubrick,1440380352 55 | 138,1704,genius,1440380467 56 | 138,1704,intellectual,1440380463 57 | 138,1704,mathematics,1440380466 58 | 138,1704,psychology,1440380470 59 | 138,4226,Mindfuck,1440380125 60 | 138,4226,nonlinear,1440380113 61 | 138,4226,psychology,1440380115 62 | 138,4226,twist ending,1440380118 63 | 138,4995,genius,1440380440 64 | 138,4995,intelligent,1440380446 65 | 138,4995,math,1440380438 66 | 138,4995,mathematics,1440380436 67 | 138,4995,twist ending,1440380448 68 | 138,48780,Christopher Nolan,1440380053 69 | 138,48780,complicated,1440380062 70 | 138,48780,Hugh Jackman,1440380072 71 | 138,48780,nonlinear,1440380067 72 | 138,48780,psychological,1440380064 73 | 138,48780,twist ending,1440380047 74 | 138,79132,alternate reality,1440380233 75 | 138,79132,Christopher Nolan,1440380255 76 | 138,79132,intellectual,1440380251 77 | 138,79132,mindfuck,1440380252 78 | 138,79132,philosophy,1440380242 79 | 138,79132,sci-fi,1440380239 80 | 138,79132,twist ending,1440380245 81 | 138,109487,artificial intelligence,1440379984 82 | 138,109487,Christopher Nolan,1440380001 83 | 138,109487,good science,1440379977 84 | 138,109487,interesting ideea,1440379991 85 | 138,109487,philosophical issues,1440379996 86 | 138,109487,physics,1440379925 87 | 138,109487,relativity,1440380500 88 | 138,109487,sci-fi,1440380008 89 | 138,109487,sentimental,1440379993 90 | 138,109487,space,1440379974 91 | 138,109487,time travel,1440379981 92 | 138,109487,time-travel,1440379968 93 | 149,121231,eerie,1436920551 94 | 149,121231,friendship,1436920551 95 | 149,121231,teenagers,1436920551 96 | 
152,52319,World War II,1335900622 97 | 164,608,Quirky,1179031045 98 | 176,5445,Tom Cruise,1378384753 99 | 176,7458,Brad Pitt,1341055445 100 | 176,8972,Nicolas Cage,1341055378 101 | 176,69640,Johnny Depp,1340916254 102 | 176,104841,sexist,1384107184 103 | 179,260,nerdy,1436669973 104 | 179,260,Science Fiction,1436669968 105 | 187,2082,Emilio Estevez,1233517175 106 | 200,260,critically acclaimed,1437932756 107 | 200,260,Science Fiction,1437932763 108 | 200,80551,strong female presence,1438023607 109 | 212,1061,emotional,1253930128 110 | 212,1061,revenge,1253930131 111 | 212,1061,true story,1253930149 112 | 212,1732,cult film,1253929373 113 | 212,1732,dark comedy,1253929372 114 | 212,1732,satirical,1253929376 115 | 212,2359,British,1253930034 116 | 212,2641,childish plot,1253930390 117 | 212,2641,plot holes,1253930395 118 | 212,2641,superhero,1253930399 119 | 212,3257,Kevin Costner,1253930539 120 | 212,3257,snorefest,1253930543 121 | 212,3275,vigilantism,1218405594 122 | 212,3300,anti-hero,1253929917 123 | 212,3300,Cole Hauser,1253929922 124 | 212,3300,Vin Diesel,1253929910 125 | 212,3301,Amanda Peet,1253929798 126 | 212,3301,Matthew Perry,1253929802 127 | 212,3301,Rosanna Arquette,1253929813 128 | 212,3809,Bill Murray,1253931205 129 | 212,3809,quirky,1253931216 130 | 212,3882,cheerleading,1253932873 131 | 212,3882,Eliza Dushku,1253932876 132 | 212,3882,Overrated,1253932878 133 | 212,4069,standard romantic comedy,1253933882 134 | 212,5219,video game adaptation,1253930587 135 | 212,5507,play enough video games and you can become an NSA agent,1253930345 136 | 212,5507,ridiculous training sequence,1253930342 137 | 212,7373,Guillermo del Toro,1253929290 138 | 212,7373,steampunk,1253929296 139 | 212,7373,superhero,1253929293 140 | 212,8861,Oded Fehr,1253930624 141 | 212,8861,post-apocalyptic,1253930636 142 | 212,8861,Sienna Guillory,1253930633 143 | 212,8861,zombies,1253930644 144 | 212,8928,Beautiful Woman,1253926735 145 | 212,8928,campy,1253926723 146 | 
212,8928,irreverent,1253926726 147 | 212,8928,quirky,1253926728 148 | 212,8928,Roman Polanski,1253926730 149 | 212,27904,good animation,1253931241 150 | 212,27904,interesting concept - bad execution,1253931234 151 | 212,27904,surrealism,1253931244 152 | 212,34405,aliens,1253928597 153 | 212,34405,black comedy,1253928590 154 | 212,34405,Firefly,1253928585 155 | 212,37733,disappointing,1253929514 156 | 212,37733,overrated,1253929532 157 | 212,37733,Viggo Mortensen,1253929518 158 | 212,48394,dark,1218405627 159 | 212,48394,fairytale,1218405627 160 | 212,60684,alternate reality,1253926526 161 | 212,60684,diluted version of comic,1253926511 162 | 212,60684,dystopia,1253926517 163 | 212,60684,stylized,1253926519 164 | 212,63992,high school,1253932469 165 | 212,63992,Kristen Stewart,1253932484 166 | 212,63992,Robert Pattinson,1253932481 167 | 212,63992,Teen movie,1253932459 168 | 212,63992,Vampire Human Love,1253932463 169 | 212,64957,adapted from:book,1253929184 170 | 212,64957,Aging Disorder,1253929189 171 | 212,64957,Brad Pitt,1253929141 172 | 212,64957,cinematography,1253929177 173 | 212,64957,drama,1253929173 174 | 212,64957,original plot,1253929180 175 | 212,64957,slow parts,1253929162 176 | 212,64957,touching,1253929169 177 | 212,66097,alternate universe,1253926354 178 | 212,66097,author:Neil Gaiman,1253926328 179 | 212,66097,claymation,1253926330 180 | 212,66097,Dakota Fanning,1253926349 181 | 212,66097,dark,1253926337 182 | 212,66097,fairy tale,1253926364 183 | 212,66097,Neil Gaiman,1253926360 184 | 212,66934,Captain Hammer,1253926165 185 | 212,66934,joss whedon,1253926158 186 | 212,66934,mad scientist,1253926186 187 | 212,66934,musical,1253926178 188 | 212,66934,Nathan Fillion,1253926182 189 | 212,66934,Neil Patrick Harris,1253926160 190 | 212,66934,parody,1253926168 191 | 212,68157,Brad Pitt,1253926443 192 | 212,68157,ending,1253926451 193 | 212,68157,gratuitous violence,1253926429 194 | 212,68157,Quentin Tarantino,1253926404 195 | 212,68157,satire,1253926435 
196 | 212,68157,unusual plot structure,1253926417 197 | 212,68157,violence,1253926426 198 | 212,68157,World War II,1253926408 199 | 212,68319,action,1253928934 200 | 212,68319,bad plot,1253928950 201 | 212,68319,hugh jackman,1253928926 202 | 212,68319,superhero,1253928935 203 | 212,68319,too long,1253928954 204 | 212,69526,bad plot,1253929636 205 | 212,69526,robots,1253929657 206 | 212,69526,sufficiently explodey to be good,1253929639 207 | 212,69712,Abigail Breslin,1260688098 208 | 212,69712,ending,1260688053 209 | 212,69712,genuine characters,1260688086 210 | 212,69712,Jason Patric,1260688105 211 | 212,71057,animation style,1253926100 212 | 219,46970,funny,1156788918 213 | 219,46970,nascar,1156788904 214 | 219,46970,will farell,1156788913 215 | 219,47640,beer,1156788764 216 | 219,47640,guy movie,1156788774 217 | 255,6985,breathtaking,1237060605 218 | 255,7361,i'd like to live in this movie,1237060140 219 | 255,44555,breathtaking,1237060235 220 | 257,3030,akira kurosawa,1338084833 221 | 257,3030,black comedy,1338084822 222 | 277,5618,anime,1463916275 223 | 277,5618,children,1463916304 224 | 277,5618,fantasy,1463916289 225 | 277,5971,fantasy,1463916386 226 | 277,5971,feel-good,1463916401 227 | 277,82461,bad acting,1463916776 228 | 277,82461,bad plot,1463916772 229 | 277,104283,anime,1463916484 230 | 277,132046,sci-fi,1463916817 231 | 294,6754,vampire,1138983469 232 | 294,8865,1940's feel,1140395930 233 | 294,8865,unique look,1140395930 234 | 294,36401,fairy tales,1138983064 235 | 297,104,adam sandler,1318706504 236 | 297,608,overrated,1318706739 237 | 297,1103,overrated,1318706182 238 | 297,2502,that fat nerd is just annoying,1318706118 239 | 297,8528,seen it before,1318706449 240 | 297,47640,Just stupid,1318706536 241 | 297,52973,overrated,1318706470 242 | 297,55820,overrated,1318706705 243 | 297,61024,Exellent first half,1318706053 244 | 297,61024,terrible ending,1318706042 245 | 297,76251,Gore!!!,1318706275 246 | 297,76251,sort of honest,1318706296 247 | 
297,78209,Lame everything,1318706227 248 | 313,153,Ei muista,1168879424 249 | 313,293,Ei muista,1168878292 250 | 313,434,Ei muista,1168879406 251 | 313,494,Ei muista,1168878844 252 | 313,593,Katso Sanna!,1146476938 253 | 313,745,Ei muista,1146476364 254 | 313,780,Ei muista,1168878578 255 | 313,858,Katso Sanna!,1146476794 256 | 313,1036,Ei muista,1168878388 257 | 313,1042,Katso Sanna!,1146477133 258 | 313,1120,Katso Sanna!,1146477118 259 | 313,1200,Katso Sanna!,1146477102 260 | 313,1291,Ei muista,1146476308 261 | 313,1610,Katso Sanna!,1146476692 262 | 313,1617,Ei muista,1168878593 263 | 313,2273,Ei muista,1146477049 264 | 313,2329,Ei muista,1146477033 265 | 313,2600,Ei muista,1168879003 266 | 313,2985,Ei muista,1168878743 267 | 313,2989,Ei muista,1168878972 268 | 313,3160,Ei muista,1168878736 269 | 313,3175,Ei muista,1168879647 270 | 313,3499,Ei muista,1168878727 271 | 313,3793,Katso Sanna!,1146476954 272 | 313,3916,Ei muista,1146476265 273 | 313,3949,Ei muista,1168878697 274 | 313,3994,Ei muista,1168879611 275 | 313,4011,Ei muista,1168878363 276 | 313,4023,ei muista,1146477341 277 | 313,4121,Katso Sanna!,1146477233 278 | 313,4223,Ei muista,1146476274 279 | 313,4638,Ei muista,1146477217 280 | 313,5064,Ei muista,1146476225 281 | 313,5602,Ei muista,1168879553 282 | 313,6218,Ei muista,1146477184 283 | 313,7347,Ei muista,1168879177 284 | 313,8781,Ei muista,1168879161 285 | 313,44199,Ei muista,1168878406 286 | 314,260,awesome,1437338154 287 | 314,260,awesome soundtrack,1437338150 288 | 314,260,jedi,1437338139 289 | 314,260,space adventure,1437338130 290 | 346,741,anime,1159734531 291 | 346,924,boring,1159734220 292 | 346,924,slow,1159734220 293 | 346,1036,explosions,1159734488 294 | 346,1199,dystopia,1159734421 295 | 346,1274,anime,1159734245 296 | 346,2019,long,1159734734 297 | 346,2571,sci-fi,1159734634 298 | 346,2571,virtual reality,1159734636 299 | 346,2924,martial arts,1159734502 300 | 346,2997,interesting,1159734363 301 | 346,3265,explosions,1159734552 302 | 
346,3265,martial arts,1159734552 303 | 346,3996,martial arts,1159734445 304 | 346,4993,long,1159734600 305 | 346,5782,explosions,1159734683 306 | 346,5782,interesting,1159734683 307 | 346,5902,boring,1159734233 308 | 346,5952,boring,1159734622 309 | 346,5952,long,1159734622 310 | 346,6350,anime,1159734417 311 | 346,6539,Johnny Depp,1159734656 312 | 346,6857,anime,1159734652 313 | 346,7022,explosions,1159734326 314 | 346,7022,martial arts,1159734467 315 | 346,7090,martial arts,1159734555 316 | 346,7153,boring,1159734608 317 | 346,7153,long,1159734608 318 | 346,7361,interesting,1159734515 319 | 346,7844,martial arts,1159734589 320 | 346,8795,Long,1159734165 321 | 346,8874,comedy,1159734745 322 | 346,8874,zombies,1159734745 323 | 346,8961,super-hero,1159734563 324 | 346,26554,apocalypse,1159734706 325 | 346,26554,interesting,1159734706 326 | 346,26585,explosions,1159734379 327 | 346,31878,comedy,1159734584 328 | 346,31878,martial arts,1159734584 329 | 346,31878,Stephen Chow,1159734584 330 | 346,33794,bad camerawork,1159734301 331 | 346,33794,slow,1159734301 332 | 347,1721,romance,1463337171 333 | 347,122886,space,1463337217 334 | 347,122886,star,1463337213 335 | 353,4721,As historicaly correct as Germany winning WW2,1140389056 336 | 353,4721,but still a fun movie.,1140389056 337 | 353,7376,"The Rocks ""finest"" work need I say more?",1140389511 338 | 353,31221,Try not to mistake this for an episode of Alias,1140389595 339 | 353,32025,Mossad,1142686944 340 | 353,35836,dumb,1137217440 341 | 364,47,biblical,1444534976 342 | 364,47,crime,1444534982 343 | 364,47,dark,1444534994 344 | 364,47,disturbing,1444534971 345 | 364,47,greed,1444534998 346 | 364,47,horror,1444534981 347 | 364,47,serial killer,1444534961 348 | 364,47,violent,1444534985 349 | 364,50,thriller,1444534932 350 | 364,232,aging,1444531181 351 | 364,232,Ang Lee,1444531177 352 | 364,232,cooking,1444531178 353 | 364,232,food,1444531186 354 | 364,232,relationships,1444531173 355 | 364,293,assassin,1444534872 356 
| 364,293,hit men,1444534872 357 | 364,293,thriller,1444534887 358 | 364,318,friendship,1444529800 359 | 364,318,Morgan Freeman,1444529792 360 | 364,318,narrated,1444529829 361 | 364,318,prison,1444529824 362 | 364,318,prison escape,1444529820 363 | 364,318,revenge,1444529816 364 | 364,318,Tim Robbins,1444529804 365 | 364,318,wrongful imprisonment,1444529809 366 | 364,1176,imaginative,1444528969 367 | 364,1176,intellectual,1444528955 368 | 364,1176,Irene Jacob,1444528944 369 | 364,1176,Krzysztof Kieslowski,1444528941 370 | 364,1176,lyrical,1444528947 371 | 364,1210,action,1444529881 372 | 364,1210,aliens,1444529879 373 | 364,1210,George Lucas,1444529892 374 | 364,1210,Harrison Ford,1444529867 375 | 364,1210,sci-fi,1444529889 376 | 364,1210,sequel,1444529879 377 | 364,1210,space,1444529864 378 | 364,1210,Star Wars,1444529869 379 | 364,1210,starship pilots,1444529899 380 | 364,1210,war,1444529884 381 | 364,1265,alternate reality,1444530190 382 | 364,1265,Bill Murray,1444530207 383 | 364,1265,character development,1444530216 384 | 364,1265,comedy,1444530195 385 | 364,1265,existentialism,1444530199 386 | 364,1265,feel-good,1444530209 387 | 364,1265,funny,1444530200 388 | 364,1265,love,1444530219 389 | 364,1265,romantic,1444530228 390 | 364,1265,self discovery,1444530203 391 | 364,1732,great dialogue,1444535166 392 | 364,1732,Jeff Bridges,1444535201 393 | 364,1732,Nudity (Full Frontal),1444535170 394 | 364,1732,off-beat comedy,1444535205 395 | 364,1732,quirky,1444535198 396 | 364,1732,satirical,1444535172 397 | 364,1732,Steve Buscemi,1444535164 398 | 364,2068,coming of age,1444530920 399 | 364,2068,funny,1444530913 400 | 364,2424,bookshop,1444530521 401 | 364,2424,happy ending,1444530524 402 | 364,2424,Meg Ryan,1444530516 403 | 364,2424,Romance,1444530517 404 | 364,2424,romantic comedy,1444530529 405 | 364,2424,Tom Hanks,1444530527 406 | 364,3948,awkward,1444529225 407 | 364,4018,creative,1444529752 408 | 364,4018,funny,1444529739 409 | 364,4018,Helen Hunt,1444529729 
410 | 364,4018,hilarious,1444529746 411 | 364,4018,Mel Gibson,1444529756 412 | 364,4018,stereotypes,1444529760 413 | 364,4973,beautifully filmed,1444528866 414 | 364,4973,comedy,1444528870 415 | 364,4973,feel-good,1444528900 416 | 364,4973,imagination,1444528880 417 | 364,4973,love,1444528875 418 | 364,4973,notable soundtrack,1444528857 419 | 364,4973,quirky,1444528852 420 | 364,4973,whimsical,1444528860 421 | 364,5299,comedy,1444529040 422 | 364,5299,family,1444529044 423 | 364,5299,funny,1444529038 424 | 364,6350,aviation,1444531651 425 | 364,6350,funny,1444531664 426 | 364,6350,imagination,1444531644 427 | 364,6350,Studio Ghibli,1444531642 428 | 364,6350,visually appealing,1444531647 429 | 364,6539,comedy,1444529977 430 | 364,6539,Disney,1444529957 431 | 364,6539,funny,1444529960 432 | 364,6539,johnny depp,1444529962 433 | 364,6539,magic,1444529965 434 | 364,6539,Orlando Bloom,1444529978 435 | 364,6539,pirates,1444529987 436 | 364,6539,sword fight,1444529953 437 | 364,7161,family,1444535334 438 | 364,7161,funny,1444535340 439 | 364,7161,Steve Martin,1444535326 440 | 364,8464,american idiocy,1444531783 441 | 364,8464,documentary,1444531780 442 | 364,8464,social criticism,1444531787 443 | 364,26614,thriller,1444534834 444 | 364,26662,anime,1444530775 445 | 364,26662,Hayao Miyazaki,1444530771 446 | 364,26662,Studio Ghibli,1444530778 447 | 364,27611,military,1444534772 448 | 364,27611,sci-fi,1444534765 449 | 364,34321,Billy Bob Thornton,1444531424 450 | 364,34321,comedy,1444531438 451 | 364,34321,cursing,1444531450 452 | 364,34321,funny,1444531430 453 | 364,34321,redemption,1444531445 454 | 364,46578,beauty pageant,1444529380 455 | 364,46578,family,1444529368 456 | 364,46578,independent film,1444529351 457 | 364,46578,off-beat comedy,1444529354 458 | 364,46578,quirky,1444529369 459 | 364,46578,road trip,1444529384 460 | 364,46578,satire,1444529372 461 | 364,46578,steve carell,1444529375 462 | 364,56367,comedy,1444535241 463 | 364,56367,indie,1444535247 464 | 
364,56367,pregnancy,1444535236 465 | 364,56367,teen pregnancy,1444535244 466 | 364,56367,witty,1444535238 467 | 364,64957,Aging,1444531050 468 | 364,64957,Brad Pitt,1444531041 469 | 364,64957,original plot,1444531048 470 | 364,64957,philosophical,1444531044 471 | 364,66934,comedy,1444530699 472 | 364,66934,great soundtrack,1444530694 473 | 364,66934,mad scientist,1444530688 474 | 364,66934,Neil Patrick Harris,1444530691 475 | 364,66934,parody,1444530697 476 | 364,68358,sci fi,1444530041 477 | 364,68358,spock,1444530038 478 | 364,68358,Star Trek,1444530030 479 | 364,68358,teleportation,1444530045 480 | 364,72998,aliens,1444529449 481 | 364,72998,graphic design,1444529463 482 | 364,72998,predictable,1444529440 483 | 364,72998,racism,1444529457 484 | 364,72998,sci-fi,1444529439 485 | 364,93040,drama,1444530864 486 | 364,93040,historical,1444530873 487 | 364,93040,interesting,1444530861 488 | 364,95441,cocaine,1444529273 489 | 364,95441,crude humor,1444529258 490 | 364,95441,directorial debut,1444529282 491 | 364,95441,Mark Wahlberg,1444529284 492 | 364,97938,Ang Lee,1444531076 493 | 364,97938,India,1444531086 494 | 364,97938,ocean,1444531083 495 | 364,97938,visually appealing,1444531077 496 | 364,106696,Disney,1444530267 497 | 364,106696,feminist,1444530281 498 | 364,106696,musical,1444530270 499 | 364,106696,storyline,1444530270 500 | 364,109249,beautiful scenery,1444529645 501 | 364,109249,coming of age,1444529671 502 | 364,109249,dark humor,1444529660 503 | 364,109249,Fernando E. 
Solanas,1444529608 504 | 364,109249,imaginative,1444529635 505 | 364,109249,quirky,1444529613 506 | 364,109249,whimsical,1444529629 507 | 364,109374,amazing storytelling,1444529104 508 | 364,109374,Bill Murray,1444529124 509 | 364,109374,cinematography,1444529143 510 | 364,109374,eastern europe,1444529119 511 | 364,109374,europe,1444529115 512 | 364,109374,historical,1444529139 513 | 364,109374,on the run,1444529148 514 | 364,109374,quirky,1444529109 515 | 364,115617,comedy,1444530392 516 | 364,115617,Coming of Age,1444530417 517 | 364,115617,family,1444530402 518 | 364,115617,friends,1444530410 519 | 364,115617,funny,1444530394 520 | 364,115617,happy ending,1444530413 521 | 364,115617,heartwarming,1444528813 522 | 364,115617,inspiring,1444530404 523 | 364,115617,japanese influence,1444528772 524 | 364,115617,pixar,1444530397 525 | 364,115617,technology,1444528781 526 | 364,118997,comedy,1444530159 527 | 364,118997,fairy tale,1444530100 528 | 364,118997,funny,1444530106 529 | 364,118997,great soundtrack,1444530127 530 | 364,118997,imaginative,1444530113 531 | 364,118997,meryl streep,1444530142 532 | 364,118997,musical,1444530098 533 | 364,134853,coming of age,1444530618 534 | 364,134853,creative,1444530620 535 | 364,134853,happiness,1444530631 536 | 364,134853,imaginative,1444530628 537 | 364,134853,Pixar,1444530633 538 | 377,54290,Why the terrorists hate us,1187054169 539 | 380,40614,predictable,1148077862 540 | 402,260,coming of age,1443393648 541 | 402,260,"space epic, science fiction, hero's journey",1443393664 542 | 423,111,cult film,1354033605 543 | 423,111,Martin Scorsese,1354033608 544 | 423,111,social commentary,1354033602 545 | 423,247,stylized,1353702088 546 | 423,247,surreal,1353702083 547 | 423,247,surrealism,1353702082 548 | 423,247,visceral,1353702078 549 | 423,745,animation,1354033746 550 | 423,1079,dark comedy,1353703178 551 | 423,1079,dark humor,1353703182 552 | 423,1079,John Cleese,1353703171 553 | 423,1079,quirky,1353703185 554 | 423,1080,Monty 
Python,1354033667 555 | 423,1089,nonlinear,1354033640 556 | 423,1089,organized crime,1354033636 557 | 423,1089,Quentin Tarantino,1354033638 558 | 423,1219,Alfred Hitchcock,1354033644 559 | 423,1219,psychology,1354033646 560 | 423,1256,Marx Brothers,1354033718 561 | 423,1258,cult film,1354033633 562 | 423,1258,psychology,1354033630 563 | 423,1729,dark comedy,1354033685 564 | 423,2288,disturbing,1354033600 565 | 423,2529,post-apocalyptic,1354033650 566 | 423,2529,twist ending,1354033653 567 | 423,3030,atmospheric,1354033596 568 | 423,3546,insanity,1353702162 569 | 423,3546,psychological thriller,1353702155 570 | 423,5618,atmospheric,1354033628 571 | 423,5618,surreal,1354033624 572 | 423,5662,Dave Foley,1354033569 573 | 423,6287,Adam Sandler,1354044970 574 | 423,6713,Satoshi Kon,1354033681 575 | 423,8188,elegant,1353701935 576 | 423,27592,stylized,1354033620 577 | 423,27592,violent,1354033611 578 | 423,27773,depressing,1353702390 579 | 423,27773,disturbing,1353702357 580 | 423,27773,hallucinatory,1353702345 581 | 423,27773,paranoid,1353702396 582 | 423,27773,revenge,1353702393 583 | 423,27773,stylized,1353702395 584 | 423,27773,twist ending,1353702400 585 | 423,27773,vengeance,1353702404 586 | 423,30745,Takashi Miike,1354033711 587 | 423,31658,Studio Ghibli,1354033715 588 | 423,52885,anime,1354033657 589 | 423,55820,coen brothers,1354033663 590 | 423,58559,Batman,1354033727 591 | 423,67997,british,1354033698 592 | 423,67997,politics,1354033700 593 | 423,67997,satire,1354033695 594 | 431,5,steve martin,1140455432 595 | 431,150,tom hanks,1165548454 596 | 431,186,hugh grant,1140455419 597 | 431,215,ethan hawke,1165548716 598 | 431,236,meg ryan,1140455397 599 | 431,246,school,1140455370 600 | 431,260,classic,1140454408 601 | 431,300,Ralph Fiennes,1165548551 602 | 431,318,Phenomenal!,1140454312 603 | 431,337,Johnny Depp***,1140455226 604 | 431,350,overrated,1140455303 605 | 431,357,didn't get it,1140455255 606 | 431,377,Sandra Bullock,1140454879 607 | 431,520,very 
funny!,1140454299 608 | 431,529,jodi foster,1140455238 609 | 431,587,sweet,1140455292 610 | 431,628,edward norton,1140455195 611 | 431,661,creepy good,1140454443 612 | 431,904,jimmy stewart,1140455208 613 | 431,1059,claire daines,1140455275 614 | 431,1059,clever,1140455275 615 | 431,1059,Leonardo DiCaprio,1140455275 616 | 431,1059,shakespeare,1140455275 617 | 431,1242,denzel washington,1140455181 618 | 431,1250,war,1140455073 619 | 431,1259,r. phoneix,1140455011 620 | 431,1291,cool,1140454996 621 | 431,1358,billy bob thorton,1140455032 622 | 431,1376,not too thrilled,1140455126 623 | 431,1517,cheezy to the max!,1140454281 624 | 431,1635,moody,1165548220 625 | 431,1639,hillarious,1140455102 626 | 431,1801,good but not accurate,1140454574 627 | 431,1954,stallone,1140455043 628 | 431,1956,il,1165548392 629 | 431,1956,lake forest,1165548392 630 | 431,2004,not worth your time,1140454610 631 | 431,2012,liked the other two better,1140455152 632 | 431,2019,classic,1140455086 633 | 431,2085,cute,1140454548 634 | 431,2174,Micheal Keaton,1140454821 635 | 431,2369,must see,1140454589 636 | 431,2396,colin firth,1165548675 637 | 431,2396,Gwenth Paltrow,1165548675 638 | 431,2396,joseph fiennes,1165548675 639 | 431,2571,philosophy,1140454933 640 | 431,2692,cool and great music,1140454393 641 | 431,2700,stupidity,1140454856 642 | 431,3052,controversial,1140454805 643 | 431,3113,stupid,1140455730 644 | 431,3114,Tom Hanks,1140454833 645 | 431,3260,boring,1140455703 646 | 431,3361,tim robbins,1140455685 647 | 431,3751,mel gibson,1140454892 648 | 431,3785,stupid,1140455743 649 | 431,3793,can't stand rogue!,1140454770 650 | 431,3893,renee z,1140455496 651 | 431,3978,boring,1140455549 652 | 431,3994,predictable,1140454952 653 | 431,4016,okay,1140454522 654 | 431,4041,overrated,1140455478 655 | 431,4306,witty!,1140454779 656 | 431,4351,keanu reeves,1140455511 657 | 431,4369,no desire to see this,1140454634 658 | 431,4641,thora birch,1140455465 659 | 431,4701,jakie chan,1140455522 660 | 
431,4823,john cusack,1140455540 661 | 431,4890,Gwenth Paltrow,1140455564 662 | 431,5064,not by book,1165548741 663 | 431,5135,nair,1165548358 664 | 431,5349,tobey maguire,1140454907 665 | 431,5377,books,1140455343 666 | 431,5377,relationships,1140455343 667 | 431,5481,mike myers,1140455409 668 | 431,5812,moore,1165548495 669 | 431,6281,collin farrel,1140455383 670 | 431,6539,I loved it! Seen it five times already!,1140454336 671 | 431,6953,Benicio Del Toro,1165548439 672 | 431,8636,the best comic adaptation!,1140454370 673 | 446,65514,kung fu,1436621295 674 | 446,65514,martial arts,1436621295 675 | 446,65514,well done,1436621295 676 | 446,68954,feel good,1436621325 677 | 446,68954,light,1436621329 678 | 446,88125,dark,1436621257 679 | 448,260,sci-fi,1444761481 680 | 448,260,supernatural powers,1444761492 681 | 450,107649,Alex van Warmerdam,1475737520 682 | 450,107649,magical realism,1475737516 683 | 450,107649,weird,1475737522 684 | 456,3703,action,1432308664 685 | 456,3703,desert,1432308674 686 | 456,3703,Dieselpunk,1432308755 687 | 456,3703,Mel Gibson,1432308659 688 | 456,3703,Post apocalyptic,1432308686 689 | 456,3703,post-apocalyptic,1432308656 690 | 456,44191,dystopia,1432308622 691 | 456,44191,thought-provoking,1432308624 692 | 456,106473,animation,1432309088 693 | 456,106473,anime,1432309076 694 | 456,106473,based on a TV show,1432309085 695 | 456,106473,fantasy world,1432309090 696 | 468,3676,National Film Registry,1296204035 697 | 468,5449,Adam Sandler,1296202619 698 | 478,55247,adventure,1446622291 699 | 478,55247,freedom,1446622248 700 | 478,55247,road trip,1446622240 701 | 478,55247,self discovery,1446622259 702 | 478,55247,travel,1446622390 703 | 478,106918,Iceland,1446830144 704 | 478,106918,photography,1446830153 705 | 478,106918,travel,1446830142 706 | 480,8369,who done it,1339456173 707 | 480,70286,aliens,1339283926 708 | 480,70286,humor,1339283929 709 | 480,70286,sci-fi,1339283933 710 | 480,70286,violence,1339283935 711 | 
480,78349,claustrophobic,1339283636 712 | 480,78349,experiment,1339283627 713 | 480,78349,group psychology,1339283610 714 | 480,78349,psychological,1339283617 715 | 480,94777,aliens,1339284270 716 | 480,94777,franchise,1339284250 717 | 480,94777,funny,1339284240 718 | 480,94777,time travel,1339284259 719 | 481,260,George Lucas,1437000955 720 | 481,260,starwars,1437000940 721 | 481,1900,brother sister relationship,1437107499 722 | 481,1900,sad,1437107475 723 | 481,1900,siblings,1437107491 724 | 481,2361,cult film,1437105893 725 | 481,117533,big brother,1437004053 726 | 481,117533,documentary,1437004053 727 | 481,117533,privacy,1437004053 728 | 501,1,Pixar,1292956344 729 | 501,47,psychology,1292956276 730 | 501,47,twist ending,1292956362 731 | 501,296,dark comedy,1292956439 732 | 501,296,Quentin Tarantino,1292956435 733 | 501,778,dark comedy,1292956194 734 | 501,1089,organized crime,1292956420 735 | 501,1089,Quentin Tarantino,1292956416 736 | 501,2542,dark comedy,1292956446 737 | 501,2542,Guy Ritchie,1292956448 738 | 501,2542,organized crime,1292956450 739 | 501,2692,nonlinear,1292956226 740 | 501,2692,time travel,1292956392 741 | 501,2959,dark comedy,1292956481 742 | 501,2959,psychology,1292956476 743 | 501,2959,twist ending,1292956479 744 | 501,3114,animation,1292956346 745 | 501,3949,addiction,1292956422 746 | 501,3949,psychology,1292956425 747 | 501,4011,Guy Ritchie,1292956203 748 | 501,4011,twist ending,1292956205 749 | 501,5608,psychology,1292956490 750 | 501,6016,multiple storylines,1292956518 751 | 501,40148,guy ritchie,1292956408 752 | 501,47099,based on a true story,1292956431 753 | 501,60072,assassin,1292956189 754 | 501,62849,Drugs,1292956230 755 | 501,62849,Guy Ritchie,1292956234 756 | 501,62849,twist ending,1292956236 757 | 501,68157,Quentin Tarantino,1292956453 758 | 501,74458,psychological,1292956215 759 | 501,74458,twist ending,1292956212 760 | 501,77455,renegade art,1292956306 761 | 501,78499,Pixar,1292956196 762 | 501,79132,alternate 
reality,1292956469 763 | 503,260,Science Fiction,1432365802 764 | 503,260,space,1432365814 765 | 512,74458,Psychological Thriller,1434169401 766 | 512,79132,Intrigued,1434169472 767 | 520,4322,baseball,1146596187 768 | 531,1028,comedy of manners,1243454358 769 | 531,1028,Disney,1243454364 770 | 531,1028,Disney studios,1243454367 771 | 531,1028,Julie Andrews,1243454386 772 | 531,1028,multiple roles,1243454371 773 | 531,1028,musical,1243454374 774 | 531,1028,villain nonexistent or not needed for good story,1243454382 775 | 531,1088,80's classic,1243454483 776 | 531,1088,dance,1243454500 777 | 531,1088,girlie movie,1243454518 778 | 531,1088,music,1243454488 779 | 531,1088,musical parodies,1243454503 780 | 531,1088,rich families,1243454508 781 | 531,1258,disturbing,1243454832 782 | 531,1258,Nudity (Full Frontal - Notable),1243454840 783 | 531,1258,Nudity (Full Frontal),1243454842 784 | 531,1258,psychological,1243454846 785 | 531,1258,violent,1243454848 786 | 531,1997,demons,1243454875 787 | 531,1997,horror,1243454878 788 | 531,1997,possession,1243454880 789 | 531,1997,scary,1243454882 790 | 531,2942,dance,1243454447 791 | 531,2942,girlie movie,1243454450 792 | 531,2942,Nudity (Topless),1243454452 793 | 531,2942,strippers,1243454454 794 | 531,4973,comedy,1243510202 795 | 531,4973,drama,1243510204 796 | 531,4973,notable soundtrack,1243510209 797 | 531,4973,quirky,1243510212 798 | 531,6218,football,1243509668 799 | 531,6218,Funny,1243509674 800 | 531,6218,Keira Knightley,1243509670 801 | 531,6218,love,1243509684 802 | 531,6863,funny,1243454300 803 | 531,6863,Jack Black,1243454312 804 | 531,6863,music,1243454317 805 | 531,6863,not only for kids,1243454321 806 | 531,6863,Rock,1243454333 807 | 531,6863,rock and roll,1243454326 808 | 531,6942,british,1243454933 809 | 531,6942,christmas,1243454936 810 | 531,6942,ensemble cast,1243454939 811 | 531,6942,Keira Knightley,1243454943 812 | 531,6942,love,1243454946 813 | 531,6942,multiple storylines,1243454948 814 | 531,6942,Nudity 
(Topless - Notable),1243454950 815 | 531,6942,Nudity (Topless),1243454952 816 | 531,6942,Romance,1243454954 817 | 531,8533,covers a lifespan,1243455160 818 | 531,8533,memories,1243455166 819 | 531,35836,comedy,1243454586 820 | 531,35836,crude,1243454590 821 | 531,35836,funny,1243454588 822 | 531,35836,nerds,1243454598 823 | 531,35836,Nudity (Topless - Notable),1243454603 824 | 531,35836,Nudity (Topless),1243454610 825 | 531,35836,sex,1243454608 826 | 531,45720,fashion,1243454978 827 | 531,45720,New York,1243454980 828 | 531,45720,Paris,1243454982 829 | 531,45720,Streep strong & funny,1243454985 830 | 531,59725,fashion,1243455007 831 | 531,59725,New York City,1243455012 832 | 531,59725,Nudity (Topless),1243455014 833 | 531,59725,R:strong sexual content,1243455020 834 | 531,59725,romance,1243455018 835 | 531,63131,funny,1243509641 836 | 531,63131,obvious plot,1243509645 837 | 531,63992,romance,1243455092 838 | 531,63992,Teen movie,1243455095 839 | 531,63992,vampires,1243455099 840 | 531,64957,Brad Pitt,1243455053 841 | 531,64957,cinematography,1243455057 842 | 531,64957,drama,1243455060 843 | 531,64969,easily confused with other movie(s) (title),1243454548 844 | 531,64969,funny,1243454556 845 | 531,64969,Jim carrey,1243454553 846 | 531,64969,Zooey Deschanel,1243454561 847 | 546,3176,Gwyneth Paltrow,1301715429 848 | 546,3176,Jude Law,1301715415 849 | 546,3707,Nudity (Rear),1301284184 850 | 546,3707,Nudity (Topless),1301284182 851 | 546,3707,sexy food,1301284180 852 | 546,5984,BDSM,1301785972 853 | 546,5984,Blindfold,1301785976 854 | 546,5984,Bondage,1301785990 855 | 546,5984,Domination,1301785971 856 | 546,5984,Fetish,1301785980 857 | 546,5984,Gag,1301785974 858 | 546,5984,Spanked,1301785983 859 | 546,5984,Submission,1301785988 860 | 546,5984,Tied,1301785981 861 | 546,5984,Udo Kier,1301785978 862 | 546,7155,Nudity (Topless - Notable),1334013574 863 | 546,7155,Nudity (Topless),1334013570 864 | 546,39421,Nudity (Topless),1334013436 865 | 546,39421,porn,1334013433 866 | 
546,48780,based on a book,1301715225 867 | 546,48780,twist ending,1301715231 868 | 546,53318,notable nudity,1301196436 869 | 546,53318,Nudity (Full Frontal),1301196425 870 | 547,215,holes90s,1342849999 871 | 547,293,holes90s,1342849862 872 | 547,306,holes90s,1342849839 873 | 547,319,holes90s,1342849790 874 | 547,364,holes90s,1342849955 875 | 547,541,afi,1182393913 876 | 547,588,holes90s,1342849969 877 | 547,599,dvd,1388422564 878 | 547,599,holes60s,1388820371 879 | 547,757,getdvd,1423201850 880 | 547,914,holes60s,1342850735 881 | 547,946,holes40s,1412348245 882 | 547,954,afi,1182393876 883 | 547,1153,tcm,1189187818 884 | 547,1172,holes80s,1342849456 885 | 547,1197,holes80s,1342849599 886 | 547,1209,holes60s,1351319934 887 | 547,1211,holes80s,1342849466 888 | 547,1232,sightsound,1343973832 889 | 547,1245,holes90s,1342849804 890 | 547,1260,tcm,1189187750 891 | 547,1272,holes70s,1342849087 892 | 547,1273,holes80s,1342849477 893 | 547,1281,holes40s,1412348178 894 | 547,1281,tivo,1476583274 895 | 547,1283,afi,1182394030 896 | 547,1283,hdtv,1199396362 897 | 547,1572,sightsound,1343973703 898 | 547,1935,holes40s,1412348396 899 | 547,1942,holes40s,1412348211 900 | 547,1957,holes80s,1342849494 901 | 547,2028,afi,1182394053 902 | 547,2028,hdtv,1200784012 903 | 547,2028,holes90s,1342849945 904 | 547,2109,holes70s,1342849272 905 | 547,2178,holes70s,1342849166 906 | 547,2182,tcm,1412347392 907 | 547,2202,holes40s,1412348261 908 | 547,2330,holes90s,1342849749 909 | 547,2677,holes90s,1342849900 910 | 547,2730,getdvd,1350531508 911 | 547,2730,holes70s,1342849220 912 | 547,2747,tivo,1476583080 913 | 547,2857,holes60s,1342850705 914 | 547,2920,getdvd,1412348038 915 | 547,2920,holes40s,1412348038 916 | 547,2927,holes40s,1412348319 917 | 547,2936,afi,1182393939 918 | 547,2936,holes40s,1412348275 919 | 547,3022,afi,1182393819 920 | 547,3089,holes40s,1412348191 921 | 547,3111,holes80s,1342849685 922 | 547,3246,holes90s,1342849768 923 | 547,3384,holes70s,1342849030 924 | 
547,3415,sightsound,1343973557 925 | 547,3498,holes70s,1342849114 926 | 547,3539,tivo,1476583329 927 | 547,3629,afi,1182433856 928 | 547,3629,hdtv,1200784036 929 | 547,3632,holes40s,1412348295 930 | 547,3634,getdvd,1418986019 931 | 547,3634,holes60s,1418986015 932 | 547,3671,holes70s,1342849229 933 | 547,3679,holes80s,1342849435 934 | 547,3742,sightsound,1343973400 935 | 547,3789,holes60s,1342850721 936 | 547,3811,holes80s,1342849524 937 | 547,3947,holes70s,1431088744 938 | 547,4103,holes80s,1342849627 939 | 547,4262,holes80s,1342849544 940 | 547,4326,holes80s,1342849552 941 | 547,4327,holes60s,1342850801 942 | 547,4437,holes70s,1342849298 943 | 547,4712,holes70s,1342849242 944 | 547,4763,getdvd,1471010946 945 | 547,4928,holes70s,1342849195 946 | 547,5169,tcm,1189187923 947 | 547,5198,holes80s,1342849514 948 | 547,5289,tcm,1189187738 949 | 547,5464,holes00s,1342850481 950 | 547,5747,holes80s,1342849443 951 | 547,5791,holes00s,1342850231 952 | 547,5795,tosee,1475705682 953 | 547,5956,dvd,1259388129 954 | 547,6229,holes70s,1342849313 955 | 547,6433,sightsound,1343973308 956 | 547,6515,tcm,1190475823 957 | 547,6643,sightsound,1343973230 958 | 547,6783,sightsound,1343973251 959 | 547,6830,tcm,1190475890 960 | 547,6874,holes00s,1342850598 961 | 547,6981,sightsound,1343973754 962 | 547,6983,holes40s,1412348440 963 | 547,6985,sightsound,1343973328 964 | 547,7195,tcm,1189187983 965 | 547,7243,afi,1182394001 966 | 547,7243,hdtv,1200784002 967 | 547,7335,tcm,1189187908 968 | 547,7438,holes00s,1342850493 969 | 547,7792,holes70s,1342849103 970 | 547,7926,getdvd,1412347952 971 | 547,8125,afi,1182393855 972 | 547,8128,holes80s,1342849418 973 | 547,8154,holes60s,1342850820 974 | 547,8195,sightsound,1343973620 975 | 547,8207,holes70s,1342849135 976 | 547,8236,tcm,1190475838 977 | 547,8494,dvd,1254390631 978 | 547,8584,tivo,1476583430 979 | 547,8645,holes00s,1342850358 980 | 547,8751,tcm,1189187687 981 | 547,8765,tcm,1189187626 982 | 547,8766,tcm,1189187878 983 | 
547,8767,tcm,1189187947 984 | 547,25805,sightsound,1343973445 985 | 547,25927,holes40s,1412348360 986 | 547,25927,tcm,1189187934 987 | 547,26150,sightsound,1343973817 988 | 547,26151,sightsound,1343973505 989 | 547,26366,holes70s,1342849254 990 | 547,31770,tcm,1189187673 991 | 547,34517,tivo,1476583581 992 | 547,37741,toplist05,1378985294 993 | 547,38304,toplist05,1378985376 994 | 547,39183,toplist05,1334343486 995 | 547,40629,toplist05,1378985440 996 | 547,40819,toplist05,1378985328 997 | 547,41585,tcm,1189187574 998 | 547,41863,toplist06,1197165976 999 | 547,41997,hdtv,1200107294 1000 | 547,41997,holes00s,1342850302 1001 | 547,42217,sightsound,1343973482 1002 | 547,44199,toplist06,1197165844 1003 | 547,44555,toplist06,1235048573 1004 | 547,45028,toplist06,1197165941 1005 | 547,45210,toplist06,1198810519 1006 | 547,46578,toplist06,1197165894 1007 | 547,46664,holes40s,1412348651 1008 | 547,46664,tcm,1412348651 1009 | 547,46723,toplist06,1197165832 1010 | 547,47099,toplist06,1197164385 1011 | 547,47274,holes70s,1342849207 1012 | 547,47423,toplist06,1197164296 1013 | 547,47629,toplist06,1197165707 1014 | 547,48394,hdtv,1200549984 1015 | 547,48394,holes00s,1342850387 1016 | 547,48516,toplist06,1197165676 1017 | 547,48696,toplist06,1197165862 1018 | 547,48738,toplist06,1197164312 1019 | 547,48774,toplist06,1197165663 1020 | 547,49824,toplist06,1197165695 1021 | 547,49917,toplist06,1268555186 1022 | 547,50068,holes00s,1342850218 1023 | 547,51540,toplist07,1192323038 1024 | 547,52241,toplist07,1197122754 1025 | 547,52281,hdtv,1200875555 1026 | 547,52281,holes00s,1342850259 1027 | 547,52579,getdvd,1471606640 1028 | 547,52967,toplist07,1197313756 1029 | 547,53000,toplist07,1198811122 1030 | 547,53123,toplist07,1187738100 1031 | 547,53894,toplist07,1197122145 1032 | 547,53953,getdvd,1467870755 1033 | 547,54272,toplist07,1187738146 1034 | 547,54286,toplist07,1197122822 1035 | 547,54513,toplist07,1187737659 1036 | 547,54881,toplist07,1268555304 1037 | 
547,55052,toplist07,1197122048 1038 | 547,55063,toplist08,1228486494 1039 | 547,55069,holes00s,1472570165 1040 | 547,55069,toplist07,1301765688 1041 | 547,55118,toplist07,1197121987 1042 | 547,55247,toplist07,1197121953 1043 | 547,55253,toplist07,1268555286 1044 | 547,55269,toplist07,1197122030 1045 | 547,55276,toplist07,1195959915 1046 | 547,55280,holes00s,1342850401 1047 | 547,55363,getdvd,1412346850 1048 | 547,55363,holes00s,1342850280 1049 | 547,55442,holes00s,1342850172 1050 | 547,55442,toplist07,1268555437 1051 | 547,55820,toplist07,1197121965 1052 | 547,55946,tivo,1476113630 1053 | 547,56015,tcm,1412347417 1054 | 547,56152,toplist07,1198811180 1055 | 547,56286,toplist07,1268555500 1056 | 547,56367,toplist07,1198968553 1057 | 547,56607,toplist07,1200334592 1058 | 547,56782,toplist07,1200334609 1059 | 547,56788,toplist07,1201665448 1060 | 547,56805,toplist07,1198968631 1061 | 547,57669,toplist08,1250738298 1062 | 547,58191,toplist08,1223243755 1063 | 547,58879,toplist08,1207957343 1064 | 547,60069,toplist08,1230814631 1065 | 547,60766,toplist08,1223243682 1066 | 547,61024,toplist08,1223243801 1067 | 547,61236,holes00s,1342850181 1068 | 547,61236,toplist08,1230870494 1069 | 547,61240,holes00s,1342850379 1070 | 547,61240,toplist08,1230814624 1071 | 547,61323,toplist08,1223244145 1072 | 547,61357,toplist08,1222269690 1073 | 547,63082,toplist08,1228486518 1074 | 547,63876,toplist08,1230814303 1075 | 547,64614,toplist08,1250440712 1076 | 547,64620,toplist08,1230814286 1077 | 547,64622,toplist08,1230942216 1078 | 547,64701,toplist08,1230869509 1079 | 547,64839,toplist08,1236951337 1080 | 547,66665,toplist09,1253860576 1081 | 547,67087,toplist09,1259384470 1082 | 547,67255,toplist10,1270485757 1083 | 547,67429,toplist08,1238823068 1084 | 547,67665,getdvd,1321564884 1085 | 547,67665,toplist09,1250742907 1086 | 547,67997,toplist09,1250742829 1087 | 547,68157,toplist09,1253860562 1088 | 547,68954,holes00s,1342850438 1089 | 547,69481,toplist09,1250742859 1090 | 
547,70286,toplist09,1250742877 1091 | 547,70293,toplist09,1253860588 1092 | 547,71108,toplist09,1292047039 1093 | 547,71464,toplist09,1259383811 1094 | 547,71745,toplist09,1259383809 1095 | 547,72011,toplist09,1262145753 1096 | 547,72131,toplist09,1257694906 1097 | 547,72176,tivo,1476113567 1098 | 547,72226,toplist09,1259383836 1099 | 547,72395,toplist09,1262146022 1100 | 547,72720,toplist09,1271429973 1101 | 547,72741,tivo,1476113594 1102 | 547,73023,toplist09,1267280510 1103 | 547,74324,toplist10,1312868042 1104 | 547,74458,toplist10,1276350948 1105 | 547,74545,toplist10,1276350928 1106 | 547,77455,getdvd,1446815214 1107 | 547,77455,toplist10,1292077803 1108 | 547,78039,toplist10,1296798992 1109 | 547,78499,toplist10,1293381435 1110 | 547,78574,toplist10,1281363877 1111 | 547,78653,toplist09,1277386636 1112 | 547,79132,toplist10,1279986154 1113 | 547,79242,toplist10,1281363890 1114 | 547,80463,toplist10,1287199615 1115 | 547,80489,toplist10,1294976085 1116 | 547,81562,toplist10,1292077909 1117 | 547,81591,toplist10,1296799009 1118 | 547,81786,getdvd,1468248495 1119 | 547,81786,toplist11,1312868144 1120 | 547,81845,toplist10,1296798962 1121 | 547,81932,toplist10,1296799034 1122 | 547,82313,tivo,1476650744 1123 | 547,82459,toplist10,1296798978 1124 | 547,82463,toplist10,1299563653 1125 | 547,83976,toplist11,1328012694 1126 | 547,85394,toplist11,1312868243 1127 | 547,86320,toplist11,1317998755 1128 | 547,86833,toplist11,1312868088 1129 | 547,87304,toplist11,1312868078 1130 | 547,88129,toplist11,1322848734 1131 | 547,88235,toplist11,1322849030 1132 | 547,88810,toplist11,1322894457 1133 | 547,89260,toplist11,1329513993 1134 | 547,89470,toplist11,1317998768 1135 | 547,89492,toplist11,1317998745 1136 | 547,89759,toplist11,1327491628 1137 | 547,89804,toplist11,1322848784 1138 | 547,90057,toplist11,1322848999 1139 | 547,90376,toplist11,1327491698 1140 | 547,90439,toplist11,1332487973 1141 | 547,90531,toplist11,1327491714 1142 | 547,90866,toplist11,1327491687 1143 | 
547,91077,toplist11,1322848957 1144 | 547,94931,toplist12,1355599327 1145 | 547,94959,toplist12,1342275142 1146 | 547,94969,getdvd,1409381147 1147 | 547,95135,toplist12,1354966882 1148 | 547,95449,toplist12,1356967567 1149 | 547,95558,toplist12,1348198533 1150 | 547,95761,dvd,1387360646 1151 | 547,96417,dvd,1367718836 1152 | 547,96588,toplist12,1355599298 1153 | 547,96610,dvd,1361888026 1154 | 547,96610,toplist12,1354966811 1155 | 547,96728,toplist12,1355599008 1156 | 547,96811,toplist12,1356623017 1157 | 547,96832,dvd,1369217683 1158 | 547,96832,toplist12,1356409181 1159 | 547,97304,toplist12,1353063038 1160 | 547,97673,toplist13,1383626018 1161 | 547,97752,toplist12,1354966835 1162 | 547,97921,toplist12,1354966859 1163 | 547,97923,toplist12,1356623118 1164 | 547,97938,toplist12,1354966894 1165 | 547,98154,toplist12,1354966827 1166 | 547,98961,toplist12,1355938883 1167 | 547,99114,toplist12,1357543606 1168 | 547,99149,toplist12,1356273884 1169 | 547,101285,getdvd,1386729453 1170 | 547,101525,toplist12,1386245688 1171 | 547,101895,toplist13,1383626036 1172 | 547,102194,toplist13,1386658060 1173 | 547,102469,getdvd,1387083948 1174 | 547,103107,toplist13,1396017727 1175 | 547,103372,toplist13,1383625950 1176 | 547,103449,dvd,1378132522 1177 | 547,103624,toplist13,1388209512 1178 | 547,103688,dvd,1472303875 1179 | 547,105197,toplist13,1386657913 1180 | 547,105355,toplist13,1397129838 1181 | 547,105504,toplist13,1386245656 1182 | 547,106100,toplist13,1389766232 1183 | 547,106766,toplist13,1386658220 1184 | 547,106916,toplist13,1386657846 1185 | 547,106920,toplist13,1386657798 1186 | 547,107141,toplist13,1386946018 1187 | 547,107636,tivo,1476583144 1188 | 547,109374,toplist14,1402760218 1189 | 547,110871,getdvd,1399051398 1190 | 547,111249,getdvd,1423194272 1191 | 547,111251,dvd,1410947141 1192 | 547,111251,toplist14,1446283752 1193 | 547,111505,getdvd,1400501925 1194 | 547,111622,toplist14,1420131617 1195 | 547,112070,tivo,1476113970 1196 | 
547,112183,toplist14,1416281804 1197 | 547,112290,toplist14,1411137921 1198 | 547,112515,getdvd,1423194952 1199 | 547,112515,toplist14,1418485385 1200 | 547,112550,getdvd,1470238109 1201 | 547,112552,toplist14,1417010184 1202 | 547,112556,toplist14,1412347244 1203 | 547,112852,toplist14,1407510485 1204 | 547,113064,getdvd,1423194143 1205 | 547,114254,getdvd,1450365156 1206 | 547,114342,toplist14,1417010271 1207 | 547,114459,getdvd,1412474417 1208 | 547,114662,toplist14,1423131235 1209 | 547,115139,tcm,1413426713 1210 | 547,115569,toplist14,1431187332 1211 | 547,115713,toplist15,1446211131 1212 | 547,116161,toplist14,1417010500 1213 | 547,116797,toplist14,1425780053 1214 | 547,117176,toplist14,1417010208 1215 | 547,117533,dvd,1473748292 1216 | 547,117533,getdvd,1446815098 1217 | 547,117533,toplist14,1417010466 1218 | 547,118700,getdvd,1446815138 1219 | 547,118700,toplist14,1423224182 1220 | 547,118880,getdvd,1449757705 1221 | 547,118880,toplist14,1449757002 1222 | 547,122882,getdvd,1470112098 1223 | 547,122882,toplist15,1449755519 1224 | 547,123663,tivo,1476583542 1225 | 547,123695,tivo,1476583459 1226 | 547,127108,tivo,1476583101 1227 | 547,127108,toplist15,1449756138 1228 | 547,127114,toplist15,1449756046 1229 | 547,127144,tivo,1476583306 1230 | 547,127206,getdvd,1470111984 1231 | 547,127212,tivo,1476113822 1232 | 547,128235,tivo,1476583410 1233 | 547,128360,toplist15,1449755822 1234 | 547,128512,getdvd,1470112064 1235 | 547,128606,toplist15,1468592921 1236 | 547,131796,tivo,1476583367 1237 | 547,132458,tivo,1476583517 1238 | 547,132547,dvd,1473748244 1239 | 547,132549,tivo,1476583387 1240 | 547,132800,getdvd,1454164221 1241 | 547,133645,toplist15,1449754606 1242 | 547,133771,getdvd,1470111738 1243 | 547,134130,toplist15,1446210938 1244 | 547,134853,toplist15,1446211185 1245 | 547,134859,getdvd,1470111811 1246 | 547,134859,toplist15,1449755731 1247 | 547,134881,toplist15,1446283359 1248 | 547,137337,toplist15,1446211164 1249 | 547,139385,toplist15,1449755171 1250 
| 547,139642,tivo,1476583038 1251 | 547,140174,toplist15,1446211303 1252 | 547,140715,tivo,1476583167 1253 | 547,140725,tivo,1476583059 1254 | 547,141749,toplist15,1468077718 1255 | 547,142488,toplist15,1449754627 1256 | 547,143385,toplist15,1446211337 1257 | 547,144172,bkk,1472400655 1258 | 547,146656,toplist15,1449755899 1259 | 547,148482,tivo,1476583253 1260 | 547,148626,toplist15,1449755637 1261 | 547,155064,bkk,1472179444 1262 | 547,156387,toplist16,1467946629 1263 | 547,158783,bkk,1466133970 1264 | 547,160954,bkk,1472178574 1265 | 547,161336,getdvd,1469112392 1266 | 547,161582,bkk,1472737430 1267 | 547,163056,bkk,1472178747 1268 | 547,163949,toplist16,1476419254 1269 | 547,164977,tivo,1476113746 1270 | 547,164979,tivo,1476113908 1271 | 567,138610,found footage,1436827601 1272 | 567,138610,survival horror,1436827612 1273 | 574,47044,Michael Mann,1232812787 1274 | 583,112552,determination,1430526450 1275 | 583,112552,devotion,1430526450 1276 | 583,112552,music,1430526450 1277 | 599,66203,honest,1344134759 1278 | 599,82152,alex pettyfer,1306109318 1279 | 611,105504,hijacking,1471521058 1280 | 611,105504,ocean,1471521036 1281 | 611,105504,pirates,1471521067 1282 | 611,105504,suspense,1471521044 1283 | 615,111384,revenge,1408781036 1284 | 615,111384,vengeance,1408781025 1285 | 615,111931,femme-fatale,1425503802 1286 | 615,111931,gritty,1425503802 1287 | 615,111931,low-budget,1425503802 1288 | 630,260,classic sci-fi,1443807766 1289 | 630,260,series,1443807803 1290 | 652,146501,gay,1449533216 1291 | 660,260,"imaginary world, characters, story, philosophical",1436680217 1292 | 660,260,script,1436680177 1293 | 660,135518,meaning of life,1436680885 1294 | 660,135518,philosophical,1436680885 1295 | 660,135518,sci-fi,1436680885 1296 | 663,260,action,1438398078 1297 | 663,260,Syfy,1438398050 1298 | -------------------------------------------------------------------------------- /docs/biparte (1).xml: 
-------------------------------------------------------------------------------- 1 | zVhNc9sgEP01vmYkoa8cWzdtL53JjA9tjlgiElPJeBD+6q/vyoAlgZy4jmzqgwcesMDbx7Johub1/hvH6/IHy0k1C7x8P0NfZkGQPHrw3wIHCYRJIoGC01xCfgcs6B+iQDWu2NCcNIOOgrFK0PUQzNhqRTIxwDDnbDfs9sqq4axrXBALWGS4stGfNBelRNPI6/DvhBalntn3VMsSZ78LzjYrNd8sQK/Hn2yusbal+jclztmuB6GnGZpzxoQs1fs5qVpqNW1y3Nczrad1c7ISlwwI5IAtrjZq6zXbUqLgRhw0IzAMyIfK511JBVmscda27MD9gJWirqDmQ1EZJFyQ/dlF+aetgoIIq4ngB+iiBgRIsXPQqlD1XecLXzNY9vyQKgwr9xcn0x0FUFAsjDOCxhkBCDklJQwckhKeJSV0SkrsUinRWVJ8t0pxyElscZJtGgF2+Du0QCBct2DONsuKPN2RLt8f8oWiO/KVvMUXtMS4bne8Wjbr027/M/5Q6JC/9C3+3grY7viKPYd8PVp82Ryt8k9tCgW1rMJNQ7OWLoG5sOEeM7B7fvgFFe8h0tUX3ban4tikyi+qLOcmuZWMGVzC+tiGZ2Tgc1hRQUQvFNuM9xiNRgjVGCcVFnQ7XMQYy2qGZ0ZheV0ASc44VJuQi1ej+smYYSgwrnjkGYbkli1DR6eftn2RDnTM+2ch9D2uvSod/mG/xrZfA6d+TaOhO9C1fkVDQ9ZJntCvdkI/2QEfdbdx9OEw9A+/9+ClqQaeCaewFQjNH1FJYKskcamS2Dizpm8vFYmVhphqm1Ak9hvnrrfARPEisZWAXCohRBPdA2F6v3vAftlNLgX/1kIYSQicCiE2Y0J4pRAiQwiBqagJhWC/ZqePCeeE4N8qfwhdysC6GuIrZRAbV8MpT7yBDOwH/D2vhmmEMHIxOBVCZL74rs0RouhWOQJUu4/Osnv3YR89/QU= -------------------------------------------------------------------------------- /docs/biparte.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/chandan-u/graph-based-recommendation-system/897c6419d922fcbfe7f110522e530f3eeeceaf17/docs/biparte.png -------------------------------------------------------------------------------- /docs/biparte.xml: -------------------------------------------------------------------------------- 1 | 
zVjLcpswFP0alvUAAoyXbZq2m85kxos2SxkU0BQsj5Bf/fpeLMmAZCcpIVa88EhH73OOLhc8dFcfvnO8KX+ynFRe6OcHD331wnC+8OG/BY4SiMNAAgWnuYR6wJL+JQpU44otzUkz6CgYqwTdDMGMrdckEwMMc872w25PrBquusEFsYBlhisb/UVzUUo0jf0O/0FoUeqVA1+1rHD2p+Bsu1breSF6Ov1kc431XKp/U+Kc7XsQuvfQHWdMyFJ9uCNVS62mTY77dqX1vG9O1uI1A0I5YIerrTp6zXaUKLgRR80IDAPyofJlX1JBlhuctS17kB+wUtQV1AIoqgkJF+RwdVPB+ajgIMJqIvgRuqgBIVLsHLUrVH3faRFoBsueDqnCsJK/OE/dUQAFxcJlRtBlRgBCTkmJQoekRFdJiZySkrh0SnyVlMCtUxxyklicZNtGwDz8BVogEG5aMGfbVUXub0hXEA/5QvEN+Zo/xxe0JLhuT7xeNZvzaT8Yf2jhkL/0Of6eC9ju+DJjFopuyNfC4svmaJ1/blMoqGUVbhqatXQJzIUN95iB0/Pjb6j4s1hXH3XbgYpTkyo/qrJcm+RWMmZwCftjW56Rgeawo4KIXii2Ge8xGl8gVGOcVFjQ3XATl1hWKzwwCtvrAsj8ygXQU8jNq1H9ZMyYKDQe8cg3JpJHtiY6iX4+9qt8oBPt/zZCX3GtqhT8zbomtq6hU13TeCgHGqsrGk5k3eQJdbUT+skueCd3inqCBzM/TVX9gXAKu4Xoqy57FxTgmvTDgq9HtYA5bJx/Qts/c5f+SYzbbKr+WvsEgTGR6cMJ7WO//dz0+TBL/LRnrU/gkviKtcZ5ZG57BLn0SGQmA2OfHVF6u2eH/Tb4DiaZD+MFRJkAvT1efBjhEzM6RCOFjw3hQ9NBEwpvv/FOL7wZG8ZpG7nU1or8yUhtEyPynxPEd9DWfnO/ZeQPvCkyyAvR3akRYn+iFCA2v1FMlgJAtfvaLLt3X/TR/T8= -------------------------------------------------------------------------------- /docs/graph_based_recommendation system.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "recommendation system using graph traversal" 3 | author: "chandan u" 4 | Instructor: "Funda ergun" 5 | date: "12/6/2016" 6 | output: pdf_document 7 | --- 8 | 9 | 10 | 11 | 12 | # Abstract: 13 | 14 | Implemented a movie recommendation system using the movielens dataset from the grouplens site. This dataset is transformed to a bipartite graph which allowed to address the problem using graph based traversal algorithms instead of usual approaches that are used by recommendation systems. The goal is to implement collaborative filtering technique as well as content based recommendation using the graph traversal algorithms. We will evaluate the advantages and shortcomings and then also discuss how we can improve on this approach. 
15 | 16 | # Introduction: 17 | The amount of content generated by social media sites, movies, TV shows and similar sources is growing rapidly, and it is very hard for a user to choose from such a huge pool of content. There are endless choices, so we need to filter out most of this content and offer suggestions to the user. 18 | Recommendation systems are designed to solve this very problem: they give users the best suggestions based on existing data and the user's preferences. 19 | 20 | Recommendation systems are widely used by sites such as Netflix to suggest movies, Amazon to suggest products, and music applications such as iTunes and Spotify to suggest songs that the user may like to hear next. They can also be applied to domains such as social networking; Facebook uses them to suggest friends. 21 | 22 | In this project we implement a collaborative filtering recommendation system that uses existing data to give better suggestions. We build a bipartite graph from the data set to support graph traversal for the collaborative filtering system. 23 | 24 | 25 | # Data: 26 | The data set is obtained from http://grouplens.org/ , which hosts a collection of movie ratings from the MovieLens website. This data set covers 100,000 ratings and 1,300 tag applications applied to 9,000 movies by 671 users. To implement the collaborative filtering system we use all the data except the tags. The data set has two main files: movies.csv and ratings.csv. 27 | 28 | Number of users: 671 29 | Number of movies: >9000 30 | Number of ratings: 100,000 31 | 32 | Files: movies.csv, ratings.csv 33 | 34 | 35 | 36 | 37 | # Representation of data: 38 | To facilitate graph traversal techniques and collaborative filtering, the data is transformed into a bipartite graph representation. In a bipartite graph, nodes are divided into two distinct sets.
Links between pairs of nodes from different node sets are admissible, while links between nodes from the same node set are not allowed. In our case the information is whether or not a person (customer) has watched a movie (product) and what rating the customer has given to that movie. Such information can be easily represented as shown in the table below (table 1): 39 | 40 | 41 | customer/movie | movie 1 | movie 2 | movie 3 | movie 4 42 | --------------- | -------- | -------- | -------- | ------- 43 | customer1 | 0 | 1 | 0 | 1 44 | customer2 | 0 | 1 | 1 | 1 45 | customer3 | 1 | 0 | 1 | 0 46 | 47 | 48 | 49 | 50 | A zero in the above table means that the customer has not watched the movie. A nonzero entry means that the customer has watched the movie, and its numeric value is the rating he/she has given to that movie. You can traverse from a customer to a movie, but you cannot traverse directly from customer to customer. Likewise, you cannot traverse directly from movie to movie. 51 | 52 | 53 | Bipartite matrix translation to graph: 54 | 55 | 56 | ![](biparte.png) 57 | 58 | 59 | The actual data set has 9123 movies and 671 customers, so the bipartite graph matrix is of size (9123 * 671). 60 | 61 | 62 | 63 | 64 | 65 | # Related work: 66 | In general, recommendation systems are implemented in three ways: 67 | 68 | ## 1.) Content based approach: 69 | One common approach when designing recommender systems is content-based filtering. Content-based filtering methods are based on a description of the item or product and a profile of the user’s preferences. 70 | 71 | ## 2.) Collaborative filtering: 72 | Collaborative filtering methods are based on collecting and analyzing a large amount of information on users’ behaviors, activities or preferences and predicting what users will like based on their similarity to other users.
A key advantage of the collaborative filtering approach is that it does not rely on machine-analyzable content, and therefore it is capable of accurately recommending complex items such as movies without requiring an "understanding" of the item itself. 73 | ## 3.) A hybrid of collaborative and content based approach: 74 | In this approach we combine both the collaborative and content based approaches to come up with recommendations. 75 | 76 | Our main focus in this project is to implement collaborative filtering as well as content based filtering. 77 | 78 | 79 | # Recommendation Algorithms: 80 | 81 | ## 1.) The content based Filtering: 82 | The idea behind content based filtering is that when a user likes or watches certain movies, we use the meta information of those movies to suggest similar movies that share the same properties. For example, the genre of a movie can be used to suggest other movies that belong to the same genre. We can also couple this with the ratings that the user has given to these movies earlier. 83 | meta-properties: genre 84 | user-given-properties: rating 85 | 86 | Here is a simple algorithm: 87 | 88 | ```{p} 89 | Algorithm: 90 | Step 1: Choose all the movies that the user watched. 91 | Step 2: Obtain the genre of each movie that the user watched. 92 | 93 | Step 3: For each genre, sum all the ratings given by the target user. 94 | 95 | Step 4: Divide the cumulative rating of each genre by 96 | the number of watched movies in that genre. 97 | 98 | Step 5: Now pick the top three genres using the above computation 99 | and recommend movies that belong to those genres. 100 | 101 | ``` 102 | 103 | This may not be the best approach, but it takes into consideration that a user may like a particular genre and may be trying to find a good movie in that genre. He may not have found a good movie so far.
Or it can also be that the user in general likes movies from certain genres more than from other genres. 104 | 105 | 106 | ## 2.) Collaborative Filtering: 107 | Collaborative filtering can be implemented in two ways: user based collaborative filtering and item based collaborative filtering. In this project we focus on user-user collaborative filtering. In user-user collaborative filtering, when we try to recommend movies to a user, we try to find other similar users who have watched almost the same movies as our current user. We use similarity metrics such as Euclidean distance, Manhattan distance, Pearson correlation etc. to find such similar users. In this project we implemented Euclidean distance to find similar users. We will take the example of table 1 and try to recommend movies for customer 1 in the table. 108 | 109 | ### Euclidean distance Similarity: 110 | From table 1, let’s assume we are trying to recommend movies for customer 1. Our goal is to find similar users. In order to do this we compute the Euclidean distance from customer 1 to each of the other customers. 111 | 112 | If p = (p1, p2,..., pn) and q = (q1, q2,..., qn) are two points in Euclidean n-space, the Euclidean distance (d) from p to q, or from q to p, is given by the Pythagorean formula: 113 | 114 | $d(p, q) = \sqrt{\sum_{i=1}^{n} (q_{i} - p_{i})^{2}}$ 115 | 116 | Here the vectors are simply the rows of the bipartite matrix shown in table 1, i.e. the customer 1 row is vector p and the row of any other customer, such as customer 2, is vector q. The distance between customer 1 and customer 2 is computed as follows: 117 | 118 | customer 1, p = (0,1,0,1) 119 | customer 2, q = (0,1,1,1) 120 | 121 | $d(p, q) = \sqrt{(0-0)^{2} + (1-1)^{2} + (1-0)^{2} + (1-1)^{2}} = 1$ 122 | 123 | 124 | As you can see, the distance is one. We obtain the distances of customer 1 w.r.t. all other users. We choose the first few users (at least three) who are closest to customer 1.
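The distance computation described above can be sketched in Python as follows. This is a minimal illustration on the three rows of table 1, not the code from the repository's `algos.py`; the names `ratings`, `euclidean_distance` and `distances_from` are hypothetical and chosen only for this example.

```python
import math

# Rows of table 1: one 0/1 vector per customer (1 = watched).
# Hypothetical in-memory form of the bipartite matrix, for illustration only.
ratings = {
    "customer1": [0, 1, 0, 1],
    "customer2": [0, 1, 1, 1],
    "customer3": [1, 0, 1, 0],
}


def euclidean_distance(p, q):
    """Euclidean distance between two equal-length rating vectors."""
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))


def distances_from(target, table):
    """Map every other customer to its distance from `target`."""
    return {
        name: euclidean_distance(table[target], row)
        for name, row in table.items()
        if name != target
    }


distances = distances_from("customer1", ratings)
print(distances)
```

Sorting the resulting dictionary by value would give the nearest customers first, which is the neighborhood used for the recommendation step.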
Most importantly, we ignore all users whose Euclidean distance to customer 1 is zero. Those users have watched exactly the same set of movies as customer 1, so they provide no additional information that would help in recommending movies to customer 1.

We compute the distances as follows:

d | customer 1 | customer 2 | customer 3
------ | ---------- | ----------- | ----------
distance from customer 1 | 0 | 1 | 2

So customer 3 is not very similar to customer 1. But customer 2 is similar and may have some interesting information that we can use to suggest movies to customer 1.


### graph based recommendation :

Now that we have the list of users (neighborhood users) similar to the target user (to whom we are recommending), we use the bipartite graph matrix to search for movies that can be recommended to the target user.

The similarity between customer 1 and customer 2 is obvious because, according to their row vectors, they have both watched "movie2" and "movie4". As a result, "movie3" is recommended to customer 1 because customer 2 has watched it too. From the distances we know that customer 1 and customer 3 are not very similar. Therefore, a movie that only customer 3 has watched will not be recommended to customer 1.

The above recommendation approach can be easily implemented in a graph-based model by computing the associations between movie nodes and customer nodes. In this context, the association between two nodes is determined by the existence and length of the path(s) connecting them. Standard collaborative filtering approaches, including both the user-based and item-based approaches, consider only paths with length equal to 3. For instance, the association between customer 1 and movie3 is determined by all paths of length 3 connecting customer 1 and movie3.
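
As a rough sketch of how such length-3 paths could be enumerated, the snippet below walks customer -> movie -> other customer -> target movie over a small watch-list dictionary. It is not the BFS used in this project. The helper name `length3_paths` is hypothetical, and the watch lists are derived from the row vectors p = (0,1,0,1) and q = (0,1,1,1) under the assumption that the columns are ordered movie1 to movie4.

```python
from collections import defaultdict

# Watch lists derived from the row vectors of customer 1 and customer 2,
# assuming the vector columns correspond to movie1..movie4 in order.
watched = {
    "customer1": {"movie2", "movie4"},
    "customer2": {"movie2", "movie3", "movie4"},
}


def length3_paths(watched, customer, target_movie):
    """Return all paths customer -> movie -> other customer -> target_movie."""
    # Invert the watch lists: each movie maps to the customers who watched it.
    watchers = defaultdict(set)
    for user, movies in watched.items():
        for movie in movies:
            watchers[movie].add(user)

    paths = []
    for shared_movie in sorted(watched[customer]):
        if shared_movie == target_movie:
            continue  # a path may not visit the target movie twice
        for other in sorted(watchers[shared_movie]):
            if other != customer and target_movie in watched[other]:
                paths.append((customer, shared_movie, other, target_movie))
    return paths


paths_to_movie3 = length3_paths(watched, "customer1", "movie3")
paths_to_movie1 = length3_paths(watched, "customer1", "movie1")
print(len(paths_to_movie3), len(paths_to_movie1))  # 2 0
```

The number of paths found is what drives the strength of the association: two paths reach movie3, while none reach movie1.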
It is easy to see from Figure 1 that there exist two such paths connecting customer 1 and movie3:

- customer1 - movie2 - customer2 - movie3
- customer1 - movie4 - customer2 - movie3

This strong association leads to the recommendation of movie3 to customer 1. Intuitively, the higher the number of distinct paths connecting a movie node to a customer node, the higher the association between these two nodes, and the more likely the movie is to be recommended to the customer.
Extending the above approach to explore and incorporate transitive associations is straightforward in a graph-based model: by considering paths whose length exceeds 3, the model is able to explore transitive associations.



So we can formalize this as follows:
If there are n paths between (customer i, movie j), then the weight of the association is computed as follows:

Algorithm:

    Take a constant alpha in (0,1)
    weights = 0
    For each path between (customer i, movie j):
        compute the depth (length) of the path
        weights = weights + alpha^depth   (alpha^3 for the length-3 paths used here)

With two paths of depth 3 and alpha = 0.5:

$weights(customer1, movie3) = (0.5)^3 + (0.5)^3 = 0.25$, and $weights(customer1, movie1) = 0$

It is zero because there is no path of length 3 to movie1. Hence we recommend movie3 to customer 1.



# Experimental Evaluation:

One way to evaluate the content based and collaborative techniques is to use the similarity metric as follows:

1.) Compute the Euclidean distances of all users with respect to the target user (the user who is to receive the recommendation). Obtain all the similar users, i.e. users whose distance to the target user is small.

2.) Now run the recommendation algorithm and recommend a movie to the target user. Update the movie list of the target user.

3.) Now compute the Euclidean distances of all the previous similar users w.r.t the updated target user.
Then see how much the cumulative distance has changed.

4.) We compute this change in cumulative distance for both the collaborative and the content based recommendation. The better algorithm is the one whose cumulative distance has decreased the most.


Let's say the collaborative algorithm recommends "movie3" to customer 1
and the content based algorithm recommends "movie1" to customer 1.


The updated vectors are then:


### For collabertive:

- Target user: customer 1, p = (0,1,1,1)
- Similar user: customer 2, q = (0,1,1,1)

Total distance = $\sqrt{(0-0)^{2} + (1-1)^{2} + (1-1)^{2} + (1-1)^{2}}$ = 0

### For content based:

- Target user: customer 1, p = (1,1,0,1)
- Similar user: customer 2, q = (0,1,1,1)

Total distance = $\sqrt{(1-0)^{2} + (1-1)^{2} + (0-1)^{2} + (1-1)^{2}}$ = $\sqrt{2}$

It seems that the content based approach has not fared well, as it failed to give the most similar recommendation. But this is expected behaviour. This evaluation is more helpful when comparing one collaborative filtering algorithm with another collaborative filtering algorithm.

In this program we produced only one movie recommendation for the target user, customer 1, because the program took a long time to execute even for one target user. With a single recommendation, the Euclidean distances are not sufficient to compare the algorithms.

```
The content Based recommendatin is:
284 318 Shawshank Redemption, The (1994) Crime|Drama

The collaberative based recommendatoin is :
15 16 Casino (1995) Crime
```



# Conclusion:

1.) There are many ways to implement collaborative as well as content based filtering. What we have implemented in this project is not the best approach.
It can be improved a lot more.

2.) Although we used the Euclidean distance to find similar users in this experiment, there are better approaches for finding similar / neighbouring users, such as the Manhattan distance and the Pearson correlation, which is considered one of the best for the collaborative filtering problem.

3.) In the project we implemented BFS to obtain the paths from customer to movie in the bipartite graph. As the number of movies increases this may become a problem, because more paths have to be computed. We should explore other options, such as greedy algorithms, which are faster and need less computation but may not give an optimal solution, or iterative deepening search, which limits the depth of the graph search. This would be efficient and interesting to try.

4.) As for testing, it would be more realistic to have data on which movies the customer/user chose after the recommendation. That way we could easily compute precision and recall metrics.



References:

1.) https://en.wikipedia.org/wiki/Recommender_system
2.) https://en.wikipedia.org/wiki/Euclidean_distance
3.) Graph-Based Analysis for E-Commerce Recommendation [http://arizona.openrepository.com/arizona/bitstream/10150/196109/1/azu_etd_1167_sip1_m.pdf]
4.) Collaborative Filtering using Weighted BiPartite Graph Projection [http://snap.stanford.edu/class/cs224w-2013/projects2013/cs224w-038-final.pdf]
5.) Movie Recommendation based on graph traversal Algorithms [http://www2.fiit.stuba.sk/~bielik/publ/abstracts/2013/televido-dexa2013.pdf]
6.) 
http://grouplens.org/blog/


--------------------------------------------------------------------------------
/docs/graph_based_recommendation_system.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chandan-u/graph-based-recommendation-system/897c6419d922fcbfe7f110522e530f3eeeceaf17/docs/graph_based_recommendation_system.pdf


--------------------------------------------------------------------------------
/docs/report.pages:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chandan-u/graph-based-recommendation-system/897c6419d922fcbfe7f110522e530f3eeeceaf17/docs/report.pages


--------------------------------------------------------------------------------