├── .gitignore
├── Archive.zip
├── README.html
├── README.md
├── README.pdf
├── algos.py
├── data.py
├── data
│   ├── README.txt
│   ├── links.csv
│   ├── ml-latest-small.zip
│   ├── movies.csv
│   ├── ratings.csv
│   └── tags.csv
└── docs
    ├── biparte (1).xml
    ├── biparte.png
    ├── biparte.xml
    ├── graph_based_recommendation system.Rmd
    ├── graph_based_recommendation_system.pdf
    └── report.pages
/.gitignore:
--------------------------------------------------------------------------------
1 | *.pyc
2 |
--------------------------------------------------------------------------------
/Archive.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chandan-u/graph-based-recommendation-system/897c6419d922fcbfe7f110522e530f3eeeceaf17/Archive.zip
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 | # Graph-search based Recommendation system
3 |
4 |
5 | This project builds a recommendation system using graph search methods. We compare the different approaches and observe the limitations of each.
6 |
7 |
8 |
9 |
10 |
11 | **Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
12 |
13 |
14 | - [Abstract:](#abstract)
15 | - [Introduction:](#introduction)
16 | - [Data:](#data)
17 | - [Representation of data:](#representation-of-data)
18 | - [Related work:](#related-work)
19 | - [1. Content based approach:](#1-content-based-approach)
20 | - [2. Collaborative filtering:](#2-collaborative-filtering)
21 | - [3. A hybrid of collaborative and content based approach:](#3-a-hybrid-of-collaborative-and-content-based-approach)
22 | - [Recommendation Algorithms:](#recommendation-algorithms)
23 | - [1. The content based Filtering:](#1-the-content-based-filtering)
24 | - [2. Collaborative Filtering:](#2-collaborative-filtering)
25 | - [Euclidian distance Similarity:](#euclidian-distance-similarity)
26 | - [graph based recommendation :](#graph-based-recommendation-)
27 | - [Experimental Evaluation:](#experimental-evaluation)
28 | - [For collabertive:](#for-collabertive)
29 | - [For content based:](#for-content-based)
30 | - [Conclusion:](#conclusion)
31 | - [References:](#references)
32 |
33 |
34 |
35 |
36 |
37 | ## Abstract:
38 |
39 | We implemented a movie recommendation system using the MovieLens dataset from the GroupLens site. The dataset is transformed into a bipartite graph, which allows us to address the problem with graph traversal algorithms instead of the usual recommendation-system approaches. The goal is to implement collaborative filtering as well as content-based recommendation using graph traversal. We evaluate the advantages and shortcomings of this approach and discuss how it can be improved.
40 |
41 | ## Introduction:
42 |
43 | The amount of content generated by social media sites, movies, TV shows, etc. is increasing tremendously, and it is very hard for a user to choose from such a huge pool. The choices are endless, so we need to filter out most of this content and give suggestions to the user.
44 | Recommendation systems are designed to solve this very problem: they give users the best suggestions based on existing data and the user's preferences.
45 |
46 | Recommendation systems are widely used by services such as Netflix to suggest movies, Amazon to suggest products, and music applications such as iTunes and Spotify to suggest songs the user may like to hear next. They can even be applied to domains such as social networking: Facebook uses one to suggest friends.
47 |
48 | In this project we implement a collaborative filtering recommendation system that uses existing data to give better suggestions. We build a bipartite graph from the data set to support graph traversal for the collaborative filtering system.
49 |
50 |
51 | ## Data:
52 |
53 | The data set is obtained from http://grouplens.org/, which hosts a collection of movie ratings from the MovieLens website. This data set covers 100,000 ratings and 1,300 tag applications applied to 9,000 movies by 671 users. For the collaborative filtering system we use all the data except the tags. The data set has two main files: movies.csv and ratings.csv.
54 |
55 | Number of users: 671
56 | Number of movies: >9000
57 | Number of ratings: 100,000
58 |
59 | Files: movies.csv, ratings.csv
60 |
61 |
62 |
63 |
64 | ## Representation of data:
65 |
66 | To facilitate graph traversal and collaborative filtering, the data is transformed into a bipartite graph representation. In a bipartite graph, nodes are divided into two distinct sets. Links between pairs of nodes from different sets are admissible, while links between nodes from the same set are not allowed. In our case the information is whether or not a person (customer) has watched a movie (product) and what rating the customer has given it. Such information can be easily represented as shown in the table below:
67 |
68 |
69 | customer/movie | movie 1 | movie 2 | movie 3 | movie 4
70 | --------------- | -------- | -------- | -------- | -------
71 | customer1 | 0 | 1 | 0 | 1
72 | customer2 | 0 | 1 | 1 | 1
73 | customer3 | 1 | 0 | 1 | 0
74 |
75 |
76 |
77 |
78 | A zero in the above table means that the customer has not watched the movie. A nonzero entry means that the customer has watched the movie, and its numeric value is the rating he/she has given. You can traverse from customer to movie, but you cannot traverse from customer to customer directly. Likewise, you cannot traverse directly from movie to movie.
79 |
80 |
81 | Bipartite matrix translation to graph:
82 |
83 |
84 | 
85 |
86 |
87 |
88 |
89 |
90 | The actual data set has 9123 movies and 671 customers, so the bipartite graph matrix is of size 671 × 9123 (customers × movies).
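
As a minimal sketch of this representation (plain Python lists rather than the NumPy matrix the project uses; the helper names `movies_of` and `customers_of` are illustrative and do not appear in the project code), the small table above can be stored row by row, and the neighbours of a node are the nonzero entries of its row or column:

```python
# Rows are customers 1-3, columns are movies 1-4 (the table above).
matrix = [
    [0, 1, 0, 1],  # customer1
    [0, 1, 1, 1],  # customer2
    [1, 0, 1, 0],  # customer3
]


def movies_of(customer):
    """Movie column indices (0-based) adjacent to a customer row."""
    return [m for m, rating in enumerate(matrix[customer]) if rating != 0]


def customers_of(movie):
    """Customer row indices (0-based) adjacent to a movie column."""
    return [c for c, row in enumerate(matrix) if row[movie] != 0]


print(movies_of(0))     # movies watched by customer1
print(customers_of(2))  # customers who watched movie 3
```

Because edges only ever connect a row index to a column index, every traversal alternates between the two node sets, which is exactly the bipartite property described above.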
91 |
92 |
93 |
94 |
95 |
96 | ## Related work:
97 | In general recommendation systems are implemented in three ways:
98 |
99 | ### 1. Content based approach:
100 | One common approach when designing recommender systems is content-based filtering. Content-based filtering methods are based on a description of the item or product and a profile of the user’s preferences.
101 |
102 | ### 2. Collaborative filtering:
103 | Collaborative filtering methods are based on collecting and analyzing a large amount of information on users’ behaviors, activities or preferences and predicting what users will like based on their similarity to other users. A key advantage of the collaborative filtering approach is that it does not rely on machine-analyzable content and is therefore capable of accurately recommending complex items such as movies without requiring an "understanding" of the item itself.
104 | ### 3. A hybrid of collaborative and content based approach:
105 | In this approach we combine the collaborative and content-based approaches to come up with recommendations.
106 |
107 | Our main focus in this project is to implement the collaborative filtering as well as the content based filtering.
108 |
109 |
110 | ## Recommendation Algorithms:
111 |
112 | ### 1. The content based Filtering:
113 | The idea behind content-based filtering is that when a user likes or watches certain movies, we use the meta-information of those movies to suggest similar movies that share the same properties. For example, the genre of a movie can be used to suggest other movies that belong to the same genre. We can also couple this with the ratings that the user has given to these movies earlier.
114 | meta-properties: genre
115 | user-given-properties: rating
116 |
117 | Here is a simple algorithm
118 |
119 | ```{p}
120 | Algorithm:
121 | Step 1: Choose all the movies that the user watched.
122 | Step 2: Obtain the genres of all the movies that the user watched.
123 |
124 | Step 3: Sum the ratings given by the target user for each genre.
125 |
126 | Step 4: Divide the cumulative rating of each genre by
127 | the number of movies in that genre.
128 |
129 | Step 5: Pick the top three genres using the above computation
130 | and recommend movies that belong to those genres.
131 |
132 | ```
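
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `top_genres` and the viewing history are made up, and genres are assumed to be `|`-separated strings as in `movies.csv`.

```python
from collections import defaultdict


def top_genres(watched, k=3):
    """Rank genres by the user's average rating.

    watched: list of (genres, rating) pairs, genres being an 'A|B' string.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for genres, rating in watched:
        for genre in genres.lower().split("|"):
            totals[genre] += rating   # sum of ratings per genre
            counts[genre] += 1        # number of movies per genre
    averages = {g: totals[g] / counts[g] for g in totals}
    # highest average first; ties broken alphabetically
    return sorted(averages, key=lambda g: (-averages[g], g))[:k]


# Hypothetical viewing history, made up for illustration.
history = [("Crime|Drama", 5.0), ("Comedy", 2.0), ("Drama", 4.0), ("Crime", 3.0)]
print(top_genres(history, k=2))
```

Here drama averages 4.5 and crime 4.0, so those two genres would be used to pick candidate movies.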
133 |
134 | This may not be the best approach, but it accounts for two situations: the user may like a particular genre and still be looking for a good movie in it, or the user may simply like movies from certain genres more than others.
135 |
136 |
137 | ### 2. Collaborative Filtering:
138 | Collaborative filtering can be implemented in two ways: user-based and item-based. In this project we focus on user-user collaborative filtering. To recommend to a user, we try to find other similar users who have watched almost the same movies as the current user. Similarity metrics such as Euclidean distance, Manhattan distance, or Pearson correlation can be used to find such users. In this project we implemented Euclidean distance. We take table 1 as the example and try to recommend movies for customer 1.
139 |
140 | #### Euclidian distance Similarity:
141 | From table 1, let’s assume we are trying to recommend movies for customer 1. Our goal is to find similar users. To do this, we compute the Euclidean distance from customer 1 to each of the other customers.
142 |
143 | For two points p = (p1, p2,..., pn) and q = (q1, q2,..., qn) in Euclidean n-space, the Euclidean distance d from p to q (or from q to p) is given by the Pythagorean formula:
144 |
145 | $d(p, q) = \sqrt{\sum_{i=1}^{n} (q_{i} - p_{i})^{2}}$
146 |
147 | Here the vectors are simply the rows of the bipartite matrix shown in table 1, i.e. the customer 1 row is vector p and any other customer's row, such as customer 2's, is vector q. The distance between customer 1 and customer 2 is computed as follows:
148 |
149 | customer 1, p = (0,1,0,1)
150 | customer 2, q = (0,1,1,1)
151 |
152 | $\sqrt{(0-0)^{2} + (1-1)^{2} + (0-1)^{2} + (1-1)^{2}}$ = 1
153 |
154 |
155 | As you can see, the distance is one. We obtain the distances from customer 1 to all other users and choose the few users (at least three) who are closest to customer 1. Most importantly, we ignore all users whose Euclidean distance to customer 1 is zero, because those users have watched the same set of movies as customer 1 and provide no additional information that would help in recommending to customer 1.
156 |
157 | We compute a distance metric as follows:
158 |
159 | d | customer1 | customer 2 | customer 3
160 | ------ | ---------- | ----------- | ----------
161 | distance from customer1 | 0 | 1 | 2
162 |
163 | So customer 3 is not very similar to customer 1, but customer 2 is similar and may have some interesting information that we can use to suggest movies to customer 1.
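
The distance table can be reproduced with a short sketch. It uses only the standard library and the three rows of table 1; the names `euclidean`, `customers` and `neighbours` are illustrative, not part of the project code.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length rating vectors."""
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))


customers = {
    "customer1": (0, 1, 0, 1),
    "customer2": (0, 1, 1, 1),
    "customer3": (1, 0, 1, 0),
}

target = customers["customer1"]
distances = {name: euclidean(target, row) for name, row in customers.items()}

# Drop zero distances (identical viewing history) and sort closest first.
neighbours = sorted((n for n, d in distances.items() if d > 0), key=distances.get)
print(distances)
print(neighbours)
```

Filtering out zero distances also removes the target user itself, which matches the rule stated above.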
164 |
165 |
166 | #### graph based recommendation :
167 |
168 | Now that we have the list of users (neighborhood users) similar to the target user (to whom we are recommending), we use the bipartite graph matrix to search for movies that can be recommended to the target user.
169 |
170 | The similarity between customer 1 and customer 2 is obvious because they have both watched "movie2" and "movie4". As a result, "movie3" is recommended to customer 1 because customer 2 has watched it too. From the distances we know that customer 1 and customer 3 are not very similar. Therefore "movie1", which has been watched by customer 3, will not be recommended to customer 1.
171 |
172 | The above recommendation approach can be easily implemented in a graph-based model by computing the associations between movie nodes and customer nodes. In this context, the association between two nodes is determined by the existence and length of the path(s) connecting them. Standard collaborative filtering approaches, including both the user-based and item-based approaches, consider only paths of length 3. For instance, the
173 | association between customer 1 and movie3 is determined by all paths of length 3 connecting customer 1 and movie3. From table 1, there are two paths connecting customer1 and movie3:
174 | customer1—movie2—customer2—movie3
175 | and customer1—movie4—customer2—movie3.
176 |
177 | This strong association leads to the recommendation of movie3 to customer1. Intuitively, the higher the number of distinctive paths connecting a product node to a consumer node, the higher the association between these two nodes. The product therefore is more likely to be recommended to the consumer.
178 | Extending the above approach to explore and incorporate transitive associations is straightforward in a graph-based model. By considering paths whose length exceeds 3, the model will be able to explore transitive associations.
179 |
180 |
181 |
182 | So we can formalize this as follows:
183 | If there are n paths between (customer i, movie j), the weight of the association is the sum of one contribution per path, computed as follows:
184 |
185 | Algorithm:
186 |
187 | ```
188 | Take a constant alpha in the interval (0, 1)
189 | weights = 0
190 | For each path between (customer i and movie j):
191 |   compute the depth of the path.
192 |   weights = weights + alpha^depth
193 |
194 | ```
195 |
196 |
197 | $weights(customer1, movie3) = (0.5)^3 + (0.5)^3 = 0.25$ and $weights(customer1, movie1) = 0$
198 |
199 | It is zero because there is no path of length 3 to movie1. Hence we recommend movie 3 to customer 1.
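
The weights above can be checked with a small sketch that counts only the length-3 paths (customer, shared movie, neighbour, candidate movie) in table 1. It assumes alpha = 0.5 as in the example; `count_paths` and `weight` are hypothetical helper names, and this is not the BFS used in `algos.py`.

```python
matrix = [
    [0, 1, 0, 1],  # customer1
    [0, 1, 1, 1],  # customer2
    [1, 0, 1, 0],  # customer3
]


def count_paths(matrix, customer, movie):
    """Count paths customer -> shared movie -> other customer -> movie."""
    paths = 0
    for shared, rating in enumerate(matrix[customer]):
        if rating == 0:
            continue  # the customer has not watched this movie
        for other, row in enumerate(matrix):
            if other != customer and row[shared] != 0 and row[movie] != 0:
                paths += 1
    return paths


def weight(matrix, customer, movie, alpha=0.5):
    """Each length-3 path contributes alpha ** 3 to the association."""
    return count_paths(matrix, customer, movie) * alpha ** 3


print(weight(matrix, 0, 2))  # customer1 and movie3
print(weight(matrix, 0, 0))  # customer1 and movie1
```

Longer paths (length 5, 7, ...) would add further terms with higher powers of alpha, which is how transitive associations enter the score.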
200 |
201 |
202 |
203 | ## Experimental Evaluation:
204 |
205 | One way to evaluate the content-based and collaborative techniques is to use the similarity metric as follows:
206 |
207 | 1.) Compute the Euclidean distances of all users with respect to the target user (to whom we are recommending). Obtain the similar users, i.e. the users whose distance to the target user is small.
208 |
209 | 2.) Run the recommendation algorithm and recommend a movie to the target user. Update the movie list of the target user.
210 |
211 | 3.) Compute the Euclidean distances of the previously selected similar users with respect to the updated target user, and see how much the cumulative distance has changed.
212 |
213 | 4.) Compute this change in cumulative distance for both the collaborative and the content-based recommendation. The better algorithm is the one whose cumulative distance has decreased the most.
214 |
215 |
216 | Let's say the collaborative algorithm recommends "movie3" to customer 1
217 | and the content-based algorithm recommends "movie1" to customer 1.
218 |
219 |
220 | So now updated vectors are:
221 |
222 |
223 | ### For collabertive:
224 |
225 | Target User:
226 | customer 1, p = (0,1,1,1)
227 | Similar Users:
228 | customer 2, q = (0,1,1,1)
229 |
230 | Total distance = $\sqrt{(0-0)^{2} + (1-1)^{2} + (1-1)^{2} + (1-1)^{2}}$ = 0
231 |
232 | ### For content based:
233 |
234 | Target User:
235 | customer 1, p = (1,1,0,1)
236 | Similar Users:
237 | customer 2, q = (0,1,1,1)
238 |
239 | Total distance = $\sqrt{(1-0)^{2} + (1-1)^{2} + (0-1)^{2} + (1-1)^{2}}$ = $\sqrt{2}$
240 |
241 | It seems that the content-based approach has not fared well, as it failed to give the most similar recommendation. But this is expected behaviour. This evaluation is more helpful when comparing one collaborative filtering algorithm with another.
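
The comparison can be sketched as follows, using the toy vectors from table 1 with customer 2 as the only neighbour. The helper names `cumulative_distance` and `with_movie` are illustrative assumptions and are not taken from `algos.py`.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))


def cumulative_distance(target, neighbours):
    """Sum of the distances from the target to each neighbour."""
    return sum(euclidean(target, n) for n in neighbours)


def with_movie(target, movie_index):
    """Copy of the target vector with one recommended movie marked as watched."""
    updated = list(target)
    updated[movie_index] = 1
    return updated


customer1 = [0, 1, 0, 1]
neighbours = [[0, 1, 1, 1]]  # customer2

before = cumulative_distance(customer1, neighbours)
after_collab = cumulative_distance(with_movie(customer1, 2), neighbours)   # movie3
after_content = cumulative_distance(with_movie(customer1, 0), neighbours)  # movie1
print(before, after_collab, after_content)
```

The collaborative recommendation lowers the cumulative distance from 1 to 0, while the content-based one raises it to the square root of 2, as in the calculations above.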
242 |
243 | In this program we computed only one movie recommendation, for the target user customer 1, because the program took a long time to run even for a single target user. With a single recommendation, the Euclidean distances are not sufficient to compare the algorithms.
244 |
245 | ```
246 | The content Based recommendatin is:
247 | 284 318 Shawshank Redemption, The (1994) Crime|Drama
248 |
249 | The collaberative based recommendatoin is :
250 | 15 16 Casino (1995) Crime
251 | ```
252 |
253 |
254 |
255 | ## Conclusion:
256 |
257 | 1.) There are many ways to implement collaborative as well as content-based filtering. What we have implemented in this project is not the best approach and can be improved considerably.
258 |
259 | 2.) Although we used Euclidean distance to find similar users in this experiment, there are other ways to find similar / neighbouring users, such as Manhattan distance or Pearson correlation, which is considered one of the best for the collaborative filtering problem.
260 |
261 | 3.) In the project we implemented BFS to obtain the paths from customer to movie in the bipartite graph. As the number of movies increases this may become a problem, because more paths have to be computed. We should explore alternatives such as greedy algorithms, which are faster and need less computation but may not give an optimal solution, or iterative deepening search, which limits the depth of the graph search. This would be efficient and interesting to try.
262 |
263 | 4.) For testing, it would be more realistic to have data on which movies the customer/user chose after the recommendation. This way we could easily implement precision and recall metrics.
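
If such follow-up data were available, precision and recall could be computed along these lines. This is a sketch only: the project has no such data, and the function name and the movie ids below are made up for illustration.

```python
def precision_recall(recommended, chosen):
    """Precision and recall of a recommendation list against what the user picked."""
    recommended, chosen = set(recommended), set(chosen)
    hits = len(recommended & chosen)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(chosen) if chosen else 0.0
    return precision, recall


# Hypothetical movie ids: four recommendations, three movies actually chosen.
precision, recall = precision_recall([16, 318, 50, 1], [318, 50, 296])
print(precision, recall)
```

Two of the four recommendations were chosen (precision 0.5), and two of the three chosen movies had been recommended (recall 2/3).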
264 |
265 |
266 |
267 |
268 |
269 |
270 |
271 |
272 |
273 |
274 | ## References:
275 |
276 | 1.) https://en.wikipedia.org/wiki/Recommender_system
277 | 2.) https://en.wikipedia.org/wiki/Euclidean_distance
278 | 3.) Graph-Based Analysis for E-Commerce Recommendation [http://arizona.openrepository.com/arizona/bitstream/10150/196109/1/azu_etd_1167_sip1_m.pdf]
279 | 4.) Collaborative Filtering using Weighted BiPartite Graph Projection [http://snap.stanford.edu/class/cs224w-2013/projects2013/cs224w-038-final.pdf]
280 |
281 | 5.) Movie Recommendation based on Graph Traversal Algorithms [http://www2.fiit.stuba.sk/~bielik/publ/abstracts/2013/televido-dexa2013.pdf]
282 | 6.) http://grouplens.org/blog/
283 |
284 |
285 |
--------------------------------------------------------------------------------
/README.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chandan-u/graph-based-recommendation-system/897c6419d922fcbfe7f110522e530f3eeeceaf17/README.pdf
--------------------------------------------------------------------------------
/algos.py:
--------------------------------------------------------------------------------
1 | import operator
2 |
3 | from data import load, load_movielens
4 |
5 | import numpy as np
6 |
7 | # load the biparte graph: np matrix
8 | user_movie_matrix = load()
9 |
10 |
11 |
12 |
13 |
14 | def greedy():
15 |
16 |
17 | """
18 |
19 | This uses greedy approach for the recommendation
20 | It traverses the graph based on the greedy approach : pick highest rating while traversing from node to node
21 | """
22 |
23 |
24 |
25 |     for row in user_movie_matrix:
26 |         # compute for each user
27 |
28 |         maxrating = row.max()
29 |         # column indices of the movies this user gave the highest rating
30 |         print(np.nonzero(row == maxrating)[0])
31 |
32 |
33 |
34 |
35 |
36 |
37 |
38 | def bfs_paths(graph, start, goal):
39 |     """Yield paths from movie column `start` to movie column `goal` (users are rows)."""
40 |     queue = [(start, [start])]
41 |     while queue:
42 |         (vertex, path) = queue.pop(0)
43 |         # children of a movie: the users (rows) with a nonzero rating in its column
44 |         column = graph[:, vertex]
45 |         for child_index in column.nonzero()[0]:
46 |             # children of a user: the movies (columns) that user has rated
47 |             row = graph[child_index]
48 |             for row_index in row.nonzero()[0]:
49 |                 if goal == row_index:
50 |                     yield path + [child_index]
51 |                 elif row_index not in path:
52 |                     # skip movies already on this path so the search cannot cycle
53 |                     queue.append((row_index, path + [row_index]))
54 |
55 |
56 |
57 |
58 |
59 |
60 |
61 |
62 |
63 |
64 |
65 | def graph_search(biparte_matrix):
66 |
67 | """
68 | the first row of the biparte matrix is : targetUser
69 |
70 | the second to last row is the closest neighbors
71 |
72 | Now using graph search we have to find recommend a movie
73 |
74 | Since its a biparte graph:
75 | 1. First move from customer to movie
76 | 2. move from movie to customer
77 |
78 |
79 | """
80 |
81 |
82 |
83 | # the user who is being recommeneded
84 | target_vector = biparte_matrix[0]
85 |
86 | # get indices of all /some of the indices of the watched movies of the target user
87 |
88 | #bfs_roots = [i for i, e in enumerate(target_vector) if e != 0]
89 |
90 | bfs_all_roots = np.argpartition(target_vector, -10 )[-10:]
91 | bfs_roots = []
92 |
93 | for root in bfs_all_roots:
94 | if target_vector[root] != 0 :
95 | bfs_roots.append(root)
96 |
97 |
98 |     data = {}
99 |     # items are the columns (movies) of the matrix, so iterate over shape[1]
100 |     for item_index in range(biparte_matrix.shape[1]):
101 |
102 |         if target_vector[item_index] == 0:
103 |             # collect the paths for this item only
104 |             paths = []
105 |             for root in bfs_roots:
106 |                 paths.extend(bfs_paths(biparte_matrix, root, item_index))
107 |             data[item_index] = paths
108 |
109 |     return data
110 | def user_base_collabertive_filtering():
111 |
112 |
113 | # find euclidian distance of first two users w.r.t all users
114 | # note: distance betwee two same vecotrs is zero
115 |
116 | for user in range(2):
117 | distances = []
118 | for anotheruser in range(user_movie_matrix.shape[0]):
119 |
120 | distance = np.linalg.norm(user_movie_matrix[user] - user_movie_matrix[anotheruser] )
121 | distances.append(distance)
122 |
123 |
124 |         # get the similar users
125 |         # indices of the 20 users with the smallest distances, closest first
126 |         # zero distances (identical vectors, including the user itself) are skipped
127 |         closest_all_indices = np.argsort(distances)[:20]
128 |         closest_indices = []
129 |         for index in closest_all_indices:
130 |             if distances[index] != 0:
131 |                 closest_indices.append(index)
132 |
133 |
134 |
135 |
136 |         # consider the target user plus the three closest neighbors for the recommendation
137 |
138 | closest_indices.insert(0, user)
139 | biparte_matrix = user_movie_matrix[closest_indices[0:4]]
140 | # now execute the graph search for recommendation
141 |
142 | paths = graph_search(biparte_matrix)
143 |
144 |
145 |
146 |
147 |         # compute the weights of paths
148 |         data = {}
149 |         for item in paths.keys():
150 |             weight = 0
151 |             allpaths = paths[item]
152 |             for path in allpaths:
153 |                 depth = len(path)
154 |                 weight = weight + (0.5)**depth
155 |             data[item] = weight
156 |
157 |         # find the movie with the greatest weight
158 |         fav_movie = max(data.items(), key=operator.itemgetter(1))[0]
159 |
160 |     return fav_movie  # recommendation for the last user processed
161 |
162 | def get_movie_avg_rating(id, ratings):
163 | """
164 | return avg rating of the movie
165 |
166 | """
167 |
168 |
169 |     # average the ratings of the rows that match this movie
170 |     movie_ratings = ratings[ratings.movieId == id]["rating"]
171 |     if movie_ratings.empty:
172 |         return 0
173 |     return movie_ratings.mean()
174 |
175 |
176 |
177 |
178 |
179 | def get_user_movie_rating( id, ratings, target_user):
180 | """
181 | return user rating of the movie
182 |
183 | """
184 |
185 |
186 |     # select the target user's rating for this movie (0 if not rated)
187 |     df = ratings[(ratings.userId == target_user) & (ratings.movieId == id)]
188 |     if df.empty:
189 |         return 0
190 |     return df["rating"].iloc[0]
191 |
192 |
193 |
194 |
195 |
196 |
197 |
198 |
199 | def content_based_filtering():
200 | """
201 | For content based filtering we wont directly use the biparte graph. We will be directly querying the file that is loaded
202 | using the pandas dataframe.
203 |
204 | The idea behind content based filtering is : when a user likes certain movies, using the meta informaton of the movies that the user watched we
205 | will suggest similar movies which may have the same properties.
206 |
207 |
208 | meta-properties: genre
209 | user-given-properties: rating
210 | Algorithm:
211 | choose all the movies that the user watched and arrange in descending order of ratings
212 | obtain genre of all the movies that user watched
213 |
214 | sum all the ratings for each genre
215 | Now pick the top three generes and suggest based on that
216 |
217 |
218 | """
219 |
220 |
221 |
222 | movies, ratings = load_movielens()
223 |
224 |
225 | # considerng customer 1
226 | target_user = 1
227 |
228 |     # list of movies watched by the target user
229 |     movie_ids = list(ratings[ratings.userId == target_user].movieId)
230 |
231 |
232 |
233 |
234 |
235 | # compute the average of the ratings for each genre watched by the user
236 | genre_dict = {}
237 | genre_count_dict = {}
238 | genre_ratio = {}
239 | for id in movie_ids:
240 | df = movies[movies.movieId == id]
241 | genres = []
242 | for index, row in df.iterrows():
243 | genres = row["genres"]
244 | genres = genres.lower()
245 | genres = genres.split('|')
246 |
247 |         rating = get_user_movie_rating(id, ratings, target_user)
248 |
249 |         for genre in genres:
250 |             if genre in genre_dict:
251 |                 genre_dict[genre] = genre_dict[genre] + rating
252 |                 genre_count_dict[genre] = genre_count_dict[genre] + 1
253 |             else:
254 |                 genre_dict[genre] = rating
255 |                 # first movie seen in this genre
256 |                 genre_count_dict[genre] = 1
257 |
258 |     for key in genre_dict.keys():
259 |         genre_dict[key] = genre_dict[key] / float(genre_count_dict[key])
260 |
261 |     fav_genre = max(genre_dict.items(), key=operator.itemgetter(1))[0]
262 |
263 |     # get the best movie from that genre:
264 |
265 |     fav_movie_id = 0
266 |     best_rating = 0
267 |     for index, row in movies.iterrows():
268 |         genres = row['genres']
269 |         genres = genres.lower()
270 |         genres = genres.split('|')
271 |         if fav_genre in genres:
272 |             movie_rating = get_movie_avg_rating(row["movieId"], ratings)
273 |             if movie_rating > best_rating:
274 |                 fav_movie_id = row["movieId"]
275 |                 best_rating = movie_rating
276 |
277 |     print("content based recommended movie is:")
278 |     print(movies[movies.movieId == fav_movie_id])
279 |
280 |
281 |     return fav_movie_id
282 |
283 |
284 | def evaluation():
285 |
286 | """
287 | lets only compute for one target user: As it is taking too long
288 | """
289 |
290 | fav_mov_id_con = content_based_filtering()
291 |
292 | fav_mov_id_col = user_base_collabertive_filtering()
293 |
294 |
295 | for user in range(2):
296 | distances = []
297 | for anotheruser in range(user_movie_matrix.shape[0]):
298 |
299 | distance = np.linalg.norm(user_movie_matrix[user] - user_movie_matrix[anotheruser] )
300 | distances.append(distance)
301 |
302 |
303 |         # get the similar users
304 |         # indices of the 20 users with the smallest distances, closest first
305 |         # zero distances (identical vectors, including the user itself) are skipped
306 |         closest_all_indices = np.argsort(distances)[:20]
307 |         closest_indices = []
308 |         for index in closest_all_indices:
309 |             if distances[index] != 0:
310 |                 closest_indices.append(index)
311 |
312 |
313 |
314 |
315 | # consider the first five closest neighbors for the recommendation
316 |
317 | closest_indices.insert(0, user)
318 | biparte_matrix = user_movie_matrix[closest_indices[0:4]]
319 |
320 |
321 |     # NOTE: both vectors are still the unmodified row of the target user
322 |     target_col_vector = user_movie_matrix[0]
323 |     target_con_vector = user_movie_matrix[0]
324 |     score_col = 0
325 |     score_con = 0
326 |     for row in biparte_matrix:
327 |         # the Euclidean distance is the norm of the difference vector
328 |         score_col = score_col + np.linalg.norm(row - target_col_vector)
329 |         score_con = score_con + np.linalg.norm(row - target_con_vector)
330 |
331 |     print("The collaborative filtering sum of distances is: %s" % score_col)
332 |     print("The content based filtering sum of distances is: %s" % score_con)
333 |
334 |
335 |
336 | evaluation()
337 |
--------------------------------------------------------------------------------
/data.py:
--------------------------------------------------------------------------------
1 |
2 | import pandas as pd
3 | import numpy as np
4 |
5 | def load_movielens():
6 |
7 | """
8 | load the three csv files:
9 | 1. movies.csv: movieId,title,genres
10 | 2. ratings.csv: userId,movieId,rating,timestamp
11 | 3. tags.csv: userId,movieId,tag,timestamp ( This is not needed for now)
12 |
13 | """
14 | #movies_csv = np.genfromtxt('./data/movies.csv', delimeter=',')
15 |
16 | #ratings_csv = np.genfromtxt('./data/ratings.csv',delimeter=',')
17 |
18 | #tags_csv = np.genfromtxt('./data/tags.csv', delimeter=',')
19 |
20 | movies = pd.read_csv('./data/movies.csv', sep=',')
21 |
22 | ratings = pd.read_csv('./data/ratings.csv', sep=',')
23 |
24 |
25 |
26 | return movies, ratings
27 |
28 |
29 | def biparteMatrix(movies_frame, ratings_frame):
30 |
31 | """
32 |
33 | convert the movies data frame into userid-movies biparte adjacency graph matrix
34 |
35 | """
36 |
37 |
38 | user_ids = list(ratings_frame.userId.unique())
39 | movie_ids = list(movies_frame.movieId.unique())
40 |
41 | numberOfUsers = len(user_ids)
42 |
43 | numberOfMovies = len(movie_ids)
44 |
45 |
46 |
47 |     # initialize a numpy matrix of shape numberOfUsers x numberOfMovies
48 |
49 | user_movie_biparte = np.zeros((numberOfUsers, numberOfMovies))
50 |
51 |
52 | for name, group in ratings_frame.groupby(["userId", "movieId"]):
53 |
54 | #print name
55 | #print group
56 |
57 | # name is a tuple (userId, movieId)
58 |
59 | userId, movieId = name
60 |
61 | user_index = user_ids.index(userId)
62 | movie_index = movie_ids.index(movieId)
63 | user_movie_biparte[user_index, movie_index] = group[["rating"]].values[0,0]
64 |
65 | return user_movie_biparte
66 |
67 |
68 | def load():
69 |
70 |
71 | """
72 |     convert the csv files into the required data formats
73 |
74 |     ratings_csv : convert into user-movieId bipartite sparse adjacency graph matrix
75 |
76 |     tags_csv: not required currently
77 |
78 |     movies_csv: convert into clusters of data
79 | """
80 |
81 | # load csv into dataframes
82 | movies, ratings = load_movielens()
83 |
84 |
85 |     # convert the ratings dataframe into the user-movieId bipartite adjacency matrix
86 | matrix = biparteMatrix(movies, ratings)
87 |
88 | return matrix
89 |
90 |
91 |
92 |
93 |
94 |
95 |
96 | if __name__ == "__main__":
97 |     load()  # run only when executed directly, not when imported by algos.py
--------------------------------------------------------------------------------
/data/README.txt:
--------------------------------------------------------------------------------
1 | Summary
2 | =======
3 |
4 | This dataset (ml-latest-small) describes 5-star rating and free-text tagging activity from [MovieLens](http://movielens.org), a movie recommendation service. It contains 100004 ratings and 1296 tag applications across 9125 movies. These data were created by 671 users between January 09, 1995 and October 16, 2016. This dataset was generated on October 17, 2016.
5 |
6 | Users were selected at random for inclusion. All selected users had rated at least 20 movies. No demographic information is included. Each user is represented by an id, and no other information is provided.
7 |
8 | The data are contained in the files `links.csv`, `movies.csv`, `ratings.csv` and `tags.csv`. More details about the contents and use of all these files follows.
9 |
10 | This is a *development* dataset. As such, it may change over time and is not an appropriate dataset for shared research results. See available *benchmark* datasets if that is your intent.
11 |
12 | This and other GroupLens data sets are publicly available for download at .
13 |
14 |
15 | Usage License
16 | =============
17 |
18 | Neither the University of Minnesota nor any of the researchers involved can guarantee the correctness of the data, its suitability for any particular purpose, or the validity of results based on the use of the data set. The data set may be used for any research purposes under the following conditions:
19 |
20 | * The user may not state or imply any endorsement from the University of Minnesota or the GroupLens Research Group.
21 | * The user must acknowledge the use of the data set in publications resulting from the use of the data set (see below for citation information).
22 | * The user may redistribute the data set, including transformations, so long as it is distributed under these same license conditions.
23 | * The user may not use this information for any commercial or revenue-bearing purposes without first obtaining permission from a faculty member of the GroupLens Research Project at the University of Minnesota.
24 | * The executable software scripts are provided "as is" without warranty of any kind, either expressed or implied, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. The entire risk as to the quality and performance of them is with you. Should the program prove defective, you assume the cost of all necessary servicing, repair or correction.
25 |
26 | In no event shall the University of Minnesota, its affiliates or employees be liable to you for any damages arising out of the use or inability to use these programs (including but not limited to loss of data or data being rendered inaccurate).
27 |
28 | If you have any further questions or comments, please email
29 |
30 |
31 | Citation
32 | ========
33 |
34 | To acknowledge use of the dataset in publications, please cite the following paper:
35 |
36 | > F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4, Article 19 (December 2015), 19 pages. DOI=
37 |
38 |
39 | Further Information About GroupLens
40 | ===================================
41 |
42 | GroupLens is a research group in the Department of Computer Science and Engineering at the University of Minnesota. Since its inception in 1992, GroupLens's research projects have explored a variety of fields including:
43 |
44 | * recommender systems
45 | * online communities
46 | * mobile and ubiquitous technologies
47 | * digital libraries
48 | * local geographic information systems
49 |
50 | GroupLens Research operates a movie recommender based on collaborative filtering, MovieLens, which is the source of these data. We encourage you to visit to try it out! If you have exciting ideas for experimental work to conduct on MovieLens, send us an email at - we are always interested in working with external collaborators.
51 |
52 |
53 | Content and Use of Files
54 | ========================
55 |
56 | Formatting and Encoding
57 | -----------------------
58 |
59 | The dataset files are written as [comma-separated values](http://en.wikipedia.org/wiki/Comma-separated_values) files with a single header row. Columns that contain commas (`,`) are escaped using double-quotes (`"`). These files are encoded as UTF-8. If accented characters in movie titles or tag values (e.g. Misérables, Les (1995)) display incorrectly, make sure that any program reading the data, such as a text editor, terminal, or script, is configured for UTF-8.
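
The quoting rule above can be checked with Python's standard `csv` module. This is a minimal sketch, not part of the dataset tooling: the two data rows are copied from `tags.csv`, while the helper name `read_rows` and the commented-out file path are assumptions made for illustration.

```python
import csv
import io

# Two rows copied from tags.csv: one tag with embedded commas,
# one tag with doubled double-quotes inside a quoted field.
SAMPLE = (
    'userId,movieId,tag,timestamp\n'
    '402,260,"space epic, science fiction, hero\'s journey",1443393664\n'
    '353,7376,"The Rocks ""finest"" work need I say more?",1140389511\n'
)

def read_rows(handle):
    """Parse a CSV stream with a header row into a list of dicts."""
    return list(csv.DictReader(handle))

rows = read_rows(io.StringIO(SAMPLE))
print(rows[0]["tag"])
print(rows[1]["tag"])

# For a real file, pass the encoding explicitly (the path is an assumption):
# with open("data/tags.csv", encoding="utf-8", newline="") as fh:
#     rows = read_rows(fh)
```

Splitting lines on `,` by hand would break both sample rows, which is why a CSV parser is the safer choice for `tags.csv` and `movies.csv`.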
60 |
61 | User Ids
62 | --------
63 |
64 | MovieLens users were selected at random for inclusion. Their ids have been anonymized. User ids are consistent between `ratings.csv` and `tags.csv` (i.e., the same id refers to the same user across the two files).
65 |
66 | Movie Ids
67 | ---------
68 |
69 | Only movies with at least one rating or tag are included in the dataset. These movie ids are consistent with those used on the MovieLens web site (e.g., id `1` corresponds to the URL ). Movie ids are consistent between `ratings.csv`, `tags.csv`, `movies.csv`, and `links.csv` (i.e., the same id refers to the same movie across these four data files).
70 |
71 |
72 | Ratings Data File Structure (ratings.csv)
73 | -----------------------------------------
74 |
75 | All ratings are contained in the file `ratings.csv`. Each line of this file after the header row represents one rating of one movie by one user, and has the following format:
76 |
77 | userId,movieId,rating,timestamp
78 |
79 | The lines within this file are ordered first by userId, then, within user, by movieId.
80 |
81 | Ratings are made on a 5-star scale, with half-star increments (0.5 stars - 5.0 stars).
82 |
83 | Timestamps represent seconds since midnight Coordinated Universal Time (UTC) of January 1, 1970.
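
As a rough illustration of the rating scale and timestamp convention described above, the sketch below parses one line of `ratings.csv` into typed fields. The function name `parse_rating` and the sample row are hypothetical; the row is made up and does not come from the dataset.

```python
from datetime import datetime, timezone

def parse_rating(line):
    """Split one data line of ratings.csv into typed fields."""
    user_id, movie_id, rating, timestamp = line.strip().split(",")
    rating = float(rating)
    # Ratings run from 0.5 to 5.0 in half-star steps,
    # so rating * 2 must be a whole number between 1 and 10.
    doubled = rating * 2
    if doubled != int(doubled) or not 1 <= doubled <= 10:
        raise ValueError(f"rating outside the half-star scale: {rating}")
    # Timestamps are seconds since 1970-01-01 00:00 UTC.
    rated_at = datetime.fromtimestamp(int(timestamp), tz=timezone.utc)
    return int(user_id), int(movie_id), rating, rated_at

# Made-up example row, not taken from the dataset.
record = parse_rating("1,31,2.5,86400")
print(record)
```

Passing `tz=timezone.utc` keeps the result independent of the machine's local time zone.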
84 |
85 | Tags Data File Structure (tags.csv)
86 | -----------------------------------
87 |
88 | All tags are contained in the file `tags.csv`. Each line of this file after the header row represents one tag applied to one movie by one user, and has the following format:
89 |
90 | userId,movieId,tag,timestamp
91 |
92 | The lines within this file are ordered first by userId, then, within user, by movieId.
93 |
94 | Tags are user-generated metadata about movies. Each tag is typically a single word or short phrase. The meaning, value, and purpose of a particular tag are determined by each user.
95 |
96 | Timestamps represent seconds since midnight Coordinated Universal Time (UTC) of January 1, 1970.
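
Because tags are free text, the same idea often appears with different capitalization. The sketch below builds a tag-to-movies index from rows copied out of `tags.csv`. Folding case before grouping is an assumption of this example, not something the dataset prescribes, and `movies_by_tag` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict

# Rows copied from tags.csv; note "Quirky"/"quirky" and "Overrated"/"overrated".
SAMPLE = (
    "userId,movieId,tag,timestamp\n"
    "164,608,Quirky,1179031045\n"
    "212,3809,quirky,1253931216\n"
    "212,8928,quirky,1253926728\n"
    "212,3882,Overrated,1253932878\n"
    "212,37733,overrated,1253929532\n"
)

def movies_by_tag(handle):
    """Map each case-folded tag to the set of movie ids it was applied to."""
    index = defaultdict(set)
    for row in csv.DictReader(handle):
        index[row["tag"].casefold()].add(int(row["movieId"]))
    return index

index = movies_by_tag(io.StringIO(SAMPLE))
print(sorted(index["quirky"]))
```

Case folding does not merge spelling variants such as `time travel` and `time-travel`; those would need a separate normalization step.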
97 |
98 | Movies Data File Structure (movies.csv)
99 | ---------------------------------------
100 |
101 | Movie information is contained in the file `movies.csv`. Each line of this file after the header row represents one movie, and has the following format:
102 |
103 | movieId,title,genres
104 |
105 | Movie titles are entered manually or imported from , and include the year of release in parentheses. Errors and inconsistencies may exist in these titles.
106 |
107 | Genres are a pipe-separated list, and are selected from the following:
108 |
109 | * Action
110 | * Adventure
111 | * Animation
112 | * Children's
113 | * Comedy
114 | * Crime
115 | * Documentary
116 | * Drama
117 | * Fantasy
118 | * Film-Noir
119 | * Horror
120 | * Musical
121 | * Mystery
122 | * Romance
123 | * Sci-Fi
124 | * Thriller
125 | * War
126 | * Western
127 | * (no genres listed)
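
The title and genre conventions above could be unpacked roughly as follows. This is a sketch under stated assumptions: both sample rows are invented for illustration, `parse_movie` and `YEAR_RE` are hypothetical names, and the year pattern assumes the release year is the last parenthesized group in the title.

```python
import csv
import io
import re

# Invented rows that follow the movies.csv layout; not real dataset entries.
SAMPLE = (
    "movieId,title,genres\n"
    "1,Example Movie (1999),Adventure|Animation|Comedy\n"
    '2,"Example, The (2001)",(no genres listed)\n'
)

# Assumes the release year is a four-digit group in parentheses at the end of the title.
YEAR_RE = re.compile(r"\((\d{4})\)\s*$")

def parse_movie(row):
    """Turn one movies.csv row into a dict with a genre list and release year."""
    if row["genres"] == "(no genres listed)":
        genres = []
    else:
        genres = row["genres"].split("|")
    match = YEAR_RE.search(row["title"])
    year = int(match.group(1)) if match else None
    return {
        "movieId": int(row["movieId"]),
        "title": row["title"],
        "year": year,
        "genres": genres,
    }

movies = [parse_movie(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
print(movies)
```

Since the README warns that titles may contain errors and inconsistencies, `year` falls back to `None` when no year is found instead of raising.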
128 |
129 | Links Data File Structure (links.csv)
130 | ---------------------------------------
131 |
132 | Identifiers that can be used to link to other sources of movie data are contained in the file `links.csv`. Each line of this file after the header row represents one movie, and has the following format:
133 |
134 | movieId,imdbId,tmdbId
135 |
136 | movieId is an identifier for movies used by . E.g., the movie Toy Story has the link .
137 |
138 | imdbId is an identifier for movies used by . E.g., the movie Toy Story has the link .
139 |
140 | tmdbId is an identifier for movies used by . E.g., the movie Toy Story has the link .
141 |
142 | Use of the resources listed above is subject to the terms of each provider.
143 |
144 | Cross-Validation
145 | ----------------
146 |
147 | Prior versions of the MovieLens dataset included either pre-computed cross-folds or scripts to perform this computation. We no longer bundle either of these features with the dataset, since most modern toolkits provide this as a built-in feature. If you wish to learn about standard approaches to cross-fold computation in the context of recommender systems evaluation, see [LensKit](http://lenskit.org) for tools, documentation, and open-source code examples.
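
For readers without a toolkit at hand, a plain k-fold split can be sketched in a few lines of standard-library Python. This is not the LensKit procedure and not how earlier MovieLens releases built their folds; `k_fold_splits` is a hypothetical helper and the ratings are made-up placeholder rows.

```python
import random

def k_fold_splits(records, k=5, seed=0):
    """Yield (train, test) pairs; each record lands in exactly one test fold."""
    if not 2 <= k <= len(records):
        raise ValueError("k must be between 2 and the number of records")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducible folds
    folds = [shuffled[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [rec for j, fold in enumerate(folds) if j != i for rec in fold]
        yield train, test

# 20 made-up (userId, movieId, rating) rows, not taken from the dataset.
ratings = [(u, m, 3.0) for u in range(1, 5) for m in range(1, 6)]
splits = list(k_fold_splits(ratings, k=5))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

This splits individual ratings at random. Recommender evaluations often partition by user or by time instead, which this sketch does not attempt.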
148 |
--------------------------------------------------------------------------------
/data/ml-latest-small.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chandan-u/graph-based-recommendation-system/897c6419d922fcbfe7f110522e530f3eeeceaf17/data/ml-latest-small.zip
--------------------------------------------------------------------------------
/data/tags.csv:
--------------------------------------------------------------------------------
1 | userId,movieId,tag,timestamp
2 | 15,339,sandra 'boring' bullock,1138537770
3 | 15,1955,dentist,1193435061
4 | 15,7478,Cambodia,1170560997
5 | 15,32892,Russian,1170626366
6 | 15,34162,forgettable,1141391765
7 | 15,35957,short,1141391873
8 | 15,37729,dull story,1141391806
9 | 15,45950,powerpoint,1169616291
10 | 15,100365,activist,1425876220
11 | 15,100365,documentary,1425876220
12 | 15,100365,uganda,1425876220
13 | 23,150,Ron Howard,1148672905
14 | 68,2174,music,1249808064
15 | 68,2174,weird,1249808102
16 | 68,8623,Steve Martin,1249808497
17 | 73,107999,action,1430799184
18 | 73,107999,anime,1430799184
19 | 73,107999,kung fu,1430799184
20 | 73,111624,drama,1431584497
21 | 73,111624,indie,1431584497
22 | 73,111624,love,1431584497
23 | 73,130682,b movie,1432523704
24 | 73,130682,comedt,1432523704
25 | 73,130682,horror,1432523704
26 | 77,1199,Trilogy of the Imagination,1163220043
27 | 77,2968,Gilliam,1163220138
28 | 77,2968,Trilogy of the Imagination,1163220039
29 | 77,4467,Trilogy of the Imagination,1163220065
30 | 77,4911,Gilliam,1163220167
31 | 77,5909,Takashi Miike,1163219591
32 | 77,47465,Gilliam,1163220186
33 | 84,296,intense,1429911417
34 | 84,296,r:violence,1429911417
35 | 84,296,tarantino,1429911417
36 | 91,4388,parody,1448813502
37 | 94,1131,emotional,1291781542
38 | 94,1131,tragedy,1291781538
39 | 94,64957,original plot,1291781246
40 | 94,74458,Predictable,1291780920
41 | 106,48711,CHRISTIAN,1215923364
42 | 132,4189,jesus,1367909949
43 | 132,4612,jesus,1367909949
44 | 132,6683,bollywood,1367909913
45 | 132,6986,jesus,1367909949
46 | 132,27255,No progress,1283581045
47 | 132,27255,Too slow,1283581045
48 | 132,27255,Views,1283581045
49 | 138,260,cult classic,1440379022
50 | 138,260,Science Fiction,1440379018
51 | 138,1258,cult film,1440380361
52 | 138,1258,jack nicholson,1440380355
53 | 138,1258,psychological,1440380357
54 | 138,1258,Stanley Kubrick,1440380352
55 | 138,1704,genius,1440380467
56 | 138,1704,intellectual,1440380463
57 | 138,1704,mathematics,1440380466
58 | 138,1704,psychology,1440380470
59 | 138,4226,Mindfuck,1440380125
60 | 138,4226,nonlinear,1440380113
61 | 138,4226,psychology,1440380115
62 | 138,4226,twist ending,1440380118
63 | 138,4995,genius,1440380440
64 | 138,4995,intelligent,1440380446
65 | 138,4995,math,1440380438
66 | 138,4995,mathematics,1440380436
67 | 138,4995,twist ending,1440380448
68 | 138,48780,Christopher Nolan,1440380053
69 | 138,48780,complicated,1440380062
70 | 138,48780,Hugh Jackman,1440380072
71 | 138,48780,nonlinear,1440380067
72 | 138,48780,psychological,1440380064
73 | 138,48780,twist ending,1440380047
74 | 138,79132,alternate reality,1440380233
75 | 138,79132,Christopher Nolan,1440380255
76 | 138,79132,intellectual,1440380251
77 | 138,79132,mindfuck,1440380252
78 | 138,79132,philosophy,1440380242
79 | 138,79132,sci-fi,1440380239
80 | 138,79132,twist ending,1440380245
81 | 138,109487,artificial intelligence,1440379984
82 | 138,109487,Christopher Nolan,1440380001
83 | 138,109487,good science,1440379977
84 | 138,109487,interesting ideea,1440379991
85 | 138,109487,philosophical issues,1440379996
86 | 138,109487,physics,1440379925
87 | 138,109487,relativity,1440380500
88 | 138,109487,sci-fi,1440380008
89 | 138,109487,sentimental,1440379993
90 | 138,109487,space,1440379974
91 | 138,109487,time travel,1440379981
92 | 138,109487,time-travel,1440379968
93 | 149,121231,eerie,1436920551
94 | 149,121231,friendship,1436920551
95 | 149,121231,teenagers,1436920551
96 | 152,52319,World War II,1335900622
97 | 164,608,Quirky,1179031045
98 | 176,5445,Tom Cruise,1378384753
99 | 176,7458,Brad Pitt,1341055445
100 | 176,8972,Nicolas Cage,1341055378
101 | 176,69640,Johnny Depp,1340916254
102 | 176,104841,sexist,1384107184
103 | 179,260,nerdy,1436669973
104 | 179,260,Science Fiction,1436669968
105 | 187,2082,Emilio Estevez,1233517175
106 | 200,260,critically acclaimed,1437932756
107 | 200,260,Science Fiction,1437932763
108 | 200,80551,strong female presence,1438023607
109 | 212,1061,emotional,1253930128
110 | 212,1061,revenge,1253930131
111 | 212,1061,true story,1253930149
112 | 212,1732,cult film,1253929373
113 | 212,1732,dark comedy,1253929372
114 | 212,1732,satirical,1253929376
115 | 212,2359,British,1253930034
116 | 212,2641,childish plot,1253930390
117 | 212,2641,plot holes,1253930395
118 | 212,2641,superhero,1253930399
119 | 212,3257,Kevin Costner,1253930539
120 | 212,3257,snorefest,1253930543
121 | 212,3275,vigilantism,1218405594
122 | 212,3300,anti-hero,1253929917
123 | 212,3300,Cole Hauser,1253929922
124 | 212,3300,Vin Diesel,1253929910
125 | 212,3301,Amanda Peet,1253929798
126 | 212,3301,Matthew Perry,1253929802
127 | 212,3301,Rosanna Arquette,1253929813
128 | 212,3809,Bill Murray,1253931205
129 | 212,3809,quirky,1253931216
130 | 212,3882,cheerleading,1253932873
131 | 212,3882,Eliza Dushku,1253932876
132 | 212,3882,Overrated,1253932878
133 | 212,4069,standard romantic comedy,1253933882
134 | 212,5219,video game adaptation,1253930587
135 | 212,5507,play enough video games and you can become an NSA agent,1253930345
136 | 212,5507,ridiculous training sequence,1253930342
137 | 212,7373,Guillermo del Toro,1253929290
138 | 212,7373,steampunk,1253929296
139 | 212,7373,superhero,1253929293
140 | 212,8861,Oded Fehr,1253930624
141 | 212,8861,post-apocalyptic,1253930636
142 | 212,8861,Sienna Guillory,1253930633
143 | 212,8861,zombies,1253930644
144 | 212,8928,Beautiful Woman,1253926735
145 | 212,8928,campy,1253926723
146 | 212,8928,irreverent,1253926726
147 | 212,8928,quirky,1253926728
148 | 212,8928,Roman Polanski,1253926730
149 | 212,27904,good animation,1253931241
150 | 212,27904,interesting concept - bad execution,1253931234
151 | 212,27904,surrealism,1253931244
152 | 212,34405,aliens,1253928597
153 | 212,34405,black comedy,1253928590
154 | 212,34405,Firefly,1253928585
155 | 212,37733,disappointing,1253929514
156 | 212,37733,overrated,1253929532
157 | 212,37733,Viggo Mortensen,1253929518
158 | 212,48394,dark,1218405627
159 | 212,48394,fairytale,1218405627
160 | 212,60684,alternate reality,1253926526
161 | 212,60684,diluted version of comic,1253926511
162 | 212,60684,dystopia,1253926517
163 | 212,60684,stylized,1253926519
164 | 212,63992,high school,1253932469
165 | 212,63992,Kristen Stewart,1253932484
166 | 212,63992,Robert Pattinson,1253932481
167 | 212,63992,Teen movie,1253932459
168 | 212,63992,Vampire Human Love,1253932463
169 | 212,64957,adapted from:book,1253929184
170 | 212,64957,Aging Disorder,1253929189
171 | 212,64957,Brad Pitt,1253929141
172 | 212,64957,cinematography,1253929177
173 | 212,64957,drama,1253929173
174 | 212,64957,original plot,1253929180
175 | 212,64957,slow parts,1253929162
176 | 212,64957,touching,1253929169
177 | 212,66097,alternate universe,1253926354
178 | 212,66097,author:Neil Gaiman,1253926328
179 | 212,66097,claymation,1253926330
180 | 212,66097,Dakota Fanning,1253926349
181 | 212,66097,dark,1253926337
182 | 212,66097,fairy tale,1253926364
183 | 212,66097,Neil Gaiman,1253926360
184 | 212,66934,Captain Hammer,1253926165
185 | 212,66934,joss whedon,1253926158
186 | 212,66934,mad scientist,1253926186
187 | 212,66934,musical,1253926178
188 | 212,66934,Nathan Fillion,1253926182
189 | 212,66934,Neil Patrick Harris,1253926160
190 | 212,66934,parody,1253926168
191 | 212,68157,Brad Pitt,1253926443
192 | 212,68157,ending,1253926451
193 | 212,68157,gratuitous violence,1253926429
194 | 212,68157,Quentin Tarantino,1253926404
195 | 212,68157,satire,1253926435
196 | 212,68157,unusual plot structure,1253926417
197 | 212,68157,violence,1253926426
198 | 212,68157,World War II,1253926408
199 | 212,68319,action,1253928934
200 | 212,68319,bad plot,1253928950
201 | 212,68319,hugh jackman,1253928926
202 | 212,68319,superhero,1253928935
203 | 212,68319,too long,1253928954
204 | 212,69526,bad plot,1253929636
205 | 212,69526,robots,1253929657
206 | 212,69526,sufficiently explodey to be good,1253929639
207 | 212,69712,Abigail Breslin,1260688098
208 | 212,69712,ending,1260688053
209 | 212,69712,genuine characters,1260688086
210 | 212,69712,Jason Patric,1260688105
211 | 212,71057,animation style,1253926100
212 | 219,46970,funny,1156788918
213 | 219,46970,nascar,1156788904
214 | 219,46970,will farell,1156788913
215 | 219,47640,beer,1156788764
216 | 219,47640,guy movie,1156788774
217 | 255,6985,breathtaking,1237060605
218 | 255,7361,i'd like to live in this movie,1237060140
219 | 255,44555,breathtaking,1237060235
220 | 257,3030,akira kurosawa,1338084833
221 | 257,3030,black comedy,1338084822
222 | 277,5618,anime,1463916275
223 | 277,5618,children,1463916304
224 | 277,5618,fantasy,1463916289
225 | 277,5971,fantasy,1463916386
226 | 277,5971,feel-good,1463916401
227 | 277,82461,bad acting,1463916776
228 | 277,82461,bad plot,1463916772
229 | 277,104283,anime,1463916484
230 | 277,132046,sci-fi,1463916817
231 | 294,6754,vampire,1138983469
232 | 294,8865,1940's feel,1140395930
233 | 294,8865,unique look,1140395930
234 | 294,36401,fairy tales,1138983064
235 | 297,104,adam sandler,1318706504
236 | 297,608,overrated,1318706739
237 | 297,1103,overrated,1318706182
238 | 297,2502,that fat nerd is just annoying,1318706118
239 | 297,8528,seen it before,1318706449
240 | 297,47640,Just stupid,1318706536
241 | 297,52973,overrated,1318706470
242 | 297,55820,overrated,1318706705
243 | 297,61024,Exellent first half,1318706053
244 | 297,61024,terrible ending,1318706042
245 | 297,76251,Gore!!!,1318706275
246 | 297,76251,sort of honest,1318706296
247 | 297,78209,Lame everything,1318706227
248 | 313,153,Ei muista,1168879424
249 | 313,293,Ei muista,1168878292
250 | 313,434,Ei muista,1168879406
251 | 313,494,Ei muista,1168878844
252 | 313,593,Katso Sanna!,1146476938
253 | 313,745,Ei muista,1146476364
254 | 313,780,Ei muista,1168878578
255 | 313,858,Katso Sanna!,1146476794
256 | 313,1036,Ei muista,1168878388
257 | 313,1042,Katso Sanna!,1146477133
258 | 313,1120,Katso Sanna!,1146477118
259 | 313,1200,Katso Sanna!,1146477102
260 | 313,1291,Ei muista,1146476308
261 | 313,1610,Katso Sanna!,1146476692
262 | 313,1617,Ei muista,1168878593
263 | 313,2273,Ei muista,1146477049
264 | 313,2329,Ei muista,1146477033
265 | 313,2600,Ei muista,1168879003
266 | 313,2985,Ei muista,1168878743
267 | 313,2989,Ei muista,1168878972
268 | 313,3160,Ei muista,1168878736
269 | 313,3175,Ei muista,1168879647
270 | 313,3499,Ei muista,1168878727
271 | 313,3793,Katso Sanna!,1146476954
272 | 313,3916,Ei muista,1146476265
273 | 313,3949,Ei muista,1168878697
274 | 313,3994,Ei muista,1168879611
275 | 313,4011,Ei muista,1168878363
276 | 313,4023,ei muista,1146477341
277 | 313,4121,Katso Sanna!,1146477233
278 | 313,4223,Ei muista,1146476274
279 | 313,4638,Ei muista,1146477217
280 | 313,5064,Ei muista,1146476225
281 | 313,5602,Ei muista,1168879553
282 | 313,6218,Ei muista,1146477184
283 | 313,7347,Ei muista,1168879177
284 | 313,8781,Ei muista,1168879161
285 | 313,44199,Ei muista,1168878406
286 | 314,260,awesome,1437338154
287 | 314,260,awesome soundtrack,1437338150
288 | 314,260,jedi,1437338139
289 | 314,260,space adventure,1437338130
290 | 346,741,anime,1159734531
291 | 346,924,boring,1159734220
292 | 346,924,slow,1159734220
293 | 346,1036,explosions,1159734488
294 | 346,1199,dystopia,1159734421
295 | 346,1274,anime,1159734245
296 | 346,2019,long,1159734734
297 | 346,2571,sci-fi,1159734634
298 | 346,2571,virtual reality,1159734636
299 | 346,2924,martial arts,1159734502
300 | 346,2997,interesting,1159734363
301 | 346,3265,explosions,1159734552
302 | 346,3265,martial arts,1159734552
303 | 346,3996,martial arts,1159734445
304 | 346,4993,long,1159734600
305 | 346,5782,explosions,1159734683
306 | 346,5782,interesting,1159734683
307 | 346,5902,boring,1159734233
308 | 346,5952,boring,1159734622
309 | 346,5952,long,1159734622
310 | 346,6350,anime,1159734417
311 | 346,6539,Johnny Depp,1159734656
312 | 346,6857,anime,1159734652
313 | 346,7022,explosions,1159734326
314 | 346,7022,martial arts,1159734467
315 | 346,7090,martial arts,1159734555
316 | 346,7153,boring,1159734608
317 | 346,7153,long,1159734608
318 | 346,7361,interesting,1159734515
319 | 346,7844,martial arts,1159734589
320 | 346,8795,Long,1159734165
321 | 346,8874,comedy,1159734745
322 | 346,8874,zombies,1159734745
323 | 346,8961,super-hero,1159734563
324 | 346,26554,apocalypse,1159734706
325 | 346,26554,interesting,1159734706
326 | 346,26585,explosions,1159734379
327 | 346,31878,comedy,1159734584
328 | 346,31878,martial arts,1159734584
329 | 346,31878,Stephen Chow,1159734584
330 | 346,33794,bad camerawork,1159734301
331 | 346,33794,slow,1159734301
332 | 347,1721,romance,1463337171
333 | 347,122886,space,1463337217
334 | 347,122886,star,1463337213
335 | 353,4721,As historicaly correct as Germany winning WW2,1140389056
336 | 353,4721,but still a fun movie.,1140389056
337 | 353,7376,"The Rocks ""finest"" work need I say more?",1140389511
338 | 353,31221,Try not to mistake this for an episode of Alias,1140389595
339 | 353,32025,Mossad,1142686944
340 | 353,35836,dumb,1137217440
341 | 364,47,biblical,1444534976
342 | 364,47,crime,1444534982
343 | 364,47,dark,1444534994
344 | 364,47,disturbing,1444534971
345 | 364,47,greed,1444534998
346 | 364,47,horror,1444534981
347 | 364,47,serial killer,1444534961
348 | 364,47,violent,1444534985
349 | 364,50,thriller,1444534932
350 | 364,232,aging,1444531181
351 | 364,232,Ang Lee,1444531177
352 | 364,232,cooking,1444531178
353 | 364,232,food,1444531186
354 | 364,232,relationships,1444531173
355 | 364,293,assassin,1444534872
356 | 364,293,hit men,1444534872
357 | 364,293,thriller,1444534887
358 | 364,318,friendship,1444529800
359 | 364,318,Morgan Freeman,1444529792
360 | 364,318,narrated,1444529829
361 | 364,318,prison,1444529824
362 | 364,318,prison escape,1444529820
363 | 364,318,revenge,1444529816
364 | 364,318,Tim Robbins,1444529804
365 | 364,318,wrongful imprisonment,1444529809
366 | 364,1176,imaginative,1444528969
367 | 364,1176,intellectual,1444528955
368 | 364,1176,Irene Jacob,1444528944
369 | 364,1176,Krzysztof Kieslowski,1444528941
370 | 364,1176,lyrical,1444528947
371 | 364,1210,action,1444529881
372 | 364,1210,aliens,1444529879
373 | 364,1210,George Lucas,1444529892
374 | 364,1210,Harrison Ford,1444529867
375 | 364,1210,sci-fi,1444529889
376 | 364,1210,sequel,1444529879
377 | 364,1210,space,1444529864
378 | 364,1210,Star Wars,1444529869
379 | 364,1210,starship pilots,1444529899
380 | 364,1210,war,1444529884
381 | 364,1265,alternate reality,1444530190
382 | 364,1265,Bill Murray,1444530207
383 | 364,1265,character development,1444530216
384 | 364,1265,comedy,1444530195
385 | 364,1265,existentialism,1444530199
386 | 364,1265,feel-good,1444530209
387 | 364,1265,funny,1444530200
388 | 364,1265,love,1444530219
389 | 364,1265,romantic,1444530228
390 | 364,1265,self discovery,1444530203
391 | 364,1732,great dialogue,1444535166
392 | 364,1732,Jeff Bridges,1444535201
393 | 364,1732,Nudity (Full Frontal),1444535170
394 | 364,1732,off-beat comedy,1444535205
395 | 364,1732,quirky,1444535198
396 | 364,1732,satirical,1444535172
397 | 364,1732,Steve Buscemi,1444535164
398 | 364,2068,coming of age,1444530920
399 | 364,2068,funny,1444530913
400 | 364,2424,bookshop,1444530521
401 | 364,2424,happy ending,1444530524
402 | 364,2424,Meg Ryan,1444530516
403 | 364,2424,Romance,1444530517
404 | 364,2424,romantic comedy,1444530529
405 | 364,2424,Tom Hanks,1444530527
406 | 364,3948,awkward,1444529225
407 | 364,4018,creative,1444529752
408 | 364,4018,funny,1444529739
409 | 364,4018,Helen Hunt,1444529729
410 | 364,4018,hilarious,1444529746
411 | 364,4018,Mel Gibson,1444529756
412 | 364,4018,stereotypes,1444529760
413 | 364,4973,beautifully filmed,1444528866
414 | 364,4973,comedy,1444528870
415 | 364,4973,feel-good,1444528900
416 | 364,4973,imagination,1444528880
417 | 364,4973,love,1444528875
418 | 364,4973,notable soundtrack,1444528857
419 | 364,4973,quirky,1444528852
420 | 364,4973,whimsical,1444528860
421 | 364,5299,comedy,1444529040
422 | 364,5299,family,1444529044
423 | 364,5299,funny,1444529038
424 | 364,6350,aviation,1444531651
425 | 364,6350,funny,1444531664
426 | 364,6350,imagination,1444531644
427 | 364,6350,Studio Ghibli,1444531642
428 | 364,6350,visually appealing,1444531647
429 | 364,6539,comedy,1444529977
430 | 364,6539,Disney,1444529957
431 | 364,6539,funny,1444529960
432 | 364,6539,johnny depp,1444529962
433 | 364,6539,magic,1444529965
434 | 364,6539,Orlando Bloom,1444529978
435 | 364,6539,pirates,1444529987
436 | 364,6539,sword fight,1444529953
437 | 364,7161,family,1444535334
438 | 364,7161,funny,1444535340
439 | 364,7161,Steve Martin,1444535326
440 | 364,8464,american idiocy,1444531783
441 | 364,8464,documentary,1444531780
442 | 364,8464,social criticism,1444531787
443 | 364,26614,thriller,1444534834
444 | 364,26662,anime,1444530775
445 | 364,26662,Hayao Miyazaki,1444530771
446 | 364,26662,Studio Ghibli,1444530778
447 | 364,27611,military,1444534772
448 | 364,27611,sci-fi,1444534765
449 | 364,34321,Billy Bob Thornton,1444531424
450 | 364,34321,comedy,1444531438
451 | 364,34321,cursing,1444531450
452 | 364,34321,funny,1444531430
453 | 364,34321,redemption,1444531445
454 | 364,46578,beauty pageant,1444529380
455 | 364,46578,family,1444529368
456 | 364,46578,independent film,1444529351
457 | 364,46578,off-beat comedy,1444529354
458 | 364,46578,quirky,1444529369
459 | 364,46578,road trip,1444529384
460 | 364,46578,satire,1444529372
461 | 364,46578,steve carell,1444529375
462 | 364,56367,comedy,1444535241
463 | 364,56367,indie,1444535247
464 | 364,56367,pregnancy,1444535236
465 | 364,56367,teen pregnancy,1444535244
466 | 364,56367,witty,1444535238
467 | 364,64957,Aging,1444531050
468 | 364,64957,Brad Pitt,1444531041
469 | 364,64957,original plot,1444531048
470 | 364,64957,philosophical,1444531044
471 | 364,66934,comedy,1444530699
472 | 364,66934,great soundtrack,1444530694
473 | 364,66934,mad scientist,1444530688
474 | 364,66934,Neil Patrick Harris,1444530691
475 | 364,66934,parody,1444530697
476 | 364,68358,sci fi,1444530041
477 | 364,68358,spock,1444530038
478 | 364,68358,Star Trek,1444530030
479 | 364,68358,teleportation,1444530045
480 | 364,72998,aliens,1444529449
481 | 364,72998,graphic design,1444529463
482 | 364,72998,predictable,1444529440
483 | 364,72998,racism,1444529457
484 | 364,72998,sci-fi,1444529439
485 | 364,93040,drama,1444530864
486 | 364,93040,historical,1444530873
487 | 364,93040,interesting,1444530861
488 | 364,95441,cocaine,1444529273
489 | 364,95441,crude humor,1444529258
490 | 364,95441,directorial debut,1444529282
491 | 364,95441,Mark Wahlberg,1444529284
492 | 364,97938,Ang Lee,1444531076
493 | 364,97938,India,1444531086
494 | 364,97938,ocean,1444531083
495 | 364,97938,visually appealing,1444531077
496 | 364,106696,Disney,1444530267
497 | 364,106696,feminist,1444530281
498 | 364,106696,musical,1444530270
499 | 364,106696,storyline,1444530270
500 | 364,109249,beautiful scenery,1444529645
501 | 364,109249,coming of age,1444529671
502 | 364,109249,dark humor,1444529660
503 | 364,109249,Fernando E. Solanas,1444529608
504 | 364,109249,imaginative,1444529635
505 | 364,109249,quirky,1444529613
506 | 364,109249,whimsical,1444529629
507 | 364,109374,amazing storytelling,1444529104
508 | 364,109374,Bill Murray,1444529124
509 | 364,109374,cinematography,1444529143
510 | 364,109374,eastern europe,1444529119
511 | 364,109374,europe,1444529115
512 | 364,109374,historical,1444529139
513 | 364,109374,on the run,1444529148
514 | 364,109374,quirky,1444529109
515 | 364,115617,comedy,1444530392
516 | 364,115617,Coming of Age,1444530417
517 | 364,115617,family,1444530402
518 | 364,115617,friends,1444530410
519 | 364,115617,funny,1444530394
520 | 364,115617,happy ending,1444530413
521 | 364,115617,heartwarming,1444528813
522 | 364,115617,inspiring,1444530404
523 | 364,115617,japanese influence,1444528772
524 | 364,115617,pixar,1444530397
525 | 364,115617,technology,1444528781
526 | 364,118997,comedy,1444530159
527 | 364,118997,fairy tale,1444530100
528 | 364,118997,funny,1444530106
529 | 364,118997,great soundtrack,1444530127
530 | 364,118997,imaginative,1444530113
531 | 364,118997,meryl streep,1444530142
532 | 364,118997,musical,1444530098
533 | 364,134853,coming of age,1444530618
534 | 364,134853,creative,1444530620
535 | 364,134853,happiness,1444530631
536 | 364,134853,imaginative,1444530628
537 | 364,134853,Pixar,1444530633
538 | 377,54290,Why the terrorists hate us,1187054169
539 | 380,40614,predictable,1148077862
540 | 402,260,coming of age,1443393648
541 | 402,260,"space epic, science fiction, hero's journey",1443393664
542 | 423,111,cult film,1354033605
543 | 423,111,Martin Scorsese,1354033608
544 | 423,111,social commentary,1354033602
545 | 423,247,stylized,1353702088
546 | 423,247,surreal,1353702083
547 | 423,247,surrealism,1353702082
548 | 423,247,visceral,1353702078
549 | 423,745,animation,1354033746
550 | 423,1079,dark comedy,1353703178
551 | 423,1079,dark humor,1353703182
552 | 423,1079,John Cleese,1353703171
553 | 423,1079,quirky,1353703185
554 | 423,1080,Monty Python,1354033667
555 | 423,1089,nonlinear,1354033640
556 | 423,1089,organized crime,1354033636
557 | 423,1089,Quentin Tarantino,1354033638
558 | 423,1219,Alfred Hitchcock,1354033644
559 | 423,1219,psychology,1354033646
560 | 423,1256,Marx Brothers,1354033718
561 | 423,1258,cult film,1354033633
562 | 423,1258,psychology,1354033630
563 | 423,1729,dark comedy,1354033685
564 | 423,2288,disturbing,1354033600
565 | 423,2529,post-apocalyptic,1354033650
566 | 423,2529,twist ending,1354033653
567 | 423,3030,atmospheric,1354033596
568 | 423,3546,insanity,1353702162
569 | 423,3546,psychological thriller,1353702155
570 | 423,5618,atmospheric,1354033628
571 | 423,5618,surreal,1354033624
572 | 423,5662,Dave Foley,1354033569
573 | 423,6287,Adam Sandler,1354044970
574 | 423,6713,Satoshi Kon,1354033681
575 | 423,8188,elegant,1353701935
576 | 423,27592,stylized,1354033620
577 | 423,27592,violent,1354033611
578 | 423,27773,depressing,1353702390
579 | 423,27773,disturbing,1353702357
580 | 423,27773,hallucinatory,1353702345
581 | 423,27773,paranoid,1353702396
582 | 423,27773,revenge,1353702393
583 | 423,27773,stylized,1353702395
584 | 423,27773,twist ending,1353702400
585 | 423,27773,vengeance,1353702404
586 | 423,30745,Takashi Miike,1354033711
587 | 423,31658,Studio Ghibli,1354033715
588 | 423,52885,anime,1354033657
589 | 423,55820,coen brothers,1354033663
590 | 423,58559,Batman,1354033727
591 | 423,67997,british,1354033698
592 | 423,67997,politics,1354033700
593 | 423,67997,satire,1354033695
594 | 431,5,steve martin,1140455432
595 | 431,150,tom hanks,1165548454
596 | 431,186,hugh grant,1140455419
597 | 431,215,ethan hawke,1165548716
598 | 431,236,meg ryan,1140455397
599 | 431,246,school,1140455370
600 | 431,260,classic,1140454408
601 | 431,300,Ralph Fiennes,1165548551
602 | 431,318,Phenomenal!,1140454312
603 | 431,337,Johnny Depp***,1140455226
604 | 431,350,overrated,1140455303
605 | 431,357,didn't get it,1140455255
606 | 431,377,Sandra Bullock,1140454879
607 | 431,520,very funny!,1140454299
608 | 431,529,jodi foster,1140455238
609 | 431,587,sweet,1140455292
610 | 431,628,edward norton,1140455195
611 | 431,661,creepy good,1140454443
612 | 431,904,jimmy stewart,1140455208
613 | 431,1059,claire daines,1140455275
614 | 431,1059,clever,1140455275
615 | 431,1059,Leonardo DiCaprio,1140455275
616 | 431,1059,shakespeare,1140455275
617 | 431,1242,denzel washington,1140455181
618 | 431,1250,war,1140455073
619 | 431,1259,r. phoneix,1140455011
620 | 431,1291,cool,1140454996
621 | 431,1358,billy bob thorton,1140455032
622 | 431,1376,not too thrilled,1140455126
623 | 431,1517,cheezy to the max!,1140454281
624 | 431,1635,moody,1165548220
625 | 431,1639,hillarious,1140455102
626 | 431,1801,good but not accurate,1140454574
627 | 431,1954,stallone,1140455043
628 | 431,1956,il,1165548392
629 | 431,1956,lake forest,1165548392
630 | 431,2004,not worth your time,1140454610
631 | 431,2012,liked the other two better,1140455152
632 | 431,2019,classic,1140455086
633 | 431,2085,cute,1140454548
634 | 431,2174,Micheal Keaton,1140454821
635 | 431,2369,must see,1140454589
636 | 431,2396,colin firth,1165548675
637 | 431,2396,Gwenth Paltrow,1165548675
638 | 431,2396,joseph fiennes,1165548675
639 | 431,2571,philosophy,1140454933
640 | 431,2692,cool and great music,1140454393
641 | 431,2700,stupidity,1140454856
642 | 431,3052,controversial,1140454805
643 | 431,3113,stupid,1140455730
644 | 431,3114,Tom Hanks,1140454833
645 | 431,3260,boring,1140455703
646 | 431,3361,tim robbins,1140455685
647 | 431,3751,mel gibson,1140454892
648 | 431,3785,stupid,1140455743
649 | 431,3793,can't stand rogue!,1140454770
650 | 431,3893,renee z,1140455496
651 | 431,3978,boring,1140455549
652 | 431,3994,predictable,1140454952
653 | 431,4016,okay,1140454522
654 | 431,4041,overrated,1140455478
655 | 431,4306,witty!,1140454779
656 | 431,4351,keanu reeves,1140455511
657 | 431,4369,no desire to see this,1140454634
658 | 431,4641,thora birch,1140455465
659 | 431,4701,jakie chan,1140455522
660 | 431,4823,john cusack,1140455540
661 | 431,4890,Gwenth Paltrow,1140455564
662 | 431,5064,not by book,1165548741
663 | 431,5135,nair,1165548358
664 | 431,5349,tobey maguire,1140454907
665 | 431,5377,books,1140455343
666 | 431,5377,relationships,1140455343
667 | 431,5481,mike myers,1140455409
668 | 431,5812,moore,1165548495
669 | 431,6281,collin farrel,1140455383
670 | 431,6539,I loved it! Seen it five times already!,1140454336
671 | 431,6953,Benicio Del Toro,1165548439
672 | 431,8636,the best comic adaptation!,1140454370
673 | 446,65514,kung fu,1436621295
674 | 446,65514,martial arts,1436621295
675 | 446,65514,well done,1436621295
676 | 446,68954,feel good,1436621325
677 | 446,68954,light,1436621329
678 | 446,88125,dark,1436621257
679 | 448,260,sci-fi,1444761481
680 | 448,260,supernatural powers,1444761492
681 | 450,107649,Alex van Warmerdam,1475737520
682 | 450,107649,magical realism,1475737516
683 | 450,107649,weird,1475737522
684 | 456,3703,action,1432308664
685 | 456,3703,desert,1432308674
686 | 456,3703,Dieselpunk,1432308755
687 | 456,3703,Mel Gibson,1432308659
688 | 456,3703,Post apocalyptic,1432308686
689 | 456,3703,post-apocalyptic,1432308656
690 | 456,44191,dystopia,1432308622
691 | 456,44191,thought-provoking,1432308624
692 | 456,106473,animation,1432309088
693 | 456,106473,anime,1432309076
694 | 456,106473,based on a TV show,1432309085
695 | 456,106473,fantasy world,1432309090
696 | 468,3676,National Film Registry,1296204035
697 | 468,5449,Adam Sandler,1296202619
698 | 478,55247,adventure,1446622291
699 | 478,55247,freedom,1446622248
700 | 478,55247,road trip,1446622240
701 | 478,55247,self discovery,1446622259
702 | 478,55247,travel,1446622390
703 | 478,106918,Iceland,1446830144
704 | 478,106918,photography,1446830153
705 | 478,106918,travel,1446830142
706 | 480,8369,who done it,1339456173
707 | 480,70286,aliens,1339283926
708 | 480,70286,humor,1339283929
709 | 480,70286,sci-fi,1339283933
710 | 480,70286,violence,1339283935
711 | 480,78349,claustrophobic,1339283636
712 | 480,78349,experiment,1339283627
713 | 480,78349,group psychology,1339283610
714 | 480,78349,psychological,1339283617
715 | 480,94777,aliens,1339284270
716 | 480,94777,franchise,1339284250
717 | 480,94777,funny,1339284240
718 | 480,94777,time travel,1339284259
719 | 481,260,George Lucas,1437000955
720 | 481,260,starwars,1437000940
721 | 481,1900,brother sister relationship,1437107499
722 | 481,1900,sad,1437107475
723 | 481,1900,siblings,1437107491
724 | 481,2361,cult film,1437105893
725 | 481,117533,big brother,1437004053
726 | 481,117533,documentary,1437004053
727 | 481,117533,privacy,1437004053
728 | 501,1,Pixar,1292956344
729 | 501,47,psychology,1292956276
730 | 501,47,twist ending,1292956362
731 | 501,296,dark comedy,1292956439
732 | 501,296,Quentin Tarantino,1292956435
733 | 501,778,dark comedy,1292956194
734 | 501,1089,organized crime,1292956420
735 | 501,1089,Quentin Tarantino,1292956416
736 | 501,2542,dark comedy,1292956446
737 | 501,2542,Guy Ritchie,1292956448
738 | 501,2542,organized crime,1292956450
739 | 501,2692,nonlinear,1292956226
740 | 501,2692,time travel,1292956392
741 | 501,2959,dark comedy,1292956481
742 | 501,2959,psychology,1292956476
743 | 501,2959,twist ending,1292956479
744 | 501,3114,animation,1292956346
745 | 501,3949,addiction,1292956422
746 | 501,3949,psychology,1292956425
747 | 501,4011,Guy Ritchie,1292956203
748 | 501,4011,twist ending,1292956205
749 | 501,5608,psychology,1292956490
750 | 501,6016,multiple storylines,1292956518
751 | 501,40148,guy ritchie,1292956408
752 | 501,47099,based on a true story,1292956431
753 | 501,60072,assassin,1292956189
754 | 501,62849,Drugs,1292956230
755 | 501,62849,Guy Ritchie,1292956234
756 | 501,62849,twist ending,1292956236
757 | 501,68157,Quentin Tarantino,1292956453
758 | 501,74458,psychological,1292956215
759 | 501,74458,twist ending,1292956212
760 | 501,77455,renegade art,1292956306
761 | 501,78499,Pixar,1292956196
762 | 501,79132,alternate reality,1292956469
763 | 503,260,Science Fiction,1432365802
764 | 503,260,space,1432365814
765 | 512,74458,Psychological Thriller,1434169401
766 | 512,79132,Intrigued,1434169472
767 | 520,4322,baseball,1146596187
768 | 531,1028,comedy of manners,1243454358
769 | 531,1028,Disney,1243454364
770 | 531,1028,Disney studios,1243454367
771 | 531,1028,Julie Andrews,1243454386
772 | 531,1028,multiple roles,1243454371
773 | 531,1028,musical,1243454374
774 | 531,1028,villain nonexistent or not needed for good story,1243454382
775 | 531,1088,80's classic,1243454483
776 | 531,1088,dance,1243454500
777 | 531,1088,girlie movie,1243454518
778 | 531,1088,music,1243454488
779 | 531,1088,musical parodies,1243454503
780 | 531,1088,rich families,1243454508
781 | 531,1258,disturbing,1243454832
782 | 531,1258,Nudity (Full Frontal - Notable),1243454840
783 | 531,1258,Nudity (Full Frontal),1243454842
784 | 531,1258,psychological,1243454846
785 | 531,1258,violent,1243454848
786 | 531,1997,demons,1243454875
787 | 531,1997,horror,1243454878
788 | 531,1997,possession,1243454880
789 | 531,1997,scary,1243454882
790 | 531,2942,dance,1243454447
791 | 531,2942,girlie movie,1243454450
792 | 531,2942,Nudity (Topless),1243454452
793 | 531,2942,strippers,1243454454
794 | 531,4973,comedy,1243510202
795 | 531,4973,drama,1243510204
796 | 531,4973,notable soundtrack,1243510209
797 | 531,4973,quirky,1243510212
798 | 531,6218,football,1243509668
799 | 531,6218,Funny,1243509674
800 | 531,6218,Keira Knightley,1243509670
801 | 531,6218,love,1243509684
802 | 531,6863,funny,1243454300
803 | 531,6863,Jack Black,1243454312
804 | 531,6863,music,1243454317
805 | 531,6863,not only for kids,1243454321
806 | 531,6863,Rock,1243454333
807 | 531,6863,rock and roll,1243454326
808 | 531,6942,british,1243454933
809 | 531,6942,christmas,1243454936
810 | 531,6942,ensemble cast,1243454939
811 | 531,6942,Keira Knightley,1243454943
812 | 531,6942,love,1243454946
813 | 531,6942,multiple storylines,1243454948
814 | 531,6942,Nudity (Topless - Notable),1243454950
815 | 531,6942,Nudity (Topless),1243454952
816 | 531,6942,Romance,1243454954
817 | 531,8533,covers a lifespan,1243455160
818 | 531,8533,memories,1243455166
819 | 531,35836,comedy,1243454586
820 | 531,35836,crude,1243454590
821 | 531,35836,funny,1243454588
822 | 531,35836,nerds,1243454598
823 | 531,35836,Nudity (Topless - Notable),1243454603
824 | 531,35836,Nudity (Topless),1243454610
825 | 531,35836,sex,1243454608
826 | 531,45720,fashion,1243454978
827 | 531,45720,New York,1243454980
828 | 531,45720,Paris,1243454982
829 | 531,45720,Streep strong & funny,1243454985
830 | 531,59725,fashion,1243455007
831 | 531,59725,New York City,1243455012
832 | 531,59725,Nudity (Topless),1243455014
833 | 531,59725,R:strong sexual content,1243455020
834 | 531,59725,romance,1243455018
835 | 531,63131,funny,1243509641
836 | 531,63131,obvious plot,1243509645
837 | 531,63992,romance,1243455092
838 | 531,63992,Teen movie,1243455095
839 | 531,63992,vampires,1243455099
840 | 531,64957,Brad Pitt,1243455053
841 | 531,64957,cinematography,1243455057
842 | 531,64957,drama,1243455060
843 | 531,64969,easily confused with other movie(s) (title),1243454548
844 | 531,64969,funny,1243454556
845 | 531,64969,Jim carrey,1243454553
846 | 531,64969,Zooey Deschanel,1243454561
847 | 546,3176,Gwyneth Paltrow,1301715429
848 | 546,3176,Jude Law,1301715415
849 | 546,3707,Nudity (Rear),1301284184
850 | 546,3707,Nudity (Topless),1301284182
851 | 546,3707,sexy food,1301284180
852 | 546,5984,BDSM,1301785972
853 | 546,5984,Blindfold,1301785976
854 | 546,5984,Bondage,1301785990
855 | 546,5984,Domination,1301785971
856 | 546,5984,Fetish,1301785980
857 | 546,5984,Gag,1301785974
858 | 546,5984,Spanked,1301785983
859 | 546,5984,Submission,1301785988
860 | 546,5984,Tied,1301785981
861 | 546,5984,Udo Kier,1301785978
862 | 546,7155,Nudity (Topless - Notable),1334013574
863 | 546,7155,Nudity (Topless),1334013570
864 | 546,39421,Nudity (Topless),1334013436
865 | 546,39421,porn,1334013433
866 | 546,48780,based on a book,1301715225
867 | 546,48780,twist ending,1301715231
868 | 546,53318,notable nudity,1301196436
869 | 546,53318,Nudity (Full Frontal),1301196425
870 | 547,215,holes90s,1342849999
871 | 547,293,holes90s,1342849862
872 | 547,306,holes90s,1342849839
873 | 547,319,holes90s,1342849790
874 | 547,364,holes90s,1342849955
875 | 547,541,afi,1182393913
876 | 547,588,holes90s,1342849969
877 | 547,599,dvd,1388422564
878 | 547,599,holes60s,1388820371
879 | 547,757,getdvd,1423201850
880 | 547,914,holes60s,1342850735
881 | 547,946,holes40s,1412348245
882 | 547,954,afi,1182393876
883 | 547,1153,tcm,1189187818
884 | 547,1172,holes80s,1342849456
885 | 547,1197,holes80s,1342849599
886 | 547,1209,holes60s,1351319934
887 | 547,1211,holes80s,1342849466
888 | 547,1232,sightsound,1343973832
889 | 547,1245,holes90s,1342849804
890 | 547,1260,tcm,1189187750
891 | 547,1272,holes70s,1342849087
892 | 547,1273,holes80s,1342849477
893 | 547,1281,holes40s,1412348178
894 | 547,1281,tivo,1476583274
895 | 547,1283,afi,1182394030
896 | 547,1283,hdtv,1199396362
897 | 547,1572,sightsound,1343973703
898 | 547,1935,holes40s,1412348396
899 | 547,1942,holes40s,1412348211
900 | 547,1957,holes80s,1342849494
901 | 547,2028,afi,1182394053
902 | 547,2028,hdtv,1200784012
903 | 547,2028,holes90s,1342849945
904 | 547,2109,holes70s,1342849272
905 | 547,2178,holes70s,1342849166
906 | 547,2182,tcm,1412347392
907 | 547,2202,holes40s,1412348261
908 | 547,2330,holes90s,1342849749
909 | 547,2677,holes90s,1342849900
910 | 547,2730,getdvd,1350531508
911 | 547,2730,holes70s,1342849220
912 | 547,2747,tivo,1476583080
913 | 547,2857,holes60s,1342850705
914 | 547,2920,getdvd,1412348038
915 | 547,2920,holes40s,1412348038
916 | 547,2927,holes40s,1412348319
917 | 547,2936,afi,1182393939
918 | 547,2936,holes40s,1412348275
919 | 547,3022,afi,1182393819
920 | 547,3089,holes40s,1412348191
921 | 547,3111,holes80s,1342849685
922 | 547,3246,holes90s,1342849768
923 | 547,3384,holes70s,1342849030
924 | 547,3415,sightsound,1343973557
925 | 547,3498,holes70s,1342849114
926 | 547,3539,tivo,1476583329
927 | 547,3629,afi,1182433856
928 | 547,3629,hdtv,1200784036
929 | 547,3632,holes40s,1412348295
930 | 547,3634,getdvd,1418986019
931 | 547,3634,holes60s,1418986015
932 | 547,3671,holes70s,1342849229
933 | 547,3679,holes80s,1342849435
934 | 547,3742,sightsound,1343973400
935 | 547,3789,holes60s,1342850721
936 | 547,3811,holes80s,1342849524
937 | 547,3947,holes70s,1431088744
938 | 547,4103,holes80s,1342849627
939 | 547,4262,holes80s,1342849544
940 | 547,4326,holes80s,1342849552
941 | 547,4327,holes60s,1342850801
942 | 547,4437,holes70s,1342849298
943 | 547,4712,holes70s,1342849242
944 | 547,4763,getdvd,1471010946
945 | 547,4928,holes70s,1342849195
946 | 547,5169,tcm,1189187923
947 | 547,5198,holes80s,1342849514
948 | 547,5289,tcm,1189187738
949 | 547,5464,holes00s,1342850481
950 | 547,5747,holes80s,1342849443
951 | 547,5791,holes00s,1342850231
952 | 547,5795,tosee,1475705682
953 | 547,5956,dvd,1259388129
954 | 547,6229,holes70s,1342849313
955 | 547,6433,sightsound,1343973308
956 | 547,6515,tcm,1190475823
957 | 547,6643,sightsound,1343973230
958 | 547,6783,sightsound,1343973251
959 | 547,6830,tcm,1190475890
960 | 547,6874,holes00s,1342850598
961 | 547,6981,sightsound,1343973754
962 | 547,6983,holes40s,1412348440
963 | 547,6985,sightsound,1343973328
964 | 547,7195,tcm,1189187983
965 | 547,7243,afi,1182394001
966 | 547,7243,hdtv,1200784002
967 | 547,7335,tcm,1189187908
968 | 547,7438,holes00s,1342850493
969 | 547,7792,holes70s,1342849103
970 | 547,7926,getdvd,1412347952
971 | 547,8125,afi,1182393855
972 | 547,8128,holes80s,1342849418
973 | 547,8154,holes60s,1342850820
974 | 547,8195,sightsound,1343973620
975 | 547,8207,holes70s,1342849135
976 | 547,8236,tcm,1190475838
977 | 547,8494,dvd,1254390631
978 | 547,8584,tivo,1476583430
979 | 547,8645,holes00s,1342850358
980 | 547,8751,tcm,1189187687
981 | 547,8765,tcm,1189187626
982 | 547,8766,tcm,1189187878
983 | 547,8767,tcm,1189187947
984 | 547,25805,sightsound,1343973445
985 | 547,25927,holes40s,1412348360
986 | 547,25927,tcm,1189187934
987 | 547,26150,sightsound,1343973817
988 | 547,26151,sightsound,1343973505
989 | 547,26366,holes70s,1342849254
990 | 547,31770,tcm,1189187673
991 | 547,34517,tivo,1476583581
992 | 547,37741,toplist05,1378985294
993 | 547,38304,toplist05,1378985376
994 | 547,39183,toplist05,1334343486
995 | 547,40629,toplist05,1378985440
996 | 547,40819,toplist05,1378985328
997 | 547,41585,tcm,1189187574
998 | 547,41863,toplist06,1197165976
999 | 547,41997,hdtv,1200107294
1000 | 547,41997,holes00s,1342850302
1001 | 547,42217,sightsound,1343973482
1002 | 547,44199,toplist06,1197165844
1003 | 547,44555,toplist06,1235048573
1004 | 547,45028,toplist06,1197165941
1005 | 547,45210,toplist06,1198810519
1006 | 547,46578,toplist06,1197165894
1007 | 547,46664,holes40s,1412348651
1008 | 547,46664,tcm,1412348651
1009 | 547,46723,toplist06,1197165832
1010 | 547,47099,toplist06,1197164385
1011 | 547,47274,holes70s,1342849207
1012 | 547,47423,toplist06,1197164296
1013 | 547,47629,toplist06,1197165707
1014 | 547,48394,hdtv,1200549984
1015 | 547,48394,holes00s,1342850387
1016 | 547,48516,toplist06,1197165676
1017 | 547,48696,toplist06,1197165862
1018 | 547,48738,toplist06,1197164312
1019 | 547,48774,toplist06,1197165663
1020 | 547,49824,toplist06,1197165695
1021 | 547,49917,toplist06,1268555186
1022 | 547,50068,holes00s,1342850218
1023 | 547,51540,toplist07,1192323038
1024 | 547,52241,toplist07,1197122754
1025 | 547,52281,hdtv,1200875555
1026 | 547,52281,holes00s,1342850259
1027 | 547,52579,getdvd,1471606640
1028 | 547,52967,toplist07,1197313756
1029 | 547,53000,toplist07,1198811122
1030 | 547,53123,toplist07,1187738100
1031 | 547,53894,toplist07,1197122145
1032 | 547,53953,getdvd,1467870755
1033 | 547,54272,toplist07,1187738146
1034 | 547,54286,toplist07,1197122822
1035 | 547,54513,toplist07,1187737659
1036 | 547,54881,toplist07,1268555304
1037 | 547,55052,toplist07,1197122048
1038 | 547,55063,toplist08,1228486494
1039 | 547,55069,holes00s,1472570165
1040 | 547,55069,toplist07,1301765688
1041 | 547,55118,toplist07,1197121987
1042 | 547,55247,toplist07,1197121953
1043 | 547,55253,toplist07,1268555286
1044 | 547,55269,toplist07,1197122030
1045 | 547,55276,toplist07,1195959915
1046 | 547,55280,holes00s,1342850401
1047 | 547,55363,getdvd,1412346850
1048 | 547,55363,holes00s,1342850280
1049 | 547,55442,holes00s,1342850172
1050 | 547,55442,toplist07,1268555437
1051 | 547,55820,toplist07,1197121965
1052 | 547,55946,tivo,1476113630
1053 | 547,56015,tcm,1412347417
1054 | 547,56152,toplist07,1198811180
1055 | 547,56286,toplist07,1268555500
1056 | 547,56367,toplist07,1198968553
1057 | 547,56607,toplist07,1200334592
1058 | 547,56782,toplist07,1200334609
1059 | 547,56788,toplist07,1201665448
1060 | 547,56805,toplist07,1198968631
1061 | 547,57669,toplist08,1250738298
1062 | 547,58191,toplist08,1223243755
1063 | 547,58879,toplist08,1207957343
1064 | 547,60069,toplist08,1230814631
1065 | 547,60766,toplist08,1223243682
1066 | 547,61024,toplist08,1223243801
1067 | 547,61236,holes00s,1342850181
1068 | 547,61236,toplist08,1230870494
1069 | 547,61240,holes00s,1342850379
1070 | 547,61240,toplist08,1230814624
1071 | 547,61323,toplist08,1223244145
1072 | 547,61357,toplist08,1222269690
1073 | 547,63082,toplist08,1228486518
1074 | 547,63876,toplist08,1230814303
1075 | 547,64614,toplist08,1250440712
1076 | 547,64620,toplist08,1230814286
1077 | 547,64622,toplist08,1230942216
1078 | 547,64701,toplist08,1230869509
1079 | 547,64839,toplist08,1236951337
1080 | 547,66665,toplist09,1253860576
1081 | 547,67087,toplist09,1259384470
1082 | 547,67255,toplist10,1270485757
1083 | 547,67429,toplist08,1238823068
1084 | 547,67665,getdvd,1321564884
1085 | 547,67665,toplist09,1250742907
1086 | 547,67997,toplist09,1250742829
1087 | 547,68157,toplist09,1253860562
1088 | 547,68954,holes00s,1342850438
1089 | 547,69481,toplist09,1250742859
1090 | 547,70286,toplist09,1250742877
1091 | 547,70293,toplist09,1253860588
1092 | 547,71108,toplist09,1292047039
1093 | 547,71464,toplist09,1259383811
1094 | 547,71745,toplist09,1259383809
1095 | 547,72011,toplist09,1262145753
1096 | 547,72131,toplist09,1257694906
1097 | 547,72176,tivo,1476113567
1098 | 547,72226,toplist09,1259383836
1099 | 547,72395,toplist09,1262146022
1100 | 547,72720,toplist09,1271429973
1101 | 547,72741,tivo,1476113594
1102 | 547,73023,toplist09,1267280510
1103 | 547,74324,toplist10,1312868042
1104 | 547,74458,toplist10,1276350948
1105 | 547,74545,toplist10,1276350928
1106 | 547,77455,getdvd,1446815214
1107 | 547,77455,toplist10,1292077803
1108 | 547,78039,toplist10,1296798992
1109 | 547,78499,toplist10,1293381435
1110 | 547,78574,toplist10,1281363877
1111 | 547,78653,toplist09,1277386636
1112 | 547,79132,toplist10,1279986154
1113 | 547,79242,toplist10,1281363890
1114 | 547,80463,toplist10,1287199615
1115 | 547,80489,toplist10,1294976085
1116 | 547,81562,toplist10,1292077909
1117 | 547,81591,toplist10,1296799009
1118 | 547,81786,getdvd,1468248495
1119 | 547,81786,toplist11,1312868144
1120 | 547,81845,toplist10,1296798962
1121 | 547,81932,toplist10,1296799034
1122 | 547,82313,tivo,1476650744
1123 | 547,82459,toplist10,1296798978
1124 | 547,82463,toplist10,1299563653
1125 | 547,83976,toplist11,1328012694
1126 | 547,85394,toplist11,1312868243
1127 | 547,86320,toplist11,1317998755
1128 | 547,86833,toplist11,1312868088
1129 | 547,87304,toplist11,1312868078
1130 | 547,88129,toplist11,1322848734
1131 | 547,88235,toplist11,1322849030
1132 | 547,88810,toplist11,1322894457
1133 | 547,89260,toplist11,1329513993
1134 | 547,89470,toplist11,1317998768
1135 | 547,89492,toplist11,1317998745
1136 | 547,89759,toplist11,1327491628
1137 | 547,89804,toplist11,1322848784
1138 | 547,90057,toplist11,1322848999
1139 | 547,90376,toplist11,1327491698
1140 | 547,90439,toplist11,1332487973
1141 | 547,90531,toplist11,1327491714
1142 | 547,90866,toplist11,1327491687
1143 | 547,91077,toplist11,1322848957
1144 | 547,94931,toplist12,1355599327
1145 | 547,94959,toplist12,1342275142
1146 | 547,94969,getdvd,1409381147
1147 | 547,95135,toplist12,1354966882
1148 | 547,95449,toplist12,1356967567
1149 | 547,95558,toplist12,1348198533
1150 | 547,95761,dvd,1387360646
1151 | 547,96417,dvd,1367718836
1152 | 547,96588,toplist12,1355599298
1153 | 547,96610,dvd,1361888026
1154 | 547,96610,toplist12,1354966811
1155 | 547,96728,toplist12,1355599008
1156 | 547,96811,toplist12,1356623017
1157 | 547,96832,dvd,1369217683
1158 | 547,96832,toplist12,1356409181
1159 | 547,97304,toplist12,1353063038
1160 | 547,97673,toplist13,1383626018
1161 | 547,97752,toplist12,1354966835
1162 | 547,97921,toplist12,1354966859
1163 | 547,97923,toplist12,1356623118
1164 | 547,97938,toplist12,1354966894
1165 | 547,98154,toplist12,1354966827
1166 | 547,98961,toplist12,1355938883
1167 | 547,99114,toplist12,1357543606
1168 | 547,99149,toplist12,1356273884
1169 | 547,101285,getdvd,1386729453
1170 | 547,101525,toplist12,1386245688
1171 | 547,101895,toplist13,1383626036
1172 | 547,102194,toplist13,1386658060
1173 | 547,102469,getdvd,1387083948
1174 | 547,103107,toplist13,1396017727
1175 | 547,103372,toplist13,1383625950
1176 | 547,103449,dvd,1378132522
1177 | 547,103624,toplist13,1388209512
1178 | 547,103688,dvd,1472303875
1179 | 547,105197,toplist13,1386657913
1180 | 547,105355,toplist13,1397129838
1181 | 547,105504,toplist13,1386245656
1182 | 547,106100,toplist13,1389766232
1183 | 547,106766,toplist13,1386658220
1184 | 547,106916,toplist13,1386657846
1185 | 547,106920,toplist13,1386657798
1186 | 547,107141,toplist13,1386946018
1187 | 547,107636,tivo,1476583144
1188 | 547,109374,toplist14,1402760218
1189 | 547,110871,getdvd,1399051398
1190 | 547,111249,getdvd,1423194272
1191 | 547,111251,dvd,1410947141
1192 | 547,111251,toplist14,1446283752
1193 | 547,111505,getdvd,1400501925
1194 | 547,111622,toplist14,1420131617
1195 | 547,112070,tivo,1476113970
1196 | 547,112183,toplist14,1416281804
1197 | 547,112290,toplist14,1411137921
1198 | 547,112515,getdvd,1423194952
1199 | 547,112515,toplist14,1418485385
1200 | 547,112550,getdvd,1470238109
1201 | 547,112552,toplist14,1417010184
1202 | 547,112556,toplist14,1412347244
1203 | 547,112852,toplist14,1407510485
1204 | 547,113064,getdvd,1423194143
1205 | 547,114254,getdvd,1450365156
1206 | 547,114342,toplist14,1417010271
1207 | 547,114459,getdvd,1412474417
1208 | 547,114662,toplist14,1423131235
1209 | 547,115139,tcm,1413426713
1210 | 547,115569,toplist14,1431187332
1211 | 547,115713,toplist15,1446211131
1212 | 547,116161,toplist14,1417010500
1213 | 547,116797,toplist14,1425780053
1214 | 547,117176,toplist14,1417010208
1215 | 547,117533,dvd,1473748292
1216 | 547,117533,getdvd,1446815098
1217 | 547,117533,toplist14,1417010466
1218 | 547,118700,getdvd,1446815138
1219 | 547,118700,toplist14,1423224182
1220 | 547,118880,getdvd,1449757705
1221 | 547,118880,toplist14,1449757002
1222 | 547,122882,getdvd,1470112098
1223 | 547,122882,toplist15,1449755519
1224 | 547,123663,tivo,1476583542
1225 | 547,123695,tivo,1476583459
1226 | 547,127108,tivo,1476583101
1227 | 547,127108,toplist15,1449756138
1228 | 547,127114,toplist15,1449756046
1229 | 547,127144,tivo,1476583306
1230 | 547,127206,getdvd,1470111984
1231 | 547,127212,tivo,1476113822
1232 | 547,128235,tivo,1476583410
1233 | 547,128360,toplist15,1449755822
1234 | 547,128512,getdvd,1470112064
1235 | 547,128606,toplist15,1468592921
1236 | 547,131796,tivo,1476583367
1237 | 547,132458,tivo,1476583517
1238 | 547,132547,dvd,1473748244
1239 | 547,132549,tivo,1476583387
1240 | 547,132800,getdvd,1454164221
1241 | 547,133645,toplist15,1449754606
1242 | 547,133771,getdvd,1470111738
1243 | 547,134130,toplist15,1446210938
1244 | 547,134853,toplist15,1446211185
1245 | 547,134859,getdvd,1470111811
1246 | 547,134859,toplist15,1449755731
1247 | 547,134881,toplist15,1446283359
1248 | 547,137337,toplist15,1446211164
1249 | 547,139385,toplist15,1449755171
1250 | 547,139642,tivo,1476583038
1251 | 547,140174,toplist15,1446211303
1252 | 547,140715,tivo,1476583167
1253 | 547,140725,tivo,1476583059
1254 | 547,141749,toplist15,1468077718
1255 | 547,142488,toplist15,1449754627
1256 | 547,143385,toplist15,1446211337
1257 | 547,144172,bkk,1472400655
1258 | 547,146656,toplist15,1449755899
1259 | 547,148482,tivo,1476583253
1260 | 547,148626,toplist15,1449755637
1261 | 547,155064,bkk,1472179444
1262 | 547,156387,toplist16,1467946629
1263 | 547,158783,bkk,1466133970
1264 | 547,160954,bkk,1472178574
1265 | 547,161336,getdvd,1469112392
1266 | 547,161582,bkk,1472737430
1267 | 547,163056,bkk,1472178747
1268 | 547,163949,toplist16,1476419254
1269 | 547,164977,tivo,1476113746
1270 | 547,164979,tivo,1476113908
1271 | 567,138610,found footage,1436827601
1272 | 567,138610,survival horror,1436827612
1273 | 574,47044,Michael Mann,1232812787
1274 | 583,112552,determination,1430526450
1275 | 583,112552,devotion,1430526450
1276 | 583,112552,music,1430526450
1277 | 599,66203,honest,1344134759
1278 | 599,82152,alex pettyfer,1306109318
1279 | 611,105504,hijacking,1471521058
1280 | 611,105504,ocean,1471521036
1281 | 611,105504,pirates,1471521067
1282 | 611,105504,suspense,1471521044
1283 | 615,111384,revenge,1408781036
1284 | 615,111384,vengeance,1408781025
1285 | 615,111931,femme-fatale,1425503802
1286 | 615,111931,gritty,1425503802
1287 | 615,111931,low-budget,1425503802
1288 | 630,260,classic sci-fi,1443807766
1289 | 630,260,series,1443807803
1290 | 652,146501,gay,1449533216
1291 | 660,260,"imaginary world, characters, story, philosophical",1436680217
1292 | 660,260,script,1436680177
1293 | 660,135518,meaning of life,1436680885
1294 | 660,135518,philosophical,1436680885
1295 | 660,135518,sci-fi,1436680885
1296 | 663,260,action,1438398078
1297 | 663,260,Syfy,1438398050
1298 |
--------------------------------------------------------------------------------
/docs/biparte (1).xml:
--------------------------------------------------------------------------------
1 | zVhNc9sgEP01vmYkoa8cWzdtL53JjA9tjlgiElPJeBD+6q/vyoAlgZy4jmzqgwcesMDbx7Johub1/hvH6/IHy0k1C7x8P0NfZkGQPHrw3wIHCYRJIoGC01xCfgcs6B+iQDWu2NCcNIOOgrFK0PUQzNhqRTIxwDDnbDfs9sqq4axrXBALWGS4stGfNBelRNPI6/DvhBalntn3VMsSZ78LzjYrNd8sQK/Hn2yusbal+jclztmuB6GnGZpzxoQs1fs5qVpqNW1y3Nczrad1c7ISlwwI5IAtrjZq6zXbUqLgRhw0IzAMyIfK511JBVmscda27MD9gJWirqDmQ1EZJFyQ/dlF+aetgoIIq4ngB+iiBgRIsXPQqlD1XecLXzNY9vyQKgwr9xcn0x0FUFAsjDOCxhkBCDklJQwckhKeJSV0SkrsUinRWVJ8t0pxyElscZJtGgF2+Du0QCBct2DONsuKPN2RLt8f8oWiO/KVvMUXtMS4bne8Wjbr027/M/5Q6JC/9C3+3grY7viKPYd8PVp82Ryt8k9tCgW1rMJNQ7OWLoG5sOEeM7B7fvgFFe8h0tUX3ban4tikyi+qLOcmuZWMGVzC+tiGZ2Tgc1hRQUQvFNuM9xiNRgjVGCcVFnQ7XMQYy2qGZ0ZheV0ASc44VJuQi1ej+smYYSgwrnjkGYbkli1DR6eftn2RDnTM+2ch9D2uvSod/mG/xrZfA6d+TaOhO9C1fkVDQ9ZJntCvdkI/2QEfdbdx9OEw9A+/9+ClqQaeCaewFQjNH1FJYKskcamS2Dizpm8vFYmVhphqm1Ak9hvnrrfARPEisZWAXCohRBPdA2F6v3vAftlNLgX/1kIYSQicCiE2Y0J4pRAiQwiBqagJhWC/ZqePCeeE4N8qfwhdysC6GuIrZRAbV8MpT7yBDOwH/D2vhmmEMHIxOBVCZL74rs0RouhWOQJUu4/Osnv3YR89/QU=
--------------------------------------------------------------------------------
/docs/biparte.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chandan-u/graph-based-recommendation-system/897c6419d922fcbfe7f110522e530f3eeeceaf17/docs/biparte.png
--------------------------------------------------------------------------------
/docs/biparte.xml:
--------------------------------------------------------------------------------
1 | zVjLcpswFP0alvUAAoyXbZq2m85kxos2SxkU0BQsj5Bf/fpeLMmAZCcpIVa88EhH73OOLhc8dFcfvnO8KX+ynFRe6OcHD331wnC+8OG/BY4SiMNAAgWnuYR6wJL+JQpU44otzUkz6CgYqwTdDMGMrdckEwMMc872w25PrBquusEFsYBlhisb/UVzUUo0jf0O/0FoUeqVA1+1rHD2p+Bsu1breSF6Ov1kc431XKp/U+Kc7XsQuvfQHWdMyFJ9uCNVS62mTY77dqX1vG9O1uI1A0I5YIerrTp6zXaUKLgRR80IDAPyofJlX1JBlhuctS17kB+wUtQV1AIoqgkJF+RwdVPB+ajgIMJqIvgRuqgBIVLsHLUrVH3faRFoBsueDqnCsJK/OE/dUQAFxcJlRtBlRgBCTkmJQoekRFdJiZySkrh0SnyVlMCtUxxyklicZNtGwDz8BVogEG5aMGfbVUXub0hXEA/5QvEN+Zo/xxe0JLhuT7xeNZvzaT8Yf2jhkL/0Of6eC9ju+DJjFopuyNfC4svmaJ1/blMoqGUVbhqatXQJzIUN95iB0/Pjb6j4s1hXH3XbgYpTkyo/qrJcm+RWMmZwCftjW56Rgeawo4KIXii2Ge8xGl8gVGOcVFjQ3XATl1hWKzwwCtvrAsj8ygXQU8jNq1H9ZMyYKDQe8cg3JpJHtiY6iX4+9qt8oBPt/zZCX3GtqhT8zbomtq6hU13TeCgHGqsrGk5k3eQJdbUT+skueCd3inqCBzM/TVX9gXAKu4Xoqy57FxTgmvTDgq9HtYA5bJx/Qts/c5f+SYzbbKr+WvsEgTGR6cMJ7WO//dz0+TBL/LRnrU/gkviKtcZ5ZG57BLn0SGQmA2OfHVF6u2eH/Tb4DiaZD+MFRJkAvT1efBjhEzM6RCOFjw3hQ9NBEwpvv/FOL7wZG8ZpG7nU1or8yUhtEyPynxPEd9DWfnO/ZeQPvCkyyAvR3akRYn+iFCA2v1FMlgJAtfvaLLt3X/TR/T8=
--------------------------------------------------------------------------------
/docs/graph_based_recommendation system.Rmd:
--------------------------------------------------------------------------------
1 | ---
2 | title: "recommendation system using graph traversal"
3 | author: "chandan u"
4 | Instructor: "Funda ergun"
5 | date: "12/6/2016"
6 | output: pdf_document
7 | ---
8 |
9 |
10 |
11 |
12 | # Abstract:
13 |
14 | We implemented a movie recommendation system using the MovieLens dataset from the GroupLens site. The dataset is transformed into a bipartite graph, which allows us to address the problem with graph-based traversal algorithms instead of the usual approaches used by recommendation systems. The goal is to implement collaborative filtering as well as content-based recommendation using graph traversal algorithms. We evaluate the advantages and shortcomings of this approach and discuss how it can be improved.
15 |
16 | # Introduction:
17 | The amount of content generated by social media sites, movies, TV shows, etc. is increasing tremendously, and it is very hard for a user to choose from such a huge pool. There are endless choices, so we need to filter out most of this content and give suggestions to the user.
18 | Recommendation systems are designed to solve this very problem: they give users the best suggestions based on existing data and the users' preferences.
19 |
20 | Recommendation systems are widely used by online services: Netflix uses them to suggest movies, Amazon to suggest products, and music applications such as iTunes and Spotify to suggest songs that the user may like to hear next. They can also be applied to domains such as social networking; Facebook uses them to suggest friends.
21 |
22 | In this project we implement a collaborative filtering recommendation system that uses existing data to give better suggestions. We build a bipartite graph from the data set to support graph traversal for the collaborative filtering system.
23 |
24 |
25 | # Data:
26 | The data set is obtained from http://grouplens.org/. GroupLens provides a collection of movie ratings from the MovieLens website. This data set covers 100,000 ratings and 1,300 tag applications applied to 9,000 movies by 671 users. For the collaborative filtering system we use all the data except the tags. The data set has two main files: movies.csv and ratings.csv.
27 |
28 | Number of users: 671
29 | Number of movies: >9000
30 | Number of ratings: 100,000
31 |
32 | Files: movies.csv, ratings.csv
33 |
34 |
35 |
36 |
37 | # Representation of data:
38 | To facilitate graph traversal techniques and collaborative filtering, the data is transformed into a bipartite graph representation. In a bipartite graph, nodes are divided into two disjoint sets. Links between pairs of nodes from different node sets are admissible, while links between nodes from the same node set are not allowed. In our case the information is whether or not a person (customer) has watched a movie (product) and what rating the customer has given the movie. Such information can be easily represented as shown in the table below:
39 |
40 |
41 | customer/movie | movie 1 | movie 2 | movie 3 | movie 4
42 | --------------- | -------- | -------- | -------- | -------
43 | customer1 | 0 | 1 | 0 | 1
44 | customer2 | 0 | 1 | 1 | 1
45 | customer3 | 1 | 0 | 1 | 0
46 |
47 |
48 |
49 |
50 | A zero in the above table means that the customer has not watched the movie. A nonzero value means that the customer has watched the movie, and the numeric value is the rating he/she gave it. You can traverse from a customer to a movie, but you cannot traverse from customer to customer directly. Likewise, you cannot traverse directly from movie to movie.
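
The traversal rule described above can be sketched in Python, the language used elsewhere in this repository. This is only a minimal illustration built on the toy table; the names `build_bipartite` and `co_watchers` are hypothetical and are not taken from the project's own code.

```python
from collections import defaultdict

# Toy watch matrix from the table above: rows are customers, columns are movies.
# A zero means "not watched"; a nonzero value stands for the rating.
matrix = {
    "customer1": [0, 1, 0, 1],
    "customer2": [0, 1, 1, 1],
    "customer3": [1, 0, 1, 0],
}
movies = ["movie1", "movie2", "movie3", "movie4"]


def build_bipartite(matrix, movies):
    """Return two adjacency maps: customer -> {movie: rating} and movie -> {customer: rating}."""
    customer_to_movies = defaultdict(dict)
    movie_to_customers = defaultdict(dict)
    for customer, row in matrix.items():
        for movie, rating in zip(movies, row):
            if rating != 0:  # an edge exists only when the movie was watched
                customer_to_movies[customer][movie] = rating
                movie_to_customers[movie][customer] = rating
    return customer_to_movies, movie_to_customers


def co_watchers(customer, customer_to_movies, movie_to_customers):
    """Customers reachable in two hops: customer -> movie -> customer."""
    found = set()
    for movie in customer_to_movies[customer]:
        found.update(movie_to_customers[movie])
    found.discard(customer)  # the starting customer is not their own neighbour
    return found


c2m, m2c = build_bipartite(matrix, movies)
print(sorted(c2m["customer1"]))
print(sorted(co_watchers("customer1", c2m, m2c)))
```

Because edges only ever join a customer to a movie, reaching another customer always takes two hops through a shared movie, which is the step collaborative filtering relies on.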
51 |
52 |
53 | Bipartite matrix translation to graph:
54 |
55 |
56 | 
57 |
58 |
59 | The actual data set has 9123 movies and 671 customers, so the bipartite graph matrix is of size (9123 * 671).
60 |
61 |
62 |
63 |
64 |
65 | # Related work:
66 | In general, recommendation systems are implemented in three ways:
67 |
68 | ## 1.)Content based approach:
69 | A common approach when designing recommender systems is content-based filtering. Content-based filtering methods are based on a description of the item or product and a profile of the user’s preferences.
70 |
71 | ## 2.)Collaborative filtering:
72 | Collaborative filtering methods are based on collecting and analyzing a large amount of information on users’ behaviors, activities or preferences and predicting what users will like based on their similarity to other users. A key advantage of the collaborative filtering approach is that it does not rely on machine-analyzable content, and it is therefore capable of accurately recommending complex items such as movies without requiring an "understanding" of the item itself.
73 | ## 3.) A hybrid of collaborative and content based approach:
74 | In this approach we combine the collaborative and content-based approaches to come up with recommendations.
75 |
76 | Our main focus in this project is to implement the collaborative filtering as well as the content based filtering.
77 |
78 |
79 | # Recommendation Algorithms:
80 |
81 | ## 1.) The content based Filtering:
82 | The idea behind content-based filtering is that when a user likes or watches certain movies, we use the meta information of those movies to suggest similar movies with the same properties. For example, meta information such as a movie's genre can be used to suggest other movies that belong to the same genre. We can also couple this with the ratings that the user has given these movies earlier.
83 | meta-properties: genre
84 | user-given-properties: rating
85 |
86 | Here is a simple algorithm:
87 |
88 | ```{p}
89 | Algorithm:
90 | Step 1: Choose all the movies that the user watched.
91 | Step 2: Obtain the genres of all the movies that the user watched.
92 |
93 | Step 3: Sum the ratings given by the target user for each genre.
94 |
95 | Step 4: Divide the cumulative rating of each genre by
96 | the number of movies in that genre.
97 |
98 | Step 5: Pick the top three genres using the above computation
99 | and recommend movies that belong to those genres.
100 |
101 | ```
102 |
103 | This may not be the best approach, but it covers two possibilities: the user may like a particular genre and still be looking for a good movie in it, not having found one so far; or the user may simply like movies from certain genres more than others.
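
The steps of the algorithm can be sketched in Python as follows. This is a minimal illustration, not the project's implementation: the function names and the toy movies are hypothetical. It assumes that genres are stored as a pipe-separated string (for example `Crime|Drama`) and that "the number of movies in that genre" means the number of movies the user has rated in that genre.

```python
from collections import defaultdict


def top_genres(watched, n=3):
    """Return the n genres with the highest average rating.

    watched: list of (genres_string, rating) pairs for the target user.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for genres, rating in watched:
        for genre in genres.split("|"):
            totals[genre] += rating   # Step 3: sum ratings per genre
            counts[genre] += 1
    # Step 4: average rating per genre
    averages = {genre: totals[genre] / counts[genre] for genre in totals}
    # Step 5: best genres first, ties broken alphabetically
    return sorted(averages, key=lambda g: (-averages[g], g))[:n]


def recommend_by_genre(catalog, watched_titles, genres):
    """Return unwatched titles that belong to at least one of the genres."""
    wanted = set(genres)
    return [
        title
        for title, movie_genres in catalog
        if title not in watched_titles and wanted & set(movie_genres.split("|"))
    ]


# Hypothetical user history: (genres, rating)
watched = [("Crime|Drama", 5.0), ("Comedy", 2.0), ("Drama", 4.0), ("Action|Crime", 3.0)]
favourite_genres = top_genres(watched)

# Hypothetical catalog: (title, genres)
catalog = [("Movie A", "Drama"), ("Movie B", "Comedy"), ("Movie C", "Action|Thriller")]
suggestions = recommend_by_genre(catalog, set(), favourite_genres)
```

With this toy history the averages are Drama 4.5, Crime 4.0, Action 3.0 and Comedy 2.0, so the comedy is not suggested.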
104 |
105 |
106 | ## 2.) Collaborative Filtering:
107 | Collaborative filtering can be implemented in two ways: user based collaborative filtering and item based collaborative filtering. In this project we focus on user-user collaborative filtering. In user-user collaborative filtering, when we try to recommend movies to a user, we look for other similar users who have watched almost the same movies as the current user. Similarity metrics such as Euclidean distance, Manhattan distance and Pearson correlation can be used to find such similar users. In this project we implemented Euclidean distance. We will take the example of Table 1 and try to recommend movies for customer 1 in the table.
108 |
109 | ### Euclidean distance Similarity:
110 | From Table 1, let’s assume we are trying to recommend movies for customer 1. Our goal is to find similar users. To do this, we compute the Euclidean distance from customer 1 to each of the other customers.
111 |
112 | For two points p = (p1, p2,..., pn) and q = (q1, q2,..., qn) in Euclidean n-space, the Euclidean distance (d) from p to q, or from q to p, is given by the Pythagorean formula:
113 |
114 | $d(p, q) = \sqrt[2]{\sum_{i=1}^{n} (q_{i} - p_{i} )^{2}}$
115 |
116 | Here the vectors are simply the rows of the bipartite matrix shown in Table 1, i.e. the customer 1 row is vector p and the row of any other customer, such as customer 2, is vector q. The distance between customer 1 and customer 2 is computed as follows:
117 |
118 | customer 1, p = (0,1,0,1)
119 | customer 2, q = (0,1,1,1)
120 |
121 | $\sqrt[2]{(0-0)^{2} + (1-1)^{2} + (0-1)^{2} + (1-1)^{2}}$ = 1
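
The same computation can be checked with a few lines of Python. This is a minimal sketch; the function name `euclidean_distance` is illustrative and not necessarily the one used in the project code.

```python
import math


def euclidean_distance(p, q):
    """Euclidean distance between two equally long rating vectors."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))


customer1 = (0, 1, 0, 1)  # vector p from the example
customer2 = (0, 1, 1, 1)  # vector q from the example
distance = euclidean_distance(customer1, customer2)
print(distance)  # 1.0
```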
122 |
123 |
124 | As you can see, the distance is one. We obtain the distances of customer 1 with respect to all other users and choose the first few users (at least three) who are closest to customer 1. Most importantly, we ignore all users who have a Euclidean distance of zero with respect to customer 1. This is because those users have watched the same set of movies as customer 1 and have no additional information that would be helpful in recommending movies to customer 1.
125 |
126 | We compute a distance table as follows:
127 |
128 | d | customer1 | customer 2 | customer 3
129 | ------ | ---------- | ----------- | ----------
130 | distance from customer1 | 0 | 1 | 2
131 |
132 | So customer 3 is not very similar to customer 1, but customer 2 is similar and may have some interesting information that we can use to suggest movies to customer 1.
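
The neighbour selection described above can be sketched as follows. The helper `similar_users` is hypothetical. The rows for customer 1 and customer 2 come from the worked example; the row for customer 3 is an assumption, chosen only so that its distance from customer 1 is 2 as in the table.

```python
import math


def euclidean_distance(p, q):
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))


def similar_users(target, rows, k=3):
    """Return up to k (user, distance) pairs, closest first.

    Users at distance zero are skipped because they watched exactly the
    same movies as the target and add no new information.
    """
    distances = [
        (user, euclidean_distance(rows[target], vector))
        for user, vector in rows.items()
        if user != target
    ]
    informative = [(user, d) for user, d in distances if d > 0]
    return sorted(informative, key=lambda pair: pair[1])[:k]


rows = {
    "customer1": (0, 1, 0, 1),
    "customer2": (0, 1, 1, 1),
    "customer3": (1, 0, 1, 0),  # assumed row, picked to give distance 2
}
neighbours = similar_users("customer1", rows)
```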
133 |
134 |
135 | ### Graph based recommendation:
136 |
137 | Now that we have the list of users (neighborhood users) similar to the target user (the user to whom we are recommending), we will use the bipartite graph matrix to search for movies that can be recommended to the target user.
138 |
139 | The similarity between customer 1 and customer 2 is obvious because they have both watched "movie2" and "movie4". As a result, "movie3" is recommended to customer 1 because customer 2 has watched it too. From the distance metrics we know that customer 1 and customer 3 are not very similar. Therefore "movie1", which has been watched by customer 3, will not be recommended to customer 1.
140 |
141 | The above recommendation approach can be easily implemented in a graph-based model by computing the associations between movie nodes and customer nodes. In this context, the association between two nodes is determined by the existence and length of the path(s) connecting them. Standard collaborative filtering approaches, including both the user-based and item-based approaches, consider only paths with length equal to 3. For instance, the
142 | association between customer 1 and movie3 is determined by all paths of length 3 connecting customer 1 and movie3. It is easy to see from Figure 1 that there exist two paths connecting customer1 and movie3:
143 | customer1 - movie2 - customer2 - movie3
144 | and customer1 - movie4 - customer2 - movie3.
145 |
146 | This strong association leads to the recommendation of movie3 to customer1. Intuitively, the higher the number of distinct paths connecting a movie node to a customer node, the stronger the association between these two nodes, and the more likely the movie is to be recommended to the customer.
147 | Extending the above approach to explore and incorporate transitive associations is straightforward in a graph-based model: by considering paths whose length exceeds 3, the model is able to explore transitive associations.
148 |
149 |
150 |
151 | So we can formalize this as follows:
152 | If there are n paths between (customer i, movie j), then the weight of the association is the sum of the weights of the individual paths, computed as follows:
153 |
154 | Algorithm:
155 |
156 | Take a constant alpha in the interval (0,1)
157 | weights = 0
158 | For each path between (customer i and movie j):
159 | compute the depth of the path.
160 | weights = weights + $(alpha)^{depth}$
161 |
162 |
163 |
164 |
165 | With alpha = 0.5: $weights(customer1, movie3) = (0.5)^3 + (0.5)^3 = 0.25$, and $weights(customer1, movie1) = 0$
166 |
167 | The second weight is zero because there is no path of length 3 from customer 1 to movie1. Hence we will recommend movie 3 to customer 1.
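
The path weighting can be sketched in Python as below. This is an illustration under stated assumptions, not the project's code: it enumerates simple paths depth-first rather than with BFS, and the names `build_graph` and `association_weight` are hypothetical. The edges are read off the example vectors p and q, taking the columns to be movie1 to movie4, and the movies of customer 3 are assumed.

```python
def build_graph(watched):
    """Turn {customer: [movies]} into an undirected adjacency dict."""
    graph = {}
    for customer, movies in watched.items():
        graph.setdefault(customer, set())
        for movie in movies:
            graph[customer].add(movie)
            graph.setdefault(movie, set()).add(customer)
    return graph


def association_weight(graph, customer, movie, alpha=0.5, max_depth=3):
    """Sum alpha**depth over all simple paths from customer to movie
    whose depth (number of edges) is at most max_depth."""
    total = 0.0
    stack = [(customer, (customer,))]
    while stack:
        node, path = stack.pop()
        depth = len(path) - 1
        if node == movie:
            total += alpha ** depth
            continue
        if depth == max_depth:
            continue
        for neighbour in graph.get(node, ()):
            if neighbour not in path:  # keep paths simple (no repeated nodes)
                stack.append((neighbour, path + (neighbour,)))
    return total


watched = {
    "customer1": ["movie2", "movie4"],            # p = (0,1,0,1)
    "customer2": ["movie2", "movie3", "movie4"],  # q = (0,1,1,1)
    "customer3": ["movie1", "movie3"],            # assumed
}
graph = build_graph(watched)
weight_movie3 = association_weight(graph, "customer1", "movie3")  # 0.25
weight_movie1 = association_weight(graph, "customer1", "movie1")  # 0.0
```

Raising `max_depth` above 3 lets the search follow transitive associations. In this toy graph, `max_depth=5` gives movie1 a small nonzero weight through customer 2 and customer 3.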
168 |
169 |
170 |
171 | # Experimental Evaluation:
172 |
173 | One way to evaluate the content based and collaborative techniques is to use the similarity metric as follows:
174 |
175 | 1.) Compute the Euclidean distances of all users with respect to the target user (the user who is to receive the recommendation). Obtain all the similar users, i.e. users whose distance to the target user is small.
176 |
177 | 2.) Now run the recommendation algorithm and recommend a movie to the target user. Update the movie list of the target user.
178 |
179 | 3.) Now compute the Euclidean distances of all the previously selected similar users with respect to the updated target user, and see how much the cumulative distance has changed.
180 |
181 | 4.) We compute this change in cumulative distance for both the collaborative and the content based recommendation. The better algorithm is the one whose cumulative distance has decreased the most.
182 |
183 |
184 | Let's say the collaborative algorithm recommends "movie3" to customer 1
185 | and the content based algorithm recommends "movie1" to customer 1.
186 |
187 |
188 | So now the updated vectors are:
189 |
190 |
191 | ### For collaborative:
192 |
193 | Target User:
194 | customer 1, p = (0,1,1,1)
195 | Similar Users:
196 | customer 2, q = (0,1,1,1)
197 |
198 | Total distance = $\sqrt[2]{(0-0)^{2} + (1-1)^{2} + (1-1)^{2} + (1-1)^{2}}$ = 0
199 |
200 | ### For content based:
201 |
202 | Target User:
203 | customer 1, p = (1,1,0,1)
204 | Similar Users:
205 | customer 2, q = (0,1,1,1)
206 |
207 | Total distance = $\sqrt[2]{(1-0)^{2} + (1-1)^{2} + (0-1)^{2} + (1-1)^{2}}$ = $\sqrt[2]{2}$
208 |
209 | It seems that the content based algorithm has not fared well, as it failed to give the most similar recommendation: its distance to the similar user is $\sqrt[2]{2}$, against 0 for the collaborative algorithm. But this is expected behaviour, because the collaborative algorithm recommends exactly what the similar users have watched. This evaluation approach is more helpful when we are trying to compare one collaborative filtering algorithm with another collaborative filtering algorithm.
210 |
211 | In this program we have computed only one movie recommendation per algorithm, for the target user customer 1, because the program took a long time to execute even for one target user. A single recommendation is not enough for the Euclidean distances to give a meaningful comparison of the algorithms.
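
The comparison above can be reproduced with a short sketch. The helper names are hypothetical; the vectors are the ones from the worked example, with customer 2 as the only similar user and the columns taken to be movie1 to movie4.

```python
import math


def euclidean_distance(p, q):
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))


def cumulative_distance(target, neighbours):
    """Sum of the distances from the target to each similar user."""
    return sum(euclidean_distance(target, neighbour) for neighbour in neighbours)


def apply_recommendation(target, movie_index):
    """Return a copy of the target vector with the recommended movie marked."""
    updated = list(target)
    updated[movie_index] = 1
    return tuple(updated)


customer1 = (0, 1, 0, 1)
neighbours = [(0, 1, 1, 1)]  # customer 2

before = cumulative_distance(customer1, neighbours)
# movie3 is column index 2, movie1 is column index 0
after_collaborative = cumulative_distance(apply_recommendation(customer1, 2), neighbours)
after_content_based = cumulative_distance(apply_recommendation(customer1, 0), neighbours)

print(before, after_collaborative, after_content_based)
```

The cumulative distance drops from 1 to 0 for the collaborative recommendation and rises to the square root of 2 for the content based one.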
212 |
213 | ```
214 | The content Based recommendatin is:
215 | 284 318 Shawshank Redemption, The (1994) Crime|Drama
216 |
217 | The collaberative based recommendatoin is :
218 | 15 16 Casino (1995) Crime
219 | ```
220 |
221 |
222 |
223 | # Conclusion:
224 |
225 | 1.) There are many ways to implement collaborative as well as content based filtering. What we have implemented in this project is not the best approach, and it can be improved considerably.
226 |
227 | 2.) Although we used the Euclidean distance to find similar users in this experiment, there are other approaches to finding similar / neighbouring users, such as the Manhattan distance and the Pearson correlation, which is considered one of the best for the collaborative filtering problem.
228 |
229 | 3.) In the project we have implemented BFS to obtain the paths from customer to movie in the bipartite graph. As the number of movies increases this may become a problem, because more paths have to be computed. We should explore other options, such as greedy algorithms, which are faster and need less computation but may not give an optimal solution, or iterative deepening search to limit the depth of the graph search. This would be interesting to try.
230 |
231 | 4.) As for testing, it would be more realistic to have data on which movies the customer/user chose after the recommendation. This way we could easily compute precision and recall metrics.
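
For reference, the two alternative similarity measures mentioned in point 2 can be sketched as follows. They are not part of the project's implementation, and the function names are illustrative. Returning 0.0 when a vector has no variance is a convention chosen for this sketch.

```python
import math


def manhattan_distance(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(qi - pi) for pi, qi in zip(p, q))


def pearson_correlation(p, q):
    """Pearson correlation of two equally long rating vectors."""
    n = len(p)
    mean_p = sum(p) / n
    mean_q = sum(q) / n
    covariance = sum((pi - mean_p) * (qi - mean_q) for pi, qi in zip(p, q))
    variance_p = sum((pi - mean_p) ** 2 for pi in p)
    variance_q = sum((qi - mean_q) ** 2 for qi in q)
    if variance_p == 0 or variance_q == 0:
        return 0.0  # convention for this sketch: undefined correlation
    return covariance / math.sqrt(variance_p * variance_q)


customer1 = (0, 1, 0, 1)
customer2 = (0, 1, 1, 1)
manhattan = manhattan_distance(customer1, customer2)
pearson = pearson_correlation(customer1, customer2)
```

Unlike the two distances, where smaller means more similar, a Pearson value closer to 1 indicates more similar users.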
232 |
233 |
234 |
235 |
236 |
237 |
238 |
239 |
240 |
241 |
242 | References:
243 |
244 | 1.) https://en.wikipedia.org/wiki/Recommender_system
245 | 2.) https://en.wikipedia.org/wiki/Euclidean_distance
246 | 3.) Graph-Based Analysis for E-Commerce Recommendation [http://arizona.openrepository.com/arizona/bitstream/10150/196109/1/azu_etd_1167_sip1_m.pdf]
247 | 4.) Collaborative Filtering using Weighted BiPartite Graph Projection [http://snap.stanford.edu/class/cs224w-2013/projects2013/cs224w-038-final.pdf]
248 |
249 | 5.) Movie Recommendation based on graph traversal Algorithms [http://www2.fiit.stuba.sk/~bielik/publ/abstracts/2013/televido-dexa2013.pdf]
250 | 6.) http://grouplens.org/blog/
251 |
252 |
253 |
--------------------------------------------------------------------------------
/docs/graph_based_recommendation_system.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chandan-u/graph-based-recommendation-system/897c6419d922fcbfe7f110522e530f3eeeceaf17/docs/graph_based_recommendation_system.pdf
--------------------------------------------------------------------------------
/docs/report.pages:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chandan-u/graph-based-recommendation-system/897c6419d922fcbfe7f110522e530f3eeeceaf17/docs/report.pages
--------------------------------------------------------------------------------