├── README.md
├── movie recommendation system.ipynb
├── movie_dataset.csv
└── test.PNG


/README.md:
--------------------------------------------------------------------------------
 1 | # Content-Based-Movie-Recommendation-System
 2 | Content-based movie recommendation system using machine learning.
 3 | 
 4 | Ever wondered how Google comes up with movies similar to the ones you like? After reading this post, you will be able to build such a recommendation system yourself.
 5 | 
 6 | It turns out that there are (mostly) three ways to build a recommendation engine:
 7 | 
 8 | 1. Popularity based recommendation engine
 9 | 2. Content based recommendation engine
10 | 3. Collaborative filtering based recommendation engine
11 | 
12 | Now you might be thinking, “That’s interesting, but what are the differences between these recommendation engines?” Let me help you out with that.
13 | 
14 | ### Popularity based recommendation engine:
15 | 
16 | This is perhaps the simplest kind of recommendation engine you will come across. The trending lists you see on YouTube or Netflix are based on this idea: keep track of the view count for each movie or video, then list the items by views in descending order (highest view count to lowest). Pretty simple but effective, right?
17 | 
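As a minimal sketch of this idea (the titles and view counts are made up for illustration, and `top_by_views` is a hypothetical helper that is not part of this repository):

```python
from typing import Dict, List


def top_by_views(view_counts: Dict[str, int], n: int = 3) -> List[str]:
    """Return the n titles with the highest view counts, most viewed first."""
    ranked = sorted(view_counts.items(), key=lambda item: item[1], reverse=True)
    return [title for title, _views in ranked[:n]]


# Hypothetical view counts, invented for this example.
views = {"Movie A": 120, "Movie B": 950, "Movie C": 430, "Movie D": 15}
trending = top_by_views(views, n=3)
print(trending)
```

Every user gets the same list here, which is why this approach needs no information about the individual viewer.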
18 | ### Content based recommendation engine:
19 | 
20 | This type of recommendation system takes a movie that the user currently likes as input. It then analyzes the contents of that movie (storyline, genre, cast, director, etc.) to find other movies with similar content, ranks those movies by their similarity scores, and recommends the most relevant ones to the user.
21 | 
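The steps above can be sketched without any third-party library. This is only an illustration under stated assumptions: the catalog, its titles, and the helper names `cosine` and `rank_similar` are hypothetical, and the word counting is done by hand here rather than with scikit-learn.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(liked: str, catalog: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank every other title by how similar its description is to the liked one."""
    bags = {title: Counter(text.lower().split()) for title, text in catalog.items()}
    scores = [
        (title, cosine(bags[liked], bag))
        for title, bag in bags.items()
        if title != liked
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical catalog: each value mixes genre, keyword, and director tokens.
catalog = {
    "Space Saga": "science fiction space war alien director_x",
    "Star Quest": "science fiction space travel alien director_x",
    "Love Letters": "romance drama paris letters director_y",
}
ranking = rank_similar("Space Saga", catalog)
print(ranking)
```

Movies that share many tokens with the liked movie score close to 1, while movies with no tokens in common score 0.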
22 | ### Collaborative filtering based recommendation engine:
23 | 
24 | This algorithm first tries to find similar users based on their activities and preferences (for example, both users watch the same type of movies, or movies directed by the same director). Then, between two such users (say, A and B), if user A has seen a movie that user B has not seen yet, that movie gets recommended to user B, and vice versa. In other words, the recommendations are filtered based on the collaboration between similar users’ preferences (hence the name “Collaborative Filtering”). One typical application of this algorithm can be seen on the Amazon e-commerce platform, where you get to see the “Customers who viewed this item also viewed” and “Customers who bought this item also bought” lists.
25 | 
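Here is a rough sketch of the user A / user B example. The watch histories are invented, `jaccard` and `recommend_for` are hypothetical helpers, and Jaccard overlap is just one simple choice of user similarity; the description above does not prescribe a particular measure.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two watch histories (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_for(user: str, watched: Dict[str, Set[str]]) -> List[str]:
    """Suggest titles seen by the most similar other user but not by `user`."""
    others = [name for name in watched if name != user]
    if not others:
        return []
    nearest = max(others, key=lambda name: jaccard(watched[user], watched[name]))
    return sorted(watched[nearest] - watched[user])


# Hypothetical watch histories, invented for this example.
watched = {
    "A": {"Movie 1", "Movie 2", "Movie 3"},
    "B": {"Movie 1", "Movie 2"},
    "C": {"Movie 8", "Movie 9"},
}
suggestions = recommend_for("B", watched)
print(suggestions)
```

User B is closest to user A, so B is offered the one movie that A has seen and B has not. Note that nothing about the movies' content is used, only who watched what.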
26 | In this project, however, we are going to implement a content-based recommendation system using the scikit-learn library.
27 | Enjoy!!!
28 | 
29 | 
30 | Thanks - CodeHeroku
31 | 


--------------------------------------------------------------------------------
/movie recommendation system.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "#### Getting Started: Loading Libraries"
  8 |    ]
  9 |   },
 10 |   {
 11 |    "cell_type": "code",
 12 |    "execution_count": 1,
 13 |    "metadata": {},
 14 |    "outputs": [],
 15 |    "source": [
 16 |     "import pandas as pd\n",
 17 |     "import numpy as np\n",
 18 |     "from sklearn.feature_extraction.text import CountVectorizer\n",
 19 |     "from sklearn.metrics.pairwise import cosine_similarity"
 20 |    ]
 21 |   },
 22 |   {
 23 |    "cell_type": "markdown",
 24 |    "metadata": {},
 25 |    "source": [
 26 |     "#### Loading the Dataset\n",
 27 |     "Load the dataset, provided by Kaggle's <a href=\"https://www.kaggle.com/rounakbanik/the-movies-dataset\">The Movies Dataset</a>, into a pandas DataFrame."
 28 |    ]
 29 |   },
 30 |   {
 31 |    "cell_type": "code",
 32 |    "execution_count": 2,
 33 |    "metadata": {},
 34 |    "outputs": [],
 35 |    "source": [
 36 |     "df = pd.read_csv(\"movie_dataset.csv\")"
 37 |    ]
 38 |   },
 39 |   {
 40 |    "cell_type": "markdown",
 41 |    "metadata": {},
 42 |    "source": [
 43 |     "We have our DataFrame ready, so let's take a look at it."
 44 |    ]
 45 |   },
 46 |   {
 47 |    "cell_type": "code",
 48 |    "execution_count": 3,
 49 |    "metadata": {},
 50 |    "outputs": [
 51 |     {
 52 |      "data": {
 53 |       "text/html": [
 54 |        "<div>\n",
 55 |        "<style scoped>\n",
 56 |        "    .dataframe tbody tr th:only-of-type {\n",
 57 |        "        vertical-align: middle;\n",
 58 |        "    }\n",
 59 |        "\n",
 60 |        "    .dataframe tbody tr th {\n",
 61 |        "        vertical-align: top;\n",
 62 |        "    }\n",
 63 |        "\n",
 64 |        "    .dataframe thead th {\n",
 65 |        "        text-align: right;\n",
 66 |        "    }\n",
 67 |        "</style>\n",
 68 |        "<table border=\"1\" class=\"dataframe\">\n",
 69 |        "  <thead>\n",
 70 |        "    <tr style=\"text-align: right;\">\n",
 71 |        "      <th></th>\n",
 72 |        "      <th>index</th>\n",
 73 |        "      <th>budget</th>\n",
 74 |        "      <th>genres</th>\n",
 75 |        "      <th>homepage</th>\n",
 76 |        "      <th>id</th>\n",
 77 |        "      <th>keywords</th>\n",
 78 |        "      <th>original_language</th>\n",
 79 |        "      <th>original_title</th>\n",
 80 |        "      <th>overview</th>\n",
 81 |        "      <th>popularity</th>\n",
 82 |        "      <th>...</th>\n",
 83 |        "      <th>runtime</th>\n",
 84 |        "      <th>spoken_languages</th>\n",
 85 |        "      <th>status</th>\n",
 86 |        "      <th>tagline</th>\n",
 87 |        "      <th>title</th>\n",
 88 |        "      <th>vote_average</th>\n",
 89 |        "      <th>vote_count</th>\n",
 90 |        "      <th>cast</th>\n",
 91 |        "      <th>crew</th>\n",
 92 |        "      <th>director</th>\n",
 93 |        "    </tr>\n",
 94 |        "  </thead>\n",
 95 |        "  <tbody>\n",
 96 |        "    <tr>\n",
 97 |        "      <td>0</td>\n",
 98 |        "      <td>0</td>\n",
 99 |        "      <td>237000000</td>\n",
100 |        "      <td>Action Adventure Fantasy Science Fiction</td>\n",
101 |        "      <td>http://www.avatarmovie.com/</td>\n",
102 |        "      <td>19995</td>\n",
103 |        "      <td>culture clash future space war space colony so...</td>\n",
104 |        "      <td>en</td>\n",
105 |        "      <td>Avatar</td>\n",
106 |        "      <td>In the 22nd century, a paraplegic Marine is di...</td>\n",
107 |        "      <td>150.437577</td>\n",
108 |        "      <td>...</td>\n",
109 |        "      <td>162.0</td>\n",
110 |        "      <td>[{\"iso_639_1\": \"en\", \"name\": \"English\"}, {\"iso...</td>\n",
111 |        "      <td>Released</td>\n",
112 |        "      <td>Enter the World of Pandora.</td>\n",
113 |        "      <td>Avatar</td>\n",
114 |        "      <td>7.2</td>\n",
115 |        "      <td>11800</td>\n",
116 |        "      <td>Sam Worthington Zoe Saldana Sigourney Weaver S...</td>\n",
117 |        "      <td>[{'name': 'Stephen E. Rivkin', 'gender': 0, 'd...</td>\n",
118 |        "      <td>James Cameron</td>\n",
119 |        "    </tr>\n",
120 |        "    <tr>\n",
121 |        "      <td>1</td>\n",
122 |        "      <td>1</td>\n",
123 |        "      <td>300000000</td>\n",
124 |        "      <td>Adventure Fantasy Action</td>\n",
125 |        "      <td>http://disney.go.com/disneypictures/pirates/</td>\n",
126 |        "      <td>285</td>\n",
127 |        "      <td>ocean drug abuse exotic island east india trad...</td>\n",
128 |        "      <td>en</td>\n",
129 |        "      <td>Pirates of the Caribbean: At World's End</td>\n",
130 |        "      <td>Captain Barbossa, long believed to be dead, ha...</td>\n",
131 |        "      <td>139.082615</td>\n",
132 |        "      <td>...</td>\n",
133 |        "      <td>169.0</td>\n",
134 |        "      <td>[{\"iso_639_1\": \"en\", \"name\": \"English\"}]</td>\n",
135 |        "      <td>Released</td>\n",
136 |        "      <td>At the end of the world, the adventure begins.</td>\n",
137 |        "      <td>Pirates of the Caribbean: At World's End</td>\n",
138 |        "      <td>6.9</td>\n",
139 |        "      <td>4500</td>\n",
140 |        "      <td>Johnny Depp Orlando Bloom Keira Knightley Stel...</td>\n",
141 |        "      <td>[{'name': 'Dariusz Wolski', 'gender': 2, 'depa...</td>\n",
142 |        "      <td>Gore Verbinski</td>\n",
143 |        "    </tr>\n",
144 |        "    <tr>\n",
145 |        "      <td>2</td>\n",
146 |        "      <td>2</td>\n",
147 |        "      <td>245000000</td>\n",
148 |        "      <td>Action Adventure Crime</td>\n",
149 |        "      <td>http://www.sonypictures.com/movies/spectre/</td>\n",
150 |        "      <td>206647</td>\n",
151 |        "      <td>spy based on novel secret agent sequel mi6</td>\n",
152 |        "      <td>en</td>\n",
153 |        "      <td>Spectre</td>\n",
154 |        "      <td>A cryptic message from Bond’s past sends him o...</td>\n",
155 |        "      <td>107.376788</td>\n",
156 |        "      <td>...</td>\n",
157 |        "      <td>148.0</td>\n",
158 |        "      <td>[{\"iso_639_1\": \"fr\", \"name\": \"Fran\\u00e7ais\"},...</td>\n",
159 |        "      <td>Released</td>\n",
160 |        "      <td>A Plan No One Escapes</td>\n",
161 |        "      <td>Spectre</td>\n",
162 |        "      <td>6.3</td>\n",
163 |        "      <td>4466</td>\n",
164 |        "      <td>Daniel Craig Christoph Waltz L\\u00e9a Seydoux ...</td>\n",
165 |        "      <td>[{'name': 'Thomas Newman', 'gender': 2, 'depar...</td>\n",
166 |        "      <td>Sam Mendes</td>\n",
167 |        "    </tr>\n",
168 |        "    <tr>\n",
169 |        "      <td>3</td>\n",
170 |        "      <td>3</td>\n",
171 |        "      <td>250000000</td>\n",
172 |        "      <td>Action Crime Drama Thriller</td>\n",
173 |        "      <td>http://www.thedarkknightrises.com/</td>\n",
174 |        "      <td>49026</td>\n",
175 |        "      <td>dc comics crime fighter terrorist secret ident...</td>\n",
176 |        "      <td>en</td>\n",
177 |        "      <td>The Dark Knight Rises</td>\n",
178 |        "      <td>Following the death of District Attorney Harve...</td>\n",
179 |        "      <td>112.312950</td>\n",
180 |        "      <td>...</td>\n",
181 |        "      <td>165.0</td>\n",
182 |        "      <td>[{\"iso_639_1\": \"en\", \"name\": \"English\"}]</td>\n",
183 |        "      <td>Released</td>\n",
184 |        "      <td>The Legend Ends</td>\n",
185 |        "      <td>The Dark Knight Rises</td>\n",
186 |        "      <td>7.6</td>\n",
187 |        "      <td>9106</td>\n",
188 |        "      <td>Christian Bale Michael Caine Gary Oldman Anne ...</td>\n",
189 |        "      <td>[{'name': 'Hans Zimmer', 'gender': 2, 'departm...</td>\n",
190 |        "      <td>Christopher Nolan</td>\n",
191 |        "    </tr>\n",
192 |        "    <tr>\n",
193 |        "      <td>4</td>\n",
194 |        "      <td>4</td>\n",
195 |        "      <td>260000000</td>\n",
196 |        "      <td>Action Adventure Science Fiction</td>\n",
197 |        "      <td>http://movies.disney.com/john-carter</td>\n",
198 |        "      <td>49529</td>\n",
199 |        "      <td>based on novel mars medallion space travel pri...</td>\n",
200 |        "      <td>en</td>\n",
201 |        "      <td>John Carter</td>\n",
202 |        "      <td>John Carter is a war-weary, former military ca...</td>\n",
203 |        "      <td>43.926995</td>\n",
204 |        "      <td>...</td>\n",
205 |        "      <td>132.0</td>\n",
206 |        "      <td>[{\"iso_639_1\": \"en\", \"name\": \"English\"}]</td>\n",
207 |        "      <td>Released</td>\n",
208 |        "      <td>Lost in our world, found in another.</td>\n",
209 |        "      <td>John Carter</td>\n",
210 |        "      <td>6.1</td>\n",
211 |        "      <td>2124</td>\n",
212 |        "      <td>Taylor Kitsch Lynn Collins Samantha Morton Wil...</td>\n",
213 |        "      <td>[{'name': 'Andrew Stanton', 'gender': 2, 'depa...</td>\n",
214 |        "      <td>Andrew Stanton</td>\n",
215 |        "    </tr>\n",
216 |        "  </tbody>\n",
217 |        "</table>\n",
218 |        "<p>5 rows × 24 columns</p>\n",
219 |        "</div>"
220 |       ],
221 |       "text/plain": [
222 |        "   index     budget                                    genres  \\\n",
223 |        "0      0  237000000  Action Adventure Fantasy Science Fiction   \n",
224 |        "1      1  300000000                  Adventure Fantasy Action   \n",
225 |        "2      2  245000000                    Action Adventure Crime   \n",
226 |        "3      3  250000000               Action Crime Drama Thriller   \n",
227 |        "4      4  260000000          Action Adventure Science Fiction   \n",
228 |        "\n",
229 |        "                                       homepage      id  \\\n",
230 |        "0                   http://www.avatarmovie.com/   19995   \n",
231 |        "1  http://disney.go.com/disneypictures/pirates/     285   \n",
232 |        "2   http://www.sonypictures.com/movies/spectre/  206647   \n",
233 |        "3            http://www.thedarkknightrises.com/   49026   \n",
234 |        "4          http://movies.disney.com/john-carter   49529   \n",
235 |        "\n",
236 |        "                                            keywords original_language  \\\n",
237 |        "0  culture clash future space war space colony so...                en   \n",
238 |        "1  ocean drug abuse exotic island east india trad...                en   \n",
239 |        "2         spy based on novel secret agent sequel mi6                en   \n",
240 |        "3  dc comics crime fighter terrorist secret ident...                en   \n",
241 |        "4  based on novel mars medallion space travel pri...                en   \n",
242 |        "\n",
243 |        "                             original_title  \\\n",
244 |        "0                                    Avatar   \n",
245 |        "1  Pirates of the Caribbean: At World's End   \n",
246 |        "2                                   Spectre   \n",
247 |        "3                     The Dark Knight Rises   \n",
248 |        "4                               John Carter   \n",
249 |        "\n",
250 |        "                                            overview  popularity  ... runtime  \\\n",
251 |        "0  In the 22nd century, a paraplegic Marine is di...  150.437577  ...   162.0   \n",
252 |        "1  Captain Barbossa, long believed to be dead, ha...  139.082615  ...   169.0   \n",
253 |        "2  A cryptic message from Bond’s past sends him o...  107.376788  ...   148.0   \n",
254 |        "3  Following the death of District Attorney Harve...  112.312950  ...   165.0   \n",
255 |        "4  John Carter is a war-weary, former military ca...   43.926995  ...   132.0   \n",
256 |        "\n",
257 |        "                                    spoken_languages    status  \\\n",
258 |        "0  [{\"iso_639_1\": \"en\", \"name\": \"English\"}, {\"iso...  Released   \n",
259 |        "1           [{\"iso_639_1\": \"en\", \"name\": \"English\"}]  Released   \n",
260 |        "2  [{\"iso_639_1\": \"fr\", \"name\": \"Fran\\u00e7ais\"},...  Released   \n",
261 |        "3           [{\"iso_639_1\": \"en\", \"name\": \"English\"}]  Released   \n",
262 |        "4           [{\"iso_639_1\": \"en\", \"name\": \"English\"}]  Released   \n",
263 |        "\n",
264 |        "                                          tagline  \\\n",
265 |        "0                     Enter the World of Pandora.   \n",
266 |        "1  At the end of the world, the adventure begins.   \n",
267 |        "2                           A Plan No One Escapes   \n",
268 |        "3                                 The Legend Ends   \n",
269 |        "4            Lost in our world, found in another.   \n",
270 |        "\n",
271 |        "                                      title vote_average vote_count  \\\n",
272 |        "0                                    Avatar          7.2      11800   \n",
273 |        "1  Pirates of the Caribbean: At World's End          6.9       4500   \n",
274 |        "2                                   Spectre          6.3       4466   \n",
275 |        "3                     The Dark Knight Rises          7.6       9106   \n",
276 |        "4                               John Carter          6.1       2124   \n",
277 |        "\n",
278 |        "                                                cast  \\\n",
279 |        "0  Sam Worthington Zoe Saldana Sigourney Weaver S...   \n",
280 |        "1  Johnny Depp Orlando Bloom Keira Knightley Stel...   \n",
281 |        "2  Daniel Craig Christoph Waltz L\\u00e9a Seydoux ...   \n",
282 |        "3  Christian Bale Michael Caine Gary Oldman Anne ...   \n",
283 |        "4  Taylor Kitsch Lynn Collins Samantha Morton Wil...   \n",
284 |        "\n",
285 |        "                                                crew           director  \n",
286 |        "0  [{'name': 'Stephen E. Rivkin', 'gender': 0, 'd...      James Cameron  \n",
287 |        "1  [{'name': 'Dariusz Wolski', 'gender': 2, 'depa...     Gore Verbinski  \n",
288 |        "2  [{'name': 'Thomas Newman', 'gender': 2, 'depar...         Sam Mendes  \n",
289 |        "3  [{'name': 'Hans Zimmer', 'gender': 2, 'departm...  Christopher Nolan  \n",
290 |        "4  [{'name': 'Andrew Stanton', 'gender': 2, 'depa...     Andrew Stanton  \n",
291 |        "\n",
292 |        "[5 rows x 24 columns]"
293 |       ]
294 |      },
295 |      "execution_count": 3,
296 |      "metadata": {},
297 |      "output_type": "execute_result"
298 |     }
299 |    ],
300 |    "source": [
301 |     "df.head()"
302 |    ]
303 |   },
304 |   {
305 |    "cell_type": "code",
306 |    "execution_count": 4,
307 |    "metadata": {
308 |     "scrolled": true
309 |    },
310 |    "outputs": [
311 |     {
312 |      "data": {
313 |       "text/html": [
314 |        "<div>\n",
315 |        "<style scoped>\n",
316 |        "    .dataframe tbody tr th:only-of-type {\n",
317 |        "        vertical-align: middle;\n",
318 |        "    }\n",
319 |        "\n",
320 |        "    .dataframe tbody tr th {\n",
321 |        "        vertical-align: top;\n",
322 |        "    }\n",
323 |        "\n",
324 |        "    .dataframe thead th {\n",
325 |        "        text-align: right;\n",
326 |        "    }\n",
327 |        "</style>\n",
328 |        "<table border=\"1\" class=\"dataframe\">\n",
329 |        "  <thead>\n",
330 |        "    <tr style=\"text-align: right;\">\n",
331 |        "      <th></th>\n",
332 |        "      <th>index</th>\n",
333 |        "      <th>budget</th>\n",
334 |        "      <th>id</th>\n",
335 |        "      <th>popularity</th>\n",
336 |        "      <th>revenue</th>\n",
337 |        "      <th>runtime</th>\n",
338 |        "      <th>vote_average</th>\n",
339 |        "      <th>vote_count</th>\n",
340 |        "    </tr>\n",
341 |        "  </thead>\n",
342 |        "  <tbody>\n",
343 |        "    <tr>\n",
344 |        "      <td>count</td>\n",
345 |        "      <td>4803.000000</td>\n",
346 |        "      <td>4.803000e+03</td>\n",
347 |        "      <td>4803.000000</td>\n",
348 |        "      <td>4803.000000</td>\n",
349 |        "      <td>4.803000e+03</td>\n",
350 |        "      <td>4801.000000</td>\n",
351 |        "      <td>4803.000000</td>\n",
352 |        "      <td>4803.000000</td>\n",
353 |        "    </tr>\n",
354 |        "    <tr>\n",
355 |        "      <td>mean</td>\n",
356 |        "      <td>2401.000000</td>\n",
357 |        "      <td>2.904504e+07</td>\n",
358 |        "      <td>57165.484281</td>\n",
359 |        "      <td>21.492301</td>\n",
360 |        "      <td>8.226064e+07</td>\n",
361 |        "      <td>106.875859</td>\n",
362 |        "      <td>6.092172</td>\n",
363 |        "      <td>690.217989</td>\n",
364 |        "    </tr>\n",
365 |        "    <tr>\n",
366 |        "      <td>std</td>\n",
367 |        "      <td>1386.651002</td>\n",
368 |        "      <td>4.072239e+07</td>\n",
369 |        "      <td>88694.614033</td>\n",
370 |        "      <td>31.816650</td>\n",
371 |        "      <td>1.628571e+08</td>\n",
372 |        "      <td>22.611935</td>\n",
373 |        "      <td>1.194612</td>\n",
374 |        "      <td>1234.585891</td>\n",
375 |        "    </tr>\n",
376 |        "    <tr>\n",
377 |        "      <td>min</td>\n",
378 |        "      <td>0.000000</td>\n",
379 |        "      <td>0.000000e+00</td>\n",
380 |        "      <td>5.000000</td>\n",
381 |        "      <td>0.000000</td>\n",
382 |        "      <td>0.000000e+00</td>\n",
383 |        "      <td>0.000000</td>\n",
384 |        "      <td>0.000000</td>\n",
385 |        "      <td>0.000000</td>\n",
386 |        "    </tr>\n",
387 |        "    <tr>\n",
388 |        "      <td>25%</td>\n",
389 |        "      <td>1200.500000</td>\n",
390 |        "      <td>7.900000e+05</td>\n",
391 |        "      <td>9014.500000</td>\n",
392 |        "      <td>4.668070</td>\n",
393 |        "      <td>0.000000e+00</td>\n",
394 |        "      <td>94.000000</td>\n",
395 |        "      <td>5.600000</td>\n",
396 |        "      <td>54.000000</td>\n",
397 |        "    </tr>\n",
398 |        "    <tr>\n",
399 |        "      <td>50%</td>\n",
400 |        "      <td>2401.000000</td>\n",
401 |        "      <td>1.500000e+07</td>\n",
402 |        "      <td>14629.000000</td>\n",
403 |        "      <td>12.921594</td>\n",
404 |        "      <td>1.917000e+07</td>\n",
405 |        "      <td>103.000000</td>\n",
406 |        "      <td>6.200000</td>\n",
407 |        "      <td>235.000000</td>\n",
408 |        "    </tr>\n",
409 |        "    <tr>\n",
410 |        "      <td>75%</td>\n",
411 |        "      <td>3601.500000</td>\n",
412 |        "      <td>4.000000e+07</td>\n",
413 |        "      <td>58610.500000</td>\n",
414 |        "      <td>28.313505</td>\n",
415 |        "      <td>9.291719e+07</td>\n",
416 |        "      <td>118.000000</td>\n",
417 |        "      <td>6.800000</td>\n",
418 |        "      <td>737.000000</td>\n",
419 |        "    </tr>\n",
420 |        "    <tr>\n",
421 |        "      <td>max</td>\n",
422 |        "      <td>4802.000000</td>\n",
423 |        "      <td>3.800000e+08</td>\n",
424 |        "      <td>459488.000000</td>\n",
425 |        "      <td>875.581305</td>\n",
426 |        "      <td>2.787965e+09</td>\n",
427 |        "      <td>338.000000</td>\n",
428 |        "      <td>10.000000</td>\n",
429 |        "      <td>13752.000000</td>\n",
430 |        "    </tr>\n",
431 |        "  </tbody>\n",
432 |        "</table>\n",
433 |        "</div>"
434 |       ],
435 |       "text/plain": [
436 |        "             index        budget             id   popularity       revenue  \\\n",
437 |        "count  4803.000000  4.803000e+03    4803.000000  4803.000000  4.803000e+03   \n",
438 |        "mean   2401.000000  2.904504e+07   57165.484281    21.492301  8.226064e+07   \n",
439 |        "std    1386.651002  4.072239e+07   88694.614033    31.816650  1.628571e+08   \n",
440 |        "min       0.000000  0.000000e+00       5.000000     0.000000  0.000000e+00   \n",
441 |        "25%    1200.500000  7.900000e+05    9014.500000     4.668070  0.000000e+00   \n",
442 |        "50%    2401.000000  1.500000e+07   14629.000000    12.921594  1.917000e+07   \n",
443 |        "75%    3601.500000  4.000000e+07   58610.500000    28.313505  9.291719e+07   \n",
444 |        "max    4802.000000  3.800000e+08  459488.000000   875.581305  2.787965e+09   \n",
445 |        "\n",
446 |        "           runtime  vote_average    vote_count  \n",
447 |        "count  4801.000000   4803.000000   4803.000000  \n",
448 |        "mean    106.875859      6.092172    690.217989  \n",
449 |        "std      22.611935      1.194612   1234.585891  \n",
450 |        "min       0.000000      0.000000      0.000000  \n",
451 |        "25%      94.000000      5.600000     54.000000  \n",
452 |        "50%     103.000000      6.200000    235.000000  \n",
453 |        "75%     118.000000      6.800000    737.000000  \n",
454 |        "max     338.000000     10.000000  13752.000000  "
455 |       ]
456 |      },
457 |      "execution_count": 4,
458 |      "metadata": {},
459 |      "output_type": "execute_result"
460 |     }
461 |    ],
462 |    "source": [
463 |     "df.describe()"
464 |    ]
465 |   },
466 |   {
467 |    "cell_type": "code",
468 |    "execution_count": 5,
469 |    "metadata": {
470 |     "scrolled": true
471 |    },
472 |    "outputs": [
473 |     {
474 |      "name": "stdout",
475 |      "output_type": "stream",
476 |      "text": [
477 |       "['index' 'budget' 'genres' 'homepage' 'id' 'keywords' 'original_language'\n",
478 |       " 'original_title' 'overview' 'popularity' 'production_companies'\n",
479 |       " 'production_countries' 'release_date' 'revenue' 'runtime'\n",
480 |       " 'spoken_languages' 'status' 'tagline' 'title' 'vote_average' 'vote_count'\n",
481 |       " 'cast' 'crew' 'director']\n"
482 |      ]
483 |     }
484 |    ],
485 |    "source": [
486 |     "print(df.columns.values)"
487 |    ]
488 |   },
489 |   {
490 |    "cell_type": "markdown",
491 |    "metadata": {},
492 |    "source": [
493 |     "On inspecting the dataset, you may have noticed that it holds a lot of extra information about each movie. We don’t need all of it, so we use the keywords, cast, genres, director and title columns as our feature set."
494 |    ]
495 |   },
496 |   {
497 |    "cell_type": "code",
498 |    "execution_count": 6,
499 |    "metadata": {},
500 |    "outputs": [],
501 |    "source": [
502 |     "features = ['genres', 'keywords', 'title', 'cast', 'director']"
503 |    ]
504 |   },
505 |   {
506 |    "cell_type": "markdown",
507 |    "metadata": {},
508 |    "source": [
509 |     "You may have noticed that some columns contain NaN values. These would cause problems when we concatenate the columns as strings later, so we will replace each NaN with an empty string ('')."
510 |    ]
511 |   },
512 |   {
513 |    "cell_type": "code",
514 |    "execution_count": 7,
515 |    "metadata": {},
516 |    "outputs": [
517 |     {
518 |      "data": {
519 |       "text/plain": [
520 |        "True"
521 |       ]
522 |      },
523 |      "execution_count": 7,
524 |      "metadata": {},
525 |      "output_type": "execute_result"
526 |     }
527 |    ],
528 |    "source": [
529 |     "df['cast'].isnull().values.any()"
530 |    ]
531 |   },
532 |   {
533 |    "cell_type": "markdown",
534 |    "metadata": {},
535 |    "source": [
536 |     "Our next task is to create a function that combines the values of these columns into a single string."
537 |    ]
538 |   },
539 |   {
540 |    "cell_type": "code",
541 |    "execution_count": 8,
542 |    "metadata": {},
543 |    "outputs": [],
544 |    "source": [
545 |     "def combine_features(row):\n",
546 |     "    return row['title']+' '+row['genres']+' '+row['director']+' '+row['keywords']+' '+row['cast']"
547 |    ]
548 |   },
549 |   {
550 |    "cell_type": "markdown",
551 |    "metadata": {},
552 |    "source": [
553 |     "Now, we need to call this function on each row of our DataFrame. Before doing that, we need to clean and preprocess the data: we fill every NaN value in the feature columns with an empty string."
554 |    ]
555 |   },
556 |   {
557 |    "cell_type": "code",
558 |    "execution_count": 9,
559 |    "metadata": {},
560 |    "outputs": [],
561 |    "source": [
562 |     "for feature in features:\n",
563 |     "    df[feature] = df[feature].fillna('')"
564 |    ]
565 |   },
566 |   {
567 |    "cell_type": "markdown",
568 |    "metadata": {},
569 |    "source": [
570 |     "Apply the combine_features function to each row of the DataFrame and store the combined string in the \"combined_features\" column."
571 |    ]
572 |   },
573 |   {
574 |    "cell_type": "code",
575 |    "execution_count": 10,
576 |    "metadata": {},
577 |    "outputs": [],
578 |    "source": [
579 |     "df['combined_features'] = df.apply(combine_features, axis = 1)"
580 |    ]
581 |   },
582 |   {
583 |    "cell_type": "code",
584 |    "execution_count": 11,
585 |    "metadata": {},
586 |    "outputs": [
587 |     {
588 |      "name": "stdout",
589 |      "output_type": "stream",
590 |      "text": [
591 |       "Avatar Action Adventure Fantasy Science Fiction James Cameron culture clash future space war space colony society Sam Worthington Zoe Saldana Sigourney Weaver Stephen Lang Michelle Rodriguez\n"
592 |      ]
593 |     }
594 |    ],
595 |    "source": [
596 |     "print(df.loc[0, 'combined_features'])"
597 |    ]
598 |   },
599 |   {
600 |    "cell_type": "markdown",
601 |    "metadata": {},
602 |    "source": [
603 |     "Now that we have the combined strings, we can feed them to a CountVectorizer() object to get the count matrix."
604 |    ]
605 |   },
606 |   {
607 |    "cell_type": "code",
608 |    "execution_count": 12,
609 |    "metadata": {},
610 |    "outputs": [],
611 |    "source": [
612 |     "cv = CountVectorizer()\n",
613 |     "count_matrix = cv.fit_transform(df['combined_features'])"
614 |    ]
615 |   },
616 |   {
617 |    "cell_type": "markdown",
618 |    "metadata": {},
619 |    "source": [
620 |     "Now, we need to obtain the cosine similarity matrix from the count matrix."
621 |    ]
622 |   },
623 |   {
624 |    "cell_type": "code",
625 |    "execution_count": 13,
626 |    "metadata": {},
627 |    "outputs": [],
628 |    "source": [
629 |     "cosine_sim = cosine_similarity(count_matrix)"
630 |    ]
631 |   },
632 |   {
633 |    "cell_type": "markdown",
634 |    "metadata": {},
635 |    "source": [
636 |     "Now, we will define two helper functions to get movie title from movie index and vice-versa."
637 |    ]
638 |   },
639 |   {
640 |    "cell_type": "code",
641 |    "execution_count": 14,
642 |    "metadata": {},
643 |    "outputs": [],
644 |    "source": [
645 |     "def get_title_from_index(index):\n",
646 |     "    return df[df.index == index][\"title\"].values[0]\n",
647 |     "def get_index_from_title(title):\n",
648 |     "    return df[df.title == title][\"index\"].values[0]"
649 |    ]
650 |   },
651 |   {
652 |    "cell_type": "markdown",
653 |    "metadata": {},
654 |    "source": [
655 |     "Our next step is to get the title of the movie that the user currently likes and find the index of that movie. We then access the row of the similarity matrix that corresponds to this movie, which gives us the similarity score of every movie in the dataset with respect to the current one. Finally, we enumerate the scores in that row to build a list of (movie index, similarity score) tuples. This converts a row of similarity scores like [1 0.5 0.2 0.9] into [(0, 1) (1, 0.5) (2, 0.2) (3, 0.9)]."
656 |    ]
657 |   },
658 |   {
659 |    "cell_type": "code",
660 |    "execution_count": 15,
661 |    "metadata": {},
662 |    "outputs": [],
663 |    "source": [
664 |     "movie_user_likes = \"Star Trek Beyond\"\n",
665 |     "movie_index = get_index_from_title(movie_user_likes)\n",
666 |     "similar_movies = list(enumerate(cosine_sim[movie_index])) #accessing the row corresponding to given movie to find all the similarity scores for that movie and then enumerating over it"
667 |    ]
668 |   },
669 |   {
670 |    "cell_type": "markdown",
671 |    "metadata": {},
672 |    "source": [
673 |     "We will sort the list similar_movies according to similarity scores in descending order. Since the most similar movie to a given movie will be itself, we will discard the first element after sorting the movies."
674 |    ]
675 |   },
676 |   {
677 |    "cell_type": "code",
678 |    "execution_count": 16,
679 |    "metadata": {},
680 |    "outputs": [],
681 |    "source": [
682 |     "sorted_similar_movies = sorted(similar_movies,key=lambda x:x[1],reverse=True)[1:]"
683 |    ]
684 |   },
685 |   {
686 |    "cell_type": "markdown",
687 |    "metadata": {},
688 |    "source": [
689 |     "Then we loop over sorted_similar_movies and print its first 10 entries."
690 |    ]
691 |   },
692 |   {
693 |    "cell_type": "code",
694 |    "execution_count": 17,
695 |    "metadata": {},
696 |    "outputs": [
697 |     {
698 |      "name": "stdout",
699 |      "output_type": "stream",
700 |      "text": [
701 |       "Top 10 similar movies to Star Trek Beyond are:\n",
702 |       "\n",
703 |       "Star Trek Into Darkness\n",
704 |       "Star Trek\n",
705 |       "Guardians of the Galaxy\n",
706 |       "Avatar\n",
707 |       "Star Trek: Insurrection\n",
708 |       "Star Wars: Episode III - Revenge of the Sith\n",
709 |       "Avengers: Age of Ultron\n",
710 |       "Star Wars: Clone Wars: Volume 1\n",
711 |       "Star Trek: Nemesis\n",
712 |       "Mad Max Beyond Thunderdome\n"
714 |      ]
715 |     }
716 |    ],
717 |    "source": [
718 |     "print(\"Top 10 similar movies to \" + movie_user_likes + \" are:\\n\")\n",
719 |     "# sorted_similar_movies already excludes the movie itself, so take the first 10\n",
720 |     "for index, score in sorted_similar_movies[:10]:\n",
721 |     "    print(get_title_from_index(index))"
725 |    ]
726 |   },
727 |   {
728 |    "cell_type": "markdown",
729 |    "metadata": {},
730 |    "source": [
731 |     "##### And here is our Movie Recommendation System"
732 |    ]
733 |   },
734 |   {
735 |    "cell_type": "markdown",
736 |    "metadata": {},
737 |    "source": [
738 |     "After seeing the output, I went one step further to compare it to other recommendation engines.\n",
739 |     "\n",
740 |     "So I searched Google for movies similar to “Star Trek Beyond”, and here is what I got:"
741 |    ]
742 |   },
743 |   {
744 |    "cell_type": "markdown",
745 |    "metadata": {},
746 |    "source": [
747 |     "<img src=\"files/test.PNG\">"
748 |    ]
749 |   },
750 |   {
751 |    "cell_type": "code",
752 |    "execution_count": null,
753 |    "metadata": {},
754 |    "outputs": [],
755 |    "source": []
756 |   }
757 |  ],
758 |  "metadata": {
759 |   "kernelspec": {
760 |    "display_name": "Python 3",
761 |    "language": "python",
762 |    "name": "python3"
763 |   },
764 |   "language_info": {
765 |    "codemirror_mode": {
766 |     "name": "ipython",
767 |     "version": 3
768 |    },
769 |    "file_extension": ".py",
770 |    "mimetype": "text/x-python",
771 |    "name": "python",
772 |    "nbconvert_exporter": "python",
773 |    "pygments_lexer": "ipython3",
774 |    "version": "3.7.4"
775 |   }
776 |  },
777 |  "nbformat": 4,
778 |  "nbformat_minor": 2
779 | }
780 | 
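
The ranking steps in the notebook above (look up the movie's index, enumerate its row of the similarity matrix, sort by score in descending order, drop the movie itself, keep the top N) can be collected into one function. The following is a minimal sketch rather than the notebook's own code: the name `recommend`, the `top_n` parameter, and the toy 4x4 similarity matrix and titles are all made up for illustration, and plain Python lists stand in for the DataFrame and `cosine_sim` array.

```python
def recommend(title, titles, cosine_sim, top_n=10):
    """Return up to top_n titles most similar to `title`, excluding itself."""
    movie_index = titles.index(title)  # raises ValueError if the title is unknown

    # Pair every similarity score in this movie's row with its movie index.
    scores = list(enumerate(cosine_sim[movie_index]))

    # Drop the query movie by index rather than by position, so a tie at
    # score 1.0 with another movie cannot cause the wrong entry to be removed.
    scores = [(i, s) for i, s in scores if i != movie_index]

    # Highest similarity first, then keep only the first top_n entries.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [titles[i] for i, _ in scores[:top_n]]


# Toy data, invented for illustration only (not taken from movie_dataset.csv).
titles = ["A", "B", "C", "D"]
cosine_sim = [
    [1.0, 0.5, 0.2, 0.9],
    [0.5, 1.0, 0.1, 0.3],
    [0.2, 0.1, 1.0, 0.4],
    [0.9, 0.3, 0.4, 1.0],
]

print(recommend("A", titles, cosine_sim, top_n=2))  # ['D', 'B']
```

Slicing with `[:top_n]` avoids a manual counter, which makes it harder to print one entry too many.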


--------------------------------------------------------------------------------
/test.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sanchitbhasin/Content-Based-Movie-Recommendation-System/bb560d61e663d4876593d106b45ad141835abaa9/test.PNG


--------------------------------------------------------------------------------
	index	budget	genres	homepage	id	keywords	original_language	original_title	overview	popularity	...	runtime	spoken_languages	status	tagline	title	vote_average	vote_count	cast	crew	director
0	0	237000000	Action Adventure Fantasy Science Fiction	http://www.avatarmovie.com/	19995	culture clash future space war space colony so...	en	Avatar	In the 22nd century, a paraplegic Marine is di...	150.437577	...	162.0	[{\"iso_639_1\": \"en\", \"name\": \"English\"}, {\"iso...	Released	Enter the World of Pandora.	Avatar	7.2	11800	Sam Worthington Zoe Saldana Sigourney Weaver S...	[{'name': 'Stephen E. Rivkin', 'gender': 0, 'd...	James Cameron
1	1	300000000	Adventure Fantasy Action	http://disney.go.com/disneypictures/pirates/	285	ocean drug abuse exotic island east india trad...	en	Pirates of the Caribbean: At World's End	Captain Barbossa, long believed to be dead, ha...	139.082615	...	169.0	[{\"iso_639_1\": \"en\", \"name\": \"English\"}]	Released	At the end of the world, the adventure begins.	Pirates of the Caribbean: At World's End	6.9	4500	Johnny Depp Orlando Bloom Keira Knightley Stel...	[{'name': 'Dariusz Wolski', 'gender': 2, 'depa...	Gore Verbinski
2	2	245000000	Action Adventure Crime	http://www.sonypictures.com/movies/spectre/	206647	spy based on novel secret agent sequel mi6	en	Spectre	A cryptic message from Bond’s past sends him o...	107.376788	...	148.0	[{\"iso_639_1\": \"fr\", \"name\": \"Fran\\u00e7ais\"},...	Released	A Plan No One Escapes	Spectre	6.3	4466	Daniel Craig Christoph Waltz L\\u00e9a Seydoux ...	[{'name': 'Thomas Newman', 'gender': 2, 'depar...	Sam Mendes
3	3	250000000	Action Crime Drama Thriller	http://www.thedarkknightrises.com/	49026	dc comics crime fighter terrorist secret ident...	en	The Dark Knight Rises	Following the death of District Attorney Harve...	112.312950	...	165.0	[{\"iso_639_1\": \"en\", \"name\": \"English\"}]	Released	The Legend Ends	The Dark Knight Rises	7.6	9106	Christian Bale Michael Caine Gary Oldman Anne ...	[{'name': 'Hans Zimmer', 'gender': 2, 'departm...	Christopher Nolan
4	4	260000000	Action Adventure Science Fiction	http://movies.disney.com/john-carter	49529	based on novel mars medallion space travel pri...	en	John Carter	John Carter is a war-weary, former military ca...	43.926995	...	132.0	[{\"iso_639_1\": \"en\", \"name\": \"English\"}]	Released	Lost in our world, found in another.	John Carter	6.1	2124	Taylor Kitsch Lynn Collins Samantha Morton Wil...	[{'name': 'Andrew Stanton', 'gender': 2, 'depa...	Andrew Stanton
	index	budget	id	popularity	revenue	runtime	vote_average	vote_count
count	4803.000000	4.803000e+03	4803.000000	4803.000000	4.803000e+03	4801.000000	4803.000000	4803.000000
mean	2401.000000	2.904504e+07	57165.484281	21.492301	8.226064e+07	106.875859	6.092172	690.217989
std	1386.651002	4.072239e+07	88694.614033	31.816650	1.628571e+08	22.611935	1.194612	1234.585891
min	0.000000	0.000000e+00	5.000000	0.000000	0.000000e+00	0.000000	0.000000	0.000000
25%	1200.500000	7.900000e+05	9014.500000	4.668070	0.000000e+00	94.000000	5.600000	54.000000
50%	2401.000000	1.500000e+07	14629.000000	12.921594	1.917000e+07	103.000000	6.200000	235.000000
75%	3601.500000	4.000000e+07	58610.500000	28.313505	9.291719e+07	118.000000	6.800000	737.000000
max	4802.000000	3.800000e+08	459488.000000	875.581305	2.787965e+09	338.000000	10.000000	13752.000000