├── README.md
├── movie recommendation system.ipynb
├── movie_dataset.csv
└── test.PNG


/README.md:
--------------------------------------------------------------------------------
1 | # Content-Based-Movie-Recommendation-System
2 | A content-based movie recommendation system built with machine learning
3 | 
4 | Ever wondered how Google comes up with movies similar to the ones you like? After reading this post, you will be able to build one such recommendation system yourself.
5 | 
6 | It turns out that there are (mostly) three ways to build a recommendation engine:
7 | 
8 | 1. Popularity based recommendation engine
9 | 2. Content based recommendation engine
10 | 3. Collaborative filtering based recommendation engine
11 | 
12 | Now you might be thinking, “That’s interesting, but what are the differences between these recommendation engines?” Let me help you out with that.
13 | 
14 | ### Popularity based recommendation engine:
15 | 
16 | This is perhaps the simplest kind of recommendation engine you will come across. The trending lists you see on YouTube or Netflix work along these lines. The engine keeps track of the view count for each movie/video and then lists the movies by view count in descending order (highest to lowest). Pretty simple, but effective, right?
17 | 
18 | ### Content based recommendation engine:
19 | 
20 | This type of recommendation system takes a movie that the user currently likes as input. It then analyzes the contents (storyline, genre, cast, director, etc.) of that movie to find other movies with similar content, ranks them by similarity score, and recommends the most relevant ones to the user.
21 | 
22 | ### Collaborative filtering based recommendation engine:
23 | 
24 | This algorithm first tries to find similar users based on their activities and preferences (for example, both users watch the same type of movies or movies directed by the same director).
Now, between two such users (say, A and B), if user A has seen a movie that user B has not seen yet, that movie gets recommended to user B, and vice versa. In other words, the recommendations are filtered based on the collaboration between similar users’ preferences (hence the name “Collaborative Filtering”). One typical application of this algorithm can be seen on the Amazon e-commerce platform, where you get to see the “Customers who viewed this item also viewed” and “Customers who bought this item also bought” lists.
25 | 
26 | Here, we are going to implement a content based recommendation system using the scikit-learn library.
27 | Enjoy!!!
28 | 
29 | 
30 | Thanks - CodeHeroku
31 | 


--------------------------------------------------------------------------------
/movie recommendation system.ipynb:
--------------------------------------------------------------------------------
1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "#### Getting Started: Loading Libraries" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "import pandas as pd\n", 17 | "import numpy as np\n", 18 | "from sklearn.feature_extraction.text import CountVectorizer\n", 19 | "from sklearn.metrics.pairwise import cosine_similarity" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "metadata": {}, 25 | "source": [ 26 | "#### Loading the Dataset\n", 27 | "Load the movie dataset (from Kaggle's The Movies Dataset) into a pandas DataFrame." 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 2, 33 | "metadata": {}, 34 | "outputs": [], 35 | "source": [ 36 | "df = pd.read_csv(\"movie_dataset.csv\")" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "We have our dataframe ready, so let's take a look at it." 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": 3, 49 | "metadata": {}, 50 | "outputs": [ 
51 | { 52 | "data": { 53 | "text/html": [ 54 | "
" 220 | ], 221 | "text/plain": [ 222 | " index budget genres \\\n", 223 | "0 0 237000000 Action Adventure Fantasy Science Fiction \n", 224 | "1 1 300000000 Adventure Fantasy Action \n", 225 | "2 2 245000000 Action Adventure Crime \n", 226 | "3 3 250000000 Action Crime Drama Thriller \n", 227 | "4 4 260000000 Action Adventure Science Fiction \n", 228 | "\n", 229 | " homepage id \\\n", 230 | "0 http://www.avatarmovie.com/ 19995 \n", 231 | "1 http://disney.go.com/disneypictures/pirates/ 285 \n", 232 | "2 http://www.sonypictures.com/movies/spectre/ 206647 \n", 233 | "3 http://www.thedarkknightrises.com/ 49026 \n", 234 | "4 http://movies.disney.com/john-carter 49529 \n", 235 | "\n", 236 | " keywords original_language \\\n", 237 | "0 culture clash future space war space colony so... en \n", 238 | "1 ocean drug abuse exotic island east india trad... en \n", 239 | "2 spy based on novel secret agent sequel mi6 en \n", 240 | "3 dc comics crime fighter terrorist secret ident... en \n", 241 | "4 based on novel mars medallion space travel pri... en \n", 242 | "\n", 243 | " original_title \\\n", 244 | "0 Avatar \n", 245 | "1 Pirates of the Caribbean: At World's End \n", 246 | "2 Spectre \n", 247 | "3 The Dark Knight Rises \n", 248 | "4 John Carter \n", 249 | "\n", 250 | " overview popularity ... runtime \\\n", 251 | "0 In the 22nd century, a paraplegic Marine is di... 150.437577 ... 162.0 \n", 252 | "1 Captain Barbossa, long believed to be dead, ha... 139.082615 ... 169.0 \n", 253 | "2 A cryptic message from Bond’s past sends him o... 107.376788 ... 148.0 \n", 254 | "3 Following the death of District Attorney Harve... 112.312950 ... 165.0 \n", 255 | "4 John Carter is a war-weary, former military ca... 43.926995 ... 132.0 \n", 256 | "\n", 257 | " spoken_languages status \\\n", 258 | "0 [{\"iso_639_1\": \"en\", \"name\": \"English\"}, {\"iso... 
Released \n", 259 | "1 [{\"iso_639_1\": \"en\", \"name\": \"English\"}] Released \n", 260 | "2 [{\"iso_639_1\": \"fr\", \"name\": \"Fran\\u00e7ais\"},... Released \n", 261 | "3 [{\"iso_639_1\": \"en\", \"name\": \"English\"}] Released \n", 262 | "4 [{\"iso_639_1\": \"en\", \"name\": \"English\"}] Released \n", 263 | "\n", 264 | " tagline \\\n", 265 | "0 Enter the World of Pandora. \n", 266 | "1 At the end of the world, the adventure begins. \n", 267 | "2 A Plan No One Escapes \n", 268 | "3 The Legend Ends \n", 269 | "4 Lost in our world, found in another. \n", 270 | "\n", 271 | " title vote_average vote_count \\\n", 272 | "0 Avatar 7.2 11800 \n", 273 | "1 Pirates of the Caribbean: At World's End 6.9 4500 \n", 274 | "2 Spectre 6.3 4466 \n", 275 | "3 The Dark Knight Rises 7.6 9106 \n", 276 | "4 John Carter 6.1 2124 \n", 277 | "\n", 278 | " cast \\\n", 279 | "0 Sam Worthington Zoe Saldana Sigourney Weaver S... \n", 280 | "1 Johnny Depp Orlando Bloom Keira Knightley Stel... \n", 281 | "2 Daniel Craig Christoph Waltz L\\u00e9a Seydoux ... \n", 282 | "3 Christian Bale Michael Caine Gary Oldman Anne ... \n", 283 | "4 Taylor Kitsch Lynn Collins Samantha Morton Wil... \n", 284 | "\n", 285 | " crew director \n", 286 | "0 [{'name': 'Stephen E. Rivkin', 'gender': 0, 'd... James Cameron \n", 287 | "1 [{'name': 'Dariusz Wolski', 'gender': 2, 'depa... Gore Verbinski \n", 288 | "2 [{'name': 'Thomas Newman', 'gender': 2, 'depar... Sam Mendes \n", 289 | "3 [{'name': 'Hans Zimmer', 'gender': 2, 'departm... Christopher Nolan \n", 290 | "4 [{'name': 'Andrew Stanton', 'gender': 2, 'depa... 
Andrew Stanton \n", 291 | "\n", 292 | "[5 rows x 24 columns]" 293 | ] 294 | }, 295 | "execution_count": 3, 296 | "metadata": {}, 297 | "output_type": "execute_result" 298 | } 299 | ], 300 | "source": [ 301 | "df.head()" 302 | ] 303 | }, 304 | { 305 | "cell_type": "code", 306 | "execution_count": 4, 307 | "metadata": { 308 | "scrolled": true 309 | }, 310 | "outputs": [ 311 | { 312 | "data": { 313 | "text/html": [ 314 | "
" 434 | ], 435 | "text/plain": [ 436 | " index budget id popularity revenue \\\n", 437 | "count 4803.000000 4.803000e+03 4803.000000 4803.000000 4.803000e+03 \n", 438 | "mean 2401.000000 2.904504e+07 57165.484281 21.492301 8.226064e+07 \n", 439 | "std 1386.651002 4.072239e+07 88694.614033 31.816650 1.628571e+08 \n", 440 | "min 0.000000 0.000000e+00 5.000000 0.000000 0.000000e+00 \n", 441 | "25% 1200.500000 7.900000e+05 9014.500000 4.668070 0.000000e+00 \n", 442 | "50% 2401.000000 1.500000e+07 14629.000000 12.921594 1.917000e+07 \n", 443 | "75% 3601.500000 4.000000e+07 58610.500000 28.313505 9.291719e+07 \n", 444 | "max 4802.000000 3.800000e+08 459488.000000 875.581305 2.787965e+09 \n", 445 | "\n", 446 | " runtime vote_average vote_count \n", 447 | "count 4801.000000 4803.000000 4803.000000 \n", 448 | "mean 106.875859 6.092172 690.217989 \n", 449 | "std 22.611935 1.194612 1234.585891 \n", 450 | "min 0.000000 0.000000 0.000000 \n", 451 | "25% 94.000000 5.600000 54.000000 \n", 452 | "50% 103.000000 6.200000 235.000000 \n", 453 | "75% 118.000000 6.800000 737.000000 \n", 454 | "max 338.000000 10.000000 13752.000000 " 455 | ] 456 | }, 457 | "execution_count": 4, 458 | "metadata": {}, 459 | "output_type": "execute_result" 460 | } 461 | ], 462 | "source": [ 463 | "df.describe()" 464 | ] 465 | }, 466 | { 467 | "cell_type": "code", 468 | "execution_count": 5, 469 | "metadata": { 470 | "scrolled": true 471 | }, 472 | "outputs": [ 473 | { 474 | "name": "stdout", 475 | "output_type": "stream", 476 | "text": [ 477 | "['index' 'budget' 'genres' 'homepage' 'id' 'keywords' 'original_language'\n", 478 | " 'original_title' 'overview' 'popularity' 'production_companies'\n", 479 | " 'production_countries' 'release_date' 'revenue' 'runtime'\n", 480 | " 'spoken_languages' 'status' 'tagline' 'title' 'vote_average' 'vote_count'\n", 481 | " 'cast' 'crew' 'director']\n" 482 | ] 483 | } 484 | ], 485 | "source": [ 486 | "print(df.columns.values)" 487 | ] 488 | }, 489 | { 490 | "cell_type": 
"markdown", 491 | "metadata": {}, 492 | "source": [ 493 | "Onvisualizing the dataset, you may have noticed that it has many extra info about a movie. We don’t need all of them. So, we choose keywords, cast, genres, director and title column to use as our feature set." 494 | ] 495 | }, 496 | { 497 | "cell_type": "code", 498 | "execution_count": 6, 499 | "metadata": {}, 500 | "outputs": [], 501 | "source": [ 502 | "features = ['genres', 'keywords', 'title', 'cast', 'director']" 503 | ] 504 | }, 505 | { 506 | "cell_type": "markdown", 507 | "metadata": {}, 508 | "source": [ 509 | "As you may can noticed that some columns have NaN data points that will create a problem for us, so what we will do is instead of NaN values we will replace it with empty string ('')." 510 | ] 511 | }, 512 | { 513 | "cell_type": "code", 514 | "execution_count": 7, 515 | "metadata": {}, 516 | "outputs": [ 517 | { 518 | "data": { 519 | "text/plain": [ 520 | "True" 521 | ] 522 | }, 523 | "execution_count": 7, 524 | "metadata": {}, 525 | "output_type": "execute_result" 526 | } 527 | ], 528 | "source": [ 529 | "df['cast'].isnull().values.any()" 530 | ] 531 | }, 532 | { 533 | "cell_type": "markdown", 534 | "metadata": {}, 535 | "source": [ 536 | "Our next task is to create a function for combining the values of these columns into a single string" 537 | ] 538 | }, 539 | { 540 | "cell_type": "code", 541 | "execution_count": 8, 542 | "metadata": {}, 543 | "outputs": [], 544 | "source": [ 545 | "def combine_features(row):\n", 546 | " return row['title']+' '+row['genres']+' '+row['director']+' '+row['keywords']+' '+row['cast']" 547 | ] 548 | }, 549 | { 550 | "cell_type": "markdown", 551 | "metadata": {}, 552 | "source": [ 553 | "Now, we need to call this function over each row of our dataframe. But, before doing that, we need to clean and preprocess the data for our use. 
We will fill all the NaN values in the feature columns with empty strings." 554 | ] 555 | }, 556 | { 557 | "cell_type": "code", 558 | "execution_count": 9, 559 | "metadata": {}, 560 | "outputs": [], 561 | "source": [ 562 | "for feature in features:\n", 563 | "    df[feature] = df[feature].fillna('')" 564 | ] 565 | }, 566 | { 567 | "cell_type": "markdown", 568 | "metadata": {}, 569 | "source": [ 570 | "Apply the combine_features function to each row of the DataFrame and store the combined string in the \"combined_features\" column." 571 | ] 572 | }, 573 | { 574 | "cell_type": "code", 575 | "execution_count": 10, 576 | "metadata": {}, 577 | "outputs": [], 578 | "source": [ 579 | "df['combined_features'] = df.apply(combine_features, axis=1)" 580 | ] 581 | }, 582 | { 583 | "cell_type": "code", 584 | "execution_count": 11, 585 | "metadata": {}, 586 | "outputs": [ 587 | { 588 | "name": "stdout", 589 | "output_type": "stream", 590 | "text": [ 591 | "Avatar Action Adventure Fantasy Science Fiction James Cameron culture clash future space war space colony society Sam Worthington Zoe Saldana Sigourney Weaver Stephen Lang Michelle Rodriguez\n" 592 | ] 593 | } 594 | ], 595 | "source": [ 596 | "print(df.loc[0, 'combined_features'])" 597 | ] 598 | }, 599 | { 600 | "cell_type": "markdown", 601 | "metadata": {}, 602 | "source": [ 603 | "Now that we have the combined strings, we can feed them to a CountVectorizer() object to get the count matrix, in which each row is a movie and each column counts how often one word occurs in that movie's combined string." 604 | ] 605 | }, 606 | { 607 | "cell_type": "code", 608 | "execution_count": 12, 609 | "metadata": {}, 610 | "outputs": [], 611 | "source": [ 612 | "cv = CountVectorizer()\n", 613 | "count_matrix = cv.fit_transform(df['combined_features'])" 614 | ] 615 | }, 616 | { 617 | "cell_type": "markdown", 618 | "metadata": {}, 619 | "source": [ 620 | "Now, we need to obtain the cosine similarity matrix from the count matrix. Entry (i, j) of this matrix is the cosine similarity between movie i and movie j."
621 | ] 622 | }, 623 | { 624 | "cell_type": "code", 625 | "execution_count": 13, 626 | "metadata": {}, 627 | "outputs": [], 628 | "source": [ 629 | "cosine_sim = cosine_similarity(count_matrix)" 630 | ] 631 | }, 632 | { 633 | "cell_type": "markdown", 634 | "metadata": {}, 635 | "source": [ 636 | "Now, we will define two helper functions to get a movie's title from its index and vice versa." 637 | ] 638 | }, 639 | { 640 | "cell_type": "code", 641 | "execution_count": 14, 642 | "metadata": {}, 643 | "outputs": [], 644 | "source": [ 645 | "def get_title_from_index(index):\n", 646 | "    return df[df.index == index][\"title\"].values[0]\n", 647 | "def get_index_from_title(title):\n", 648 | "    return df[df.title == title][\"index\"].values[0]" 649 | ] 650 | }, 651 | { 652 | "cell_type": "markdown", 653 | "metadata": {}, 654 | "source": [ 655 | "Our next step is to get the title of the movie that the user currently likes and find its index. We then access the row of the similarity matrix that corresponds to this movie, which holds its similarity scores against every movie in the dataset (including itself). Enumerating that row pairs each score with its movie index, turning a row of similarity scores such as [1, 0.5, 0.2, 0.9] into [(0, 1), (1, 0.5), (2, 0.2), (3, 0.9)]. 
Here, each item has the form (movie index, similarity score)." 656 | ] 657 | }, 658 | { 659 | "cell_type": "code", 660 | "execution_count": 15, 661 | "metadata": {}, 662 | "outputs": [], 663 | "source": [ 664 | "movie_user_likes = \"Star Trek Beyond\"\n", 665 | "movie_index = get_index_from_title(movie_user_likes)\n", 666 | "similar_movies = list(enumerate(cosine_sim[movie_index]))  # (index, score) pairs taken from this movie's row of the similarity matrix" 667 | ] 668 | }, 669 | { 670 | "cell_type": "markdown", 671 | "metadata": {}, 672 | "source": [ 673 | "We will sort the similar_movies list by similarity score in descending order. Since the movie most similar to a given movie is the movie itself, we discard the first element after sorting." 674 | ] 675 | }, 676 | { 677 | "cell_type": "code", 678 | "execution_count": 16, 679 | "metadata": {}, 680 | "outputs": [], 681 | "source": [ 682 | "sorted_similar_movies = sorted(similar_movies, key=lambda x: x[1], reverse=True)[1:]" 683 | ] 684 | }, 685 | { 686 | "cell_type": "markdown", 687 | "metadata": {}, 688 | "source": [ 689 | "Then, we will run a loop to print the first 10 entries of the sorted_similar_movies list."
690 | ] 691 | }, 692 | { 693 | "cell_type": "code", 694 | "execution_count": 17, 695 | "metadata": {}, 696 | "outputs": [ 697 | { 698 | "name": "stdout", 699 | "output_type": "stream", 700 | "text": [ 701 | "Top 10 similar movies to Star Trek Beyond are:\n", 702 | "\n", 703 | "Star Trek Into Darkness\n", 704 | "Star Trek\n", 705 | "Guardians of the Galaxy\n", 706 | "Avatar\n", 707 | "Star Trek: Insurrection\n", 708 | "Star Wars: Episode III - Revenge of the Sith\n", 709 | "Avengers: Age of Ultron\n", 710 | "Star Wars: Clone Wars: Volume 1\n", 711 | "Star Trek: Nemesis\n", 712 | "Mad Max Beyond Thunderdome\n" 714 | ] 715 | } 716 | ], 717 | "source": [ 718 | "i=0\n", 719 | "print(\"Top 10 similar movies to \"+movie_user_likes+\" are:\\n\")\n", 720 | "for element in sorted_similar_movies:\n", 721 | "    print(get_title_from_index(element[0]))\n", 722 | "    i=i+1\n", 723 | "    if i>=10:  # stop after exactly 10 titles\n", 724 | "        break" 725 | ] 726 | }, 727 | { 728 | "cell_type": "markdown", 729 | "metadata": {}, 730 | "source": [ 731 | "##### And here is our Movie Recommendation System" 732 | ] 733 | }, 734 | { 735 | "cell_type": "markdown", 736 | "metadata": {}, 737 | "source": [ 738 | "After seeing the output, I went one step further to compare it to other recommendation engines.\n", 739 | "\n", 740 | "So, I searched Google for movies similar to “Star Trek Beyond” and here is what I got:" 741 | ] 742 | }, 743 | { 744 | "cell_type": "markdown", 745 | "metadata": {}, 746 | "source": [ 747 | "" 748 | ] 749 | }, 750 | { 751 | "cell_type": "code", 752 | "execution_count": null, 753 | "metadata": {}, 754 | "outputs": [], 755 | "source": [] 756 | } 757 | ], 758 | "metadata": { 759 | "kernelspec": { 760 | "display_name": "Python 3", 761 | "language": "python", 762 | "name": "python3" 763 | }, 764 | "language_info": { 765 | "codemirror_mode": { 766 | "name": "ipython", 767 | "version": 3 768 | }, 769 | "file_extension": ".py", 770 | "mimetype": "text/x-python", 771 | "name": 
"python", 772 | "nbconvert_exporter": "python", 773 | "pygments_lexer": "ipython3", 774 | "version": "3.7.4" 775 | } 776 | }, 777 | "nbformat": 4, 778 | "nbformat_minor": 2 779 | } 780 | -------------------------------------------------------------------------------- /test.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/sanchitbhasin/Content-Based-Movie-Recommendation-System/bb560d61e663d4876593d106b45ad141835abaa9/test.PNG --------------------------------------------------------------------------------