├── README.md
├── movie recommendation system.ipynb
├── movie_dataset.csv
└── test.PNG


/README.md:
--------------------------------------------------------------------------------
# Content-Based-Movie-Recommendation-System
Content-based movie recommendation system using machine learning

Have you wondered how Google comes up with movies that are similar to the ones you like? After reading this post, you will be able to build one such recommendation system yourself.

There are (mostly) three ways to build a recommendation engine:

1. Popularity based recommendation engine
2. Content based recommendation engine
3. Collaborative filtering based recommendation engine

Now you might be thinking, “That’s interesting. But what are the differences between these recommendation engines?” Let me help you out with that.

### Popularity based recommendation engine:

This is perhaps the simplest kind of recommendation engine that you will come across. The trending list you see on YouTube or Netflix is based on this algorithm. It keeps track of the view count for each movie/video and then lists the movies by views in descending order (highest view count to lowest view count). Pretty simple but effective, right?

### Content based recommendation engine:

This type of recommendation system takes a movie that a user currently likes as input. It then analyzes the contents of the movie (storyline, genre, cast, director etc.) to find other movies with similar content. Finally, it ranks the similar movies by their similarity scores and recommends the most relevant ones to the user.

### Collaborative filtering based recommendation engine:

This algorithm first tries to find similar users based on their activities and preferences (for example, both users watch the same type of movies or movies directed by the same director). Then, between these users (say, A and B), if user A has seen a movie that user B has not seen yet, that movie gets recommended to user B, and vice versa. In other words, the recommendations are filtered based on the collaboration between similar users’ preferences (hence the name “Collaborative Filtering”). One typical application of this algorithm can be seen on the Amazon e-commerce platform, where you get to see the “Customers who viewed this item also viewed” and “Customers who bought this item also bought” lists.

Here, we are going to implement a content based recommendation system using the scikit-learn library.
Enjoy!!!


Thanks - CodeHeroku

--------------------------------------------------------------------------------
/movie recommendation system.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Getting Started: Loading Libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "from sklearn.feature_extraction.text import CountVectorizer\n",
    "from sklearn.metrics.pairwise import cosine_similarity"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Loading the Dataset\n",
    "Loading the dataset (Kaggle's The Movies Dataset) into a pandas DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_csv(\"movie_dataset.csv\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We have our DataFrame ready, so let's take a look at it"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "
| | index | budget | genres | homepage | id | keywords | original_language | original_title | overview | popularity | ... | runtime | spoken_languages | status | tagline | title | vote_average | vote_count | cast | crew | director |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 237000000 | Action Adventure Fantasy Science Fiction | http://www.avatarmovie.com/ | 19995 | culture clash future space war space colony so... | en | Avatar | In the 22nd century, a paraplegic Marine is di... | 150.437577 | ... | 162.0 | [{"iso_639_1": "en", "name": "English"}, {"iso... | Released | Enter the World of Pandora. | Avatar | 7.2 | 11800 | Sam Worthington Zoe Saldana Sigourney Weaver S... | [{'name': 'Stephen E. Rivkin', 'gender': 0, 'd... | James Cameron |
| 1 | 1 | 300000000 | Adventure Fantasy Action | http://disney.go.com/disneypictures/pirates/ | 285 | ocean drug abuse exotic island east india trad... | en | Pirates of the Caribbean: At World's End | Captain Barbossa, long believed to be dead, ha... | 139.082615 | ... | 169.0 | [{"iso_639_1": "en", "name": "English"}] | Released | At the end of the world, the adventure begins. | Pirates of the Caribbean: At World's End | 6.9 | 4500 | Johnny Depp Orlando Bloom Keira Knightley Stel... | [{'name': 'Dariusz Wolski', 'gender': 2, 'depa... | Gore Verbinski |
| 2 | 2 | 245000000 | Action Adventure Crime | http://www.sonypictures.com/movies/spectre/ | 206647 | spy based on novel secret agent sequel mi6 | en | Spectre | A cryptic message from Bond’s past sends him o... | 107.376788 | ... | 148.0 | [{"iso_639_1": "fr", "name": "Fran\u00e7ais"},... | Released | A Plan No One Escapes | Spectre | 6.3 | 4466 | Daniel Craig Christoph Waltz L\u00e9a Seydoux ... | [{'name': 'Thomas Newman', 'gender': 2, 'depar... | Sam Mendes |
| 3 | 3 | 250000000 | Action Crime Drama Thriller | http://www.thedarkknightrises.com/ | 49026 | dc comics crime fighter terrorist secret ident... | en | The Dark Knight Rises | Following the death of District Attorney Harve... | 112.312950 | ... | 165.0 | [{"iso_639_1": "en", "name": "English"}] | Released | The Legend Ends | The Dark Knight Rises | 7.6 | 9106 | Christian Bale Michael Caine Gary Oldman Anne ... | [{'name': 'Hans Zimmer', 'gender': 2, 'departm... | Christopher Nolan |
| 4 | 4 | 260000000 | Action Adventure Science Fiction | http://movies.disney.com/john-carter | 49529 | based on novel mars medallion space travel pri... | en | John Carter | John Carter is a war-weary, former military ca... | 43.926995 | ... | 132.0 | [{"iso_639_1": "en", "name": "English"}] | Released | Lost in our world, found in another. | John Carter | 6.1 | 2124 | Taylor Kitsch Lynn Collins Samantha Morton Wil... | [{'name': 'Andrew Stanton', 'gender': 2, 'depa... | Andrew Stanton |
5 rows × 24 columns
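
The preview above shows text columns such as `keywords`, `genres`, and `director`, which are the kind of content the README's content based engine compares. The following is only a minimal sketch of that idea, using the `CountVectorizer` and `cosine_similarity` imports from the first cell. The toy rows, the `FEATURES` list, and the helper names `combine_features` and `most_similar` are assumptions made for illustration; they are not taken from `movie_dataset.csv` or from the notebook's later cells.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-in for movie_dataset.csv (made-up rows, illustration only).
toy = pd.DataFrame({
    "title": ["Space War", "Star Colony", "Pirate Gold"],
    "keywords": ["space war future", "space colony future", "ocean pirate island"],
    "genres": ["Action Science Fiction", "Adventure Science Fiction", "Adventure Fantasy"],
    "director": ["A Director", "A Director", "B Director"],
})

# Hypothetical choice of content columns to compare.
FEATURES = ["keywords", "genres", "director"]

def combine_features(row):
    """Join the chosen text columns of one row into a single string."""
    return " ".join(str(row[col]) for col in FEATURES)

toy["combined"] = toy.apply(combine_features, axis=1)

# Bag-of-words counts per movie, then pairwise cosine similarity.
counts = CountVectorizer().fit_transform(toy["combined"])
similarity = cosine_similarity(counts)

def most_similar(title, top_n=2):
    """Return the titles most similar to `title`, best match first."""
    idx = toy.index[toy["title"] == title][0]
    ranked = sorted(enumerate(similarity[idx]), key=lambda pair: pair[1], reverse=True)
    return [toy.loc[i, "title"] for i, _ in ranked if i != idx][:top_n]

print(most_similar("Space War"))
```

On the real dataset, the text columns would probably need missing values filled (for example with empty strings) before being combined, since a missing cell would otherwise contribute the literal text `nan`.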
| | index | budget | id | popularity | revenue | runtime | vote_average | vote_count |
|---|---|---|---|---|---|---|---|---|
| count | 4803.000000 | 4.803000e+03 | 4803.000000 | 4803.000000 | 4.803000e+03 | 4801.000000 | 4803.000000 | 4803.000000 |
| mean | 2401.000000 | 2.904504e+07 | 57165.484281 | 21.492301 | 8.226064e+07 | 106.875859 | 6.092172 | 690.217989 |
| std | 1386.651002 | 4.072239e+07 | 88694.614033 | 31.816650 | 1.628571e+08 | 22.611935 | 1.194612 | 1234.585891 |
| min | 0.000000 | 0.000000e+00 | 5.000000 | 0.000000 | 0.000000e+00 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 1200.500000 | 7.900000e+05 | 9014.500000 | 4.668070 | 0.000000e+00 | 94.000000 | 5.600000 | 54.000000 |
| 50% | 2401.000000 | 1.500000e+07 | 14629.000000 | 12.921594 | 1.917000e+07 | 103.000000 | 6.200000 | 235.000000 |
| 75% | 3601.500000 | 4.000000e+07 | 58610.500000 | 28.313505 | 9.291719e+07 | 118.000000 | 6.800000 | 737.000000 |
| max | 4802.000000 | 3.800000e+08 | 459488.000000 | 875.581305 | 2.787965e+09 | 338.000000 | 10.000000 | 13752.000000 |
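
The numeric summary above includes `popularity` and `vote_count`, which is the kind of signal the README's popularity based engine sorts on. As a rough sketch only, assuming a DataFrame with `title` and `popularity` columns, such a ranking comes down to a descending sort. The toy rows and the helper name `top_by` are hypothetical and not part of the notebook.

```python
import pandas as pd

# Toy rows for illustration; the titles are placeholders, not dataset entries.
toy = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "popularity": [12.9, 150.4, 4.7, 28.3],
    "vote_count": [235, 11800, 54, 737],
})

def top_by(df, column, n=3):
    """Return the n titles with the highest value in `column`."""
    return df.sort_values(column, ascending=False).head(n)["title"].tolist()

trending = top_by(toy, "popularity")
print(trending)  # ['Movie B', 'Movie D', 'Movie A']
```

The same helper could rank by `vote_count` instead. Which column best represents "popularity" is a design choice that the source does not settle.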