├── LICENSE ├── README.md └── MOVIE RECOMMENDATION SYSTEM BASED ON CONTENT BASED LEARNING.ipynb /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2024 Dhruv Sharma 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Dive Deeper: Unleash the Power of Content-Based Movie Recommendations 2 | Tired of generic recommendations that miss the mark? This repository houses a content-based movie recommendation system designed to curate personalized suggestions just for you! 3 | 4 | What Makes This Special? 5 | 6 | This system goes beyond ratings and delves into what makes a movie tick. 
Analyzing genre, cast, director, keywords, and other descriptive elements identifies films with a similar cinematic DNA to the ones you love. So, prepare to discover hidden gems and embark on unexpected movie marathons! 7 | 8 | What You'll Find: 9 | 10 | - Content-Based Recommendation Engine: Built on the foundation of content analysis, this system prioritizes movies that share characteristics with your favorites. 11 | - Clean and Modular Code: Explore well-structured and readable code, making it easy to understand the implementation and potentially contribute your ideas. 12 | - Open to Expansion: This system is designed to grow! Feel free to contribute additional movie data and features to refine the recommendations further. 13 | 14 | This is your chance to actively shape the future of this recommendation system. Here's how you can contribute: 15 | 16 | - Feature Requests: Have an idea for an additional data point or a desired functionality? Open an issue to discuss it! 17 | - Code Contributions: Fork the repository, make your changes, and submit a pull request to share your improvements. 18 | - Spread the Word: Star this repository and share it with fellow movie enthusiasts who crave a more tailored recommendation experience. 19 | 20 | Follow Me for More! 21 | 22 | Stay tuned for further updates and exciting projects by following me (your GitHub username) on GitHub. 23 | 24 | License: 25 | 26 | This repository is licensed under the MIT License. 27 | 28 | By Starring This Project: 29 | 30 | You show your support and help it gain visibility within the developer and movie buff communities. Let's build a recommendation system that truly caters to individual tastes! 
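A Minimal Sketch of the Idea:

The snippet below is an illustrative sketch of content-based matching, not the notebook's actual implementation. It assumes tags built only from genre and keyword names and a plain bag-of-words cosine similarity. The `MOVIES` list and the helper names (`names`, `build_tags`, `cosine`, `recommend`) are hypothetical; the sample values are abridged from rows the notebook displays, with truncated lists cut to their first visible entries.

```python
import json
import math
from collections import Counter

# Hypothetical miniature dataset. The JSON strings mimic the shape of the
# `genres` and `keywords` columns; the lists are abridged for illustration.
MOVIES = [
    {"title": "Avatar",
     "genres": '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]',
     "keywords": '[{"id": 1463, "name": "culture clash"}]'},
    {"title": "John Carter",
     "genres": '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]',
     "keywords": '[{"id": 818, "name": "based on novel"}]'},
    {"title": "My Date with Drew",
     "genres": '[{"id": 99, "name": "Documentary"}]',
     "keywords": '[{"id": 1523, "name": "obsession"}]'},
]


def names(json_text):
    """Return the "name" field of every object in a JSON list string."""
    return [item["name"] for item in json.loads(json_text)]


def build_tags(movie):
    """Collect genre and keyword names into a bag of lowercase tokens.

    Spaces are removed so that a multi-word name counts as a single token
    (an assumption made for this sketch).
    """
    tokens = []
    for column in ("genres", "keywords"):
        for name in names(movie[column]):
            tokens.append(name.lower().replace(" ", ""))
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(title, movies, top_n=2):
    """Return the titles most similar to `title`, best match first."""
    tags = {movie["title"]: build_tags(movie) for movie in movies}
    target = tags[title]
    scored = [(cosine(target, vector), other)
              for other, vector in tags.items() if other != title]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [other for _, other in scored[:top_n]]


print(recommend("Avatar", MOVIES, top_n=1))  # prints ['John Carter']
```

In this toy data, John Carter ranks first for Avatar because the two share both genre tokens, while My Date with Drew shares none. A fuller system would likely add cast, crew, and overview text to the tags.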
31 | 
--------------------------------------------------------------------------------
/MOVIE RECOMMENDATION SYSTEM BASED ON CONTENT BASED LEARNING.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 1,
6 | "id": "b62fec74",
7 | "metadata": {},
8 | "outputs": [],
9 | "source": [
10 | "import numpy as np  # NumPy provides fast numerical arrays and mathematical functions.\n",
11 | "import pandas as pd  # pandas provides the DataFrame, used here to load, clean, explore, and manipulate the data.\n",
12 | "# pandas reads each CSV file into a DataFrame, a labeled table of rows and columns.\n",
13 | "movies = pd.read_csv(\"tmdb_5000_movies.csv\")  # Load the movie metadata into the variable `movies`.\n",
14 | "credits = pd.read_csv(\"tmdb_5000_credits.csv\")  # Load the cast and crew data into `credits`.\n",
15 | "\n",
16 | "# Working with two separate datasets is awkward, so we merge them into one; both describe the same movies.\n",
17 | "# The frames can be matched on the movie ID (`id` in movies, `movie_id` in credits) or on `title`; a merge needs a key that exists in both frames.\n",
18 | "# Here we merge on `title`.\n",
19 | "\n",
20 | "# movies.merge(credits, on=\"title\")  # Syntax: left.merge(right, on=key_column)\n",
21 | "# movies.merge(credits, on=\"title\").shape  # .shape returns (number of rows, number of columns).\n",
22 | "# # The result has 23 columns, not 24, because the shared `title` column appears only once.\n",
23 | "# movies = movies.merge(credits, on=\"title\")  # Store the merged dataset back in `movies`.\n",
24 | "# movies.shape\n",
25 | "# credits.shape"
26 | ]
27 | },
28 | {
29 | "cell_type": "code",
30 | "execution_count": 2,
31 | "id": "aec15ab7",
32 | "metadata": {},
33 | "outputs": [
34 | {
35 | "data": {
36 | "text/html": [
37 | "
[HTML rendering of the DataFrame omitted: 1 row x 20 columns, identical to the text/plain output]"
104 | ],
105 | "text/plain": [
106 | " budget genres \\\n",
107 | "0 237000000 [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam... \n",
108 | "\n",
109 | " homepage id \\\n",
110 | "0 http://www.avatarmovie.com/ 19995 \n",
111 | "\n",
112 | " keywords original_language \\\n",
113 | "0 [{\"id\": 1463, \"name\": \"culture clash\"}, {\"id\":... en \n",
114 | "\n",
115 | " original_title overview \\\n",
116 | "0 Avatar In the 22nd century, a paraplegic Marine is di... \n",
117 | "\n",
118 | " popularity production_companies \\\n",
119 | "0 150.437577 [{\"name\": \"Ingenious Film Partners\", \"id\": 289... \n",
120 | "\n",
121 | " production_countries release_date revenue \\\n",
122 | "0 [{\"iso_3166_1\": \"US\", \"name\": \"United States o... 2009-12-10 2787965087 \n",
123 | "\n",
124 | " runtime spoken_languages status \\\n",
125 | "0 162.0 [{\"iso_639_1\": \"en\", \"name\": \"English\"}, {\"iso... Released \n",
126 | "\n",
127 | " tagline title vote_average vote_count \n",
128 | "0 Enter the World of Pandora. Avatar 7.2 11800 "
129 | ]
130 | },
131 | "execution_count": 2,
132 | "metadata": {},
133 | "output_type": "execute_result"
134 | }
135 | ],
136 | "source": [
137 | "# movies.head()  # Preview the first rows of the movies DataFrame.\n",
138 | "movies.head(1)  # Show only the first row, i.e. the full record for one movie."
139 | ]
140 | },
141 | {
142 | "cell_type": "code",
143 | "execution_count": 3,
144 | "id": "bcea33f1",
145 | "metadata": {
146 | "scrolled": true
147 | },
148 | "outputs": [],
149 | "source": [
150 | "# credits.head(1)  # Show only the first row of the credits DataFrame.\n",
151 | "# credits.head(1)['cast']  # Show only the cast column of that first row.\n",
152 | "# credits.head(1)['cast'].values  # Return the raw values stored in cast.\n",
153 | "# credits.head(1)['crew'].values  # Return the raw values stored in crew."
154 | ] 155 | }, 156 | { 157 | "cell_type": "code", 158 | "execution_count": 4, 159 | "id": "e860133c", 160 | "metadata": { 161 | "scrolled": true 162 | }, 163 | "outputs": [ 164 | { 165 | "data": { 166 | "text/html": [ 167 | "
[HTML rendering of the merged DataFrame omitted: 4809 rows x 23 columns, identical to the text/plain output]
" 477 | ], 478 | "text/plain": [ 479 | " budget genres \\\n", 480 | "0 237000000 [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam... \n", 481 | "1 300000000 [{\"id\": 12, \"name\": \"Adventure\"}, {\"id\": 14, \"... \n", 482 | "2 245000000 [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam... \n", 483 | "3 250000000 [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 80, \"nam... \n", 484 | "4 260000000 [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam... \n", 485 | "... ... ... \n", 486 | "4804 220000 [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 80, \"nam... \n", 487 | "4805 9000 [{\"id\": 35, \"name\": \"Comedy\"}, {\"id\": 10749, \"... \n", 488 | "4806 0 [{\"id\": 35, \"name\": \"Comedy\"}, {\"id\": 18, \"nam... \n", 489 | "4807 0 [] \n", 490 | "4808 0 [{\"id\": 99, \"name\": \"Documentary\"}] \n", 491 | "\n", 492 | " homepage id \\\n", 493 | "0 http://www.avatarmovie.com/ 19995 \n", 494 | "1 http://disney.go.com/disneypictures/pirates/ 285 \n", 495 | "2 http://www.sonypictures.com/movies/spectre/ 206647 \n", 496 | "3 http://www.thedarkknightrises.com/ 49026 \n", 497 | "4 http://movies.disney.com/john-carter 49529 \n", 498 | "... ... ... \n", 499 | "4804 NaN 9367 \n", 500 | "4805 NaN 72766 \n", 501 | "4806 http://www.hallmarkchannel.com/signedsealeddel... 231617 \n", 502 | "4807 http://shanghaicalling.com/ 126186 \n", 503 | "4808 NaN 25975 \n", 504 | "\n", 505 | " keywords original_language \\\n", 506 | "0 [{\"id\": 1463, \"name\": \"culture clash\"}, {\"id\":... en \n", 507 | "1 [{\"id\": 270, \"name\": \"ocean\"}, {\"id\": 726, \"na... en \n", 508 | "2 [{\"id\": 470, \"name\": \"spy\"}, {\"id\": 818, \"name... en \n", 509 | "3 [{\"id\": 849, \"name\": \"dc comics\"}, {\"id\": 853,... en \n", 510 | "4 [{\"id\": 818, \"name\": \"based on novel\"}, {\"id\":... en \n", 511 | "... ... ... \n", 512 | "4804 [{\"id\": 5616, \"name\": \"united states\\u2013mexi... 
es \n", 513 | "4805 [] en \n", 514 | "4806 [{\"id\": 248, \"name\": \"date\"}, {\"id\": 699, \"nam... en \n", 515 | "4807 [] en \n", 516 | "4808 [{\"id\": 1523, \"name\": \"obsession\"}, {\"id\": 224... en \n", 517 | "\n", 518 | " original_title \\\n", 519 | "0 Avatar \n", 520 | "1 Pirates of the Caribbean: At World's End \n", 521 | "2 Spectre \n", 522 | "3 The Dark Knight Rises \n", 523 | "4 John Carter \n", 524 | "... ... \n", 525 | "4804 El Mariachi \n", 526 | "4805 Newlyweds \n", 527 | "4806 Signed, Sealed, Delivered \n", 528 | "4807 Shanghai Calling \n", 529 | "4808 My Date with Drew \n", 530 | "\n", 531 | " overview popularity \\\n", 532 | "0 In the 22nd century, a paraplegic Marine is di... 150.437577 \n", 533 | "1 Captain Barbossa, long believed to be dead, ha... 139.082615 \n", 534 | "2 A cryptic message from Bond’s past sends him o... 107.376788 \n", 535 | "3 Following the death of District Attorney Harve... 112.312950 \n", 536 | "4 John Carter is a war-weary, former military ca... 43.926995 \n", 537 | "... ... ... \n", 538 | "4804 El Mariachi just wants to play his guitar and ... 14.269792 \n", 539 | "4805 A newlywed couple's honeymoon is upended by th... 0.642552 \n", 540 | "4806 \"Signed, Sealed, Delivered\" introduces a dedic... 1.444476 \n", 541 | "4807 When ambitious New York attorney Sam is sent t... 0.857008 \n", 542 | "4808 Ever since the second grade when he first saw ... 1.929883 \n", 543 | "\n", 544 | " production_companies ... runtime \\\n", 545 | "0 [{\"name\": \"Ingenious Film Partners\", \"id\": 289... ... 162.0 \n", 546 | "1 [{\"name\": \"Walt Disney Pictures\", \"id\": 2}, {\"... ... 169.0 \n", 547 | "2 [{\"name\": \"Columbia Pictures\", \"id\": 5}, {\"nam... ... 148.0 \n", 548 | "3 [{\"name\": \"Legendary Pictures\", \"id\": 923}, {\"... ... 165.0 \n", 549 | "4 [{\"name\": \"Walt Disney Pictures\", \"id\": 2}] ... 132.0 \n", 550 | "... ... ... ... \n", 551 | "4804 [{\"name\": \"Columbia Pictures\", \"id\": 5}] ... 
81.0 \n", 552 | "4805 [] ... 85.0 \n", 553 | "4806 [{\"name\": \"Front Street Pictures\", \"id\": 3958}... ... 120.0 \n", 554 | "4807 [] ... 98.0 \n", 555 | "4808 [{\"name\": \"rusty bear entertainment\", \"id\": 87... ... 90.0 \n", 556 | "\n", 557 | " spoken_languages status \\\n", 558 | "0 [{\"iso_639_1\": \"en\", \"name\": \"English\"}, {\"iso... Released \n", 559 | "1 [{\"iso_639_1\": \"en\", \"name\": \"English\"}] Released \n", 560 | "2 [{\"iso_639_1\": \"fr\", \"name\": \"Fran\\u00e7ais\"},... Released \n", 561 | "3 [{\"iso_639_1\": \"en\", \"name\": \"English\"}] Released \n", 562 | "4 [{\"iso_639_1\": \"en\", \"name\": \"English\"}] Released \n", 563 | "... ... ... \n", 564 | "4804 [{\"iso_639_1\": \"es\", \"name\": \"Espa\\u00f1ol\"}] Released \n", 565 | "4805 [] Released \n", 566 | "4806 [{\"iso_639_1\": \"en\", \"name\": \"English\"}] Released \n", 567 | "4807 [{\"iso_639_1\": \"en\", \"name\": \"English\"}] Released \n", 568 | "4808 [{\"iso_639_1\": \"en\", \"name\": \"English\"}] Released \n", 569 | "\n", 570 | " tagline \\\n", 571 | "0 Enter the World of Pandora. \n", 572 | "1 At the end of the world, the adventure begins. \n", 573 | "2 A Plan No One Escapes \n", 574 | "3 The Legend Ends \n", 575 | "4 Lost in our world, found in another. \n", 576 | "... ... \n", 577 | "4804 He didn't come looking for trouble, but troubl... \n", 578 | "4805 A newlywed couple's honeymoon is upended by th... \n", 579 | "4806 NaN \n", 580 | "4807 A New Yorker in Shanghai \n", 581 | "4808 NaN \n", 582 | "\n", 583 | " title vote_average vote_count \\\n", 584 | "0 Avatar 7.2 11800 \n", 585 | "1 Pirates of the Caribbean: At World's End 6.9 4500 \n", 586 | "2 Spectre 6.3 4466 \n", 587 | "3 The Dark Knight Rises 7.6 9106 \n", 588 | "4 John Carter 6.1 2124 \n", 589 | "... ... ... ... 
\n", 590 | "4804 El Mariachi 6.6 238 \n", 591 | "4805 Newlyweds 5.9 5 \n", 592 | "4806 Signed, Sealed, Delivered 7.0 6 \n", 593 | "4807 Shanghai Calling 5.7 7 \n", 594 | "4808 My Date with Drew 6.3 16 \n", 595 | "\n", 596 | " movie_id cast \\\n", 597 | "0 19995 [{\"cast_id\": 242, \"character\": \"Jake Sully\", \"... \n", 598 | "1 285 [{\"cast_id\": 4, \"character\": \"Captain Jack Spa... \n", 599 | "2 206647 [{\"cast_id\": 1, \"character\": \"James Bond\", \"cr... \n", 600 | "3 49026 [{\"cast_id\": 2, \"character\": \"Bruce Wayne / Ba... \n", 601 | "4 49529 [{\"cast_id\": 5, \"character\": \"John Carter\", \"c... \n", 602 | "... ... ... \n", 603 | "4804 9367 [{\"cast_id\": 1, \"character\": \"El Mariachi\", \"c... \n", 604 | "4805 72766 [{\"cast_id\": 1, \"character\": \"Buzzy\", \"credit_... \n", 605 | "4806 231617 [{\"cast_id\": 8, \"character\": \"Oliver O\\u2019To... \n", 606 | "4807 126186 [{\"cast_id\": 3, \"character\": \"Sam\", \"credit_id... \n", 607 | "4808 25975 [{\"cast_id\": 3, \"character\": \"Herself\", \"credi... \n", 608 | "\n", 609 | " crew \n", 610 | "0 [{\"credit_id\": \"52fe48009251416c750aca23\", \"de... \n", 611 | "1 [{\"credit_id\": \"52fe4232c3a36847f800b579\", \"de... \n", 612 | "2 [{\"credit_id\": \"54805967c3a36829b5002c41\", \"de... \n", 613 | "3 [{\"credit_id\": \"52fe4781c3a36847f81398c3\", \"de... \n", 614 | "4 [{\"credit_id\": \"52fe479ac3a36847f813eaa3\", \"de... \n", 615 | "... ... \n", 616 | "4804 [{\"credit_id\": \"52fe44eec3a36847f80b280b\", \"de... \n", 617 | "4805 [{\"credit_id\": \"52fe487dc3a368484e0fb013\", \"de... \n", 618 | "4806 [{\"credit_id\": \"52fe4df3c3a36847f8275ecf\", \"de... \n", 619 | "4807 [{\"credit_id\": \"52fe4ad9c3a368484e16a36b\", \"de... \n", 620 | "4808 [{\"credit_id\": \"58ce021b9251415a390165d9\", \"de... 
\n",
621 | "\n",
622 | "[4809 rows x 23 columns]"
623 | ]
624 | },
625 | "execution_count": 4,
626 | "metadata": {},
627 | "output_type": "execute_result"
628 | }
629 | ],
630 | "source": [
631 | "# Working with two separate datasets is awkward, so we merge them into one.\n",
632 | "# The frames can be matched on the movie ID (`id` in movies, `movie_id` in credits) or on `title`.\n",
633 | "# Here we merge on `title`.\n",
634 | "\n",
635 | "movies.merge(credits, on=\"title\")  # Syntax: left.merge(right, on=key_column)"
636 | ]
637 | },
638 | {
639 | "cell_type": "code",
640 | "execution_count": 5,
641 | "id": "69cd5d34",
642 | "metadata": {},
643 | "outputs": [],
644 | "source": [
645 | "# movies.merge(credits, on=\"title\").shape  # .shape returns (number of rows, number of columns).\n",
646 | "# # The result has 23 columns, not 24, because the shared `title` column appears only once.\n",
647 | "# movies = movies.merge(credits, on=\"title\")  # Store the merged dataset back in `movies`.\n",
648 | "# movies.shape\n",
649 | "# credits.shape"
650 | ]
651 | },
652 | {
653 | "cell_type": "code",
654 | "execution_count": 6,
655 | "id": "021c4cc9",
656 | "metadata": {},
657 | "outputs": [
658 | {
659 | "name": "stdout",
660 | "output_type": "stream",
661 | "text": [
662 | "<class 'pandas.core.frame.DataFrame'>\n",
663 | "Int64Index: 4809 entries, 0 to 4808\n",
664 | "Data columns (total 23 columns):\n",
665 | " # Column Non-Null Count Dtype \n",
666 | "--- ------ -------------- ----- \n",
667 | " 0 budget 4809 non-null int64 \n",
668 | " 1 genres 4809 non-null object \n",
669 | " 2 homepage 1713 non-null object \n",
670 | " 3 id 4809 non-null int64 \n",
671 | " 4 keywords 4809 non-null object \n",
672 | " 5 original_language 4809 non-null object \n",
673 | " 6 original_title 4809 non-null object \n",
674 | " 7 overview 4806 non-null object \n",
675 | " 8 popularity 4809 non-null float64\n",
676 | " 9 production_companies 4809 non-null object \n",
677 | " 10 production_countries 4809 non-null object \n",
678 | " 11 release_date 4808 non-null object \n",
679 | " 12 revenue 4809 non-null int64 \n",
680 | " 13 runtime 4807 non-null float64\n",
681 | " 14 spoken_languages 4809 non-null object \n",
682 | " 15 status 4809 non-null object \n",
683 | " 16 tagline 3965 non-null object \n",
684 | " 17 title 4809 non-null object \n",
685 | " 18 vote_average 4809 non-null float64\n",
686 | " 19 vote_count 4809 non-null int64 \n",
687 | " 20 movie_id 4809 non-null int64 \n",
688 | " 21 cast 4809 non-null object \n",
689 | " 22 crew 4809 non-null object \n",
690 | "dtypes: float64(3), int64(5), object(15)\n",
691 | "memory usage: 901.7+ KB\n"
692 | ]
693 | }
694 | ],
695 | "source": [
696 | "movies = movies.merge(credits, on=\"title\")  # Store the merged dataset back in `movies`.\n",
697 | "movies.head()\n",
698 | "movies.info()"
699 | ]
700 | },
701 | {
702 | "cell_type": "code",
703 | "execution_count": 7,
704 | "id": "180da1f4",
705 | "metadata": {},
706 | "outputs": [
707 | {
708 | "data": {
709 | "text/html": [
710 | "
[HTML rendering of the DataFrame omitted: 5 rows x 7 columns, identical to the text/plain output]
" 791 | ], 792 | "text/plain": [ 793 | " movie_id title \\\n", 794 | "0 19995 Avatar \n", 795 | "1 285 Pirates of the Caribbean: At World's End \n", 796 | "2 206647 Spectre \n", 797 | "3 49026 The Dark Knight Rises \n", 798 | "4 49529 John Carter \n", 799 | "\n", 800 | " overview \\\n", 801 | "0 In the 22nd century, a paraplegic Marine is di... \n", 802 | "1 Captain Barbossa, long believed to be dead, ha... \n", 803 | "2 A cryptic message from Bond’s past sends him o... \n", 804 | "3 Following the death of District Attorney Harve... \n", 805 | "4 John Carter is a war-weary, former military ca... \n", 806 | "\n", 807 | " genres \\\n", 808 | "0 [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam... \n", 809 | "1 [{\"id\": 12, \"name\": \"Adventure\"}, {\"id\": 14, \"... \n", 810 | "2 [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam... \n", 811 | "3 [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 80, \"nam... \n", 812 | "4 [{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"nam... \n", 813 | "\n", 814 | " keywords \\\n", 815 | "0 [{\"id\": 1463, \"name\": \"culture clash\"}, {\"id\":... \n", 816 | "1 [{\"id\": 270, \"name\": \"ocean\"}, {\"id\": 726, \"na... \n", 817 | "2 [{\"id\": 470, \"name\": \"spy\"}, {\"id\": 818, \"name... \n", 818 | "3 [{\"id\": 849, \"name\": \"dc comics\"}, {\"id\": 853,... \n", 819 | "4 [{\"id\": 818, \"name\": \"based on novel\"}, {\"id\":... \n", 820 | "\n", 821 | " cast \\\n", 822 | "0 [{\"cast_id\": 242, \"character\": \"Jake Sully\", \"... \n", 823 | "1 [{\"cast_id\": 4, \"character\": \"Captain Jack Spa... \n", 824 | "2 [{\"cast_id\": 1, \"character\": \"James Bond\", \"cr... \n", 825 | "3 [{\"cast_id\": 2, \"character\": \"Bruce Wayne / Ba... \n", 826 | "4 [{\"cast_id\": 5, \"character\": \"John Carter\", \"c... \n", 827 | "\n", 828 | " crew \n", 829 | "0 [{\"credit_id\": \"52fe48009251416c750aca23\", \"de... \n", 830 | "1 [{\"credit_id\": \"52fe4232c3a36847f800b579\", \"de... 
\n", 831 | "2 [{\"credit_id\": \"54805967c3a36829b5002c41\", \"de... \n", 832 | "3 [{\"credit_id\": \"52fe4781c3a36847f81398c3\", \"de... \n", 833 | "4 [{\"credit_id\": \"52fe479ac3a36847f813eaa3\", \"de... " 834 | ] 835 | }, 836 | "execution_count": 7, 837 | "metadata": {}, 838 | "output_type": "execute_result" 839 | } 840 | ], 841 | "source": [ 842 | "# Drop the columns that are not used in the analysis, i.e. the ones that would not help decide whether to watch this movie or another one.\n", 843 | "\n", 844 | "# Columns kept, because they say the most about the content of a movie:\n", 845 | "# (numeric columns such as popularity and vote count are left out)\n", 846 | "# - genres\n", 847 | "# - movie_id: used at the end to fetch the movie posters from external sites\n", 848 | "# - keywords\n", 849 | "# - title: original title would also work, but title is the column the two dataframes were merged on\n", 850 | "# - overview\n", 851 | "# - cast\n", 852 | "# - crew\n", 853 | "movies=movies[['movie_id','title','overview','genres','keywords','cast','crew']]  # the columns the rest of the notebook works with\n", 854 | "#movies.info()\n", 855 | "movies.head()" 856 | ] 857 | }, 858 | { 859 | "cell_type": "code", 860 | "execution_count": 8, 861 | "id": "798de9e0", 862 | "metadata": {}, 863 | "outputs": [ 864 | { 865 | "data": { 866 | "text/plain": [ 867 | "movie_id 0\n", 868 | "title 0\n", 869 | "overview 3\n", 870 | "genres 0\n", 871 | "keywords 0\n", 872 | "cast 0\n", 873 | "crew 0\n", 874 | "dtype: int64" 875 | ] 876 | }, 877 | "execution_count": 8, 878 | "metadata": {}, 879 | "output_type": "execute_result" 880 | } 881 | ], 882 | "source": [ 883 | "# Next, build a new dataframe with three columns: movie_id, title and tag.\n",
884 | "# The tag column is made up of overview, genres, cast, crew and keywords.\n", 885 | "# To merge them, first bring the columns from genres through crew from their nested form into a simple form, then combine them all with overview into one paragraph-like text.\n", 886 | "# This needs some data preprocessing first, such as checking whether any data is missing.\n", 887 | "\n", 888 | "# First check whether there is missing or duplicate data.\n", 889 | "# Missing data:\n", 890 | "movies.isnull().sum()  # counts the missing (NaN or None) values in each column of movies\n", 891 | "# According to the output, 3 rows have a missing value in the overview column." 892 | ] 893 | }, 894 | { 895 | "cell_type": "code", 896 | "execution_count": 9, 897 | "id": "72194c75", 898 | "metadata": {}, 899 | "outputs": [], 900 | "source": [ 901 | "movies.dropna(inplace=True)  # removes the rows that contain missing (NaN or None) values\n", 902 | "# Dropping is fine here because only 3 rows have missing values; with many more, another strategy would be needed."
903 | ] 904 | }, 905 | { 906 | "cell_type": "code", 907 | "execution_count": 10, 908 | "id": "3fa05e83", 909 | "metadata": {}, 910 | "outputs": [ 911 | { 912 | "data": { 913 | "text/plain": [ 914 | "0" 915 | ] 916 | }, 917 | "execution_count": 10, 918 | "metadata": {}, 919 | "output_type": "execute_result" 920 | } 921 | ], 922 | "source": [ 923 | "# Check whether there are any duplicated rows:\n", 924 | "movies.duplicated().sum()" 925 | ] 926 | }, 927 | { 928 | "cell_type": "code", 929 | "execution_count": 11, 930 | "id": "06e45a3f", 931 | "metadata": {}, 932 | "outputs": [ 933 | { 934 | "data": { 935 | "text/plain": [ 936 | "'[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"name\": \"Adventure\"}, {\"id\": 14, \"name\": \"Fantasy\"}, {\"id\": 878, \"name\": \"Science Fiction\"}]'" 937 | ] 938 | }, 939 | "execution_count": 11, 940 | "metadata": {}, 941 | "output_type": "execute_result" 942 | } 943 | ], 944 | "source": [ 945 | "# Now bring the columns into the right format,\n", 946 | "# starting with genres:\n", 947 | "movies.iloc[0].genres  # the value of the genres column in the first row (position 0)" 948 | ] 949 | }, 950 | { 951 | "cell_type": "code", 952 | "execution_count": 12, 953 | "id": "0f34f03f", 954 | "metadata": {}, 955 | "outputs": [ 956 | { 957 | "data": { 958 | "text/html": [ 959 | "
" 1040 | ], 1041 | "text/plain": [ 1042 | " movie_id title \\\n", 1043 | "0 19995 Avatar \n", 1044 | "1 285 Pirates of the Caribbean: At World's End \n", 1045 | "2 206647 Spectre \n", 1046 | "3 49026 The Dark Knight Rises \n", 1047 | "4 49529 John Carter \n", 1048 | "\n", 1049 | " overview \\\n", 1050 | "0 In the 22nd century, a paraplegic Marine is di... \n", 1051 | "1 Captain Barbossa, long believed to be dead, ha... \n", 1052 | "2 A cryptic message from Bond’s past sends him o... \n", 1053 | "3 Following the death of District Attorney Harve... \n", 1054 | "4 John Carter is a war-weary, former military ca... \n", 1055 | "\n", 1056 | " genres \\\n", 1057 | "0 [Action, Adventure, Fantasy, Science Fiction] \n", 1058 | "1 [Adventure, Fantasy, Action] \n", 1059 | "2 [Action, Adventure, Crime] \n", 1060 | "3 [Action, Crime, Drama, Thriller] \n", 1061 | "4 [Action, Adventure, Science Fiction] \n", 1062 | "\n", 1063 | " keywords \\\n", 1064 | "0 [{\"id\": 1463, \"name\": \"culture clash\"}, {\"id\":... \n", 1065 | "1 [{\"id\": 270, \"name\": \"ocean\"}, {\"id\": 726, \"na... \n", 1066 | "2 [{\"id\": 470, \"name\": \"spy\"}, {\"id\": 818, \"name... \n", 1067 | "3 [{\"id\": 849, \"name\": \"dc comics\"}, {\"id\": 853,... \n", 1068 | "4 [{\"id\": 818, \"name\": \"based on novel\"}, {\"id\":... \n", 1069 | "\n", 1070 | " cast \\\n", 1071 | "0 [{\"cast_id\": 242, \"character\": \"Jake Sully\", \"... \n", 1072 | "1 [{\"cast_id\": 4, \"character\": \"Captain Jack Spa... \n", 1073 | "2 [{\"cast_id\": 1, \"character\": \"James Bond\", \"cr... \n", 1074 | "3 [{\"cast_id\": 2, \"character\": \"Bruce Wayne / Ba... \n", 1075 | "4 [{\"cast_id\": 5, \"character\": \"John Carter\", \"c... \n", 1076 | "\n", 1077 | " crew \n", 1078 | "0 [{\"credit_id\": \"52fe48009251416c750aca23\", \"de... \n", 1079 | "1 [{\"credit_id\": \"52fe4232c3a36847f800b579\", \"de... \n", 1080 | "2 [{\"credit_id\": \"54805967c3a36829b5002c41\", \"de... 
\n", 1081 | "3 [{\"credit_id\": \"52fe4781c3a36847f81398c3\", \"de... \n", 1082 | "4 [{\"credit_id\": \"52fe479ac3a36847f813eaa3\", \"de... " 1083 | ] 1084 | }, 1085 | "execution_count": 12, 1086 | "metadata": {}, 1087 | "output_type": "execute_result" 1088 | } 1089 | ], 1090 | "source": [ 1091 | "# '[{\"id\": 28, \"name\": \"Action\"}, {\"id\": 12, \"name\": \"Adventure\"}, {\"id\": 14, \"name\": \"Fantasy\"}, {\"id\": 878, \"name\": \"Science Fiction\"}]'\n", 1092 | "# The value above is a string; we want it simplified to a plain list holding only the genre names of the movie: [Action, Adventure, Fantasy, Science Fiction].\n", 1093 | "# A small helper function does this.\n", 1094 | "import ast\n", 1095 | "def convert(obj):\n", 1096 | "    L=[]\n", 1097 | "    # ast.literal_eval safely evaluates a string containing a single Python literal and returns the corresponding object, here a list of dicts.\n", 1098 | "    for i in ast.literal_eval(obj):\n", 1099 | "        L.append(i['name'])  # keep only the 'name' value of each dict\n", 1100 | "    return L\n", 1101 | "movies['genres']=movies['genres'].apply(convert)\n", 1102 | "movies.head()" 1103 | ] 1104 | }, 1105 | { 1106 | "cell_type": "code", 1107 | "execution_count": 13, 1108 | "id": "a5e212b7", 1109 | "metadata": {}, 1110 | "outputs": [ 1111 | { 1112 | "data": { 1113 | "text/html": [ 1114 | "
" 1195 | ], 1196 | "text/plain": [ 1197 | " movie_id title \\\n", 1198 | "0 19995 Avatar \n", 1199 | "1 285 Pirates of the Caribbean: At World's End \n", 1200 | "2 206647 Spectre \n", 1201 | "3 49026 The Dark Knight Rises \n", 1202 | "4 49529 John Carter \n", 1203 | "\n", 1204 | " overview \\\n", 1205 | "0 In the 22nd century, a paraplegic Marine is di... \n", 1206 | "1 Captain Barbossa, long believed to be dead, ha... \n", 1207 | "2 A cryptic message from Bond’s past sends him o... \n", 1208 | "3 Following the death of District Attorney Harve... \n", 1209 | "4 John Carter is a war-weary, former military ca... \n", 1210 | "\n", 1211 | " genres \\\n", 1212 | "0 [Action, Adventure, Fantasy, Science Fiction] \n", 1213 | "1 [Adventure, Fantasy, Action] \n", 1214 | "2 [Action, Adventure, Crime] \n", 1215 | "3 [Action, Crime, Drama, Thriller] \n", 1216 | "4 [Action, Adventure, Science Fiction] \n", 1217 | "\n", 1218 | " keywords \\\n", 1219 | "0 [culture clash, future, space war, space colon... \n", 1220 | "1 [ocean, drug abuse, exotic island, east india ... \n", 1221 | "2 [spy, based on novel, secret agent, sequel, mi... \n", 1222 | "3 [dc comics, crime fighter, terrorist, secret i... \n", 1223 | "4 [based on novel, mars, medallion, space travel... \n", 1224 | "\n", 1225 | " cast \\\n", 1226 | "0 [{\"cast_id\": 242, \"character\": \"Jake Sully\", \"... \n", 1227 | "1 [{\"cast_id\": 4, \"character\": \"Captain Jack Spa... \n", 1228 | "2 [{\"cast_id\": 1, \"character\": \"James Bond\", \"cr... \n", 1229 | "3 [{\"cast_id\": 2, \"character\": \"Bruce Wayne / Ba... \n", 1230 | "4 [{\"cast_id\": 5, \"character\": \"John Carter\", \"c... \n", 1231 | "\n", 1232 | " crew \n", 1233 | "0 [{\"credit_id\": \"52fe48009251416c750aca23\", \"de... \n", 1234 | "1 [{\"credit_id\": \"52fe4232c3a36847f800b579\", \"de... \n", 1235 | "2 [{\"credit_id\": \"54805967c3a36829b5002c41\", \"de... \n", 1236 | "3 [{\"credit_id\": \"52fe4781c3a36847f81398c3\", \"de... 
\n", 1237 | "4 [{\"credit_id\": \"52fe479ac3a36847f813eaa3\", \"de... " 1238 | ] 1239 | }, 1240 | "execution_count": 13, 1241 | "metadata": {}, 1242 | "output_type": "execute_result" 1243 | } 1244 | ], 1245 | "source": [ 1246 | "# Simplify the keywords column in the same way.\n", 1247 | "movies['keywords']=movies['keywords'].apply(convert)\n", 1248 | "movies.head()" 1249 | ] 1250 | }, 1251 | { 1252 | "cell_type": "code", 1253 | "execution_count": 14, 1254 | "id": "2a147aa2", 1255 | "metadata": {}, 1256 | "outputs": [], 1257 | "source": [ 1258 | "# Do the same to simplify cast, but keep only the first 3 (main) actors of each movie.\n", 1259 | "def convert3(obj):\n", 1260 | "    L=[]\n", 1261 | "    counter=0\n", 1262 | "    # ast.literal_eval turns the string back into a list of dicts, as in convert.\n", 1263 | "    for i in ast.literal_eval(obj):\n", 1264 | "        if counter!=3:\n", 1265 | "            L.append(i['name'])  # keep only the actor's name\n", 1266 | "            counter+=1\n", 1267 | "        else:\n", 1268 | "            break\n", 1269 | "    return L\n", 1270 | "# The function above extracts the names of the first three cast members from the cast column." 1271 | ] 1272 | }, 1273 | { 1274 | "cell_type": "code", 1275 | "execution_count": 15, 1276 | "id": "d0fb5c6f", 1277 | "metadata": {}, 1278 | "outputs": [ 1279 | { 1280 | "data": { 1281 | "text/html": [ 1282 | "
\n", 1362 | "
" 1363 | ], 1364 | "text/plain": [ 1365 | " movie_id title \\\n", 1366 | "0 19995 Avatar \n", 1367 | "1 285 Pirates of the Caribbean: At World's End \n", 1368 | "2 206647 Spectre \n", 1369 | "3 49026 The Dark Knight Rises \n", 1370 | "4 49529 John Carter \n", 1371 | "\n", 1372 | " overview \\\n", 1373 | "0 In the 22nd century, a paraplegic Marine is di... \n", 1374 | "1 Captain Barbossa, long believed to be dead, ha... \n", 1375 | "2 A cryptic message from Bond’s past sends him o... \n", 1376 | "3 Following the death of District Attorney Harve... \n", 1377 | "4 John Carter is a war-weary, former military ca... \n", 1378 | "\n", 1379 | " genres \\\n", 1380 | "0 [Action, Adventure, Fantasy, Science Fiction] \n", 1381 | "1 [Adventure, Fantasy, Action] \n", 1382 | "2 [Action, Adventure, Crime] \n", 1383 | "3 [Action, Crime, Drama, Thriller] \n", 1384 | "4 [Action, Adventure, Science Fiction] \n", 1385 | "\n", 1386 | " keywords \\\n", 1387 | "0 [culture clash, future, space war, space colon... \n", 1388 | "1 [ocean, drug abuse, exotic island, east india ... \n", 1389 | "2 [spy, based on novel, secret agent, sequel, mi... \n", 1390 | "3 [dc comics, crime fighter, terrorist, secret i... \n", 1391 | "4 [based on novel, mars, medallion, space travel... \n", 1392 | "\n", 1393 | " cast \\\n", 1394 | "0 [Sam Worthington, Zoe Saldana, Sigourney Weaver] \n", 1395 | "1 [Johnny Depp, Orlando Bloom, Keira Knightley] \n", 1396 | "2 [Daniel Craig, Christoph Waltz, Léa Seydoux] \n", 1397 | "3 [Christian Bale, Michael Caine, Gary Oldman] \n", 1398 | "4 [Taylor Kitsch, Lynn Collins, Samantha Morton] \n", 1399 | "\n", 1400 | " crew \n", 1401 | "0 [{\"credit_id\": \"52fe48009251416c750aca23\", \"de... \n", 1402 | "1 [{\"credit_id\": \"52fe4232c3a36847f800b579\", \"de... \n", 1403 | "2 [{\"credit_id\": \"54805967c3a36829b5002c41\", \"de... \n", 1404 | "3 [{\"credit_id\": \"52fe4781c3a36847f81398c3\", \"de... \n", 1405 | "4 [{\"credit_id\": \"52fe479ac3a36847f813eaa3\", \"de... " 1406 | ] 1407 | }, 1408 | "execution_count": 15, 1409 | "metadata": {}, 1410 | "output_type": "execute_result" 1411 | } 1412 | ], 1413 | "source": [ 1414 | "movies['cast']=movies['cast'].apply(convert3)\n", 1415 | "movies.head()" 1416 | ] 1417 | }, 1418 | { 1419 | "cell_type": "code", 1420 | "execution_count": 16, 1421 | "id": "c50971f1", 1422 | "metadata": {}, 1423 | "outputs": [ 1424 | { 1425 | "name": "stdout", 1426 | "output_type": "stream", 1427 | "text": [ 1428 | "<class 'pandas.core.frame.DataFrame'>\n", 1429 | "Int64Index: 4806 entries, 0 to 4808\n", 1430 | "Data columns (total 7 columns):\n", 1431 | " # Column Non-Null Count Dtype \n", 1432 | "--- ------ -------------- ----- \n", 1433 | " 0 movie_id 4806 non-null int64 \n", 1434 | " 1 title 4806 non-null object\n", 1435 | " 2 overview 4806 non-null object\n", 1436 | " 3 genres 4806 non-null object\n", 1437 | " 4 keywords 4806 non-null object\n", 1438 | " 5 cast 4806 non-null object\n", 1439 | " 6 crew 4806 non-null object\n", 1440 | "dtypes: int64(1), object(6)\n", 1441 | "memory usage: 300.4+ KB\n" 1442 | ] 1443 | } 1444 | ], 1445 | "source": [ 1446 | "movies.info()" 1447 | ] 1448 | }, 1449 | { 1450 | "cell_type": "code", 1451 | "execution_count": 17, 1452 | "id": "5134194b", 1453 | "metadata": {}, 1454 | "outputs": [], 1455 | "source": [ 1456 | "# Now simplify the last column, crew: keep only the director's name, because the director is the only crew member we want reflected in this column.\n", 1457 | "def fetch_director(obj):\n", 1458 | "    L=[]\n", 1459 | "    # ast.literal_eval turns the string back into a list of dicts, as in convert.\n", 1460 | "    for i in ast.literal_eval(obj):\n", 1461 | "        if i['job']=='Director':\n",
1462 | "            L.append(i['name'])  # keep the name of the first crew member whose job is Director\n", 1463 | "            break\n", 1464 | "    return L" 1465 | ] 1466 | }, 1467 | { 1468 | "cell_type": "code", 1469 | "execution_count": 18, 1470 | "id": "3ad71954", 1471 | "metadata": {}, 1472 | "outputs": [ 1473 | { 1474 | "data": { 1475 | "text/html": [ 1476 | "
" 1557 | ], 1558 | "text/plain": [ 1559 | " movie_id title \\\n", 1560 | "0 19995 Avatar \n", 1561 | "1 285 Pirates of the Caribbean: At World's End \n", 1562 | "2 206647 Spectre \n", 1563 | "3 49026 The Dark Knight Rises \n", 1564 | "4 49529 John Carter \n", 1565 | "\n", 1566 | " overview \\\n", 1567 | "0 In the 22nd century, a paraplegic Marine is di... \n", 1568 | "1 Captain Barbossa, long believed to be dead, ha... \n", 1569 | "2 A cryptic message from Bond’s past sends him o... \n", 1570 | "3 Following the death of District Attorney Harve... \n", 1571 | "4 John Carter is a war-weary, former military ca... \n", 1572 | "\n", 1573 | " genres \\\n", 1574 | "0 [Action, Adventure, Fantasy, Science Fiction] \n", 1575 | "1 [Adventure, Fantasy, Action] \n", 1576 | "2 [Action, Adventure, Crime] \n", 1577 | "3 [Action, Crime, Drama, Thriller] \n", 1578 | "4 [Action, Adventure, Science Fiction] \n", 1579 | "\n", 1580 | " keywords \\\n", 1581 | "0 [culture clash, future, space war, space colon... \n", 1582 | "1 [ocean, drug abuse, exotic island, east india ... \n", 1583 | "2 [spy, based on novel, secret agent, sequel, mi... \n", 1584 | "3 [dc comics, crime fighter, terrorist, secret i... \n", 1585 | "4 [based on novel, mars, medallion, space travel... 
\n", 1586 | "\n", 1587 | " cast crew \n", 1588 | "0 [Sam Worthington, Zoe Saldana, Sigourney Weaver] [James Cameron] \n", 1589 | "1 [Johnny Depp, Orlando Bloom, Keira Knightley] [Gore Verbinski] \n", 1590 | "2 [Daniel Craig, Christoph Waltz, Léa Seydoux] [Sam Mendes] \n", 1591 | "3 [Christian Bale, Michael Caine, Gary Oldman] [Christopher Nolan] \n", 1592 | "4 [Taylor Kitsch, Lynn Collins, Samantha Morton] [Andrew Stanton] " 1593 | ] 1594 | }, 1595 | "execution_count": 18, 1596 | "metadata": {}, 1597 | "output_type": "execute_result" 1598 | } 1599 | ], 1600 | "source": [ 1601 | "movies['crew']=movies['crew'].apply(fetch_director)\n", 1602 | "movies.head()" 1603 | ] 1604 | }, 1605 | { 1606 | "cell_type": "code", 1607 | "execution_count": 19, 1608 | "id": "096223cf", 1609 | "metadata": {}, 1610 | "outputs": [], 1611 | "source": [ 1612 | "# The overview column is still a string, so convert it to a list like the others.\n", 1613 | "# It has to be a list so that it can be concatenated with the other list-valued columns.\n", 1614 | "movies['overview']=movies['overview'].apply(lambda x:x.split())" 1615 | ] 1616 | }, 1617 | { 1618 | "cell_type": "code", 1619 | "execution_count": 20, 1620 | "id": "bc6899d4", 1621 | "metadata": {}, 1622 | "outputs": [ 1623 | { 1624 | "data": { 1625 | "text/html": [ 1626 | "
" 1707 | ], 1708 | "text/plain": [ 1709 | " movie_id title \\\n", 1710 | "0 19995 Avatar \n", 1711 | "1 285 Pirates of the Caribbean: At World's End \n", 1712 | "2 206647 Spectre \n", 1713 | "3 49026 The Dark Knight Rises \n", 1714 | "4 49529 John Carter \n", 1715 | "\n", 1716 | " overview \\\n", 1717 | "0 [In, the, 22nd, century,, a, paraplegic, Marin... \n", 1718 | "1 [Captain, Barbossa,, long, believed, to, be, d... \n", 1719 | "2 [A, cryptic, message, from, Bond’s, past, send... \n", 1720 | "3 [Following, the, death, of, District, Attorney... \n", 1721 | "4 [John, Carter, is, a, war-weary,, former, mili... \n", 1722 | "\n", 1723 | " genres \\\n", 1724 | "0 [Action, Adventure, Fantasy, Science Fiction] \n", 1725 | "1 [Adventure, Fantasy, Action] \n", 1726 | "2 [Action, Adventure, Crime] \n", 1727 | "3 [Action, Crime, Drama, Thriller] \n", 1728 | "4 [Action, Adventure, Science Fiction] \n", 1729 | "\n", 1730 | " keywords \\\n", 1731 | "0 [culture clash, future, space war, space colon... \n", 1732 | "1 [ocean, drug abuse, exotic island, east india ... \n", 1733 | "2 [spy, based on novel, secret agent, sequel, mi... \n", 1734 | "3 [dc comics, crime fighter, terrorist, secret i... \n", 1735 | "4 [based on novel, mars, medallion, space travel... 
\n", 1736 | "\n", 1737 | " cast crew \n", 1738 | "0 [Sam Worthington, Zoe Saldana, Sigourney Weaver] [James Cameron] \n", 1739 | "1 [Johnny Depp, Orlando Bloom, Keira Knightley] [Gore Verbinski] \n", 1740 | "2 [Daniel Craig, Christoph Waltz, Léa Seydoux] [Sam Mendes] \n", 1741 | "3 [Christian Bale, Michael Caine, Gary Oldman] [Christopher Nolan] \n", 1742 | "4 [Taylor Kitsch, Lynn Collins, Samantha Morton] [Andrew Stanton] " 1743 | ] 1744 | }, 1745 | "execution_count": 20, 1746 | "metadata": {}, 1747 | "output_type": "execute_result" 1748 | } 1749 | ], 1750 | "source": [ 1751 | "movies.head()" 1752 | ] 1753 | }, 1754 | { 1755 | "cell_type": "code", 1756 | "execution_count": 21, 1757 | "id": "6d1e64d3", 1758 | "metadata": {}, 1759 | "outputs": [], 1760 | "source": [ 1761 | "# Next, merge all the columns from overview to crew and convert the merged list back to a string; the resulting long paragraph is the tag column.\n", 1762 | "# Before that, one transformation is needed: remove the spaces inside multi-word entries.\n", 1763 | "# Otherwise, once the lists are converted to a string, a name that was one entity splits into two separate tokens, e.g. [Sam Mendes] becomes Sam and Mendes.\n", 1764 | "# That causes problems when someone looks for Sam Mendes: the token Sam also matches every other Sam, so a movie by a different Sam could be recommended instead of the one they wanted. To avoid this, the names are joined into one token, e.g. SamMendes,\n", 1765 | "# so that the model can return the movies that actually match.\n", 1766 | "# The transformation:\n", 1767 | "# genres first, then keywords, cast and crew:\n", 1768 | "movies['genres']=movies['genres'].apply(lambda x:[i.replace(\" \",\"\") for i in x])  # replace each space (\" \") with nothing (\"\")\n",
1769 | "movies['keywords']=movies['keywords'].apply(lambda x:[i.replace(\" \",\"\") for i in x])\n", 1770 | "movies['cast']=movies['cast'].apply(lambda x:[i.replace(\" \",\"\") for i in x])\n", 1771 | "movies['crew']=movies['crew'].apply(lambda x:[i.replace(\" \",\"\") for i in x])" 1772 | ] 1773 | }, 1774 | { 1775 | "cell_type": "code", 1776 | "execution_count": 22, 1777 | "id": "8aeb8062", 1778 | "metadata": {}, 1779 | "outputs": [ 1780 | { 1781 | "data": { 1782 | "text/html": [ 1783 | "
\n", 1784 | "\n", 1797 | "\n", 1798 | " \n", 1799 | " \n", 1800 | " \n", 1801 | " \n", 1802 | " \n", 1803 | " \n", 1804 | " \n", 1805 | " \n", 1806 | " \n", 1807 | " \n", 1808 | " \n", 1809 | " \n", 1810 | " \n", 1811 | " \n", 1812 | " \n", 1813 | " \n", 1814 | " \n", 1815 | " \n", 1816 | " \n", 1817 | " \n", 1818 | " \n", 1819 | " \n", 1820 | " \n", 1821 | " \n", 1822 | " \n", 1823 | " \n", 1824 | " \n", 1825 | " \n", 1826 | " \n", 1827 | " \n", 1828 | " \n", 1829 | " \n", 1830 | " \n", 1831 | " \n", 1832 | " \n", 1833 | " \n", 1834 | " \n", 1835 | " \n", 1836 | " \n", 1837 | " \n", 1838 | " \n", 1839 | " \n", 1840 | " \n", 1841 | " \n", 1842 | " \n", 1843 | " \n", 1844 | " \n", 1845 | " \n", 1846 | " \n", 1847 | " \n", 1848 | " \n", 1849 | " \n", 1850 | " \n", 1851 | " \n", 1852 | " \n", 1853 | " \n", 1854 | " \n", 1855 | " \n", 1856 | " \n", 1857 | " \n", 1858 | " \n", 1859 | " \n", 1860 | " \n", 1861 | " \n", 1862 | "
movie_idtitleoverviewgenreskeywordscastcrew
019995Avatar[In, the, 22nd, century,, a, paraplegic, Marin...[Action, Adventure, Fantasy, ScienceFiction][cultureclash, future, spacewar, spacecolony, ...[SamWorthington, ZoeSaldana, SigourneyWeaver][JamesCameron]
1285Pirates of the Caribbean: At World's End[Captain, Barbossa,, long, believed, to, be, d...[Adventure, Fantasy, Action][ocean, drugabuse, exoticisland, eastindiatrad...[JohnnyDepp, OrlandoBloom, KeiraKnightley][GoreVerbinski]
2206647Spectre[A, cryptic, message, from, Bond’s, past, send...[Action, Adventure, Crime][spy, basedonnovel, secretagent, sequel, mi6, ...[DanielCraig, ChristophWaltz, LéaSeydoux][SamMendes]
349026The Dark Knight Rises[Following, the, death, of, District, Attorney...[Action, Crime, Drama, Thriller][dccomics, crimefighter, terrorist, secretiden...[ChristianBale, MichaelCaine, GaryOldman][ChristopherNolan]
449529John Carter[John, Carter, is, a, war-weary,, former, mili...[Action, Adventure, ScienceFiction][basedonnovel, mars, medallion, spacetravel, p...[TaylorKitsch, LynnCollins, SamanthaMorton][AndrewStanton]
\n", 1863 | "
" 1864 | ], 1865 | "text/plain": [ 1866 | " movie_id title \\\n", 1867 | "0 19995 Avatar \n", 1868 | "1 285 Pirates of the Caribbean: At World's End \n", 1869 | "2 206647 Spectre \n", 1870 | "3 49026 The Dark Knight Rises \n", 1871 | "4 49529 John Carter \n", 1872 | "\n", 1873 | " overview \\\n", 1874 | "0 [In, the, 22nd, century,, a, paraplegic, Marin... \n", 1875 | "1 [Captain, Barbossa,, long, believed, to, be, d... \n", 1876 | "2 [A, cryptic, message, from, Bond’s, past, send... \n", 1877 | "3 [Following, the, death, of, District, Attorney... \n", 1878 | "4 [John, Carter, is, a, war-weary,, former, mili... \n", 1879 | "\n", 1880 | " genres \\\n", 1881 | "0 [Action, Adventure, Fantasy, ScienceFiction] \n", 1882 | "1 [Adventure, Fantasy, Action] \n", 1883 | "2 [Action, Adventure, Crime] \n", 1884 | "3 [Action, Crime, Drama, Thriller] \n", 1885 | "4 [Action, Adventure, ScienceFiction] \n", 1886 | "\n", 1887 | " keywords \\\n", 1888 | "0 [cultureclash, future, spacewar, spacecolony, ... \n", 1889 | "1 [ocean, drugabuse, exoticisland, eastindiatrad... \n", 1890 | "2 [spy, basedonnovel, secretagent, sequel, mi6, ... \n", 1891 | "3 [dccomics, crimefighter, terrorist, secretiden... \n", 1892 | "4 [basedonnovel, mars, medallion, spacetravel, p... 
\n", 1893 | "\n", 1894 | " cast crew \n", 1895 | "0 [SamWorthington, ZoeSaldana, SigourneyWeaver] [JamesCameron] \n", 1896 | "1 [JohnnyDepp, OrlandoBloom, KeiraKnightley] [GoreVerbinski] \n", 1897 | "2 [DanielCraig, ChristophWaltz, LéaSeydoux] [SamMendes] \n", 1898 | "3 [ChristianBale, MichaelCaine, GaryOldman] [ChristopherNolan] \n", 1899 | "4 [TaylorKitsch, LynnCollins, SamanthaMorton] [AndrewStanton] " 1900 | ] 1901 | }, 1902 | "execution_count": 22, 1903 | "metadata": {}, 1904 | "output_type": "execute_result" 1905 | } 1906 | ], 1907 | "source": [ 1908 | "movies.head()  # the spaces inside multi-word entries are now removed" 1909 | ] 1910 | }, 1911 | { 1912 | "cell_type": "code", 1913 | "execution_count": 23, 1914 | "id": "4eab8893", 1915 | "metadata": {}, 1916 | "outputs": [ 1917 | { 1918 | "data": { 1919 | "text/html": [ 1920 | "
\n", 1921 | "\n", 1934 | "\n", 1935 | " \n", 1936 | " \n", 1937 | " \n", 1938 | " \n", 1939 | " \n", 1940 | " \n", 1941 | " \n", 1942 | " \n", 1943 | " \n", 1944 | " \n", 1945 | " \n", 1946 | " \n", 1947 | " \n", 1948 | " \n", 1949 | " \n", 1950 | " \n", 1951 | " \n", 1952 | " \n", 1953 | " \n", 1954 | " \n", 1955 | " \n", 1956 | " \n", 1957 | " \n", 1958 | " \n", 1959 | " \n", 1960 | " \n", 1961 | " \n", 1962 | " \n", 1963 | " \n", 1964 | " \n", 1965 | " \n", 1966 | " \n", 1967 | " \n", 1968 | " \n", 1969 | " \n", 1970 | " \n", 1971 | " \n", 1972 | " \n", 1973 | " \n", 1974 | " \n", 1975 | " \n", 1976 | " \n", 1977 | " \n", 1978 | " \n", 1979 | " \n", 1980 | " \n", 1981 | " \n", 1982 | " \n", 1983 | " \n", 1984 | " \n", 1985 | " \n", 1986 | " \n", 1987 | " \n", 1988 | " \n", 1989 | " \n", 1990 | " \n", 1991 | " \n", 1992 | " \n", 1993 | " \n", 1994 | " \n", 1995 | " \n", 1996 | " \n", 1997 | " \n", 1998 | " \n", 1999 | " \n", 2000 | " \n", 2001 | " \n", 2002 | " \n", 2003 | " \n", 2004 | " \n", 2005 | "
movie_idtitleoverviewgenreskeywordscastcrewtags
019995Avatar[In, the, 22nd, century,, a, paraplegic, Marin...[Action, Adventure, Fantasy, ScienceFiction][cultureclash, future, spacewar, spacecolony, ...[SamWorthington, ZoeSaldana, SigourneyWeaver][JamesCameron][In, the, 22nd, century,, a, paraplegic, Marin...
1285Pirates of the Caribbean: At World's End[Captain, Barbossa,, long, believed, to, be, d...[Adventure, Fantasy, Action][ocean, drugabuse, exoticisland, eastindiatrad...[JohnnyDepp, OrlandoBloom, KeiraKnightley][GoreVerbinski][Captain, Barbossa,, long, believed, to, be, d...
2206647Spectre[A, cryptic, message, from, Bond’s, past, send...[Action, Adventure, Crime][spy, basedonnovel, secretagent, sequel, mi6, ...[DanielCraig, ChristophWaltz, LéaSeydoux][SamMendes][A, cryptic, message, from, Bond’s, past, send...
349026The Dark Knight Rises[Following, the, death, of, District, Attorney...[Action, Crime, Drama, Thriller][dccomics, crimefighter, terrorist, secretiden...[ChristianBale, MichaelCaine, GaryOldman][ChristopherNolan][Following, the, death, of, District, Attorney...
449529John Carter[John, Carter, is, a, war-weary,, former, mili...[Action, Adventure, ScienceFiction][basedonnovel, mars, medallion, spacetravel, p...[TaylorKitsch, LynnCollins, SamanthaMorton][AndrewStanton][John, Carter, is, a, war-weary,, former, mili...
\n", 2006 | "
" 2007 | ], 2008 | "text/plain": [ 2009 | " movie_id title \\\n", 2010 | "0 19995 Avatar \n", 2011 | "1 285 Pirates of the Caribbean: At World's End \n", 2012 | "2 206647 Spectre \n", 2013 | "3 49026 The Dark Knight Rises \n", 2014 | "4 49529 John Carter \n", 2015 | "\n", 2016 | " overview \\\n", 2017 | "0 [In, the, 22nd, century,, a, paraplegic, Marin... \n", 2018 | "1 [Captain, Barbossa,, long, believed, to, be, d... \n", 2019 | "2 [A, cryptic, message, from, Bond’s, past, send... \n", 2020 | "3 [Following, the, death, of, District, Attorney... \n", 2021 | "4 [John, Carter, is, a, war-weary,, former, mili... \n", 2022 | "\n", 2023 | " genres \\\n", 2024 | "0 [Action, Adventure, Fantasy, ScienceFiction] \n", 2025 | "1 [Adventure, Fantasy, Action] \n", 2026 | "2 [Action, Adventure, Crime] \n", 2027 | "3 [Action, Crime, Drama, Thriller] \n", 2028 | "4 [Action, Adventure, ScienceFiction] \n", 2029 | "\n", 2030 | " keywords \\\n", 2031 | "0 [cultureclash, future, spacewar, spacecolony, ... \n", 2032 | "1 [ocean, drugabuse, exoticisland, eastindiatrad... \n", 2033 | "2 [spy, basedonnovel, secretagent, sequel, mi6, ... \n", 2034 | "3 [dccomics, crimefighter, terrorist, secretiden... \n", 2035 | "4 [basedonnovel, mars, medallion, spacetravel, p... \n", 2036 | "\n", 2037 | " cast crew \\\n", 2038 | "0 [SamWorthington, ZoeSaldana, SigourneyWeaver] [JamesCameron] \n", 2039 | "1 [JohnnyDepp, OrlandoBloom, KeiraKnightley] [GoreVerbinski] \n", 2040 | "2 [DanielCraig, ChristophWaltz, LéaSeydoux] [SamMendes] \n", 2041 | "3 [ChristianBale, MichaelCaine, GaryOldman] [ChristopherNolan] \n", 2042 | "4 [TaylorKitsch, LynnCollins, SamanthaMorton] [AndrewStanton] \n", 2043 | "\n", 2044 | " tags \n", 2045 | "0 [In, the, 22nd, century,, a, paraplegic, Marin... \n", 2046 | "1 [Captain, Barbossa,, long, believed, to, be, d... \n", 2047 | "2 [A, cryptic, message, from, Bond’s, past, send... \n", 2048 | "3 [Following, the, death, of, District, Attorney... 
\n", 2049 | "4 [John, Carter, is, a, war-weary,, former, mili... " 2050 | ] 2051 | }, 2052 | "execution_count": 23, 2053 | "metadata": {}, 2054 | "output_type": "execute_result" 2055 | } 2056 | ], 2057 | "source": [ 2058 | "# Create a new 'tags' column by concatenating the list columns from overview through crew.\n", 2059 | "movies['tags']=movies['overview'] + movies['genres'] + movies['keywords'] + movies['cast'] + movies['crew']\n", 2060 | "movies.head()" 2061 | ] 2062 | }, 2063 | { 2064 | "cell_type": "code", 2065 | "execution_count": 24, 2066 | "id": "b46b4ce1", 2067 | "metadata": {}, 2068 | "outputs": [], 2082 | "source": [ 2083 | "# Keep only the movie_id, title, and tags columns; the others are no longer needed.\n", 2084 | "new_df=movies[['movie_id','title','tags']].copy()  # .copy() makes new_df independent of movies, which avoids pandas' SettingWithCopyWarning on the next line\n", 2085 | "new_df['tags'] = new_df['tags'].apply(lambda x:\" \".join(x))  # join each list of tokens into a single string" 2086 | ] 2087 | }, 2088 | { 2089 | "cell_type": "code", 2090 | "execution_count": 25, 2091 | "id": "1f459d73", 2092 | "metadata": {}, 2093 | "outputs": [ 2094 | { 2095 | "data": { 2096 | "text/plain": [ 2097 | "0 In the 22nd century, a paraplegic Marine is di...\n", 2098 | "1 Captain Barbossa, long believed to be dead, ha...\n", 2099 | "2 A cryptic message from Bond’s past sends 
him o...\n", 2100 | "3 Following the death of District Attorney Harve...\n", 2101 | "4 John Carter is a war-weary, former military ca...\n", 2102 | " ... \n", 2103 | "4804 El Mariachi just wants to play his guitar and ...\n", 2104 | "4805 A newlywed couple's honeymoon is upended by th...\n", 2105 | "4806 \"Signed, Sealed, Delivered\" introduces a dedic...\n", 2106 | "4807 When ambitious New York attorney Sam is sent t...\n", 2107 | "4808 Ever since the second grade when he first saw ...\n", 2108 | "Name: tags, Length: 4806, dtype: object" 2109 | ] 2110 | }, 2111 | "execution_count": 25, 2112 | "metadata": {}, 2113 | "output_type": "execute_result" 2114 | } 2115 | ], 2116 | "source": [ 2117 | "new_df['tags']" 2118 | ] 2119 | }, 2120 | { 2121 | "cell_type": "code", 2122 | "execution_count": 26, 2123 | "id": "30390d3d", 2124 | "metadata": {}, 2125 | "outputs": [ 2126 | { 2127 | "data": { 2128 | "text/html": [ 2129 | "
\n", 2130 | "\n", 2143 | "\n", 2144 | " \n", 2145 | " \n", 2146 | " \n", 2147 | " \n", 2148 | " \n", 2149 | " \n", 2150 | " \n", 2151 | " \n", 2152 | " \n", 2153 | " \n", 2154 | " \n", 2155 | " \n", 2156 | " \n", 2157 | " \n", 2158 | " \n", 2159 | " \n", 2160 | " \n", 2161 | " \n", 2162 | " \n", 2163 | " \n", 2164 | " \n", 2165 | " \n", 2166 | " \n", 2167 | " \n", 2168 | " \n", 2169 | " \n", 2170 | " \n", 2171 | " \n", 2172 | " \n", 2173 | " \n", 2174 | " \n", 2175 | " \n", 2176 | " \n", 2177 | " \n", 2178 | " \n", 2179 | " \n", 2180 | " \n", 2181 | " \n", 2182 | " \n", 2183 | " \n", 2184 | "
movie_idtitletags
019995AvatarIn the 22nd century, a paraplegic Marine is di...
1285Pirates of the Caribbean: At World's EndCaptain Barbossa, long believed to be dead, ha...
2206647SpectreA cryptic message from Bond’s past sends him o...
349026The Dark Knight RisesFollowing the death of District Attorney Harve...
449529John CarterJohn Carter is a war-weary, former military ca...
\n", 2185 | "
" 2186 | ], 2187 | "text/plain": [ 2188 | " movie_id title \\\n", 2189 | "0 19995 Avatar \n", 2190 | "1 285 Pirates of the Caribbean: At World's End \n", 2191 | "2 206647 Spectre \n", 2192 | "3 49026 The Dark Knight Rises \n", 2193 | "4 49529 John Carter \n", 2194 | "\n", 2195 | " tags \n", 2196 | "0 In the 22nd century, a paraplegic Marine is di... \n", 2197 | "1 Captain Barbossa, long believed to be dead, ha... \n", 2198 | "2 A cryptic message from Bond’s past sends him o... \n", 2199 | "3 Following the death of District Attorney Harve... \n", 2200 | "4 John Carter is a war-weary, former military ca... " 2201 | ] 2202 | }, 2203 | "execution_count": 26, 2204 | "metadata": {}, 2205 | "output_type": "execute_result" 2206 | } 2207 | ], 2208 | "source": [ 2209 | "new_df.head()" 2210 | ] 2211 | }, 2212 | { 2213 | "cell_type": "code", 2214 | "execution_count": 27, 2215 | "id": "44d49788", 2216 | "metadata": {}, 2217 | "outputs": [ 2218 | { 2219 | "data": { 2220 | "text/html": [ 2221 | "
\n", 2222 | "\n", 2235 | "\n", 2236 | " \n", 2237 | " \n", 2238 | " \n", 2239 | " \n", 2240 | " \n", 2241 | " \n", 2242 | " \n", 2243 | " \n", 2244 | " \n", 2245 | " \n", 2246 | " \n", 2247 | " \n", 2248 | " \n", 2249 | " \n", 2250 | " \n", 2251 | " \n", 2252 | " \n", 2253 | " \n", 2254 | " \n", 2255 | " \n", 2256 | " \n", 2257 | " \n", 2258 | " \n", 2259 | " \n", 2260 | " \n", 2261 | " \n", 2262 | " \n", 2263 | " \n", 2264 | " \n", 2265 | " \n", 2266 | " \n", 2267 | " \n", 2268 | " \n", 2269 | " \n", 2270 | " \n", 2271 | " \n", 2272 | " \n", 2273 | " \n", 2274 | " \n", 2275 | " \n", 2276 | "
movie_idtitletags
019995Avatarin the 22nd century, a paraplegic marine is di...
1285Pirates of the Caribbean: At World's Endcaptain barbossa, long believed to be dead, ha...
2206647Spectrea cryptic message from bond’s past sends him o...
349026The Dark Knight Risesfollowing the death of district attorney harve...
449529John Carterjohn carter is a war-weary, former military ca...
\n", 2277 | "
" 2278 | ], 2279 | "text/plain": [ 2280 | " movie_id title \\\n", 2281 | "0 19995 Avatar \n", 2282 | "1 285 Pirates of the Caribbean: At World's End \n", 2283 | "2 206647 Spectre \n", 2284 | "3 49026 The Dark Knight Rises \n", 2285 | "4 49529 John Carter \n", 2286 | "\n", 2287 | " tags \n", 2288 | "0 in the 22nd century, a paraplegic marine is di... \n", 2289 | "1 captain barbossa, long believed to be dead, ha... \n", 2290 | "2 a cryptic message from bond’s past sends him o... \n", 2291 | "3 following the death of district attorney harve... \n", 2292 | "4 john carter is a war-weary, former military ca... " 2293 | ] 2294 | }, 2295 | "execution_count": 27, 2296 | "metadata": {}, 2297 | "output_type": "execute_result" 2298 | } 2299 | ], 2300 | "source": [ 2301 | "# Convert the text in the tags column to lowercase. The result must be assigned back; calling apply() alone leaves new_df unchanged.\n", 2302 | "new_df = new_df.assign(tags=new_df['tags'].str.lower())  # assign() returns a new DataFrame, so no SettingWithCopyWarning is raised\n", 2303 | "new_df.head()" 2304 | ] 2305 | }, 2306 | { 2307 | "cell_type": "code", 2308 | "execution_count": 28, 2309 | "id": "1b845a87", 2310 | "metadata": {}, 2311 | "outputs": [], 2327 | "source": [ 2328 | "# Install NLTK, a popular natural language processing library. The comment sits on its own line because pip treats trailing text as a requirement.\n%pip install nltk" 2329 | ] 2330 | }, 2331 | { 2332 | "cell_type": "code", 2333 | "execution_count": 29, 2334 | "id": "2c0943ca", 2335 | "metadata": {}, 2336 | "outputs": [ 2337 | { 2338 | "name": "stderr", 2339 | "output_type": "stream", 2340 | "text": [ 2341 | "C:\\Users\\dhruv\\anaconda3\\lib\\site-packages\\scipy\\__init__.py:146: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.26.1\n", 2342 | " warnings.warn(f\"A NumPy version 
>={np_minversion} and <{np_maxversion}\"\n" 2343 | ] 2344 | } 2345 | ], 2346 | "source": [ 2347 | "import nltk  # NLTK is a widely used natural language processing library" 2348 | ] 2349 | }, 2350 | { 2351 | "cell_type": "code", 2352 | "execution_count": 30, 2353 | "id": "04faf9a5", 2354 | "metadata": {}, 2355 | "outputs": [], 2356 | "source": [ 2357 | "from nltk.stem.porter import PorterStemmer\n", 2358 | "ps=PorterStemmer()" 2359 | ] 2360 | }, 2361 | { 2362 | "cell_type": "code", 2363 | "execution_count": 31, 2364 | "id": "2a477e4e", 2365 | "metadata": {}, 2366 | "outputs": [], 2367 | "source": [ 2368 | "def stem(text):\n", 2369 | "    y=[]\n", 2370 | "    for i in text.split():  # split the string into a list of words\n", 2371 | "        y.append(ps.stem(i))  # stem each word\n", 2372 | "    return \" \".join(y)  # join the stemmed words back into a single string" 2373 | ] 2374 | }, 2375 | { 2376 | "cell_type": "code", 2377 | "execution_count": 32, 2378 | "id": "8e78782f", 2379 | "metadata": {}, 2380 | "outputs": [], 2394 | "source": [ 2395 | "# Apply stemming: every word in tags is reduced to its root form, so the outputs below differ from the earlier ones.\n", 2395 | "new_df = new_df.assign(tags=new_df['tags'].apply(stem))  # assign() returns a new DataFrame, so no SettingWithCopyWarning is raised" 2396 | ] 2397 | },
 2398 | { 2399 | "cell_type": "code", 2400 | "execution_count": 33, 2401 | "id": "899c19a9", 2402 | "metadata": {}, 2403 | "outputs": [ 2404 | { 2405 | "name": "stdout", 2406 | "output_type": "stream", 2407 | "text": [ 2408 | "<class 'pandas.core.frame.DataFrame'>\n" 2409 | ] 2410 | } 2411 | ], 2412 | "source": [ 2413 | "print(type(new_df))" 2414 | ] 2415 | }, 2416 | { 2417 | "cell_type": "code", 2418 | "execution_count": 34, 2419 | "id": "d439e73a", 2420 | "metadata": {}, 2421 | "outputs": [ 2422 | { 2423 | "data": { 2424 | "text/plain": [ 2425 | "\"captain barbossa, long believ to be dead, ha come back to life and is head to the edg of the earth with will turner and elizabeth swann. but noth is quit as it seems. adventur fantasi action ocean drugabus exoticisland eastindiatradingcompani loveofone'slif traitor shipwreck strongwoman ship allianc calypso afterlif fighter pirat swashbuckl aftercreditssting johnnydepp orlandobloom keiraknightley goreverbinski\"" 2426 | ] 2427 | }, 2428 | "execution_count": 34, 2429 | "metadata": {}, 2430 | "output_type": "execute_result" 2431 | } 2432 | ], 2433 | "source": [ 2434 | "new_df['tags'][1]" 2435 | ] 2436 | }, 2437 | { 2438 | "cell_type": "code", 2439 | "execution_count": 35, 2440 | "id": "4296a08f", 2441 | "metadata": {}, 2442 | "outputs": [ 2443 | { 2444 | "data": { 2445 | "text/html": [ 2446 | "
\n", 2447 | "\n", 2460 | "\n", 2461 | " \n", 2462 | " \n", 2463 | " \n", 2464 | " \n", 2465 | " \n", 2466 | " \n", 2467 | " \n", 2468 | " \n", 2469 | " \n", 2470 | " \n", 2471 | " \n", 2472 | " \n", 2473 | " \n", 2474 | " \n", 2475 | " \n", 2476 | " \n", 2477 | " \n", 2478 | " \n", 2479 | " \n", 2480 | " \n", 2481 | " \n", 2482 | " \n", 2483 | " \n", 2484 | " \n", 2485 | " \n", 2486 | " \n", 2487 | " \n", 2488 | " \n", 2489 | " \n", 2490 | " \n", 2491 | " \n", 2492 | " \n", 2493 | " \n", 2494 | " \n", 2495 | " \n", 2496 | " \n", 2497 | " \n", 2498 | " \n", 2499 | " \n", 2500 | " \n", 2501 | "
movie_idtitletags
019995Avatarin the 22nd century, a parapleg marin is dispa...
1285Pirates of the Caribbean: At World's Endcaptain barbossa, long believ to be dead, ha c...
2206647Spectrea cryptic messag from bond’ past send him on a...
349026The Dark Knight Risesfollow the death of district attorney harvey d...
449529John Carterjohn carter is a war-weary, former militari ca...
\n", 2502 | "
" 2503 | ], 2504 | "text/plain": [ 2505 | " movie_id title \\\n", 2506 | "0 19995 Avatar \n", 2507 | "1 285 Pirates of the Caribbean: At World's End \n", 2508 | "2 206647 Spectre \n", 2509 | "3 49026 The Dark Knight Rises \n", 2510 | "4 49529 John Carter \n", 2511 | "\n", 2512 | " tags \n", 2513 | "0 in the 22nd century, a parapleg marin is dispa... \n", 2514 | "1 captain barbossa, long believ to be dead, ha c... \n", 2515 | "2 a cryptic messag from bond’ past send him on a... \n", 2516 | "3 follow the death of district attorney harvey d... \n", 2517 | "4 john carter is a war-weary, former militari ca... " 2518 | ] 2519 | }, 2520 | "execution_count": 35, 2521 | "metadata": {}, 2522 | "output_type": "execute_result" 2523 | } 2524 | ], 2525 | "source": [ 2526 | "new_df.head()" 2527 | ] 2528 | }, 2529 | { 2530 | "cell_type": "code", 2531 | "execution_count": 36, 2532 | "id": "3252ad19", 2533 | "metadata": {}, 2534 | "outputs": [], 2535 | "source": [ 2536 | "# The challenge: given the tag text of two different movies, measure how similar the movies are.\n", 2537 | "# The more similar two movies are, the more similar their tag text should be.\n", 2538 | "\n", 2539 | "# We look for similarity so that a user is recommended movies of the same kind as the one they like.\n", 2540 | "\n", 2541 | "# To compare text numerically we use vectorization, which converts the text into vector form.\n", 2542 | "\n", 2543 | "# In effect, every movie is assigned a vector. Given the movie I am currently watching, the recommender returns the movies whose vectors are closest to it, based on the genres and the other words stored in tags.\n", 2544 | "\n", 2545 | "# We use the bag-of-words technique for text vectorization:\n", 2546 | "# First, combine the tags of all movies and extract the most frequently repeated words from that combined text.\n",
 2547 | "# Then, for every movie, count how many times each of those frequent words occurs in that movie's own tags.\n", 2548 | "# The result is a table of shape (number of movies, number of most common words); here that is (4806, 5000).\n", 2549 | "\n", 2550 | "# Each row of that table, i.e. the counts of the most common words in one movie's tags, is that movie's vector.\n", 2551 | "# Note: stop words are excluded from the vocabulary and are not counted.\n", 2552 | "# Stop words are words that are needed to form a sentence but carry little meaning of their own, e.g. is, are, and, or, to, from." 2553 | ] 2554 | }, 2555 | { 2556 | "cell_type": "code", 2557 | "execution_count": 37, 2558 | "id": "22e998fa", 2559 | "metadata": {}, 2560 | "outputs": [], 2561 | "source": [ 2562 | "from sklearn.feature_extraction.text import CountVectorizer\n", 2563 | "cv=CountVectorizer(max_features=5000,stop_words='english')  # max_features caps the vocabulary at the 5000 most frequent words; stop_words='english' drops common English stop words" 2564 | ] 2565 | }, 2566 | { 2567 | "cell_type": "code", 2568 | "execution_count": 38, 2569 | "id": "3ff4630b", 2570 | "metadata": {}, 2571 | "outputs": [], 2572 | "source": [ 2573 | "vectors=cv.fit_transform(new_df['tags']).toarray()" 2574 | ] 2575 | }, 2576 | { 2577 | "cell_type": "code", 2578 | "execution_count": 39, 2579 | "id": "a7c40c06", 2580 | "metadata": {}, 2581 | "outputs": [ 2582 | { 2583 | "data": { 2584 | "text/plain": [ 2585 | "array([[0, 0, 0, ..., 0, 0, 0],\n", 2586 | " [0, 0, 0, ..., 0, 0, 0],\n", 2587 | " [0, 0, 0, ..., 0, 0, 0],\n", 2588 | " ...,\n", 2589 | " [0, 0, 0, ..., 0, 
0, 0],\n", 2590 | " [0, 0, 0, ..., 0, 0, 0],\n", 2591 | " [0, 0, 0, ..., 0, 0, 0]], dtype=int64)" 2592 | ] 2593 | }, 2594 | "execution_count": 39, 2595 | "metadata": {}, 2596 | "output_type": "execute_result" 2597 | } 2598 | ], 2599 | "source": [ 2600 | "vectors" 2601 | ] 2602 | }, 2603 | { 2604 | "cell_type": "code", 2605 | "execution_count": 40, 2606 | "id": "145e86dd", 2607 | "metadata": {}, 2608 | "outputs": [ 2609 | { 2610 | "data": { 2611 | "text/plain": [ 2612 | "(4806, 5000)" 2613 | ] 2614 | }, 2615 | "execution_count": 40, 2616 | "metadata": {}, 2617 | "output_type": "execute_result" 2618 | } 2619 | ], 2620 | "source": [ 2621 | "vectors.shape  # 4806 movies, each represented by its counts of the 5000 most common words" 2622 | ] 2623 | }, 2624 | { 2625 | "cell_type": "code", 2626 | "execution_count": 41, 2627 | "id": "9f06fe57", 2628 | "metadata": {}, 2629 | "outputs": [ 2630 | { 2631 | "data": { 2632 | "text/plain": [ 2633 | "array([0, 0, 0, ..., 0, 0, 0], dtype=int64)" 2634 | ] 2635 | }, 2636 | "execution_count": 41, 2637 | "metadata": {}, 2638 | "output_type": "execute_result" 2639 | } 2640 | ], 2641 | "source": [ 2642 | "vectors[0]  # vector for the first movie, Avatar\n", 2643 | "# Most entries are 0; a nonzero count appears only under the common words that occur in this movie's tags." 2644 | ] 2645 | }, 2646 | { 2647 | "cell_type": "code", 2648 | "execution_count": 42, 2649 | "id": "cdc7b057", 2650 | "metadata": {}, 2651 | "outputs": [ 2652 | { 2653 | "name": "stderr", 2654 | "output_type": "stream", 2655 | "text": [ 2656 | "C:\\Users\\dhruv\\anaconda3\\lib\\site-packages\\sklearn\\utils\\deprecation.py:87: FutureWarning: Function get_feature_names is deprecated; get_feature_names is deprecated in 1.0 and will be removed in 1.2. 
Please use get_feature_names_out instead.\n", 2657 | " warnings.warn(msg, category=FutureWarning)\n" 2658 | ] 2659 | }, 2660 | { 2661 | "data": { 2662 | "text/plain": [ 2663 | "['000',\n", 2664 | " '007',\n", 2665 | " '10',\n", 2666 | " '100',\n", 2667 | " '11',\n", 2668 | " '12',\n", 2669 | " '13',\n", 2670 | " '14',\n", 2671 | " '15',\n", 2672 | " '16',\n", 2673 | " '17',\n", 2674 | " '17th',\n", 2675 | " '18',\n", 2676 | " '18th',\n", 2677 | " '18thcenturi',\n", 2678 | " '19',\n", 2679 | " '1910',\n", 2680 | " '1920',\n", 2681 | " '1930',\n", 2682 | " '1940',\n", 2683 | " '1944',\n", 2684 | " '1950',\n", 2685 | " '1950s',\n", 2686 | " '1960',\n", 2687 | " '1960s',\n", 2688 | " '1970',\n", 2689 | " '1970s',\n", 2690 | " '1971',\n", 2691 | " '1974',\n", 2692 | " '1976',\n", 2693 | " '1980',\n", 2694 | " '1985',\n", 2695 | " '1990',\n", 2696 | " '1999',\n", 2697 | " '19th',\n", 2698 | " '19thcenturi',\n", 2699 | " '20',\n", 2700 | " '200',\n", 2701 | " '2003',\n", 2702 | " '2009',\n", 2703 | " '20th',\n", 2704 | " '21st',\n", 2705 | " '23',\n", 2706 | " '24',\n", 2707 | " '25',\n", 2708 | " '30',\n", 2709 | " '300',\n", 2710 | " '3d',\n", 2711 | " '40',\n", 2712 | " '50',\n", 2713 | " '500',\n", 2714 | " '60',\n", 2715 | " '70',\n", 2716 | " '80',\n", 2717 | " 'aaron',\n", 2718 | " 'aaroneckhart',\n", 2719 | " 'abandon',\n", 2720 | " 'abduct',\n", 2721 | " 'abigailbreslin',\n", 2722 | " 'abil',\n", 2723 | " 'abl',\n", 2724 | " 'aboard',\n", 2725 | " 'abov',\n", 2726 | " 'abus',\n", 2727 | " 'academ',\n", 2728 | " 'academi',\n", 2729 | " 'accept',\n", 2730 | " 'access',\n", 2731 | " 'accid',\n", 2732 | " 'accident',\n", 2733 | " 'acclaim',\n", 2734 | " 'accompani',\n", 2735 | " 'accomplish',\n", 2736 | " 'account',\n", 2737 | " 'accus',\n", 2738 | " 'ace',\n", 2739 | " 'achiev',\n", 2740 | " 'acquaint',\n", 2741 | " 'act',\n", 2742 | " 'action',\n", 2743 | " 'actionhero',\n", 2744 | " 'activ',\n", 2745 | " 'activist',\n", 2746 | " 'activities',\n", 2747 | " 
'actor',\n", 2748 | " 'actress',\n", 2749 | " 'actual',\n", 2750 | " 'ad',\n", 2751 | " 'adam',\n", 2752 | " 'adamsandl',\n", 2753 | " 'adamshankman',\n", 2754 | " 'adapt',\n", 2755 | " 'add',\n", 2756 | " 'addict',\n", 2757 | " 'adjust',\n", 2758 | " 'admir',\n", 2759 | " 'admit',\n", 2760 | " 'adolesc',\n", 2761 | " 'adopt',\n", 2762 | " 'ador',\n", 2763 | " 'adrienbrodi',\n", 2764 | " 'adult',\n", 2765 | " 'adultanim',\n", 2766 | " 'adulteri',\n", 2767 | " 'adulthood',\n", 2768 | " 'advanc',\n", 2769 | " 'adventur',\n", 2770 | " 'adventure',\n", 2771 | " 'adventures',\n", 2772 | " 'advertis',\n", 2773 | " 'advic',\n", 2774 | " 'advis',\n", 2775 | " 'affair',\n", 2776 | " 'affect',\n", 2777 | " 'afghanistan',\n", 2778 | " 'africa',\n", 2779 | " 'african',\n", 2780 | " 'africanamerican',\n", 2781 | " 'aftercreditssting',\n", 2782 | " 'afterlif',\n", 2783 | " 'aftermath',\n", 2784 | " 'ag',\n", 2785 | " 'age',\n", 2786 | " 'agediffer',\n", 2787 | " 'agenc',\n", 2788 | " 'agency',\n", 2789 | " 'agenda',\n", 2790 | " 'agent',\n", 2791 | " 'agents',\n", 2792 | " 'aggress',\n", 2793 | " 'ago',\n", 2794 | " 'agre',\n", 2795 | " 'ahead',\n", 2796 | " 'aid',\n", 2797 | " 'aidanquinn',\n", 2798 | " 'ail',\n", 2799 | " 'aim',\n", 2800 | " 'air',\n", 2801 | " 'airplan',\n", 2802 | " 'airplanecrash',\n", 2803 | " 'airport',\n", 2804 | " 'aka',\n", 2805 | " 'al',\n", 2806 | " 'alabama',\n", 2807 | " 'alan',\n", 2808 | " 'alaska',\n", 2809 | " 'albert',\n", 2810 | " 'alcatraz',\n", 2811 | " 'alcohol',\n", 2812 | " 'alecbaldwin',\n", 2813 | " 'alex',\n", 2814 | " 'alexkendrick',\n", 2815 | " 'alfredhitchcock',\n", 2816 | " 'alfredmolina',\n", 2817 | " 'ali',\n", 2818 | " 'alic',\n", 2819 | " 'alice',\n", 2820 | " 'alien',\n", 2821 | " 'alieninvas',\n", 2822 | " 'alienlife',\n", 2823 | " 'alienplanet',\n", 2824 | " 'aliens',\n", 2825 | " 'alik',\n", 2826 | " 'aliv',\n", 2827 | " 'alive',\n", 2828 | " 'allen',\n", 2829 | " 'alli',\n", 2830 | " 'allianc',\n", 2831 | " 'allow',\n", 
2832 | " 'alon',\n", 2833 | " 'alongsid',\n", 2834 | " 'alpacino',\n", 2835 | " 'alpha',\n", 2836 | " 'alreadi',\n", 2837 | " 'alter',\n", 2838 | " 'altern',\n", 2839 | " 'alway',\n", 2840 | " 'alyssa',\n", 2841 | " 'alzheimer',\n", 2842 | " 'amanda',\n", 2843 | " 'amandapeet',\n", 2844 | " 'amandaseyfri',\n", 2845 | " 'amateur',\n", 2846 | " 'amaz',\n", 2847 | " 'amazon',\n", 2848 | " 'ambassador',\n", 2849 | " 'ambit',\n", 2850 | " 'ambiti',\n", 2851 | " 'ambul',\n", 2852 | " 'ambush',\n", 2853 | " 'america',\n", 2854 | " 'american',\n", 2855 | " 'americanabroad',\n", 2856 | " 'americancivilwar',\n", 2857 | " 'americanfootbal',\n", 2858 | " 'americanfootballplay',\n", 2859 | " 'amid',\n", 2860 | " 'amidst',\n", 2861 | " 'amnesia',\n", 2862 | " 'amp',\n", 2863 | " 'amsterdam',\n", 2864 | " 'amus',\n", 2865 | " 'amusementpark',\n", 2866 | " 'amy',\n", 2867 | " 'amyadam',\n", 2868 | " 'amysmart',\n", 2869 | " 'ana',\n", 2870 | " 'anakin',\n", 2871 | " 'analyst',\n", 2872 | " 'anarchiccomedi',\n", 2873 | " 'ancient',\n", 2874 | " 'ancientrom',\n", 2875 | " 'ancientworld',\n", 2876 | " 'anderson',\n", 2877 | " 'andi',\n", 2878 | " 'andiemacdowel',\n", 2879 | " 'andrew',\n", 2880 | " 'android',\n", 2881 | " 'andy',\n", 2882 | " 'andygarcía',\n", 2883 | " 'angel',\n", 2884 | " 'angela',\n", 2885 | " 'angelabassett',\n", 2886 | " 'angeles',\n", 2887 | " 'angelinajoli',\n", 2888 | " 'anger',\n", 2889 | " 'angle',\n", 2890 | " 'angri',\n", 2891 | " 'ani',\n", 2892 | " 'anim',\n", 2893 | " 'animalattack',\n", 2894 | " 'animalhorror',\n", 2895 | " 'animals',\n", 2896 | " 'anjelicahuston',\n", 2897 | " 'ann',\n", 2898 | " 'anna',\n", 2899 | " 'annafari',\n", 2900 | " 'annakendrick',\n", 2901 | " 'anne',\n", 2902 | " 'annehathaway',\n", 2903 | " 'annemoss',\n", 2904 | " 'annetteben',\n", 2905 | " 'anni',\n", 2906 | " 'annie',\n", 2907 | " 'anniversari',\n", 2908 | " 'announc',\n", 2909 | " 'annual',\n", 2910 | " 'anonym',\n", 2911 | " 'anoth',\n", 2912 | " 'answer',\n", 2913 | 
" 'ant',\n", 2914 | " 'antholog',\n", 2915 | " 'anthoni',\n", 2916 | " 'anthonyanderson',\n", 2917 | " 'anthonyhopkin',\n", 2918 | " 'anthropomorph',\n", 2919 | " 'anti',\n", 2920 | " 'antic',\n", 2921 | " 'antihero',\n", 2922 | " 'antiqu',\n", 2923 | " 'antoinefuqua',\n", 2924 | " 'antoniobandera',\n", 2925 | " 'antonyelchin',\n", 2926 | " 'anyon',\n", 2927 | " 'anyth',\n", 2928 | " 'apart',\n", 2929 | " 'apartheid',\n", 2930 | " 'apartment',\n", 2931 | " 'ape',\n", 2932 | " 'apocalyps',\n", 2933 | " 'apocalypse',\n", 2934 | " 'apocalypt',\n", 2935 | " 'appar',\n", 2936 | " 'appear',\n", 2937 | " 'appl',\n", 2938 | " 'apple',\n", 2939 | " 'appoint',\n", 2940 | " 'appreci',\n", 2941 | " 'apprentic',\n", 2942 | " 'approach',\n", 2943 | " 'april',\n", 2944 | " 'aquarium',\n", 2945 | " 'arab',\n", 2946 | " 'arch',\n", 2947 | " 'archaeologist',\n", 2948 | " 'archeolog',\n", 2949 | " 'archer',\n", 2950 | " 'architect',\n", 2951 | " 'arctic',\n", 2952 | " 'area',\n", 2953 | " 'aren',\n", 2954 | " 'arena',\n", 2955 | " 'argument',\n", 2956 | " 'aris',\n", 2957 | " 'aristocrat',\n", 2958 | " 'arm',\n", 2959 | " 'armi',\n", 2960 | " 'armor',\n", 2961 | " 'armsdeal',\n", 2962 | " 'army',\n", 2963 | " 'arnold',\n", 2964 | " 'arnoldschwarzenegg',\n", 2965 | " 'arrang',\n", 2966 | " 'arrangedmarriag',\n", 2967 | " 'arrest',\n", 2968 | " 'arriv',\n", 2969 | " 'arrog',\n", 2970 | " 'art',\n", 2971 | " 'arthur',\n", 2972 | " 'artifact',\n", 2973 | " 'artifici',\n", 2974 | " 'artificialintellig',\n", 2975 | " 'artist',\n", 2976 | " 'ash',\n", 2977 | " 'ashley',\n", 2978 | " 'ashleyjudd',\n", 2979 | " 'ashtonkutch',\n", 2980 | " 'asia',\n", 2981 | " 'asian',\n", 2982 | " 'asid',\n", 2983 | " 'ask',\n", 2984 | " 'aspect',\n", 2985 | " 'aspir',\n", 2986 | " 'assassin',\n", 2987 | " 'assault',\n", 2988 | " 'assembl',\n", 2989 | " 'assign',\n", 2990 | " 'assist',\n", 2991 | " 'assistant',\n", 2992 | " 'associ',\n", 2993 | " 'assum',\n", 2994 | " 'asteroid',\n", 2995 | " 'astronaut',\n", 
2996 | " 'asylum',\n", 2997 | " 'atheist',\n", 2998 | " 'athlet',\n", 2999 | " 'atom',\n", 3000 | " 'atomicbomb',\n", 3001 | " 'attack',\n", 3002 | " 'attacks',\n", 3003 | " 'attempt',\n", 3004 | " 'attend',\n", 3005 | " 'attent',\n", 3006 | " 'attic',\n", 3007 | " 'attitud',\n", 3008 | " 'attorney',\n", 3009 | " 'attract',\n", 3010 | " 'auction',\n", 3011 | " 'audienc',\n", 3012 | " 'audit',\n", 3013 | " 'august',\n", 3014 | " 'aunt',\n", 3015 | " 'austin',\n", 3016 | " 'australia',\n", 3017 | " 'australian',\n", 3018 | " 'author',\n", 3019 | " 'autism',\n", 3020 | " 'auto',\n", 3021 | " 'automobilerac',\n", 3022 | " 'aveng',\n", 3023 | " 'averag',\n", 3024 | " 'avoid',\n", 3025 | " 'await',\n", 3026 | " 'awak',\n", 3027 | " 'awaken',\n", 3028 | " 'awar',\n", 3029 | " 'award',\n", 3030 | " 'away',\n", 3031 | " 'awkward',\n", 3032 | " 'awri',\n", 3033 | " 'awry',\n", 3034 | " 'ax',\n", 3035 | " 'babe',\n", 3036 | " 'babi',\n", 3037 | " 'baby',\n", 3038 | " 'bachelor',\n", 3039 | " 'backdrop',\n", 3040 | " 'background',\n", 3041 | " 'backpack',\n", 3042 | " 'bad',\n", 3043 | " 'bag',\n", 3044 | " 'bahama',\n", 3045 | " 'bail',\n", 3046 | " 'balanc',\n", 3047 | " 'ball',\n", 3048 | " 'ballet',\n", 3049 | " 'balloon',\n", 3050 | " 'baltimor',\n", 3051 | " 'ban',\n", 3052 | " 'band',\n", 3053 | " 'bandit',\n", 3054 | " 'bangkok',\n", 3055 | " 'banish',\n", 3056 | " 'bank',\n", 3057 | " 'banker',\n", 3058 | " 'bankrobb',\n", 3059 | " 'bankrobberi',\n", 3060 | " 'bar',\n", 3061 | " 'barbrastreisand',\n", 3062 | " 'bare',\n", 3063 | " 'bargain',\n", 3064 | " 'barn',\n", 3065 | " 'barney',\n", 3066 | " 'baron',\n", 3067 | " 'barri',\n", 3068 | " 'barrylevinson',\n", 3069 | " 'barrysonnenfeld',\n", 3070 | " 'bas',\n", 3071 | " 'base',\n", 3072 | " 'basebal',\n", 3073 | " 'basedoncomicbook',\n", 3074 | " 'basedongraphicnovel',\n", 3075 | " 'basedonnovel',\n", 3076 | " 'basedonplay',\n", 3077 | " 'basedonstagemus',\n", 3078 | " 'basedontrueev',\n", 3079 | " 
'basedontruestori',\n", 3080 | " 'basedontvseri',\n", 3081 | " 'basedonvideogam',\n", 3082 | " 'basedonyoungadultnovel',\n", 3083 | " 'basement',\n", 3084 | " 'basketbal',\n", 3085 | " 'basketball',\n", 3086 | " 'bat',\n", 3087 | " 'batman',\n", 3088 | " 'battl',\n", 3089 | " 'battle',\n", 3090 | " 'battlefield',\n", 3091 | " 'bay',\n", 3092 | " 'beach',\n", 3093 | " 'beam',\n", 3094 | " 'bear',\n", 3095 | " 'beard',\n", 3096 | " 'beast',\n", 3097 | " 'beat',\n", 3098 | " 'beauti',\n", 3099 | " 'beautiful',\n", 3100 | " 'beautifulwoman',\n", 3101 | " 'beauty',\n", 3102 | " 'becam',\n", 3103 | " 'becaus',\n", 3104 | " 'becki',\n", 3105 | " 'becom',\n", 3106 | " 'becominganadult',\n", 3107 | " 'bed',\n", 3108 | " 'bedroom',\n", 3109 | " 'bee',\n", 3110 | " 'beer',\n", 3111 | " 'befor',\n", 3112 | " 'befriend',\n", 3113 | " 'began',\n", 3114 | " 'begin',\n", 3115 | " 'begins',\n", 3116 | " 'behavior',\n", 3117 | " 'belief',\n", 3118 | " 'believ',\n", 3119 | " 'bell',\n", 3120 | " 'bella',\n", 3121 | " 'belong',\n", 3122 | " 'belov',\n", 3123 | " 'ben',\n", 3124 | " 'benaffleck',\n", 3125 | " 'bend',\n", 3126 | " 'beneath',\n", 3127 | " 'benefit',\n", 3128 | " 'benfost',\n", 3129 | " 'beniciodeltoro',\n", 3130 | " 'benjamin',\n", 3131 | " 'benjaminbratt',\n", 3132 | " 'benkingsley',\n", 3133 | " 'bennett',\n", 3134 | " 'benstil',\n", 3135 | " 'bent',\n", 3136 | " 'berlin',\n", 3137 | " 'best',\n", 3138 | " 'bestfriend',\n", 3139 | " 'bestfriendsinlov',\n", 3140 | " 'bet',\n", 3141 | " 'beth',\n", 3142 | " 'betray',\n", 3143 | " 'bettemidl',\n", 3144 | " 'better',\n", 3145 | " 'betti',\n", 3146 | " 'beverli',\n", 3147 | " 'bibl',\n", 3148 | " 'bid',\n", 3149 | " 'big',\n", 3150 | " 'bigger',\n", 3151 | " 'biggest',\n", 3152 | " 'bike',\n", 3153 | " 'biker',\n", 3154 | " 'bikini',\n", 3155 | " 'billhad',\n", 3156 | " 'billi',\n", 3157 | " 'billionair',\n", 3158 | " 'billmurray',\n", 3159 | " 'billnighi',\n", 3160 | " 'billpaxton',\n", 3161 | " 'billpullman',\n", 3162 | " 
'billybobthornton',\n", 3163 | " 'billycrudup',\n", 3164 | " 'billycryst',\n", 3165 | " 'biographi',\n", 3166 | " 'biolog',\n", 3167 | " 'bird',\n", 3168 | " 'birth',\n", 3169 | " 'birthday',\n", 3170 | " 'bisexu',\n", 3171 | " 'bishop',\n", 3172 | " 'bit',\n", 3173 | " 'bite',\n", 3174 | " 'bitter',\n", 3175 | " 'bizarr',\n", 3176 | " 'black',\n", 3177 | " 'blackmag',\n", 3178 | " 'blackmail',\n", 3179 | " 'blackpeopl',\n", 3180 | " 'blacksmith',\n", 3181 | " 'blade',\n", 3182 | " 'blame',\n", 3183 | " 'blend',\n", 3184 | " 'blind',\n", 3185 | " 'bliss',\n", 3186 | " 'blizzard',\n", 3187 | " 'block',\n", 3188 | " 'blond',\n", 3189 | " 'blood',\n", 3190 | " 'bloodi',\n", 3191 | " 'bloodsplatt',\n", 3192 | " 'bloodthirsti',\n", 3193 | " 'blow',\n", 3194 | " 'blue',\n", 3195 | " 'board',\n", 3196 | " 'boardingschool',\n", 3197 | " 'boat',\n", 3198 | " 'bob',\n", 3199 | " 'bobbi',\n", 3200 | " 'bobbyfarrelli',\n", 3201 | " 'bobhoskin',\n", 3202 | " 'bodi',\n", 3203 | " 'body',\n", 3204 | " 'bodyguard',\n", 3205 | " 'bold',\n", 3206 | " 'bollywood',\n", 3207 | " 'bomb',\n", 3208 | " 'bond',\n", 3209 | " 'bone',\n", 3210 | " 'book',\n", 3211 | " 'border',\n", 3212 | " 'bore',\n", 3213 | " 'boredom',\n", 3214 | " 'born',\n", 3215 | " 'boss',\n", 3216 | " 'boston',\n", 3217 | " 'botch',\n", 3218 | " 'bound',\n", 3219 | " 'boundari',\n", 3220 | " 'bounti',\n", 3221 | " 'bountyhunt',\n", 3222 | " 'bout',\n", 3223 | " 'box',\n", 3224 | " 'boxer',\n", 3225 | " 'boy',\n", 3226 | " 'boyfriend',\n", 3227 | " 'boys',\n", 3228 | " 'bradleycoop',\n", 3229 | " 'bradpitt',\n", 3230 | " 'brain',\n", 3231 | " 'brainwash',\n", 3232 | " 'brand',\n", 3233 | " 'brandon',\n", 3234 | " 'brave',\n", 3235 | " 'braveri',\n", 3236 | " 'brazil',\n", 3237 | " 'brazilian',\n", 3238 | " 'break',\n", 3239 | " 'breakdown',\n", 3240 | " 'breast',\n", 3241 | " 'breath',\n", 3242 | " 'breed',\n", 3243 | " 'brendanfras',\n", 3244 | " 'brendangleeson',\n", 3245 | " 'brent',\n", 3246 | " 'brettratn',\n", 
3247 | " 'brian',\n", 3248 | " 'briandepalma',\n", 3249 | " 'bride',\n", 3250 | " 'bridesmaid',\n", 3251 | " 'bridg',\n", 3252 | " 'brief',\n", 3253 | " 'brielarson',\n", 3254 | " 'brien',\n", 3255 | " 'bright',\n", 3256 | " 'brilliant',\n", 3257 | " 'bring',\n", 3258 | " 'brink',\n", 3259 | " 'britain',\n", 3260 | " 'british',\n", 3261 | " 'britishsecretservic',\n", 3262 | " 'brittanymurphi',\n", 3263 | " 'broadcast',\n", 3264 | " 'broadway',\n", 3265 | " 'broke',\n", 3266 | " 'broken',\n", 3267 | " 'broker',\n", 3268 | " 'bronx',\n", 3269 | " 'brook',\n", 3270 | " 'brooklyn',\n", 3271 | " 'broom',\n", 3272 | " 'brothel',\n", 3273 | " 'brother',\n", 3274 | " 'brotherbrotherrelationship',\n", 3275 | " 'brothers',\n", 3276 | " 'brothersisterrelationship',\n", 3277 | " 'brought',\n", 3278 | " 'brown',\n", 3279 | " 'bruce',\n", 3280 | " 'brucegreenwood',\n", 3281 | " 'brucewilli',\n", 3282 | " 'brutal',\n", 3283 | " 'bryansing',\n", 3284 | " 'bu',\n", 3285 | " 'buck',\n", 3286 | " 'bud',\n", 3287 | " 'buddi',\n", 3288 | " 'buddy',\n", 3289 | " 'buddycomedi',\n", 3290 | " 'buddycop',\n", 3291 | " 'budget',\n", 3292 | " 'build',\n", 3293 | " 'building',\n", 3294 | " 'built',\n", 3295 | " 'bullet',\n", 3296 | " 'bulli',\n", 3297 | " 'bumbl',\n", 3298 | " 'bunch',\n", 3299 | " 'bunker',\n", 3300 | " 'bunni',\n", 3301 | " 'burglar',\n", 3302 | " 'buri',\n", 3303 | " 'burn',\n", 3304 | " 'bush',\n", 3305 | " 'busi',\n", 3306 | " 'business',\n", 3307 | " 'businessman',\n", 3308 | " 'bust',\n", 3309 | " 'butcher',\n", 3310 | " 'butler',\n", 3311 | " 'butt',\n", 3312 | " 'button',\n", 3313 | " 'buy',\n", 3314 | " 'buzz',\n", 3315 | " 'cabin',\n", 3316 | " 'caesar',\n", 3317 | " 'cage',\n", 3318 | " 'cairo',\n", 3319 | " 'cal',\n", 3320 | " 'california',\n", 3321 | " 'calvin',\n", 3322 | " 'camcord',\n", 3323 | " 'came',\n", 3324 | " 'camera',\n", 3325 | " 'cameraman',\n", 3326 | " 'camerondiaz',\n", 3327 | " 'camp',\n", 3328 | " 'campaign',\n", 3329 | " 'campbell',\n", 3330 | 
" 'campu',\n", 3331 | " 'canada',\n", 3332 | " 'canadian',\n", 3333 | " 'cancer',\n", 3334 | " 'candi',\n", 3335 | " 'candid',\n", 3336 | " 'canin',\n", 3337 | " 'cannib',\n", 3338 | " 'canuxploit',\n", 3339 | " 'capabl',\n", 3340 | " 'caper',\n", 3341 | " 'capit',\n", 3342 | " 'capt',\n", 3343 | " 'captain',\n", 3344 | " 'captiv',\n", 3345 | " 'captur',\n", 3346 | " 'capture',\n", 3347 | " 'car',\n", 3348 | " 'caraccid',\n", 3349 | " 'carchas',\n", 3350 | " 'carcrash',\n", 3351 | " 'card',\n", 3352 | " 'care',\n", 3353 | " 'career',\n", 3354 | " 'carefre',\n", 3355 | " 'caretak',\n", 3356 | " 'careymulligan',\n", 3357 | " 'caribbean',\n", 3358 | " 'carjourney',\n", 3359 | " 'carl',\n", 3360 | " 'carlagugino',\n", 3361 | " 'carmen',\n", 3362 | " 'carol',\n", 3363 | " 'carolina',\n", 3364 | " 'carrac',\n", 3365 | " 'carri',\n", 3366 | " 'carrie',\n", 3367 | " 'cartel',\n", 3368 | " 'carter',\n", 3369 | " 'cartoon',\n", 3370 | " 'caryelw',\n", 3371 | " 'case',\n", 3372 | " 'caseyaffleck',\n", 3373 | " 'cash',\n", 3374 | " 'casino',\n", 3375 | " 'cast',\n", 3376 | " 'castl',\n", 3377 | " 'cat',\n", 3378 | " 'cataclysm',\n", 3379 | " 'catastroph',\n", 3380 | " 'catch',\n", 3381 | " 'cateblanchett',\n", 3382 | " 'catherinedeneuv',\n", 3383 | " 'catherinekeen',\n", 3384 | " 'catherinezeta',\n", 3385 | " 'cathol',\n", 3386 | " 'catholic',\n", 3387 | " 'cattl',\n", 3388 | " 'caught',\n", 3389 | " 'caus',\n", 3390 | " 'cavalri',\n", 3391 | " 'cave',\n", 3392 | " 'cavemen',\n", 3393 | " 'celebr',\n", 3394 | " 'celebration',\n", 3395 | " 'cell',\n", 3396 | " 'cellphon',\n", 3397 | " 'cemeteri',\n", 3398 | " 'center',\n", 3399 | " 'centr',\n", 3400 | " 'central',\n", 3401 | " 'centuri',\n", 3402 | " 'centuries',\n", 3403 | " 'century',\n", 3404 | " 'ceo',\n", 3405 | " 'certain',\n", 3406 | " 'chad',\n", 3407 | " 'chain',\n", 3408 | " 'chainsaw',\n", 3409 | " 'challeng',\n", 3410 | " 'chamber',\n", 3411 | " 'champion',\n", 3412 | " 'championship',\n", 3413 | " 'chanc',\n", 3414 
| " 'chance',\n", 3415 | " 'chang',\n", 3416 | " 'change',\n", 3417 | " 'changed',\n", 3418 | " 'changes',\n", 3419 | " 'channingtatum',\n", 3420 | " 'chao',\n", 3421 | " 'chaos',\n", 3422 | " 'chaotic',\n", 3423 | " 'chapter',\n", 3424 | " 'charact',\n", 3425 | " 'character',\n", 3426 | " 'characters',\n", 3427 | " 'charg',\n", 3428 | " 'charismat',\n", 3429 | " 'charl',\n", 3430 | " 'charli',\n", 3431 | " 'charlie',\n", 3432 | " 'charliesheen',\n", 3433 | " 'charlizetheron',\n", 3434 | " 'charm',\n", 3435 | " 'chart',\n", 3436 | " 'chase',\n", 3437 | " 'chauffeur',\n", 3438 | " 'chazzpalminteri',\n", 3439 | " 'cheat',\n", 3440 | " 'check',\n", 3441 | " 'cheerlead',\n", 3442 | " 'chef',\n", 3443 | " 'chemic',\n", 3444 | " 'cher',\n", 3445 | " 'chevychas',\n", 3446 | " 'chicago',\n", 3447 | " 'chicken',\n", 3448 | " 'chief',\n", 3449 | " 'child',\n", 3450 | " 'childabus',\n", 3451 | " 'childhero',\n", 3452 | " 'childhood',\n", 3453 | " 'childprodigi',\n", 3454 | " 'children',\n", 3455 | " 'chill',\n", 3456 | " 'chimp',\n", 3457 | " 'china',\n", 3458 | " 'chines',\n", 3459 | " 'chip',\n", 3460 | " 'chipmunk',\n", 3461 | " 'chiwetelejiofor',\n", 3462 | " 'chloe',\n", 3463 | " 'chloëgracemoretz',\n", 3464 | " 'chloësevigni',\n", 3465 | " 'chocol',\n", 3466 | " 'choic',\n", 3467 | " 'choice',\n", 3468 | " 'choos',\n", 3469 | " 'chosen',\n", 3470 | " 'chowyun',\n", 3471 | " 'chri',\n", 3472 | " 'chriscolumbu',\n", 3473 | " 'chriscoop',\n", 3474 | " 'chrisevan',\n", 3475 | " 'chrishemsworth',\n", 3476 | " 'chrisklein',\n", 3477 | " 'chrispin',\n", 3478 | " 'chrisrock',\n", 3479 | " 'christ',\n", 3480 | " 'christian',\n", 3481 | " 'christianbal',\n", 3482 | " 'christianslat',\n", 3483 | " 'christin',\n", 3484 | " 'christinaappleg',\n", 3485 | " 'christinaricci',\n", 3486 | " 'christma',\n", 3487 | " 'christmas',\n", 3488 | " 'christmasparti',\n", 3489 | " 'christmastre',\n", 3490 | " 'christoph',\n", 3491 | " 'christopherlambert',\n", 3492 | " 'christopherlloyd',\n", 3493 
| " 'christophernolan',\n", 3494 | " 'christopherplumm',\n", 3495 | " 'christopherwalken',\n", 3496 | " 'christophwaltz',\n", 3497 | " 'chrisweitz',\n", 3498 | " 'chronicl',\n", 3499 | " 'chuck',\n", 3500 | " 'church',\n", 3501 | " 'cia',\n", 3502 | " 'ciaránhind',\n", 3503 | " 'cigarettesmok',\n", 3504 | " 'cillianmurphi',\n", 3505 | " 'cinema',\n", 3506 | " 'circl',\n", 3507 | " 'circu',\n", 3508 | " 'circuit',\n", 3509 | " 'circumst',\n", 3510 | " 'citi',\n", 3511 | " 'citizen',\n", 3512 | " 'city',\n", 3513 | " 'civil',\n", 3514 | " 'civilian',\n", 3515 | " 'civilwar',\n", 3516 | " 'claim',\n", 3517 | " 'clair',\n", 3518 | " 'clairedan',\n", 3519 | " 'claireforlani',\n", 3520 | " 'clan',\n", 3521 | " 'clark',\n", 3522 | " 'clash',\n", 3523 | " 'class',\n", 3524 | " 'classdiffer',\n", 3525 | " 'classic',\n", 3526 | " 'classmat',\n", 3527 | " 'classroom',\n", 3528 | " 'claudevandamm',\n", 3529 | " 'clay',\n", 3530 | " 'clean',\n", 3531 | " 'clear',\n", 3532 | " 'clerk',\n", 3533 | " 'clever',\n", 3534 | " 'client',\n", 3535 | " 'clients',\n", 3536 | " 'cliff',\n", 3537 | " 'climat',\n", 3538 | " 'climb',\n", 3539 | " 'clinteastwood',\n", 3540 | " 'cliveowen',\n", 3541 | " 'clock',\n", 3542 | " 'clone',\n", 3543 | " 'close',\n", 3544 | " 'closer',\n", 3545 | " 'cloud',\n", 3546 | " 'clown',\n", 3547 | " 'club',\n", 3548 | " 'clue',\n", 3549 | " 'clueless',\n", 3550 | " 'clutch',\n", 3551 | " 'coach',\n", 3552 | " 'coast',\n", 3553 | " 'cocain',\n", 3554 | " 'code',\n", 3555 | " 'coffin',\n", 3556 | " 'cohen',\n", 3557 | " 'col',\n", 3558 | " 'cold',\n", 3559 | " 'coldwar',\n", 3560 | " 'cole',\n", 3561 | " 'colin',\n", 3562 | " 'colinfarrel',\n", 3563 | " 'colinfirth',\n", 3564 | " 'collaps',\n", 3565 | " 'colleagu',\n", 3566 | " 'collect',\n", 3567 | " 'collector',\n", 3568 | " 'colleg',\n", 3569 | " 'college',\n", 3570 | " 'collid',\n", 3571 | " 'collis',\n", 3572 | " 'colombia',\n", 3573 | " 'colonel',\n", 3574 | " 'coloni',\n", 3575 | " 'color',\n", 3576 | " 
'colorado',\n", 3577 | " 'coma',\n", 3578 | " 'combat',\n", 3579 | " 'combin',\n", 3580 | " 'come',\n", 3581 | " 'comeback',\n", 3582 | " 'comed',\n", 3583 | " 'comedi',\n", 3584 | " 'comedian',\n", 3585 | " 'comedy',\n", 3586 | " 'comet',\n", 3587 | " 'comfort',\n", 3588 | " 'comic',\n", 3589 | " 'coming',\n", 3590 | " 'comingofag',\n", 3591 | " 'comingout',\n", 3592 | " 'command',\n", 3593 | " 'commando',\n", 3594 | " 'commerci',\n", 3595 | " 'commiss',\n", 3596 | " 'commit',\n", 3597 | " 'common',\n", 3598 | " 'commun',\n", 3599 | " 'communist',\n", 3600 | " 'community',\n", 3601 | " 'compani',\n", 3602 | " 'companion',\n", 3603 | " 'company',\n", 3604 | " 'compet',\n", 3605 | " 'competit',\n", 3606 | " 'competition',\n", 3607 | " 'complet',\n", 3608 | " 'complex',\n", 3609 | " 'complic',\n", 3610 | " 'compos',\n", 3611 | " 'compuls',\n", 3612 | " 'comput',\n", 3613 | " 'computerviru',\n", 3614 | " 'conan',\n", 3615 | " 'concern',\n", 3616 | " 'concert',\n", 3617 | " 'concoct',\n", 3618 | " 'condit',\n", 3619 | " 'condition',\n", 3620 | " 'conduct',\n", 3621 | " 'confeder',\n", 3622 | " 'confess',\n", 3623 | " 'confid',\n", 3624 | " 'confin',\n", 3625 | " 'conflict',\n", 3626 | " 'confront',\n", 3627 | " 'confus',\n", 3628 | " 'congress',\n", 3629 | " 'conman',\n", 3630 | " 'connect',\n", 3631 | " 'connecticut',\n", 3632 | " 'connel',\n", 3633 | " 'connor',\n", 3634 | " 'conquer',\n", 3635 | " 'consequ',\n", 3636 | " 'consequences',\n", 3637 | " 'conserv',\n", 3638 | " 'consid',\n", 3639 | " 'conspir',\n", 3640 | " 'conspiraci',\n", 3641 | " 'conspiracy',\n", 3642 | " 'constant',\n", 3643 | " 'constantli',\n", 3644 | " 'construct',\n", 3645 | " 'consum',\n", 3646 | " 'contact',\n", 3647 | " 'contain',\n", 3648 | " 'contemporari',\n", 3649 | " 'contend',\n", 3650 | " 'content',\n", 3651 | " 'contest',\n", 3652 | " 'continu',\n", 3653 | " 'contract',\n", 3654 | " 'contractor',\n", 3655 | " 'control',\n", 3656 | " 'controversi',\n", 3657 | " 'convent',\n", 3658 | " 
'converg',\n", 3659 | " 'convers',\n", 3660 | " 'convict',\n", 3661 | " 'convinc',\n", 3662 | " 'cook',\n", 3663 | " ...]" 3664 | ] 3665 | }, 3666 | "execution_count": 42, 3667 | "metadata": {}, 3668 | "output_type": "execute_result" 3669 | } 3670 | ], 3671 | "source": [ 3672 | "#With the help of CountVectorizer we can see the 5000 most frequent words (on scikit-learn 1.2+ use cv.get_feature_names_out() instead).\n", 3673 | "cv.get_feature_names()" 3674 | ] 3675 | }, 3676 | { 3677 | "cell_type": "code", 3678 | "execution_count": 43, 3679 | "id": "5b6221d5", 3680 | "metadata": {}, 3681 | "outputs": [ 3682 | { 3683 | "data": { 3684 | "text/plain": [ 3685 | "'love'" 3686 | ] 3687 | }, 3688 | "execution_count": 43, 3689 | "metadata": {}, 3690 | "output_type": "execute_result" 3691 | } 3692 | ], 3693 | "source": [ 3694 | "#As we can see, some words are just variants of the same word, e.g. action/actions or love/loved/loving. To give them a single name\n", 3695 | "#we can use the concept of stemming:\n", 3696 | "#Stemming is a natural language processing (NLP) technique that reduces words to their base or root form by stripping affixes (the Porter stemmer used here strips suffixes). It is often used in information retrieval (IR) systems, such as movie recommendation systems, so that different forms of a word are matched as the same keyword.\n", 3697 | "#eg: run, runs and running are all stemmed to the root word \"run\".\n", 3698 | "#We already applied stemming in the cells above; now let's see it in action:\n", 3699 | "\n", 3700 | "#Note: Stemming is a heuristic process and can sometimes produce incorrect results. For example, the Porter stemmer maps both \"universe\" and \"university\" to \"univers\", while the irregular form \"ran\" is not mapped to \"run\". For more accurate results, you may want to consider using a lemmatizer instead of a stemmer.\n", 3701 | "#ps is the PorterStemmer object that we already created in the cells above.\n", 3702 | "ps.stem('loved')\n", 3703 | "# ps.stem('loving')\n", 3704 | "# ps.stem('love')\n", 3705 | "#for all three we get the same output." 
3706 | ] 3707 | }, 3708 | { 3709 | "cell_type": "code", 3710 | "execution_count": 44, 3711 | "id": "b149877d", 3712 | "metadata": {}, 3713 | "outputs": [ 3714 | { 3715 | "data": { 3716 | "text/plain": [ 3717 | "'danc'" 3718 | ] 3719 | }, 3720 | "execution_count": 44, 3721 | "metadata": {}, 3722 | "output_type": "execute_result" 3723 | } 3724 | ], 3725 | "source": [ 3726 | "ps.stem('dancing')#dancing, dance and danced all reduce to the same stem: danc" 3727 | ] 3728 | }, 3729 | { 3730 | "cell_type": "code", 3731 | "execution_count": 45, 3732 | "id": "c2af3519", 3733 | "metadata": {}, 3734 | "outputs": [ 3735 | { 3736 | "data": { 3737 | "text/plain": [ 3738 | "'in the 22nd century, a parapleg marin is dispatch to the moon pandora on a uniqu mission, but becom torn between follow order and protect an alien civilization. action adventur fantasi sciencefict cultureclash futur spacewar spacecoloni societi spacetravel futurist romanc space alien tribe alienplanet cgi marin soldier battl loveaffair antiwar powerrel mindandsoul 3d samworthington zoesaldana sigourneyweav jamescameron'" 3739 | ] 3740 | }, 3741 | "execution_count": 45, 3742 | "metadata": {}, 3743 | "output_type": "execute_result" 3744 | } 3745 | ], 3746 | "source": [ 3747 | "stem('in the 22nd century, a paraplegic marine is dispatched to the moon pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization. action adventure fantasy sciencefiction cultureclash future spacewar spacecolony society spacetravel futuristic romance space alien tribe alienplanet cgi marine soldier battle loveaffair antiwar powerrelations mindandsoul 3d samworthington zoesaldana sigourneyweaver jamescameron')\n", 3748 | "#this gives the stemmed form of the whole paragraph: every word is replaced by its root form." 
3749 | ] 3750 | }, 3751 | { 3752 | "cell_type": "code", 3753 | "execution_count": 46, 3754 | "id": "60762953", 3755 | "metadata": {}, 3756 | "outputs": [], 3757 | "source": [ 3758 | "#up to this point, the vectorisation step is finished" 3759 | ] 3760 | }, 3761 | { 3762 | "cell_type": "code", 3763 | "execution_count": 47, 3764 | "id": "de3a5b05", 3765 | "metadata": {}, 3766 | "outputs": [ 3767 | { 3768 | "data": { 3769 | "text/plain": [ 3770 | "array([[1. , 0.08346223, 0.0860309 , ..., 0.04499213, 0. ,\n", 3771 | " 0. ],\n", 3772 | " [0.08346223, 1. , 0.06063391, ..., 0.02378257, 0. ,\n", 3773 | " 0.02615329],\n", 3774 | " [0.0860309 , 0.06063391, 1. , ..., 0.02451452, 0. ,\n", 3775 | " 0. ],\n", 3776 | " ...,\n", 3777 | " [0.04499213, 0.02378257, 0.02451452, ..., 1. , 0.03962144,\n", 3778 | " 0.04229549],\n", 3779 | " [0. , 0. , 0. , ..., 0.03962144, 1. ,\n", 3780 | " 0.08714204],\n", 3781 | " [0. , 0.02615329, 0. , ..., 0.04229549, 0.08714204,\n", 3782 | " 1. ]])" 3783 | ] 3784 | }, 3785 | "execution_count": 47, 3786 | "metadata": {}, 3787 | "output_type": "execute_result" 3788 | } 3789 | ], 3790 | "source": [ 3791 | "#We are working in a 5000-dimensional space in which every movie is a vector.\n", 3792 | "#Now we calculate the distance between every movie vector and every other movie's vector. 
#The larger the distance, the lower the similarity; the smaller the distance, the higher the similarity.\n", 3793 | "#Here we do not calculate the Euclidean distance; instead we use the cosine distance.\n", 3794 | "\n", 3795 | "#Cosine distance is based on the angle between the two movie vectors: the smaller the angle, the higher the similarity.\n", 3796 | "\n", 3797 | "#We use the cosine distance because in a high-dimensional space, like the 5000 dimensions we are using right now, Euclidean distance does not work very well because of the curse of dimensionality.\n", 3798 | "\n", 3799 | "# The following are some of the specific implications of the curse of dimensionality for Euclidean distance:\n", 3800 | "\n", 3801 | "# The average distance between two points in high-dimensional space grows with the number of dimensions, while the relative gap between the nearest and the farthest point shrinks. This means that most points end up roughly equally far from each other, regardless of their actual similarity.\n", 3802 | "# The distribution of distances between points in high-dimensional space becomes more and more concentrated around its mean. This means that it becomes difficult to distinguish between similar and dissimilar points based on their Euclidean distance.\n", 3803 | "# Euclidean distance also depends on vector length: with word counts, a movie with a longer text gets a larger vector, so two movies with the same mix of words can still be far apart. Cosine similarity compares only the direction of the vectors.\n", 3804 | "# As a result of these factors, Euclidean distance is often not a good choice for measuring similarity in high-dimensional spaces. 
Other measures are often better suited for high-dimensional sparse data, such as cosine similarity and Jaccard similarity.\n", 3805 | "\n", 3806 | "# To compute the cosine similarity (cosine distance = 1 - cosine similarity) we import the cosine_similarity function from the scikit-learn library.\n", 3807 | "from sklearn.metrics.pairwise import cosine_similarity#for non-negative count vectors, cosine similarity ranges from 0 to 1.\n", 3808 | "cosine_similarity(vectors)" 3809 | ] 3810 | }, 3811 | { 3812 | "cell_type": "code", 3813 | "execution_count": 48, 3814 | "id": "0b68db69", 3815 | "metadata": {}, 3816 | "outputs": [ 3817 | { 3818 | "data": { 3819 | "text/plain": [ 3820 | "(4806, 4806)" 3821 | ] 3822 | }, 3823 | "execution_count": 48, 3824 | "metadata": {}, 3825 | "output_type": "execute_result" 3826 | } 3827 | ], 3828 | "source": [ 3829 | "cosine_similarity(vectors).shape# the output is (4806, 4806) because we compute the similarity of every movie with every other movie, i.e. each of the 4806 movies is compared with all 4806 movies." 3830 | ] 3831 | }, 3832 | { 3833 | "cell_type": "code", 3834 | "execution_count": 49, 3835 | "id": "b393fb31", 3836 | "metadata": {}, 3837 | "outputs": [ 3838 | { 3839 | "data": { 3840 | "text/plain": [ 3841 | "array([[1. , 0.08346223, 0.0860309 , ..., 0.04499213, 0. ,\n", 3842 | " 0. ],\n", 3843 | " [0.08346223, 1. , 0.06063391, ..., 0.02378257, 0. ,\n", 3844 | " 0.02615329],\n", 3845 | " [0.0860309 , 0.06063391, 1. , ..., 0.02451452, 0. ,\n", 3846 | " 0. ],\n", 3847 | " ...,\n", 3848 | " [0.04499213, 0.02378257, 0.02451452, ..., 1. , 0.03962144,\n", 3849 | " 0.04229549],\n", 3850 | " [0. , 0. , 0. , ..., 0.03962144, 1. ,\n", 3851 | " 0.08714204],\n", 3852 | " [0. , 0.02615329, 0. , ..., 0.04229549, 0.08714204,\n", 3853 | " 1. 
]])" 3854 | ] 3855 | }, 3856 | "execution_count": 49, 3857 | "metadata": {}, 3858 | "output_type": "execute_result" 3859 | } 3860 | ], 3861 | "source": [ 3862 | "similarity=cosine_similarity(vectors)\n", 3863 | "similarity#looking carefully, the output is an array of arrays: each row holds one movie's similarity to all the other movies." 3864 | ] 3865 | }, 3866 | { 3867 | "cell_type": "code", 3868 | "execution_count": 50, 3869 | "id": "084bea55", 3870 | "metadata": {}, 3871 | "outputs": [ 3872 | { 3873 | "data": { 3874 | "text/plain": [ 3875 | "array([1. , 0.08346223, 0.0860309 , ..., 0.04499213, 0. ,\n", 3876 | " 0. ])" 3877 | ] 3878 | }, 3879 | "execution_count": 50, 3880 | "metadata": {}, 3881 | "output_type": "execute_result" 3882 | } 3883 | ], 3884 | "source": [ 3885 | "#example:\n", 3886 | "similarity[0]#this gives the array of similarities between the first movie and every movie." 3887 | ] 3888 | }, 3889 | { 3890 | "cell_type": "code", 3891 | "execution_count": 51, 3892 | "id": "f487d33b", 3893 | "metadata": {}, 3894 | "outputs": [], 3895 | "source": [ 3896 | "#Every movie's similarity score with itself is 1, which is why the output above starts with 1; the value changes when we compare with other movies.\n", 3897 | "#In other words, a movie's similarity with itself is always 1 because the two vectors are identical, so the diagonal of the similarity matrix is always 1." 
3898 | ] 3899 | }, 3900 | { 3901 | "cell_type": "code", 3902 | "execution_count": 52, 3903 | "id": "cd6cecdb", 3904 | "metadata": {}, 3905 | "outputs": [], 3906 | "source": [ 3907 | "#Now we write a function that, given a movie, recommends the 5 movies most similar to it:\n", 3908 | "#given a movie, first we have to find the index of that movie in the data.\n", 3909 | "#then we look up the row of the similarity matrix for that index.\n", 3910 | "#that row holds the similarity of that movie (eg: Avatar) to all the movies.\n", 3911 | "#now we sort by similarity in descending order, so the most similar (least distant) movies come first, and then we fetch those movies from the dataset.\n", 3912 | "def recommended(movie):\n", 3913 | "    #first we look up the index of the movie:\n", 3914 | "    movie_index=new_df[new_df['title']==movie].index[0]\n", 3915 | "    distances=similarity[movie_index]#despite the name, these are similarity scores; they are in an array, so we have to sort it to get the five closest movies.\n", 3916 | "    #Why the line below is written this way:\n", 3917 | "    #It finds the top 5 movies. Sorting the raw scores directly would lose track of which movie each score belongs to, so we have to keep each movie's index attached to its score while sorting.\n", 3918 | "    #The movie's similarity with itself is always 1, so it comes first after sorting, and we work with the 5 movies that follow it. 
To keep the indices attached to the scores we use the enumerate function.\n", 3919 | "    #we convert the result to a list and then sort it in reverse order, so that the highest similarity (smallest distance) comes first.\n", 3920 | "    movie_list=sorted(list(enumerate(distances)),reverse=True,key=lambda x:x[1])[1:6] #key=lambda x:x[1] tells sorted to sort by the second element (the score) rather than the first (the index), so the indices stay attached to their scores; [1:6] skips the movie itself.\n", 3921 | "    \n", 3922 | "    for i in movie_list:\n", 3923 | "        print(new_df.iloc[i[0]].title)#this prints the title of the movie at each of those indices." 3924 | ] 3925 | }, 3926 | { 3927 | "cell_type": "code", 3928 | "execution_count": 53, 3929 | "id": "d2ca6a2e", 3930 | "metadata": {}, 3931 | "outputs": [ 3932 | { 3933 | "name": "stdout", 3934 | "output_type": "stream", 3935 | "text": [ 3936 | "Aliens vs Predator: Requiem\n", 3937 | "Aliens\n", 3938 | "Falcon Rising\n", 3939 | "Independence Day\n", 3940 | "Titan A.E.\n" 3941 | ] 3942 | } 3943 | ], 3944 | "source": [ 3945 | "recommended('Avatar')#this prints the titles of the movies most similar to the movie we passed in." 3946 | ] 3947 | }, 3948 | { 3949 | "cell_type": "code", 3950 | "execution_count": 54, 3951 | "id": "d04efdbb", 3952 | "metadata": {}, 3953 | "outputs": [ 3954 | { 3955 | "data": { 3956 | "text/plain": [ 3957 | "'Aliens vs Predator: Requiem'" 3958 | ] 3959 | }, 3960 | "execution_count": 54, 3961 | "metadata": {}, 3962 | "output_type": "execute_result" 3963 | } 3964 | ], 3965 | "source": [ 3966 | "new_df.iloc[1216].title" 3967 | ] 3968 | }, 3969 | { 3970 | "cell_type": "code", 3971 | "execution_count": 55, 3972 | "id": "7cfafc05", 3973 | "metadata": {}, 3974 | "outputs": [], 3975 | "source": [ 3976 | "#with this, our recommendation system model is complete." 
3977 | ] 3978 | } 3979 | ], 3980 | "metadata": { 3981 | "kernelspec": { 3982 | "display_name": "Python 3 (ipykernel)", 3983 | "language": "python", 3984 | "name": "python3" 3985 | }, 3986 | "language_info": { 3987 | "codemirror_mode": { 3988 | "name": "ipython", 3989 | "version": 3 3990 | }, 3991 | "file_extension": ".py", 3992 | "mimetype": "text/x-python", 3993 | "name": "python", 3994 | "nbconvert_exporter": "python", 3995 | "pygments_lexer": "ipython3", 3996 | "version": "3.9.12" 3997 | } 3998 | }, 3999 | "nbformat": 4, 4000 | "nbformat_minor": 5 4001 | } 4002 | --------------------------------------------------------------------------------
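The notebook's last cells (cosine similarity over count vectors, then sorting with `enumerate` to pick the closest titles) can be sketched without any third-party library. This is only a minimal illustration under stated assumptions, not the notebook's implementation: the names `cosine_similarity_counts`, `recommend` and `toy_catalogue` are hypothetical, the tags are made-up stemmed strings, and `collections.Counter` stands in for `CountVectorizer`. Unlike the notebook's `recommended`, this sketch returns an empty list for an unknown title instead of raising an error.

```python
import math
from collections import Counter


def cosine_similarity_counts(a, b):
    """Cosine similarity between two bag-of-words Counters (hypothetical helper)."""
    # Dot product over the words of `a`; Counter returns 0 for missing words.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def recommend(title, tags_by_title, top_n=5):
    """Return up to top_n titles whose tags are most similar to `title`."""
    if title not in tags_by_title:
        return []  # assumption: unknown titles yield no recommendations
    query = Counter(tags_by_title[title].split())
    # Keep each score paired with its title so sorting cannot lose the mapping
    # (the same reason the notebook uses enumerate before sorting).
    scored = [
        (cosine_similarity_counts(query, Counter(tags.split())), other)
        for other, tags in tags_by_title.items()
        if other != title  # skip the movie itself, whose similarity is always 1
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest similarity first
    return [other for _, other in scored[:top_n]]


# Made-up, already-stemmed tags purely for illustration.
toy_catalogue = {
    "Space War": "space alien battl futur",
    "Alien Planet": "alien space tribe",
    "Office Romance": "love offic comedi",
}
print(recommend("Space War", toy_catalogue, top_n=2))
```

Excluding the query title explicitly, rather than slicing `[1:6]` after sorting, avoids depending on the movie itself landing in first place when another movie has identical tags.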