├── .gitignore
├── Processed Datasets
│   ├── AmazonMusic.tar.xz
│   ├── AmazonMusicCompact.tar.xz
│   ├── Anime
│   │   ├── README.md
│   │   └── anime.zip
│   ├── BookCrossing
│   │   ├── README.md
│   │   └── book_crossing.zip
│   ├── FBC.ipynb
│   ├── FC.ipynb
│   ├── RS_NonPersonalized.ipynb
│   ├── RetailrocketEcommerce
│   │   ├── README.md
│   │   └── Retailrocket_Ecommerce.zip
│   └── Steam
│       ├── README.md
│       └── steam.zip
└── README.md

/.gitignore:
--------------------------------------------------------------------------------
1 | # Created by .ignore support plugin (hsz.mobi)
2 | .gitignore
3 | .idea/
4 |
--------------------------------------------------------------------------------
/Processed Datasets/AmazonMusic.tar.xz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/caserec/Datasets-for-Recommender-Systems/4180b4dc4103452c591a1718560d29bdf1f48540/Processed Datasets/AmazonMusic.tar.xz
--------------------------------------------------------------------------------
/Processed Datasets/AmazonMusicCompact.tar.xz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/caserec/Datasets-for-Recommender-Systems/4180b4dc4103452c591a1718560d29bdf1f48540/Processed Datasets/AmazonMusicCompact.tar.xz
--------------------------------------------------------------------------------
/Processed Datasets/Anime/README.md:
--------------------------------------------------------------------------------
1 | SUMMARY & USAGE LICENSE
2 | =============================================
3 |
4 | This dataset is available on Kaggle at: https://www.kaggle.com/CooperUnion/anime-recommendations-database
5 |
6 | * Context
7 | The original dataset contains user preference data from 73,516 users on 12,294 anime. Each user can add anime to their completed list and give it a rating; this dataset is a compilation of those ratings.
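The reduction described in the next paragraph (new consecutive IDs for users and items plus a random subsample of 5,000 users, seeded with random.seed(123)) is not shown as code in this README. The sketch below is one plausible way to do it with pandas, under stated assumptions: the function name `subsample_and_reindex` is hypothetical, the column names `user_id` and `anime_id` are assumed from the Kaggle rating.csv layout, and sampling before renumbering is a guess at the order of operations. It will not necessarily reproduce the exact 5,000 users shipped in anime.zip.

```python
import random

import pandas as pd


def subsample_and_reindex(ratings: pd.DataFrame, n_users: int = 5000, seed: int = 123) -> pd.DataFrame:
    """Hypothetical reconstruction of the reduction step, not the author's script.

    Keeps a random subset of users, then renumbers users and items
    consecutively from 1. Assumes columns named user_id and anime_id;
    any other column (e.g. rating) is carried through unchanged.
    """
    random.seed(seed)
    all_users = sorted(ratings["user_id"].unique())
    sampled = set(random.sample(all_users, min(n_users, len(all_users))))
    subset = ratings[ratings["user_id"].isin(sampled)].copy()

    # New IDs 1, 2, 3, ... assigned in ascending order of the old IDs.
    for column in ("user_id", "anime_id"):
        old_ids = sorted(subset[column].unique())
        new_ids = {old: new for new, old in enumerate(old_ids, start=1)}
        subset[column] = subset[column].map(new_ids)

    # The processed .dat files are ordered by user id.
    return subset.sort_values(["user_id", "anime_id"]).reset_index(drop=True)


if __name__ == "__main__":
    toy = pd.DataFrame({
        "user_id": [10, 10, 20, 30, 30, 40],
        "anime_id": [5, 7, 5, 9, 7, 11],
        "rating": [8, 6, 9, 7, 10, 5],
    })
    print(subsample_and_reindex(toy, n_users=2))
```

Renumbering after sampling keeps the IDs dense (1 to N with no gaps), which is what the later note "Users and items are numbered consecutively from 1" requires; renumbering first and sampling afterwards would leave gaps.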
This dataset has been organized and reduced by Arthur Fortes [1], who generated new IDs for users and items, split the history and rating information into separate files, and randomly selected a subsample of 5,000 users.
8 |
9 | The reduction was done in Python [2] with random.seed(123).
10 |
11 | * Acknowledgements
12 | Thanks to the myanimelist.net API for providing anime data and user ratings.
13 |
14 |
15 | Detailed descriptions of the data files can be found at the end of this file.
16 |
17 | This dataset consists of:
18 | * 520,610 interactions (history / ratings) from 5,000 users on 7,718 animes.
19 | - History: 5,000 users and 7,390 animes (520,610 interactions)
20 | - Ratings: 4,714 users and 7,157 animes (419,944 interactions)
21 |
22 | If you have any further questions or comments, please contact me
23 | .
24 |
25 |
26 | DETAILED DESCRIPTIONS OF DATA FILES
27 | ==============================================
28 |
29 | Here are brief descriptions of the data.
30 |
31 | anime_ratings.dat
32 |
33 | The full ratings set: 419,944 interactions by 4,714 users on 7,157 animes.
34 | Users and items are numbered consecutively from 1. The data is ordered by user ids.
35 | This is a tab-separated list of
36 | User_ID | Anime_ID | Feedback
37 |
38 | Range of ratings: 1 - 10
39 |
40 | anime_history.dat
41 |
42 | The full history set: 520,610 interactions by 5,000 users on 7,390 animes.
43 | Users and items are numbered consecutively from 1. The data is ordered by user ids.
44 | This is a tab-separated list of
45 | User_ID | Anime_ID | Feedback
46 |
47 | anime_info.dat
48 |
49 | Information about the items (animes); this is a tab-separated list of
50 | anime_ids | name | genre | type | episodes | rating | members
51 |
52 | The item ids are the ones used in the anime_ratings.dat
53 | and anime_history.dat files.
54 |
55 | anime_ids - myanimelist.net's unique id identifying an anime.
56 | name - full name of anime.
57 | genre - comma-separated list of genres for this anime.
58 | type - movie, TV, OVA, etc.
59 | episodes - number of episodes in this show (1 if movie).
60 | rating - average rating out of 10 for this anime.
61 | members - number of community members that are in this anime's "group".
62 |
63 |
64 | REFERENCES
65 | ==============================================
66 |
67 | [1] Da Costa, Arthur Fortes. PhD candidate at the Institute of Mathematical and Computational Sciences,
68 | University of São Paulo. URL: https://arthurfortes.github.io/
69 |
70 | [2] https://www.python.org/
--------------------------------------------------------------------------------
/Processed Datasets/Anime/anime.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/caserec/Datasets-for-Recommender-Systems/4180b4dc4103452c591a1718560d29bdf1f48540/Processed Datasets/Anime/anime.zip
--------------------------------------------------------------------------------
/Processed Datasets/BookCrossing/README.md:
--------------------------------------------------------------------------------
1 | SUMMARY & USAGE LICENSE
2 | =============================================
3 |
4 | The Book-Crossing dataset was collected by Cai-Nicolas Ziegler in a 4-week crawl (August / September 2004)
5 | from the Book-Crossing community with kind permission from Ron Hornbaker, CTO of Humankind Systems.
6 |
7 | This data has been organized and cleaned up by Arthur Fortes [1] following the MovieLens 100k treatment [2]:
8 | users with fewer than 20 interactions and items with fewer than 10 interactions were removed, as were items without information,
9 | and the explicit and implicit interactions were separated into different files.
10 |
11 | Detailed descriptions of the data files can be found at the end of this file.
12 |
13 | This dataset consists of:
14 | * 272,679 interactions (explicit / implicit) from 2,946 users on 17,384 books.
15 | - Ratings: 1,295 users and 14,684 books (62,657 ratings)
16 | - History: 2,946 users and 17,384 books (272,679 accesses)
17 | * Ratings range from 1 to 10. Implicit feedback is represented by 1.
18 | * Simple demographic info for the users (location, age)
19 |
20 | If you have any further questions or comments, please contact me
21 | .
22 |
23 |
24 | CITATION
25 | ==============================================
26 |
27 | Freely available for research use when acknowledged with the following reference (further details on the dataset are given in this publication):
28 |
29 | Improving Recommendation Lists Through Topic Diversification, Cai-Nicolas Ziegler, Sean M. McNee, Joseph A. Konstan, Georg Lausen;
30 | Proceedings of the 14th International World Wide Web Conference (WWW '05), May 10-14, 2005, Chiba, Japan.
31 |
32 |
33 | DETAILED DESCRIPTIONS OF DATA FILES
34 | ==============================================
35 |
36 | Here are brief descriptions of the data.
37 |
38 | items_info.dat
39 |
40 | Information about the items (books); this is a tab-separated list of
41 | Book_ID | ISBN | Book-Title | Book-Author | Year-Of-Publication |
42 | Publisher | Image-URL-S | Image-URL-M | Image-URL-L |
43 |
44 | The item ids are the ones used in the book_history.dat
45 | and book_ratings.dat files.
46 |
47 | users_info.dat
48 |
49 | Demographic information about the users; this is a tab-separated
50 | list of
51 | User-ID | Location | Age
52 |
53 | The user ids are the ones used in the book_history.dat
54 | and book_ratings.dat files.
55 |
56 |
57 | book_history.dat
58 |
59 | The full history set, 272,679 accesses by 2,946 users on 17,384 books.
60 | Each user has accessed at least 20 books. Users and items are
61 | numbered consecutively from 1. The data is ordered by user ids.
62 | This is a tab-separated list of
63 | user id | item id | accessed
64 |
65 | book_ratings.dat
66 |
67 | The full ratings set, 62,657 ratings by 1,295 users on 14,684 books.
68 | Users and items are numbered consecutively from 1. The data is ordered by user ids.
69 | This is a tab-separated list of
70 | user id | item id | ratings
71 |
72 |
73 | REFERENCES
74 | ==============================================
75 |
76 | [1] Da Costa, Arthur Fortes. PhD candidate at the Institute of Mathematical and Computational Sciences,
77 | University of São Paulo. URL: https://arthurfortes.github.io/
78 |
79 |
80 | [2] MovieLens 100K Dataset. Stable benchmark dataset. 100,000 ratings from 1000 users on 1700 movies.
81 | Released 4/1998. URL: https://grouplens.org/datasets/movielens/100k/
82 | Generated by GroupLens [Department of Computer Science and Engineering at the University of Minnesota].
83 |
--------------------------------------------------------------------------------
/Processed Datasets/BookCrossing/book_crossing.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/caserec/Datasets-for-Recommender-Systems/4180b4dc4103452c591a1718560d29bdf1f48540/Processed Datasets/BookCrossing/book_crossing.zip
--------------------------------------------------------------------------------
/Processed Datasets/FBC.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "nbformat": 4,
3 | "nbformat_minor": 0,
4 | "metadata": {
5 | "colab": {
6 | "name": "FBC.ipynb",
7 | "version": "0.3.2",
8 | "provenance": []
9 | },
10 | "kernelspec": {
11 | "name": "python3",
12 | "display_name": "Python 3"
13 | }
14 | },
15 | "cells": [
16 | {
17 | "cell_type": "code",
18 | "metadata": {
19 | "id": "UbCVwAjd4Lk7",
20 | "colab_type": "code",
21 | "colab": {}
22 | },
23 | "source": [
24 | "import pandas as pd\n",
25 | "import numpy as np"
26 | ],
27 | "execution_count": 0,
28 | "outputs": []
29 | },
30 | {
31 | "cell_type": "code",
32 | "metadata": {
33 | "id": "8OFEkZuC4d0G",
34 | "colab_type": "code",
35 | "colab": {}
36 | },
37 | "source": [
38 | "metadata = pd.read_csv('AmazonMusic/amazon_music_metadata.csv')"
39 | ],
40 | "execution_count": 0,
41 | "outputs": []
42 | },
43 | {
44 | "cell_type": "code",
45 | "metadata": {
46 | "id": "az-cFZrF7KY9",
47 | "colab_type": "code",
48 | "colab": {
49 | "base_uri": "https://localhost:8080/",
50 | "height": 412
51 | },
52 | "outputId": "963ec8ff-2fcb-41f4-f4f8-493cf90befcb"
53 | },
54 | "source": [
55 | "metadata.head()"
56 | ],
57 | "execution_count": 6,
58 | "outputs": [
59 | {
60 | "output_type": "execute_result",
61 | "data": {
590 | "text/plain": [
591 | " asin title ... World Dance World Music\n",
592 | "0 5555991584 Memory of Trees ... 0.0 0.0\n",
593 | "1 6308051551 Dont Drink His Blood ... 0.0 0.0\n",
594 | "2 7901622466 On Fire ... 0.0 0.0\n",
595 | "3 B0000000ZW Changing Faces ... 0.0 0.0\n",
596 | "4 B00000016W Pet Sounds ... 0.0 0.0\n",
597 | "\n",
598 | "[5 rows x 463 columns]"
599 | ]
600 | },
601 | "metadata": {
602 | "tags": []
603 | },
604 | "execution_count": 6
605 | }
606 | ]
607 | },
608 | {
609 | "cell_type": "code",
610 | "metadata": {
611 | "id": "LA5NCxJC7coB",
612 | "colab_type": "code",
613 | "colab": {
614 | "base_uri": "https://localhost:8080/",
615 | "height": 206
616 | },
617 | "outputId": "875b8583-6db8-4a3d-c351-84acd5b7440a"
618 | },
619 | "source": [
620 | "new_metadata = metadata.iloc[:,1:]\n",
621 | "new_metadata = new_metadata.melt(id_vars=[\"title\"])\n",
622 | "new_metadata = new_metadata[new_metadata.value != 0]\n",
623 | "new_metadata.reset_index(inplace=True, drop=True)\n",
624 | "new_metadata.tail()"
625 | ],
626 | "execution_count": 31,
627 | "outputs": [
628 | {
629 | "output_type": "execute_result",
630 | "data": {
690 | "text/plain": [
691 | " title variable value\n",
692 | "62572 Lloyd Im Ready to Be Heartbroken World Music 1.0\n",
693 | "62573 I Sincerely Apologize For All The Trouble Ive ... World Music 1.0\n",
694 | "62574 Faster Pussycat World Music 1.0\n",
695 | "62575 Eva Contro Eva World Music 1.0\n",
696 | "62576 Waters of Nazareth World Music 1.0"
697 | ]
698 | },
699 | "metadata": {
700 | "tags": []
701 | },
702 | "execution_count": 31
703 | }
704 | ]
705 | },
706 | {
707 | "cell_type": "code",
708 | "metadata": {
709 | "id": "l14vmh1695tf",
710 | "colab_type": "code",
711 | "colab": {}
712 | },
713 | "source": [
714 | "dict_title = np.load('map_tilte.npy', allow_pickle=True).tolist()\n",
715 | "inverse_dict_title = {value: int(key) for key, value in dict_title.items()}"
716 | ],
717 | "execution_count": 0,
718 | "outputs": []
719 | },
720 | {
721 | "cell_type": "code",
722 | "metadata": {
723 | "id": "6LAXPwCj-btQ",
724 | "colab_type": "code",
725 | "colab": {}
726 | },
727 | "source": [
728 | "new_metadata['asin_id'] = new_metadata['title'].map(inverse_dict_title)"
729 | ],
730 | "execution_count": 0,
731 | "outputs": []
732 | },
733 | {
734 | "cell_type": "code",
735 | "metadata": {
736 | "id": "TxxEVPuv7SAf",
737 | "colab_type": "code",
738 | "colab": {
739 | "base_uri": "https://localhost:8080/",
740 | "height": 1000
741 | },
742 | "outputId": "9f5f42f9-b4c9-4225-c86e-2df391f0e993"
743 | },
744 | "source": [
745 | "new_metadata.dropna(inplace=True)\n",
746 | "new_metadata = new_metadata[['asin_id', 'variable', 'value']]\n",
747 | "new_metadata['asin_id'] = new_metadata.asin_id.astype(int)\n",
748 | "new_metadata"
749 | ],
750 | "execution_count": 49,
751 | "outputs": [
752 | {
753 | "output_type": "execute_result",
754 | "data": {
1151 | "text/plain": [
1152 | " asin_id variable value\n",
1153 | "6 132 Acid Jazz 1.0\n",
1154 | "8 243 Acid Jazz 1.0\n",
1155 | "11 545 Acid Jazz 1.0\n",
1156 | "12 587 Acid Jazz 1.0\n",
1157 | "13 601 Acid Jazz 1.0\n",
1158 | "24 1255 Acid Jazz 1.0\n",
1159 | "26 1258 Acid Jazz 1.0\n",
1160 | "29 1907 Acid Jazz 1.0\n",
1161 | "30 2048 Acid Jazz 1.0\n",
1162 | "33 1282 Acid Jazz 1.0\n",
1163 | "42 2149 Acid Jazz 1.0\n",
1164 | "43 2189 Acid Jazz 1.0\n",
1165 | "48 1487 Acid Jazz 1.0\n",
1166 | "55 1607 Acid Jazz 1.0\n",
1167 | "56 1620 Acid Jazz 1.0\n",
1168 | "61 1724 Acid Jazz 1.0\n",
1169 | "66 1837 Acid Jazz 1.0\n",
1170 | "75 1953 Acid Jazz 1.0\n",
1171 | "78 1999 Acid Jazz 1.0\n",
1172 | "79 2021 Acid Jazz 1.0\n",
1173 | "80 2028 Acid Jazz 1.0\n",
1174 | "90 2327 Acid Jazz 1.0\n",
1175 | "91 2364 Acid Jazz 1.0\n",
1176 | "94 2442 Acid Jazz 1.0\n",
1177 | "95 1126 Acid Jazz 1.0\n",
1178 | "99 1840 Acoustic Blues 1.0\n",
1179 | "106 167 Acoustic Blues 1.0\n",
1180 | "110 562 Acoustic Blues 1.0\n",
1181 | "111 570 Acoustic Blues 1.0\n",
1182 | "112 574 Acoustic Blues 1.0\n",
1183 | "... ... ... ...\n",
1184 | "62448 2533 World Music 1.0\n",
1185 | "62451 2394 World Music 1.0\n",
1186 | "62452 2515 World Music 1.0\n",
1187 | "62454 2401 World Music 1.0\n",
1188 | "62455 2515 World Music 1.0\n",
1189 | "62462 1850 World Music 1.0\n",
1190 | "62474 2515 World Music 1.0\n",
1191 | "62478 2427 World Music 1.0\n",
1192 | "62482 2432 World Music 1.0\n",
1193 | "62486 2438 World Music 1.0\n",
1194 | "62487 2441 World Music 1.0\n",
1195 | "62491 2530 World Music 1.0\n",
1196 | "62492 2451 World Music 1.0\n",
1197 | "62494 1907 World Music 1.0\n",
1198 | "62506 2471 World Music 1.0\n",
1199 | "62509 2512 World Music 1.0\n",
1200 | "62513 2479 World Music 1.0\n",
1201 | "62519 2490 World Music 1.0\n",
1202 | "62524 2498 World Music 1.0\n",
1203 | "62525 2499 World Music 1.0\n",
1204 | "62526 2502 World Music 1.0\n",
1205 | "62527 2504 World Music 1.0\n",
1206 | "62528 2509 World Music 1.0\n",
1207 | "62533 2521 World Music 1.0\n",
1208 | "62537 2526 World Music 1.0\n",
1209 | "62553 2541 World Music 1.0\n",
1210 | "62561 2550 World Music 1.0\n",
1211 | "62562 2555 World Music 1.0\n",
1212 | "62563 2555 World Music 1.0\n",
1213 | "62566 2562 World Music 1.0\n",
1214 | "\n",
1215 | "[26048 rows x 3 columns]"
1216 | ]
1217 | },
1218 | "metadata": {
1219 | "tags": []
1220 | },
1221 | "execution_count": 49
1222 | }
1223 | ]
1224 | },
1225 | {
1226 | "cell_type": "code",
1227 | "metadata": {
1228 | "id": "j4YNe3p39Oj5",
1229 | "colab_type": "code",
1230 | "colab": {}
1231 | },
1232 | "source": [
1233 | "new_metadata.to_csv('items_metadata.dat', index=False, sep='\\t', header=False)"
1234 | ],
1235 | "execution_count": 0,
1236 | "outputs": []
1237 | },
1238 | {
1239 | "cell_type": "markdown",
1240 | "metadata": {
1241 | "id": "GRSYW3CY_r8T",
1242 | "colab_type": "text"
1243 | },
1244 | "source": [
1245 | "### Case Recommender\n"
1246 | ]
1247 | },
1248 | {
1249 | "cell_type": "code",
1250 | "metadata": {
1251 | "id": "rYyrosGg_q82",
1252 | "colab_type": "code",
1253 | "colab": {}
1254 | },
1255 | "source": [
1256 | "from caserec.recommenders.rating_prediction.item_attribute_knn import ItemAttributeKNN"
1257 | ],
1258 | "execution_count": 0,
1259 | "outputs": []
1260 | },
1261 | {
1262 | "cell_type": "code",
1263 | "metadata": {
1264 | "id": "6itklNgN_8eh",
1265 | "colab_type": "code",
1266 | "colab": {
1267 | "base_uri": "https://localhost:8080/",
1268 | "height": 173
1269 | },
1270 | "outputId": "42f96b5c-6251-4311-a683-5c5e72e69828"
1271 | },
1272 | "source": [
1273 | "ItemAttributeKNN('train.dat', 'test.dat', metadata_file='items_metadata.dat', as_similar_first=True).compute()"
1274 | ],
1275 | "execution_count": 56,
1276 | "outputs": [
1277 | {
1278 | "output_type": "stream",
1279 | "text": [
1280 | "[Case Recommender: Rating Prediction > Item Attribute KNN Algorithm]\n",
1281 | "\n",
1282 | "train data:: 5036 users and 2581 items (34703 interactions) | sparsity:: 99.73%\n",
1283 | "test data:: 4508 users and 2493 items (17093 interactions) | sparsity:: 99.85%\n",
1284 | "\n",
1285 | "training_time:: 10.775388 sec\n",
1286 | ">> metadata:: 2521 items and 292 metadata (26048 interactions) | sparsity:: 96.46%\n",
1287 | "prediction_time:: 0.544531 sec\n",
1288 | "Eval:: MAE: 0.698327 RMSE: 0.984717 \n"
1289 | ],
1290 | "name": "stdout"
1291 | }
1292 | ]
1293 | }
1294 | ]
1295 | }
--------------------------------------------------------------------------------
/Processed Datasets/FC.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "nbformat": 4,
3 | "nbformat_minor": 0,
4 | "metadata": {
5 | "colab": {
6 | "name": "FC.ipynb",
7 | "version": "0.3.2",
8 | "provenance": []
9 | },
10 | "kernelspec": {
11 | "name": "python3",
12 | "display_name": "Python 3"
13 | }
14 | },
15 | "cells": [
16 | {
17 | "cell_type": "code",
18 | "metadata": {
19 | "id": "BwBeOhCaERjC",
20 | "colab_type": "code",
21 | "colab": {}
22 | },
23 | "source": [
24 | "import numpy as np\n",
25 | "import pandas as
pd" 26 | ], 27 | "execution_count": 0, 28 | "outputs": [] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "metadata": { 33 | "id": "kBWUYelzE4R1", 34 | "colab_type": "code", 35 | "colab": {} 36 | }, 37 | "source": [ 38 | "title_dict = np.load('map_tilte.npy', allow_pickle=True).tolist()" 39 | ], 40 | "execution_count": 0, 41 | "outputs": [] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "metadata": { 46 | "id": "lfGR9oltFwcf", 47 | "colab_type": "code", 48 | "colab": { 49 | "base_uri": "https://localhost:8080/", 50 | "height": 142 51 | }, 52 | "outputId": "bb5a5368-adde-4b9c-fb6e-69e49b4e6f76" 53 | }, 54 | "source": [ 55 | "test = pd.read_csv('test.dat', sep='\\t', names=['reviewerID', 'asin', 'rate', 'title'])\n", 56 | "test[test.reviewerID == 0]" 57 | ], 58 | "execution_count": 18, 59 | "outputs": [ 60 | { 61 | "output_type": "execute_result", 62 | "data": { 63 | "text/html": [ 64 | "
\n", 65 | "\n", 78 | "\n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | "
\n", 112 | "
" 113 | ], 114 | "text/plain": [ 115 | " reviewerID asin rate title\n", 116 | "11196 0 2471 2 Jagged Little Pill Acoustic\n", 117 | "14517 0 1978 5 La Revancha Del Tango\n", 118 | "16513 0 0 5 Memory of Trees" 119 | ] 120 | }, 121 | "metadata": { 122 | "tags": [] 123 | }, 124 | "execution_count": 18 125 | } 126 | ] 127 | }, 128 | { 129 | "cell_type": "markdown", 130 | "metadata": { 131 | "id": "PG2-igcjGgzs", 132 | "colab_type": "text" 133 | }, 134 | "source": [ 135 | "## Memory-based" 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "metadata": { 141 | "id": "gnQKrVXnDurV", 142 | "colab_type": "code", 143 | "colab": { 144 | "base_uri": "https://localhost:8080/", 145 | "height": 153 146 | }, 147 | "outputId": "a9098c97-03b2-4c74-d2d1-6cba9b7594eb" 148 | }, 149 | "source": [ 150 | "from caserec.recommenders.rating_prediction.itemknn import ItemKNN\n", 151 | "\n", 152 | "ItemKNN('train.dat', 'test.dat', 'rp_iknn.dat').compute()" 153 | ], 154 | "execution_count": 9, 155 | "outputs": [ 156 | { 157 | "output_type": "stream", 158 | "text": [ 159 | "[Case Recommender: Rating Prediction > ItemKNN Algorithm]\n", 160 | "\n", 161 | "train data:: 5036 users and 2581 items (34703 interactions) | sparsity:: 99.73%\n", 162 | "test data:: 4508 users and 2493 items (17093 interactions) | sparsity:: 99.85%\n", 163 | "\n", 164 | "training_time:: 10.051786 sec\n", 165 | "prediction_time:: 0.558465 sec\n", 166 | "Eval:: MAE: 0.710864 RMSE: 1.104636 \n" 167 | ], 168 | "name": "stdout" 169 | } 170 | ] 171 | }, 172 | { 173 | "cell_type": "code", 174 | "metadata": { 175 | "id": "L1EoGufqFHJp", 176 | "colab_type": "code", 177 | "colab": { 178 | "base_uri": "https://localhost:8080/", 179 | "height": 142 180 | }, 181 | "outputId": "9fcc6715-9345-4c8e-843d-df3a81a10553" 182 | }, 183 | "source": [ 184 | "predictions = pd.read_csv('rp_iknn.dat', sep='\\t', names=['reviewerID', 'asin', 'rate'])\n", 185 | "predictions['title'] = predictions.asin.map(title_dict)\n", 186 | 
"predictions.head(3)" 187 | ], 188 | "execution_count": 15, 189 | "outputs": [ 190 | { 191 | "output_type": "execute_result", 192 | "data": { 193 | "text/html": [ 194 | "
\n", 195 | "\n", 208 | "\n", 209 | " \n", 210 | " \n", 211 | " \n", 212 | " \n", 213 | " \n", 214 | " \n", 215 | " \n", 216 | " \n", 217 | " \n", 218 | " \n", 219 | " \n", 220 | " \n", 221 | " \n", 222 | " \n", 223 | " \n", 224 | " \n", 225 | " \n", 226 | " \n", 227 | " \n", 228 | " \n", 229 | " \n", 230 | " \n", 231 | " \n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | "
\n", 242 | "
" 243 | ], 244 | "text/plain": [ 245 | " reviewerID asin rate title\n", 246 | "0 0 0 2.705333 Memory of Trees\n", 247 | "1 0 1978 4.839800 La Revancha Del Tango\n", 248 | "2 0 2471 4.478565 Jagged Little Pill Acoustic" 249 | ] 250 | }, 251 | "metadata": { 252 | "tags": [] 253 | }, 254 | "execution_count": 15 255 | } 256 | ] 257 | }, 258 | { 259 | "cell_type": "code", 260 | "metadata": { 261 | "id": "dqOhsX0XEIG_", 262 | "colab_type": "code", 263 | "colab": { 264 | "base_uri": "https://localhost:8080/", 265 | "height": 153 266 | }, 267 | "outputId": "10857605-fd20-41d2-a412-056e83d8e805" 268 | }, 269 | "source": [ 270 | "from caserec.recommenders.rating_prediction.userknn import UserKNN\n", 271 | "\n", 272 | "UserKNN('train.dat', 'test.dat', 'rp_uknn.dat').compute()" 273 | ], 274 | "execution_count": 13, 275 | "outputs": [ 276 | { 277 | "output_type": "stream", 278 | "text": [ 279 | "[Case Recommender: Rating Prediction > UserKNN Algorithm]\n", 280 | "\n", 281 | "train data:: 5036 users and 2581 items (34703 interactions) | sparsity:: 99.73%\n", 282 | "test data:: 4508 users and 2493 items (17093 interactions) | sparsity:: 99.85%\n", 283 | "\n", 284 | "training_time:: 9.999057 sec\n", 285 | "prediction_time:: 3.507684 sec\n", 286 | "Eval:: MAE: 0.687115 RMSE: 1.008135 \n" 287 | ], 288 | "name": "stdout" 289 | } 290 | ] 291 | }, 292 | { 293 | "cell_type": "code", 294 | "metadata": { 295 | "id": "fG8zLJsFFVN9", 296 | "colab_type": "code", 297 | "colab": { 298 | "base_uri": "https://localhost:8080/", 299 | "height": 142 300 | }, 301 | "outputId": "71f85886-4531-4851-e97a-a3bb4d04118f" 302 | }, 303 | "source": [ 304 | "predictions = pd.read_csv('rp_uknn.dat', sep='\\t', names=['reviewerID', 'asin', 'rate'])\n", 305 | "predictions['title'] = predictions.asin.map(title_dict)\n", 306 | "predictions.head(3)" 307 | ], 308 | "execution_count": 16, 309 | "outputs": [ 310 | { 311 | "output_type": "execute_result", 312 | "data": { 313 | "text/html": [ 314 | "
\n", 315 | "\n", 328 | "\n", 329 | " \n", 330 | " \n", 331 | " \n", 332 | " \n", 333 | " \n", 334 | " \n", 335 | " \n", 336 | " \n", 337 | " \n", 338 | " \n", 339 | " \n", 340 | " \n", 341 | " \n", 342 | " \n", 343 | " \n", 344 | " \n", 345 | " \n", 346 | " \n", 347 | " \n", 348 | " \n", 349 | " \n", 350 | " \n", 351 | " \n", 352 | " \n", 353 | " \n", 354 | " \n", 355 | " \n", 356 | " \n", 357 | " \n", 358 | " \n", 359 | " \n", 360 | " \n", 361 | "
\n", 362 | "
" 363 | ], 364 | "text/plain": [ 365 | " reviewerID asin rate title\n", 366 | "0 0 0 5.000000 Memory of Trees\n", 367 | "1 0 1978 4.736122 La Revancha Del Tango\n", 368 | "2 0 2471 3.899040 Jagged Little Pill Acoustic" 369 | ] 370 | }, 371 | "metadata": { 372 | "tags": [] 373 | }, 374 | "execution_count": 16 375 | } 376 | ] 377 | }, 378 | { 379 | "cell_type": "markdown", 380 | "metadata": { 381 | "id": "lch6FzLoGF4F", 382 | "colab_type": "text" 383 | }, 384 | "source": [ 385 | "## Model-based" 386 | ] 387 | }, 388 | { 389 | "cell_type": "code", 390 | "metadata": { 391 | "id": "hTgDDYUlGJTc", 392 | "colab_type": "code", 393 | "colab": { 394 | "base_uri": "https://localhost:8080/", 395 | "height": 187 396 | }, 397 | "outputId": "f9683bc4-47bc-4055-bb6d-0f03216a6cf4" 398 | }, 399 | "source": [ 400 | "from caserec.recommenders.rating_prediction.matrixfactorization import MatrixFactorization\n", 401 | "\n", 402 | "MatrixFactorization('train.dat', 'test.dat', 'rp_mf.dat').compute()" 403 | ], 404 | "execution_count": 19, 405 | "outputs": [ 406 | { 407 | "output_type": "stream", 408 | "text": [ 409 | "[Case Recommender: Rating Prediction > Matrix Factorization]\n", 410 | "\n", 411 | "train data:: 5036 users and 2581 items (34703 interactions) | sparsity:: 99.73%\n", 412 | "test data:: 4508 users and 2493 items (17093 interactions) | sparsity:: 99.85%\n", 413 | "\n", 414 | "training_time:: 14.870324 sec\n", 415 | "prediction_time:: 0.051896 sec\n", 416 | "\n", 417 | "\n", 418 | "Eval:: MAE: 0.713848 RMSE: 0.979218 \n" 419 | ], 420 | "name": "stdout" 421 | } 422 | ] 423 | }, 424 | { 425 | "cell_type": "code", 426 | "metadata": { 427 | "id": "uPF0WFUkGE3q", 428 | "colab_type": "code", 429 | "colab": { 430 | "base_uri": "https://localhost:8080/", 431 | "height": 142 432 | }, 433 | "outputId": "1a70e882-7d42-4afe-d7d7-0616855c1fc5" 434 | }, 435 | "source": [ 436 | "predictions = pd.read_csv('rp_mf.dat', sep='\\t', names=['reviewerID', 'asin', 'rate'])\n", 437 | 
"predictions['title'] = predictions.asin.map(title_dict)\n", 438 | "predictions.head(3)" 439 | ], 440 | "execution_count": 20, 441 | "outputs": [ 442 | { 443 | "output_type": "execute_result", 444 | "data": { 445 | "text/html": [ 446 | "
\n", 447 | "\n", 460 | "\n", 461 | " \n", 462 | " \n", 463 | " \n", 464 | " \n", 465 | " \n", 466 | " \n", 467 | " \n", 468 | " \n", 469 | " \n", 470 | " \n", 471 | " \n", 472 | " \n", 473 | " \n", 474 | " \n", 475 | " \n", 476 | " \n", 477 | " \n", 478 | " \n", 479 | " \n", 480 | " \n", 481 | " \n", 482 | " \n", 483 | " \n", 484 | " \n", 485 | " \n", 486 | " \n", 487 | " \n", 488 | " \n", 489 | " \n", 490 | " \n", 491 | " \n", 492 | " \n", 493 | "
\n", 494 | "
" 495 | ], 496 | "text/plain": [ 497 | " reviewerID asin rate title\n", 498 | "0 0 2471 4.388470 Jagged Little Pill Acoustic\n", 499 | "1 0 1978 4.943073 La Revancha Del Tango\n", 500 | "2 0 0 4.829396 Memory of Trees" 501 | ] 502 | }, 503 | "metadata": { 504 | "tags": [] 505 | }, 506 | "execution_count": 20 507 | } 508 | ] 509 | } 510 | ] 511 | } -------------------------------------------------------------------------------- /Processed Datasets/RS_NonPersonalized.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "kernelspec": { 6 | "display_name": "Python 3", 7 | "language": "python", 8 | "name": "python3" 9 | }, 10 | "language_info": { 11 | "codemirror_mode": { 12 | "name": "ipython", 13 | "version": 3 14 | }, 15 | "file_extension": ".py", 16 | "mimetype": "text/x-python", 17 | "name": "python", 18 | "nbconvert_exporter": "python", 19 | "pygments_lexer": "ipython3", 20 | "version": "3.7.3" 21 | }, 22 | "colab": { 23 | "name": "RS_NonPersonalized.ipynb", 24 | "version": "0.3.2", 25 | "provenance": [] 26 | } 27 | }, 28 | "cells": [ 29 | { 30 | "cell_type": "code", 31 | "metadata": { 32 | "id": "UzAb8mL9A7D_", 33 | "colab_type": "code", 34 | "colab": { 35 | "base_uri": "https://localhost:8080/", 36 | "height": 442 37 | }, 38 | "outputId": "b8239530-747b-4b76-fcb7-8a2344469d5b" 39 | }, 40 | "source": [ 41 | "! wget https://github.com/caserec/Datasets-for-Recommneder-Systems/raw/master/Processed%20Datasets/AmazonMusic.tar.xz\n", 42 | "! tar -xf AmazonMusic.tar.xz\n", 43 | "! pip install caserecommender" 44 | ], 45 | "execution_count": 30, 46 | "outputs": [ 47 | { 48 | "output_type": "stream", 49 | "text": [ 50 | "--2019-09-04 20:48:25-- https://github.com/caserec/Datasets-for-Recommneder-Systems/raw/master/Processed%20Datasets/AmazonMusic.tar.xz\n", 51 | "Resolving github.com (github.com)... 
192.30.255.112\n", 52 | "Connecting to github.com (github.com)|192.30.255.112|:443... connected.\n", 53 | "HTTP request sent, awaiting response... 302 Found\n", 54 | "Location: https://raw.githubusercontent.com/caserec/Datasets-for-Recommneder-Systems/master/Processed%20Datasets/AmazonMusic.tar.xz [following]\n", 55 | "--2019-09-04 20:48:25-- https://raw.githubusercontent.com/caserec/Datasets-for-Recommneder-Systems/master/Processed%20Datasets/AmazonMusic.tar.xz\n", 56 | "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...\n", 57 | "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.\n", 58 | "HTTP request sent, awaiting response... 200 OK\n", 59 | "Length: 22112728 (21M) [application/octet-stream]\n", 60 | "Saving to: ‘AmazonMusic.tar.xz.2’\n", 61 | "\n", 62 | "\rAmazonMusic.tar.xz. 0%[ ] 0 --.-KB/s \rAmazonMusic.tar.xz. 100%[===================>] 21.09M 137MB/s in 0.2s \n", 63 | "\n", 64 | "2019-09-04 20:48:25 (137 MB/s) - ‘AmazonMusic.tar.xz.2’ saved [22112728/22112728]\n", 65 | "\n", 66 | "Requirement already satisfied: caserecommender in /usr/local/lib/python3.6/dist-packages (1.0.918.post0)\n", 67 | "Requirement already satisfied: scikit-learn in /usr/local/lib/python3.6/dist-packages (from caserecommender) (0.21.3)\n", 68 | "Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from caserecommender) (1.16.4)\n", 69 | "Requirement already satisfied: scipy in /usr/local/lib/python3.6/dist-packages (from caserecommender) (1.3.1)\n", 70 | "Requirement already satisfied: pandas in /usr/local/lib/python3.6/dist-packages (from caserecommender) (0.24.2)\n", 71 | "Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.6/dist-packages (from scikit-learn->caserecommender) (0.13.2)\n", 72 | "Requirement already satisfied: python-dateutil>=2.5.0 in /usr/local/lib/python3.6/dist-packages (from pandas->caserecommender) 
(2.5.3)\n", 73 | "Requirement already satisfied: pytz>=2011k in /usr/local/lib/python3.6/dist-packages (from pandas->caserecommender) (2018.9)\n", 74 | "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.6/dist-packages (from python-dateutil>=2.5.0->pandas->caserecommender) (1.12.0)\n" 75 | ], 76 | "name": "stdout" 77 | } 78 | ] 79 | }, 80 | { 81 | "cell_type": "code", 82 | "metadata": { 83 | "id": "MngqcgCxBLeX", 84 | "colab_type": "code", 85 | "colab": { 86 | "base_uri": "https://localhost:8080/", 87 | "height": 51 88 | }, 89 | "outputId": "7490f411-aaf4-4582-aeb2-8df3677adf20" 90 | }, 91 | "source": [ 92 | "ls" 93 | ], 94 | "execution_count": 31, 95 | "outputs": [ 96 | { 97 | "output_type": "stream", 98 | "text": [ 99 | "\u001b[0m\u001b[01;34mAmazonMusic\u001b[0m/ AmazonMusic.tar.xz.1 map_tilte.npy test.dat\n", 100 | "AmazonMusic.tar.xz AmazonMusic.tar.xz.2 \u001b[01;34msample_data\u001b[0m/ train.dat\n" 101 | ], 102 | "name": "stdout" 103 | } 104 | ] 105 | }, 106 | { 107 | "cell_type": "code", 108 | "metadata": { 109 | "id": "L0e1hREHA1GH", 110 | "colab_type": "code", 111 | "colab": {} 112 | }, 113 | "source": [ 114 | "import pandas as pd\n", 115 | "import numpy as np" 116 | ], 117 | "execution_count": 0, 118 | "outputs": [] 119 | }, 120 | { 121 | "cell_type": "code", 122 | "metadata": { 123 | "id": "OAQRi8d5A1GO", 124 | "colab_type": "code", 125 | "colab": { 126 | "base_uri": "https://localhost:8080/", 127 | "height": 204 128 | }, 129 | "outputId": "66cd718f-710e-4bf7-aade-4fbf9bac24ae" 130 | }, 131 | "source": [ 132 | "dataset = pd.read_json('./AmazonMusic/Digital_Music_5.json', lines=True)\n", 133 | "dataset.head()" 134 | ], 135 | "execution_count": 33, 136 | "outputs": [ 137 | { 138 | "output_type": "execute_result", 139 | "data": { 140 | "text/html": [ 141 | "
\n", 142 | "\n", 155 | "\n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | " \n", 195 | " \n", 196 | " \n", 197 | " \n", 198 | " \n", 199 | " \n", 200 | " \n", 201 | " \n", 202 | " \n", 203 | " \n", 204 | " \n", 205 | " \n", 206 | " \n", 207 | " \n", 208 | " \n", 209 | " \n", 210 | " \n", 211 | " \n", 212 | " \n", 213 | " \n", 214 | " \n", 215 | " \n", 216 | " \n", 217 | " \n", 218 | " \n", 219 | " \n", 220 | " \n", 221 | " \n", 222 | " \n", 223 | " \n", 224 | " \n", 225 | " \n", 226 | " \n", 227 | " \n", 228 | " \n", 229 | " \n", 230 | " \n", 231 | " \n", 232 | "
   asin        helpful  overall  reviewText                                           reviewTime   reviewerID      reviewerName           summary                       unixReviewTime
0  5555991584  [3, 3]   5        It's hard to believe \"Memory of Trees\" came ou...  09 12, 2006  A3EBHHCZO6V2A4  Amaranth \"music fan\"  Enya's last great album       1158019200
1  5555991584  [0, 0]   5        A clasically-styled and introverted album, Mem...    06 3, 2001   AZPWAXJG9OJXV   bethtexas              Enya at her most elegant      991526400
2  5555991584  [2, 2]   5        I never thought Enya would reach the sublime h...    07 14, 2003  A38IRL0X2T4DPF  bob turnley            The best so far               1058140800
3  5555991584  [1, 1]   5        This is the third review of an irish album I w...    05 3, 2000   A22IK3I6U76GX0  Calle                  Ireland produces good music.  957312000
4  5555991584  [1, 1]   4        Enya, despite being a successful recording art...    01 17, 2008  A1AISPOIIHTHXX  Cloud \"...\"          4.5; music to dream to        1200528000
\n", 233 | "
" 234 | ], 235 | "text/plain": [ 236 | " asin helpful ... summary unixReviewTime\n", 237 | "0 5555991584 [3, 3] ... Enya's last great album 1158019200\n", 238 | "1 5555991584 [0, 0] ... Enya at her most elegant 991526400\n", 239 | "2 5555991584 [2, 2] ... The best so far 1058140800\n", 240 | "3 5555991584 [1, 1] ... Ireland produces good music. 957312000\n", 241 | "4 5555991584 [1, 1] ... 4.5; music to dream to 1200528000\n", 242 | "\n", 243 | "[5 rows x 9 columns]" 244 | ] 245 | }, 246 | "metadata": { 247 | "tags": [] 248 | }, 249 | "execution_count": 33 250 | } 251 | ] 252 | }, 253 | { 254 | "cell_type": "code", 255 | "metadata": { 256 | "id": "BOLVuBBqA1GV", 257 | "colab_type": "code", 258 | "colab": { 259 | "base_uri": "https://localhost:8080/", 260 | "height": 265 261 | }, 262 | "outputId": "2b82c7d7-bf47-41ae-81c1-5a3bc6a17ed4" 263 | }, 264 | "source": [ 265 | "dataset.overall.value_counts().plot(kind='bar', color=['g', 'c', 'y', 'b', 'r']);" 266 | ], 267 | "execution_count": 34, 268 | "outputs": [ 269 | { 270 | "output_type": "display_data", 271 | "data": { 272 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYcAAAD4CAYAAAAHHSreAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBo\ndHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAE4xJREFUeJzt3X+MXeWd3/H3J+bHsk0TTJgiZFsL\n2liKnLTrhFlDlVXLEsUYWtWslEbQanERwlsFVFZdVSHbVg5JkJI/dlEjJUhscTCr3RDKboQbmfVa\nLGyUVvwYCAUMi5glibDFj9mYH0tZgcx++8d9XK78zHjGM+O5Y/x+SVf33O95zrnfc2XPZ+45z72T\nqkKSpGEfGHUDkqTlx3CQJHUMB0lSx3CQJHUMB0lSx3CQJHUMB0lSx3CQJHUMB0lS56RRNzBfZ555\nZp1zzjmjbkOSjiuPPvro31TV2GzjjttwOOecc5iYmBh1G5J0XEnys7mM87SSJKljOEiSOoaDJKlj\nOEiSOoaDJKljOEiSOoaDJKljOEiSOrN+CC7JLwA/BE5t4++uqm1Jbgf+OfB6G/rvqurxJAH+G3Ap\n8FarP9b2tQX4L23816pqR6ufB9wOnAbsAq6vY/zHrXNjjuXu56y2+Te8JS0/c/mE9NvARVX1ZpKT\ngR8lubet+09Vdfdh4y8B1rbb+cAtwPlJzgC2AeNAAY8m2VlVr7Yx1wAPMQiHTcC9SJJGYtbTSjXw\nZnt4crsd6dfdzcAdbbsHgdOTnA1cDOypqgMtEPYAm9q6D1XVg+3dwh3AZQs4JknSAs3pmkOSFUke\nB15h8AP+obbqpiRPJLk5yamttgp4YWjzfa12pPq+aeqSpBGZUzhU1btVtR5YDWxI8gngS8DHgF8F\nzgC+eMy6bJJsTTKRZGJqaupYP50knbCOarZSVb0G3A9sqqoX26mjt4HvABvasP3AmqHNVrfakeqr\np6lP9/y3VtV4VY2Pjc36jbOSpHmaNRySjCU5vS2fBnwW+Kt2rYA2O+ky4Km2yU7gygxcALxeVS8C\nu4GNSVYmWQlsBHa3dW8kuaDt60rgnsU9TEnS0ZjLbKWzgR1JVjAIk7uq6gdJ/iLJGBDgceDft/G7\nGExjnWQwlfUqgKo6kOSrwCNt3Feq6kBb/gLvTWW9F2cqSdJIzRoOVfUE8Mlp6hfNML6Aa2dYtx3Y\nPk19AvjEbL1IkpaGn5CWJHUMB0lSx3CQJHUMB0lSx3CQJHUMB0lSx3CQJHUMB0lSx3CQJHUMB0lS\nx3CQJHUMB0lSx3CQJHUMB0lSx3CQJHUMB0lSx3CQJHUMB0lSx3CQJHUMB0lSx3CQJHVmDYckv5Dk\n4ST/J8neJDe2+rlJHkoymeR7SU5p9VPb48m2/pyhfX2p1Z9NcvFQfVOrTSa5YfEPU5J0NObyzuFt\n4KKq+hVgPbApyQXAN4Cbq+qjwKvA1W381cCrrX5zG0eSdcDlwMeBTcC3k6xIsgL4FnAJsA64oo2V\nJI3IrOFQA2+2hye3WwEXAXe3+g7gsra8uT2mrf9MkrT6nVX1dlX9BJgENrTbZFU9X1XvAHe2sZKk\nEZnTNYf2G/7jwCvAHuCvgdeq6mAbsg9Y1ZZXAS8AtPWvAx8Zrh+2zUz16frYmmQiycTU1NRcWpck\nzcOcwqGq3q2q9cBqBr/pf+yYdjVzH7dW1XhVjY+NjY2iBUk6IRzVbKWqeg24H/inwOlJTmqrVgP7\n2/J+YA1AW/9h4OfD9cO2makuSRqRucxWGktyels+Dfgs8AyDkPhcG7YFuKct72yPaev/oqqq1S9v\ns5nOBdYCDwOPAGvb7KdTGFy03rkYBydJmp+TZh/C2cCONqvoA8BdVfWDJE8Ddyb5GvBj4LY2/jbg\nD5NMAgcY/LCnqvYmuQt4GjgIXFtV7wIkuQ7YDawAtlfV3kU7Q
knSUcvgl/rjz/j4eE1MTMx7+9yY\nRexm/mrb8fn6Szo+JXm0qsZnG+cnpCVJHcNBktQxHCRJHcNBktQxHCRJHcNBktQxHCRJHcNBktQx\nHCRJHcNBktQxHCRJHcNBktQxHCRJHcNBktQxHCRJHcNBktQxHCRJHcNBktQxHCRJHcNBktSZNRyS\nrElyf5Knk+xNcn2rfznJ/iSPt9ulQ9t8KclkkmeTXDxU39Rqk0luGKqfm+ShVv9eklMW+0AlSXM3\nl3cOB4Hfqap1wAXAtUnWtXU3V9X6dtsF0NZdDnwc2AR8O8mKJCuAbwGXAOuAK4b28422r48CrwJX\nL9LxSZLmYdZwqKoXq+qxtvy3wDPAqiNsshm4s6rerqqfAJPAhnabrKrnq+od4E5gc5IAFwF3t+13\nAJfN94AkSQt3VNcckpwDfBJ4qJWuS/JEku1JVrbaKuCFoc32tdpM9Y8Ar1XVwcPq0z3/1iQTSSam\npqaOpnVJ0lGYczgk+SDwJ8BvV9UbwC3ALwPrgReB3zsmHQ6pqluraryqxsfGxo7100nSCeukuQxK\ncjKDYPijqvpTgKp6eWj9HwA/aA/3A2uGNl/dasxQ/zlwepKT2ruH4fGSpBGYy2ylALcBz1TV7w/V\nzx4a9hvAU215J3B5klOTnAusBR4GHgHWtplJpzC4aL2zqgq4H/hc234LcM/CDkuStBBzeefwaeA3\ngSeTPN5qv8tgttF6oICfAr8FUFV7k9wFPM1gptO1VfUuQJLrgN3ACmB7Ve1t+/sicGeSrwE/ZhBG\nkqQRmTUcqupHQKZZtesI29wE3DRNfdd021XV8wxmM0mSlgE/IS1J6hgOkqSO4SBJ6hgOkqSO4SBJ\n6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgO\nkqSO4SBJ6hgOkqTOrOGQZE2S+5M8nWRvkutb/Ywke5I81+5XtnqSfDPJZJInknxqaF9b2vjnkmwZ\nqp+X5Mm2zTeT5FgcrCRpbubyzuEg8DtVtQ64ALg2yTrgBuC+qloL3NceA1wCrG23rcAtMAgTYBtw\nPrAB2HYoUNqYa4a227TwQ5Mkzdes4VBVL1bVY235b4FngFXAZmBHG7YDuKwtbwbuqIEHgdOTnA1c\nDOypqgNV9SqwB9jU1n2oqh6sqgLuGNqXJGkEjuqaQ5JzgE8CDwFnVdWLbdVLwFlteRXwwtBm+1rt\nSPV909Sne/6tSSaSTExNTR1N65KkozDncEjyQeBPgN+uqjeG17Xf+GuRe+tU1a1VNV5V42NjY8f6\n6STphDWncEhyMoNg+KOq+tNWfrmdEqLdv9Lq+4E1Q5uvbrUj1VdPU5ckjchcZisFuA14pqp+f2jV\nTuDQjKMtwD1D9SvbrKULgNfb6afdwMYkK9uF6I3A7rbujSQXtOe6cmhfkqQROGkOYz4N/CbwZJLH\nW+13ga8DdyW5GvgZ8Pm2bhdwKTAJvAVcBVBVB5J8FXikjftKVR1oy18AbgdOA+5tN0nSiMwaDlX1\nI2Cmzx18ZprxBVw7w762A9unqU8An5itF0nS0vAT0pKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoY\nDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSerM5e856H0uDzww6hYAqAsvHHULkhrf\nOUiSOoaDJKljOEiSOoaDJKljOEiSOrOGQ5LtSV5J8tRQ7ctJ9id5vN0uHVr3pSSTSZ5NcvFQfVOr\nTSa5Yah+bpKHWv17SU5ZzAOUJB29ubxzuB3YNE395qpa3267AJKsAy4HPt62+XaSFUlWAN8CLgHW\nAVe0sQDfaPv6KPAqcPVCDkiStHCzhkNV/RA4MMf9bQburKq3q+onwCSwod0mq+r5qnoHuBPYnCTA\nRcDdbfsdwGVHeQySpEW2k
GsO1yV5op12Wtlqq4AXhsbsa7WZ6h8BXquqg4fVp5Vka5KJJBNTU1ML\naF2SdCTzDYdbgF8G1gMvAr+3aB0dQVXdWlXjVTU+Nja2FE8pSSekeX19RlW9fGg5yR8AP2gP9wNr\nhoaubjVmqP8cOD3JSe3dw/B4SdKIzOudQ5Kzhx7+BnBoJtNO4PIkpyY5F1gLPAw8AqxtM5NOYXDR\nemdVFXA/8Lm2/Rbgnvn0JElaPLO+c0jyXeBC4Mwk+4BtwIVJ1gMF/BT4LYCq2pvkLuBp4CBwbVW9\n2/ZzHbAbWAFsr6q97Sm+CNyZ5GvAj4HbFu3oJEnzMms4VNUV05Rn/AFeVTcBN01T3wXsmqb+PIPZ\nTJKkZcJPSEuSOoaDJKljOEiSOoaDJKljOEiSOoaDJKljOEiSOoaDJKljOEiSOoaDJKljOEiSOoaD\nJKljOEiSOoaDJKljOEiSOoaDJKljOEiSOoaDJKljOEiSOoaDJKkzazgk2Z7klSRPDdXOSLInyXPt\nfmWrJ8k3k0wmeSLJp4a22dLGP5dky1D9vCRPtm2+mSSLfZCSpKMzl3cOtwObDqvdANxXVWuB+9pj\ngEuAte22FbgFBmECbAPOBzYA2w4FShtzzdB2hz+XJGmJzRoOVfVD4MBh5c3Ajra8A7hsqH5HDTwI\nnJ7kbOBiYE9VHaiqV4E9wKa27kNV9WBVFXDH0L4kSSMy32sOZ1XVi235JeCstrwKeGFo3L5WO1J9\n3zT1aSXZmmQiycTU1NQ8W5ckzWbBF6Tbb/y1CL3M5blurarxqhofGxtbiqeUpBPSfMPh5XZKiHb/\nSqvvB9YMjVvdakeqr56mLkkaofmGw07g0IyjLcA9Q/Ur26ylC4DX2+mn3cDGJCvbheiNwO627o0k\nF7RZSlcO7UuSNCInzTYgyXeBC4Ezk+xjMOvo68BdSa4GfgZ8vg3fBVwKTAJvAVcBVNWBJF8FHmnj\nvlJVhy5yf4HBjKjTgHvbTZI0QrOGQ1VdMcOqz0wztoBrZ9jPdmD7NPUJ4BOz9SFJWjp+QlqS1DEc\nJEkdw0GS1DEcJEkdw0GS1DEcJEkdw0GS1DEcJEkdw0GS1DEcJEkdw0GS1DEcJEkdw0GS1Jn1W1ml\nE8kDD2TULQBw4YVL8scVpRn5zkGS1DEcJEkdw0GS1DEcJEkdw0GS1DEcJEmdBYVDkp8meTLJ40km\nWu2MJHuSPNfuV7Z6knwzyWSSJ5J8amg/W9r455JsWdghSZIWajHeOfx6Va2vqvH2+AbgvqpaC9zX\nHgNcAqxtt63ALTAIE2AbcD6wAdh2KFAkSaNxLE4rbQZ2tOUdwGVD9Ttq4EHg9CRnAxcDe6rqQFW9\nCuwBNh2DviRJc7TQcCjgz5M8mmRrq51VVS+25ZeAs9ryKuCFoW33tdpM9U6SrUkmkkxMTU0tsHVJ\n0kwW+vUZv1ZV+5P8I2BPkr8aXllVlWTRvgegqm4FbgUYHx/3+wUk6RhZ0DuHqtrf7l8Bvs/gmsHL\n7XQR7f6VNnw/sGZo89WtNlNdkjQi8w6HJP8gyT88tAxsBJ4CdgKHZhxtAe5pyzuBK9uspQuA19vp\np93AxiQr24Xoja0mSRqRhZxWOgv4fpJD+/njqvqzJI8AdyW5GvgZ8Pk2fhdwKTAJvAVcBVBVB5J8\nFXikjftKVR1YQF+SpAWadzhU1fPAr0xT/znwmWnqBVw7w762A9vn24skaXH5CWlJUsdwkCR1DAdJ\nUsdwkCR1DAdJUsdwkCR1Fvr1GZLepwYfYRq98otyRsJwkKTZnIBJ6WklSVLHcJAkdQwHSVLHcJAk\ndQwHSVLHcJAkdQwHSVLHcJAkdQwHSVLHcJAkdQwHSVLHcJAkdZZNOCTZlOTZJJNJbhh1P5J0IlsW\n4ZBkBfAt4BJgHXBFknWj7UqSTlzLIhyADcBkVT1fVe8AdwKbR9yTJJ2wlsvfc1gFvDD0eB9
w/uGD\nkmwFtraHbyZ5dgl6O5Izgb9ZyA7y5WXyPfELt/DXYpEaWQYW/Fq8j16Nhf+7eN+8FIvw72JxXoxf\nmsug5RIOc1JVtwK3jrqPQ5JMVNX4qPtYDnwt3uNr8R5fi/ccb6/FcjmttB9YM/R4datJkkZguYTD\nI8DaJOcmOQW4HNg54p4k6YS1LE4rVdXBJNcBu4EVwPaq2jvituZi2ZziWgZ8Ld7ja/EeX4v3HFev\nRWoJ/2C1JOn4sFxOK0mSlhHDQZLUMRwkSR3DQYsiyR2j7mFUkmxI8qtteV2S/5jk0lH3JS3Espit\ndDxK8msMvvbjqar681H3s5SSHD7NOMCvJzkdoKr+1dJ3NRpJtjH4TrCTkuxh8Mn++4Ebknyyqm4a\naYNLLMnHGHzjwUNV9eZQfVNV/dnoOtPRcrbSHCV5uKo2tOVrgGuB7wMbgf9ZVV8fZX9LKcljwNPA\nfweKQTh8l8HnU6iqvxxdd0sryZPAeuBU4CVgdVW9keQ0Bj8g/8lIG1xCSf4Dg/8XzzB4Ta6vqnva\nuseq6lOj7G+5SHJVVX1n1H3MxtNKc3fy0PJW4LNVdSODcPi3o2lpZMaBR4H/DLxeVQ8Af1dVf3ki\nBUNzsKreraq3gL+uqjcAqurvgL8fbWtL7hrgvKq6DLgQ+K9Jrm/r3j/fkLRwN466gbnwtNLcfSDJ\nSgaBmqqaAqiq/5vk4GhbW1pV9ffAzUn+R7t/mRP339I7SX6xhcN5h4pJPsyJFw4fOHQqqap+muRC\n4O4kv8QJFg5JnphpFXDWUvYyXyfqf+j5+DCD35YDVJKzq+rFJB/kBPuHf0hV7QP+dZJ/Abwx6n5G\n5J9V1dvw/0PzkJOBLaNpaWReTrK+qh4HqKo3k/xLYDvwj0fb2pI7C7gYePWweoD/vfTtHD2vOSxQ\nkl8Ezqqqn4y6F2mUkqxmcJrtpWnWfbqq/tcI2hqJJLcB36mqH02z7o+r6t+MoK2jYjhIkjpekJYk\ndQwHSVLHcJAkdQwHSVLn/wGh/eI8lieAxAAAAABJRU5ErkJggg==\n", 273 | "text/plain": [ 274 | "
" 275 | ] 276 | }, 277 | "metadata": { 278 | "tags": [] 279 | } 280 | } 281 | ] 282 | }, 283 | { 284 | "cell_type": "code", 285 | "metadata": { 286 | "id": "Ir7FstouA1Gc", 287 | "colab_type": "code", 288 | "colab": { 289 | "base_uri": "https://localhost:8080/", 290 | "height": 406 291 | }, 292 | "outputId": "a7e4a83e-2c67-4aaf-9179-5b019e4ac300" 293 | }, 294 | "source": [ 295 | "dataset_metadata = pd.read_csv('AmazonMusic/amazon_music_metadata.csv')\n", 296 | "dataset_metadata.head()" 297 | ], 298 | "execution_count": 35, 299 | "outputs": [ 300 | { 301 | "output_type": "execute_result", 302 | "data": { 303 | "text/html": [ 304 | "
\n", 305 | "\n", 318 | "\n", 319 | " \n", 320 | " \n", 321 | " \n", 322 | " \n", 323 | " \n", 324 | " \n", 325 | " \n", 326 | " \n", 327 | " \n", 328 | " \n", 329 | " \n", 330 | " \n", 331 | " \n", 332 | " \n", 333 | " \n", 334 | " \n", 335 | " \n", 336 | " \n", 337 | " \n", 338 | " \n", 339 | " \n", 340 | " \n", 341 | " \n", 342 | " \n", 343 | " \n", 344 | " \n", 345 | " \n", 346 | " \n", 347 | " \n", 348 | " \n", 349 | " \n", 350 | " \n", 351 | " \n", 352 | " \n", 353 | " \n", 354 | " \n", 355 | " \n", 356 | " \n", 357 | " \n", 358 | " \n", 359 | " \n", 360 | " \n", 361 | " \n", 362 | " \n", 363 | " \n", 364 | " \n", 365 | " \n", 366 | " \n", 367 | " \n", 368 | " \n", 369 | " \n", 370 | " \n", 371 | " \n", 372 | " \n", 373 | " \n", 374 | " \n", 375 | " \n", 376 | " \n", 377 | " \n", 378 | " \n", 379 | " \n", 380 | " \n", 381 | " \n", 382 | " \n", 383 | " \n", 384 | " \n", 385 | " \n", 386 | " \n", 387 | " \n", 388 | " \n", 389 | " \n", 390 | " \n", 391 | " \n", 392 | " \n", 393 | " \n", 394 | " \n", 395 | " \n", 396 | " \n", 397 | " \n", 398 | " \n", 399 | " \n", 400 | " \n", 401 | " \n", 402 | " \n", 403 | " \n", 404 | " \n", 405 | " \n", 406 | " \n", 407 | " \n", 408 | " \n", 409 | " \n", 410 | " \n", 411 | " \n", 412 | " \n", 413 | " \n", 414 | " \n", 415 | " \n", 416 | " \n", 417 | " \n", 418 | " \n", 419 | " \n", 420 | " \n", 421 | " \n", 422 | " \n", 423 | " \n", 424 | " \n", 425 | " \n", 426 | " \n", 427 | " \n", 428 | " \n", 429 | " \n", 430 | " \n", 431 | " \n", 432 | " \n", 433 | " \n", 434 | " \n", 435 | " \n", 436 | " \n", 437 | " \n", 438 | " \n", 439 | " \n", 440 | " \n", 441 | " \n", 442 | " \n", 443 | " \n", 444 | " \n", 445 | " \n", 446 | " \n", 447 | " \n", 448 | " \n", 449 | " \n", 450 | " \n", 451 | " \n", 452 | " \n", 453 | " \n", 454 | " \n", 455 | " \n", 456 | " \n", 457 | " \n", 458 | " \n", 459 | " \n", 460 | " \n", 461 | " \n", 462 | " \n", 463 | " \n", 464 | " \n", 465 | " \n", 466 | " \n", 467 | " \n", 468 | " \n", 469 | " \n", 470 | " 
" 830 | ], 831 | "text/plain": [ 832 | " asin title ... World Dance World Music\n", 833 | "0 5555991584 Memory of Trees ... 0.0 0.0\n", 834 | "1 6308051551 Dont Drink His Blood ... 0.0 0.0\n", 835 | "2 7901622466 On Fire ... 0.0 0.0\n", 836 | "3 B0000000ZW Changing Faces ... 0.0 0.0\n", 837 | "4 B00000016W Pet Sounds ... 0.0 0.0\n", 838 | "\n", 839 | "[5 rows x 463 columns]" 840 | ] 841 | }, 842 | "metadata": { 843 | "tags": [] 844 | }, 845 | "execution_count": 35 846 | } 847 | ] 848 | }, 849 | { 850 | "cell_type": "code", 851 | "metadata": { 852 | "id": "5_SI0EI7A1Gh", 853 | "colab_type": "code", 854 | "colab": { 855 | "base_uri": "https://localhost:8080/", 856 | "height": 204 857 | }, 858 | "outputId": "03785f7b-5fc7-4c57-c65b-b99a18a1daac" 859 | }, 860 | "source": [ 861 | "df_recsys = dataset[['reviewerID', 'asin', 'overall']] \n", 862 | "df_recsys.head()" 863 | ], 864 | "execution_count": 36, 865 | "outputs": [ 866 | { 867 | "output_type": "execute_result", 868 | "data": { 869 | "text/html": [ 870 | "
" 927 | ], 928 | "text/plain": [ 929 | " reviewerID asin overall\n", 930 | "0 A3EBHHCZO6V2A4 5555991584 5\n", 931 | "1 AZPWAXJG9OJXV 5555991584 5\n", 932 | "2 A38IRL0X2T4DPF 5555991584 5\n", 933 | "3 A22IK3I6U76GX0 5555991584 5\n", 934 | "4 A1AISPOIIHTHXX 5555991584 4" 935 | ] 936 | }, 937 | "metadata": { 938 | "tags": [] 939 | }, 940 | "execution_count": 36 941 | } 942 | ] 943 | }, 944 | { 945 | "cell_type": "code", 946 | "metadata": { 947 | "id": "Ig3bImzsA1Gm", 948 | "colab_type": "code", 949 | "colab": {} 950 | }, 951 | "source": [ 952 | "df_recsys = df_recsys.merge(dataset_metadata[['asin', 'title']])" 953 | ], 954 | "execution_count": 0, 955 | "outputs": [] 956 | }, 957 | { 958 | "cell_type": "code", 959 | "metadata": { 960 | "id": "EHTjinG5A1Gq", 961 | "colab_type": "code", 962 | "colab": { 963 | "base_uri": "https://localhost:8080/", 964 | "height": 51 965 | }, 966 | "outputId": "707561f9-176d-4cd1-aec2-9f3eed7825bd" 967 | }, 968 | "source": [ 969 | "# unique users\n", 970 | "df_recsys.reviewerID.unique()" 971 | ], 972 | "execution_count": 38, 973 | "outputs": [ 974 | { 975 | "output_type": "execute_result", 976 | "data": { 977 | "text/plain": [ 978 | "array(['A3EBHHCZO6V2A4', 'AZPWAXJG9OJXV', 'A38IRL0X2T4DPF', ...,\n", 979 | " 'A3IZB368BG43JS', 'A1TPW86OHXTXFC', 'AVSVOKDI0AGR7'], dtype=object)" 980 | ] 981 | }, 982 | "metadata": { 983 | "tags": [] 984 | }, 985 | "execution_count": 38 986 | } 987 | ] 988 | }, 989 | { 990 | "cell_type": "code", 991 | "metadata": { 992 | "id": "Nre-Q5NIA1Gx", 993 | "colab_type": "code", 994 | "colab": { 995 | "base_uri": "https://localhost:8080/", 996 | "height": 51 997 | }, 998 | "outputId": "536ad431-244c-4935-c76a-9e9e4be17878" 999 | }, 1000 | "source": [ 1001 | "# unique items\n", 1002 | "df_recsys.asin.unique()" 1003 | ], 1004 | "execution_count": 39, 1005 | "outputs": [ 1006 | { 1007 | "output_type": "execute_result", 1008 | "data": { 1009 | "text/plain": [ 1010 | "array(['5555991584', 'B0000000ZW', 'B00000016T', ..., 
'B000FBGBQ6',\n", 1011 | " 'B000FDEUI0', 'B000FDFRX2'], dtype=object)" 1012 | ] 1013 | }, 1014 | "metadata": { 1015 | "tags": [] 1016 | }, 1017 | "execution_count": 39 1018 | } 1019 | ] 1020 | }, 1021 | { 1022 | "cell_type": "code", 1023 | "metadata": { 1024 | "id": "1PgnxlkhA1G1", 1025 | "colab_type": "code", 1026 | "colab": { 1027 | "base_uri": "https://localhost:8080/", 1028 | "height": 204 1029 | }, 1030 | "outputId": "c98e012e-0e8d-42e3-b00d-330c70d1c598" 1031 | }, 1032 | "source": [ 1033 | "df_recsys.tail()" 1034 | ], 1035 | "execution_count": 40, 1036 | "outputs": [ 1037 | { 1038 | "output_type": "execute_result", 1039 | "data": { 1040 | "text/html": [ 1041 | "
" 1104 | ], 1105 | "text/plain": [ 1106 | " reviewerID asin overall title\n", 1107 | "51791 A2LZJ5J9H862SN B000FDFRX2 5 The Best Of Survivor\n", 1108 | "51792 A14W8HXP3RM3ZS B000FDFRX2 3 The Best Of Survivor\n", 1109 | "51793 AIMMIYQCNGM24 B000FDFRX2 5 The Best Of Survivor\n", 1110 | "51794 AGGC3BHIG6A5K B000FDFRX2 5 The Best Of Survivor\n", 1111 | "51795 A3464G00K8ZYD1 B000FDFRX2 5 The Best Of Survivor" 1112 | ] 1113 | }, 1114 | "metadata": { 1115 | "tags": [] 1116 | }, 1117 | "execution_count": 40 1118 | } 1119 | ] 1120 | }, 1121 | { 1122 | "cell_type": "markdown", 1123 | "metadata": { 1124 | "id": "ZZUCE_YSA1G6", 1125 | "colab_type": "text" 1126 | }, 1127 | "source": [ 1128 | "### Map users and items" 1129 | ] 1130 | }, 1131 | { 1132 | "cell_type": "code", 1133 | "metadata": { 1134 | "id": "PJztpr4NA1G8", 1135 | "colab_type": "code", 1136 | "colab": {} 1137 | }, 1138 | "source": [ 1139 | "map_users = {user: u_id for u_id, user in enumerate(df_recsys.reviewerID.unique())}\n", 1140 | "map_items = {item: i_id for i_id, item in enumerate(df_recsys.asin.unique())}" 1141 | ], 1142 | "execution_count": 0, 1143 | "outputs": [] 1144 | }, 1145 | { 1146 | "cell_type": "code", 1147 | "metadata": { 1148 | "id": "FrynmP1eA1HA", 1149 | "colab_type": "code", 1150 | "colab": {} 1151 | }, 1152 | "source": [ 1153 | "df_recsys['asin'] = df_recsys['asin'].map(map_items)\n", 1154 | "df_recsys['reviewerID'] = df_recsys['reviewerID'].map(map_users)" 1155 | ], 1156 | "execution_count": 0, 1157 | "outputs": [] 1158 | }, 1159 | { 1160 | "cell_type": "code", 1161 | "metadata": { 1162 | "id": "LMtVjn6JA1HF", 1163 | "colab_type": "code", 1164 | "colab": { 1165 | "base_uri": "https://localhost:8080/", 1166 | "height": 204 1167 | }, 1168 | "outputId": "a7475e4d-06c9-47b3-bcb6-e46dca74b7cd" 1169 | }, 1170 | "source": [ 1171 | "df_recsys.head()" 1172 | ], 1173 | "execution_count": 43, 1174 | "outputs": [ 1175 | { 1176 | "output_type": "execute_result", 1177 | "data": { 1178 | "text/html": [ 1179
| "
" 1242 | ], 1243 | "text/plain": [ 1244 | " reviewerID asin overall title\n", 1245 | "0 0 0 5 Memory of Trees\n", 1246 | "1 1 0 5 Memory of Trees\n", 1247 | "2 2 0 5 Memory of Trees\n", 1248 | "3 3 0 5 Memory of Trees\n", 1249 | "4 4 0 4 Memory of Trees" 1250 | ] 1251 | }, 1252 | "metadata": { 1253 | "tags": [] 1254 | }, 1255 | "execution_count": 43 1256 | } 1257 | ] 1258 | }, 1259 | { 1260 | "cell_type": "code", 1261 | "metadata": { 1262 | "id": "5ka4hSNAA1HK", 1263 | "colab_type": "code", 1264 | "colab": {} 1265 | }, 1266 | "source": [ 1267 | "asin_title = {}\n", 1268 | "\n", 1269 | "for idx, row in df_recsys.iterrows():\n", 1270 | " asin_title[row['asin']] = row['title']\n", 1271 | " \n", 1272 | "np.save('map_tilte.npy', asin_title)" 1273 | ], 1274 | "execution_count": 0, 1275 | "outputs": [] 1276 | }, 1277 | { 1278 | "cell_type": "markdown", 1279 | "metadata": { 1280 | "id": "jsZzl0wbA1HP", 1281 | "colab_type": "text" 1282 | }, 1283 | "source": [ 1284 | "### Divide dataset\n", 1285 | "\n", 1286 | "https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html" 1287 | ] 1288 | }, 1289 | { 1290 | "cell_type": "code", 1291 | "metadata": { 1292 | "id": "0p-YlyM6A1HQ", 1293 | "colab_type": "code", 1294 | "colab": {} 1295 | }, 1296 | "source": [ 1297 | "from sklearn.model_selection import train_test_split" 1298 | ], 1299 | "execution_count": 0, 1300 | "outputs": [] 1301 | }, 1302 | { 1303 | "cell_type": "code", 1304 | "metadata": { 1305 | "id": "h9Co8FfNA1HU", 1306 | "colab_type": "code", 1307 | "colab": {} 1308 | }, 1309 | "source": [ 1310 | "train, test = train_test_split(df_recsys, test_size=0.33, random_state=42)\n", 1311 | "train.to_csv('train.dat', index=False, header=False, sep='\\t')\n", 1312 | "test.to_csv('test.dat', index=False, header=False, sep='\\t')" 1313 | ], 1314 | "execution_count": 0, 1315 | "outputs": [] 1316 | }, 1317 | { 1318 | "cell_type": "code", 1319 | "metadata": { 1320 | "id": "qMY469OKBwJv", 1321 | 
"colab_type": "code", 1322 | "colab": { 1323 | "base_uri": "https://localhost:8080/", 1324 | "height": 170 1325 | }, 1326 | "outputId": "3186d9a2-ed33-49f1-bb50-bcc1834796b4" 1327 | }, 1328 | "source": [ 1329 | "ls -l" 1330 | ], 1331 | "execution_count": 47, 1332 | "outputs": [ 1333 | { 1334 | "output_type": "stream", 1335 | "text": [ 1336 | "total 66312\n", 1337 | "drwxrwxr-x 2 1001 1001 4096 Sep 4 20:33 \u001b[0m\u001b[01;34mAmazonMusic\u001b[0m/\n", 1338 | "-rw-r--r-- 1 root root 22112728 Sep 4 20:43 AmazonMusic.tar.xz\n", 1339 | "-rw-r--r-- 1 root root 22112728 Sep 4 20:46 AmazonMusic.tar.xz.1\n", 1340 | "-rw-r--r-- 1 root root 22112728 Sep 4 20:48 AmazonMusic.tar.xz.2\n", 1341 | "-rw-r--r-- 1 root root 74684 Sep 4 20:48 map_tilte.npy\n", 1342 | "drwxr-xr-x 1 root root 4096 Aug 27 16:17 \u001b[01;34msample_data\u001b[0m/\n", 1343 | "-rw-r--r-- 1 root root 484253 Sep 4 20:48 test.dat\n", 1344 | "-rw-r--r-- 1 root root 985210 Sep 4 20:48 train.dat\n" 1345 | ], 1346 | "name": "stdout" 1347 | } 1348 | ] 1349 | }, 1350 | { 1351 | "cell_type": "markdown", 1352 | "metadata": { 1353 | "id": "2n6LQb5cA1HZ", 1354 | "colab_type": "text" 1355 | }, 1356 | "source": [ 1357 | "# Case Recommender" 1358 | ] 1359 | }, 1360 | { 1361 | "cell_type": "markdown", 1362 | "metadata": { 1363 | "id": "XzzXj3o1A1Hc", 1364 | "colab_type": "text" 1365 | }, 1366 | "source": [ 1367 | "You could also use:\n", 1368 | "\n", 1369 | "> from caserec.utils.split_database import SplitDatabase\n", 1370 | "\n", 1371 | "> SplitDatabase(input_file=dataset, dir_folds=dir_path, n_splits=10).k_fold_cross_validation()" 1372 | ] 1373 | }, 1374 | { 1375 | "cell_type": "markdown", 1376 | "metadata": { 1377 | "id": "bV2yax1HA1Hd", 1378 | "colab_type": "text" 1379 | }, 1380 | "source": [ 1381 | "### Rating Prediction" 1382 | ] 1383 | }, 1384 | { 1385 | "cell_type": "code", 1386 | "metadata": { 1387 | "id": "YK7liINfA1He", 1388 | "colab_type": "code", 1389 | "colab": { 1390 | "base_uri": "https://localhost:8080/", 
1391 | "height": 170 1392 | }, 1393 | "outputId": "8dd5f560-e485-40b0-91da-b3e06281aaee" 1394 | }, 1395 | "source": [ 1396 | "from caserec.recommenders.rating_prediction.most_popular import MostPopular\n", 1397 | "\n", 1398 | "MostPopular('train.dat', 'test.dat', 'rp_mostPopular.dat').compute()" 1399 | ], 1400 | "execution_count": 48, 1401 | "outputs": [ 1402 | { 1403 | "output_type": "stream", 1404 | "text": [ 1405 | "[Case Recommender: Rating Prediction > Most Popular]\n", 1406 | "\n", 1407 | "train data:: 5036 users and 2581 items (34703 interactions) | sparsity:: 99.73%\n", 1408 | "test data:: 4508 users and 2493 items (17093 interactions) | sparsity:: 99.85%\n", 1409 | "\n", 1410 | "prediction_time:: 0.349664 sec\n", 1411 | "\n", 1412 | "\n", 1413 | "Eval:: MAE: 0.744015 RMSE: 1.005638 \n" 1414 | ], 1415 | "name": "stdout" 1416 | } 1417 | ] 1418 | }, 1419 | { 1420 | "cell_type": "code", 1421 | "metadata": { 1422 | "id": "n1EcaoRjA1Hi", 1423 | "colab_type": "code", 1424 | "colab": { 1425 | "base_uri": "https://localhost:8080/", 1426 | "height": 204 1427 | }, 1428 | "outputId": "8f69e77d-22e1-4987-e672-3dd2c7153a0e" 1429 | }, 1430 | "source": [ 1431 | "predictions = pd.read_csv('rp_mostPopular.dat', sep='\\t', names=['reviewerID', 'asin', 'rate'])\n", 1432 | "predictions['title'] = predictions.asin.map(asin_title)\n", 1433 | "predictions.head()" 1434 | ], 1435 | "execution_count": 49, 1436 | "outputs": [ 1437 | { 1438 | "output_type": "execute_result", 1439 | "data": { 1440 | "text/html": [ 1441 | "
" 1504 | ], 1505 | "text/plain": [ 1506 | " reviewerID asin rate title\n", 1507 | "0 0 2471 3.777778 Jagged Little Pill Acoustic\n", 1508 | "1 0 1978 4.583333 La Revancha Del Tango\n", 1509 | "2 0 0 4.875000 Memory of Trees\n", 1510 | "3 1 1272 4.333333 Ani Difranco\n", 1511 | "4 1 667 4.833333 For the Roses" 1512 | ] 1513 | }, 1514 | "metadata": { 1515 | "tags": [] 1516 | }, 1517 | "execution_count": 49 1518 | } 1519 | ] 1520 | }, 1521 | { 1522 | "cell_type": "markdown", 1523 | "metadata": { 1524 | "id": "jR5JDgESA1Hm", 1525 | "colab_type": "text" 1526 | }, 1527 | "source": [ 1528 | "### Ranking" 1529 | ] 1530 | }, 1531 | { 1532 | "cell_type": "code", 1533 | "metadata": { 1534 | "id": "M-rLU15PA1Hn", 1535 | "colab_type": "code", 1536 | "colab": { 1537 | "base_uri": "https://localhost:8080/", 1538 | "height": 187 1539 | }, 1540 | "outputId": "2e8dcc95-fe44-4d53-ae43-6fe10337f30c" 1541 | }, 1542 | "source": [ 1543 | "from caserec.recommenders.item_recommendation.most_popular import MostPopular\n", 1544 | "\n", 1545 | "MostPopular('train.dat', 'test.dat', 'rank_mostPopular.dat').compute(as_table=True, metrics=['NDCG'])" 1546 | ], 1547 | "execution_count": 50, 1548 | "outputs": [ 1549 | { 1550 | "output_type": "stream", 1551 | "text": [ 1552 | "[Case Recommender: Item Recommendation > Most Popular]\n", 1553 | "\n", 1554 | "train data:: 5036 users and 2581 items (34703 interactions) | sparsity:: 99.73%\n", 1555 | "test data:: 4508 users and 2493 items (17093 interactions) | sparsity:: 99.85%\n", 1556 | "\n", 1557 | "prediction_time:: 96.215000 sec\n", 1558 | "\n", 1559 | "\n", 1560 | "NDCG@1\tNDCG@3\tNDCG@5\tNDCG@10\t\n", 1561 | "0.019299\t0.041359\t0.051351\t0.065469\t\n" 1562 | ], 1563 | "name": "stdout" 1564 | } 1565 | ] 1566 | }, 1567 | { 1568 | "cell_type": "code", 1569 | "metadata": { 1570 | "id": "INpBAxt-A1Hq", 1571 | "colab_type": "code", 1572 | "colab": { 1573 | "base_uri": "https://localhost:8080/", 1574 | "height": 359 1575 | }, 1576 | "outputId": 
"e2fc6830-3931-4feb-8102-81b3a1e4b108" 1577 | }, 1578 | "source": [ 1579 | "ranking = pd.read_csv('rank_mostPopular.dat', sep='\\t', names=['reviewerID', 'asin', 'score'])\n", 1580 | "ranking['title'] = ranking.asin.map(asin_title)\n", 1581 | "ranking.head(10)" 1582 | ], 1583 | "execution_count": 51, 1584 | "outputs": [ 1585 | { 1586 | "output_type": "execute_result", 1587 | "data": { 1588 | "text/html": [ 1589 | "
" 1687 | ], 1688 | "text/plain": [ 1689 | " reviewerID asin score title\n", 1690 | "0 0 1770 596.0 The Marshall Mathers LP\n", 1691 | "1 0 2039 577.0 The Eminem Show [Limited Edition w/ Bonus DVD]\n", 1692 | "2 0 2133 551.0 Get Rich Or Die Tryin\n", 1693 | "3 0 169 511.0 All Eyez on Me\n", 1694 | "4 0 2212 510.0 Speakerboxxx/ The Love Below\n", 1695 | "5 0 992 509.0 Are You Experienced\n", 1696 | "6 0 2408 492.0 The Documentary\n", 1697 | "7 0 1665 480.0 Toxicity\n", 1698 | "8 0 1955 470.0 Blueprint\n", 1699 | "9 0 459 467.0 Thriller" 1700 | ] 1701 | }, 1702 | "metadata": { 1703 | "tags": [] 1704 | }, 1705 | "execution_count": 51 1706 | } 1707 | ] 1708 | }, 1709 | { 1710 | "cell_type": "code", 1711 | "metadata": { 1712 | "id": "5Y9tzYdEA1Hz", 1713 | "colab_type": "code", 1714 | "colab": { 1715 | "base_uri": "https://localhost:8080/", 1716 | "height": 173 1717 | }, 1718 | "outputId": "4426df3a-0392-43f6-b0ef-586ea1cfff57" 1719 | }, 1720 | "source": [ 1721 | "train[train.reviewerID == 0]" 1722 | ], 1723 | "execution_count": 52, 1724 | "outputs": [ 1725 | { 1726 | "output_type": "execute_result", 1727 | "data": { 1728 | "text/html": [ 1729 | "
" 1785 | ], 1786 | "text/plain": [ 1787 | " reviewerID asin overall title\n", 1788 | "18167 0 959 5 Ray of Light\n", 1789 | "19290 0 1011 5 Axis\n", 1790 | "27005 0 1523 5 Experience Hendrix\n", 1791 | "37443 0 2014 2 Come Away with Me" 1792 | ] 1793 | }, 1794 | "metadata": { 1795 | "tags": [] 1796 | }, 1797 | "execution_count": 52 1798 | } 1799 | ] 1800 | } 1801 | ] 1802 | } -------------------------------------------------------------------------------- /Processed Datasets/RetailrocketEcommerce/README.md: -------------------------------------------------------------------------------- 1 | SUMMARY & USAGE LICENSE 2 | ============================================= 3 | 4 | This dataset is available on Kaggle through the link: https://www.kaggle.com/retailrocket/ecommerce-dataset 5 | 6 | The data has been collected from a real-world ecommerce website by Retail Rocket (retailrocket.io). It is raw data, i.e. without any content transformations; however, all values are hashed for confidentiality reasons. The purpose of publishing it is to motivate research in the field of recommender systems with implicit feedback. 7 | 8 | The behaviour data, i.e. events like clicks, add to carts and transactions, represent interactions that were collected over a period of 4.5 months. A visitor can make three types of events, namely “view”, “addtocart” or “transaction”. In the original dataset, there are 2,756,101 events including 2,664,312 views, 69,332 add to carts and 22,457 transactions produced by 1,407,580 unique visitors. 9 | 10 | 11 | This data has been organized and cleaned up by Arthur Fortes [1], similar to the MovieLens 100k treatment [2]: 12 | all users with fewer than 10 interactions and all items with fewer than 40 interactions were removed, and the 13 | event types were separated into different files. 14 | 15 | Detailed descriptions of the data files can be found at the end of this file. 16 | 17 | This dataset consists of: 18 | * 92,490 interactions from 3,431 users on 8,885 items.
19 | - History: 3,423 users and 8,878 items (78,371 accesses) 20 | - Purchase: 824 users and 3,077 items (5,088 interactions) 21 | - Add to cart: 1,557 users and 4,447 items (9,028 interactions) 22 | 23 | If you have any further questions or comments, please contact me 24 | . 25 | 26 | 27 | ACKNOWLEDGEMENTS 28 | ============================================== 29 | 30 | Retail Rocket (retailrocket.io) helps web shoppers make better shopping decisions by providing personalized real-time recommendations through multiple channels, with over 100MM unique monthly users and 1000+ retail partners around the world. 31 | 32 | 33 | DETAILED DESCRIPTIONS OF DATA FILES 34 | ============================================== 35 | 36 | Here are brief descriptions of the data. 37 | 38 | view_ecommerce.dat 39 | 40 | The full history set: 78,371 accesses by 3,423 users on 8,878 items. 41 | Each user has accessed at least 10 items. Users and items are 42 | numbered consecutively from 1. The data is ordered by user IDs. 43 | This is a tab-separated list of 44 | visitorid | itemid | event 45 | 46 | 47 | add_to_cart_ecommerce.dat 48 | 49 | The full add-to-cart set: 9,028 interactions by 1,557 users on 4,447 items. 50 | Users and items are numbered consecutively from 1. 51 | The data is ordered by user IDs. This is a tab-separated list of 52 | visitorid | itemid | event 53 | 54 | 55 | purchase_ecommerce.dat 56 | 57 | The full purchase set: 5,088 interactions by 824 users on 3,077 items. 58 | Users and items are numbered consecutively from 1. The data is ordered by user IDs. 59 | This is a tab-separated list of 60 | visitorid | itemid | event 61 | 62 | REFERENCES 63 | ============================================== 64 | 65 | [1] Da Costa, Arthur Fortes. PhD candidate at the Institute of Mathematical and Computational Sciences, 66 | University of São Paulo. URL: https://arthurfortes.github.io/ 67 | 68 | 69 | [2] MovieLens 100K Dataset. Stable benchmark dataset. 100,000 ratings from 1000 users on 1700 movies.
70 | Released 4/1998. URL: https://grouplens.org/datasets/movielens/100k/ 71 | Generated by GroupLens [Department of Computer Science and Engineering at the University of Minnesota]. 72 | -------------------------------------------------------------------------------- /Processed Datasets/RetailrocketEcommerce/Retailrocket_Ecommerce.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/caserec/Datasets-for-Recommender-Systems/4180b4dc4103452c591a1718560d29bdf1f48540/Processed Datasets/RetailrocketEcommerce/Retailrocket_Ecommerce.zip -------------------------------------------------------------------------------- /Processed Datasets/Steam/README.md: -------------------------------------------------------------------------------- 1 | SUMMARY & USAGE LICENSE 2 | ============================================= 3 | 4 | This dataset is available on Kaggle through the link: https://www.kaggle.com/tamber/steam-video-games/data. 5 | 6 | * Context 7 | Steam is the world's most popular PC gaming hub, with over 6,000 games and a community of millions of gamers. With a massive collection that includes everything from AAA blockbusters to small indie titles, great discovery tools are a highly valuable asset for Steam. How can we make them better? 8 | 9 | * Content 10 | This dataset is a list of user behaviors, with columns: user-id, game-title, behavior-name, value. The behaviors included are 'purchase' and 'play'. The value indicates the degree to which the behavior was performed: in the case of 'purchase' the value is always 1, and in the case of 'play' the value represents the number of hours the user has played the game. 11 | 12 | * Acknowledgements 13 | This dataset is generated entirely from public Steam data, so we want to thank Steam for building such an awesome platform and community!
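The raw behavior list described above can be turned into separate purchase and play-hours sets with a few lines of Python. This is only a minimal sketch under stated assumptions, not the script that produced these files. It assumes the raw input is a header-less CSV with exactly the four columns user-id, game-title, behavior-name and value. It also assumes new user and game IDs are assigned consecutively from 1 in order of first appearance. The function names `split_behaviors` and `to_tsv` and the sample rows are hypothetical.

```python
import csv
import io


def split_behaviors(raw_csv):
    """Split raw behavior rows into purchase and play-hours sets.

    Assumptions (not taken from the original preprocessing script):
    - each row is: user-id, game-title, behavior-name, value (no header);
    - new user and game IDs are assigned consecutively from 1,
      in order of first appearance.
    """
    user_ids, game_ids = {}, {}
    purchases, plays = [], []
    for user, title, behavior, value in csv.reader(io.StringIO(raw_csv)):
        # setdefault keeps an existing ID, or assigns the next free one.
        u = user_ids.setdefault(user, len(user_ids) + 1)
        g = game_ids.setdefault(title, len(game_ids) + 1)
        if behavior == "purchase":
            purchases.append((u, g, 1))  # purchase value is always 1
        elif behavior == "play":
            plays.append((u, g, float(value)))  # value = hours played
    # Order rows by user ID (then by game ID).
    purchases.sort()
    plays.sort()
    return user_ids, game_ids, purchases, plays


def to_tsv(rows):
    """Format rows as tab-separated lines, one interaction per line."""
    return "\n".join("\t".join(str(x) for x in row) for row in rows)


# Hypothetical sample rows, for illustration only.
SAMPLE = """u100,Dota 2,purchase,1.0
u100,Dota 2,play,12.5
u200,Portal 2,purchase,1.0
u100,Portal 2,purchase,1.0
"""

users, games, purchases, plays = split_behaviors(SAMPLE)
print(to_tsv(purchases))  # User_ID, Game_ID, Purchase
print(to_tsv(plays))      # User_ID, Game_ID, Hours
```

Sorting the tuples puts the rows in user-ID order, which matches the ordering stated for the purchase and play-hours files. Writing `to_tsv(purchases)` to disk would give one `User_ID | Game_ID | Purchase` row per line.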
14 | 15 | 16 | This data has been organized by Arthur Fortes [1], who generated new IDs for users and items and separated the purchase and 17 | play-hours information into different files. 18 | 19 | Detailed descriptions of the data files can be found at the end of this file. 20 | 21 | This dataset consists of: 22 | * 200,000 interactions (play / purchase) from 12,393 users on 5,155 games. 23 | - Play Hours: 11,350 users and 3,600 games (70,490 interactions) 24 | - Purchase: 12,393 users and 5,155 games (129,512 interactions) 25 | 26 | If you have any further questions or comments, please contact me 27 | . 28 | 29 | 30 | DETAILED DESCRIPTIONS OF DATA FILES 31 | ============================================== 32 | 33 | Here are brief descriptions of the data. 34 | 35 | items_info.dat 36 | 37 | Information about the items (games); this is a tab-separated list of 38 | Game_ID | Game Name | 39 | The item IDs are the ones used in the game_purchase.dat 40 | and game_play.dat files. 41 | 42 | 43 | users_info.dat 44 | 45 | ID information about the users; this is a tab-separated list of 46 | New_ID | Real_ID | 47 | 48 | The user IDs are the ones used in the game_purchase.dat 49 | and game_play.dat files. 50 | 51 | 52 | game_purchase.dat 53 | 54 | The full purchase set: 129,512 interactions by 12,393 users on 5,155 games. 55 | Users and items are numbered consecutively from 1. The data is ordered by user IDs. 56 | This is a tab-separated list of 57 | User_ID | Game_ID | Purchase 58 | 59 | 60 | game_play.dat 61 | 62 | The full play-hours set: 70,490 interactions by 11,350 users on 3,600 games. 63 | Users and items are numbered consecutively from 1. The data is ordered by user IDs. 64 | This is a tab-separated list of 65 | User_ID | Game_ID | Hours 66 | 67 | 68 | REFERENCES 69 | ============================================== 70 | 71 | [1] Da Costa, Arthur Fortes. PhD candidate at the Institute of Mathematical and Computational Sciences, 72 | University of São Paulo.
URL: https://arthurfortes.github.io/
--------------------------------------------------------------------------------
/Processed Datasets/Steam/steam.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/caserec/Datasets-for-Recommender-Systems/4180b4dc4103452c591a1718560d29bdf1f48540/Processed Datasets/Steam/steam.zip
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Public Datasets For Recommender Systems
2 |
3 | This is a repository of topic-centric, high-quality public data sources for Recommender Systems (RS). They are collected and tidied from Stack Overflow, articles, recommender sites and academic experiments. Most of the datasets presented here are free and released under open-source licenses; however, some are not, and you need to ask permission to use them or cite the authors' work.
4 |
5 | > In addition, this repository contains some datasets pre-processed for academic experiments.
6 |
7 | ## Links and dataset descriptions
8 |
9 | ### Book
10 | - [Book Crossing](http://www2.informatik.uni-freiburg.de/~cziegler/BX/):: The BookCrossing (BX) dataset was collected by Cai-Nicolas Ziegler in a 4-week crawl (August/September 2004) from the Book-Crossing community.
11 |
12 | ### Dating
13 | - [Dating Agency](http://www.occamslab.com/petricek/data/):: This dataset contains 17,359,346 anonymous ratings of 168,791 profiles made by 135,359 LibimSeTi users as dumped on April 4, 2006.
14 |
15 | ### E-commerce
16 | - [Amazon](http://jmcauley.ucsd.edu/data/amazon/):: This dataset contains product reviews and metadata from Amazon, including 142.8 million reviews spanning May 1996 to July 2014.
17 | - [Retailrocket recommender system dataset](https://www.kaggle.com/retailrocket/ecommerce-dataset):: The dataset consists of three files: a file with behaviour data (events.csv), a file with item properties (item_properties.csv), and a file describing the category tree (category_tree.csv). The data has been collected from a real-world ecommerce website.
18 |
19 | ### Music
20 | - [Amazon Music](http://jmcauley.ucsd.edu/data/amazon/):: This digital music dataset contains reviews and metadata from Amazon.
21 | - [Yahoo Music](https://webscope.sandbox.yahoo.com/catalog.php?datatype=r):: This dataset represents a snapshot of the Yahoo! Music community's preferences for various musical artists.
22 | - [LastFM (Implicit)](https://grouplens.org/datasets/hetrec-2011/):: This dataset contains social networking, tagging, and music artist listening information from a set of 2K users of the Last.fm online music system.
23 | - [Million Song Dataset](https://labrosa.ee.columbia.edu/millionsong/):: The Million Song Dataset is a freely available collection of audio features and metadata for a million contemporary popular music tracks.
24 |
25 | ### Movies
26 | - [MovieLens](https://grouplens.org/datasets/movielens/):: GroupLens Research has collected and made available rating datasets from their movie website.
27 | - [Yahoo Movies](https://webscope.sandbox.yahoo.com/catalog.php?datatype=r):: This dataset contains a sample of the Yahoo! Movies community's ratings for various movies, together with descriptive information about the movies.
28 | - [CiaoDVD](https://drive.google.com/file/d/1w1FuVSQC9nqxcK5xj0Aw5Oxc1qV7d09A/view?usp=sharing):: CiaoDVD is a dataset crawled from the entire DVD category of the dvd.ciao.co.uk website in December 2013.
29 | - [FilmTrust](https://drive.google.com/file/d/1ohQ9oo8aaR7aWlpe56hXx66x-bwXxB56/view?usp=sharing):: FilmTrust is a small dataset crawled from the entire FilmTrust website in June 2011.
30 | - [Netflix](http://academictorrents.com/details/9b13183dc4d60676b773c9e2cd6de5e5542cee9a):: This is the official dataset used in the Netflix Prize competition.
31 |
32 | ### Games
33 |
34 | - [Steam Video Games](https://www.kaggle.com/tamber/steam-video-games/data):: This dataset is a list of user behaviors, with columns: user-id, game-title, behavior-name, value. The behaviors included are 'purchase' and 'play'. The value indicates the degree to which the behavior was performed: for 'purchase' the value is always 1, and for 'play' it is the number of hours the user has played the game.
35 |
36 | ### Jokes
37 | - [Jester](http://www.ieor.berkeley.edu/~goldberg/jester-data/):: This joke dataset contains 4.1 million continuous ratings (-10.00 to +10.00) of 100 jokes from 73,496 users.
38 |
39 | ### Food
40 | - [Chicago Entree](http://archive.ics.uci.edu/ml/datasets/Entree+Chicago+Recommendation+Data):: This dataset contains a record of user interactions with the Entree Chicago restaurant recommendation system.
41 |
42 | ### Anime
43 | - [Anime Recommendations Database](https://www.kaggle.com/CooperUnion/anime-recommendations-database):: This dataset contains user preference data from 73,516 users on 12,294 anime. Each user can add anime to their completed list and rate them; this dataset is a compilation of those ratings.
44 |
45 | ### Android Applications
46 |
47 | - [Myket Android Application Install Dataset](https://github.com/erfanloghmani/myket-android-application-market-dataset):: This dataset contains 694,121 application install interactions from 10,000 anonymous users and 7,988 Android applications.
48 |
49 | ### Other datasets
50 |
51 | You can find more datasets at:
52 |
53 | - GroupLens Datasets [link](https://grouplens.org/datasets)
54 | - LibRec Datasets [link](https://www.librec.net/datasets.html)
55 | - Yahoo Research [link](https://webscope.sandbox.yahoo.com/catalog.php?datatype=r)
56 | - Datasets for Machine Learning [link](https://gist.github.com/entaroadun/1653794)
57 | - Stanford Large Network Dataset Collection [link](https://snap.stanford.edu/data/)
58 |
59 | ## Usage and License
60 |
61 | Before using these datasets, please review their README files or websites for usage licenses, acknowledgments, and other details.
62 |
63 | `Note`: If you have difficulty downloading any of these datasets, please contact me. I have backups of all of them.
64 |
65 | ## Recommender Tools
66 |
67 | - [Case Recommender](https://github.com/caserec/CaseRecommender):: Python.
68 | - [MyMediaLite](http://www.mymedialite.net/):: C#.
69 |
70 | ## Contributors
71 |
72 | Arthur Fortes da Costa {fortes [dot] arthur [at] gmail [dot] com} [Editor]
73 |
74 |
75 |
--------------------------------------------------------------------------------