├── .gitignore
├── AUCs.png
├── LICENSE
├── README.md
├── Subreddit_Subreddit_TBPR.ipynb
├── bpr.py
├── data_exploration.ipynb
├── docs
│   ├── Poster.pdf
│   └── recommender_report.pdf
├── models.ipynb
├── models.py
├── preprocessing.ipynb
├── tbpr.ipynb
├── user_embeddings.py
├── utils.py
└── vanilla_tbpr.ipynb
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 |
6 | # C extensions
7 | *.so
8 |
9 | # Distribution / packaging
10 | .Python
11 | build/
12 | develop-eggs/
13 | dist/
14 | downloads/
15 | eggs/
16 | .eggs/
17 | lib/
18 | lib64/
19 | parts/
20 | sdist/
21 | var/
22 | wheels/
23 | *.egg-info/
24 | .installed.cfg
25 | *.egg
26 | MANIFEST
27 |
28 | # PyInstaller
29 | # Usually these files are written by a python script from a template
30 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
31 | *.manifest
32 | *.spec
33 |
34 | # Installer logs
35 | pip-log.txt
36 | pip-delete-this-directory.txt
37 |
38 | # Unit test / coverage reports
39 | htmlcov/
40 | .tox/
41 | .coverage
42 | .coverage.*
43 | .cache
44 | nosetests.xml
45 | coverage.xml
46 | *.cover
47 | .hypothesis/
48 | .pytest_cache/
49 |
50 | # Translations
51 | *.mo
52 | *.pot
53 |
54 | # Django stuff:
55 | *.log
56 | local_settings.py
57 | db.sqlite3
58 |
59 | # Flask stuff:
60 | instance/
61 | .webassets-cache
62 |
63 | # Scrapy stuff:
64 | .scrapy
65 |
66 | # Sphinx documentation
67 | docs/_build/
68 |
69 | # PyBuilder
70 | target/
71 |
72 | # Jupyter Notebook
73 | .ipynb_checkpoints
74 |
75 | # pyenv
76 | .python-version
77 |
78 | # celery beat schedule file
79 | celerybeat-schedule
80 |
81 | # SageMath parsed files
82 | *.sage.py
83 |
84 | # Environments
85 | .env
86 | .venv
87 | env/
88 | venv/
89 | ENV/
90 | env.bak/
91 | venv.bak/
92 |
93 | # Spyder project settings
94 | .spyderproject
95 | .spyproject
96 |
97 | # Rope project settings
98 | .ropeproject
99 |
100 | # mkdocs documentation
101 | /site
102 |
103 | # mypy
104 | .mypy_cache/
105 |
106 | # data
107 | data/
108 |
--------------------------------------------------------------------------------
/AUCs.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/abkds/r-ecommender/93e253e15f248112714bcf67b3f289890e55d58c/AUCs.png
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2019 Abhishek K Das
4 | Copyright (c) 2019 Sukanto Guha
5 | Copyright (c) 2019 Nikhil Bhat
6 | Copyright (c) 2019 Janvi Palan
7 |
8 |
9 | Permission is hereby granted, free of charge, to any person obtaining a copy
10 | of this software and associated documentation files (the "Software"), to deal
11 | in the Software without restriction, including without limitation the rights
12 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
13 | copies of the Software, and to permit persons to whom the Software is
14 | furnished to do so, subject to the following conditions:
15 |
16 | The above copyright notice and this permission notice shall be included in all
17 | copies or substantial portions of the Software.
18 |
19 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
20 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
21 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
22 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
23 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
24 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
25 | SOFTWARE.
26 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # r/ecommender
2 | A subreddit recommender system for users based on user and subreddit similarity using both implicit and explicit signals.
3 |
4 | Abhishek Das, Janvi Palan, Nikhil Bhat, Sukanto Guha
5 |
6 | ## Motivation
7 | Reddit is one of the biggest and most popular sites in the world, and we frequently use it to stay up to date on subjects that interest us. With _540 million_ users on Reddit, we feel there is a need for a robust recommender system tailor-made for Reddit users, so that they can discover new content and better meet their browsing goals, which differ from user to user.
8 | ## Dataset
9 | * Our [dataset](https://www.reddit.com/r/datasets/comments/65o7py/updated_reddit_comment_dataset_as_torrents/) consists of user comments on Reddit from January 2015. It contains 57 million comments from Reddit users. One notable property of the dataset is that it has more users than items (subreddits), which is unusual for Information Retrieval datasets **and also why algorithms designed for other datasets cannot be directly applied to Reddit.**
10 | * Pre-processing
11 | We settled on three approaches and created one dataset for each:
12 |
13 | #### Dataset with user-subreddit interactions for Collaborative Filtering and ALS
14 | 1. We removed comments shorter than 30 characters and user-subreddit interactions with fewer than 5 comments.
15 | 2. We removed bot accounts and comments marked as [deleted].
16 | 3. Final dataset size:
17 |    - Users = **735834**
18 |    - Subreddits = **14842**
19 |
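The Dataset 1 filters can be sketched as below. This is a minimal illustration, not the code in `preprocessing.ipynb`: it assumes comments arrive as hypothetical `(author, subreddit, body)` tuples, that bot accounts are supplied as a set, and it reads the thresholds as "drop comments shorter than 30 characters, then keep user-subreddit pairs with at least 5 remaining comments".

```python
from collections import Counter


def filter_interactions(comments, bots=frozenset(), min_chars=30, min_comments=5):
    """Return {(user, subreddit): comment_count} for pairs that survive the filters.

    `comments` is an iterable of (author, subreddit, body) tuples (assumed format).
    """
    counts = Counter()
    for author, subreddit, body in comments:
        if author in bots or body == "[deleted]":
            continue  # skip bot accounts and deleted comments
        if len(body) < min_chars:
            continue  # skip very short comments
        counts[(author, subreddit)] += 1
    # keep only user-subreddit pairs with enough substantial comments
    return {pair: n for pair, n in counts.items() if n >= min_comments}
```

The returned counts serve both later approaches: Approach 1 only needs the keys, while ALS-MF uses the counts themselves.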
20 | #### Dataset with user comments grouped by user
21 | 1. We removed comments shorter than 10 characters and user-subreddit interactions with fewer than 3 comments.
22 | 2. Final dataset size: **29 million comments** from the same users and subreddits
23 |
24 | #### Dataset with user comments grouped by subreddit
25 | 1. We removed comments shorter than 10 characters and user-subreddit interactions with fewer than 3 comments.
26 | 2. Final dataset size: **29 million comments** from the same users and subreddits
27 |
28 | For all three datasets, we cleaned the comment text by removing stopwords and punctuation and converting it to lowercase.
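
A stdlib-only sketch of this cleaning step follows. The notebooks use nltk's `word_tokenize` and English stopword list; here a regex tokenizer and a tiny hypothetical `STOP_WORDS` set stand in for them so the example runs without downloading nltk data, and text is lowercased before stopword removal.

```python
import re

# Hypothetical tiny stopword list; the notebooks use nltk's English stopwords instead.
STOP_WORDS = {"a", "an", "the", "is", "to", "and", "of", "i", "it"}


def clean_comment(body, stop_words=STOP_WORDS):
    """Lowercase a comment, split it into word tokens, and drop stopwords."""
    tokens = re.findall(r"\w+", body.lower())  # \w+ drops punctuation and newlines
    return [t for t in tokens if t not in stop_words]
```

For example, `clean_comment("The Avengers is GREAT, and I loved it!\n")` yields `["avengers", "great", "loved"]`.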
29 |
30 | ## Methodology
31 | Subreddit recommendation is an open problem within the broad field of recommender systems; we tried several methods and, finally, an ensemble approach to tackle it.
32 | #### Approach 1: Collaborative Filtering
33 |
34 | This approach uses Dataset 1. We ignore the actual words in a comment and treat only the fact that a user has commented on a subreddit as a signal that they like it. We do **not** consider how many times a user comments on a subreddit. The model is simple to implement, gives us a good baseline, and scales easily; its major drawback is that it falls behind the other approaches on our evaluation metrics.
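
One way to turn this binary signal into recommendations, sketched here as an assumption rather than the project's exact method, is item-based collaborative filtering: binarize the user x subreddit matrix, compute cosine similarity between subreddit columns, and score each unseen subreddit by its summed similarity to the subreddits the user already comments on. `recommend_item_knn` is a hypothetical helper.

```python
import numpy as np


def recommend_item_knn(interactions, user, k=3):
    """Score unseen subreddits for `user` by item-item cosine similarity.

    `interactions` is a users x subreddits matrix; any positive entry means "has commented".
    """
    X = (np.asarray(interactions) > 0).astype(float)  # ignore counts: binary signal only
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for subreddits nobody used
    sim = (X.T @ X) / np.outer(norms, norms)  # subreddit x subreddit cosine similarity
    np.fill_diagonal(sim, 0.0)
    scores = sim @ X[user]  # summed similarity to the user's subreddits
    scores[X[user] > 0] = -np.inf  # never recommend subreddits the user already has
    return [int(i) for i in np.argsort(-scores)[:k]]
```

Because only the binary matrix is used, a user with 50 comments on a subreddit counts the same as one with 5, which is exactly the limitation Approach 2 addresses.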
35 |
36 | #### Approach 2: Collaborative Filtering - Alternating Least Squares Matrix Factorization (ALS-MF)
37 | This approach uses Dataset 1. In ALS-MF, we hypothesize that the number of comments a user makes on a subreddit is a strong signal, not just the fact that they have commented. A user with 50 comments on a subreddit presumably finds it more relevant than someone with 5 comments.
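
A minimal numpy sketch of plain ALS on a dense user x subreddit count matrix: with the subreddit factors fixed, every user's factors solve a ridge regression in closed form, and vice versa. This is an illustration, not the project's code (the README lists the `implicit` library among its tools). It treats every cell, zeros included, as observed, whereas implicit-feedback ALS (Hu, Koren and Volinsky) additionally weights each cell by a confidence derived from the count.

```python
import numpy as np


def als(R, k=2, reg=0.1, iters=20, seed=0):
    """Factorize a dense user x subreddit count matrix R ~= U @ V.T with plain ALS."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))
    ridge = reg * np.eye(k)
    for _ in range(iters):
        # fix V: solve (V^T V + reg I) u = V^T r_u for all users at once
        U = np.linalg.solve(V.T @ V + ridge, V.T @ R.T).T
        # fix U: solve (U^T U + reg I) v = U^T r_i for all subreddits at once
        V = np.linalg.solve(U.T @ U + ridge, U.T @ R).T
    return U, V
```

The predicted preference of user `u` for subreddit `i` is then `U[u] @ V[i]`, and ranking a user's unseen subreddits by this value gives the recommendations.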
38 |
39 | #### Approach 3: Bayesian Personalized Ranking (BPR)
40 | This approach uses [BPR](https://arxiv.org/ftp/arxiv/papers/1205/1205.2618.pdf). BPR works with **(user, subreddit, subreddit)** triads. If User1 has commented on Subreddit1 but not on Subreddit2, then (User1, Subreddit1, Subreddit2) is a training triad stating that User1 prefers Subreddit1 over Subreddit2, and the model is trained so that its predicted score difference for this triad is positive. We build such triads from the user-subreddit interactions to train the recommender system.
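
A compact numpy sketch of BPR matrix factorization, trained by stochastic gradient ascent on $\ln \sigma(\hat{x}_{uij})$ with $\hat{x}_{uij} = \hat{x}_{ui} - \hat{x}_{uj}$ as in the BPR paper. Triads are sampled uniformly; `train_bpr` and its hyperparameters are illustrative assumptions, and the repository's own PyTorch module (`vBPR` in `bpr.py`) adds content factors on top of this.

```python
import numpy as np


def train_bpr(interactions, n_users, n_items, k=8, lr=0.05, reg=0.01, steps=20000, seed=0):
    """Learn user/subreddit factors with BPR-Opt via stochastic gradient ascent.

    `interactions` maps user index -> set of subreddit indices the user commented on.
    Every user must have at least one subreddit they have NOT commented on.
    """
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))
    users = sorted(interactions)
    for _ in range(steps):
        # sample a triad (u, i, j): i observed for u, j not observed
        u = users[rng.integers(len(users))]
        positives = interactions[u]
        i = rng.choice(sorted(positives))
        j = rng.integers(n_items)
        while j in positives:
            j = rng.integers(n_items)
        x_uij = U[u] @ (V[i] - V[j])
        g = 1.0 / (1.0 + np.exp(x_uij))  # d/dx ln sigmoid(x) = sigmoid(-x)
        u_old = U[u].copy()
        U[u] += lr * (g * (V[i] - V[j]) - reg * U[u])
        V[i] += lr * (g * u_old - reg * V[i])
        V[j] += lr * (-g * u_old - reg * V[j])
    return U, V
```

Sampling triads instead of enumerating every (user, observed, unobserved) combination keeps each epoch cheap, which matters with roughly 735k users and 15k subreddits.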
41 |
42 | #### Approach 4: Textual BPR (Vanilla t-BPR)
43 | This approach uses Datasets 2 and 3 and adapts the approach of [Visual BPR](https://arxiv.org/pdf/1510.01784.pdf). In that paper, visual embeddings are computed for each item in the Amazon dataset. We instead create textual embeddings for both subreddits and users by concatenating all comments made on a subreddit and all comments made by a user, respectively. Each concatenated list of comments is tagged, and embeddings are created with [gensim](https://radimrehurek.com/gensim/models/doc2vec.html) Doc2Vec. These embeddings are then used to find the k most similar subreddits to recommend to the user.
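
Once embeddings exist, the k most similar subreddits can be found by cosine similarity. The sketch below assumes the vectors are plain arrays (for example, copied out of a trained gensim `Doc2Vec` model) and that user and subreddit vectors live in one shared space; `top_k_similar` is a hypothetical helper, not gensim's own `most_similar`.

```python
import numpy as np


def top_k_similar(query_vec, names, vectors, k=2):
    """Return the k (name, cosine similarity) pairs closest to query_vec."""
    V = np.asarray(vectors, dtype=float)
    q = np.asarray(query_vec, dtype=float)
    # cosine similarity of every row of V with q (epsilon guards zero vectors)
    sims = V @ q / (np.linalg.norm(V, axis=1) * np.linalg.norm(q) + 1e-12)
    top = np.argsort(-sims)[:k]
    return [(names[i], float(sims[i])) for i in top]
```

To recommend for a user, pass that user's vector as `query_vec` and the subreddit vectors as `vectors`, then drop subreddits the user already comments on.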
44 |
45 | #### Approach 5: Textual BPR + Training (Learning t-BPR)
46 | This approach uses Datasets 2 and 3. Unlike Vanilla t-BPR, the user embeddings are learned by our model from the data instead of being computed with gensim. This model is also based on the Visual BPR paper and uses a deep CNN model to learn the lower-dimensional embedding for each user.
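
For reference, the Visual BPR preference predictor is written out below, with the CNN image feature $f_i$ replaced by a textual embedding of subreddit $i$ (our adaptation). Following the VBPR paper: $\beta_u, \beta_i$ are biases, $\gamma_u, \gamma_i$ latent factors, $\theta_u$ the user's content factors, $\mathbf{E}$ the projection applied to the embedding, and $\beta'$ a content bias. The `vBPR` module in `bpr.py` computes the same terms without the global offset $\alpha$ (which cancels in $\hat{x}_{uij}$), adds a bias and dropout inside the projection, and does not yet implement the regularizer.

$$
\hat{x}_{ui} = \alpha + \beta_u + \beta_i + \gamma_u^{\top}\gamma_i + \theta_u^{\top}(\mathbf{E} f_i) + \beta'^{\top} f_i
$$

$$
\hat{x}_{uij} = \hat{x}_{ui} - \hat{x}_{uj}, \qquad
\mathcal{L} = -\sum_{(u,i,j)} \ln \sigma(\hat{x}_{uij}) + \lambda_\Theta \lVert \Theta \rVert^2
$$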
47 |
48 | #### Approach 6: Ensemble - Putting it all together
49 | We found that combining different models, such as ALS with t-BPR, may give better results than any single model, since recommendations should ideally account for [serendipity, novelty and diversity](http://ir.ii.uam.es/rim3/publications/ddr11.pdf). Choosing the best combination of models is a work in progress and needs further insight into users' goals when browsing Reddit.
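
The README does not specify how models are combined. One simple option, assumed here purely for illustration, is a weighted average of each model's scores after min-max scaling them to a common range; `blend_scores` and the weights are hypothetical.

```python
import numpy as np


def blend_scores(score_lists, weights):
    """Weighted average of per-model scores after min-max scaling each to [0, 1]."""
    blended = np.zeros(len(score_lists[0]))
    for scores, w in zip(score_lists, weights):
        s = np.asarray(scores, dtype=float)
        span = s.max() - s.min()
        # a model that scores everything equally contributes nothing
        scaled = (s - s.min()) / span if span > 0 else np.zeros_like(s)
        blended += w * scaled
    return blended / sum(weights)
```

Scaling first matters because ALS scores, BPR score differences, and cosine similarities live on very different ranges; without it, one model would dominate the blend regardless of its weight.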
50 |
51 | ## Evaluation
52 | For evaluation, we split our data into a training set and a test set. We start from the list of subreddits associated with each user, remove 10% of them, and add those to the test set. Once training is complete, we test how many of the removed subreddits appear in our recommendations. We used the following metric to evaluate our models:
53 | #### Area-Under-the-Curve (AUC)
54 | This [evaluation criterion](https://en.wikipedia.org/wiki/Receiver_operating_characteristic#/Area_under_the_curve) gives the probability that a classifier ranks a randomly chosen positive instance higher than a randomly chosen negative one. Because our models are _comparison based_, AUC suits them well: it measures the fraction of pairwise comparisons the model gets right.
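
The split and the metric can be sketched as follows, under stated assumptions. `leave_out_split` (hypothetical) holds out `floor(10%)` of each user's subreddits, so users with fewer than 10 subreddits contribute nothing to the test set; the README does not say how rounding is handled. `user_auc` follows the per-user AUC of the Visual BPR paper, comparing each held-out subreddit $i$ against every subreddit $j$ that is in neither the user's training nor test set, and assumes a dense score vector with one entry per subreddit.

```python
import math
import random

import numpy as np


def leave_out_split(user_items, frac=0.1, seed=0):
    """Split each user's subreddit set into train/test, holding out floor(frac * n)."""
    rng = random.Random(seed)
    train, test = {}, {}
    for user, items in user_items.items():
        items = sorted(items)
        rng.shuffle(items)
        n_test = math.floor(len(items) * frac)
        test[user] = set(items[:n_test])
        train[user] = set(items[n_test:])
    return train, test


def user_auc(scores, test_items, train_items):
    """Fraction of (held-out i, unseen j) pairs where the model scores i above j."""
    pos = np.array(sorted(test_items), dtype=int)
    neg = np.array([j for j in range(len(scores))
                    if j not in test_items and j not in train_items], dtype=int)
    if len(pos) == 0 or len(neg) == 0:
        return None  # AUC is undefined for this user
    s = np.asarray(scores, dtype=float)
    return float((s[pos][:, None] > s[neg][None, :]).mean())


def mean_auc(all_scores, test, train):
    """Average user_auc over users for whom it is defined."""
    aucs = [user_auc(all_scores[u], test[u], train[u]) for u in test]
    aucs = [a for a in aucs if a is not None]
    return sum(aucs) / len(aucs)
```

A random ranking scores about 0.5 and a perfect one 1.0, so AUC values are directly comparable across the approaches above.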
55 |
56 |
57 |
58 |
59 |
60 | #### Technologies used - a brief list of libraries and languages
61 | * Python 3
62 | * Jupyter notebook
63 | * gensim
64 | * Google Colab
65 | * implicit
66 | * tqdm
67 | * scipy
68 | * nltk
69 |
70 | #### [arxiv link](https://arxiv.org/abs/1905.01263)
71 |
--------------------------------------------------------------------------------
/Subreddit_Subreddit_TBPR.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "nbformat": 4,
3 | "nbformat_minor": 0,
4 | "metadata": {
5 | "colab": {
6 | "name": "IR_1.ipynb",
7 | "version": "0.3.2",
8 | "provenance": []
9 | },
10 | "kernelspec": {
11 | "name": "python3",
12 | "display_name": "Python 3"
13 | },
14 | "accelerator": "GPU"
15 | },
16 | "cells": [
17 | {
18 | "metadata": {
19 | "id": "ldwuy95sdjcD",
20 | "colab_type": "code",
21 | "colab": {
22 | "base_uri": "https://localhost:8080/",
23 | "height": 34
24 | },
25 | "outputId": "06bfbc41-4e36-4b02-f3eb-9691a2a8db00"
26 | },
27 | "cell_type": "code",
28 | "source": [
29 | "#!pip install ijson\n",
30 | "import ijson\n",
31 | "import json\n",
32 | "import numpy as np\n",
33 | "from tqdm import tqdm_notebook\n",
34 | "from collections import defaultdict\n",
35 | "\n",
36 | "with open('user_comments_tbpr_1000000','r') as fp:\n",
37 | " master_dict = json.load(fp)"
38 | ],
39 | "execution_count": 9,
40 | "outputs": [
41 | {
42 | "output_type": "stream",
43 | "text": [
44 | "Requirement already satisfied: ijson in /usr/local/lib/python3.6/dist-packages (2.3)\n"
45 | ],
46 | "name": "stdout"
47 | }
48 | ]
49 | },
50 | {
51 | "metadata": {
52 | "id": "NJvEUi5lnbMG",
53 | "colab_type": "code",
54 | "colab": {}
55 | },
56 | "cell_type": "code",
57 | "source": [
58 | "import json\n",
59 | "from nltk.tokenize import RegexpTokenizer\n",
60 | "import string\n",
61 | "import nltk\n",
62 | "from nltk.corpus import stopwords\n",
63 | "from nltk import word_tokenize\n",
64 | "\n",
65 | "stop_words = set(stopwords.words('english'))\n",
66 | "count = 0\n",
67 | "tokenizer = RegexpTokenizer(r'\\w+')\n",
68 | "for e in master_dict:\n",
69 | " count+=1\n",
70 | " word_tokens = word_tokenize(master_dict[e]['body']) \n",
71 | " filtered_sentence = [w for w in word_tokens if w not in stop_words]\n",
72 | " filtered_sentence = [w for w in filtered_sentence if w != '\\n']\n",
73 | " filtered_sentence = [w for w in filtered_sentence if w not in string.punctuation]\n",
74 | " \n",
75 | " master_dict[e]['body'] = filtered_sentence\n",
76 | "#master_dict[e]['body']\n",
77 | "\n",
78 | "subreddit_comments = defaultdict(list)\n",
79 | "for key in master_dict:\n",
80 | " subreddit_comments[master_dict[key]['subreddit']].extend([x.lower() for x in master_dict[key]['body']]) \n",
81 | "subreddit_comments['exmormon']"
82 | ],
83 | "execution_count": 0,
84 | "outputs": []
85 | },
86 | {
87 | "metadata": {
88 | "id": "XKSZURdDnjkI",
89 | "colab_type": "code",
90 | "colab": {
91 | "base_uri": "https://localhost:8080/",
92 | "height": 366
93 | },
94 | "outputId": "7e3ef6ef-fffb-4967-e017-a1b0ce4da9ed"
95 | },
96 | "cell_type": "code",
97 | "source": [
98 | "from gensim.test.utils import common_texts\n",
99 | "from gensim.models import Word2Vec\n",
100 | "from gensim.models.doc2vec import Doc2Vec, TaggedDocument\n",
101 | "import nltk\n",
102 | "from nltk.tokenize import word_tokenize\n",
103 | "\n",
104 | "tagged_data = [TaggedDocument(words=(_d), tags=[str(i)]) for i, _d in subreddit_comments.items()]\n",
105 | "model = Doc2Vec(tagged_data, vector_size=50, window=2, min_count=1,epochs = 50, workers=4)"
106 | ],
107 | "execution_count": 91,
108 | "outputs": []
125 | },
126 | {
127 | "metadata": {
128 | "id": "h2vbh-OJoDjg",
129 | "colab_type": "code",
130 | "colab": {
131 | "base_uri": "https://localhost:8080/",
132 | "height": 394
133 | },
134 | "outputId": "65c73266-a836-4713-83ac-dc9ce69dc5dc"
135 | },
136 | "cell_type": "code",
137 | "source": [
138 | "print(len(model.wv.vocab))\n",
139 | "print(len(model.docvecs))"
140 | ],
141 | "execution_count": 19,
142 | "outputs": [
143 | {
144 | "output_type": "stream",
145 | "text": [
146 | "547882\n",
147 | "11314\n"
148 | ],
149 | "name": "stdout"
150 | }
163 | ]
164 | },
165 | {
166 | "metadata": {
167 | "id": "bcPLOkmLpa2H",
168 | "colab_type": "code",
169 | "colab": {}
170 | },
171 | "cell_type": "code",
172 | "source": [
173 | "model.save('subreddit_subreddit_bpr.model')"
174 | ],
175 | "execution_count": 0,
176 | "outputs": []
177 | },
178 | {
179 | "metadata": {
180 | "id": "eGITAgq_Afjl",
181 | "colab_type": "code",
182 | "colab": {
183 | "base_uri": "https://localhost:8080/",
184 | "height": 245
185 | },
186 | "outputId": "b36677ff-2491-4536-a432-072627478c06"
187 | },
188 | "cell_type": "code",
189 | "source": [
190 | "similar_doc = model.docvecs.most_similar('howyoudoin') \n",
191 | "similar_doc"
192 | ],
193 | "execution_count": 87,
194 | "outputs": [
195 | {
196 | "output_type": "stream",
197 | "text": [
198 | "/usr/local/lib/python3.6/dist-packages/gensim/matutils.py:737: FutureWarning: Conversion of the second argument of issubdtype from `int` to `np.signedinteger` is deprecated. In future, it will be treated as `np.int64 == np.dtype(int).type`.\n",
199 | " if np.issubdtype(vec.dtype, np.int):\n"
200 | ],
201 | "name": "stderr"
202 | },
203 | {
204 | "output_type": "execute_result",
205 | "data": {
206 | "text/plain": [
207 | "[('MST3K', 0.793683648109436),\n",
208 | " ('netflix', 0.7886813879013062),\n",
209 | " ('NetflixBestOf', 0.7679350972175598),\n",
210 | " ('eFreebies', 0.7671219110488892),\n",
211 | " ('MovieSuggestions', 0.7627629637718201),\n",
212 | " ('Fullmoviesonvimeo', 0.7585855722427368),\n",
213 | " ('overthegardenwall', 0.7394500970840454),\n",
214 | " ('arcadefire', 0.7392991781234741),\n",
215 | " ('karengillan', 0.7355697751045227),\n",
216 | " ('futurama', 0.7309762835502625)]"
217 | ]
218 | },
219 | "metadata": {
220 | "tags": []
221 | },
222 | "execution_count": 87
223 | }
224 | ]
225 | },
226 | {
227 | "metadata": {
228 | "id": "WVQbr19FCJnX",
229 | "colab_type": "code",
230 | "colab": {
231 | "base_uri": "https://localhost:8080/",
232 | "height": 245
233 | },
234 | "outputId": "227840e8-924a-4109-86f9-2e80d7f670aa"
235 | },
236 | "cell_type": "code",
237 | "source": [
238 | "result = model.docvecs.most_similar(positive=['TheRedPill', 'netflix'], negative=['TheBluePill'])\n",
239 | "result"
240 | ],
241 | "execution_count": 90,
242 | "outputs": [
243 | {
244 | "output_type": "stream",
245 | "text": [
246 | "/usr/local/lib/python3.6/dist-packages/gensim/matutils.py:737: FutureWarning: Conversion of the second argument of issubdtype from `int` to `np.signedinteger` is deprecated. In future, it will be treated as `np.int64 == np.dtype(int).type`.\n",
247 | " if np.issubdtype(vec.dtype, np.int):\n"
248 | ],
249 | "name": "stderr"
250 | },
251 | {
252 | "output_type": "execute_result",
253 | "data": {
254 | "text/plain": [
255 | "[('MST3K', 0.7942430377006531),\n",
256 | " ('eFreebies', 0.7533497214317322),\n",
257 | " ('AskUK', 0.7411583662033081),\n",
258 | " ('HighschoolDxD', 0.7250357866287231),\n",
259 | " ('bangalore', 0.7221680879592896),\n",
260 | " ('MovieSuggestions', 0.7219159603118896),\n",
261 | " ('Animesuggest', 0.7205169200897217),\n",
262 | " ('japan', 0.7200680375099182),\n",
263 | " ('creepy_gif', 0.7101523280143738),\n",
264 | " ('anime', 0.7031661868095398)]"
265 | ]
266 | },
267 | "metadata": {
268 | "tags": []
269 | },
270 | "execution_count": 90
271 | }
272 | ]
273 | }
287 | ]
288 | }
--------------------------------------------------------------------------------
/bpr.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import torch
3 | import torch.nn as nn
4 | import torch.nn.functional as F
5 | import torch.optim as optim
6 | from torch.nn.init import normal_
7 |
8 |
9 | def get_config():
10 |     mean = 0
11 |     stddev = 0.01
12 |     return mean, stddev
13 |
14 |
15 | mean, stddev = get_config()
16 |
17 |
18 | def Normal(tensor, mean=mean, stddev=stddev):
19 |     "Re initialize the tensor with normal weights and custom mean and stddev"
20 |     normal_(tensor, mean=mean, std=stddev)
21 |     return None
22 |
23 |
24 | class vBPR(nn.Module):
25 |     """
26 |     Creates a vBPR module, which learns the latent factors over
27 |     the user and item interactions.
28 |
29 |     For more details refer to the paper: https://arxiv.org/pdf/1510.01784.pdf
30 |     """
31 |
32 |     def __init__(self,
33 |                  num_latent_factors,
34 |                  num_visual_factors,
35 |                  num_embedding_factors,
36 |                  num_users,
37 |                  num_items,
38 |                  visual_features,
39 |                  dropout=0.1):
40 |         "Creates the weight matrices for storing factors"
41 |         super(vBPR, self).__init__()
42 |         self.K = num_latent_factors
43 |         self.D = num_visual_factors
44 |         self.F = num_embedding_factors
45 |
46 |         self.n_u = num_users
47 |         self.n_i = num_items
48 |
49 |         mean, stddev = get_config()
50 |         # declare latent factor matrices for users and items
51 |         self.U_latent_factors = nn.Parameter(torch.randn(self.n_u, self.K))
52 |         self.I_latent_factors = nn.Parameter(torch.randn(self.n_i, self.K))
53 |         Normal(self.U_latent_factors)
54 |         Normal(self.I_latent_factors)
55 |
56 |         # declare visual factor matrices for users
57 |         self.U_visual_factors = nn.Parameter(torch.randn(self.n_u, self.D))
58 |         Normal(self.U_visual_factors)
59 |
60 |         # embedding linear layer for projecting embedding to visual factors
61 |         self.embedding_projection = nn.Linear(self.F, self.D)
62 |         Normal(self.embedding_projection.weight)
63 |         Normal(self.embedding_projection.bias)
64 |
65 |         self.dropout = nn.Dropout(dropout)
66 |
67 |         # visual bias
68 |         self.beta_dash = nn.Parameter(torch.randn(1, self.F))
69 |         Normal(self.beta_dash)
70 |
71 |         # user bias and item bias
72 |         self.user_bias = nn.Parameter(torch.zeros(self.n_u))
73 |         self.item_bias = nn.Parameter(torch.zeros(self.n_i))
74 |         Normal(self.user_bias)
75 |         Normal(self.item_bias)
76 |
77 |         self.visual_features = visual_features
78 |
79 |         # TODO: include regularization
80 |
81 |     def get_xui(self, u_s, i_s):
82 |         "Get x_ui value for a bunch of user indices and item indices"
83 |         I_visual_factors = self.dropout(
84 |             self.embedding_projection(
85 |                 self.visual_features[i_s]))
86 |
87 |         return self.user_bias[u_s] + self.item_bias[i_s] + \
88 |             torch.bmm(
89 |                 self.U_latent_factors[u_s].unsqueeze(1),
90 |                 self.I_latent_factors[i_s].unsqueeze(2)
91 |             ).squeeze() + \
92 |             torch.bmm(
93 |                 self.U_visual_factors[u_s].unsqueeze(1),
94 |                 I_visual_factors.unsqueeze(2)
95 |             ).squeeze() + \
96 |             self.beta_dash.mm(
97 |                 self.visual_features[i_s].transpose(0, 1)
98 |             ).squeeze()
99 |
100 |     def forward(self, trg_batch):
101 |         """Calculate the preferences of user, i, j pairs.
102 |
103 |         Args:
104 |             trg_batch: LongTensor of shape [batch, 3] holding (user, i, j) indices
105 |         Returns:
106 |             A Tensor of shape [batch] holding x_ui - x_uj
107 |         """
108 |         user_indices = trg_batch[:, 0]
109 |         i_indices = trg_batch[:, 1]
110 |         j_indices = trg_batch[:, 2]
111 |         return self.get_xui(user_indices, i_indices) - \
112 |             self.get_xui(user_indices, j_indices)
113 |
114 |
115 | def BPRLoss(batch_xuij):
116 |     "Return the BPR loss, -sum(log(sigmoid(x_uij))), for a batch of x_uij predictions"
117 |     return -nn.functional.logsigmoid(batch_xuij).sum()
118 |
119 |
120 | def data_gen(train_data, batch_size=8):
121 |     """
122 |     Yields batches of training data with the given batch size
123 |
124 |     Args:
125 |         train_data : Interaction, which contains (user, i, count) tuples
126 |
127 |     Yields :
128 |         LongTensor of shape [batch_size, 3] holding (u, i, j) triples
129 |     """
130 |     interactions_dict = train_data.get_interaction_dict()
131 |     num_items = train_data.num_items
132 |     num_users = train_data.num_users
133 |
134 |     for user in interactions_dict:
135 |         for item in interactions_dict[user]:
136 |             X = np.zeros((batch_size, 3))
137 |             X[:, 0] = user
138 |             X[:, 1] = item
139 |             js = [np.random.randint(num_items) for _ in range(2 * batch_size)]
140 |             row_index = 0
141 |             for j in js:
142 |                 if j not in interactions_dict[user] and row_index < batch_size:
143 |                     X[row_index, 2] = j
144 |                     row_index += 1
145 |
146 |             # fill any remaining rows by repeating the first sampled negative
147 |             while row_index < batch_size:
148 |                 X[row_index, 2] = X[0, 2]
149 |                 row_index += 1
150 |
151 |             yield torch.as_tensor(X, dtype=torch.long)
152 |
153 |
154 | num_epochs = 1
155 |
156 |
157 | def train(
158 |         num_latent_factors,
159 |         num_visual_factors,
160 |         num_embedding_factors,
161 |         num_users,
162 |         num_items,
163 |         train_data_gen,
164 |         visual_features,
165 |         dropout=0.1):
166 |     "Trains a vBPR model over the training data and returns it"
167 |     model = vBPR(num_latent_factors=num_latent_factors,
168 |                  num_visual_factors=num_visual_factors,
169 |                  num_embedding_factors=num_embedding_factors,
170 |                  num_users=num_users,
171 |                  num_items=num_items,
172 |                  visual_features=visual_features,
173 |                  dropout=dropout)
174 |
175 |     # initialize variables
176 |     loss = 0
177 |     print_losses = []
178 |
179 |     # can try other optimizers here
180 |     optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
181 |
182 |     num_batches = 1000
183 |     i = 0
184 |     for epoch in range(num_epochs):
185 |         running_loss = 0.0
186 |         for trg_batch in train_data_gen:
187 |             if i >= num_batches:
188 |                 break
189 |
190 |             # zero the parameter gradients
191 |             optimizer.zero_grad()
192 |
193 |             # forward + backward + optimize
194 |             outputs = model(trg_batch)
195 |             loss = BPRLoss(outputs)
196 |             loss.backward()
197 |             optimizer.step()
198 |
199 |             # print statistics
200 |             running_loss += loss.item()
201 |
202 |             if i % 100 == 99:  # print every 100 mini-batches
203 |                 print('[%d, %5d] loss: %.3f' %
204 |                       (epoch + 1, i + 1, running_loss / 100))
205 |                 running_loss = 0.0
206 |
207 |             i += 1
208 |
209 |     print('Finished Training')
210 |     return model
211 |
--------------------------------------------------------------------------------
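The scoring in `vBPR.get_xui` and the objective in `BPRLoss` can be illustrated without PyTorch. The following is a minimal sketch, not the module itself. It scores one (user, item) pair at a time and omits dropout, as at evaluation time. Parameters are plain nested lists in a dict, and its key names (`U_latent`, `E`, `e`, and so on) are hypothetical stand-ins for the module's attributes. `E` (D x F) and `e` play the role of the weight and bias of `embedding_projection`.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def project(W, b, f):
    """Affine map W f + b, with W stored as D rows of length F (like nn.Linear(F, D))."""
    return [dot(row, f) + bias for row, bias in zip(W, b)]


def vbpr_score(u, i, params, features):
    """x_ui = beta_u + beta_i + <gamma_u, gamma_i> + <theta_u, E f_i + e> + <beta', f_i>."""
    f_i = features[i]
    theta_i = project(params["E"], params["e"], f_i)  # visual factors of item i
    return (params["user_bias"][u]
            + params["item_bias"][i]
            + dot(params["U_latent"][u], params["I_latent"][i])
            + dot(params["U_visual"][u], theta_i)
            + dot(params["beta_dash"], f_i))


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def bpr_loss(triples, params, features):
    """Sum of -log sigmoid(x_ui - x_uj) over (u, i, j) triples."""
    total = 0.0
    for u, i, j in triples:
        x_uij = vbpr_score(u, i, params, features) - vbpr_score(u, j, params, features)
        total -= log_sigmoid(x_uij)
    return total


# Tiny hand-checkable example: one user, two items, K = D = F = 1.
params = {
    "user_bias": [0.5],
    "item_bias": [0.25, 0.0],
    "U_latent": [[2.0]],
    "I_latent": [[3.0], [1.0]],
    "U_visual": [[1.0]],
    "E": [[2.0]],
    "e": [0.5],
    "beta_dash": [1.0],
}
features = [[1.0], [0.0]]

print(vbpr_score(0, 0, params, features))  # 10.25
print(vbpr_score(0, 1, params, features))  # 3.0
print(bpr_loss([(0, 0, 1)], params, features))
```

Minimizing this loss pushes x_ui above x_uj for every sampled triple. The PyTorch module computes the same terms for a whole batch at once with `torch.bmm`.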
/docs/Poster.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/abkds/r-ecommender/93e253e15f248112714bcf67b3f289890e55d58c/docs/Poster.pdf
--------------------------------------------------------------------------------
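The `data_gen` generator in bpr.py pairs every observed (user, item) interaction with randomly drawn items that the user has not interacted with. Below is a minimal stdlib sketch of that uniform negative sampling. It assumes interactions arrive as a dict from user id to a list of item ids in `range(num_items)`, a stand-in for `Interaction.get_interaction_dict()`. It also replaces the fixed pool of `2 * batch_size` candidates with rejection sampling, so every row receives a genuine negative. The function names are hypothetical.

```python
import random


def sample_negatives(positives, num_items, k, rng):
    """Draw k item ids uniformly (with replacement) from items not in `positives`.

    Assumes every id in `positives` lies in range(num_items).
    """
    if len(positives) >= num_items:
        raise ValueError("user has interacted with every item; no negatives exist")
    negatives = []
    while len(negatives) < k:
        j = rng.randrange(num_items)
        if j not in positives:  # reject items the user has interacted with
            negatives.append(j)
    return negatives


def bpr_batches(interactions, num_items, batch_size=8, seed=0):
    """Yield one batch of (u, i, j) triples per observed (u, i) pair."""
    rng = random.Random(seed)
    for u, items in interactions.items():
        positives = set(items)  # set gives O(1) membership tests
        for i in items:
            negatives = sample_negatives(positives, num_items, batch_size, rng)
            yield [(u, i, j) for j in negatives]


interactions = {0: [0, 1], 1: [2]}
for batch in bpr_batches(interactions, num_items=4, batch_size=3, seed=1):
    print(batch)
```

Rejection sampling stays cheap when each user touches only a small fraction of the items. In the `interactions_5` data loaded in models.ipynb, 1,087,036 interactions are spread over 285,103 users and 13,500 subreddits, which is fewer than four per user on average.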
/docs/recommender_report.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/abkds/r-ecommender/93e253e15f248112714bcf67b3f289890e55d58c/docs/recommender_report.pdf
--------------------------------------------------------------------------------
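models.ipynb evaluates with the AUC from the BPR paper (section 6.2) under a leave-one-out split, with one held-out subreddit per user. Below is a minimal exhaustive sketch of that formula. It assumes a generic `score(u, item)` callable rather than the `implicit` factors used in the notebook. It also takes a `known_items` dict holding each user's items in S_train ∪ S_test, which is what the notebook's `E_u` cell collects. Comparing against every other item is expensive at the notebook's scale, so sampling the j's is a possible shortcut; that is an assumption here, not something the notebook states.

```python
def auc(test_pairs, known_items, num_items, score):
    """Leave-one-out AUC following Rendle et al., BPR, section 6.2.

    test_pairs:  iterable of (u, i) with the held-out item i of user u
    known_items: dict u -> set of items of u in S_train or S_test
    score:       callable (u, item) -> predicted preference x_ui
    Users whose E(u) is empty are skipped.
    """
    total, num_users = 0.0, 0
    for u, i in test_pairs:
        negatives = [j for j in range(num_items) if j not in known_items[u]]
        if not negatives:
            continue  # E(u) is empty, so the inner average is undefined
        x_ui = score(u, i)
        # delta(x_ui > x_uj): count pairs where the held-out item ranks higher
        hits = sum(1 for j in negatives if x_ui > score(u, j))
        total += hits / len(negatives)
        num_users += 1
    return total / num_users if num_users else float("nan")


# Toy check: the score is the item id itself, so higher ids rank first.
test_pairs = [(0, 3), (1, 4)]
known_items = {0: {1, 3}, 1: {0, 4}}
print(auc(test_pairs, known_items, num_items=5, score=lambda u, item: item))
```

A perfect ranking gives 1.0, and random scores give values near 0.5.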
/models.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 1,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "import implicit\n",
10 | "import numpy as np\n",
11 | "from tqdm import tqdm_notebook\n",
12 | "import pandas as pd\n",
13 | "import csv \n",
14 | "import scipy\n",
15 | "from scipy.sparse import coo_matrix\n",
16 | "from scipy.sparse.linalg import svds\n",
17 | "from implicit.nearest_neighbours import bm25_weight\n",
18 | "from implicit import alternating_least_squares\n",
19 | "import umap"
20 | ]
21 | },
22 | {
23 | "cell_type": "code",
24 | "execution_count": 2,
25 | "metadata": {},
26 | "outputs": [],
27 | "source": [
28 | "data = []\n",
29 | "with open('final_interactions_count') as csvfile:\n",
30 | "    datareader = csv.reader(csvfile, delimiter=' ')\n",
31 | "    for subreddit, user, comments, _ in datareader:\n",
32 | "        data.append([user, subreddit, int(comments)])"
33 | ]
34 | },
35 | {
36 | "cell_type": "code",
37 | "execution_count": 3,
38 | "metadata": {},
39 | "outputs": [],
40 | "source": [
41 | "data = pd.DataFrame.from_records(data)"
42 | ]
43 | },
44 | {
45 | "cell_type": "code",
46 | "execution_count": 4,
47 | "metadata": {},
48 | "outputs": [],
49 | "source": [
50 | "data.columns = ['user', 'subreddit', 'comments']"
51 | ]
52 | },
53 | {
54 | "cell_type": "code",
55 | "execution_count": 5,
56 | "metadata": {},
57 | "outputs": [],
58 | "source": [
59 | "data['user'] = data['user'].astype(\"category\")\n",
60 | "data['subreddit'] = data['subreddit'].astype(\"category\")"
61 | ]
62 | },
63 | {
64 | "cell_type": "code",
65 | "execution_count": 6,
66 | "metadata": {},
67 | "outputs": [],
68 | "source": [
69 | "# create a sparse (subreddit x user) matrix of comment counts\n",
70 | "comments = coo_matrix((data['comments'].astype(float),\n",
71 | "                       (data['subreddit'].cat.codes,\n",
72 | "                        data['user'].cat.codes)))"
73 | ]
74 | },
75 | {
76 | "cell_type": "code",
77 | "execution_count": 7,
78 | "metadata": {},
79 | "outputs": [],
80 | "source": [
81 | "csr_comments = comments.tocsr()"
82 | ]
83 | },
84 | {
85 | "cell_type": "code",
86 | "execution_count": 8,
87 | "metadata": {},
88 | "outputs": [
89 | {
90 | "data": {
91 | "text/plain": [
92 | "1537767"
93 | ]
94 | },
95 | "execution_count": 8,
96 | "metadata": {},
97 | "output_type": "execute_result"
98 | }
99 | ],
100 | "source": [
101 | "csr_comments.nnz"
102 | ]
103 | },
104 | {
105 | "cell_type": "markdown",
106 | "metadata": {},
107 | "source": [
108 | "### Latent Semantic Analysis"
109 | ]
110 | },
111 | {
112 | "cell_type": "code",
113 | "execution_count": 9,
114 | "metadata": {},
115 | "outputs": [],
116 | "source": [
117 | "# set to True to load the precomputed ALS factors from disk instead of recomputing them\n",
118 | "read_als_factors_from_file = False"
119 | ]
120 | },
121 | {
122 | "cell_type": "code",
123 | "execution_count": 10,
124 | "metadata": {},
125 | "outputs": [
126 | {
127 | "name": "stderr",
128 | "output_type": "stream",
129 | "text": [
130 | "This method is deprecated. Please use the AlternatingLeastSquares class instead\n",
131 | "WARNING:root:OpenBLAS detected. Its highly recommend to set the environment variable 'export OPENBLAS_NUM_THREADS=1' to disable its internal multithreading\n",
132 | "100%|██████████| 15.0/15 [00:14<00:00, 1.29it/s]\n"
133 | ]
134 | }
135 | ],
136 | "source": [
137 | "if read_als_factors_from_file:\n",
138 | " subreddit_factors = np.load('subreddit_factors_als.npy')\n",
139 | " user_factors = np.load('user_factors_als.npy')\n",
140 | "else:\n",
141 | " subreddit_factors, user_factors = alternating_least_squares(bm25_weight(comments), 20)"
142 | ]
143 | },
144 | {
145 | "cell_type": "code",
146 | "execution_count": 11,
147 | "metadata": {},
148 | "outputs": [],
149 | "source": [
150 | "class TopRelated(object):\n",
151 | "    def __init__(self, subreddit_factors):\n",
152 | "        norms = np.linalg.norm(subreddit_factors, axis=-1)\n",
153 | "        self.factors = subreddit_factors / norms[:, np.newaxis]\n",
154 | "        self.subreddits = data['subreddit'].cat.categories.array.to_numpy()\n",
155 | "\n",
156 | "    def get_related(self, subreddit, N=10):\n",
157 | "        subredditid = np.where(self.subreddits == subreddit)[0][0]\n",
158 | "        scores = self.factors.dot(self.factors[subredditid])\n",
159 | "        best = np.argpartition(scores, -N)[-N:]\n",
160 | "        best_ = [self.subreddits[i] for i in best]\n",
161 | "        return sorted(zip(best_, scores[best]), key=lambda x: -x[1])"
162 | ]
163 | },
164 | {
165 | "cell_type": "code",
166 | "execution_count": 12,
167 | "metadata": {},
168 | "outputs": [],
169 | "source": [
170 | "top_related = TopRelated(subreddit_factors)"
171 | ]
172 | },
173 | {
174 | "cell_type": "code",
175 | "execution_count": 13,
176 | "metadata": {},
177 | "outputs": [
178 | {
179 | "data": {
180 | "text/plain": [
181 | "[('OnePiece', 0.99999994),\n",
182 | " ('Naruto', 0.9689143),\n",
183 | " ('allblue', 0.96665776),\n",
184 | " ('bleach', 0.96390015),\n",
185 | " ('Feels', 0.9556872),\n",
186 | " ('manga', 0.94707495),\n",
187 | " ('Collabcomics', 0.9412527),\n",
188 | " ('whowouldwin', 0.9372169),\n",
189 | " ('anime', 0.9361908),\n",
190 | " ('doujinshi', 0.93373656)]"
191 | ]
192 | },
193 | "execution_count": 13,
194 | "metadata": {},
195 | "output_type": "execute_result"
196 | }
197 | ],
198 | "source": [
199 | "top_related.get_related('OnePiece')"
200 | ]
201 | },
202 | {
203 | "cell_type": "code",
204 | "execution_count": 14,
205 | "metadata": {},
206 | "outputs": [
207 | {
208 | "data": {
209 | "text/plain": [
210 | "(14842, 20)"
211 | ]
212 | },
213 | "execution_count": 14,
214 | "metadata": {},
215 | "output_type": "execute_result"
216 | }
217 | ],
218 | "source": [
219 | "subreddit_factors.shape"
220 | ]
221 | },
222 | {
223 | "cell_type": "code",
224 | "execution_count": 15,
225 | "metadata": {},
226 | "outputs": [
227 | {
228 | "data": {
229 | "text/plain": [
230 | "(14842, 2)"
231 | ]
232 | },
233 | "execution_count": 15,
234 | "metadata": {},
235 | "output_type": "execute_result"
236 | }
237 | ],
238 | "source": [
239 | "subreddits_embedded = umap.UMAP().fit_transform(subreddit_factors)\n",
240 | "subreddits_embedded.shape"
241 | ]
242 | },
243 | {
244 | "cell_type": "code",
245 | "execution_count": 16,
246 | "metadata": {},
247 | "outputs": [
248 | {
249 | "data": {
250 | "text/plain": [
251 | "array([[1.0131747, 5.303012 ],\n",
252 | " [2.208454 , 3.8350732],\n",
253 | " [6.189175 , 1.7819154],\n",
254 | " ...,\n",
255 | " [6.3854785, 2.2662826],\n",
256 | " [7.953953 , 0.7833559],\n",
257 | " [2.5185633, 1.8170209]], dtype=float32)"
258 | ]
259 | },
260 | "execution_count": 16,
261 | "metadata": {},
262 | "output_type": "execute_result"
263 | }
264 | ],
265 | "source": [
266 | "subreddits_embedded"
267 | ]
268 | },
269 | {
270 | "cell_type": "code",
271 | "execution_count": 17,
272 | "metadata": {},
273 | "outputs": [],
274 | "source": [
275 | "subreddits = data['subreddit'].cat.categories.array.to_numpy()"
276 | ]
277 | },
278 | {
279 | "cell_type": "code",
280 | "execution_count": 18,
281 | "metadata": {},
282 | "outputs": [],
283 | "source": [
284 | "import random\n",
285 | "\n",
286 | "indices = random.sample(range(len(subreddits)), 1000)"
287 | ]
288 | },
289 | {
290 | "cell_type": "code",
291 | "execution_count": 19,
292 | "metadata": {},
293 | "outputs": [],
294 | "source": [
295 | "sampled_subreddits = subreddits[indices]\n",
296 | "sampled_subreddits_embedded = subreddits_embedded[indices]"
297 | ]
298 | },
299 | {
300 | "cell_type": "code",
301 | "execution_count": 20,
302 | "metadata": {},
303 | "outputs": [
304 | {
305 | "name": "stdout",
306 | "output_type": "stream",
307 | "text": [
308 | "High five! You successfully sent some data to your account on plotly. View your plot in your browser at https://plot.ly/~abkds/0 or inside your plot.ly account where it is named 'basic-scatter'\n"
309 | ]
310 | },
311 | {
312 | "data": {
313 | "text/html": [
314 | ""
315 | ],
316 | "text/plain": [
317 | ""
318 | ]
319 | },
320 | "execution_count": 20,
321 | "metadata": {},
322 | "output_type": "execute_result"
323 | }
324 | ],
325 | "source": [
326 | "import plotly\n",
327 | "import plotly.plotly as py\n",
328 | "import plotly.graph_objs as go\n",
329 | "\n",
330 | "plotly.tools.set_credentials_file(username='abkds', api_key='YOUR_PLOTLY_API_KEY')\n",
331 | "\n",
332 | "\n",
333 | "# plot the sampled subreddits at their 2-D UMAP coordinates\n",
334 | "import numpy as np\n",
335 | "\n",
336 | "N = 500\n",
337 | "xs = sampled_subreddits_embedded[:, 0]\n",
338 | "ys = sampled_subreddits_embedded[:, 1]\n",
339 | "\n",
340 | "# Create a trace\n",
341 | "trace = go.Scatter(\n",
342 | " x = xs,\n",
343 | " y = ys,\n",
344 | " mode='markers+text',\n",
345 | " text=sampled_subreddits\n",
346 | ")\n",
347 | "\n",
348 | "data_ = [trace]\n",
349 | "\n",
350 | "# Plot and embed in ipython notebook!\n",
351 | "py.iplot(data_, filename='basic-scatter')\n",
352 | "\n",
353 | "# or plot with: plot_url = py.plot(data_, filename='basic-scatter')"
354 | ]
355 | },
356 | {
357 | "cell_type": "markdown",
358 | "metadata": {},
359 | "source": [
360 | "### Bayesian Personalized Ranking"
361 | ]
362 | },
363 | {
364 | "cell_type": "code",
365 | "execution_count": 21,
366 | "metadata": {},
367 | "outputs": [],
368 | "source": [
369 | "from implicit.bpr import BayesianPersonalizedRanking\n",
370 | "\n",
371 | "params = {\"factors\": 63}"
372 | ]
373 | },
374 | {
375 | "cell_type": "code",
376 | "execution_count": 22,
377 | "metadata": {},
378 | "outputs": [],
379 | "source": [
380 | "import logging\n",
381 | "import tqdm\n",
382 | "import time\n",
383 | "import codecs"
384 | ]
385 | },
386 | {
387 | "cell_type": "code",
388 | "execution_count": 23,
389 | "metadata": {},
390 | "outputs": [],
391 | "source": [
392 | "model = BayesianPersonalizedRanking(**params)"
393 | ]
394 | },
395 | {
396 | "cell_type": "code",
397 | "execution_count": 24,
398 | "metadata": {},
399 | "outputs": [],
400 | "source": [
401 | "model_name = 'bpr'\n",
402 | "output_filename = 'subreddits_recs_bpr'"
403 | ]
404 | },
405 | {
406 | "cell_type": "code",
407 | "execution_count": 25,
408 | "metadata": {},
409 | "outputs": [
410 | {
411 | "name": "stderr",
412 | "output_type": "stream",
413 | "text": [
414 | "100%|██████████| 100/100 [00:33<00:00, 2.98it/s, correct=97.91%, skipped=4.55%]\n"
415 | ]
416 | }
417 | ],
418 | "source": [
419 | "model.fit(comments)"
420 | ]
421 | },
422 | {
423 | "cell_type": "code",
424 | "execution_count": 26,
425 | "metadata": {},
426 | "outputs": [],
427 | "source": [
428 | "def bpr_related_subreddits(subreddit):\n",
429 | "    found = np.where(subreddits == subreddit)\n",
430 | "    if len(found[0]) == 0:\n",
431 | "        raise ValueError(\"Subreddit doesn't exist in the dataset.\")\n",
432 | "    _id = found[0][0]\n",
433 | "    return [(subreddits[i], v) for i, v in model.similar_items(_id)]"
434 | ]
435 | },
436 | {
437 | "cell_type": "code",
438 | "execution_count": 27,
439 | "metadata": {},
440 | "outputs": [
441 | {
442 | "data": {
443 | "text/plain": [
444 | "[('dogs', 3.2827873),\n",
445 | " ('Dogtraining', 2.7844837),\n",
446 | " ('puppy101', 2.5864434),\n",
447 | " ('WiggleButts', 2.4180102),\n",
448 | " ('Paleo', 2.4142845),\n",
449 | " ('EatCheapAndHealthy', 2.3868768),\n",
450 | " ('germanshepherds', 2.3686364),\n",
451 | " ('AskVet', 2.3481765),\n",
452 | " ('Pets', 2.341671),\n",
453 | " ('TheGirlSurvivalGuide', 2.3099918)]"
454 | ]
455 | },
456 | "execution_count": 27,
457 | "metadata": {},
458 | "output_type": "execute_result"
459 | }
460 | ],
461 | "source": [
462 | "bpr_related_subreddits('dogs')"
463 | ]
464 | },
465 | {
466 | "cell_type": "code",
467 | "execution_count": 28,
468 | "metadata": {},
469 | "outputs": [],
470 | "source": [
471 | "users = data['user'].cat.categories.array.to_numpy()"
472 | ]
473 | },
474 | {
475 | "cell_type": "code",
476 | "execution_count": 31,
477 | "metadata": {},
478 | "outputs": [],
479 | "source": [
480 | "write_bpr_recommendations = False"
481 | ]
482 | },
483 | {
484 | "cell_type": "code",
485 | "execution_count": 32,
486 | "metadata": {},
487 | "outputs": [
488 | {
489 | "data": {
490 | "application/vnd.jupyter.widget-view+json": {
491 | "model_id": "9a1b4f7e9a33434ea78449a7c34eea2c",
492 | "version_major": 2,
493 | "version_minor": 0
494 | },
495 | "text/plain": [
496 | "HBox(children=(IntProgress(value=0, max=735834), HTML(value='')))"
497 | ]
498 | },
499 | "metadata": {},
500 | "output_type": "display_data"
501 | },
502 | {
503 | "name": "stdout",
504 | "output_type": "stream",
505 | "text": [
506 | "\n"
507 | ]
508 | }
509 | ],
510 | "source": [
511 | "user_comments = comments.T.tocsr()\n",
512 | "if write_bpr_recommendations:\n",
513 | "    # generate recommendations for each user and write out to a file\n",
514 | "    with tqdm.tqdm_notebook(total=len(users)) as progress:\n",
515 | "        with codecs.open(output_filename, \"w\", \"utf8\") as o:\n",
516 | "            for userid, username in enumerate(users):\n",
517 | "                for subredditid, score in model.recommend(userid, user_comments):\n",
518 | "                    o.write(\"%s\\t%s\\t%s\\n\" % (username, subreddits[subredditid], score))\n",
519 | "                progress.update(1)"
520 | ]
521 | },
522 | {
523 | "cell_type": "markdown",
524 | "metadata": {},
525 | "source": [
526 | "### Sample user recommendations\n",
527 | "\n",
528 | "We looked through the list of subreddits in which the user 'xkcd_transcriber' has commented. Judging by the kind of subreddits this user follows, the predictions look plausible. This is just one sample; we save the recommendations for all users to a file and will also write an AUC scoring function to measure the quality of the generated recommendations."
529 | ]
530 | },
531 | {
532 | "cell_type": "code",
533 | "execution_count": 33,
534 | "metadata": {},
535 | "outputs": [],
536 | "source": [
537 | "def recommend_for_user(username):\n",
538 | "    sample_user_id = np.where(users == username)[0][0]\n",
539 | "    return [(subreddits[i], v) for i, v in model.recommend(sample_user_id, user_comments)]"
540 | ]
541 | },
542 | {
543 | "cell_type": "code",
544 | "execution_count": 34,
545 | "metadata": {},
546 | "outputs": [
547 | {
548 | "data": {
549 | "text/plain": [
550 | "[('dwarffortress', 3.7335882),\n",
551 | " ('badphilosophy', 3.1564484),\n",
552 | " ('teslore', 3.1290364),\n",
553 | " ('Morrowind', 2.9986079),\n",
554 | " ('mountandblade', 2.9104137),\n",
555 | " ('Anarchism', 2.8900142),\n",
556 | " ('KerbalAcademy', 2.8693864),\n",
557 | " ('HPMOR', 2.8496687),\n",
558 | " ('shittykickstarters', 2.8411458),\n",
559 | " ('askphilosophy', 2.7882056)]"
560 | ]
561 | },
562 | "execution_count": 34,
563 | "metadata": {},
564 | "output_type": "execute_result"
565 | }
566 | ],
567 | "source": [
568 | "recommend_for_user('xkcd_transcriber')"
569 | ]
570 | },
571 | {
572 | "cell_type": "code",
573 | "execution_count": 35,
574 | "metadata": {},
575 | "outputs": [],
576 | "source": [
577 | "def subreddits_interacted_by_user(username):\n",
578 | "    sample_user_id = np.where(users == username)[0][0]\n",
579 | "    _idlist = comments.getcol(sample_user_id)\n",
580 | "    return [subreddits[idx] for idx, i in enumerate(_idlist.toarray()) if i != 0.0]"
581 | ]
582 | },
583 | {
584 | "cell_type": "code",
585 | "execution_count": 36,
586 | "metadata": {},
587 | "outputs": [
588 | {
589 | "data": {
590 | "text/plain": [
591 | "['Planetside',\n",
592 | " 'tifu',\n",
593 | " 'homestuck',\n",
594 | " 'Minecraft',\n",
595 | " 'worldbuilding',\n",
596 | " 'italy',\n",
597 | " 'AgainstGamerGate',\n",
598 | " 'firefly',\n",
599 | " 'tf2',\n",
600 | " 'MLPLounge',\n",
601 | " 'Futurology',\n",
602 | " 'sysadmin',\n",
603 | " 'GlobalOffensive',\n",
604 | " 'argentina',\n",
605 | " 'india',\n",
606 | " 'RandomActsOfGaming',\n",
607 | " 'books',\n",
608 | " 'worldnews',\n",
609 | " 'boardgames',\n",
610 | " 'CasualConversation']"
611 | ]
612 | },
613 | "execution_count": 36,
614 | "metadata": {},
615 | "output_type": "execute_result"
616 | }
617 | ],
618 | "source": [
619 | "# sample 20 of the subreddits in which xkcd_transcriber has commented\n",
620 | "\n",
621 | "random.sample(subreddits_interacted_by_user('xkcd_transcriber'), 20)"
622 | ]
623 | },
624 | {
625 | "cell_type": "code",
626 | "execution_count": 37,
627 | "metadata": {},
628 | "outputs": [
629 | {
630 | "data": {
631 | "application/vnd.jupyter.widget-view+json": {
632 | "model_id": "c90c3113717242259a721ee69ba28cd9",
633 | "version_major": 2,
634 | "version_minor": 0
635 | },
636 | "text/plain": [
637 | "HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))"
638 | ]
639 | },
640 | "metadata": {},
641 | "output_type": "display_data"
642 | },
643 | {
644 | "name": "stdout",
645 | "output_type": "stream",
646 | "text": [
647 | "\n"
648 | ]
649 | },
650 | {
651 | "data": {
652 | "application/vnd.jupyter.widget-view+json": {
653 | "model_id": "2f020d0befba451185505eb7808c9fe3",
654 | "version_major": 2,
655 | "version_minor": 0
656 | },
657 | "text/plain": [
658 | "HBox(children=(IntProgress(value=0, max=735834), HTML(value='')))"
659 | ]
660 | },
661 | "metadata": {},
662 | "output_type": "display_data"
663 | },
664 | {
665 | "name": "stdout",
666 | "output_type": "stream",
667 | "text": [
668 | "\n"
669 | ]
670 | }
671 | ],
672 | "source": [
673 | "from collections import defaultdict\n",
674 | "# set seed to get the same train and test set\n",
675 | "np.random.seed(42)\n",
676 | "\n",
677 | "filename = 'final_interactions_count'\n",
678 | "train_filename = 'interactions_5'\n",
679 | "\n",
680 | "def create_dataset():\n",
681 | "    data = defaultdict(lambda: [])\n",
682 | "    with open(filename) as csvfile:\n",
683 | "        datareader = csv.reader(csvfile, delimiter=' ')\n",
684 | "        for subreddit, user, comments, _ in tqdm.tqdm_notebook(datareader):\n",
685 | "            data[user].append((subreddit, comments))\n",
686 | "\n",
687 | "\n",
688 | "    f_train = open(train_filename, 'w')\n",
689 | "\n",
690 | "    for user, items in tqdm.tqdm_notebook(data.items()):\n",
691 | "        np.random.shuffle(items)\n",
692 | "        if len(items) >= 2:\n",
693 | "            for item in items:\n",
694 | "                line = ' '.join(list(map(str, [item[0], user, item[1]]))) + '\\n'\n",
695 | "                f_train.write(line)\n",
696 | "\n",
697 | "    f_train.close()\n",
698 | "\n",
699 | "create_dataset()"
700 | ]
701 | },
702 | {
703 | "cell_type": "code",
704 | "execution_count": 2,
705 | "metadata": {},
706 | "outputs": [],
707 | "source": [
708 | "data = []\n",
709 | "with open('interactions_5') as csvfile:\n",
710 | "    datareader = csv.reader(csvfile, delimiter=' ')\n",
711 | "    for subreddit, user, comments in datareader:\n",
712 | "        data.append([user, subreddit, 1])"
713 | ]
714 | },
715 | {
716 | "cell_type": "code",
717 | "execution_count": 3,
718 | "metadata": {},
719 | "outputs": [
720 | {
721 | "data": {
722 | "text/plain": [
723 | "1087036"
724 | ]
725 | },
726 | "execution_count": 3,
727 | "metadata": {},
728 | "output_type": "execute_result"
729 | }
730 | ],
731 | "source": [
732 | "len(data)"
733 | ]
734 | },
735 | {
736 | "cell_type": "code",
737 | "execution_count": 4,
738 | "metadata": {},
739 | "outputs": [],
740 | "source": [
741 | "data = pd.DataFrame.from_records(data)\n",
742 | "data.columns = ['user', 'subreddit', 'comments']\n",
743 | "\n",
744 | "data['user'] = data['user'].astype(\"category\")\n",
745 | "data['subreddit'] = data['subreddit'].astype(\"category\")"
746 | ]
747 | },
748 | {
749 | "cell_type": "code",
750 | "execution_count": 5,
751 | "metadata": {},
752 | "outputs": [],
753 | "source": [
754 | "# create a sparse (subreddit x user) matrix of binary interactions\n",
755 | "comments = coo_matrix((data['comments'].astype(float),\n",
756 | "                       (data['subreddit'].cat.codes,\n",
757 | "                        data['user'].cat.codes)))"
758 | ]
759 | },
760 | {
761 | "cell_type": "code",
762 | "execution_count": 6,
763 | "metadata": {},
764 | "outputs": [
765 | {
766 | "data": {
767 | "text/plain": [
768 | "<13500x285103 sparse matrix of type ''\n",
769 | "\twith 1087036 stored elements in COOrdinate format>"
770 | ]
771 | },
772 | "execution_count": 6,
773 | "metadata": {},
774 | "output_type": "execute_result"
775 | }
776 | ],
777 | "source": [
778 | "comments"
779 | ]
780 | },
781 | {
782 | "cell_type": "code",
783 | "execution_count": 7,
784 | "metadata": {},
785 | "outputs": [],
786 | "source": [
787 | "subreddits = data['subreddit'].cat.categories.array.to_numpy()\n",
788 | "users = data['user'].cat.categories.array.to_numpy()"
789 | ]
790 | },
791 | {
792 | "cell_type": "code",
793 | "execution_count": 8,
794 | "metadata": {
795 | "scrolled": true
796 | },
797 | "outputs": [
798 | {
799 | "name": "stdout",
800 | "output_type": "stream",
801 | "text": [
802 | "Number of users for BPR model: 285103\n",
803 | "Number of subreddits for BPR model: 13500\n"
804 | ]
805 | }
806 | ],
807 | "source": [
808 | "print('Number of users for BPR model: %s' % len(users))\n",
809 | "print('Number of subreddits for BPR model: %s' % len(subreddits))"
810 | ]
811 | },
812 | {
813 | "cell_type": "markdown",
814 | "metadata": {},
815 | "source": [
816 | "Create the index and the reverse index for the users and subreddits"
817 | ]
818 | },
819 | {
820 | "cell_type": "code",
821 | "execution_count": 9,
822 | "metadata": {},
823 | "outputs": [],
824 | "source": [
825 | "from utils import *"
826 | ]
827 | },
828 | {
829 | "cell_type": "code",
830 | "execution_count": 10,
831 | "metadata": {},
832 | "outputs": [],
833 | "source": [
834 | "subreddits_index = item_to_index(subreddits)\n",
835 | "users_index = item_to_index(users)"
836 | ]
837 | },
838 | {
839 | "cell_type": "markdown",
840 | "metadata": {},
841 | "source": [
842 | "### Extracting test set\n",
843 | "\n",
844 | "We extract the test set following the protocol of the paper [BPR: Bayesian Personalized Ranking from Implicit Feedback](https://arxiv.org/pdf/1205.2618.pdf), section 6.2: for every user, one randomly chosen interaction is removed from the training data and placed in the test set."
845 | ]
846 | },
847 | {
848 | "cell_type": "code",
849 | "execution_count": 11,
850 | "metadata": {},
851 | "outputs": [],
852 | "source": [
853 | "def train_test_split(coo_comments):\n",
854 | "    \"\"\"\n",
855 | "    For every user, holds out one random user-subreddit interaction: zeros it out\n",
856 | "    in the interaction matrix and appends it to the test list.\n",
857 | "    \"\"\"\n",
858 | "    csr_comments = coo_comments.tocsr()\n",
859 | "\n",
860 | "    data = defaultdict(lambda: [])\n",
861 | "    with open('interactions_5') as csvfile:\n",
862 | "        datareader = csv.reader(csvfile, delimiter=' ')\n",
863 | "        for subreddit, user, comments in tqdm.tqdm_notebook(datareader):\n",
864 | "            data[user].append((subreddit, comments))\n",
865 | "\n",
866 | "    train_set = []\n",
867 | "    test_set = []\n",
868 | "\n",
869 | "    for user, items in tqdm.tqdm_notebook(data.items()):\n",
870 | "        np.random.shuffle(items)\n",
871 | "        test_item = items[0]\n",
872 | "        test_comments = test_item[1]\n",
873 | "\n",
874 | "        test_subreddit = test_item[0]\n",
875 | "        # zero out a user item interaction\n",
876 | "        csr_comments[subreddits_index[test_subreddit], users_index[user]] = 0\n",
877 | "\n",
878 | "        test_set.append([test_subreddit, user, int(test_comments)])\n",
879 | "\n",
880 | "        for item in items[1:]:\n",
881 | "            train_set.append([item[0], user, int(item[1])])\n",
882 | "\n",
883 | "    csr_comments.eliminate_zeros()\n",
884 | "    return train_set, test_set, csr_comments.tocoo()"
885 | ]
886 | },
887 | {
888 | "cell_type": "code",
889 | "execution_count": 12,
890 | "metadata": {},
891 | "outputs": [
892 | {
893 | "data": {
894 | "application/vnd.jupyter.widget-view+json": {
895 | "model_id": "ce1c954c01244483ad796996b0a73575",
896 | "version_major": 2,
897 | "version_minor": 0
898 | },
899 | "text/plain": [
900 | "HBox(children=(IntProgress(value=1, bar_style='info', max=1), HTML(value='')))"
901 | ]
902 | },
903 | "metadata": {},
904 | "output_type": "display_data"
905 | },
906 | {
907 | "name": "stdout",
908 | "output_type": "stream",
909 | "text": [
910 | "\n"
911 | ]
912 | },
913 | {
914 | "data": {
915 | "application/vnd.jupyter.widget-view+json": {
916 | "model_id": "62de1ef131e741b7b99847896e07e97a",
917 | "version_major": 2,
918 | "version_minor": 0
919 | },
920 | "text/plain": [
921 | "HBox(children=(IntProgress(value=0, max=285103), HTML(value='')))"
922 | ]
923 | },
924 | "metadata": {},
925 | "output_type": "display_data"
926 | },
927 | {
928 | "name": "stdout",
929 | "output_type": "stream",
930 | "text": [
931 | "\n"
932 | ]
933 | }
934 | ],
935 | "source": [
936 | "train_set, test_set, comments = train_test_split(comments)"
937 | ]
938 | },
939 | {
940 | "cell_type": "code",
941 | "execution_count": 13,
942 | "metadata": {},
943 | "outputs": [
944 | {
945 | "data": {
946 | "text/plain": [
947 | "801933"
948 | ]
949 | },
950 | "execution_count": 13,
951 | "metadata": {},
952 | "output_type": "execute_result"
953 | }
954 | ],
955 | "source": [
956 | "len(train_set)"
957 | ]
958 | },
959 | {
960 | "cell_type": "code",
961 | "execution_count": 14,
962 | "metadata": {},
963 | "outputs": [
964 | {
965 | "data": {
966 | "text/plain": [
967 | "285103"
968 | ]
969 | },
970 | "execution_count": 14,
971 | "metadata": {},
972 | "output_type": "execute_result"
973 | }
974 | ],
975 | "source": [
976 | "len(test_set)"
977 | ]
978 | },
979 | {
980 | "cell_type": "markdown",
981 | "metadata": {},
982 | "source": [
983 | "### AUC Metric\n",
984 | "\n",
985 | "We implement the AUC metric to evaluate the BPR-based methods, using the definition given in the paper [BPR: Bayesian Personalized Ranking from Implicit Feedback](https://arxiv.org/pdf/1205.2618.pdf), section 6.2:\n",
986 | " \n",
987 | "$$AUC = \\frac{1}{| U |} \\sum_u \\frac{1}{|E(u)|} \\sum_{(i, j) \\in E(u)} \\delta(\\hat{x}_{ui} > \\hat{x}_{uj}) $$\n",
988 | "\n",
989 | "where $$E(u) := \\{(i, j) \\mid (u, i) \\in S_{test} \\wedge (u, j) \\notin (S_{test} \\cup S_{train})\\}$$ and $\\delta(b)$ is 1 if $b$ is true and 0 otherwise.\n",
990 | "\n"
991 | ]
992 | },
993 | {
994 | "cell_type": "code",
995 | "execution_count": 15,
996 | "metadata": {},
997 | "outputs": [
998 | {
999 | "data": {
1000 | "application/vnd.jupyter.widget-view+json": {
1001 | "model_id": "4f708fb827d74865b8860996d8c93387",
1002 | "version_major": 2,
1003 | "version_minor": 0
1004 | },
1005 | "text/plain": [
1006 | "HBox(children=(IntProgress(value=0, max=801933), HTML(value='')))"
1007 | ]
1008 | },
1009 | "metadata": {},
1010 | "output_type": "display_data"
1011 | },
1012 | {
1013 | "name": "stdout",
1014 | "output_type": "stream",
1015 | "text": [
1016 | "\n"
1017 | ]
1018 | },
1019 | {
1020 | "data": {
1021 | "application/vnd.jupyter.widget-view+json": {
1022 | "model_id": "b97ec8892ffd4f76b20f2d207b315a46",
1023 | "version_major": 2,
1024 | "version_minor": 0
1025 | },
1026 | "text/plain": [
1027 | "HBox(children=(IntProgress(value=0, max=285103), HTML(value='')))"
1028 | ]
1029 | },
1030 | "metadata": {},
1031 | "output_type": "display_data"
1032 | },
1033 | {
1034 | "name": "stdout",
1035 | "output_type": "stream",
1036 | "text": [
1037 | "\n"
1038 | ]
1039 | }
1040 | ],
1041 | "source": [
1042 | "# for each user id, store the ids of all subreddits seen in S_train or S_test; E(u) pairs the test item with every subreddit outside this set\n",
1043 | "E_u = defaultdict(lambda : set())\n",
1044 | "\n",
1045 | "for subreddit, user, _ in tqdm.tqdm_notebook(train_set):\n",
1046 | "    E_u[users_index[user]].add(subreddits_index[subreddit])\n",
1047 | "\n",
1048 | "for subreddit, user, _ in tqdm.tqdm_notebook(test_set):\n",
1049 | "    E_u[users_index[user]].add(subreddits_index[subreddit])"
1050 | ]
1051 | },
1052 | {
1053 | "cell_type": "code",
1054 | "execution_count": 28,
1055 | "metadata": {},
1056 | "outputs": [],
1057 | "source": [
1058 | "# train the bpr model \n",
1059 | "from implicit.bpr import BayesianPersonalizedRanking\n",
1060 | "\n",
1061 | "params = {\"factors\": 63}"
1062 | ]
1063 | },
1064 | {
1065 | "cell_type": "code",
1066 | "execution_count": 29,
1067 | "metadata": {},
1068 | "outputs": [],
1069 | "source": [
1070 | "model = BayesianPersonalizedRanking(**params)"
1071 | ]
1072 | },
1073 | {
1074 | "cell_type": "code",
1075 | "execution_count": 30,
1076 | "metadata": {},
1077 | "outputs": [
1078 | {
1079 | "data": {
1080 | "text/plain": [
1081 | "<13500x285103 sparse matrix of type ''\n",
1082 | "\twith 801933 stored elements in COOrdinate format>"
1083 | ]
1084 | },
1085 | "execution_count": 30,
1086 | "metadata": {},
1087 | "output_type": "execute_result"
1088 | }
1089 | ],
1090 | "source": [
1091 | "comments"
1092 | ]
1093 | },
1094 | {
1095 | "cell_type": "code",
1096 | "execution_count": 31,
1097 | "metadata": {},
1098 | "outputs": [
1099 | {
1100 | "name": "stderr",
1101 | "output_type": "stream",
1102 | "text": [
1103 | "100%|██████████| 100/100 [00:18<00:00, 5.82it/s, correct=96.78%, skipped=6.10%]\n"
1104 | ]
1105 | }
1106 | ],
1107 | "source": [
1108 | "model.fit(comments)"
1109 | ]
1110 | },
1111 | {
1112 | "cell_type": "code",
1113 | "execution_count": 32,
1114 | "metadata": {},
1115 | "outputs": [],
1116 | "source": [
1117 | "num_subreddits = len(subreddits)"
1118 | ]
1119 | },
1120 | {
1121 | "cell_type": "code",
1122 | "execution_count": 33,
1123 | "metadata": {},
1124 | "outputs": [],
1125 | "source": [
1126 | "from utils import *"
1127 | ]
1128 | },
1129 | {
1130 | "cell_type": "code",
1131 | "execution_count": 35,
1132 | "metadata": {},
1133 | "outputs": [],
1134 | "source": [
1135 | "def auc(test_set, user_factors, subreddit_factors, subreddits, users):\n",
1136 | "    \"\"\"\n",
1137 | "    Returns the AUC averaged over test (user, subreddit) pairs rather than per user.\n",
1138 | "    Relies on the globals users_index, subreddits_index, E_u and num_subreddits;\n",
1139 | "    the `subreddits` and `users` arguments are currently unused.\n",
1140 | "    \"\"\"\n",
1141 | "    num_pairs = len(test_set)\n",
1142 | "    total = 0\n",
1143 | "\n",
1144 | "    # implicit feedback: every observed pair is a positive, so the signal value is ignored\n",
1145 | "    for subreddit, user, signal in tqdm.tqdm_notebook(test_set):\n",
1146 | "        u = users_index[user]\n",
1147 | "        i = subreddits_index[subreddit]\n",
1148 | "\n",
1149 | "        x_ui = user_factors[u].dot(subreddit_factors[i])\n",
1150 | "\n",
1151 | "        # negatives: subreddits the user never commented on in train or test\n",
1152 | "        js = [j for j in range(num_subreddits) if j != i and j not in E_u[u]]\n",
1153 | "\n",
1154 | "        # fraction of negatives scored strictly below the positive (ties count as 0)\n",
1155 | "        x_uj = user_factors[u].dot(subreddit_factors[js].T)\n",
1156 | "        total += np.sum(np.heaviside(x_ui - x_uj, 0)) / len(js)\n",
1157 | "\n",
1158 | "    return total / num_pairs"
1159 | ]
1160 | },
1161 | {
1162 | "cell_type": "code",
1163 | "execution_count": 36,
1164 | "metadata": {},
1165 | "outputs": [
1166 | {
1167 | "data": {
1168 | "application/vnd.jupyter.widget-view+json": {
1169 | "model_id": "4d4bf4cd348a45d4bf37eb285da8e7ed",
1170 | "version_major": 2,
1171 | "version_minor": 0
1172 | },
1173 | "text/plain": [
1174 | "HBox(children=(IntProgress(value=0, max=10000), HTML(value='')))"
1175 | ]
1176 | },
1177 | "metadata": {},
1178 | "output_type": "display_data"
1179 | },
1180 | {
1181 | "name": "stdout",
1182 | "output_type": "stream",
1183 | "text": [
1184 | "\n"
1185 | ]
1186 | },
1187 | {
1188 | "data": {
1189 | "text/plain": [
1190 | "0.7154472624571377"
1191 | ]
1192 | },
1193 | "execution_count": 36,
1194 | "metadata": {},
1195 | "output_type": "execute_result"
1196 | }
1197 | ],
1198 | "source": [
1199 | "auc(test_set[:10000], model.user_factors, model.item_factors, subreddits, users)"
1200 | ]
1201 | },
1202 | {
1203 | "cell_type": "code",
1204 | "execution_count": 37,
1205 | "metadata": {},
1206 | "outputs": [],
1207 | "source": [
1208 | "def get_aucs_vs_factors():\n",
1209 | " factors = [8, 16, 32, 64, 128]\n",
1210 | " params_list = [{\"factors\": factor} for factor in factors]\n",
1211 | " \n",
1212 | " aucs = []\n",
1213 | " \n",
1214 | " for params in params_list:\n",
1215 | " model = BayesianPersonalizedRanking(**params)\n",
1216 | " model.fit(comments)\n",
1217 | " auc_ = auc(test_set[:10000], model.user_factors, model.item_factors, subreddits, users)\n",
1218 | " print(auc_)\n",
1219 | " aucs.append(auc_)\n",
1220 | " \n",
1221 | " return aucs"
1222 | ]
1223 | },
1224 | {
1225 | "cell_type": "code",
1226 | "execution_count": 38,
1227 | "metadata": {},
1228 | "outputs": [
1229 | {
1230 | "name": "stderr",
1231 | "output_type": "stream",
1232 | "text": [
1233 | "100%|██████████| 100/100 [00:16<00:00, 6.49it/s, correct=93.07%, skipped=6.06%]\n"
1234 | ]
1235 | },
1236 | {
1237 | "data": {
1238 | "application/vnd.jupyter.widget-view+json": {
1239 | "model_id": "10321dc325c14558b9bb47ce3b3377ca",
1240 | "version_major": 2,
1241 | "version_minor": 0
1242 | },
1243 | "text/plain": [
1244 | "HBox(children=(IntProgress(value=0, max=10000), HTML(value='')))"
1245 | ]
1246 | },
1247 | "metadata": {},
1248 | "output_type": "display_data"
1249 | },
1250 | {
1251 | "name": "stderr",
1252 | "output_type": "stream",
1253 | "text": [
1254 | "\r",
1255 | " 0%| | 0/100 [00:00, ?it/s]"
1256 | ]
1257 | },
1258 | {
1259 | "name": "stdout",
1260 | "output_type": "stream",
1261 | "text": [
1262 | "\n",
1263 | "0.719584496581462\n"
1264 | ]
1265 | },
1266 | {
1267 | "name": "stderr",
1268 | "output_type": "stream",
1269 | "text": [
1270 | "100%|██████████| 100/100 [00:14<00:00, 7.39it/s, correct=95.73%, skipped=6.08%]\n"
1271 | ]
1272 | },
1273 | {
1274 | "data": {
1275 | "application/vnd.jupyter.widget-view+json": {
1276 | "model_id": "f15673cb8b1543e0805de7dd67da5f05",
1277 | "version_major": 2,
1278 | "version_minor": 0
1279 | },
1280 | "text/plain": [
1281 | "HBox(children=(IntProgress(value=0, max=10000), HTML(value='')))"
1282 | ]
1283 | },
1284 | "metadata": {},
1285 | "output_type": "display_data"
1286 | },
1287 | {
1288 | "name": "stdout",
1289 | "output_type": "stream",
1290 | "text": [
1291 | "\n",
1292 | "0.7027533337412357\n"
1293 | ]
1294 | },
1295 | {
1296 | "name": "stderr",
1297 | "output_type": "stream",
1298 | "text": [
1299 | "100%|██████████| 100/100 [00:15<00:00, 6.66it/s, correct=96.84%, skipped=6.02%]\n"
1300 | ]
1301 | },
1302 | {
1303 | "data": {
1304 | "application/vnd.jupyter.widget-view+json": {
1305 | "model_id": "8090934de0814d1b8df30014b1aba2ea",
1306 | "version_major": 2,
1307 | "version_minor": 0
1308 | },
1309 | "text/plain": [
1310 | "HBox(children=(IntProgress(value=0, max=10000), HTML(value='')))"
1311 | ]
1312 | },
1313 | "metadata": {},
1314 | "output_type": "display_data"
1315 | },
1316 | {
1317 | "name": "stdout",
1318 | "output_type": "stream",
1319 | "text": [
1320 | "\n",
1321 | "0.7139457583443933\n"
1322 | ]
1323 | },
1324 | {
1325 | "name": "stderr",
1326 | "output_type": "stream",
1327 | "text": [
1328 | "100%|██████████| 100/100 [00:23<00:00, 5.04it/s, correct=96.82%, skipped=6.04%]\n"
1329 | ]
1330 | },
1331 | {
1332 | "data": {
1333 | "application/vnd.jupyter.widget-view+json": {
1334 | "model_id": "bcd9790b37a740689e4b88b89a4d0f03",
1335 | "version_major": 2,
1336 | "version_minor": 0
1337 | },
1338 | "text/plain": [
1339 | "HBox(children=(IntProgress(value=0, max=10000), HTML(value='')))"
1340 | ]
1341 | },
1342 | "metadata": {},
1343 | "output_type": "display_data"
1344 | },
1345 | {
1346 | "name": "stdout",
1347 | "output_type": "stream",
1348 | "text": [
1349 | "\n",
1350 | "0.7035268376229404\n"
1351 | ]
1352 | },
1353 | {
1354 | "name": "stderr",
1355 | "output_type": "stream",
1356 | "text": [
1357 | "100%|██████████| 100/100 [00:22<00:00, 4.60it/s, correct=96.55%, skipped=6.11%]\n"
1358 | ]
1359 | },
1360 | {
1361 | "data": {
1362 | "application/vnd.jupyter.widget-view+json": {
1363 | "model_id": "5f51b8a6481f4a478611ab4315bdb986",
1364 | "version_major": 2,
1365 | "version_minor": 0
1366 | },
1367 | "text/plain": [
1368 | "HBox(children=(IntProgress(value=0, max=10000), HTML(value='')))"
1369 | ]
1370 | },
1371 | "metadata": {},
1372 | "output_type": "display_data"
1373 | },
1374 | {
1375 | "name": "stdout",
1376 | "output_type": "stream",
1377 | "text": [
1378 | "\n",
1379 | "0.7166421893585617\n"
1380 | ]
1381 | }
1382 | ],
1383 | "source": [
1384 | "aucs_vs_factors = get_aucs_vs_factors()"
1385 | ]
1386 | },
1387 | {
1388 | "cell_type": "code",
1389 | "execution_count": 5,
1390 | "metadata": {},
1391 | "outputs": [
1392 | {
1393 | "data": {
1394 | "text/plain": [
1395 | "[0.7127681415290543,\n",
1396 | " 0.7169677827047857,\n",
1397 | " 0.7230466831483704,\n",
1398 | " 0.7053167273837033,\n",
1399 | " 0.7169114986739286]"
1400 | ]
1401 | },
1402 | "execution_count": 5,
1403 | "metadata": {},
1404 | "output_type": "execute_result"
1405 | }
1406 | ],
1407 | "source": [
1408 | "aucs_vs_factors"
1409 | ]
1410 | },
1443 | {
1444 | "cell_type": "code",
1445 | "execution_count": 22,
1446 | "metadata": {},
1447 | "outputs": [],
1448 | "source": [
1449 | "def get_aucs_vs_factors_als():\n",
1450 | " factors = [8, 16, 32, 64, 128]\n",
1451 | " \n",
1452 | " aucs = []\n",
1453 | " \n",
1454 | " for factor in factors:\n",
1455 | " subreddit_factors, user_factors = alternating_least_squares(bm25_weight(comments), factor)\n",
1456 | " auc_ = auc(test_set[:10000], user_factors, subreddit_factors, subreddits, users)\n",
1457 | " print(auc_)\n",
1458 | " aucs.append(auc_)\n",
1459 | " \n",
1460 | " return aucs"
1461 | ]
1462 | },
1463 | {
1464 | "cell_type": "code",
1465 | "execution_count": 25,
1466 | "metadata": {},
1467 | "outputs": [
1468 | {
1469 | "name": "stderr",
1470 | "output_type": "stream",
1471 | "text": [
1472 | "WARNING:implicit:This method is deprecated. Please use the AlternatingLeastSquares class instead\n",
1473 | "100%|██████████| 15.0/15 [00:04<00:00, 3.42it/s]\n"
1474 | ]
1475 | },
1476 | {
1477 | "data": {
1478 | "application/vnd.jupyter.widget-view+json": {
1479 | "model_id": "0eba320176bc47979d9ae18daed58015",
1480 | "version_major": 2,
1481 | "version_minor": 0
1482 | },
1483 | "text/plain": [
1484 | "HBox(children=(IntProgress(value=0, max=10000), HTML(value='')))"
1485 | ]
1486 | },
1487 | "metadata": {},
1488 | "output_type": "display_data"
1489 | },
1490 | {
1491 | "name": "stderr",
1492 | "output_type": "stream",
1493 | "text": [
1494 | "WARNING:implicit:This method is deprecated. Please use the AlternatingLeastSquares class instead\n",
1495 | " 0%| | 0/15 [00:00, ?it/s]"
1496 | ]
1497 | },
1498 | {
1499 | "name": "stdout",
1500 | "output_type": "stream",
1501 | "text": [
1502 | "\n",
1503 | "0.7532502962975772\n"
1504 | ]
1505 | },
1506 | {
1507 | "name": "stderr",
1508 | "output_type": "stream",
1509 | "text": [
1510 | "100%|██████████| 15.0/15 [00:05<00:00, 2.82it/s]\n"
1511 | ]
1512 | },
1513 | {
1514 | "data": {
1515 | "application/vnd.jupyter.widget-view+json": {
1516 | "model_id": "3c8e91cf11e14dbb8e1b87eb2c6a525b",
1517 | "version_major": 2,
1518 | "version_minor": 0
1519 | },
1520 | "text/plain": [
1521 | "HBox(children=(IntProgress(value=0, max=10000), HTML(value='')))"
1522 | ]
1523 | },
1524 | "metadata": {},
1525 | "output_type": "display_data"
1526 | },
1527 | {
1528 | "name": "stderr",
1529 | "output_type": "stream",
1530 | "text": [
1531 | "WARNING:implicit:This method is deprecated. Please use the AlternatingLeastSquares class instead\n"
1532 | ]
1533 | },
1534 | {
1535 | "name": "stdout",
1536 | "output_type": "stream",
1537 | "text": [
1538 | "\n",
1539 | "0.7646220234906469\n"
1540 | ]
1541 | },
1542 | {
1543 | "name": "stderr",
1544 | "output_type": "stream",
1545 | "text": [
1546 | "100%|██████████| 15.0/15 [00:06<00:00, 2.53it/s]\n"
1547 | ]
1548 | },
1549 | {
1550 | "data": {
1551 | "application/vnd.jupyter.widget-view+json": {
1552 | "model_id": "ab34abf4fe664a3bb6665f4f26775097",
1553 | "version_major": 2,
1554 | "version_minor": 0
1555 | },
1556 | "text/plain": [
1557 | "HBox(children=(IntProgress(value=0, max=10000), HTML(value='')))"
1558 | ]
1559 | },
1560 | "metadata": {},
1561 | "output_type": "display_data"
1562 | },
1563 | {
1564 | "name": "stderr",
1565 | "output_type": "stream",
1566 | "text": [
1567 | "WARNING:implicit:This method is deprecated. Please use the AlternatingLeastSquares class instead\n"
1568 | ]
1569 | },
1570 | {
1571 | "name": "stdout",
1572 | "output_type": "stream",
1573 | "text": [
1574 | "\n",
1575 | "0.7811194828486229\n"
1576 | ]
1577 | },
1578 | {
1579 | "name": "stderr",
1580 | "output_type": "stream",
1581 | "text": [
1582 | "100%|██████████| 15.0/15 [00:09<00:00, 1.83it/s]\n"
1583 | ]
1584 | },
1585 | {
1586 | "data": {
1587 | "application/vnd.jupyter.widget-view+json": {
1588 | "model_id": "88a4ffac893f48f4bd934ef18e3b4732",
1589 | "version_major": 2,
1590 | "version_minor": 0
1591 | },
1592 | "text/plain": [
1593 | "HBox(children=(IntProgress(value=0, max=10000), HTML(value='')))"
1594 | ]
1595 | },
1596 | "metadata": {},
1597 | "output_type": "display_data"
1598 | },
1599 | {
1600 | "name": "stderr",
1601 | "output_type": "stream",
1602 | "text": [
1603 | "WARNING:implicit:This method is deprecated. Please use the AlternatingLeastSquares class instead\n"
1604 | ]
1605 | },
1606 | {
1607 | "name": "stdout",
1608 | "output_type": "stream",
1609 | "text": [
1610 | "\n",
1611 | "0.7996316000860498\n"
1612 | ]
1613 | },
1614 | {
1615 | "name": "stderr",
1616 | "output_type": "stream",
1617 | "text": [
1618 | "100%|██████████| 15.0/15 [00:16<00:00, 1.00s/it]\n"
1619 | ]
1620 | },
1621 | {
1622 | "data": {
1623 | "application/vnd.jupyter.widget-view+json": {
1624 | "model_id": "fd86be3cbe3c447689c132393a3123ed",
1625 | "version_major": 2,
1626 | "version_minor": 0
1627 | },
1628 | "text/plain": [
1629 | "HBox(children=(IntProgress(value=0, max=10000), HTML(value='')))"
1630 | ]
1631 | },
1632 | "metadata": {},
1633 | "output_type": "display_data"
1634 | },
1635 | {
1636 | "name": "stdout",
1637 | "output_type": "stream",
1638 | "text": [
1639 | "\n",
1640 | "0.8152623487533477\n"
1641 | ]
1642 | }
1643 | ],
1644 | "source": [
1645 | "aucs_als = get_aucs_vs_factors_als()"
1646 | ]
1647 | },
1648 | {
1649 | "cell_type": "code",
1650 | "execution_count": 26,
1651 | "metadata": {},
1652 | "outputs": [
1653 | {
1654 | "data": {
1655 | "text/plain": [
1656 | "[0.7532502962975772,\n",
1657 | " 0.7646220234906469,\n",
1658 | " 0.7811194828486229,\n",
1659 | " 0.7996316000860498,\n",
1660 | " 0.8152623487533477]"
1661 | ]
1662 | },
1663 | "execution_count": 26,
1664 | "metadata": {},
1665 | "output_type": "execute_result"
1666 | }
1667 | ],
1668 | "source": [
1669 | "aucs_als"
1670 | ]
1671 | },
1672 | {
1673 | "cell_type": "code",
1674 | "execution_count": 40,
1675 | "metadata": {},
1676 | "outputs": [],
1677 | "source": [
1678 | "aucs_tbpr = [0.9005884623283035,\n",
1679 | " 0.9000449631079535,\n",
1680 | " 0.8960811226825817,\n",
1681 | " 0.8951720529233058,\n",
1682 | " 0.89860491001835]"
1683 | ]
1684 | },
1685 | {
1686 | "cell_type": "code",
1687 | "execution_count": 58,
1688 | "metadata": {
1689 | "scrolled": true
1690 | },
1691 | "outputs": [
1692 | {
1693 | "data": {
1694 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYsAAAEWCAYAAACXGLsWAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzt3Xl8VOW9+PHPN3sC2UgCKAmBIFpcg/LTWpfCVVErdbltrUvr0lbqVevFtrfVXzeFq9LW3mp/tbVo3Voqta5ovS5VUnFrwYKACMoWEhIgBAgJIet8f3+cM8OZySQzk8xk4/vmNS/Oec5znvPMOZPzPc9zNlFVjDHGmJ4kDXQFjDHGDH4WLIwxxkRkwcIYY0xEFiyMMcZEZMHCGGNMRBYsjDHGRGTBwgxrInKaiHwiIk0icvFA1yeUiFSIyDe6mTZBRFREUtzx/xWRq/u3hsY4LFgME+5OZ4+IpIdJ/0ZI2nQRqfaMi4jcLCJrRGS/iFSLyF9E5Lj+qn8CzQV+raojVfW5vhYmIo+KSJsbfHaLyGsi8qk41DMiVT1fVR9z63GNiLwVRV3/uz/qlkgiskVEDohIo4jsFZF3ROR6EYlq/xUadE3vWLAYBkRkAnAGoMCFvSjiPuA/gZuBUcCRwHPABfGpYWxEJDmOxZUCH/ayHt3tXH6mqiOBccA24Pe9rNshpw877M+rajbO9pwPfB9b7/3KgsXwcBXwHvAoEFM3hYhMBm4ELlfVN1S1VVWbVXWhqs7vZp5rRGSTe6S3WUSu9Ey7TkQ+cqetFZET3fQpbitnr4h8KCIXeuZ5VER+KyIvich+YIaIpIvIPSKyVUR2iMgDIpLp5i8UkRfdsnaLyNJwR5kishEoA15wWwLpInK4iCx259sgItd58t8uIk+JyB9FZB9wTU/rTlUPAE8C5SHL/Zq7DvaIyCsiUuqZdo6IrBORBhH5NSCeacnud94lIpsICdb+VqKITAEeAE51v9fenurpmf8vIrLdXfabInKMZ9qjInK/iPzV3Xb/EJFJ7rQuR+beFquITBKRN0Sk3q37QhHJ8+TdIiLfF5FVwH4R+S8ReTqkbv9PRO6N9B1UtUFVFwNfBq4WkWPd+S8QkRUisk9EqkTkds9sb7r/73XX16mR6mzCUFX7DPEPsAG4ATgJaAfGeKZVAN8IyT8dqHaHrwcqY1jWCGAfcJQ7fhhwjDv8JZwj7f+DsxM8AudIMNWt4/8F0oB/Axo9ZTwKNACn4RzAZAD3AotxWjrZwAvA3W7+u3F2lqnu5wxAuqnvFuBsz/jfgd+4yygH6oCz3Gm3u+vvYrcemWHKexT4b8+6+APwgWf6xe53nQKkAD8E3nGnFbrr7otuvW8BOvzbx90W64AS93svwWktpoRuS5xA9laEbRWoqzv+NXddprvrd2VI3t3AyW69FwKL3GkTvPUIU5cjgHPccotwds73hmyDle73ysT5zewH8tzpKcBO4KRotqEnfSvwH57f9HHudjse2AFc3EP9e6yzfbp+rGUxxInI6Tg75CdV9X1gI3BFDEUUALUxLtYHHCsimapaq6r+bp5v4HTRLFPHBlWtBD4NjATmq2qbqr4BvAhc7inzeVV9W1V9QCtwHXCLqu5W1UbgLuAyN287zg6nVFXbVXWpunuAnohICXA68H1VbVHVlcBDwFc92d5V1edU1adOyyGc77pH841ued75v4kT1D5S1Q633uVu6+JzwFpVfUpV23F22Ns9816Ks8OqUtXdOEExblT1YVVtVNVWnMB4gojkerI8o6r/dOu9kJAWUw/lblDV19RpldYB/wN8NiTbr9zvdUBVa3F2zl9yp50H7HJ/v7GowQmqqGqFqq52t9sq4IkwdYi1zsbDgsXQdzXwqqrucsf/RHBXVAfOUaxXKs4OF6AeZ8cbFVXdj9MFcD1Q63Zb+E/wluAEq1CHA1VuIPCrxOnz96v
yDBcBWcD7blfTXuBlNx3g5zhH76+63WG3Rln9wwF/8ImmHt25R1XzcI5YDwBHeaaVAvd56r0bp5U1zl1+oHw3wHmXd3jIeGUUdYmK28U1X0Q2ul1sW9xJhZ5s3sDVjBPgoyl7tIgsEpFtbtl/DCkXuq7Xx4CvuMNfwWmhxWoczvpFRE4RkSUiUiciDTi/z9A6xFpn42HBYghz+/AvBT7r9kVvx+naOEFETnCzbcXZqXlN5OCO6HWgWESmRbtcVX1FVc/BCTLrgAfdSVXApDCz1AAlIecVxuN0WQWK9QzvwtkJH6Oqee4nV52TyrhHx99R1TLg88C3ReSsKKpeA4wSkewo69EjVd2Kc2HAff7zKTjr4JueeuepaqaqvoPTgivxzy8i4h0Pne7WrdvFR1tP1xXARcDZQC4HfxPS3Qwe+93/szxpYz3Dd7v1OV5Vc3B2/qHlhtb3OeB495zDLJyWTNRE5P/gBAv/FWF/wum2LFHVXJxuSn8dwq2raOpsPCxYDG0XA53A0ThdBuU4feVLcU56A/wZuFZEThbHkTgBZRGAqn6C04f/hDiX1KaJSIaIXBbuiF1ExojIhSIyAqe7qMmtAzhdOt8VkZPcZR3hdr/8A2eH8z0RSRWR6Tg7+UXhvpTbAnkQ+KWIjHaXO05EznWHZ7llC845gE5PHbqlqlXAO8Dd7nc8Hvg6Me6oQsp8DScIzXaTHgBu8588FpFcEfF3t/wVOEZE/t09WXwzwTvdJ4GbRaRYRPKBnlpMO3CCfFqUVc3G2V71ODv9u6KcD7ebZhvwFbeF8jWCDwqycX4He0VkHPBfUZTZAjyFs5P/pxt4IxKRHBGZhfPb+aOqrvbUYbeqtojIyQR3xdbhdJ2W9aXOhzoLFkPb1cAjqrpVVbf7P8CvgStFJEVVX8HZ6TyCcxL5JZwugAWecm5257kf2IvTlXQJzknlUEnAd3B2kLtx+nlvAFDVvwB34uwAGnGOHkepahvOJb3n47QafgNcparrevhu38fpanrP7Sb4Gwe7eya7403Au8BvVLUi4tpyXI5zVF0DPAv8xN3h98XPcQJhuqo+C/wUWOTWew3O98btKvwSzqWf9e73eNtTzoPAK8AHwL+AZ3pY5hs4lwRvF5FdPeTzH1U/jtOa3Aasxbl6LhbX4exQ64FjcIKu3x3AiTi/r79GqLfXYzgnpaPpgnpBRBpxWm4/wDnHcK1n+g3AXDfPj3ECLwCq2ozzu3zb7R78dB/qfMiSKM4LGmOGIBF5BnhTVSNekjoQRGQ8TjfmWFXdN9D1MT2zloUxw5DbtXI6sHyg6xKOe/7q2ziX51qgGAISFixE5GER2Skia7qZLiLyK3FujFol7s1b7rSrxXmezydiz8IxJiYicgOwAqeLssdHggwE93zXPpz7HH4ywNUxUUpYN5SInInTp/y4qh4bZvrngG/hXHt+CnCfqp4iIqNwjoam4fS3vo9zs86ehFTUGGNMRAlrWajqm7jXQHfjIpxAoqr6HpAnIocB5wKvuTdj7QFew7lpxxhjzAAZyKcwjiP4Rp1qN6279C5EZDbuJYuZmZknlZSUhMs2KPh8PpKS7BRRtGx9DT62TWIzVNbXxx9/vEtViyLlG8hgEe4GGO0hvWui6gLcS0CnTZumy5cPynN5AFRUVDB9+vSBrsaQYetr8LFtEpuhsr5EJKonBQxk2Ksm+G7VYpxr37tLN8YYM0AGMlgsBq5yr4r6NNDgPmDsFWCmiOS7d7HOdNOMMcYMkIR1Q4nIEziPDS4U561sP8F9oJ2qPoBzJ/HncO7Sbca9G1NVd4vIPGCZW9Rc9wmcxhhjBkjCgoWqXh5huuK8dCfctIeBhxNRL2OM8Wpvb6e6upqWlpa4lpubm8tHH30U1zL7IiMjg+LiYlJTQx9CHR17J60x5pBWXV1NdnY2EyZMwHk2ZXw0NjaSnZ0dOWM/UFXq6+uprq5m4sSJvSpj8F/XZYwxCdTS0kJBQUFcA8VgIyI
UFBT0qfVkwcIYc8gbzoHCr6/f0YKFMcaYiCxYGGNMjJZsXsKEeyewZPOSuJSXnJxMeXk5J5xwAieeeCLvvOO8LmTLli1kZmZSXl7O0UcfzfXXX4/P5+uSftVVV9He3h5hKX1jwcIYY2KwZPMSZj0xi8qGSmY9MSsuASMzM5OVK1fywQcfcPfdd3PbbbcFpk2aNImVK1eyatUq1q5dy3PPPReUvnr1aqqrq3nyySe7Kz4uLFgYY0yU/IGiub0ZgOb25rgFDL99+/aRn5/fJT0lJYXPfOYzbNiwISg9OTmZk08+mW3btnWZJ57s0lljjHHNeXkOK7evDDttT8se1uxcg099QenN7c2c/YezOXb0seRnHNzJd3Z2Ot1LY8u597yeX1Z44MABysvLaWlpoba2ljfeeKNLnubmZl5//XXmzp0blN7S0sI//vEP7rvvvmi/Zq9Yy8IYY6Kwftf6LoHCz6c+1u9a3+uy/d1Q69at4+WXX+aqq67C/66hjRs3Ul5ezmmnncYFF1zA+eefH5ReUFDA+PHjOf7443u9/GhYy8IYY1w9tQBCu6C8slKzePHyF5kxcUYgrbc35Z166qns2rWLuro64OC5iVD+9NraWqZPn87ixYu58MILY15etKxlYYwxUZgxcQYvXv4iWalZQenhAkVfrFu3js7OTgoKCqLKf9hhhzF//nzuvvvuuCy/OxYsjDEmSqEBI16Bwn/Oory8nC9/+cs89thjJCcnRz3/xRdfTHNzM0uXLu1TPXpi3VDGGBMDf8C49vlreeSiR+LSoujs7AybPmHCBNasWRMxXUT44IMP+lyPnliwMMaYGM2YOIMtc7YMdDX6lXVDGWOMiciChTHGmIgsWBhjjInIgoUxxpiIEhosROQ8EVkvIhtE5NYw00tF5HURWSUiFSJS7JnWKSIr3c/iRNbTGGNMzxIWLEQkGbgfOB84GrhcRI4OyXYP8LiqHg/MBbx3lRxQ1XL3k7jbEo0xZhB49tlnERHWrVsHOI8nP/bYY7vke++99zjllFMoLy9nypQp3H777f1Sv0S2LE4GNqjqJlVtAxYBF4XkORp43R1eEma6McYMOrWNtXz20c+yvWl73Mp84oknOP3001m0aFGP+a6++moWLFjAypUrWbNmDZdeemnc6tCTRAaLcUCVZ7zaTfP6APiCO3wJkC0i/nvcM0RkuYi8JyIXJ7CexhgTk3lvzuOtrW8x7+/z4lJeU1MTb7/9Nr///e8jBoudO3dy2GGHAc7jyY8+OrTDJjESeVNeuBe+asj4d4Ffi8g1wJvANqDDnTZeVWtEpAx4Q0RWq+rGoAWIzAZmA4wZM4aKioo4Vj++mpqaBnX9BhtbX4PPcN0mubm5NDY2AvD9Jd9ndd3qHvO3dbaxvHY5Pnw8sPwBlm9bTlpyWpd8qoqIcFzRcfx0xk97LHPRokWcddZZHHbYYeTm5rJ06VLy8/Px+XyBuvndcMMNHHXUUZx++umcffbZXHHFFWRkZET1XVtaWnq9DRMZLKqBEs94MVDjzaCqNcC/A4jISOALqtrgmYaqbhKRCmAqsDFk/gXAAoBp06bp9OnTE/E94qKiooLBXL/BxtbX4DNct8lHH30UeDpsWlpaxGcyVe2tQt3jXkWpbqxmcsHkLvn877NIS0uL+PTZ5557jjlz5pCdnc2VV17J4sWLufHGG0lKSuoy75133snXvvY1Xn31VRYtWsSzzz4bdQDIyMhg6tSpUeUNlchgsQyYLCITcVoMlwFXeDOISCGwW1V9wG3Aw256PtCsqq1untOAnyWwrsYYE/ElRbWNtZT9qiwoWOxp2cOiLy5i7MixQXmjfUR5fX09b7zxBmvWrEFE6OzsRES44YYbup1n0qRJ/Md//AfXXXcdRUVF1NfXR/2U2t5K2DkLVe0AbgJeAT4CnlTVD0Vkroj4r26aDqwXkY+BMcCdbvoUYLmIfIBz4nu+qq5NVF2NMSYa896c1+UFSJ3a2adzF08
99RRXXXUVlZWVbNmyhaqqKiZOnEh1dXXY/H/9618DL0b65JNPSE5OJi8vr9fLj1ZCHySoqi8BL4Wk/dgz/BTwVJj53gGOS2TdjDEmVu9Wv0tbZ1tQWltnG+9Uv9PrMp944gluvTX4NrQvfOEL3HXXXaxfv57i4sDtZ/zyl7/k6aef5pZbbiErK4uUlBQWLlwY0+PMe8ueOmuMMVFa8c0VcS8z3PmGm2++mZtvvjls/i996Utxr0M07HEfxhhjIrJgYYwxJiILFsYYYyKyYGGMMSYiCxbGGGMismBhjDEmIgsWxhgzgPbu3ctvfvMbwHkseWZmJuXl5Zxwwgl85jOfYf369YBziW1ubi5Tp05lypQp3HHHHV3SP/WpT/Hd7343IfW0YGGMMTFYuHohE+6dQNIdSUy4dwILVy/sU3neYAHOozxWrlzJBx98wNVXX81dd90VmHbGGWewYsUKli9fzh//+Efef//9oPQVK1bw4osv8vbbb/epTuHYTXnGGBOlhasXMvuF2TS3NwNQ2VDJ7BdmA3DlcVf2qsxbb72VjRs3Ul5ezuTJwQ8k3LdvH/n5+V3mGTFiBCeddBIbN25k9OjRgXR/q2Tbtm29qktPLFgYY4xrzstzWLl9ZbfT36t+j9bO1qC05vZmvv7813nw/QeD0v1PnS0fW97jAwrnz5/PmjVrWLlyJVu2bGHKlCmUl5fT2NhIc3Mz//jHP7rMU19fz3vvvcePfvQj6urqAul79uzhk08+4cwzz4z2K0fNuqGMMSZKoYEiUnpv+LuhNm7cyL333svs2bMD05YuXcrUqVOZOXMmt956K8ccc0wg/fjjj2fs2LHMmjWLsWPHdld8r1nLwhhjXJEeUT7h3glUNlR2SS/NLaXimoqgtGgfUd6TCy+8kGuvvTYwfsYZZ/Diiy92yedP//jjjzn99NO55JJLKC8v79OyQ1nLwhhjonTnWXeSlZoVlJaVmsWdZ93ZzRyRZWdnd3kbnt9bb73FpEmToi7ryCOP5LbbbuOnP+35zXy9YS0LY4yJkv8k9g9e/wFbG7YyPnc8d551Z69PbgMUFBRw2mmnceyxxzJlypTAyW5VJS0tjYceeiim8q6//nruueceNm/ezMSJE3tdr1AWLIwxJgZXHndln4JDOH/6058i5pk+fXrY19qGpmdmZibkaijrhjLGGBORBQtjjDERHfLBIt53Yxpjhh7/O62Hs75+x4QGCxE5T0TWi8gGEbk1zPRSEXldRFaJSIWIFHumXS0in7ifqxNRP//dmJUNlSgauBvTAkb3LLia4SYjI4P6+vohGTDqm+tZtWMVy2uWs2rHKuqb68PmU1Xq6+vJyMjo9bISdoJbRJKB+4FzgGpgmYgsVtW1nmz3AI+r6mMi8m/A3cBXRWQU8BNgGqDA++68e+JZxx+8/oPAbft+ze3NzHl5DmlJaYgIgvi/D4L0+v8P9nyAbtY+lzOQ/z+19inmvDyHAx0HgPg86sCYgVZcXEx1dXXQndDx0NLS0qedcyT72/ZTfyA4yNVKLQWZBYxIG9Elf0ZGBsXFxV3So5XIq6FOBjao6iYAEVkEXAR4g8XRwC3u8BLgOXf4XOA1Vd3tzvsacB7wRDwruLVha9j0Xc27uPSpS+O5KMeq+Bc50Jrbm/nqM1/l5v+9mYyUDNKT00lPSSc9Od0Zd4fTU9IPTg8dDxlOT05n045N7Fq7q9vp4cpOSRp8F/ctXL0wrpdZDncDsb5SU1PjeompX0VFBVOnTu3VvKpKp3bS1tlGa0crbZ1tgU9rpzP++YWfZ3vT9i7zluaWsmXOlj7WvqtE/nWNA6o849XAKSF5PgC+ANwHXAJki0hBN/OOC12AiMwGZgOMGTOGioqKmCo4On00O1p3dEkvSCvg58f/PBCx1f0HzkbszXjzgWYyMzN7PX9gHA2k9ff4/RvvD7seFeXM/DNp87XR7munTdto72inra2NBl8D7b522rX94HRfW9B4h3aE30D
retp6XSWRRGpSKqlJqaQlpZEq7v+e8aDpSamkSVrQuH+ewLgnLVy5/rTQ8lMlldd3vs49H99Dq895FERlQyVff+7rfLT2I84ec3ZsX24QaGpqivlvLBZ/2/G3fltfndpJh6+Ddm0P/Ab9/7f52nqe5hlv9wXnadf2wLzNbc3M+2geHb6OiOWGztvuaw/8DcZqa8PWhGynRAYLCZMW+u2/C/xaRK4B3gS2AR1RzouqLgAWAEybNk3DXYPck18U/CLoCZLg3I1536z74n40U1FREfYa6aHkxXtf7PZRB89e92yvy/Wpj7bONlo6WmjtaKW1s5U3336TE046gdbO1kCad7p3uLXDHQ/N2828rR2tNHU0OeNhpidSq6+V+evns3D7QpIkCUFIkiRnWDzDMaTHo4wu6WHy1NbWUjyuOGHL/82W3wQChXd9/Xrzr9FCDRxRhx5hB8Y9R+A9TWvrbKNTO+O+bVOTUklLTiM9JZ205DS0XckekU16cjppKWmkJaeRlZJFWnJa4JOenN7zeEp6t9O++eI3qWvu2nU2Pnd8QvY1iQwW1UCJZ7wYqPFmUNUa4N8BRGQk8AVVbRCRamB6yLwV8a5gIu7GHM7uPOvOsMG1L486AEiSJDJSMshIOdi/W5xVzHFjjutTub2hqrT72qMLRhGmz3tzXthldGonZ5aeiU99+NSHqh4cRnuV3u5rD6T3toxI6a1trSTvTg6bv7u0eGhobeBn7/ws7E7TuzNNS04jIyWDnPSc4OlJ4fPGsmPuad70lHRSk1IRCT7GTfQBYnNHc0L+HruTyGCxDJgsIhNxWgyXAVd4M4hIIbBbVX3AbcDD7qRXgLtExP8g95nu9LhLxN2Yw9WhEFxFJLATIL1vZT3+wePdtsQeu/ixvhU+AHqz84slKJ3wwAlU76vuUsb43PFUzum6Hg91/f33mLBgoaodInITzo4/GXhYVT8UkbnAclVdjNN6uFtEFKcb6kZ33t0iMg8n4ADM9Z/sNgPLgmv0EtUSG0pEhGRJJpnkiHnnnz0/7Pq666y7epjr0Naff48JvXxEVV8CXgpJ+7Fn+CngqW7mfZiDLQ1jhpxDoSUWT7a+BrfBd62hMcOItcRiY+tr8DrkH/dhjDEmMgsWxhhjIrJgYYwxJiILFsYYYyKyYGGMMSYiCxbGGGMismBhjDEmIgsWxhhjIrJgYYwxJiILFsYYYyKyYGGMMSYiCxbGGGMismBhjDEmIgsWxhhjIrJgYYwxJiILFsYYYyKyYGGMMSaihAYLETlPRNaLyAYRuTXM9PEiskREVojIKhH5nJs+QUQOiMhK9/NAIutpjDGmZwl7raqIJAP3A+cA1cAyEVmsqms92X4IPKmqvxWRo3He1z3BnbZRVcsTVT9jjDHRS2TL4mRgg6puUtU2YBFwUUgeBXLc4VygJoH1McYY00sJa1kA44Aqz3g1cEpIntuBV0XkW8AI4GzPtIkisgLYB/xQVZeGLkBEZgOzAcaMGUNFRUXcKh9vTU1Ng7p+g42tr8HHtklshtv6SmSwkDBpGjJ+OfCoqv5CRE4F/iAixwK1wHhVrReRk4DnROQYVd0XVJjqAmABwLRp03T69Olx/xLxUlFRwWCu32Bj62vwsW0Sm+G2vhLZDVUNlHjGi+nazfR14EkAVX0XyAAKVbVVVevd9PeBjcCRCayrMcaYHiQyWCwDJovIRBFJAy4DFofk2QqcBSAiU3CCRZ2IFLknyBGRMmAysCmBdTXGGNODhHVDqWqHiNwEvAIkAw+r6ociMhdYrqqLge8AD4rILThdVNeoqorImcBcEekAOoHrVXV3oupqjDGmZ4k8Z4GqvoRzOaw37cee4bXAaWHmexp4OpF1M8YYEz27g9sYY0xEFiyMMcZEZMHCGGNMRBYsjDHGRGTBwhhjTEQWLIwxxkRkwcIYY0xEFiyMMcZEZMHCGGNMRBYsjDHGRGTBwhhjhrDaxlo+++hn2d60PaHLsWBhjDFD2Lw
35/HW1reY9/d5CV1OQh8kaIwxpm9aO1qp3ldN9b5qqvZVOf83VFHdWM2m3ZtYU7cGgEdWPsKPPvsjxo4cm5B6dBssRORcIFtVnwpJvxLYqaqvJaRGxhhziGjtaKWmsSYoCASG3f937t/ZZb78jHyKc4rZ17qPJEnCpz46tZN5f5/H/Rfcn5C69tSyuAP4fJj014FnAQsWxhjTjXZfO1v2bnFaAd5WgScw7Ni/o8t8uem5lOSWUJJTwkmHnURJTgnFOcWU5Dr/F+cUMzJtJLWNtZT9qgyf+gBo62xLaOuip2CRpap1oYmqul1ERsS9JsYYM0S0d7ZT01jTZefvDQg7mnagSzVovpz0nMDOv3xMeSAI+NOKc4rJTs+Oqg7z3pwXCBR+iWxd9BQsMkQkRVU7vIkikgpkxr0mxhgzCHT4OqhtrO2xa6i2sRYlOBBkp2UHjv6PH3M8Hbs7OOP4M4JaBTnpOXGr57vV79LW2RaU1tbZxjvV78RtGV49BYtncF55epOq7gdwWxS/cqcZY8yQ0unrpLaptseuodqm2i5H7CNSRwRaAMdMOoaSnJJAAPC3CnIzcoPmqaioYPqJ0xP2XVZ8c0XCyg6np2DxQ+C/gUoRqQQEKAF+D/womsJF5DzgPpx3cD+kqvNDpo8HHgPy3Dy3uq9iRURuA76O8w7um1X1lRi+lzFmiKptrOWypy/jz1/8c0x9752+TrY3be+xa6i2sZZO7QyaLys1K7DzP2fSOQfPEXgCQm56LiIS7686pHQbLNzup1tF5A7gCDd5g6oeiKZgEUkG7gfOAaqBZSKy2H3vtt8PgSdV9bcicjTO+7onuMOXAccAhwN/E5EjVUO2sjFm2PHeN+Dve/epjx1NO6jaVxW2VVDVUEVNY02XQJCZkhnY4Z818awuQaAkp4S8jLxDPhBEo6dLZ/89JEmBPBFZqaqNUZR9Mk5w2eSWtwi4CPAGCwX8nXi5QI07fBGwSFVbgc0issEt790olmuMGWJaOlrYsncLy2uW89C/HsKnPn73/u9YVrOMnft3sq1xGx2+oNOnZKRkBHb4MybOoDi7OCgIlOSWkJ+Rb4EgTnrqhgp32ewo4HgR+bqqvhGh7HFAlWe8GjglJM/twKsi8i1gBHDr7tLIAAAcAElEQVS2Z973QuYdF7oAEZkNzAYYM2YMFRUVEao0cJqamgZ1/QYbW1+DT1+2iaqyu203NS011LbUUnuglpqWGra3bKfmQA272nZ1madTO9m0axMn5p3IaTmnMTpjNEVpRc7/6UXkpOR0DQRNzmdPzR72sKdXdY2X4fYb7qkb6tpw6SJSCjxJ1x1/l6zhig0Zvxx4VFV/ISKnAn8QkWOjnBdVXQAsAJg2bZpOnz49QpUGTkVFBYO5foONra/BJ9I22d+2n817N7Npz6agz+a9m9m8ZzMHOg72YAvCuJxxlOWXceLEEynLKyM/M5/vvPqdoCt8mn3NPP7VxxN2V3IiDbffcMyP+1DVSvfy2UiqcU6I+xVzsJvJ7+vAeW6574pIBlAY5bzGmH7UqZ1UNVQFB4O9B4dD7zTOTsumLL+MowqO4vwjzqcsvyzwKc0tJT0lPSj/DX+9IewyE3lXsolezMFCRD4FtEaRdRkwWUQmAttwTlhfEZJnK3AW8KiITAEygDpgMfAnEfkfnBPck4F/xlpXY0xsGloawrYONu3ZxOY9m+l48+B5g2RJZnzueCbmT+TCIy8MCgZl+WWMyhwV0/mC/r5vwMSmpxPcL9C162cUcBjwlUgFq2qHiNwEvIJzWezDqvqhiMwFlqvqYuA7OPdy3OIu6xpVVeBDEXkS52R4B3CjXQllTN+1d7ZTta+KzXs2h20d7D6wOyj/qMxRlOWXUT62nGkjpjH9hOmBYFCSU0JqcjSdDNHp7/sGTGx6alncEzKuwG6cgPEVorgyyb1n4qWQtB97htcCp3Uz753AnZGWYYw5SFXZfWB30PkCb+tga8PWoMtLU5NSmZA
3gbL8MqYdNi2oZTAxfyJ5GXmBvBUVFUw/afoAfCszGPR0gvvv/mERKcfpQroU2Aw8nfiqGWPCae1opbKhMqh7yNs62Ne6Lyj/6BGjKcsv49SSU7niuCuCAsK47HEkJyUP0DcxQ0lP3VBH4pxnuByoB/4MiKrO6Ke6GTMsxHpHsqqyc//OLlcU+Yer91UHPZcoIyWDiXkTKcsv44zxZwSG/a2DkWkjE/n1zCGip26odcBS4POqugHAPbdgjIlBuDuSm9ub2bJ3S7cBobm9OaiMw7MPpyy/jBkTZ1CWdzAQlOWXMXbkWJLEXnppEqunYPEFnJbFEhF5GVhE+PsfjDEh/G83+1ftv7rckVy9r5raptqg/CNSR1CWX8ak/EmcU3bOwZZB3kQm5E0gM9Ue9GwGVk/nLJ4FnnWfNHsxcAswRkR+Czyrqq/2Ux2NGVRUlfoD9Wxt2Br4VO6tZOu+g+Pbm7Z3ma9TO9m2b1vQPQf+1kFRVpE9lsIMahHvs3AfT74QWCgio4AvAbcCFizMsNTW2Ub1vmonAHgCwtZ9WwNp3ruRwXlg3fjc8YzPHc8Fky+gNLeUnPQcvve37wXdO7CnZQ93nnXnkLwj2RzaYropT1V3A79zP8YMOf5LS4NaBQ3BQWF70/YuL7YZM2IMpXmlHDfmOC6YfEEgMIzPHU9pXikFmQVdWgZ2R7IZTmK+g9uYwayts41t+7Z1CQDez/72/UHzZKRkBHb85x9xPqV5pUHBoDinmIyUjJjrYnckm+HEgoUZMlSVPS17uuz8vYEh3OsuR48YTWluKUcXHc15R5wX3CrILaUwqzAh5wvsjmQznFiwMINGe2c72xq3sbVhK6/ueJW333w7cK7AHwya2pqC5klPTg/s+M+ddG4gAHhbBXYlkTF9Z8HCxKw3r71UVfa27O2xVVDTWBPcKlgHRVlFlOaV8qnCTzGzbGaXcwV2FZEx/cOChYlZuJvM2jvbqWmsCXvC2P9pbAt+wWJaclpgx3/OpHMYnzM+cL6gdl0tXzzni9YqMGaQsGBhetTe2c6u5l3UNddRt7+OT+o/CbrJ7J81/2R703ZqGmvwqS9o3sKsQsbnjufIgiM5u+zsoFbB+NzxjB4xuts7jyu2VligMGYQsWBxiNnftp+65jonAOyvCwSBQEBwx/159rbs7basTu2kZl+N0yoIOVdQkltCVmpWP34zY0wiWbAYwnzqo6GlocsOPhAEQgPB/rouN5P5pSalUphVSNGIIoqyiph2+DQKMw+OF40oIokkrnjmClo7D777ak/LHuafPd9uMjNmmLNgMYh0+DoCO3vvDj6w0z/QtTXQ2c07oUakjgjs6EePGM0xo49xdvrujr8wqzAwXJRVRE56TsQTxTf89YYul6XaTWbGHBosWNC7q3ui0dzeHNj5/3P3P6n6oCrs0b9/fE/Lnm7Lys/ID+zYjxh1BKcWn9plh+8NAono77ebzIw5dFmwIPzVPaFUlYbWhvD9/N4uIM946GOmWe38l5KUErSjLx9b3uNRf0FWASlJA7+p7CYzYw5dCd0Dich5wH047+B+SFXnh0z/JeB/mVIWMFpV89xpnQR2r2xV1QsTUccte7bw+xW/x6c+HvzXg2SmZtLa0Rr2ZG+HryNsGVmpWUE7+CmFUyjKKgo6B1C1vopzTz+XwqxC8jLy7N4AY8yQkrBgISLJwP3AOUA1sExEFrvv3QZAVW/x5P8WMNVTxAFVLU9U/fzu+Psdga6Vdl87v3j3F+Rl5AV2/GX5ZZw87uSgI31vECgaURTVVT8V2yuYXDA50V/HGGMSIpEti5OBDaq6CUBEFgEXAWu7yX858JME1qeL2sZaFn24KCgtMyWTj278yK7uMcYYj0QGi3FAlWe8GjglXEYRKQUmAm94kjNEZDnQAcxX1efCzDcbmA0wZswYKioqYqrgLz/+JR2dwV1L7Z3tXP/E9cyZPCemsiJpamqKuX6HMltfg49tk9gMt/WVyGARrlN
ew6SB8/rWp1SDrgMdr6o1IlIGvCEiq1V1Y1BhqguABQDTpk3T6dOnx1TBW9bfQocGB4sO7aDSV0msZUVSUVER9zKHM1tfg49tk9gMt/WVyGBRDZR4xouBmm7yXgbc6E1Q1Rr3/00iUoFzPmNj11l7z67uMcaY6IR/ME98LAMmi8hEEUnDCQiLQzOJyFFAPvCuJy1fRNLd4ULgNLo/12GMMSbBEtayUNUOEbkJeAXn0tmHVfVDEZkLLFdVf+C4HFikqt4uqinA70TEhxPQ5nuvojLGGNO/Enqfhaq+BLwUkvbjkPHbw8z3DnBcIutmjDEmeonshjLGGDNMWLAwxhgTkQULY4wxEVmwMMYYE5EFC2OMMRFZsDDGGBORBQtjjDERWbAAlmxewoR7J7Bk85KBrooxxgxKh3ywWLJ5CbOemEVlQyWznphlAcMYM6T018HuIR0s/IHC//rT5vZmCxjGmCGjPw92D9lgERoo/Jrbmzn3j+fyvde+x+ubXmfD7g20drQOUC3NcGDdnCYR+vtgN6HPhhrMrn3+2i6Bwq/d187P3/k5P3/n54G0sSPHUppbSmleKaW5pYzPHR80npuR219VN0OI9w961hOzePHyF5kxcUbkGY3pwSsbXuGSP1/CgY4DQemJ/J0dssHikYseCduyAMhKyeLBzz/I4TmHU7m3ksqGSrY2bKWyoZIVtSt4ft3ztHYGtzZy0nOCgkcgoLjjPvX111czg0R3R34WMEw4nb5O6prr2N60PeKnobWh23Ka25u59vlr2TJnS1zrd8gGixkTZ/Di5S92CRhZqVkR/5h96mPn/p1OAHGDSeXeSrbuc8bf2voWe1v2Bs2TKqmUrvG0SEJaKSW5JaQlpyXs+5r48qmPhpYG6prr2NW8i7r9zv+7mndR11zH6p2r+dumv3U5SGhub+acP5zDJZ+6hKmHTaUgs4CCrIIu/2ekZAzQNzPxpKo0tDZEFQDqmuvCHlRmp2UzduRYxo4cy/FjjmfmpJk0tzfzh1V/oK2zrUv+rNQsHrnokbh/Fwl+jcTQNW3aNF2+fHnM83mP/qIJFNHa17rPCSBui2Tp6qVojgZaKbWNtajnLbOCcFj2YWG7uErznLSc9Jw+12uo6O9XUrZ2tAZ2/N6dfyAYhAkKnUFvAT4oKzWL1o7WbqeDs72127cMO2WEDSTucGFWYZfpuem5iIR7m3F8DLfXhPbFgfYD7Ni/o8ed/5ZdW9jbsbdLLwRAalJqIAD09BkzYgwj0kaErUO486692YeJyPuqOi1SvkO2ZeHnb2Fc+/y1PHLRI3HrHshJz+G4Mcdx3BjntRxH7z866A+ttaOV6n3VB7u4/C2UhkqW1yzn2XXPdjlqyMvIOxg8csYHBZPS3FJGjxid0J3FUNHdUX93O/665jqa2prCliUIBVkFFGUVUZhVyFGFR3Fa5mkUjXDG/en+8cKsQrJSs7q9gAIO/kGfWnIq9c311B+o7/5/d7hqexX1zfXsPrC72yCTLMmMyhzVbYDx/l+YVUhBVgGjMkdZi9bV124gQSgaURTY0Z+QdwLlk8rDBoG8jLw+/62G9o7E82A3nEO+ZdFfYj0q86mPHU07DnZxuS0U/3hlQyX7WvcFzZOenB50niT0vElxTjGpyal9/i5LNi+Je3AN5V1f3qP+Ljv+/XXsOhCcXt9c3+NRf+gOPjAeZsefn5FPclJyr75DvI78vHzqY2/LXuqb69nVvKvHAOP9v6Wjpdsys9OyowowWz7awszTZ1KQVUB2WnbCDkzi+ftSVfa27A2/098f0g20vy5sIM5Jzwne2Y8I3wooGlFEStLB4+/+aon1dX1F27KwYNFPEvHDaWhpCAoegYDijm9v2h6UXxAOzz48KJj4u7j8wyPTRva4zL522/l3duF2/N7hTds30ZrcGtNRf9GIIgozwwSBkKP+/pSobs5YNbc3h229BAJOmAATet7NKzUplVGZowItlO4CjPf/UZmjgnam4US
7vg60H+j+yD8kCITr109LTosqAIwZOabXv5mh0m03KIKFiJwH3IfzDu6HVHV+yPRfAv5fQhYwWlXz3GlXAz90p/23qj7W07IOxWARSUtHi9PVFeYkfGVDJVUNVbT72oPmGZU5qtuT8FX7qvjqs18NPlJOyeKRix9h8qjJXXb4fTnqT25J5siSI7s96i/KKiIvI6/XR/39qT9aYonQ4etgz4E9gQBS8c8KDp90eNhWjLeVE/qb8spNzw0KIIFzL5kF1B+o57fLfxu0c09NSmXWkbNITU4NCgChrWro2g3UUxCIRzdQJBYsoq9AMvAxcA5QDSwDLlfVtd3k/xYwVVW/JiKjgOXANECB94GTVHVPd8uzYBG7Tl8n25u2d2mReMcb2xp7VXaSJAX6xntz1D8Y19ehLpptoqo0tTVFdR7G+3+4nb9XcU4xZfllMXUDDbSh8hseDCe4TwY2qOomt0KLgIuAsMECuBz4iTt8LvCaqu52530NOA94IoH1PeQkJyUzLmcc43LGcWrJqV2m+/t7KxsqmfmHmdQ113VbVlFWEc98+Zkhd9Rv4ktEyE7PJjs9mwl5E6Ker/TeUrY2bO12erIk8/dr/h6HGpreSmSwGAdUecargVPCZRSRUmAi8EYP844LM99sYDbAmDFjqKio6HOlE6WpqWlQ1y+S2464jdvW3Earr+tlgOlJ6dx2xG10bOpgu/uvr4b6+hqOErlN5pTO6fH39Z+l/znkfg/D7TecyGARrkOwuz6vy4CnVAOd2VHNq6oLgAXgdEMN5ibfUGmSdmc60ykvL4/71T3dGerrazhK5Dbp799Xfxhuv+FEPkiwGijxjBcDNd3kvYzgLqZY5jX9xH9dt//cwlD+QzaDj/2+BrdEBotlwGQRmSgiaTgBYXFoJhE5CsgH3vUkvwLMFJF8EckHZrppZoD5/6BLc0vtD9nEnf2+Bq+EdUOpaoeI3ISzk08GHlbVD0VkLrBcVf2B43JgkXouy1LV3SIyDyfgAMz1n+w2A2/GxBlxf0iZMX72+xqcEnqdmaq+BLwUkvbjkPHbu5n3YeDhhFXOGGNM1A7Zlx8ZY4yJngULY4wxEVmwMMYYE5EFC2OMMRFZsDDGGBORBQtjjDERWbAwxhgTkQULY4wxEVmwMMYYE5EFC2OMMRFZsDDGGBORBQtjjDERWbAwxhgTkQULY4wxEVmwMMYYE5EFC2OMMRFZsDDGGBORBQtjjDERJTRYiMh5IrJeRDaIyK3d5LlURNaKyIci8idPeqeIrHQ/i8PNa4wxpn8k7B3cIpIM3A+cA1QDy0Rksaqu9eSZDNwGnKaqe0RktKeIA6panqj6GWOMiV4iWxYnAxtUdZOqtgGLgItC8lwH3K+qewBUdWcC62OMMaaXEhksxgFVnvFqN83rSOBIEXlbRN4TkfM80zJEZLmbfnEC62mMMSaChHVDARImTcMsfzIwHSgGlorIsaq6FxivqjUiUga8ISKrVXVj0AJEZgOzAcaMGUNFRUWcv0L8NDU1Der6DTa2vgYf2yaxGW7rK5HBohoo8YwXAzVh8rynqu3AZhFZjxM8lqlqDYCqbhKRCmAqEBQsVHUBsABg2rRpOn369AR8jfioqKhgMNdvsLH1NfjYNonNcFtfieyGWgZMFpGJIpIGXAaEXtX0HDADQEQKcbqlNolIvoike9JPA9ZijDFmQCSsZaGqHSJyE/AKkAw8rKofishcYLmqLnanzRSRtUAn8F+qWi8inwF+JyI+nIA233sVlTHGmP6VyG4oVPUl4KWQtB97hhX4tvvx5nkHOC6RdTPGGBM9u4PbGGNMRBYsjDHGRGTBwhhjTEQWLIwxxkRkwcIYY0xEFiyMMcZEZMHCGGNMRBYsjDHGRGTBwhhjTEQWLIwxxkRkwcIYY0xEFiyMMcZEZMHCGGNMRBYsjDHGRGTBwhhjTEQWLIwxxkRkwcIYY0xEFiyMMcZEZMHCGGNMRAkNFiJynoisF5ENInJrN3k
uFZG1IvKhiPzJk361iHzifq5OZD2NMcb0LCVRBYtIMnA/cA5QDSwTkcWqutaTZzJwG3Caqu4RkdFu+ijgJ8A0QIH33Xn3JKq+xhhjupfIlsXJwAZV3aSqbcAi4KKQPNcB9/uDgKrudNPPBV5T1d3utNeA8xJYV2OMMT1IWMsCGAdUecargVNC8hwJICJvA8nA7ar6cjfzjgtdgIjMBma7o00ish1o6GV9c3sxbyzzFAK7Yiz/UDYe2DrQlYhRb35DQ2nZ8d4m8ahzb8tI9N87DJ2/+dJoMiUyWEiYNA2z/MnAdKAYWCoix0Y5L6q6AFgQWKDIAlWdHZovqsr2Yt5Y5hGR5ao6rTd1OxSJSN1QW199+f0NhWXHe5vEo869LSPRf+9u/mH1N5/IbqhqoMQzXgzUhMnzvKq2q+pmYD1O8Ihm3nBe6H11ezVvX5ZnerZ3oCvQCwP5e+iPZcd7m8Sjzr0tw/7eYySqXQ7Y41OwSArwMXAWsA1YBlyhqh968pwHXK6qV4tIIbACKMc9qQ2c6Gb9F3CSqu5OSGX7wXA7ykg0W1+Dj22T2Ay39ZWwbihV7RCRm4BXcM5HPKyqH4rIXGC5qi52p80UkbVAJ/BfqloPICLzcAIMwNyhHChcCyJnMR62vgYf2yaxGVbrK2EtC2OMMcOH3cFtjDEmIgsWxhhjIrJgkQAi8rCI7BSRNZ60USLymvv4ktdEJH8g6ziYiEiJiCwRkY/cx778p5tu62yAiMgWEVktIitFZLmbZtvDI5a/c3H8yn300SoRObH7kgcnCxaJ8Shd7zi/FXhdVScDr7vjxtEBfEdVpwCfBm4UkaOxdTbQZqhqueeKHtsewR4l+r/z83FuC5iMcyPxb/upjnFjwSIBVPVNIPTqrYuAx9zhx4CL+7VSg5iq1qrqv9zhRuAjnDv2bZ0NLrY9PGL8O78IeFwd7wF5InJY/9Q0PixY9J8xqloLzs4RGD3A9RmURGQCMBX4B7bOBpICr4rI++5jdcC2RzS6W0dRPcJoMEvk4z6MiYmIjASeBuao6j6RcE99Mf3kNFWtcZ8E/ZqIrBvoCg1xUT3CaDCzlkX/2eFvdrr/74yQ/5AiIqk4gWKhqj7jJts6GyCqWuP+vxN4Fucp0rY9IutuHfX2EUaDhgWL/rMY8L/E6Wrg+QGsy6AiThPi98BHqvo/nkm2zgaAiIwQkWz/MDATWINtj2h0t44WA1e5V0V9Gmjwd1cNFXYHdwKIyBM4T9ItBHbgvMjpOeBJDj7m+UvD4BEmcSEipwNLgdWAz03+vzjnLWyd9TMRKcNpTYDTVf0nVb1TRAqw7REQy9+5e0D0a5yrp5qBa1V1+UDUu7csWBhjjInIuqGMMcZEZMHCGGNMRBYsjDHGRGTBwhhjTEQWLIwxxkRkwcIMeyJSISIJf72liNzsPjl3YbT1EZGXRCQv0XULU4e5InJ2fy/XDF32uA9jeiAiKaraEWX2G4DzVXVztOWr6ud6V7O+UdUfD8RyzdBlLQszKIjIBPeo/EH3nRavikimO817JF4oIlvc4WtE5DkReUFENovITSLybRFZISLvicgozyK+IiLviMgaETnZnX+E+06CZe48F3nK/YuIvAC8Gqau33bLWSMic9y0B4AyYLGI3BKSP1NEFrnvMfgzkOmZtsX9ThNEZJ2IPOSWu1BEzhaRt913I0RT52dE5GU3/8/c9GQRedQtc7W/bm7aF93hs9yyVrtlp3vqdoeI/Mud9ik3/bPivOdipTtfdl+2vRkiVNU+9hnwDzAB570W5e74k8BX3OEKYJo7XAhscYevATYA2UAR0ABc7077Jc4DCf3zP+gOnwmscYfv8iwjD/gYGOGWWw2MClPPk3DuNB8BjAQ+BKa607YAhWHm+TbwsDt8vPs9p3nn8Xz/43AO4t4HHsZ5AN1FwHNR1HkTkAtkAJU4zyI6CXjNU5c89/9HgS+6eauAI930xz3rbQvwLXf4BuAhd/gFnAcN4q6
DlIH+/dgn8R9rWZjBZLOqrnSH38fZgUayRFUbVbUOJ1i84KavDpn/CQi8gyDHPU8wE7hVRFbiBJQMnMc0gLODDfcoi9OBZ1V1v6o2Ac8AZ0So45nAH93lrwJWdZNvs6quVlUfThB6XVU15Lv0VOfXVbVBVVuAtUApTgApE5H/JyLnAftClnmUu9yP3fHH3Pr6+R/q6N0ebwP/IyI34wSfaLvpzBBmwcIMJq2e4U4OnlPr4OBvNaOHeXyecR/B5+RCn2ujOEftX1DnbXDlqjpeVT9yp+/vpo69fW56NM/Viea79FTnLutPVfcAJ+AElhuBh0KWGen7+MsMbA9VnQ98A6c77T1/95QZ3ixYmKFgC053CjhdJ73xZQg8tLBBVRuAV4BvuQ95Q0SmRlHOm8DFIpLlPpH1EpyHIEaa50p3GcfidEX1Vkx1FpFCIElVnwZ+BIS++3kdMEFEjnDHvwr8PUKZk9wW0E+B5YAFi0OAXQ1lhoJ7gCdF5KvAG70sY4+IvAPkAF9z0+YB9wKr3J3vFmBWT4Wo6r9E5FHgn27SQ6q6IsKyfws8IiKrgJWeeXsj1jqPc5ftPzC8zTtRVVtE5FrgLyKSAiwDHohQhzkiMgOntbEW+N+Yv4UZcuyps8YYYyKybihjjDERWbAwxhgTkQULY4wxEVmwMMYYE5EFC2OMMRFZsDDGGBORBQtjjDER/X8i/2KSEyt/4wAAAABJRU5ErkJggg==\n",
1695 | "text/plain": [
1696 | ""
1697 | ]
1698 | },
1699 | "metadata": {},
1700 | "output_type": "display_data"
1701 | }
1702 | ],
1703 | "source": [
1704 | "import numpy as np\n",
1705 | "import matplotlib\n",
1706 | "import matplotlib.pyplot as plt\n",
1707 | "\n",
1708 | "\n",
1709 | "\n",
1710 | "# with plt.xkcd():\n",
1711 | "xs = np.array([8, 16, 32, 64, 128])\n",
1712 | "ys = aucs_vs_factors\n",
1713 | "axes = plt.axes()\n",
1714 | "plt.semilogx(xs, ys, '-gD', label='BPR')\n",
1715 | "plt.semilogx(xs, aucs_als, '-^', label='ALS', color='orange')\n",
1716 | "plt.semilogx(xs, aucs_tbpr, '-go', label='tBPR')\n",
1717 | "axes.set_ylim([0.60, 1.0])\n",
1718 | "axes.set_xticks([10, 20, 50, 100])\n",
1719 | "axes.get_xaxis().set_major_formatter(matplotlib.ticker.ScalarFormatter())\n",
1720 | "axes.set_xlabel('number of dimensions')\n",
1721 | "axes.set_ylabel('AUC')\n",
1722 | "plt.title('AUC scores for Reddit January Data')\n",
1723 | "plt.legend()\n",
1724 | "plt.grid()\n",
1725 | "plt.savefig('AUCs.png', dpi=1000, bbox_inches='tight')\n",
1726 | "plt.show()"
1727 | ]
1728 | },
1729 | {
1730 | "cell_type": "code",
1731 | "execution_count": 43,
1732 | "metadata": {},
1733 | "outputs": [
1734 | {
1735 | "name": "stdout",
1736 | "output_type": "stream",
1737 | "text": [
1738 | "Help on function legend in module matplotlib.pyplot:\n",
1739 | "\n",
1740 | "legend(*args, **kwargs)\n",
1741 | " Places a legend on the axes.\n",
1742 | " \n",
1743 | " Call signatures::\n",
1744 | " \n",
1745 | " legend()\n",
1746 | " legend(labels)\n",
1747 | " legend(handles, labels)\n",
1748 | " \n",
1749 | " The call signatures correspond to three different ways how to use\n",
1750 | " this method.\n",
1751 | " \n",
1752 | " **1. Automatic detection of elements to be shown in the legend**\n",
1753 | " \n",
1754 | " The elements to be added to the legend are automatically determined,\n",
1755 | " when you do not pass in any extra arguments.\n",
1756 | " \n",
1757 | " In this case, the labels are taken from the artist. You can specify\n",
1758 | " them either at artist creation or by calling the\n",
1759 | " :meth:`~.Artist.set_label` method on the artist::\n",
1760 | " \n",
1761 | " line, = ax.plot([1, 2, 3], label='Inline label')\n",
1762 | " ax.legend()\n",
1763 | " \n",
1764 | " or::\n",
1765 | " \n",
1766 | " line.set_label('Label via method')\n",
1767 | " line, = ax.plot([1, 2, 3])\n",
1768 | " ax.legend()\n",
1769 | " \n",
1770 | " Specific lines can be excluded from the automatic legend element\n",
1771 | " selection by defining a label starting with an underscore.\n",
1772 | " This is default for all artists, so calling `Axes.legend` without\n",
1773 | " any arguments and without setting the labels manually will result in\n",
1774 | " no legend being drawn.\n",
1775 | " \n",
1776 | " \n",
1777 | " **2. Labeling existing plot elements**\n",
1778 | " \n",
1779 | " To make a legend for lines which already exist on the axes\n",
1780 | " (via plot for instance), simply call this function with an iterable\n",
1781 | " of strings, one for each legend item. For example::\n",
1782 | " \n",
1783 | " ax.plot([1, 2, 3])\n",
1784 | " ax.legend(['A simple line'])\n",
1785 | " \n",
1786 | " Note: This way of using is discouraged, because the relation between\n",
1787 | " plot elements and labels is only implicit by their order and can\n",
1788 | " easily be mixed up.\n",
1789 | " \n",
1790 | " \n",
1791 | " **3. Explicitly defining the elements in the legend**\n",
1792 | " \n",
1793 | " For full control of which artists have a legend entry, it is possible\n",
1794 | " to pass an iterable of legend artists followed by an iterable of\n",
1795 | " legend labels respectively::\n",
1796 | " \n",
1797 | " legend((line1, line2, line3), ('label1', 'label2', 'label3'))\n",
1798 | " \n",
1799 | " Parameters\n",
1800 | " ----------\n",
1801 | " \n",
1802 | " handles : sequence of `.Artist`, optional\n",
1803 | " A list of Artists (lines, patches) to be added to the legend.\n",
1804 | " Use this together with *labels*, if you need full control on what\n",
1805 | " is shown in the legend and the automatic mechanism described above\n",
1806 | " is not sufficient.\n",
1807 | " \n",
1808 | " The length of handles and labels should be the same in this\n",
1809 | " case. If they are not, they are truncated to the smaller length.\n",
1810 | " \n",
1811 | " labels : sequence of strings, optional\n",
1812 | " A list of labels to show next to the artists.\n",
1813 | " Use this together with *handles*, if you need full control on what\n",
1814 | " is shown in the legend and the automatic mechanism described above\n",
1815 | " is not sufficient.\n",
1816 | " \n",
1817 | " Other Parameters\n",
1818 | " ----------------\n",
1819 | " \n",
1820 | " loc : int or string or pair of floats, default: 'upper right'\n",
1821 | " The location of the legend. Possible codes are:\n",
1822 | " \n",
1823 | " =============== =============\n",
1824 | " Location String Location Code\n",
1825 | " =============== =============\n",
1826 | " 'best' 0\n",
1827 | " 'upper right' 1\n",
1828 | " 'upper left' 2\n",
1829 | " 'lower left' 3\n",
1830 | " 'lower right' 4\n",
1831 | " 'right' 5\n",
1832 | " 'center left' 6\n",
1833 | " 'center right' 7\n",
1834 | " 'lower center' 8\n",
1835 | " 'upper center' 9\n",
1836 | " 'center' 10\n",
1837 | " =============== =============\n",
1838 | " \n",
1839 | " \n",
1840 | " Alternatively can be a 2-tuple giving ``x, y`` of the lower-left\n",
1841 | " corner of the legend in axes coordinates (in which case\n",
1842 | " ``bbox_to_anchor`` will be ignored).\n",
1843 | " \n",
1844 | " bbox_to_anchor : `.BboxBase` or pair of floats\n",
1845 | " Specify any arbitrary location for the legend in `bbox_transform`\n",
1846 | " coordinates (default Axes coordinates).\n",
1847 | " \n",
1848 | " For example, to put the legend's upper right hand corner in the\n",
1849 | " center of the axes the following keywords can be used::\n",
1850 | " \n",
1851 | " loc='upper right', bbox_to_anchor=(0.5, 0.5)\n",
1852 | " \n",
1853 | " ncol : integer\n",
1854 | " The number of columns that the legend has. Default is 1.\n",
1855 | " \n",
1856 | " prop : None or :class:`matplotlib.font_manager.FontProperties` or dict\n",
1857 | " The font properties of the legend. If None (default), the current\n",
1858 | " :data:`matplotlib.rcParams` will be used.\n",
1859 | " \n",
1860 | " fontsize : int or float or {'xx-small', 'x-small', 'small', 'medium', 'large', 'x-large', 'xx-large'}\n",
1861 | " Controls the font size of the legend. If the value is numeric the\n",
1862 | " size will be the absolute font size in points. String values are\n",
1863 | " relative to the current default font size. This argument is only\n",
1864 | " used if `prop` is not specified.\n",
1865 | " \n",
1866 | " numpoints : None or int\n",
1867 | " The number of marker points in the legend when creating a legend\n",
1868 | " entry for a `.Line2D` (line).\n",
1869 | " Default is ``None``, which will take the value from\n",
1870 | " :rc:`legend.numpoints`.\n",
1871 | " \n",
1872 | " scatterpoints : None or int\n",
1873 | " The number of marker points in the legend when creating\n",
1874 | " a legend entry for a `.PathCollection` (scatter plot).\n",
1875 | " Default is ``None``, which will take the value from\n",
1876 | " :rc:`legend.scatterpoints`.\n",
1877 | " \n",
1878 | " scatteryoffsets : iterable of floats\n",
1879 | " The vertical offset (relative to the font size) for the markers\n",
1880 | " created for a scatter plot legend entry. 0.0 is at the base the\n",
1881 | " legend text, and 1.0 is at the top. To draw all markers at the\n",
1882 | " same height, set to ``[0.5]``. Default is ``[0.375, 0.5, 0.3125]``.\n",
1883 | " \n",
1884 | " markerscale : None or int or float\n",
1885 | " The relative size of legend markers compared with the originally\n",
1886 | " drawn ones.\n",
1887 | " Default is ``None``, which will take the value from\n",
1888 | " :rc:`legend.markerscale`.\n",
1889 | " \n",
1890 | " markerfirst : bool\n",
1891 | " If *True*, legend marker is placed to the left of the legend label.\n",
1892 | " If *False*, legend marker is placed to the right of the legend\n",
1893 | " label.\n",
1894 | " Default is *True*.\n",
1895 | " \n",
1896 | " frameon : None or bool\n",
1897 | " Control whether the legend should be drawn on a patch\n",
1898 | " (frame).\n",
1899 | " Default is ``None``, which will take the value from\n",
1900 | " :rc:`legend.frameon`.\n",
1901 | " \n",
1902 | " fancybox : None or bool\n",
1903 | " Control whether round edges should be enabled around the\n",
1904 | " :class:`~matplotlib.patches.FancyBboxPatch` which makes up the\n",
1905 | " legend's background.\n",
1906 | " Default is ``None``, which will take the value from\n",
1907 | " :rc:`legend.fancybox`.\n",
1908 | " \n",
1909 | " shadow : None or bool\n",
1910 | " Control whether to draw a shadow behind the legend.\n",
1911 | " Default is ``None``, which will take the value from\n",
1912 | " :rc:`legend.shadow`.\n",
1913 | " \n",
1914 | " framealpha : None or float\n",
1915 | " Control the alpha transparency of the legend's background.\n",
1916 | " Default is ``None``, which will take the value from\n",
1917 | " :rc:`legend.framealpha`. If shadow is activated and\n",
1918 | " *framealpha* is ``None``, the default value is ignored.\n",
1919 | " \n",
1920 | " facecolor : None or \"inherit\" or a color spec\n",
1921 | " Control the legend's background color.\n",
1922 | " Default is ``None``, which will take the value from\n",
1923 | " :rc:`legend.facecolor`. If ``\"inherit\"``, it will take\n",
1924 | " :rc:`axes.facecolor`.\n",
1925 | " \n",
1926 | " edgecolor : None or \"inherit\" or a color spec\n",
1927 | " Control the legend's background patch edge color.\n",
1928 | " Default is ``None``, which will take the value from\n",
1929 | " :rc:`legend.edgecolor` If ``\"inherit\"``, it will take\n",
1930 | " :rc:`axes.edgecolor`.\n",
1931 | " \n",
1932 | " mode : {\"expand\", None}\n",
1933 | " If `mode` is set to ``\"expand\"`` the legend will be horizontally\n",
1934 | " expanded to fill the axes area (or `bbox_to_anchor` if defines\n",
1935 | " the legend's size).\n",
1936 | " \n",
1937 | " bbox_transform : None or :class:`matplotlib.transforms.Transform`\n",
1938 | " The transform for the bounding box (`bbox_to_anchor`). For a value\n",
1939 | " of ``None`` (default) the Axes'\n",
1940 | " :data:`~matplotlib.axes.Axes.transAxes` transform will be used.\n",
1941 | " \n",
1942 | " title : str or None\n",
1943 | " The legend's title. Default is no title (``None``).\n",
1944 | " \n",
1945 | " borderpad : float or None\n",
1946 | " The fractional whitespace inside the legend border.\n",
1947 | " Measured in font-size units.\n",
1948 | " Default is ``None``, which will take the value from\n",
1949 | " :rc:`legend.borderpad`.\n",
1950 | " \n",
1951 | " labelspacing : float or None\n",
1952 | " The vertical space between the legend entries.\n",
1953 | " Measured in font-size units.\n",
1954 | " Default is ``None``, which will take the value from\n",
1955 | " :rc:`legend.labelspacing`.\n",
1956 | " \n",
1957 | " handlelength : float or None\n",
1958 | " The length of the legend handles.\n",
1959 | " Measured in font-size units.\n",
1960 | " Default is ``None``, which will take the value from\n",
1961 | " :rc:`legend.handlelength`.\n",
1962 | " \n",
1963 | " handletextpad : float or None\n",
1964 | " The pad between the legend handle and text.\n",
1965 | " Measured in font-size units.\n",
1966 | " Default is ``None``, which will take the value from\n",
1967 | " :rc:`legend.handletextpad`.\n",
1968 | " \n",
1969 | " borderaxespad : float or None\n",
1970 | " The pad between the axes and legend border.\n",
1971 | " Measured in font-size units.\n",
1972 | " Default is ``None``, which will take the value from\n",
1973 | " :rc:`legend.borderaxespad`.\n",
1974 | " \n",
1975 | " columnspacing : float or None\n",
1976 | " The spacing between columns.\n",
1977 | " Measured in font-size units.\n",
1978 | " Default is ``None``, which will take the value from\n",
1979 | " :rc:`legend.columnspacing`.\n",
1980 | " \n",
1981 | " handler_map : dict or None\n",
1982 | " The custom dictionary mapping instances or types to a legend\n",
1983 | " handler. This `handler_map` updates the default handler map\n",
1984 | " found at :func:`matplotlib.legend.Legend.get_legend_handler_map`.\n",
1985 | " \n",
1986 | " Returns\n",
1987 | " -------\n",
1988 | " \n",
1989 | " :class:`matplotlib.legend.Legend` instance\n",
1990 | " \n",
1991 | " Notes\n",
1992 | " -----\n",
1993 | " \n",
1994 | " Not all kinds of artist are supported by the legend command. See\n",
1995 | " :ref:`sphx_glr_tutorials_intermediate_legend_guide.py` for details.\n",
1996 | " \n",
1997 | " Examples\n",
1998 | " --------\n",
1999 | " \n",
2000 | " .. plot:: gallery/api/legend.py\n",
2001 | "\n"
2002 | ]
2003 | }
2004 | ],
2005 | "source": [
2006 | "help(plt.legend)"
2007 | ]
2008 | },
2009 | {
2010 | "cell_type": "code",
2011 | "execution_count": null,
2012 | "metadata": {},
2013 | "outputs": [],
2014 | "source": []
2015 | }
2016 | ],
2017 | "metadata": {
2018 | "@webio": {
2019 | "lastCommId": null,
2020 | "lastKernelId": null
2021 | },
2022 | "kernelspec": {
2023 | "display_name": "Python 3",
2024 | "language": "python",
2025 | "name": "python3"
2026 | },
2027 | "language_info": {
2028 | "codemirror_mode": {
2029 | "name": "ipython",
2030 | "version": 3
2031 | },
2032 | "file_extension": ".py",
2033 | "mimetype": "text/x-python",
2034 | "name": "python",
2035 | "nbconvert_exporter": "python",
2036 | "pygments_lexer": "ipython3",
2037 | "version": "3.6.8"
2038 | }
2039 | },
2040 | "nbformat": 4,
2041 | "nbformat_minor": 2
2042 | }
2043 |
--------------------------------------------------------------------------------
/models.py:
--------------------------------------------------------------------------------
1 |
2 | #%%
3 | import implicit
4 | import numpy as np
5 | from tqdm import tqdm_notebook
6 | import pandas as pd
7 | import csv
8 | import scipy
9 | from scipy.sparse import coo_matrix
10 | from scipy.sparse.linalg import svds
11 | from implicit.nearest_neighbours import bm25_weight
12 | from implicit import alternating_least_squares
13 | import umap
14 | from collections import defaultdict
15 |
16 | #%%
17 | data = []
18 | with open('interactions_30_ch_no_bots') as csvfile:
19 | datareader = csv.reader(csvfile, delimiter=' ')
20 | for subreddit, user, comments, _ in datareader:
21 | data.append([user, subreddit, int(comments)])
22 |
23 |
24 | #%%
25 | data = pd.DataFrame.from_records(data)
26 |
27 |
28 | #%%
29 | data.columns = ['user', 'subreddit', 'comments']
30 |
31 |
32 | #%%
33 | data['user'] = data['user'].astype("category")
34 | data['subreddit'] = data['subreddit'].astype("category")
35 |
36 |
37 | #%%
38 | # create a sparse (subreddit x user) matrix of comment counts
39 | comments = coo_matrix((data['comments'].astype(float),
40 | (data['subreddit'].cat.codes,
41 | data['user'].cat.codes)))
42 |
43 | #%% [markdown]
44 | # ### Latent Semantic Analysis
45 |
46 | #%%
47 | # toggle this variable if you want to recalculate the als factors
48 | read_als_factors_from_file = True
49 |
50 |
51 | #%%
52 | if read_als_factors_from_file:
53 | subreddit_factors = np.load('subreddit_factors_als.npy')
54 | user_factors = np.load('user_factors_als.npy')
55 | else:
56 | subreddit_factors, user_factors = alternating_least_squares(bm25_weight(comments), 20)
57 |
58 |
63 | #%%
64 | class TopRelated(object):
65 | def __init__(self, subreddit_factors):
66 | norms = np.linalg.norm(subreddit_factors, axis=-1)
67 | self.factors = subreddit_factors / norms[:, np.newaxis]
68 | self.subreddits = data['subreddit'].cat.categories.array.to_numpy()
69 |
70 | def get_related(self, subreddit, N=10):
71 | subredditid = np.where(self.subreddits == subreddit)[0][0]
72 | scores = self.factors.dot(self.factors[subredditid])
73 | best = np.argpartition(scores, -N)[-N:]
74 | best_ = [self.subreddits[i] for i in best]
75 | return sorted(zip(best_, scores[best]), key=lambda x: -x[1])
76 |
77 |
78 | #%%
79 | top_related = TopRelated(subreddit_factors)
80 |
81 |
82 | #%%
83 | top_related.get_related('OnePiece')
84 |
85 |
86 | #%%
87 | subreddit_factors.shape
88 |
89 |
90 | #%%
91 | subreddits_embedded = umap.UMAP().fit_transform(subreddit_factors)
92 | subreddits_embedded.shape
93 |
94 |
95 | #%%
96 | subreddits_embedded
97 |
98 |
99 | #%%
100 | subreddits = data['subreddit'].cat.categories.array.to_numpy()
101 |
102 |
103 | #%%
104 | import random
105 |
106 | indices = random.sample(range(len(subreddits)), 1000)
107 |
108 |
109 | #%%
110 | sampled_subreddits = subreddits[indices]
111 | sampled_subreddits_embedded = subreddits_embedded[indices]
112 |
113 |
114 | #%%
115 | import plotly
116 | import plotly.plotly as py
117 | import plotly.graph_objs as go
118 | import os
119 | plotly.tools.set_credentials_file(username='abkds', api_key=os.environ['PLOTLY_API_KEY'])  # key read from an assumed env variable instead of being hard-coded
120 |
121 |
122 | # x/y coordinates of the sampled subreddits in the 2-D UMAP embedding
123 | import numpy as np
124 |
125 | N = 500
126 | xs = sampled_subreddits_embedded[:, 0]
127 | ys = sampled_subreddits_embedded[:, 1]
128 |
129 | # Create a trace
130 | trace = go.Scatter(
131 | x = xs,
132 | y = ys,
133 | mode='markers+text',
134 | text=sampled_subreddits
135 | )
136 |
137 | data_ = [trace]
138 |
139 | # Plot and embed in ipython notebook!
140 | py.iplot(data_, filename='basic-scatter')
141 |
142 | # or plot with: plot_url = py.plot(data, filename='basic-line')
143 |
144 | #%% [markdown]
145 | # ### Bayesian Personalized Ranking
146 |
147 | #%%
148 | from implicit.bpr import BayesianPersonalizedRanking
149 |
150 | params = {"factors": 63}
151 |
152 |
153 | #%%
154 | import logging
155 | import tqdm
156 | import time
157 | import codecs
158 |
159 |
160 | #%%
161 | model = BayesianPersonalizedRanking(**params)
162 |
163 |
164 | #%%
165 | model_name = 'bpr'
166 | output_filename = 'subreddits_recs_bpr'
167 |
168 |
169 | #%%
170 | model.fit(comments)
171 |
172 |
173 | #%%
174 | def bpr_related_subreddits(subreddit):
175 | found = np.where(subreddits == subreddit)
176 | if len(found[0]) == 0:
177 | raise ValueError("Subreddit doesn't exist in the dataset.")
178 | _id = found[0][0]
179 | return [(subreddits[i], v) for i, v in model.similar_items(_id)]
180 |
181 |
182 | #%%
183 | bpr_related_subreddits('dogs')
184 |
185 |
186 | #%%
187 | users = data['user'].cat.categories.array.to_numpy()
188 |
189 |
190 | #%%
191 | write_bpr_recommendations = False
192 |
193 |
194 | #%%
195 | user_comments = comments.T.tocsr()
196 | if write_bpr_recommendations:
197 | # generate recommendations for each user and write out to a file
198 | with tqdm.tqdm_notebook(total=len(users)) as progress:
199 | with codecs.open(output_filename, "w", "utf8") as o:
200 | for userid, username in enumerate(users):
201 | for subredditid, score in model.recommend(userid, user_comments):
202 | o.write("%s\t%s\t%s\n" % (username, subreddits[subredditid], score))
203 | progress.update(1)
204 |
205 | #%% [markdown]
206 | # ### Sample user recommendations
207 | #
208 | # We went through the list of subreddits in which the user 'xkcd_transcriber' has commented. Judging by the kinds of subreddits this user follows, the predictions look good. This is just one sample: we save the recommendations for all users to a file, and we will also write an AUC score function to score the generated recommendations exactly.
209 |
210 | #%%
211 | def recommend_for_user(username):
212 | sample_user_id = np.where(users == username)[0][0]
213 | return [(subreddits[i], v) for i, v in model.recommend(sample_user_id, user_comments)]
214 |
215 |
216 | #%%
217 | recommend_for_user('xkcd_transcriber')
218 |
219 |
220 | #%%
221 | def subreddits_interacted_by_user(username):
222 | sample_user_id = np.where(users == username)[0][0]
223 | _idlist = comments.getcol(sample_user_id)
224 | return [subreddits[idx] for idx, i in enumerate(_idlist.toarray()) if i != 0.0]
225 |
226 |
227 | #%%
228 | # sample 50 subreddits that xkcd_transcriber has interacted with
229 | random.sample(subreddits_interacted_by_user('xkcd_transcriber'), 50)
230 |
231 |
232 | #%%
233 | # set seed to get the same train and test set
234 | np.random.seed(42)
235 |
236 | filename = 'interactions_30_ch_no_bots'
237 | train_filename = 'interactions_5'
238 |
239 | def create_dataset():
240 | data = defaultdict(lambda: [])
241 | with open(filename) as csvfile:
242 | datareader = csv.reader(csvfile, delimiter=' ')
243 | for subreddit, user, comments, _ in tqdm.tqdm_notebook(datareader):
244 | data[user].append((subreddit, comments))
245 |
246 |
247 | f_train = open(train_filename, 'w')  # write mode, so re-running the cell does not duplicate rows
248 |
249 | for user, items in tqdm.tqdm_notebook(data.items()):
250 | np.random.shuffle(items)
251 | if len(items) >= 5:
252 | for item in items:
253 | line = ' '.join(list(map(str, [item[0], user, item[1]]))) + '\n'
254 | f_train.write(line)
255 |
256 | f_train.close()
257 |
258 | create_dataset()
259 |
260 |
261 | #%%
262 | data = []
263 | with open('interactions_5') as csvfile:
264 | datareader = csv.reader(csvfile, delimiter=' ')
265 | for subreddit, user, comments in datareader:
266 | data.append([user, subreddit, int(comments)])
267 |
268 |
269 | #%%
270 | data = pd.DataFrame.from_records(data)
271 | data.columns = ['user', 'subreddit', 'comments']
272 |
273 | data['user'] = data['user'].astype("category")
274 | data['subreddit'] = data['subreddit'].astype("category")
275 |
276 |
277 | #%%
278 | # create a sparse (subreddit x user) matrix of comment counts
279 | comments = coo_matrix((data['comments'].astype(float),
280 | (data['subreddit'].cat.codes,
281 | data['user'].cat.codes)))
282 |
283 |
284 | #%%
285 | comments
286 |
287 |
288 | #%%
289 | subreddits = data['subreddit'].cat.categories.array.to_numpy()
290 | users = data['user'].cat.categories.array.to_numpy()
291 |
292 |
293 | #%%
294 | print('Number of users for BPR model: %s' % len(users))
295 | print('Number of subreddits for BPR model: %s' % len(subreddits))
296 |
297 | #%% [markdown]
298 | # Create the index and the reverse index for the users and subreddits
299 |
300 | #%%
301 | def item_to_index(things):
302 | index = {}
303 | for idx, item in enumerate(things):
304 | index[item] = idx
305 | return index
306 |
307 | def index_to_item(index):
308 | things = np.empty(len(index), dtype=object)
309 | for item, idx in index.items():
310 | things[idx] = item
311 | return things
312 |
313 |
314 | #%%
315 | subreddits_index = item_to_index(subreddits)
316 | users_index = item_to_index(users)
317 |
318 | #%% [markdown]
319 | # ### Extracting test set
320 | #
321 | # We will pluck out the test set, as per the strategy given in the paper [BPR: Bayesian Personalized Ranking from Implicit Feedback](https://arxiv.org/pdf/1205.2618.pdf), section 6.2
322 |
323 | #%%
324 | def train_test_split(coo_comments):
325 | """
326 | Holds out one random subreddit interaction per user: zeros it out in the
327 | interaction matrix and appends it to the test list.
328 | """
329 | csr_comments = coo_comments.tocsr()
330 |
331 | data = defaultdict(lambda: [])
332 | with open('interactions_5') as csvfile:
333 | datareader = csv.reader(csvfile, delimiter=' ')
334 | for subreddit, user, comments in tqdm.tqdm_notebook(datareader):
335 | data[user].append((subreddit, comments))
336 |
337 | train_set = []
338 | test_set = []
339 |
340 | for user, items in tqdm.tqdm_notebook(data.items()):
341 | np.random.shuffle(items)
342 | test_item = items[0]
343 | test_comments = test_item[1]
344 |
345 | test_subreddit = test_item[0]
346 | # zero out a user item interaction
347 | csr_comments[subreddits_index[test_subreddit], users_index[user]] = 0
348 |
349 | test_set.append([test_subreddit, user, int(test_comments)])
350 |
351 | for item in items[1:]:
352 | train_set.append([item[0], user, int(item[1])])
353 |
354 | csr_comments.eliminate_zeros()
355 | return train_set, test_set, csr_comments.tocoo()
356 |
357 |
358 | #%%
359 | train_set, test_set, comments = train_test_split(comments)
360 |
361 | #%% [markdown]
362 | # ### AUC Metric
363 | #
364 | # We implement the AUC metric to evaluate the BPR-based methods, using the definition given in the paper [BPR: Bayesian Personalized Ranking from Implicit Feedback](https://arxiv.org/pdf/1205.2618.pdf), section 6.2. AUC is defined as
365 | #
366 | # $$AUC = \frac{1}{| U |} \sum_u \frac{1}{|E(u)|} \sum_{(i, j) \in E(u)} \delta(\hat{x}_{ui} > \hat{x}_{uj}) $$
367 | #
368 | # where $$E(u) := \{(i, j) \mid (u, i) \in S_{test} \wedge (u, j) \notin (S_{test} \cup S_{train})\}$$ and $\delta(b)$ is 1 if the condition $b$ holds and 0 otherwise.
369 | #
370 | #
371 |
372 | #%%
373 | # for each user id, collect the ids of all subreddits the user interacted with (train and test); the negatives j of E(u) are the subreddits outside this set
374 | E_u = defaultdict(lambda : set())
375 |
376 | for subreddit, user, _ in tqdm.tqdm_notebook(train_set):
377 | E_u[users_index[user]].add(subreddits_index[subreddit])
378 |
379 | for subreddit, user, _ in tqdm.tqdm_notebook(test_set):
380 | E_u[users_index[user]].add(subreddits_index[subreddit])
381 |
382 |
383 | #%%
384 | # train the bpr model
385 | from implicit.bpr import BayesianPersonalizedRanking
386 |
387 | params = {"factors": 63}
388 |
389 |
390 | #%%
391 | model = BayesianPersonalizedRanking(**params)
392 |
393 |
394 | #%%
395 | comments
396 |
397 |
398 | #%%
399 | model.fit(comments)
400 |
401 |
402 | #%%
403 | num_subreddits = len(subreddits)
404 |
405 |
406 | #%%
407 | def auc(test_set, user_factors, subreddit_factors, subreddits, users):
408 | """
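409 | Returns the AUC on a test set with one held-out (subreddit, user, signal) triple per user; uses the globals users_index, subreddits_index, E_u and num_subreddits.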
410 | """
411 | num_users = len(test_set)
412 |
413 | total = 0
414 |
415 | # treat the signal as 1 as per the implicit bpr paper
416 | for subreddit, user, signal in tqdm.tqdm_notebook(test_set): # outer summation
417 | # inner summation
418 | # TODO: try to parallelize
419 | u = users_index[user]
420 | i = subreddits_index[subreddit]
421 |
422 | x_ui = user_factors[u].dot(subreddit_factors[i])
423 |
424 | js = []
425 |
426 | for j in range(0, num_subreddits):
427 | if j != i and j not in E_u[u]:
428 | js.append(j)
429 |
430 | total += np.sum(np.heaviside(x_ui - user_factors[u].dot(subreddit_factors[js].T), 0)) / len(js)
431 |
432 | # for j in range(0, subreddits):
433 | # numel = 0
434 | # total_user = 0
435 | # if j != i and j not in E_u[u]:
436 | # numel += 1
437 | # x_uj = user_factors[u].dot(subreddit_factors[j])
438 | # total_user += heaviside(x_ui - x_uj)
439 |
440 | # total += (total_user * 1.0 / numel)
441 |
442 | return total / num_users
443 |
444 |
445 | #%%
446 | auc(test_set[:10000], model.user_factors, model.item_factors, subreddits, users)
447 |
448 |
449 | #%%
450 | def get_aucs_vs_factors():
451 | factors = [8, 16, 32, 64, 128]
452 | params_list = [{"factors": factor} for factor in factors]
453 |
454 | aucs = []
455 |
456 | for params in params_list:
457 | model = BayesianPersonalizedRanking(**params)
458 | model.fit(comments)
459 | aucs.append(auc(test_set[:20000], model.user_factors, model.item_factors, subreddits, users))
460 |
461 | return aucs
462 |
463 |
464 | #%%
465 | aucs_vs_factors = get_aucs_vs_factors()
466 |
467 |
468 | #%%
469 | aucs_vs_factors
470 |
471 |
472 | #%%
473 | def get_aucs_vs_factors_als():
474 | factors = [8, 16, 32, 64, 128]
475 |
476 | aucs = []
477 |
478 | for factor in factors:
479 | subreddit_factors, user_factors = alternating_least_squares(bm25_weight(comments), factor)
480 | aucs.append(auc(test_set[:20000], user_factors, subreddit_factors, subreddits, users))
481 |
482 | return aucs
483 |
484 |
485 | #%%
486 | aucs_als = get_aucs_vs_factors_als()
487 |
488 |
489 | #%%
490 | aucs_als
491 |
492 |
493 | #%%
494 | import numpy as np
495 | import matplotlib
496 | import matplotlib.pyplot as plt
497 |
498 | with plt.xkcd():
499 | xs = np.array([8, 16, 32, 64, 128])
500 | ys = aucs_vs_factors
501 | axes = plt.axes()
502 | plt.semilogx(xs, ys, '-gD', label='BPR')
503 | plt.semilogx(xs, aucs_als, '-^', color='orange', label='ALS')
504 | axes.set_ylim([0.75, 1.0])
505 | axes.set_xticks([10, 20, 50, 100])
506 | axes.get_xaxis().set_major_formatter(matplotlib.ticker.ScalarFormatter())
507 | axes.set_xlabel('number of dimensions')
508 | axes.set_ylabel('AUC')
509 | plt.title('AUC scores for Reddit January Data')
510 | plt.legend()
511 | plt.grid()
512 | plt.show()
513 |
--------------------------------------------------------------------------------
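The `auc` function in models.py builds the negatives j of E(u) by looping over every subreddit in Python. A minimal vectorized sketch of the same per-user quantity follows, using a boolean mask instead of the loop. It assumes one held-out item per user (as `train_test_split` produces) and NumPy factor matrices such as `model.user_factors` / `model.item_factors`; the names `auc_for_user`, `mean_auc` and `interacted_by_user` are hypothetical and not part of this repository.

```python
import numpy as np


def auc_for_user(user_vec, item_factors, test_item, interacted):
    """Fraction of negatives j ranked below the held-out item i for one user u.

    user_vec:     latent vector of user u, shape (k,)
    item_factors: latent vectors of all items, shape (n_items, k)
    test_item:    index of the held-out item i, i.e. (u, i) in S_test
    interacted:   ids of every item u interacted with in train or test
    """
    scores = item_factors @ user_vec  # x_hat_uj for every item j
    # j qualifies as a negative only if (u, j) is in neither S_test nor S_train
    negatives = np.ones(len(item_factors), dtype=bool)
    negatives[np.asarray(list(interacted), dtype=np.intp)] = False
    negatives[test_item] = False
    # delta(x_ui > x_uj): a strict comparison, like np.heaviside(x, 0) in models.py;
    # returns nan (with a warning) if the user has no negatives left
    return np.mean(scores[test_item] > scores[negatives])


def mean_auc(test_pairs, user_factors, item_factors, interacted_by_user):
    """Average the per-user AUC over (user id, held-out item id) pairs."""
    return np.mean([
        auc_for_user(user_factors[u], item_factors, i, interacted_by_user[u])
        for u, i in test_pairs
    ])
```

With the objects built in models.py, a call might look like `mean_auc([(users_index[u], subreddits_index[s]) for s, u, _ in test_set[:20000]], model.user_factors, model.item_factors, E_u)`. Because each user contributes exactly one test pair, this average plays the role of `total / num_users` in `auc`.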
/user_embeddings.py:
--------------------------------------------------------------------------------
1 | import json
2 | import string
3 | from collections import defaultdict
4 |
5 | from nltk.corpus import stopwords
6 | from nltk.tokenize import word_tokenize
7 | from gensim.models.doc2vec import Doc2Vec, TaggedDocument
8 |
9 | # Only the first MAX_COMMENTS comments are embedded, to keep the experiment small.
10 | MAX_COMMENTS = 1000
11 |
12 | with open('user_comments.json', 'r') as fp:
13 |     data = json.load(fp)
14 |
15 | stop_words = set(stopwords.words('english'))
16 | punctuation = set(string.punctuation)
17 |
18 | # Build one token list per author from that author's filtered comments.
19 | # (Previously every comment was aggregated, so untokenized bodies were split into characters.)
20 | all_user_comments = defaultdict(list)
21 | for count, key in enumerate(data):
22 |     if count >= MAX_COMMENTS:
23 |         break
24 |     comment = data[key]
25 |     tokens = word_tokenize(comment['body'])
26 |     # Drop English stop words (the NLTK list is lowercase) and bare punctuation tokens.
27 |     filtered = [w for w in tokens
28 |                 if w.lower() not in stop_words and w not in punctuation]
29 |     all_user_comments[comment['author']].extend(filtered)
30 |
31 | # One TaggedDocument per author, tagged with the author name.
32 | tagged_data = [TaggedDocument(words=words, tags=[str(author)])
33 |                for author, words in all_user_comments.items()]
34 | model = Doc2Vec(tagged_data, vector_size=50, window=3, min_count=5, epochs=4, workers=8)
35 |
36 | # Authors whose document vectors are closest to 'YoungModern' (KeyError if that author
37 | # is not in the sample). On gensim 3.x use model.docvecs instead of model.dv.
38 | sims = model.dv.most_similar('YoungModern')
39 |
--------------------------------------------------------------------------------
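`user_embeddings.py` ends by asking Doc2Vec for the authors closest to `'YoungModern'`. gensim's `most_similar` ranks candidates by cosine similarity; the sketch below reproduces that ranking over a plain NumPy matrix of author vectors so the idea can be checked on toy data. `rank_by_cosine` is a hypothetical helper, not part of gensim.

```python
import numpy as np


def rank_by_cosine(vectors, tags, query, topn=10):
    """Return up to `topn` (tag, cosine similarity) pairs closest to `query`.

    vectors: array of shape (n_docs, dim); row k belongs to tags[k].
    Rows must be non-zero, otherwise the normalisation divides by zero.
    """
    q = tags.index(query)
    # L2-normalise each row so that a dot product is a cosine similarity.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit[q]
    # Highest similarity first, excluding the query itself.
    order = [k for k in np.argsort(-sims) if k != q]
    return [(tags[k], float(sims[k])) for k in order[:topn]]
```

For example, with vectors `[1, 0]`, `[0.9, 0.1]` and `[0, 1]` tagged `'a'`, `'b'`, `'c'`, querying `'a'` ranks `'b'` (similarity about 0.994) ahead of the orthogonal `'c'` (similarity 0).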
/utils.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import tqdm
3 | from collections import defaultdict
4 |
5 |
6 | def item_to_index(things):
7 |     index = {}
8 |     for idx, item in enumerate(things):
9 |         index[item] = idx
10 |     return index
11 |
12 |
13 | def index_to_item(index):
14 |     things = np.empty(len(index), dtype=object)
15 |     for item, idx in index.items():
16 |         things[idx] = item
17 |     return things
18 |
19 |
20 | class Interaction:
21 |     "Wrap a list of (user, item, count) interactions and derived lookups."
22 |
23 |     def __init__(self, interactions, num_users, num_items):
24 |         self.interactions = interactions
25 |         self.num_users = num_users
26 |         self.num_items = num_items
27 |         self.interactions_dict = None
28 |
29 |     def get_interaction_dict(self):
30 |         "Return the interactions as a nested dict, user -> item -> count (cached)."
31 |         if self.interactions_dict is None:
32 |             self.interactions_dict = defaultdict(
33 |                 lambda: defaultdict(lambda: 0))
34 |             for user, item, count in self.interactions:
35 |                 self.interactions_dict[user][item] = int(count)
36 |         return self.interactions_dict
37 |
38 |
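39 | def auc(test_set, user_factors, subreddit_factors, subreddits, users, E_u=None):
40 |     """
41 |     Return the mean AUC over a test set of (subreddit, user, signal) triples.
42 |
43 |     For each held-out pair (u, i), the AUC is the fraction of negative subreddits j
44 |     (j != i and j not in E_u[u]) that the model scores below i. The signal is
45 |     treated as 1, as in the implicit-feedback BPR paper.
46 |
47 |     Assumes `subreddits` and `users` list names in the same order as the rows of
48 |     `subreddit_factors` and `user_factors`, and that E_u (if given) maps a user
49 |     index to the subreddit indices seen in training.
50 |     """
51 |     users_index = item_to_index(users)
52 |     subreddits_index = item_to_index(subreddits)
53 |     num_subreddits = len(subreddits)
54 |     total = 0.0
55 |
56 |     for subreddit, user, _signal in tqdm.tqdm(test_set):
57 |         u = users_index[user]
58 |         i = subreddits_index[subreddit]
59 |
60 |         x_ui = user_factors[u].dot(subreddit_factors[i])
61 |
62 |         # Negatives: every subreddit except i and those the user already interacted with.
63 |         negatives = np.ones(num_subreddits, dtype=bool)
64 |         negatives[i] = False
65 |         if E_u is not None:
66 |             negatives[list(E_u[u])] = False
67 |         x_uj = subreddit_factors[negatives].dot(user_factors[u])
68 |
69 |         # np.heaviside(x, 0) is 0 at x == 0, so ties count as misrankings.
70 |         total += np.mean(np.heaviside(x_ui - x_uj, 0))
71 |
72 |     return total / len(test_set)
73 |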
--------------------------------------------------------------------------------
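`utils.auc` averages a per-pair quantity over the test set: for a held-out pair (u, i), the share of negative items j whose score falls below x_ui. A minimal single-pair version with a worked example follows, assuming scores are dot products of latent factors as in `utils.auc`; `pair_auc` is a hypothetical name.

```python
import numpy as np


def pair_auc(user_vec, item_factors, i, observed=()):
    """AUC for one held-out (user, item i) pair.

    Negatives are all items except i and the indices in `observed`.
    A tie counts as a misranking, as with np.heaviside(x, 0).
    """
    scores = item_factors @ user_vec  # x_uj for every item j
    negatives = np.ones(len(scores), dtype=bool)
    negatives[i] = False
    negatives[list(observed)] = False
    return float(np.mean(scores[i] > scores[negatives]))


# Item scores are [0.9, 0.5, 0.2, 0.7, 1.0]; item 3 is already observed, so the
# negatives are items 1, 2 and 4, and item 0 beats two of the three.
factors = np.array([[0.9, 0.0], [0.5, 0.0], [0.2, 0.0], [0.7, 0.0], [1.0, 0.0]])
print(pair_auc(np.array([1.0, 0.0]), factors, i=0, observed={3}))  # 0.6666666666666666
```

Averaging this value over all test pairs gives the number `utils.auc` returns; a model that ranks every held-out item above all negatives scores 1.0, while random scores give about 0.5.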
/vanilla_tbpr.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 1,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "import numpy as np\n",
10 | "from collections import defaultdict"
11 | ]
12 | },
13 | {
14 | "cell_type": "code",
15 | "execution_count": null,
16 | "metadata": {},
17 | "outputs": [],
18 | "source": [
19 | "#get embeddings for users \n",
20 | "#load embeddings for subreddits\n"
21 | ]
22 | },
23 | {
24 | "cell_type": "code",
25 | "execution_count": 2,
26 | "metadata": {},
27 | "outputs": [],
28 | "source": [
29 | "# calculate biases for users and for subreddits\n",
30 | "users = set()\n",
31 | "subreddits = set()\n",
32 | "user_bias = defaultdict(float)\n",
33 | "subreddit_bias = defaultdict(float)\n",
34 | "with open(\"final_interactions_count\") as infile:\n",
35 | "    for i in infile.readlines():\n",
36 | "        line = i.split()\n",
37 | "        users.add(line[1])\n",
38 | "        subreddits.add(line[0])\n",
39 | "        user_bias[line[1]] += 1\n",
40 | "        subreddit_bias[line[0]] += 1\n",
41 | " \n",
42 | " "
43 | ]
44 | },
45 | {
46 | "cell_type": "code",
47 | "execution_count": 3,
48 | "metadata": {},
49 | "outputs": [
50 | {
51 | "name": "stdout",
52 | "output_type": "stream",
53 | "text": [
54 | "Number of unique users: 735834\n"
55 | ]
56 | }
57 | ],
58 | "source": [
59 | "print(\"Number of unique users:\", len(users))"
60 | ]
61 | },
62 | {
63 | "cell_type": "code",
64 | "execution_count": 4,
65 | "metadata": {},
66 | "outputs": [
67 | {
68 | "name": "stdout",
69 | "output_type": "stream",
70 | "text": [
71 | "Number of unique subreddits: 14842\n"
72 | ]
73 | }
74 | ],
75 | "source": [
76 | "print(\"Number of unique subreddits:\", len(subreddits))"
77 | ]
78 | },
79 | {
80 | "cell_type": "code",
81 | "execution_count": 5,
82 | "metadata": {},
83 | "outputs": [
84 | {
85 | "data": {
86 | "text/plain": [
87 | "14842"
88 | ]
89 | },
90 | "execution_count": 5,
91 | "metadata": {},
92 | "output_type": "execute_result"
93 | }
94 | ],
95 | "source": [
96 | "len(subreddit_bias)"
97 | ]
98 | },
99 | {
100 | "cell_type": "code",
101 | "execution_count": 6,
102 | "metadata": {},
103 | "outputs": [],
104 | "source": [
105 | "for e in user_bias:\n",
106 | "    user_bias[e] /= len(subreddits)"
107 | ]
108 | },
109 | {
110 | "cell_type": "code",
111 | "execution_count": 7,
112 | "metadata": {},
113 | "outputs": [],
114 | "source": [
115 | "for e in subreddit_bias:\n",
116 | "    subreddit_bias[e] /= len(users)"
117 | ]
118 | },
119 | {
120 | "cell_type": "code",
121 | "execution_count": 8,
122 | "metadata": {},
123 | "outputs": [
124 | {
125 | "data": {
126 | "text/plain": [
127 | "defaultdict(float,\n",
128 | " {'exmormon': 1.1848462492432234e-07,\n",
129 | " 'CanadaPolitics': 3.809088405604181e-08,\n",
130 | " 'AdviceAnimals': 1.8738700533819605e-06,\n",
131 | " 'WTF': 1.1059175423770984e-06,\n",
132 | " 'needadvice': 2.4539319536103853e-08,\n",
133 | " 'summonerschool': 1.8990503252940072e-07,\n",
134 | " 'sausagetalk': 4.578231256735794e-10,\n",
135 | " 'Naruto': 4.917020369734243e-08,\n",
136 | " 'hockey': 5.96543532752674e-07,\n",
137 | " 'Games': 5.559804038179948e-07,\n",
138 | " 'knives': 2.664530591420232e-08,\n",
139 | " 'LSD': 3.424516980038374e-08,\n",
140 | " 'changemyview': 1.7003550887516737e-07,\n",
141 | " 'beertrade': 2.8018775291223055e-08,\n",
142 | " 'TumblrInAction': 4.6688802356191625e-07,\n",
143 | " 'GrandTheftAutoV': 1.118004072894881e-07,\n",
144 | " 'AskReddit': 1.1843884261175499e-05,\n",
145 | " 'news': 1.3848233905374428e-06,\n",
146 | " 'HannibalTV': 7.050476135373122e-09,\n",
147 | " 'worldnews': 1.973492365528531e-06,\n",
148 | " 'BMW': 4.349319693899004e-08,\n",
149 | " 'Smite': 3.107703377072257e-07,\n",
150 | " 'BabyBumps': 1.41467345833136e-07,\n",
151 | " 'randomsuperpowers': 3.754149630523351e-09,\n",
152 | " 'devils': 1.9594829778829195e-08,\n",
153 | " 'Reformed': 9.705850264279882e-09,\n",
154 | " 'wow': 6.867346885103691e-07,\n",
155 | " 'Android': 3.8466299019094134e-07,\n",
156 | " 'MLPLounge': 6.153142809052907e-08,\n",
157 | " 'falcons': 2.270802703340954e-08,\n",
158 | " 'dayz': 1.943001345358671e-07,\n",
159 | " 'Autos': 4.21197275619693e-08,\n",
160 | " 'EliteDangerous': 2.8357564404221504e-07,\n",
161 | " 'Celebs': 1.7855101901269594e-08,\n",
162 | " 'NSFW_GIF': 1.2361224393186642e-08,\n",
163 | " 'Insurance': 1.3002176769129656e-08,\n",
164 | " 'childfree': 1.6792952249706893e-07,\n",
165 | " 'playrust': 3.882340105711953e-08,\n",
166 | " 'pokemontrades': 1.9521578078721423e-07,\n",
167 | " 'MLS': 1.3121210781804784e-07,\n",
168 | " 'elliottsmith': 1.9228571278290334e-09,\n",
169 | " 'roosterteeth': 1.333180941961463e-07,\n",
170 | " 'EverythingScience': 9.156462513471587e-09,\n",
171 | " 'nba': 9.085042105866509e-07,\n",
172 | " 'funny': 2.7407123595323152e-06,\n",
173 | " 'todayilearned': 2.0001376714427336e-06,\n",
174 | " 'dbz': 3.342108817417129e-08,\n",
175 | " 'techsupport': 1.771775496356752e-07,\n",
176 | " 'thelastofus': 1.0987755016165905e-08,\n",
177 | " 'beercanada': 4.578231256735794e-10,\n",
178 | " 'leagueoflegends': 2.3763767161212808e-06,\n",
179 | " 'headphones': 8.863455713040496e-08,\n",
180 | " 'DestinyTheGame': 1.3529589009905616e-06,\n",
181 | " 'Seattle': 1.0887033928517718e-07,\n",
182 | " 'HistoricalPowers': 3.1131972545803397e-09,\n",
183 | " 'videos': 1.733409918425306e-06,\n",
184 | " 'Cardinals': 1.3185306019399085e-08,\n",
185 | " 'Metroid': 1.2269659768051926e-08,\n",
186 | " 'gwcumsluts': 1.4650340021554539e-09,\n",
187 | " 'Portland': 8.762734625392309e-08,\n",
188 | " 'weddingplanning': 9.330435301227548e-08,\n",
189 | " 'swtor': 8.167564562016656e-08,\n",
190 | " 'projectcar': 6.409523759430112e-09,\n",
191 | " 'kpop': 4.889550982193828e-08,\n",
192 | " 'atheism': 4.730228534459422e-07,\n",
193 | " 'electronic_cigarette': 4.778757785780821e-07,\n",
194 | " 'Bushcraft': 7.2336053856425545e-09,\n",
195 | " 'perktv': 3.882340105711953e-08,\n",
196 | " 'canada': 2.445691137348261e-07,\n",
197 | " 'MMA': 3.677235345410189e-07,\n",
198 | " 'weeabootales': 5.951700633756532e-09,\n",
199 | " 'randomactsofcsgo': 7.636489736235304e-08,\n",
200 | " 'nfl': 1.805654407656597e-06,\n",
201 | " 'soccer': 8.669338707754898e-07,\n",
202 | " 'DotA2': 9.389952307565113e-07,\n",
203 | " 'FixedGearBicycle': 2.792721066608834e-08,\n",
204 | " 'OkCupid': 1.5657550898036414e-07,\n",
205 | " 'pcmasterrace': 1.3237497855725873e-06,\n",
206 | " 'FIFA': 2.7551795703036007e-07,\n",
207 | " 'Assistance': 2.591278891312459e-08,\n",
208 | " 'Glitch_in_the_Matrix': 1.8496054277212605e-08,\n",
209 | " 'gaming': 1.1209341408991917e-06,\n",
210 | " 'buffalobills': 3.1498231046342264e-08,\n",
211 | " 'TheWayWeWere': 4.395102006466362e-09,\n",
212 | " 'Random_Acts_Of_Amazon': 1.0355959102736366e-07,\n",
213 | " 'hometheater': 2.8018775291223055e-08,\n",
214 | " 'britishproblems': 1.0923659778571604e-07,\n",
215 | " 'korrasami': 2.664530591420232e-08,\n",
216 | " 'ffxiv': 2.7661673253197666e-07,\n",
217 | " 'Wishlist': 2.600435353825931e-08,\n",
218 | " 'GoneWildPlus': 4.202816293683458e-08,\n",
219 | " 'pics': 2.268147329212047e-06,\n",
220 | " 'NoFap': 1.6701387624572173e-07,\n",
221 | " 'Planetside': 1.736065292554213e-07,\n",
222 | " 'RagenChastain': 6.409523759430111e-10,\n",
223 | " 'Sneakers': 8.561292450095934e-08,\n",
224 | " 'Warframe': 1.0630652978140512e-07,\n",
225 | " 'sports': 1.0383428490276781e-07,\n",
226 | " 'genetics': 1.0072108764818748e-09,\n",
227 | " 'serialpodcast': 1.9283510053371163e-07,\n",
228 | " 'gifs': 5.62664621452829e-07,\n",
229 | " 'hearthstone': 5.125787715041394e-07,\n",
230 | " 'GlobalOffensiveTrade': 2.7707455565765024e-07,\n",
231 | " 'CFB': 8.075084290630592e-07,\n",
232 | " 'anime': 5.230171387694971e-07,\n",
233 | " 'buildapc': 4.953646219788129e-07,\n",
234 | " 'RealEstate': 3.259700654795885e-08,\n",
235 | " 'iphone': 6.720843484888145e-08,\n",
236 | " 'Homebrewing': 1.2828203981373692e-07,\n",
237 | " 'Anarcho_Capitalism': 6.006639408837361e-08,\n",
238 | " 'rwbyRP': 6.409523759430112e-09,\n",
239 | " 'truegaming': 5.640380908298498e-08,\n",
240 | " 'xboxone': 3.627790447837443e-07,\n",
241 | " 'touhou': 1.6481632524248858e-08,\n",
242 | " 'CrazyIdeas': 2.0327346779906925e-08,\n",
243 | " 'AskWomen': 3.6158870465699296e-07,\n",
244 | " 'politics': 6.692458451096383e-07,\n",
245 | " 'Military': 6.116516958999021e-08,\n",
246 | " 'barstoolsports': 3.296326504849771e-09,\n",
247 | " 'smashbros': 4.1570339811161e-07,\n",
248 | " 'TheLastAirbender': 1.3368435269668516e-07,\n",
249 | " 'KotakuInAction': 2.9126707255353117e-07,\n",
250 | " 'SquaredCircle': 7.9276652441637e-07,\n",
251 | " 'wicked_edge': 5.0452108449228445e-08,\n",
252 | " 'Trucks': 3.1315101796072827e-08,\n",
253 | " 'EngineeringPorn': 3.296326504849771e-09,\n",
254 | " 'neopets': 5.4572516580290665e-08,\n",
255 | " 'Damnthatsinteresting': 8.973333263202155e-09,\n",
256 | " 'giftcardexchange': 1.2361224393186642e-08,\n",
257 | " 'learnprogramming': 7.187823073075196e-08,\n",
258 | " 'dogecoin': 6.69337409734773e-08,\n",
259 | " 'aspergers': 2.6828435164471752e-08,\n",
260 | " 'offmychest': 1.2132312830349852e-07,\n",
261 | " 'Knoxville': 4.852925132139941e-09,\n",
262 | " 'Judaism': 2.151768690665823e-08,\n",
263 | " 'Bravenewbies': 4.9811156073285435e-08,\n",
264 | " 'ar15': 5.731945533433214e-08,\n",
265 | " 'DaystromInstitute': 2.8751292292300784e-08,\n",
266 | " 'sex': 2.9035142630218407e-07,\n",
267 | " 'Showerthoughts': 4.255923776261594e-07,\n",
268 | " 'Seahawks': 1.6353442049060254e-07,\n",
269 | " 'pussypassdenied': 2.637061203879817e-08,\n",
270 | " 'CompetitiveHS': 4.8803945196803556e-08,\n",
271 | " 'transpassing': 2.2067074657466526e-08,\n",
272 | " 'TwoXChromosomes': 2.7625047403143776e-07,\n",
273 | " 'tarot': 3.754149630523351e-09,\n",
274 | " 'mtgfinance': 7.874557761585566e-09,\n",
275 | " 'mistyfront': 1.8312925026943174e-10,\n",
276 | " 'rage': 6.74831287242856e-08,\n",
277 | " 'boardgames': 2.0263251542312626e-07,\n",
278 | " 'lotr': 3.0307890919590955e-08,\n",
279 | " 'startrek': 5.686163220865856e-08,\n",
280 | " 'opiates': 7.013850285319236e-08,\n",
281 | " 'personalfinance': 3.6424407878589973e-07,\n",
282 | " 'trashy': 8.47888428747469e-08,\n",
283 | " 'Rateme': 9.614285639145167e-08,\n",
284 | " 'firstworldproblems': 7.691428511316134e-09,\n",
285 | " 'subaru': 8.890925100580911e-08,\n",
286 | " 'bonnaroo': 6.098204033972077e-08,\n",
287 | " 'cars': 2.980428548135002e-07,\n",
288 | " 'marvelstudios': 8.643700612717178e-08,\n",
289 | " 'Fitness': 4.759529214502531e-07,\n",
290 | " 'Bass': 3.662585005388635e-08,\n",
291 | " 'ProRevenge': 6.409523759430112e-09,\n",
292 | " 'CasualPokemonTrades': 9.467782238929621e-08,\n",
293 | " 'singapore': 5.53965982065031e-08,\n",
294 | " 'Bitcoin': 3.5801768427673905e-07,\n",
295 | " 'CodAW': 1.4018544108125001e-07,\n",
296 | " 'thelastofusfactions': 2.4264625660699703e-08,\n",
297 | " 'tf2': 2.2918625671219385e-07,\n",
298 | " 'linux': 1.0044639377278332e-07,\n",
299 | " 'teenagers': 2.526268007466811e-07,\n",
300 | " 'hiphopheads': 3.2102557572231386e-07,\n",
301 | " 'Futurology': 1.5382857022632267e-07,\n",
302 | " 'autism': 6.226394509160679e-09,\n",
303 | " 'relationships': 6.945176816468198e-07,\n",
304 | " 'BostonBruins': 2.591278891312459e-08,\n",
305 | " 'australia': 2.1087333168525065e-07,\n",
306 | " 'explainlikeimfive': 6.417764575692236e-07,\n",
307 | " 'guns': 1.7699442038540578e-07,\n",
308 | " 'comicswap': 9.156462513471587e-09,\n",
309 | " 'drunk': 3.552707455226976e-08,\n",
310 | " 'OldSchoolCool': 6.986380897778822e-08,\n",
311 | " 'asstastic': 1.0346802640222895e-08,\n",
312 | " 'RealGirls': 2.069360528044579e-08,\n",
313 | " 'predaddit': 1.0896190391031188e-08,\n",
314 | " 'powerrangers': 9.705850264279882e-09,\n",
315 | " 'mercedes': 1.8312925026943174e-10,\n",
316 | " 'Scrubs': 1.7397278775596015e-09,\n",
317 | " '3Dprinting': 5.567129208190725e-08,\n",
318 | " 'Padres': 3.479455755119203e-09,\n",
319 | " 'PercyJacksonRP': 1.2635918268590791e-08,\n",
320 | " 'milwaukee': 1.1903401267513063e-08,\n",
321 | " 'subredditreports': 1.8312925026943174e-10,\n",
322 | " 'magicTCG': 4.3465727551449625e-07,\n",
323 | " 'travel': 7.535768648587117e-08,\n",
324 | " 'starcraft': 2.025409507979915e-07,\n",
325 | " 'PKA': 3.442829905065317e-08,\n",
326 | " 'LiverpoolFC': 1.3295183569560746e-07,\n",
327 | " 'Firearms': 4.413414931493305e-08,\n",
328 | " 'surfing': 2.105986378098465e-08,\n",
329 | " 'exjw': 4.129564593575686e-08,\n",
330 | " 'blog': 1.822136040180846e-08,\n",
331 | " 'Wet_Shavers': 2.1883945407197095e-08,\n",
332 | " 'Waxpen': 1.5565986272901698e-09,\n",
333 | " 'KillLaKill': 6.134829884025963e-09,\n",
334 | " 'depression': 9.183931901012002e-08,\n",
335 | " 'FutureWhatIf': 6.409523759430112e-09,\n",
336 | " 'asoiaf': 2.3449700497000736e-07,\n",
337 | " 'learnpython': 2.170081615692766e-08,\n",
338 | " 'tmobile': 3.3237958923901865e-08,\n",
339 | " 'newhampshire': 8.88176863806744e-09,\n",
340 | " 'SNSD': 4.21197275619693e-09,\n",
341 | " 'AgainstGamerGate': 3.772462555550294e-08,\n",
342 | " 'CoDCompetitive': 1.4293237983529148e-07,\n",
343 | " 'nerdcubed': 3.488612217632675e-08,\n",
344 | " 'InsurgenceBattles': 1.8312925026943174e-10,\n",
345 | " 'gamecollecting': 5.127619007544089e-08,\n",
346 | " 'bravefrontier': 2.175575493200849e-07,\n",
347 | " 'bjj': 8.790204012932724e-08,\n",
348 | " 'GlobalOffensive': 9.254436662365733e-07,\n",
349 | " 'TrollXChromosomes': 3.3869754837331403e-07,\n",
350 | " 'NHLHUT': 7.947809461693338e-08,\n",
351 | " 'androidcirclejerk': 1.3917823020476812e-08,\n",
352 | " 'Maplestory': 2.7744081415818912e-08,\n",
353 | " 'mildlyinteresting': 2.370608144737794e-07,\n",
354 | " 'trains': 3.571020380253919e-09,\n",
355 | " 'arrow': 6.6384353222669e-08,\n",
356 | " 'summonerswar': 1.1729428479757104e-07,\n",
357 | " 'NHLStreams': 5.127619007544089e-09,\n",
358 | " 'pathofexile': 1.834955087699706e-07,\n",
359 | " 'cats': 5.5305033581368394e-08,\n",
360 | " 'aww': 2.9950788881565565e-07,\n",
361 | " 'tifu': 6.233719679171457e-07,\n",
362 | " 'Catholicism': 4.889550982193828e-08,\n",
363 | " 'PoliticalDiscussion': 7.068789060400065e-08,\n",
364 | " 'hardwareswap': 1.0603183590600098e-07,\n",
365 | " 'amiibo': 2.77807072658728e-07,\n",
366 | " 'nostalgia': 1.2361224393186642e-08,\n",
367 | " 'TrueAtheism': 3.0948843295533965e-08,\n",
368 | " 'pokemon': 3.4931904488894105e-07,\n",
369 | " 'BitTippers': 9.888979514549315e-09,\n",
370 | " 'offbeat': 3.415360517524902e-08,\n",
371 | " 'Watches': 8.872612175553969e-08,\n",
372 | " 'science': 1.6765482862166478e-07,\n",
373 | " 'InfertilitySucks': 2.9300680043109078e-09,\n",
374 | " 'minnesotavikings': 4.230285681223873e-08,\n",
375 | " 'HFY': 2.243333315800539e-08,\n",
376 | " 'TalesFromTheSquadCar': 4.578231256735794e-10,\n",
377 | " 'prisonarchitect': 1.6023809398575278e-08,\n",
378 | " 'tfc': 4.486666631601077e-09,\n",
379 | " 'csgobetting': 2.4035714097862917e-07,\n",
380 | " 'mindcrack': 6.317959134295395e-08,\n",
381 | " 'AskMen': 4.305368673834341e-07,\n",
382 | " 'Whatcouldgowrong': 2.7835646040953624e-08,\n",
383 | " 'metalmusicians': 2.105986378098465e-09,\n",
384 | " 'metalgearsolid': 6.894816272644105e-08,\n",
385 | " 'Korrathegame': 3.6625850053886347e-10,\n",
386 | " 'Colorado': 6.592653009699542e-09,\n",
387 | " 'Steam': 1.1170884266435336e-07,\n",
388 | " 'UnderwearGW': 2.197551003233181e-09,\n",
389 | " 'YouEnterADungeon': 5.4023128829482365e-09,\n",
390 | " 'WildStar': 2.5454965787451013e-08,\n",
391 | " 'Pathfinder_RPG': 6.6384353222669e-08,\n",
392 | " 'Fireteams': 4.3712952039313356e-07,\n",
393 | " 'Surface': 4.44088431903372e-08,\n",
394 | " 'bayarea': 1.9411700528559765e-08,\n",
395 | " 'NASCAR': 4.78882989454564e-08,\n",
396 | " 'BillBurr': 2.197551003233181e-09,\n",
397 | " 'fantasybball': 6.464462534510941e-08,\n",
398 | " 'politota': 2.3898367160160845e-08,\n",
399 | " 'cigars': 4.61485710678968e-08,\n",
400 | " 'coys': 7.251918310669497e-08,\n",
401 | " 'MMORPG': 3.992217655873612e-08,\n",
402 | " 'cringe': 1.228797269307887e-07,\n",
403 | " 'ExpectationVsReality': 3.296326504849771e-09,\n",
404 | " 'apple': 1.755293863832503e-07,\n",
405 | " 'MURICA': 3.6076462303078057e-08,\n",
406 | " 'RunnerHub': 9.98054413968403e-09,\n",
407 | " 'ADHD': 4.8803945196803556e-08,\n",
408 | " 'CasualConversation': 3.687307454175008e-07,\n",
409 | " 'suboxone': 3.6625850053886347e-10,\n",
410 | " 'PerfectTiming': 2.2891156283678968e-09,\n",
411 | " 'shittyideas': 1.1903401267513065e-09,\n",
412 | " 'shapeoko': 1.0987755016165905e-09,\n",
413 | " 'lego': 6.894816272644105e-08,\n",
414 | " 'IraqiSoccer': 2.7469387540414763e-10,\n",
415 | " 'fitbit': 1.2819047518860223e-08,\n",
416 | " 'hardware': 3.342108817417129e-08,\n",
417 | " 'BloodWorld': 1.5565986272901698e-09,\n",
418 | " 'fakeid': 4.724734656951339e-08,\n",
419 | " 'customhearthstone': 9.797414889414598e-09,\n",
420 | " 'cordcutters': 4.074625818494856e-08,\n",
421 | " 'engineering': 2.2799591658544252e-08,\n",
422 | " 'flying': 4.825455744599527e-08,\n",
423 | " 'jailbreak': 2.1069020243498123e-07,\n",
424 | " 'halo': 2.1545156294198644e-07,\n",
425 | " 'MechanicAdvice': 5.0268979198959015e-08,\n",
426 | " 'beauty': 8.240816262124428e-10,\n",
427 | " 'GameDeals': 9.614285639145167e-08,\n",
428 | " 'kings': 8.607074762663292e-09,\n",
429 | " 'The_Tavern': 1.3734693770207382e-09,\n",
430 | " 'transhumanism': 1.9228571278290334e-09,\n",
431 | " 'CompanyOfHeroes': 1.4009387645611527e-08,\n",
432 | " 'PuertoRico': 4.395102006466362e-09,\n",
433 | " 'ReefTank': 2.105986378098465e-08,\n",
434 | " 'SubredditDrama': 2.821106100400596e-07,\n",
435 | " 'seduction': 4.285224456304703e-08,\n",
436 | " 'smoking': 1.15371427669742e-08,\n",
437 | " 'PokemonInsurgence': 3.580176842767391e-08,\n",
438 | " 'BBWGW': 9.614285639145167e-09,\n",
439 | " 'HomeImprovement': 5.0085849948689584e-08,\n",
440 | " 'AoTRP': 1.6481632524248856e-09,\n",
441 | " 'OpTicGaming': 4.889550982193828e-08,\n",
442 | " 'Christianity': 1.7534625713298088e-07,\n",
443 | " 'baseball': 2.2231890982709012e-07,\n",
444 | " 'Cinema4D': 6.592653009699542e-09,\n",
445 | " 'fatpeoplehate': 3.462974122594954e-07,\n",
446 | " 'assettocorsa': 1.1628707392108916e-08,\n",
447 | " 'vinyl': 9.385374076308377e-08,\n",
448 | " 'wikipedia': 5.31074825781352e-09,\n",
449 | " 'ForeverAlone': 4.944489757274657e-08,\n",
450 | " 'TalesFromRetail': 9.531877476523923e-08,\n",
451 | " 'starcitizen': 1.4055169958178886e-07,\n",
452 | " 'makeupexchange': 4.9719591448150716e-08,\n",
453 | " 'MotoX': 3.3146394298767146e-08,\n",
454 | " 'streetwear': 8.259129187151372e-08,\n",
455 | " 'roasting': 2.7469387540414763e-09,\n",
456 | " 'ClashOfClans': 1.4119265195773186e-07,\n",
457 | " 'justneckbeardthings': 6.180612196593322e-08,\n",
458 | " 'Warhammer40k': 5.301591795300049e-08,\n",
459 | " 'keto': 1.3972761795557644e-07,\n",
460 | " 'LosAngeles': 6.51940130959177e-08,\n",
461 | " 'heroesofthestorm': 2.2827061046084665e-07,\n",
462 | " 'casualiama': 1.55659862729017e-07,\n",
463 | " 'PS4': 3.8786775207065646e-07,\n",
464 | " 'millionairemakers': 4.651482956843566e-08,\n",
465 | " 'NZTrees': 1.0072108764818748e-09,\n",
466 | " 'AnimalsBeingJerks': 8.973333263202155e-09,\n",
467 | " 'Hungergames': 4.578231256735794e-10,\n",
468 | " 'ireland': 9.980544139684031e-08,\n",
469 | " 'ender': 5.493877508082953e-10,\n",
470 | " 'shittybattlestations': 2.197551003233181e-09,\n",
471 | " 'Guildwars2': 2.618748278852874e-07,\n",
472 | " 'Mavericks': 7.782993136450849e-09,\n",
473 | " 'MarkMyWords': 4.761360507005226e-09,\n",
474 | " 'progresspics': 3.1315101796072827e-08,\n",
475 | " 'inFAMOUSRP': 4.395102006466362e-09,\n",
476 | " 'Guitar': 1.573995906065766e-07,\n",
477 | " '49ers': 5.8601360086218166e-08,\n",
478 | " 'Nexus5': 4.285224456304703e-08,\n",
479 | " 'FulfillmentByAmazon': 1.107931964130062e-08,\n",
480 | " 'jacking': 7.325170010777269e-10,\n",
481 | " 'spicy': 8.973333263202155e-09,\n",
482 | " 'JusticePorn': 1.9393387603532823e-07,\n",
483 | " 'gtfoDerrickandArmand': 1.7397278775596015e-09,\n",
484 | " 'Kappa': 6.409523759430111e-08,\n",
485 | " 'KerbalSpaceProgram': 1.7397278775596016e-07,\n",
486 | " 'PurplePillDebate': 3.415360517524902e-08,\n",
487 | " 'Dogtraining': 2.151768690665823e-08,\n",
488 | " 'InternetIsBeautiful': 2.4722448786373283e-08,\n",
489 | " 'tolkienfans': 2.4264625660699703e-08,\n",
490 | " 'StarcraftCirclejerk': 3.1131972545803397e-09,\n",
491 | " 'HeroRP': 3.204761879715056e-09,\n",
492 | " 'GTA': 5.7685713834871e-09,\n",
493 | " 'mtgaltered': 2.0144217529637496e-09,\n",
494 | " 'TF2fashionadvice': 7.508299261046701e-09,\n",
495 | " 'battlefield_4': 1.1500516916920314e-07,\n",
496 | " 'Pokemongiveaway': 2.38159589975396e-07,\n",
497 | " 'PopCornTime': 2.8385033791761925e-09,\n",
498 | " 'rockets': 2.041891140504164e-08,\n",
499 | " 'feedthebeast': 7.755523748910435e-08,\n",
500 | " 'TrueChristian': 1.6756326399653007e-08,\n",
501 | " 'torontoraptors': 2.069360528044579e-08,\n",
502 | " 'Texans': 2.0876734530715217e-08,\n",
503 | " 'oculus': 1.361565975753225e-07,\n",
504 | " 'AndroidQuestions': 2.7377822915280048e-08,\n",
505 | " 'oneplus': 7.892870686612508e-08,\n",
506 | " 'memphis': 7.2336053856425545e-09,\n",
507 | " 'whowouldwin': 2.1490217519117817e-07,\n",
508 | " 'planetaryannihilation': 6.958911510238406e-09,\n",
509 | " 'Libertarian': 9.412843463848792e-08,\n",
510 | " 'polandball': 1.2370380855700114e-07,\n",
511 | " 'DetroitRedWings': 3.882340105711953e-08,\n",
512 | " 'LonghornNation': 5.585442133217669e-09,\n",
513 | " 'wiiu': 1.5785741373225016e-07,\n",
514 | " 'Music': 2.489642157412925e-07,\n",
515 | " 'amiugly': 8.762734625392309e-08,\n",
516 | " 'lifehacks': 1.8404489652077893e-08,\n",
517 | " 'reactiongifs': 8.964176800688685e-08,\n",
518 | " 'badkarma': 2.7469387540414763e-10,\n",
519 | " 'manga': 7.654802661262248e-08,\n",
520 | " 'HongKong': 1.2452789018321359e-08,\n",
521 | " 'Nerf': 1.5932244773440563e-08,\n",
522 | " 'Calligraphy': 8.332380887259144e-09,\n",
523 | " 'relationship_advice': 1.3249401256993386e-07,\n",
524 | " 'podcasts': 1.7580408025865448e-08,\n",
525 | " 'television': 1.8953877402886187e-07,\n",
526 | " 'spaceengineers': 5.603755058244611e-08,\n",
527 | " 'Askasurvivor': 4.303537381331646e-09,\n",
528 | " 'Monstercat': 5.42062580797518e-08,\n",
529 | " 'ShinyPokemon': 2.417306103556499e-08,\n",
530 | " 'Marvel': 9.980544139684031e-08,\n",
531 | " 'paydaytheheist': 9.843197201981956e-08,\n",
532 | " '3DS': 2.3742707297431828e-07,\n",
533 | " 'malaysia': 2.060204065531107e-08,\n",
534 | " 'Design': 5.677006758352385e-09,\n",
535 | " 'Cruise': 5.31074825781352e-09,\n",
536 | " 'IDontWorkHereLady': 3.204761879715056e-09,\n",
537 | " 'friendsafari': 1.7342340000515187e-07,\n",
538 | " 'eroticauthors': 1.309374139426437e-08,\n",
539 | " 'beer': 6.455306071997468e-08,\n",
540 | " 'burlington': 3.021632629445624e-09,\n",
541 | " 'RWBY': 5.301591795300049e-08,\n",
542 | " 'webdev': 6.208081584133737e-08,\n",
543 | " 'Bad_Cop_No_Donut': 5.1917142451383904e-08,\n",
544 | " 'ChanceTheRapper': 6.409523759430111e-10,\n",
545 | " 'TripSit': 1.0072108764818748e-09,\n",
546 | " 'chicago': 8.414789049880389e-08,\n",
547 | " 'gameofthrones': 4.6423264943300945e-08,\n",
548 | " 'MvC3': 1.7855101901269594e-08,\n",
549 | " 'providence': 5.7685713834871e-09,\n",
550 | " 'MechanicalKeyboards': 1.367059853261308e-07,\n",
551 | " 'RandomActsOfGaming': 3.488612217632675e-08,\n",
552 | " 'askscience': 8.945863875661741e-08,\n",
553 | " 'SVExchange': 1.9072911415561316e-07,\n",
554 | " 'stunfisk': 2.7286258290145332e-08,\n",
555 | " 'Cinemagraphs': 1.2819047518860222e-09,\n",
556 | " 'intj': 3.86402718068501e-08,\n",
557 | " 'spaceengine': 7.782993136450849e-09,\n",
558 | " 'LifeProTips': 1.0996911478679375e-07,\n",
559 | " '52book': 1.6023809398575278e-08,\n",
560 | " 'sportsbook': 2.3806802535026127e-08,\n",
561 | " 'Warthunder': 1.1839306029918761e-07,\n",
562 | " 'YMS': 4.486666631601077e-09,\n",
563 | " 'FrugalFemaleFashion': 1.0987755016165905e-09,\n",
564 | " 'ClopClop': 3.296326504849771e-09,\n",
565 | " 'AndroidMasterRace': 2.417306103556499e-08,\n",
566 | " 'architecture': 1.3917823020476812e-08,\n",
567 | " 'Illustration': 2.105986378098465e-09,\n",
568 | " 'zoophilia': 1.3734693770207382e-09,\n",
569 | " 'Multicopter': 6.464462534510941e-08,\n",
570 | " 'AskVet': 1.062149651562704e-08,\n",
571 | " 'Psychonaut': 4.312693843845117e-08,\n",
572 | " 'duolingo': 1.2635918268590791e-08,\n",
573 | " 'cincinnati': 1.8496054277212605e-08,\n",
574 | " 'trees': 6.737325117412394e-07,\n",
575 | " 'technology': 5.123040776287353e-07,\n",
576 | " 'DixieFood': 4.578231256735794e-10,\n",
577 | " 'alaska': 1.0438367265357608e-08,\n",
578 | " 'RandomKindness': 1.0163673389953462e-08,\n",
579 | " 'ultrahardcore': 4.367632618925947e-08,\n",
580 | " 'SSBPM': 5.0177414573824296e-08,\n",
581 | " 'LAClippers': 8.607074762663292e-09,\n",
582 | " 'sportsarefun': 4.578231256735794e-10,\n",
583 | " 'WritingPrompts': 1.0749686990815643e-07,\n",
584 | " 'PictureGame': 8.88176863806744e-09,\n",
585 | " 'stopdrinking': 6.894816272644105e-08,\n",
586 | " 'battlestations': 7.187823073075196e-08,\n",
587 | " 'Jokes': 9.074054350850342e-08,\n",
588 | " 'NewYorkMets': 1.2452789018321359e-08,\n",
589 | " 'androidapps': 2.4356190285834425e-08,\n",
590 | " 'clevelandcavs': 3.552707455226976e-08,\n",
591 | " 'Anarchism': 4.193659831169987e-08,\n",
592 | " 'Chargers': 1.8404489652077893e-08,\n",
593 | " 'blackladies': 1.3368435269668517e-08,\n",
594 | " 'IceandFirePowers': 7.142040760507838e-09,\n",
595 | " 'GamerGhazi': 9.943918289630143e-08,\n",
596 | " 'Austin': 8.982489725715627e-08,\n",
597 | " 'dotamasterrace': 1.7397278775596018e-08,\n",
598 | " 'hockeyquestionmark': 3.3878911299844873e-09,\n",
599 | " 'Rabbits': 2.5088707286912148e-08,\n",
600 | " 'MECoOp': 4.395102006466362e-09,\n",
601 | " 'reddevils': 1.4467210771285107e-07,\n",
602 | " 'chromeos': 2.0510476030176355e-08,\n",
603 | " 'DIY_eJuice': 3.4794557551192035e-08,\n",
604 | " 'deathgrips': 4.321850306358589e-08,\n",
605 | " 'Halfbull': 6.226394509160679e-09,\n",
606 | " 'foreverkailyn': 9.156462513471587e-09,\n",
607 | " 'malefashionadvice': 2.1206367181200195e-07,\n",
608 | " 'Cubers': 2.444775491096914e-08,\n",
609 | " 'DarkNetMarkets': 9.211401288552417e-08,\n",
610 | " 'NoFapChristians': 7.508299261046701e-09,\n",
611 | " 'Twitch': 5.200870707651862e-08,\n",
612 | " 'starbucks': 3.983061193360141e-08,\n",
613 | " 'gaymers': 2.215863928260124e-08,\n",
614 | " 'Art': 2.838503379176192e-08,\n",
615 | " 'GolfGTI': 1.5657550898036413e-08,\n",
616 | " 'Birmingham': 1.2819047518860223e-08,\n",
617 | " 'TTPloreplaycentral': 1.1903401267513065e-09,\n",
618 | " 'China': 3.616802692821277e-08,\n",
619 | " 'Mustang': 3.424516980038374e-08,\n",
620 | " 'USMC': 1.8129795776673744e-08,\n",
621 | " 'moderatepolitics': 5.493877508082953e-10,\n",
622 | " 'mylittlepony': 4.898707444707299e-08,\n",
623 | " 'NewSkaters': 5.127619007544089e-09,\n",
624 | " 'nexus6': 6.107360496485548e-08,\n",
625 | " 'RATS': 1.3368435269668517e-08,\n",
626 | " 'unitedkingdom': 2.175575493200849e-07,\n",
627 | " 'Coffee': 6.482775459537883e-08,\n",
628 | " 'hcfactions': 2.3806802535026127e-08,\n",
629 | " '4chan': 1.500744205957993e-07,\n",
630 | " 'EngineeringStudents': 4.340163231385532e-08,\n",
631 | " 'GoneWildSmiles': 3.845714255658067e-09,\n",
632 | " 'de': 3.955591805819726e-08,\n",
633 | " 'netsec': 1.419251689588096e-08,\n",
634 | " 'Chevy': 2.105986378098465e-09,\n",
635 | " 'bertstrips': 2.7469387540414763e-09,\n",
636 | " 'Metal': 5.3565305703808786e-08,\n",
637 | " 'Omnipotent_League': 2.8385033791761925e-09,\n",
638 | " 'EDC': 6.711687022374673e-08,\n",
639 | " 'gonewildcurvy': 7.801306061477792e-08,\n",
640 | " 'books': 2.267140118335565e-07,\n",
641 | " 'bitchimabus': 7.325170010777269e-10,\n",
642 | " 'GreenBayPackers': 1.2983863844102712e-07,\n",
643 | " 'ginger': 1.7397278775596015e-09,\n",
644 | " 'rangers': 3.772462555550294e-08,\n",
645 | " 'confessions': 5.4023128829482365e-09,\n",
646 | " 'soccercirclejerk': 3.479455755119203e-09,\n",
647 | " 'FoodPorn': 2.2616462408274818e-08,\n",
648 | " 'KansasCityChiefs': 9.705850264279882e-09,\n",
649 | " 'trypophobia': 6.409523759430111e-10,\n",
650 | " 'cocktails': 1.2269659768051926e-08,\n",
651 | " 'movies': 1.0877877466004246e-06,\n",
652 | " 'running': 8.71695231282495e-08,\n",
653 | " 'NCSU': 4.1204081310622145e-09,\n",
654 | " 'buildapcsales': 6.69337409734773e-08,\n",
655 | " 'army': 5.4480951955155946e-08,\n",
656 | " 'Unity2D': 4.761360507005226e-09,\n",
657 | " 'conspiracy': 1.8450271964645248e-07,\n",
658 | " 'SteamGameSwap': 4.147877518602629e-08,\n",
659 | " 'socialwork': 5.4023128829482365e-09,\n",
660 | " 'Scrolls': 1.419251689588096e-08,\n",
661 | " 'NavyBlazer': 4.852925132139941e-09,\n",
662 | " 'fantasyfootball': 4.84376866962647e-08,\n",
663 | " 'orangecounty': 9.797414889414598e-09,\n",
664 | " 'Civcraft': 4.38594554395289e-08,\n",
665 | " 'KitchenConfidential': 4.120408131062214e-08,\n",
666 | " 'transgender': 5.4023128829482365e-09,\n",
667 | " 'LegendofKorraRP': 2.0144217529637496e-09,\n",
668 | " 'gadgets': 6.510244847078298e-08,\n",
669 | " 'Philippines': 6.565183622159128e-08,\n",
670 | " 'PanicHistory': 6.043265258891248e-09,\n",
671 | " 'Velo': 9.98054413968403e-09,\n",
672 | " 'changetip': 3.1131972545803397e-09,\n",
673 | " 'gainit': 3.6808979304155786e-08,\n",
674 | " 'ShouldIbuythisgame': 2.865972766716607e-08,\n",
675 | " 'GradSchool': 1.5565986272901698e-08,\n",
676 | " 'PokemonPlaza': 1.2837360443887167e-07,\n",
677 | " 'fivenightsatfreddys': 7.90202714912598e-08,\n",
678 | " 'MakeupAddiction': 3.31280813737402e-07,\n",
679 | " 'HerbGrow': 5.7685713834871e-09,\n",
680 | " 'ACTrade': 5.1642448575979755e-08,\n",
681 | " 'WastelandPowers': 8.515510137528577e-09,\n",
682 | " 'vita': 9.064897888336871e-08,\n",
683 | " 'pettyrevenge': 2.8934421542570218e-08,\n",
684 | " 'Madden': 2.005265290450278e-08,\n",
685 | " 'InsurgenceTrades': 6.592653009699542e-09,\n",
686 | " 'DJs': 1.913700665315562e-08,\n",
687 | " 'yugioh': 8.039374086828054e-08,\n",
688 | " 'Drugs': 1.7095115512651455e-07,\n",
689 | " 'CCW': 5.219183632678805e-08,\n",
690 | " 'zen': 1.2727482893725506e-08,\n",
691 | " 'audiophile': 4.266911531277759e-08,\n",
692 | " 'Solving_A858': 1.0072108764818748e-09,\n",
693 | " 'TryingForABaby': 2.3898367160160845e-08,\n",
694 | " 'chemicalreactiongifs': 9.156462513471588e-10,\n",
695 | " 'NorthKoreaNews': 2.4722448786373287e-09,\n",
696 | " 'malelivingspace': 9.431156388875735e-09,\n",
697 | " 'electricdaisycarnival': 6.134829884025963e-09,\n",
698 | " 'promos': 9.797414889414598e-09,\n",
699 | " 'Juve': 3.479455755119203e-09,\n",
700 | " '2007scape': 2.099576854339035e-07,\n",
701 | " 'sandiego': 2.8568163042031354e-08,\n",
702 | " 'CryptoCurrency': 8.240816262124429e-09,\n",
703 | " 'frugalmalefashion': 5.1459319325710324e-08,\n",
704 | " 'soccerspirits': 2.170081615692766e-08,\n",
705 | " 'AskModerators': 7.325170010777269e-10,\n",
706 | " 'katawashoujo': 1.794666652640431e-08,\n",
707 | " 'recipes': 1.2910612143994939e-08,\n",
708 | " 'supremecommander': 9.156462513471587e-11,\n",
709 | " 'futurama': 6.6842176348342585e-09,\n",
710 | " 'self': 4.587387719249265e-08,\n",
711 | " 'Minecraft': 1.7324027075488244e-07,\n",
712 | " 'Economics': 8.277442112178316e-08,\n",
713 | " 'OSHA': 9.705850264279882e-09,\n",
714 | " 'Terraria': 4.5690747942223215e-08,\n",
715 | " 'LGG3': 4.633170031816623e-08,\n",
716 | " 'tipofmypenis': 9.33959176374102e-09,\n",
717 | " 'investing': 7.929496536666394e-08,\n",
718 | " 'russia': 3.781619018063766e-08,\n",
719 | " 'ECR_Plus': 5.585442133217669e-09,\n",
720 | " 'NoSleepOOC': 4.944489757274657e-09,\n",
721 | " 'goldenretrievers': 2.7469387540414763e-09,\n",
722 | " 'Colts': 5.081836694976731e-08,\n",
723 | " 'oaklandraiders': 3.012476166932152e-08,\n",
724 | " 'MST3K': 3.754149630523351e-09,\n",
725 | " 'Welding': 1.6390067899114142e-08,\n",
726 | " 'Columbus': 2.7744081415818912e-08,\n",
727 | " 'CannabisExtracts': 2.371523790989141e-08,\n",
728 | " 'nsfw': 1.4925033896958687e-08,\n",
729 | " 'UnsentLetters': 2.197551003233181e-09,\n",
730 | " 'overclocking': 1.684789102478772e-08,\n",
731 | " 'XMenRP': 8.790204012932724e-09,\n",
732 | " 'heat': 2.215863928260124e-08,\n",
733 | " 'nyjets': 2.5638095037720447e-08,\n",
734 | " 'IAmA': 4.5022326178739795e-07,\n",
735 | " 'europe': 2.775323787833238e-07,\n",
736 | " 'beards': 3.4062040550114306e-08,\n",
737 | " 'batman': 2.591278891312459e-08,\n",
738 | " 'bicycletouring': 1.1720272017243631e-08,\n",
739 | " 'steroids': 5.301591795300049e-08,\n",
740 | " 'knitting': 5.219183632678805e-08,\n",
741 | " 'ShatteredOath': 9.156462513471587e-11,\n",
742 | " 'circlejerkseattle': 1.0987755016165905e-09,\n",
743 | " 'ddpyoga': 5.493877508082953e-10,\n",
744 | " 'weightlifting': 2.124299303125408e-08,\n",
745 | " 'ProtectAndServe': 6.079891108945133e-08,\n",
746 | " 'RandomActsofMakeup': 3.5252380676865615e-08,\n",
747 | " 'YamakuHighSchool': 6.867346885103691e-09,\n",
748 | " 'notinteresting': 9.614285639145167e-09,\n",
749 | " 'GirlGamers': 4.047156430954441e-08,\n",
750 | " 'AskFeminists': 1.4009387645611527e-08,\n",
751 | " 'MensRights': 1.1106789028841035e-07,\n",
752 | " 'rva': 2.5821224287989877e-08,\n",
753 | " 'TrueDoTA2': 1.5016598522093403e-08,\n",
754 | " 'xbox360': 1.3734693770207382e-08,\n",
755 | " 'Wicca': 7.874557761585566e-09,\n",
756 | " 'programming': 1.7617033875919333e-07,\n",
757 | " 'eldertrees': 9.98054413968403e-09,\n",
758 | " 'vegas': 1.6756326399653007e-08,\n",
759 | " 'secretsanta': 3.3237958923901865e-08,\n",
760 | " 'simracing': 1.620693864884471e-08,\n",
761 | " 'religion': 1.3459999894803234e-08,\n",
762 | " 'AmazonUnder5': 9.156462513471587e-11,\n",
763 | " 'htcone': 1.8953877402886185e-08,\n",
764 | " 'swoleacceptance': 1.3002176769129656e-08,\n",
765 | " 'somethingimade': 1.0804625765896473e-08,\n",
766 | " 'mormon': 8.42394551239386e-09,\n",
767 | " 'twincitiessocial': 8.240816262124428e-10,\n",
768 | " 'bestof': 1.0987755016165905e-07,\n",
769 | " 'pcgaming': 1.8239673326835401e-07,\n",
770 | " 'GreenDawn': 9.156462513471588e-10,\n",
771 | " 'Conservative': 4.276067993791231e-08,\n",
772 | " 'findareddit': 4.028843505927499e-09,\n",
773 | " 'Trove': 7.691428511316134e-09,\n",
774 | " 'musictheory': 1.2635918268590791e-08,\n",
775 | " 'bindingofisaac': 1.5858993073332788e-07,\n",
776 | " 'FRC': 1.9228571278290334e-08,\n",
777 | " 'FashionReps': 6.043265258891248e-09,\n",
778 | " 'france': 6.92228566018452e-08,\n",
779 | " 'eagles': 5.2924353327865776e-08,\n",
780 | " 'MapPorn': 1.0667278828194399e-07,\n",
781 | " 'motogp': 3.937278880792783e-09,\n",
782 | " 'AirBnB': 3.296326504849771e-09,\n",
783 | " 'pkmntcgtrades': 1.4100952270746244e-08,\n",
784 | " 'BravoRealHousewives': 1.2086530517782496e-08,\n",
785 | " 'fireemblem': 5.3565305703808786e-08,\n",
786 | " 'PuzzleAndDragons': 9.431156388875735e-08,\n",
787 | " 'CringeAnarchy': 2.325741478421783e-08,\n",
788 | " 'CoonTown': 4.376789081439419e-08,\n",
789 | " 'FifaCareers': 2.820190454149249e-08,\n",
790 | " 'cirkeltrek': 2.7469387540414763e-09,\n",
791 | " 'Cyberpunk': 2.8568163042031354e-08,\n",
792 | " 'detroitlions': 5.3656870328943505e-08,\n",
793 | " 'Cynicalbrit': 6.317959134295395e-08,\n",
794 | " 'shield': 1.7397278775596018e-08,\n",
795 | " 'nbastreams': 1.8312925026943177e-09,\n",
796 | " 'bodybuilding': 1.6115374023709995e-07,\n",
797 | " 'bartenders': 9.248027138606302e-09,\n",
798 | " 'redditdota2league': 6.867346885103691e-09,\n",
799 | " 'fireemblemcasual': 1.1262448891570053e-08,\n",
800 | " 'Shitty_Car_Mods': 3.772462555550294e-08,\n",
801 | " 'ftm': 2.4630884161238568e-08,\n",
802 | " 'shittyaskscience': 2.792721066608834e-08,\n",
803 | " 'HecklerKoch': 3.6625850053886347e-10,\n",
804 | " 'Yogscast': 4.056312893467914e-08,\n",
805 | " 'PersonalFinanceCanada': 3.5160816051730897e-08,\n",
806 | " 'StarshipPorn': 2.8385033791761925e-09,\n",
807 | " 'MonsterHunter': 1.4110108733259716e-07,\n",
808 | " 'CollegeBasketball': 2.1838163094629736e-07,\n",
809 | " 'creepyPMs': 4.321850306358589e-08,\n",
810 | " 'astrophotography': 1.4925033896958687e-08,\n",
811 | " 'swimmingpools': 7.325170010777269e-10,\n",
812 | " 'BABYMETAL': 2.2891156283678967e-08,\n",
813 | " 'steak': 6.226394509160679e-09,\n",
814 | " 'AskBattlestations': 1.5565986272901698e-09,\n",
815 | " 'DnD': 1.5730802598144186e-07,\n",
816 | " 'Borderlands': 3.3878911299844876e-08,\n",
817 | " 'Pennsylvania': 5.493877508082953e-09,\n",
818 | " 'kreiswichs': 4.028843505927499e-09,\n",
819 | " 'kansascity': 1.8770748152616754e-08,\n",
820 | " 'RunningCringe': 3.6625850053886347e-10,\n",
821 | " 'StarWars': 1.0127047539899575e-07,\n",
822 | " 'raisedbynarcissists': 1.3743850232720852e-07,\n",
823 | " 'Musicthemetime': 4.944489757274657e-09,\n",
824 | " 'space': 1.0474993115411495e-07,\n",
825 | " 'jobs': 4.129564593575686e-08,\n",
826 | " 'chicagobulls': 6.198925121620264e-08,\n",
827 | " 'TheRedPill': 1.7140897825218812e-07,\n",
828 | " 'TagPro': 6.153142809052907e-08,\n",
829 | " 'gamingsuggestions': 2.3806802535026127e-08,\n",
830 | " 'fatestaynight': 1.5199727772362833e-08,\n",
831 | " 'suggestmeabook': 2.9575373918513225e-08,\n",
832 | " 'EarthPorn': 2.5821224287989877e-08,\n",
833 | " 'daddit': 1.6756326399653007e-08,\n",
834 | " 'photography': 1.2361224393186644e-07,\n",
835 | " 'GamePhysics': 7.691428511316134e-09,\n",
836 | " 'poker': 5.127619007544089e-08,\n",
837 | " 'powersofmiddleearth': 3.021632629445624e-09,\n",
838 | " 'hockeyjerseys': 8.790204012932724e-09,\n",
839 | " 'Minecraft360': 2.5638095037720444e-09,\n",
840 | " 'sydney': 5.59459859573114e-08,\n",
841 | " 'knifeclub': 4.5507618691953785e-08,\n",
842 | " 'vzla': 1.1903401267513063e-08,\n",
843 | " 'GaybrosGoneWild': 1.2269659768051926e-08,\n",
844 | " 'adventuretime': 3.0857278670399247e-08,\n",
845 | " 'NovaScotia': 1.1903401267513065e-09,\n",
846 | " 'beerwithaview': 2.7469387540414763e-10,\n",
847 | " 'FoodAddiction': 4.578231256735794e-10,\n",
848 | " 'nottheonion': 2.47499181739137e-07,\n",
849 | " 'finalfantasyx': 8.240816262124428e-10,\n",
850 | " 'CrossStitch': 1.7214149525326584e-08,\n",
851 | " 'Barca': 4.056312893467914e-08,\n",
852 | " 'meme': 1.8312925026943174e-10,\n",
853 | " 'wargame': 1.9594829778829195e-08,\n",
854 | " 'rpg': 8.213346874584013e-08,\n",
855 | " 'phillies': 3.204761879715056e-09,\n",
856 | " 'battleparty': 5.8601360086218156e-09,\n",
857 | " 'Denmark': 5.677006758352384e-08,\n",
858 | " 'NYGiants': 2.0327346779906925e-08,\n",
859 | " 'photoshopbattles': 5.1367754700575605e-08,\n",
860 | " 'COents': 6.775782259968975e-09,\n",
861 | " 'MaddenUltimateTeam': 7.059632597886593e-08,\n",
862 | " 'socialism': 5.612911520758083e-08,\n",
863 | " 'Bankruptcy': 1.8312925026943174e-10,\n",
864 | " 'tasker': 8.149251636989712e-09,\n",
865 | " 'ClubNintendoTrade': 8.607074762663292e-09,\n",
866 | " 'Winnipeg': 3.3604217424440726e-08,\n",
867 | " 'Filmmakers': 4.376789081439419e-08,\n",
868 | " 'bassnectar': 1.0896190391031188e-08,\n",
869 | " 'Dodge': 3.6625850053886353e-09,\n",
870 | " 'infj': 1.794666652640431e-08,\n",
871 | " 'LesbianGamers': 2.7469387540414763e-10,\n",
872 | " 'runescape': 1.8615088289887738e-07,\n",
873 | " 'SpaceBuckets': 6.134829884025963e-09,\n",
874 | " 'PowerShell': 6.867346885103691e-09,\n",
875 | " 'playrustpublic': 3.3878911299844873e-09,\n",
876 | " 'podemos': 8.817673400473139e-08,\n",
877 | " 'firstworldanarchists': 7.782993136450849e-09,\n",
878 | " 'Moscow': 1.8312925026943174e-10,\n",
879 | " 'CautiousBB': 1.8496054277212605e-08,\n",
880 | " 'AnimeFigures': 1.712258490019187e-08,\n",
881 | " 'origin': 1.8312925026943177e-09,\n",
882 | " 'craigslist': 1.5565986272901698e-09,\n",
883 | " 'civ': 1.1958340042593892e-07,\n",
884 | " 'asktransgender': 9.165618975985058e-08,\n",
885 | " 'starbound': 3.964748268333197e-08,\n",
886 | " 'bengals': 2.4264625660699703e-08,\n",
887 | " 'sewing': 1.5840680148305847e-08,\n",
888 | " 'NBA2k': 4.678952344383981e-08,\n",
889 | " 'startups': 3.378734667471016e-08,\n",
890 | " 'ccna': 5.585442133217669e-09,\n",
891 | " 'gonewild': 2.4713292323859815e-07,\n",
892 | " 'scifi': 4.422571394006777e-08,\n",
893 | " 'WowUI': 4.028843505927499e-09,\n",
894 | " 'Florensia': 4.578231256735794e-10,\n",
895 | " 'phish': 2.3898367160160845e-08,\n",
896 | " 'cosplay': 1.135401351670477e-08,\n",
897 | " 'BlackPeopleTwitter': 1.3872040707909454e-07,\n",
898 | " 'hubchargen': 9.064897888336872e-09,\n",
899 | " 'history': 4.175346906143043e-08,\n",
900 | " 'Astronomy': 1.5932244773440563e-08,\n",
901 | " 'brisbane': 3.799931943090709e-08,\n",
902 | " 'minecraftsuggestions': 1.2727482893725506e-08,\n",
903 | " 'hiking': 5.493877508082953e-09,\n",
904 | " 'boston': 7.215292460615611e-08,\n",
905 | " 'moto360': 2.0510476030176355e-08,\n",
906 | " 'evenewbies': 4.5782312567357936e-09,\n",
907 | " 'Gaming4Gamers': 2.4539319536103853e-08,\n",
908 | " 'PussyPass': 4.395102006466362e-09,\n",
909 | " 'Saggy': 1.0072108764818748e-09,\n",
910 | " 'fatlogic': 1.41467345833136e-07,\n",
911 | " 'cringepics': 9.68753733925294e-08,\n",
912 | " 'gravityfalls': 8.057687011854998e-09,\n",
913 | " 'homestuck': 4.8803945196803556e-08,\n",
914 | " 'MassiveCock': 9.33959176374102e-09,\n",
915 | " 'modnews': 2.197551003233181e-09,\n",
916 | " 'argentina': 5.622067983271555e-08,\n",
917 | " 'ElectricForest': 3.983061193360141e-08,\n",
918 | " 'hoggit': 1.7488843400730733e-08,\n",
919 | " 'Edmonton': 3.5252380676865615e-08,\n",
920 | " 'uberdrivers': 2.060204065531107e-08,\n",
921 | " 'EDH': 8.753578162878838e-08,\n",
922 | " 'steelers': 3.882340105711953e-08,\n",
923 | " 'DarkSouls2': 1.41467345833136e-07,\n",
924 | " 'WeAreTheMusicMakers': 6.510244847078298e-08,\n",
925 | " 'TheseAreOurAlbums': 9.156462513471587e-11,\n",
926 | " 'askgaybros': 7.947809461693338e-08,\n",
927 | " 'FIFACoins': 1.309374139426437e-08,\n",
928 | " 'economy': 1.135401351670477e-08,\n",
929 | " 'PipeTobacco': 2.0510476030176355e-08,\n",
930 | " 'CommunismWorldwide': 2.380680253502613e-09,\n",
931 | " 'uwaterloo': 1.5199727772362833e-08,\n",
932 | " 'TrollXSupport': 1.2819047518860222e-09,\n",
933 | " 'Coyotes': 5.951700633756532e-09,\n",
934 | " 'ecigclassifieds': 4.092938743521799e-08,\n",
935 | " 'Frisson': 3.937278880792783e-09,\n",
936 | " 'feminineboys': 2.65537412890676e-09,\n",
937 | " 'OutreachHPG': 4.514136019141493e-08,\n",
938 | " 'OutOfTheLoop': 5.8784489336487596e-08,\n",
939 | " 'gundeals': 1.950326515369448e-08,\n",
940 | " 'windowsphone': 1.2104843442809437e-07,\n",
941 | " 'braces': 1.0072108764818748e-09,\n",
942 | " 'RandomActsOfPizza': 7.782993136450849e-09,\n",
943 | " 'infertility': 1.3734693770207382e-08,\n",
944 | " 'business': 2.88428569174355e-08,\n",
945 | " 'formula1': 1.3872040707909454e-07,\n",
946 | " 'caps': 1.419251689588096e-08,\n",
947 | " 'Miami': 9.522721014010452e-09,\n",
948 | " 'hardstyle': 5.127619007544089e-09,\n",
949 | " 'Chipotle': 3.754149630523351e-09,\n",
950 | " 'hardbodies': 4.852925132139941e-09,\n",
951 | " 'Redskins': 1.950326515369448e-08,\n",
952 | " 'oddlysatisfying': 2.325741478421783e-08,\n",
953 | " 'cheatatmathhomework': 1.5932244773440563e-08,\n",
954 | " 'Gunners': 1.541032641017268e-07,\n",
955 | " 'TeenMFA': 2.2891156283678967e-08,\n",
956 | " 'food': 1.246194548083483e-07,\n",
957 | " 'artificial': 3.296326504849771e-09,\n",
958 | " 'StLouis': 3.2780135798228285e-08,\n",
959 | " 'PhotoshopRequest': 1.9411700528559765e-08,\n",
960 | " 'Prismata': 7.874557761585566e-09,\n",
961 | " 'india': 1.4210829820907903e-07,\n",
962 | " 'saplings': 2.0785169905580505e-08,\n",
963 | " 'CHIBears': 6.5743400846726e-08,\n",
964 | " 'rollercoasters': 3.021632629445624e-09,\n",
965 | " 'DepthHub': 4.028843505927499e-09,\n",
966 | " 'Fantasy': 9.57765978909128e-08,\n",
967 | " 'A_irsoft': 4.303537381331646e-09,\n",
968 | " 'OpiatesRecovery': 1.3917823020476812e-08,\n",
969 | " 'ukraina': 4.9902720698420153e-08,\n",
970 | " 'Denver': 3.964748268333197e-08,\n",
971 | " 'dogs': 6.098204033972077e-08,\n",
972 | " 'DIY': 9.51356455149698e-08,\n",
973 | " 'veganarchism': 5.493877508082953e-10,\n",
974 | " 'againstmensrights': 1.0896190391031188e-08,\n",
975 | " 'watchpeopledie': 4.102095206035271e-08,\n",
976 | " 'beermoney': 3.1223537170938115e-08,\n",
977 | " 'selfharmpics': 2.197551003233181e-09,\n",
978 | " 'StarWarsBattlefront': 1.4650340021554539e-09,\n",
979 | " 'quiteinteresting': 2.5638095037720444e-09,\n",
980 | " 'TagProTesting': 5.493877508082953e-10,\n",
981 | " 'AmericanHorrorStory': 5.9059183211891745e-08,\n",
982 | " 'IASIP': 3.918965955765839e-08,\n",
983 | " 'etymology': 2.4722448786373287e-09,\n",
984 | " 'wardrobepurge': 1.5565986272901698e-09,\n",
985 | " 'actuallesbians': 6.894816272644105e-08,\n",
986 | " 'creepy': 9.788258426901126e-08,\n",
987 | " 'Justrolledintotheshop': 1.2773265206292865e-07,\n",
988 | " 'marriedredpill': 7.874557761585566e-09,\n",
989 | " 'WorldofTanks': 1.031933325268248e-07,\n",
990 | " 'comedy': 1.1903401267513065e-09,\n",
991 | " 'HPfanfiction': 1.1720272017243631e-08,\n",
992 | " 'Polandballart': 1.0072108764818748e-09,\n",
993 | " 'GWCouples': 5.7685713834871e-09,\n",
994 | " 'Blackfellas': 8.149251636989712e-09,\n",
995 | " 'interestingasfuck': 7.178666610561724e-08,\n",
996 | " 'TapTitans': 9.98054413968403e-09,\n",
997 | " 'Israel': 3.4062040550114306e-08,\n",
998 | " 'FIU': 2.7469387540414763e-10,\n",
999 | " 'gtaglitches': 1.1720272017243631e-08,\n",
1000 | " 'askcarsales': 2.5088707286912148e-08,\n",
1001 | " 'Amd': 1.3276870644533802e-08,\n",
1002 | " 'GOTHEFTOSLEEP': 1.8312925026943174e-10,\n",
1003 | " 'Freethought': 3.479455755119203e-09,\n",
1004 | " 'mildlyinfuriating': 4.7613605070052254e-08,\n",
1005 | " 'teslamotors': 3.2780135798228285e-08,\n",
1006 | " 'nosleep': 8.103469324422355e-08,\n",
1007 | " 'Ultralight': 8.332380887259144e-09,\n",
1008 | " 'comicbooks': 1.5163101922308948e-07,\n",
1009 | " 'singing': 7.2336053856425545e-09,\n",
1010 | " 'kemonomimi': 1.2819047518860222e-09,\n",
1011 | " 'ApocalypseRising': 2.4081496410430276e-08,\n",
1012 | " 'raspberry_pi': 2.7286258290145332e-08,\n",
1013 | " 'GoogleCardboard': 5.4023128829482365e-09,\n",
1014 | " 'airsoft': 7.3801087858581e-08,\n",
1015 | " 'cosplayers': 2.8385033791761925e-09,\n",
1016 | " 'archeage': 6.775782259968975e-08,\n",
1017 | " 'dragonage': 2.05928841927976e-07,\n",
1018 | " 'sysadmin': 1.8267142714375816e-07,\n",
1019 | " 'nova': 2.270802703340954e-08,\n",
1020 | " 'MovieSuggestions': 9.431156388875735e-09,\n",
1021 | " 'thinkpad': 1.2819047518860223e-08,\n",
1022 | " 'rupaulsdragrace': 1.0685591753221342e-07,\n",
1023 | " 'ABraThatFits': 2.4905578036642717e-08,\n",
1024 | " 'Cuckold': 5.0360543824093735e-09,\n",
1025 | " 'Documentaries': 6.830721035049804e-08,\n",
1026 | " 'HistoryPorn': 4.834612207112998e-08,\n",
1027 | " 'spikes': 7.105414910453952e-08,\n",
1028 | " 'PlantedTank': 2.9575373918513225e-08,\n",
1029 | " 'furry': 6.565183622159128e-08,\n",
1030 | " 'CrusaderKings': 6.92228566018452e-08,\n",
1031 | " 'NoStupidQuestions': 1.0291863865142065e-07,\n",
1032 | " 'soarelneo': 7.325170010777269e-10,\n",
1033 | " 'whiskey': 4.395102006466362e-09,\n",
1034 | " 'peacecorps': 4.486666631601077e-09,\n",
1035 | " 'OnePiece': 8.781047550419253e-08,\n",
1036 | " 'French': 9.522721014010452e-09,\n",
1037 | " 'C25K': 1.419251689588096e-08,\n",
1038 | " 'SigSauer': 3.479455755119203e-09,\n",
1039 | " 'VentGrumps': 1.6115374023709997e-08,\n",
1040 | " 'AskHistorians': 5.301591795300049e-08,\n",
1041 | " '8BallMC': 3.6625850053886347e-10,\n",
1042 | " 'weekendgunnit': 1.867918352748204e-08,\n",
1043 | " 'newjersey': 2.7011564414741183e-08,\n",
1044 | " 'PJRP_Community': 3.845714255658067e-09,\n",
1045 | " 'NoMansSkyTheGame': 1.996108827936806e-08,\n",
1046 | " 'ravens': 5.979170021296946e-08,\n",
1047 | " 'TalesFromYourServer': 2.417306103556499e-08,\n",
1048 | " 'BigBrother': 6.043265258891248e-09,\n",
1049 | " 'DebateAnAtheist': 3.543550992713504e-08,\n",
1050 | " 'latin': 4.395102006466362e-09,\n",
1051 | " 'toronto': 1.319446248191256e-07,\n",
1052 | " 'HeresAFunFact': 2.4722448786373287e-09,\n",
1053 | " 'reptiles': 1.1720272017243631e-08,\n",
1054 | " 'incremental_games': 1.9869523654233345e-08,\n",
1055 | " 'tampa': 1.9686394403963914e-08,\n",
1056 | " 'Anticonsumption': 4.761360507005226e-09,\n",
1057 | " 'ipad': 9.797414889414598e-09,\n",
1058 | " 'railroading': 5.493877508082953e-10,\n",
1059 | " 'soylent': 1.1262448891570053e-08,\n",
1060 | " 'Hunting': 2.0510476030176355e-08,\n",
1061 | " 'freedonuts': 4.5782312567357936e-09,\n",
1062 | " 'Chattanooga': 6.501088384564828e-09,\n",
1063 | " 'thewestwing': 2.5638095037720444e-09,\n",
1064 | " 'SkincareAddiction': 1.194002711756695e-07,\n",
1065 | " 'Owls': 1.8312925026943174e-10,\n",
1066 | " 'cowboys': 1.0804625765896473e-07,\n",
1067 | " 'lacrosse': 6.3179591342953955e-09,\n",
1068 | " 'carporn': 2.600435353825931e-08,\n",
1069 | " 'love': 7.325170010777269e-10,\n",
1070 | " 'opieandanthony': 5.850979546108345e-08,\n",
1071 | " 'TrueReddit': 8.918394488121326e-08,\n",
1072 | " 'EDM': 1.2635918268590791e-08,\n",
1073 | " 'LasVegas': 1.2819047518860222e-09,\n",
1074 | " 'bigboobproblems': 9.156462513471587e-09,\n",
1075 | " 'Boxing': 4.266911531277759e-08,\n",
1076 | " 'landscaping': 1.2819047518860222e-09,\n",
1077 | " 'snowboarding': 4.5324489441684354e-08,\n",
1078 | " 'netflix': 1.107931964130062e-08,\n",
1079 | " 'randomactsofsteam': 2.2891156283678968e-09,\n",
1080 | " 'dfsports': 5.4297822704886515e-08,\n",
1081 | " 'FeedTheBeastCrashes': 8.240816262124428e-10,\n",
1082 | " 'AndroidGaming': 2.948380929337851e-08,\n",
1083 | " 'Nootropics': 3.5618639177404477e-08,\n",
1084 | " 'ghibli': 1.6481632524248856e-09,\n",
1085 | " 'applehelp': 2.4722448786373283e-08,\n",
1086 | " 'FNAFfangames': 1.1903401267513065e-09,\n",
1087 | " 'NZXT': 2.5638095037720444e-09,\n",
1088 | " 'WarshipPorn': 1.9411700528559765e-08,\n",
1089 | " 'ShitAmericansSay': 2.637061203879817e-08,\n",
1090 | " 'Unity3D': 2.4539319536103853e-08,\n",
1091 | " 'Patriots': 2.0080122292043194e-07,\n",
1092 | " 'darksouls': 9.852353664495427e-08,\n",
1093 | " 'DeadBedrooms': 3.671741467902106e-08,\n",
1094 | " 'nexus4': 9.248027138606302e-09,\n",
1095 | " 'cade': 5.493877508082953e-09,\n",
1096 | " 'Frozen': 1.1720272017243631e-08,\n",
1097 | " 'vapeitforward': 8.240816262124428e-10,\n",
1098 | " 'gradadmissions': 4.761360507005226e-09,\n",
1099 | " 'AnimalCrossing': 2.4630884161238568e-08,\n",
1100 | " 'guitarpedals': 4.862081594653413e-08,\n",
1101 | " 'GetMotivated': 3.708367317955993e-08,\n",
1102 | " 'teslore': 2.2341768532870675e-08,\n",
1103 | " 'eu4': 9.248027138606303e-08,\n",
1104 | " 'Kanon': 9.156462513471587e-11,\n",
1105 | " 'ComicWriting': 3.6625850053886347e-10,\n",
1106 | " 'prephysicianassistant': 1.4650340021554539e-09,\n",
1107 | " 'raiseyourdongers': 6.3179591342953955e-09,\n",
1108 | " 'dubai': 1.3368435269668517e-08,\n",
1109 | " 'Bayonetta': 7.325170010777269e-10,\n",
1110 | " 'survivor': 3.772462555550294e-08,\n",
1111 | " 'Nisekoi': 6.6842176348342585e-09,\n",
1112 | " 'Sexsells': 6.958911510238406e-09,\n",
1113 | " 'hentai': 6.3179591342953955e-09,\n",
1114 | " 'holdmybeer': 9.614285639145167e-09,\n",
1115 | " 'LearnJapanese': 2.5363401162316297e-08,\n",
1116 | " 'acturnips': 1.5840680148305847e-08,\n",
1117 | " 'oneplusphotos': 2.7469387540414763e-10,\n",
1118 | " 'HouseOfCards': 8.790204012932724e-09,\n",
1119 | " 'newzealand': 9.34874822625449e-08,\n",
1120 | " 'PersonOfInterest': 2.124299303125408e-08,\n",
1121 | " 'GirlswithNeonHair': 6.409523759430111e-10,\n",
1122 | " 'SimplePlanes': 3.1131972545803397e-09,\n",
1123 | " 'stlouisblues': 2.215863928260124e-08,\n",
1124 | " 'linguistics': 1.0255238015088178e-08,\n",
1125 | " 'Dualsport': 7.2336053856425545e-09,\n",
1126 | " 'triplej': 2.380680253502613e-09,\n",
1127 | " 'fatpeoplestories': 4.404258468979833e-08,\n",
1128 | " ...})"
1129 | ]
1130 | },
1131 | "execution_count": 8,
1132 | "metadata": {},
1133 | "output_type": "execute_result"
1134 | }
1135 | ],
1136 | "source": [
1137 | "subreddit_bias"
1138 | ]
1139 | },
1140 | {
1141 | "cell_type": "code",
1142 | "execution_count": null,
1143 | "metadata": {},
1144 | "outputs": [],
1145 | "source": [
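1146 | "x_i_j = defaultdict(dict)  # assumed layout: x_i_j[user][subreddit]\n",
1147 | "for user in users:\n",
1148 | "    for subreddit in subreddits:\n",
1149 | "        # Scoring step not yet written in this cell; placeholder keeps the loop syntactically valid.\n",
1150 | "        pass"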
1151 | ]
1152 | }
1153 | ],
1154 | "metadata": {
1155 | "kernelspec": {
1156 | "display_name": "Python 3",
1157 | "language": "python",
1158 | "name": "python3"
1159 | },
1160 | "language_info": {
1161 | "codemirror_mode": {
1162 | "name": "ipython",
1163 | "version": 3
1164 | },
1165 | "file_extension": ".py",
1166 | "mimetype": "text/x-python",
1167 | "name": "python",
1168 | "nbconvert_exporter": "python",
1169 | "pygments_lexer": "ipython3",
1170 | "version": "3.6.0"
1171 | }
1172 | },
1173 | "nbformat": 4,
1174 | "nbformat_minor": 2
1175 | }
1176 |
--------------------------------------------------------------------------------