├── .github
    └── ISSUE_TEMPLATE
    │   ├── bug_report.md
    │   ├── feature_request.md
    │   └── question.md
├── .gitignore
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── Pipfile
├── README.md
├── collector.py
├── config_template.yml
├── create_dense_result.sql
├── create_node_view.sql
├── database_handler.py
├── empty_keys.json
├── exceptions.py
├── functional_test.py
├── helpers.py
├── make_config.py
├── make_test_tweet_jsons.py
├── passwords_template.py
├── seed_with_lots_of_friends.csv
├── seeds.csv
├── seeds_empty.csv
├── seeds_template.csv
├── seeds_test.csv
├── setup.py
├── setup_server.sh
├── start.py
├── test_helpers.py
├── test_run.sh
├── tests
    ├── config_test_empty.yml
    └── tests.py
├── twauth.py
├── two_seeds.csv
└── wrong_tokens.csv


/.github/ISSUE_TEMPLATE/bug_report.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | name: Bug report
 3 | about: Create a report to help us improve
 4 | title: ''
 5 | labels: ''
 6 | assignees: ''
 7 | 
 8 | ---
 9 | 
10 | **Describe the bug**
11 | A clear and concise description of what the bug is.
12 | 
13 | **Possibly related issues I have found under "Issues":**
14 | Use the search function in Issues to find related issues (also closed ones).
15 | 
16 | **To Reproduce**
17 | Steps to reproduce the behavior:
18 | 
19 | 
20 | **Expected behavior**
21 | A clear and concise description of what you expected to happen.
22 | 
23 | **Screenshots**
24 | If applicable, add screenshots/command line outputs to help explain your problem.
25 | 
26 | **Desktop (please complete the following information):**
27 | - OS: [e.g. iOS]
28 | - Python: [e.g. 3.6]
29 | - [contents of Pipfile and lockfile]
30 | 
31 | **Additional context**
32 | Add any other context about the problem here.
33 | 


--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/feature_request.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | name: Feature request
 3 | about: Suggest an idea for this project
 4 | title: ''
 5 | labels: ''
 6 | assignees: ''
 7 | 
 8 | ---
 9 | 
10 | **Is your feature request related to a problem? Please describe.**
11 | A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
12 | 
13 | **Possibly related issues/feature requests I have found under "Issues":**
14 | Use the search function in issues to find related issues (also closed ones).
15 | 
16 | **Describe the solution you'd like**
17 | A clear and concise description of what you want to happen.
18 | 
19 | **Describe alternatives you've considered**
20 | A clear and concise description of any alternative solutions or features you've considered.
21 | 
22 | **Additional context**
23 | Add any other context or screenshots about the feature request here.
24 | 


--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE/question.md:
--------------------------------------------------------------------------------
 1 | ---
 2 | name: Question
 3 | about: Ask a question that is not answered by the documentation, the readme, or related publications.
 4 | title: ''
 5 | labels: ''
 6 | assignees: ''
 7 | 
 8 | ---
 9 | 
10 | **I have a question regarding:**
11 | 
12 | **This is my question:**
13 | 
14 | **I already have consulted the following resources (e.g. README, documentation, linked talks, linked articles, stackoverflow), from which my understanding is that …:**
15 | 


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
 1 | environment.yml
 2 | .idea/
 3 | .python-version
 4 | __pycache__/
 5 | keys*.*
 6 | token*.csv
 7 | *.swp
 8 | *.db
 9 | passwords.py
10 | config_bu_perm.yml
11 | test_config.yml
12 | tests/tweet_jsons
13 | *latest_seeds*
14 | seeds_de_ids.csv
15 | user_ids_de.csv
16 | results/
17 | Pipfile.lock.bk
18 | config.yml
19 | Pipfile.lock
20 | .vscode/settings.json
21 | 


--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
 1 | # Contributor Covenant Code of Conduct
 2 | 
 3 | ## Our Pledge
 4 | 
 5 | In the interest of fostering an open and welcoming environment, we as
 6 | contributors and maintainers pledge to making participation in our project and
 7 | our community a harassment-free experience for everyone, regardless of age, body
 8 | size, disability, ethnicity, sex characteristics, gender identity and expression,
 9 | level of experience, education, socio-economic status, nationality, personal
10 | appearance, race, religion, or sexual identity and orientation.
11 | 
12 | ## Our Standards
13 | 
14 | Examples of behavior that contributes to creating a positive environment
15 | include:
16 | 
17 | * Using welcoming and inclusive language
18 | * Being respectful of differing viewpoints and experiences
19 | * Gracefully accepting constructive criticism
20 | * Focusing on what is best for the community
21 | * Showing empathy towards other community members
22 | 
23 | Examples of unacceptable behavior by participants include:
24 | 
25 | * The use of sexualized language or imagery and unwelcome sexual attention or
26 |  advances
27 | * Trolling, insulting/derogatory comments, and personal or political attacks
28 | * Public or private harassment
29 | * Publishing others' private information, such as a physical or electronic
30 |  address, without explicit permission
31 | * Other conduct which could reasonably be considered inappropriate in a
32 |  professional setting
33 | 
34 | ## Our Responsibilities
35 | 
36 | Project maintainers are responsible for clarifying the standards of acceptable
37 | behavior and are expected to take appropriate and fair corrective action in
38 | response to any instances of unacceptable behavior.
39 | 
40 | Project maintainers have the right and responsibility to remove, edit, or
41 | reject comments, commits, code, wiki edits, issues, and other contributions
42 | that are not aligned to this Code of Conduct, or to ban temporarily or
43 | permanently any contributor for other behaviors that they deem inappropriate,
44 | threatening, offensive, or harmful.
45 | 
46 | ## Scope
47 | 
48 | This Code of Conduct applies both within project spaces and in public spaces
49 | when an individual is representing the project or its community. Examples of
50 | representing a project or community include using an official project e-mail
51 | address, posting via an official social media account, or acting as an appointed
52 | representative at an online or offline event. Representation of a project may be
53 | further defined and clarified by project maintainers.
54 | 
55 | ## Enforcement
56 | 
57 | Instances of abusive, harassing, or otherwise unacceptable behavior may be
58 | reported by contacting the project team at f.muench@leibniz-hbi.de. All
59 | complaints will be reviewed and investigated and will result in a response that
60 | is deemed necessary and appropriate to the circumstances. The project team is
61 | obligated to maintain confidentiality with regard to the reporter of an incident.
62 | Further details of specific enforcement policies may be posted separately.
63 | 
64 | Project maintainers who do not follow or enforce the Code of Conduct in good
65 | faith may face temporary or permanent repercussions as determined by other
66 | members of the project's leadership.
67 | 
68 | ## Attribution
69 | 
70 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
71 | available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
72 | 
73 | [homepage]: https://www.contributor-covenant.org
74 | 
75 | For answers to common questions about this code of conduct, see
76 | https://www.contributor-covenant.org/faq
77 | 


--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
 1 | # Contributor's Guidelines
 2 | 
 3 | Contributions are possible in the form of Issues or Pull Requests.
 4 | 
 5 | For Issues, please:
 6 | 
 7 | 1. Read the documentation and/or the README.
 8 | 2. Search for your problem in the issues section first.
 9 | 3. If you cannot find a solution/answer yourself with these resources, raise an issue.
10 | 
11 | For PRs:
12 | 
13 | 1. Every PR must contain a passing unit test, or an adaptation of existing tests, that tests the proposed changes. All existing tests must pass. (If you are unfamiliar with Test Driven Development (TDD), read [this](https://code.tutsplus.com/tutorials/beginning-test-driven-development-in-python--net-30137) and maybe the first chapters of [this](https://www.oreilly.com/library/view/test-driven-development-with/9781449365141/)).
14 | 2. Keep your commits small.
15 | 3. Document your code, preferably with Google style docstrings (https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html),
16 |    or at least with intuitive comments.
17 | 4. Keep your code as readable as possible (https://docs.python-guide.org/writing/style/).
18 | 5. Keep your code compliant with flake8 (http://flake8.pycqa.org/en/latest/index.html#).
19 | 
20 | By submitting a pull request to this repository, you agree to license your contribution under the MIT license of this project.
21 | 
22 | By contributing you agree to follow the [Code of Conduct of this project](CODE_OF_CONDUCT.md). The project owner(s) reserve the right to exclude or ban contributors from this project, in particular (but not only) if they violate the CoC.
23 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2018 Felix Victor Münch
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/Pipfile:
--------------------------------------------------------------------------------
 1 | [[source]]
 2 | url = "https://pypi.org/simple"
 3 | verify_ssl = true
 4 | name = "pypi"
 5 | 
 6 | [packages]
 7 | pandas = "*"
 8 | tweepy = "<4"
 9 | pyyaml = "<6"
10 | sqlalchemy = "<2"
11 | pymysql = "*"
12 | argparse = "*"
13 | urllib3 = ">=1.26.5"
14 | 
15 | [dev-packages]
16 | "flake8" = "*"
17 | pytest = "*"
18 | isort = "*"
19 | ipython = ">=8.10"
20 | "autopep8" = "*"
21 | pydocstyle = "*"
22 | 
23 | [requires]
24 | python_version = ">=3.8"
25 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | ![LOGO](https://upload.wikimedia.org/wikipedia/commons/thumb/4/48/Radishes.svg/173px-Radishes.svg.png)
  2 | 
  3 | # RADICES
  4 | 
  5 | This software prototype creates an explorative sample of core accounts in (optionally language-based) Twitter follow networks.
  6 | 
  7 | If you use this for your research please cite [the article](https://journals.sagepub.com/doi/full/10.1177/2056305120984475) and/or [cite the software itself](https://doi.org/10.6084/m9.figshare.8864777).
  8 | 
  9 | ## Why is this useful and how does this work?
 10 | 
 11 | In this journal article we explain the underlying method and draw a map of the German Twittersphere:
 12 | 
 13 | https://journals.sagepub.com/doi/full/10.1177/2056305120984475
 14 | 
 15 | This talk sums up the article:
 16 | 
 17 | https://youtu.be/qsnGTl8d3qU?t=21823
 18 | 
 19 | A short usage demo that was prepared for the ICA conference in 2020 can be found here:
 20 | 
 21 | https://www.youtube.com/watch?v=i_p-tjvmrR4
 22 | 
 23 | (**PLEASE NOTE:** The language specification is not working as it did for our paper due to changes in the Twitter API. Now it uses the language of the last tweet (or optionally the last 200 tweets with a threshold fraction defined by you to avoid false positives) by a user as determined by Twitter instead of the interface language. This might lead to different results.)
 24 | 
 25 | A large-scale test of the 'bootstrapping' feature regarding the preservation of k-coreness-ranking of the sampled accounts is presented here:
 26 | 
 27 | https://youtu.be/sV8Giaj9UwI (video for IC2S2 2020)
 28 | 
 29 | A test of the sampling method across language communities (Italian and German) can be watched here:
 30 | 
 31 | https://www.youtube.com/watch?v=dhXRO2d1Eno (video for AoIR 2020)
 32 | 
 33 | Please feel free to open an issue or comment if you have any questions.
 34 | 
 35 | Moreover, if you find any bugs, you are invited to report them as an [Issue](https://github.com/FlxVctr/SparseTwitter/issues).
 36 | 
 37 | Before contributing/raising an issue, please read the [Contributor Guidelines](CONTRIBUTING.md).
 38 | 
 39 | ## Installation & Usage
 40 | 1. [Create a Twitter Developer app](https://developer.twitter.com/en/docs/basics/getting-started)
 41 | 2. Set up your virtual environment with [pipenv](https://pipenv.readthedocs.io/en/latest/) [(see here)](#Create-Virtual-Environment-with-Pipenv)
 42 | 3. Have users authorise your app (the more the better - at least one) [(see here)](#authorise-app--get-tokens)
 43 | 4. [Set up a mysql Database locally or online](https://dev.mysql.com/doc/mysql-getting-started/en/).
 44 | 5. Fill out config.yml according to your requirements [(see here)](#configuration-configyml)
 45 | 6. Fill out the seeds_template with your starting seeds or use the given ones [(see here)](#Indicate-starting-seeds-for-the-walkers)
 46 | 7. [Start software](#Start), be happy
 47 | 8. (Develop the app further - [run tests](#Testing))
 48 | 
 49 | ### Create Virtual Environment with Pipenv
 50 | We recommend installing [pipenv](https://pipenv.readthedocs.io/en/latest) (including the installation of pyenv) to create a virtual environment with all the required packages in the respective versions.
 51 | After installing pipenv, navigate to the project directory and run:
 52 | 
 53 | ```
 54 | pipenv install
 55 | ```
 56 | This creates a virtual environment and installs the packages specified in the Pipfile.
 57 | 
 58 | Run
 59 | ```
 60 | pipenv shell
 61 | ```
 62 | to start a shell in the virtual environment.
 63 | 
 64 | ### Authorise App & Get Tokens
 65 | This app is based on a [Twitter Developer](https://developer.twitter.com/) app. To use it you have to first create a Twitter app.
 66 | Once you have done that, paste your Consumer API Key and Secret into a `keys.json` file, which you can create by copying `empty_keys.json` (do not delete or change `empty_keys.json` itself if you want to use the developer tests).
 67 | You are now ready to have users authorise your app so that it gets more API calls. To do so, run
 68 | ```
 69 | python twauth.py
 70 | ```
 71 | This will open a link to Twitter that requires you (or someone else) to log in with their Twitter account. Once logged in, a 6-digit authorisation key will be shown on the screen. This key has to be entered into the console window where `twauth.py` is still running. Once the key has been entered, a new token will be added to the `tokens.csv` file. For this software to run, the app has to be authorised by at least one Twitter user.
 72 | 
 73 | ### Configuration (config.yml)
 74 | After setting up your mysql database, copy `config_template.yml` to a file named `config.yml` and enter the database information. Do not change the dbtype argument since at the moment, only mySQL databases are supported.
 75 | Note that the password field is required (this also means that your database has to be password-protected). If no password is given (even if none is needed for the database), the app will raise an Exception.
 76 | 
 77 | You can also indicate which Twitter user account details you want to collect. Those will be stored in a database table called `user_details`. At the moment, the software has to collect at least the account id, follower count, account creation time and account tweets count, and you have to activate those by uncommenting them in the config. If you wish to collect more user details, just enter the mysql type after the colon (":") of the respective user detail in the list. The suggested type is already indicated in the comment in the respective line. Note, however, that collecting huge amounts of data has not been tested with all the user details being collected, so we do not guarantee that the code works with them. Moreover, due to Twitter API changes, some of the user details may become private properties and thus no longer be collectable through the API.
 78 | 
 79 | If you have a mailgun account, you can also add your details at the bottom of the `config.yml`. If you do so, you will receive an email when the software encounters an error.
 80 | 
 81 | ### Indicate starting seeds for the walkers
 82 | The algorithm needs seeds (i.e. Twitter Account IDs) to draw randomly from when initialising the walkers or when a walker reaches an impasse. These seeds have to be specified in `seeds.csv`, with one Twitter account ID per line. Feel free to use `seeds_template.csv` (and rename it to `seeds.csv`) to replace the existing seeds, which are 200 randomly drawn accounts from the TrISMA dataset (Bruns, Moon, Münch & Sadkowsky, 2017) that use German as interface language.
 83 | 
 84 | Note that `seeds.csv` has to contain at least as many account IDs as there are walkers running in parallel. We suggest using at least 100 seeds, the more the better (we used 15,000,000). However, since a recent update, the algorithm can gather ('bootstrap') its own seeds, so there is no need to provide a comprehensive seed list. This changes the quality of the sample (whether for the worse or the better is the subject of ongoing research), but it makes the software a very powerful exploratory tool.
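
As a rough illustration of the seed requirement described above, the following sketch reads one account ID per line and draws one starting seed per walker. It is not taken from RADICES itself: the helper names `read_seed_pool` and `draw_seeds` are hypothetical, the inline text stands in for a real `seeds.csv`, and apart from the seed used in the Testing section the IDs are placeholders.

```python
import csv
import io
import random


def read_seed_pool(csv_text):
    """Parse one Twitter account ID per line (hypothetical helper)."""
    reader = csv.reader(io.StringIO(csv_text))
    return [int(row[0]) for row in reader if row and row[0].strip()]


def draw_seeds(seed_pool, number_of_walkers, rng=random):
    """Draw one distinct starting seed per walker (hypothetical helper)."""
    if len(seed_pool) < number_of_walkers:
        raise ValueError(
            f"{number_of_walkers} walkers need at least {number_of_walkers} "
            f"seeds, but the pool only holds {len(seed_pool)}."
        )
    return rng.sample(seed_pool, number_of_walkers)


# Stand-in for the contents of seeds.csv: one account ID per line.
# 12345 and 67890 are placeholder IDs.
example_csv = "1670174994\n12345\n67890\n"

pool = read_seed_pool(example_csv)
seeds = draw_seeds(pool, number_of_walkers=2, rng=random.Random(0))
print(seeds)
```

Raising an error when the pool is too small mirrors the requirement that there must be at least as many seeds as parallel walkers; how RADICES itself reacts in that situation is not shown here.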
 85 | 
 86 | ## Start
 87 | 
 88 | **PLEASE NOTE:** The language specification is not working as it did for our paper due to changes in the Twitter API. Now it uses the language of the last tweet(s) by a user as determined by Twitter instead of the interface language. This might lead to different results from our paper (even though the macrostructures of a certain network should remain very similar).
 89 | 
 90 | Run (while you are in the pipenv virtual environment)
 91 | ```
 92 | python start.py -n 2 -l de it -lt 0.05 -p 1 -k "keyword1" "keyword2" "keyword3"
 93 | ```
 94 | where
 95 | 
 96 | * -n takes the number of seeds to be drawn from the seed pool,
 97 | * -l sets the [status languages](https://developer.twitter.com/en/docs/developer-utilities/supported-languages/api-reference/get-help-languages) you are interested in, matched against the language of an account's latest tweet(s),
 98 | * -lt defines the fraction of a user's last 200 tweets that has to be detected as being in the requested languages (might slow down collection),
 99 | * -k can be used to only follow paths to seeds who used the defined keywords in their last 200 tweets (keywords are interpreted as [regexes](https://docs.python.org/3/howto/regex.html), ignoring case),
100 | * and -p sets the number of pages to look at when identifying the next node. For an explanation of advanced usage and more features (like 'bootstrapping', an approach reminiscent of snowballing, to grow the seed pool), use
101 | 
102 | ```
103 | python start.py --help
104 | ```
105 | which will show a help dialogue with explanations and default values. Please raise an issue if those should not be clear enough.
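
To make the `-lt` and `-k` options more concrete, here is a minimal sketch of the two checks on a handful of invented tweets. It is an illustration under stated assumptions, not the code RADICES runs. The helper names are hypothetical, the field names `lang` and `full_text` follow `get_latest_tweets` in `collector.py`, and the sketch assumes that the shares of all requested languages are summed before being compared with the threshold.

```python
import re
from collections import Counter


def language_fractions(tweets):
    """Share of each language code among the given tweets."""
    counts = Counter(tweet["lang"] for tweet in tweets)
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}


def passes_language_threshold(tweets, languages, threshold):
    """Assumption: the shares of all requested languages are summed."""
    fractions = language_fractions(tweets)
    share = sum(fractions.get(lang, 0.0) for lang in languages)
    return share >= threshold


def matches_keywords(tweets, keywords):
    """True if any keyword regex matches any tweet text, ignoring case."""
    patterns = [re.compile(keyword, re.IGNORECASE) for keyword in keywords]
    return any(p.search(t["full_text"]) for t in tweets for p in patterns)


# Invented example tweets, reduced to the two fields used here.
tweets = [
    {"lang": "de", "full_text": "Guten Morgen, Hamburg!"},
    {"lang": "en", "full_text": "Good morning"},
    {"lang": "en", "full_text": "Coffee first"},
    {"lang": "it", "full_text": "Buongiorno"},
]

print(language_fractions(tweets))
print(passes_language_threshold(tweets, ["de", "it"], 0.05))
print(matches_keywords(tweets, ["hamb.rg"]))
```

A low threshold such as 0.05 lets through accounts that tweet only occasionally in the requested languages, while a higher value reduces false positives at the price of excluding multilingual accounts.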
106 | 
107 | Note:
108 | - If the program freezes after saying "Starting x Collectors", it is likely that either your keys.json or your tokens.csv contains wrong information. We work on a solution that is more user-friendly!
109 | - If you get an error saying "lookup_users() got an unexpected keyword argument", you likely have the wrong version of tweepy installed. Either update your tweepy package or use pipenv to create a virtual environment and install all the packages you need.
110 | - If at some point an error is encountered: There is a -r (restart with latest seeds) option to resume collection after interrupting the crawler with `control-c`. This is also handy in case you need to reboot your machine. **Note that you will still have to define the other parameters as you did when you started the collection the first time.**
111 | 
112 | ## Analysis (with Gephi)
113 | 
114 | It is possible to import the data into tools like Gephi via a MySQL connector. However, Gephi apparently supports only MySQL 5 at the time of writing.
115 | 
116 | To do so, it is helpful to use [`create_node_view.sql`](https://github.com/FlxVctr/RADICES/blob/master/create_node_view.sql) and [`create_dense_result.sql`](https://github.com/FlxVctr/RADICES/blob/master/create_dense_result.sql) to create views for Gephi to import.
117 | 
118 | Then you can import the results, in the case of Gephi via the menu item **File -> Import Database -> Edge List**, using your database credentials and
119 | 
120 | * `SELECT * FROM nodes` as the "Node Query"
121 | * `SELECT * FROM result` as the "Edge Query" if you want to analyse the walked edges only (as done in the German Twittersphere paper)
122 | * `SELECT * FROM dense_result` as the "Edge Query" if you want to analyse all edges between collected accounts (which will be a much denser network)
123 | 
124 | Other tables created by RADICES in the database that might be interesting for analysis are:
125 | 
126 | * **result**: edge list (columns: source,target) containing the Twitter IDs of walked accounts
127 | * **friends**: cache of collected follow connections, up to p * 5000 connections per walked account (might contain connections to accounts which do not fulfill language or keyword criteria)
128 | * **user_details**: user details cache, as defined in `config.yml`, of all accounts in **result** and **friends** (might contain data that has not been deleted from accounts which do not fulfill language or keyword criteria)
129 | 
130 | Other tables contain only data that is necessary for internal functions.
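
The difference between the walked edges in **result** and the denser network in **dense_result** can be sketched in a few lines. This is one plausible reading of the description above, not the definition used by the project, which lives in `create_dense_result.sql`. The sketch assumes that both **result** and **friends** can be read as (source, target) pairs; the helper names and all IDs are invented.

```python
def collected_accounts(walked_edges):
    """All accounts that appear as source or target of a walked edge."""
    return {account for edge in walked_edges for account in edge}


def dense_edges(walked_edges, cached_follow_edges):
    """Cached follow edges whose two endpoints were both collected."""
    nodes = collected_accounts(walked_edges)
    return sorted(
        (source, target)
        for source, target in set(cached_follow_edges)
        if source in nodes and target in nodes
    )


# Invented IDs: the walk went 1 -> 2 -> 3.
walked = [(1, 2), (2, 3)]

# Invented cache: it also holds a link back (3 -> 1) and a link to an
# account (99) that was never walked to.
friends_cache = [(1, 2), (2, 3), (3, 1), (2, 99)]

print(dense_edges(walked, friends_cache))
```

In this toy example the dense network gains the edge from 3 to 1, while the edge to account 99 is dropped because that account is not part of the sample.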
131 | 
132 | ## Testing
133 | 
134 | The tests are meant for development purposes. Note that you still need a functional (i.e. filled out) `keys.json` and tokens in `tokens.csv` to work with.
135 | Moreover, for some tests to run through, some user details json files are needed. They have to be stored in `tests/tweet_jsons/` and can be downloaded by running
136 | ```
137 | python make_test_tweet_jsons.py -s 1670174994
138 | ```
139 | where -s stands for the seed to use and can be replaced by any Twitter seed of your choice.
140 | Note: the name `tweet_jsons` is misleading, since the json files actually contain information about specific users (friends of the given seed). This will be changed in a later version.
141 | 
142 | ### passwords.py
143 | Before testing, please enter the password of the sparsetwitter mySQL user into `passwords_template.py`. Then rename the file to `passwords.py`. If you would like to use (and test) mailgun notifications, please enter the relevant information there as well.
144 | 
145 | ### Local mysql database
146 | Some of the tests try to connect to a local mySQL database using the user "sparsetwitter@localhost". For these tests to run properly it is required that a mySQL server actually runs on the device and that a user 'sparsetwitter'@'localhost' with relevant permissions exists.
147 | 
148 | Please refer to the [mySQL documentation](https://dev.mysql.com/doc/mysql-installation-excerpt/5.5/en/installing.html) on how to install mySQL on your system. If your mySQL server is up and running, the following command will create the user 'sparsetwitter' and will give it full permissions (replace "<your password>" with a password):
149 | 
150 | ```
151 | CREATE USER 'sparsetwitter'@'localhost' IDENTIFIED BY '<your password>'; GRANT ALL ON *.* TO 'sparsetwitter'@'localhost' WITH GRANT OPTION;
152 | ```
153 | 
154 | ### Tests that will fail
155 | For the functional tests, the test `FirstUseTest.test_restarts_after_exception` will fail if you did not provide (or did not provide valid) Mailgun credentials. Also one unit-test will fail in this case.
156 | 
157 | ### Running the tests
158 | To run the tests, just type
159 | 
160 | ```
161 | python functional_test.py
162 | ```
163 | 
164 | and / or
165 | 
166 | ```
167 | python tests/tests.py -s
168 | ```
169 | The -s parameter is for skipping API call-draining tests. Note that even if -s is set, the tests can take very long to run if only few API tokens are given in the tokens.csv. The whole software relies on a sufficiently high number of tokens. We used 15.
170 | 
171 | ## Disclaimer
172 | By submitting a pull request to this repository, you agree to license your contribution under the MIT license (as this project is).
173 | 
174 | The "Logo" above is from https://commons.wikimedia.org/wiki/File:Radishes.svg and licensed as being in the public domain ([CC0](https://creativecommons.org/publicdomain/zero/1.0/deed.en)).
175 | 
176 | 


--------------------------------------------------------------------------------
/collector.py:
--------------------------------------------------------------------------------
   1 | import multiprocessing.dummy as mp
   2 | import time
   3 | from exceptions import TestException
   4 | from functools import wraps
   5 | from sys import stdout, stderr
   6 | 
   7 | import numpy as np
   8 | import pandas as pd
   9 | import tweepy
  10 | from sqlalchemy.exc import IntegrityError, ProgrammingError
  11 | 
  12 | from database_handler import DataBaseHandler
  13 | from helpers import friends_details_dtypes
  14 | from setup import FileImport
  15 | 
  16 | # mp.set_start_method('spawn')
  17 | 
  18 | 
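   19 | def get_latest_tweets(user_id, connection, fields=['lang', 'full_text']):
   20 |     """Return the requested fields of a user's latest tweets (up to 200) as a DataFrame."""
   21 |     statuses = connection.api.user_timeline(user_id=user_id, count=200, tweet_mode='extended')
   22 | 
   23 |     # DataFrame.append was removed in pandas 2.0 (the Pipfile does not pin pandas),
   24 |     # so collect the rows first and build the frame in one go.
   25 |     rows = [{field: getattr(status, field) for field in fields}
   26 |             for status in statuses]
   27 |     result = pd.DataFrame(rows, columns=fields)
   28 | 
   29 |     return result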
  30 | 
  31 | 
  32 | def get_fraction_of_tweets_in_language(tweets):
  33 |     """Returns fraction of languages in a tweet dataframe as a dictionary
  34 | 
  35 |     Args:
  36 |         tweets (pandas.DataFrame): Tweet DataFrame as returned by `get_latest_tweets`
  37 |     Returns:
  38 |         language_fractions (dict): {languagecode (str): fraction (float)}
  39 |     """
  40 | 
  41 |     language_fractions = tweets['lang'].value_counts(normalize=True)
  42 | 
  43 |     language_fractions = language_fractions.to_dict()
  44 | 
  45 |     return language_fractions
  46 | 
  47 | 
  48 | # TODO: there might be a better way to drop columns that we don't want than flatten everything
  49 | # and removing the columns thereafter.
  50 | def flatten_json(y: dict, columns: list, sep: str = "_",
  51 |                  nonetype: dict = {'date': None, 'num': None, 'str': None, 'bool': None}):
  52 |     '''
  53 |     Flattens nested dictionaries.
  54 |     adapted from: https://medium.com/@amirziai/flattening-json-objects-in-python-f5343c794b10
   55 |     Args:
   56 |         y (dict): Nested dictionary to be flattened.
   57 |         columns (list of str): Dictionary keys that should not be flattened.
   58 |         sep (str): Separator for new dictionary keys of nested structures.
   59 |         nonetype (dict): Replacement values for None, keyed by 'date', 'num', 'str' and 'bool'.
  60 |     '''
  61 | 
  62 |     out = {}
  63 | 
  64 |     def flatten(x, name=''):
  65 |         if type(x) is dict and str(name[:-1]) not in columns:  # don't flatten nested fields
  66 |             for a in x:
  67 |                 flatten(x[a], name + a + sep)
  68 |         elif type(x) is list and str(name[:-1]) not in columns:  # same
  69 |             i = 0
  70 |             for a in x:
  71 |                 flatten(a, name + str(i) + sep)
  72 |                 i += 1
  73 |         elif type(x) is list and str(name[:-1]) in columns:
  74 |             out[str(name[:-1])] = str(x)  # Must be str so that nested lists are written to db
  75 |         elif type(x) is dict and str(name[:-1]) in columns:
  76 |             out[str(name[:-1])] = str(x)  # Same here
  77 |         elif type(x) is bool and str(name[:-1]) in columns:
  78 |             out[str(name[:-1])] = int(x)  # Same here
  79 |         elif x is None and str(name[:-1]) in columns:
  80 |             if friends_details_dtypes[str(name[:-1])] == np.datetime64:
  81 |                 out[str(name[:-1])] = nonetype["date"]
  82 |             elif friends_details_dtypes[str(name[:-1])] == np.int64:
  83 |                 out[str(name[:-1])] = nonetype["num"]
  84 |             elif friends_details_dtypes[str(name[:-1])] == str:
  85 |                 out[str(name[:-1])] = nonetype["str"]
  86 |             elif friends_details_dtypes[str(name[:-1])] == np.int8:
  87 |                 out[str(name[:-1])] = nonetype["bool"]
  88 |             else:
   89 |                 raise NotImplementedError("twitter user_detail does not have a supported "
   90 |                                           "corresponding data type")
  91 |         else:
  92 |             out[str(name[:-1])] = x
  93 | 
  94 |     flatten(y)
  95 |     return out
  96 | 
  97 | 
   98 | # Decorator for re-executing a function up to x times (with exponentially
   99 | # growing waiting times between attempts)
 100 | def retry_x_times(x):
 101 |     def retry_decorator(func):
 102 | 
 103 |         @wraps(func)
 104 |         def func_wrapper(*args, **kwargs):
 105 | 
 106 |             try:
 107 |                 if kwargs['fail'] is True:
 108 |                     # if we're testing fails:
 109 |                     return func(*args, **kwargs)
 110 |             except KeyError:
 111 |                 try:
 112 |                     if kwargs['test_fail'] is True:
 113 |                         return func(*args, **kwargs)
 114 |                 except KeyError:
 115 |                     pass
 116 | 
 117 |             i = 0
 118 |             if 'restart' in kwargs:
 119 |                 restart = kwargs['restart']
 120 | 
 121 |             if 'retries' in kwargs:
 122 |                 retries = kwargs['retries']
 123 |             else:
 124 |                 retries = x
 125 | 
 126 |             for i in range(retries - 1):
 127 |                 try:
 128 |                     if 'restart' in kwargs:
 129 |                         kwargs['restart'] = restart
 130 |                     return func(*args, **kwargs)
 131 |                 except Exception as e:
 132 |                     restart = True
 133 |                     waiting_time = 2**i
 134 |                     stdout.write(f"Encountered exception in {func.__name__}{args, kwargs}.\n{e}\n")
 135 |                     stdout.write(f"Retrying in {waiting_time} seconds.\n")
 136 |                     stdout.flush()
 137 |                     time.sleep(waiting_time)
 138 |                 i += 1
 139 | 
 140 |             return func(*args, **kwargs)
 141 | 
 142 |         return func_wrapper
 143 | 
 144 |     return retry_decorator
 145 | 
 146 | 
 147 | class MyProcess(mp.Process):
 148 |     def run(self):
 149 |         try:
 150 |             mp.Process.run(self)
 151 |         except Exception as err:
 152 |             self.err = err
 153 |             raise self.err
 154 |         else:
 155 |             self.err = None
 156 | 
 157 | 
 158 | class Connection(object):
 159 |     """Class that handles the connection to Twitter
 160 | 
 161 |     Attributes:
 162 |         token_file_name (str): Path to file with user tokens
 163 |     """
 164 | 
 165 |     def __init__(self, token_file_name="tokens.csv", token_queue=None):
 166 |         self.credentials = FileImport().read_app_key_file()
 167 | 
 168 |         self.ctoken = self.credentials[0]
 169 |         self.csecret = self.credentials[1]
 170 | 
 171 |         if token_queue is None:
 172 |             self.tokens = FileImport().read_token_file(token_file_name)
 173 | 
 174 |             self.token_queue = mp.Queue()
 175 | 
 176 |             for token, secret in self.tokens.values:
 177 |                 self.token_queue.put((token, secret, {}, {}))
 178 |         else:
 179 |             self.token_queue = token_queue
 180 | 
 181 |         self.token, self.secret, self.reset_time_dict, self.calls_dict = self.token_queue.get()
 182 |         self.auth = tweepy.OAuthHandler(self.ctoken, self.csecret)
 183 |         self.auth.set_access_token(self.token, self.secret)
 184 |         self.api = tweepy.API(self.auth, wait_on_rate_limit=False, wait_on_rate_limit_notify=False)
 185 | 
 186 |     def next_token(self):
 187 | 
 188 |         self.token_queue.put((self.token, self.secret, self.reset_time_dict, self.calls_dict))
 189 | 
 190 |         (self.token, self.secret,
 191 |          self.reset_time_dict, self.calls_dict) = self.token_queue.get()
 192 | 
 193 |         self.auth = tweepy.OAuthHandler(self.ctoken, self.csecret)
 194 |         self.auth.set_access_token(self.token, self.secret)
 195 | 
 196 |         self.api = tweepy.API(self.auth)
 197 | 
 198 |     def remaining_calls(self, endpoint='/friends/ids'):
 199 |         """Returns the number of remaining calls until reset time.
 200 | 
 201 |         Args:
 202 |             endpoint (str):
 203 |                 API endpoint.
 204 |                 Defaults to '/friends/ids'
 205 |         Returns:
 206 |             remaining calls (int)
 207 |         """
 208 | 
 209 |         rate_limits = self.api.rate_limit_status()
 210 | 
 211 |         path = endpoint.split('/')
 212 | 
 213 |         path = path[1:]
 214 | 
 215 |         rate_limits = rate_limits['resources'][path[0]]
 216 | 
 217 |         key = "/" + path[0]
 218 | 
 219 |         for item in path[1:]:
 220 |             key = key + '/' + item
 221 |             rate_limits = rate_limits[key]
 222 | 
 223 |         rate_limits = rate_limits['remaining']
 224 | 
 225 |         return rate_limits
 226 | 
 227 |     def reset_time(self, endpoint='/friends/ids'):
 228 |         """Returns the number of seconds until the rate limit resets.
 229 | 
 230 |         Args:
 231 |             endpoint (str):
 232 |                 API endpoint.
 233 |                 Defaults to '/friends/ids'
 234 |         Returns:
 235 |             remaining time in seconds (int)
 236 |         """
 237 | 
 238 |         reset_time = self.api.rate_limit_status()
 239 | 
 240 |         path = endpoint.split('/')
 241 | 
 242 |         path = path[1:]
 243 | 
 244 |         reset_time = reset_time['resources'][path[0]]
 245 | 
 246 |         key = "/" + path[0]
 247 | 
 248 |         for item in path[1:]:
 249 |             key = key + '/' + item
 250 |             reset_time = reset_time[key]
 251 | 
 252 |         reset_time = reset_time['reset'] - int(time.time())
 253 | 
 254 |         return reset_time
 255 | 
 256 | 
 257 | class Collector(object):
 258 |     """Does the collecting of friends.
 259 | 
 260 |     Attributes:
 261 |         connection (Connection object):
 262 |             Connection object with currently active credentials
 263 |         seed (int): Twitter id of seed user
 264 |     """
 265 | 
 266 |     def __init__(self, connection, seed, following_pages_limit=0):
 267 |         self.seed = seed
 268 |         self.connection = connection
 269 | 
 270 |         self.token_blacklist = {}
 271 |         self.following_pages_limit = following_pages_limit
 272 | 
 273 |     class Decorators(object):
 274 | 
 275 |         @staticmethod
 276 |         def retry_with_next_token_on_rate_limit_error(func):
 277 |             def wrapper(*args, **kwargs):
 278 |                 collector = args[0]
 279 |                 while True:
 280 |                     old_token = collector.connection.token
 281 |                     try:
 282 |                         try:
 283 |                             if kwargs['force_retry_token'] is True:
 284 |                                 print('Forced retry with token.')
 285 |                                 return func(*args, **kwargs)
 286 |                         except KeyError:
 287 |                             pass
 288 |                         try:
 289 |                             if collector.token_blacklist[old_token] <= time.time():
 290 |                                 print(f'Token starting with {old_token[:4]} should work again.')
 291 |                                 return func(*args, **kwargs)
 292 |                             else:
 293 |                                 print(f'Token starting with {old_token[:4]} not ready yet.')
 294 |                                 collector.connection.next_token()
 295 |                                 time.sleep(10)
 296 |                                 continue
 297 |                         except KeyError:
 298 |                             print(f'Token starting with {old_token[:4]} not tried yet. Trying.')
 299 |                             return func(*args, **kwargs)
 300 |                     except tweepy.RateLimitError:
 301 |                         collector.token_blacklist[old_token] = time.time() + 150
 302 |                         print(f'Token starting with {old_token[:4]} hit rate limit.')
 303 |                         print("Retrying with next available token.")
 304 |                         print(f"Blacklisted until {collector.token_blacklist[old_token]}")
 305 |                         collector.connection.next_token()
 306 |                         continue
 307 |                     break
 308 |             return wrapper
 309 | 
 310 |     @Decorators.retry_with_next_token_on_rate_limit_error
 311 |     def check_API_calls_and_update_if_necessary(self, endpoint, check_calls=True):
 312 |         """Checks how many calls are left for an endpoint (optional), gets the reset time
 313 |         and switches to the next token if necessary.
 314 | 
 315 |         If called with check_calls=False,
 316 |         it will assume that the current token's calls for the specified endpoint are depleted
 317 |         and return None for remaining calls.
 318 | 
 319 |         Args:
 320 |             endpoint (str): API endpoint, e.g. '/friends/ids'
 321 |             check_calls (boolean): Default True
 322 |         Returns:
 323 |             if check_calls=True:
 324 |                 remaining_calls (int)
 325 |             else:
 326 |                 None
 327 |         """
 328 | 
 329 |         def try_remaining_calls_except_invalid_token():
 330 |             try:
 331 |                 remaining_calls = self.connection.remaining_calls(endpoint=endpoint)
 332 |             except tweepy.error.TweepError as invalid_error:
 333 |                 if "'code': 89" in invalid_error.reason:
 334 |                     print(f"Token starting with {self.connection.token[:5]} seems to have "
 335 |                           "expired or it has been revoked.")
 336 |                     print(invalid_error)
 337 |                     self.connection.next_token()
 338 |                     remaining_calls = self.connection.remaining_calls(endpoint=endpoint)
 339 |                 else:
 340 |                     raise invalid_error
 341 |             print("REMAINING CALLS FOR {} WITH TOKEN STARTING WITH {}: ".format(
 342 |                 endpoint, self.connection.token[:4]), remaining_calls)
 343 |             return remaining_calls
 344 | 
 345 |         if check_calls is True:
 346 |             self.connection.calls_dict[endpoint] = try_remaining_calls_except_invalid_token()
 347 | 
 348 |             reset_time = self.connection.reset_time(endpoint=endpoint)
 349 | 
 350 |             self.connection.reset_time_dict[endpoint] = time.time() + reset_time
 351 | 
 352 |             while self.connection.calls_dict[endpoint] == 0:
 353 |                 stdout.write("Attempt with next available token.\n")
 354 | 
 355 |                 self.connection.next_token()
 356 | 
 357 |                 try:
 358 |                     next_reset_at = self.connection.reset_time_dict[endpoint]
 359 |                     if time.time() >= next_reset_at:
 360 |                         self.connection.calls_dict[endpoint] = \
 361 |                             self.connection.remaining_calls(endpoint=endpoint)
 362 |                     else:
 363 |                         time.sleep(10)
 364 |                         continue
 365 |                 except KeyError:
 366 |                     self.connection.calls_dict[endpoint] = \
 367 |                         try_remaining_calls_except_invalid_token()
 368 |                     reset_time = self.connection.reset_time(endpoint=endpoint)
 369 |                     self.connection.reset_time_dict[endpoint] = time.time() + reset_time
 370 | 
 371 |                 print("REMAINING CALLS FOR {} WITH TOKEN STARTING WITH {}: ".format(
 372 |                     endpoint, self.connection.token[:4]), self.connection.calls_dict[endpoint])
 373 |                 print(f"{time.strftime('%c')}: new reset of token {self.connection.token[:4]} for \
 374 | {endpoint} in {int(self.connection.reset_time_dict[endpoint] - time.time())} seconds.")
 375 | 
 376 |             return self.connection.calls_dict[endpoint]
 377 | 
 378 |         else:
 379 |             self.connection.calls_dict[endpoint] = 0
 380 | 
 381 |             if endpoint not in self.connection.reset_time_dict \
 382 |                or self.connection.reset_time_dict[endpoint] <= time.time():
 383 |                 reset_time = self.connection.reset_time(endpoint=endpoint)
 384 |                 self.connection.reset_time_dict[endpoint] = time.time() + reset_time
 385 |                 print("REMAINING CALLS FOR {} WITH TOKEN STARTING WITH {}: ".format(
 386 |                     endpoint, self.connection.token[:4]), self.connection.calls_dict[endpoint])
 387 |                 print(f"{time.strftime('%c')}: new reset of token {self.connection.token[:4]} for \
 388 | {endpoint} in {int(self.connection.reset_time_dict[endpoint] - time.time())} seconds.")
 389 | 
 390 |             while (endpoint in self.connection.reset_time_dict and
 391 |                    self.connection.reset_time_dict[endpoint] >= time.time() and
 392 |                    self.connection.calls_dict[endpoint] == 0):
 393 |                 self.connection.next_token()
 394 |                 time.sleep(1)
 395 | 
 396 |             return None
 397 | 
 398 |     def get_friend_list(self, twitter_id=None, follower=False):
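 399 |         """Gets the friend (or follower) ids of an account.
 400 | 
 401 |         Args:
 402 |             twitter_id (int): Twitter id of the account; defaults to the seed of this Collector.
 403 |             follower (bool): if True, collect follower ids instead of friend ids.
 404 | 
 405 |         Returns:
 406 |             list of int: user ids, limited to `following_pages_limit` pages if that is nonzero.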
 407 |         """
 408 | 
 409 |         if twitter_id is None:
 410 |             twitter_id = self.seed
 411 | 
 412 |         result = []
 413 | 
 414 |         cursor = -1
 415 |         following_page = 0
 416 |         while self.following_pages_limit == 0 or following_page < self.following_pages_limit:
 417 |             while True:
 418 |                 try:
 419 |                     if follower is False:
 420 |                         page = self.connection.api.friends_ids(user_id=twitter_id, cursor=cursor)
 421 |                         self.connection.calls_dict['/friends/ids'] = 1
 422 |                     else:
 423 |                         page = self.connection.api.followers_ids(user_id=twitter_id, cursor=cursor)
 424 |                         self.connection.calls_dict['/followers/ids'] = 1
 425 |                     break
 426 |                 except tweepy.RateLimitError:
 427 |                     if follower is False:
 428 |                         self.check_API_calls_and_update_if_necessary(endpoint='/friends/ids',
 429 |                                                                      check_calls=False)
 430 |                     else:
 431 |                         self.check_API_calls_and_update_if_necessary(endpoint='/followers/ids',
 432 |                                                                      check_calls=False)
 433 | 
 434 |             if len(page[0]) > 0:
 435 |                 result += page[0]
 436 |             else:
 437 |                 break
 438 |             cursor = page[1][1]
 439 | 
 440 |             following_page += 1
 441 | 
 442 |         return result
 443 | 
 444 |     def get_details(self, friends):
 445 |         """Collects details from friends of an account.
 446 | 
 447 |         Args:
 448 |             friends (list of int): list of Twitter user ids
 449 | 
 450 |         Returns:
 451 |             list of Tweepy user objects
 452 |         """
 453 | 
 454 |         i = 0
 455 | 
 456 |         user_details = []
 457 | 
 458 |         while i < len(friends):
 459 | 
 460 |             if i + 100 <= len(friends):
 461 |                 j = i + 100
 462 |             else:
 463 |                 j = len(friends)
 464 | 
 465 |             while True:
 466 |                 try:
 467 |                     try:
 468 |                         user_details += self.connection.api.lookup_users(user_ids=friends[i:j],
 469 |                                                                          tweet_mode='extended')
 470 |                     except tweepy.error.TweepError as e:
 471 |                         if "No user matches for specified terms." in e.reason:
 472 |                             stdout.write(f"No user matches for {friends[i:j]}\n")
 473 |                             stdout.flush()
 474 |                         else:
 475 |                             raise e
 476 |                     self.connection.calls_dict['/users/lookup'] = 1
 477 |                     break
 478 |                 except tweepy.RateLimitError:
 479 |                     self.check_API_calls_and_update_if_necessary(endpoint='/users/lookup',
 480 |                                                                  check_calls=False)
 481 | 
 482 |             i += 100
 483 | 
 484 |         return user_details
 485 | 
 486 |     @staticmethod
 487 |     def make_friend_df(friends_details, select=["id", "followers_count", "status_lang",
 488 |                                                 "created_at", "statuses_count"],
 489 |                        provide_jsons: bool = False, replace_nonetype: bool = True,
 490 |                        nonetype: dict = {'date': '1970-01-01',
 491 |                                          'num': -1,
 492 |                                          'str': '-1',
 493 |                                          'bool': -1}):
 494 |         """Transforms list of user details to pandas.DataFrame
 495 | 
 496 |         Args:
 497 |             friends_details (list of Tweepy user objects)
 498 |             select (list of str): columns to keep in DataFrame
 499 |             provide_jsons (boolean): If true, will treat friends_details as list of jsons. This
 500 |                                      allows creating a user details dataframe without having to
 501 |                                      download the details first. Note that the jsons must have the
 502 |                                      same format as the _json attribute of a user node of the
 503 |                                      Twitter API.
 504 |             replace_nonetype (boolean): Whether or not to replace values in the user_details that
 505 |                                         are None. Setting this to False is experimental, since code
 506 |                                         to avoid errors resulting from it has not yet been
 507 |                                         implemented. By default, missing dates will be replaced by
 508 |                                         1970-01-01, missing numericals by -1, missing strs by
 509 |                                         '-1', and missing booleans by -1.
 510 |                                         Use the 'nonetype' param to change the default.
 511 |             nonetype (dict): Contains the defaults for nonetype replacement (see docs for
 512 |                              'replace_nonetype' param).
 513 |                              {'date': 'yyyy-mm-dd', 'num': int, 'str': 'str', 'bool': int}
 514 | 
 515 |         Returns:
 516 |             pandas.DataFrame with these columns or selected as by `select`:
 517 |                 ['contributors_enabled',
 518 |                  'created_at',
 519 |                  'default_profile',
 520 |                  'default_profile_image',
 521 |                  'description',
 522 |                  'entities_description_urls',
 523 |                  'entities_url_urls',
 524 |                  'favourites_count',
 525 |                  'follow_request_sent',
 526 |                  'followers_count',
 527 |                  'following',
 528 |                  'friends_count',
 529 |                  'geo_enabled',
 530 |                  'has_extended_profile',
 531 |                  'id',
 532 |                  'id_str',
 533 |                  'is_translation_enabled',
 534 |                  'is_translator',
 535 |                  'lang',
 536 |                  'listed_count',
 537 |                  'location',
 538 |                  'name',
 539 |                  'needs_phone_verification',
 540 |                  'notifications',
 541 |                  'profile_background_color',
 542 |                  'profile_background_image_url',
 543 |                  'profile_background_image_url_https',
 544 |                  'profile_background_tile',
 545 |                  'profile_banner_url',
 546 |                  'profile_image_url',
 547 |                  'profile_image_url_https',
 548 |                  'profile_link_color',
 549 |                  'profile_sidebar_border_color',
 550 |                  'profile_sidebar_fill_color',
 551 |                  'profile_text_color',
 552 |                  'profile_use_background_image',
 553 |                  'protected',
 554 |                  'screen_name',
 555 |                  'status_contributors',
 556 |                  'status_coordinates',
 557 |                  'status_coordinates_coordinates',
 558 |                  'status_coordinates_type',
 559 |                  'status_created_at',
 560 |                  'status_entities_hashtags',
 561 |                  'status_entities_media',
 562 |                  'status_entities_symbols',
 563 |                  'status_entities_urls',
 564 |                  'status_entities_user_mentions',
 565 |                  'status_extended_entities_media',
 566 |                  'status_favorite_count',
 567 |                  'status_favorited',
 568 |                  'status_geo',
 569 |                  'status_geo_coordinates',
 570 |                  'status_geo_type',
 571 |                  'status_id',
 572 |                  'status_id_str',
 573 |                  'status_in_reply_to_screen_name',
 574 |                  'status_in_reply_to_status_id',
 575 |                  'status_in_reply_to_status_id_str',
 576 |                  'status_in_reply_to_user_id',
 577 |                  'status_in_reply_to_user_id_str',
 578 |                  'status_is_quote_status',
 579 |                  'status_lang',
 580 |                  'status_place',
 581 |                  'status_place_bounding_box_coordinates',
 582 |                  'status_place_bounding_box_type',
 583 |                  'status_place_contained_within',
 584 |                  'status_place_country',
 585 |                  'status_place_country_code',
 586 |                  'status_place_full_name',
 587 |                  'status_place_id',
 588 |                  'status_place_name',
 589 |                  'status_place_place_type',
 590 |                  'status_place_url',
 591 |                  'status_possibly_sensitive',
 592 |                  'status_quoted_status_id',
 593 |                  'status_quoted_status_id_str',
 594 |                  'status_retweet_count',
 595 |                  'status_retweeted',
 596 |                  'status_retweeted_status_contributors',
 597 |                  'status_retweeted_status_coordinates',
 598 |                  'status_retweeted_status_created_at',
 599 |                  'status_retweeted_status_entities_hashtags',
 600 |                  'status_retweeted_status_entities_media',
 601 |                  'status_retweeted_status_entities_symbols',
 602 |                  'status_retweeted_status_entities_urls',
 603 |                  'status_retweeted_status_entities_user_mentions',
 604 |                  'status_retweeted_status_extended_entities_media',
 605 |                  'status_retweeted_status_favorite_count',
 606 |                  'status_retweeted_status_favorited',
 607 |                  'status_retweeted_status_geo',
 608 |                  'status_retweeted_status_id',
 609 |                  'status_retweeted_status_id_str',
 610 |                  'status_retweeted_status_in_reply_to_screen_name',
 611 |                  'status_retweeted_status_in_reply_to_status_id',
 612 |                  'status_retweeted_status_in_reply_to_status_id_str',
 613 |                  'status_retweeted_status_in_reply_to_user_id',
 614 |                  'status_retweeted_status_in_reply_to_user_id_str',
 615 |                  'status_retweeted_status_is_quote_status',
 616 |                  'status_retweeted_status_lang',
 617 |                  'status_retweeted_status_place',
 618 |                  'status_retweeted_status_possibly_sensitive',
 619 |                  'status_retweeted_status_quoted_status_id',
 620 |                  'status_retweeted_status_quoted_status_id_str',
 621 |                  'status_retweeted_status_retweet_count',
 622 |                  'status_retweeted_status_retweeted',
 623 |                  'status_retweeted_status_source',
 624 |                  'status_retweeted_status_full_text',
 625 |                  'status_retweeted_status_truncated',
 626 |                  'status_source',
 627 |                  'status_full_text',
 628 |                  'status_truncated',
 629 |                  'statuses_count',
 630 |                  'suspended',
 631 |                  'time_zone',
 632 |                  'translator_type',
 633 |                  'url',
 634 |                  'utc_offset',
 635 |                  'verified']
 636 |         """
 637 | 
 638 |         if not provide_jsons:
 639 |             json_list_raw = [friend._json for friend in friends_details]
 640 |         else:
 641 |             json_list_raw = friends_details
 642 |         json_list = []
 643 |         dtypes = {key: value for (key, value) in friends_details_dtypes.items() if key in select}
 644 |         for j in json_list_raw:
 645 |             flat = flatten_json(j, sep="_", columns=select, nonetype=nonetype)
 646 |             # In case that there are keys in the user_details json that are not in select
 647 |             newflat = {key: value for (key, value) in flat.items() if key in select}
 648 |             json_list.append(newflat)
 649 | 
 650 |         df = pd.json_normalize(json_list)
 651 | 
 652 |         for var in select:
 653 |             if var not in df.columns:
 654 |                 if dtypes[var] == np.datetime64:
 655 |                     df[var] = pd.to_datetime(nonetype["date"])
 656 |                 elif dtypes[var] == np.int64:
 657 |                     df[var] = nonetype["num"]
 658 |                 elif dtypes[var] == str:
 659 |                     df[var] = nonetype["str"]
 660 |                 elif dtypes[var] == np.int8:
 661 |                     df[var] = nonetype["bool"]
 662 |                 else:
 663 |                     df[var] = np.nan
 664 |             else:
 665 |                 if dtypes[var] == np.datetime64:
 666 |                     df[var] = df[var].fillna(pd.to_datetime(nonetype["date"]))
 667 |                 elif dtypes[var] == np.int64:
 668 |                     df[var] = df[var].fillna(nonetype["num"])
 669 |                 elif dtypes[var] == str:
 670 |                     df[var] = df[var].fillna(nonetype["str"])
 671 |                 elif dtypes[var] == np.int8:
 672 |                     df[var] = df[var].fillna(nonetype["bool"])
 673 |                 df[var] = df[var].astype(dtypes[var])
 674 | 
 675 |         df.sort_index(axis=1, inplace=True)
 676 |         return df
 677 | 
 678 |     def check_follows(self, source, target):
 679 |         """Checks Twitter API whether `source` account follows `target` account.
 680 | 
 681 |         Args:
 682 |             source (int): user id
 683 |             target (int): user id
 684 |         Returns:
 685 |             - `True` if `source` follows `target`
 686 |             - `False` if `source` does not follow `target`
 687 |         """
 688 | 
 689 |         # TODO: check remaining API calls
 690 | 
 691 |         friendship = self.connection.api.show_friendship(
 692 |             source_id=source, target_id=target)
 693 | 
 694 |         following = friendship[0].following
 695 | 
 696 |         return following
 697 | 
 698 | 
 699 | class Coordinator(object):
 700 |     """Selects a queue of seeds and coordinates the collection with collectors
 701 |     and a queue of tokens.
 702 |     """
 703 | 
 704 |     def __init__(self, seeds=2, token_file_name="tokens.csv", seed_list=None,
 705 |                  following_pages_limit=0):
 706 | 
 707 |         # Get seeds from seeds.csv
 708 |         self.seed_pool = FileImport().read_seed_file()
 709 | 
 710 |         # Create seed_list if none is given by sampling from the seed_pool
 711 |         if seed_list is None:
 712 | 
 713 |             self.number_of_seeds = seeds
 714 |             try:
 715 |                 self.seeds = self.seed_pool.sample(n=self.number_of_seeds)
 716 |             except ValueError:  # seed pool too small
 717 |                 stderr.write("WARNING: Seed pool smaller than number of seeds.\n")
 718 |                 self.seeds = self.seed_pool.sample(n=self.number_of_seeds, replace=True)
 719 | 
 720 |             self.seeds = self.seeds[0].values
 721 |         else:
 722 |             self.number_of_seeds = len(seed_list)
 723 |             self.seeds = seed_list
 724 | 
 725 |         self.seed_queue = mp.Queue()
 726 | 
 727 |         for seed in self.seeds:
 728 |             self.seed_queue.put(seed)
 729 | 
 730 |         # Get authorized user tokens for app from tokens.csv
 731 |         self.tokens = FileImport().read_token_file(token_file_name)
 732 | 
 733 |         # and put them in a queue
 734 |         self.token_queue = mp.Queue()
 735 | 
 736 |         for token, secret in self.tokens.values:
 737 |             self.token_queue.put((token, secret, {}, {}))
 738 | 
 739 |         # Initialize DataBaseHandler for DB communication
 740 |         self.dbh = DataBaseHandler()
 741 |         self.following_pages_limit = following_pages_limit
 742 | 
 743 |     def bootstrap_seed_pool(self, after_timestamp=0):
 744 |         """Adds all collected user details, i.e. friends with the desired properties
 745 |         (e.g. language) of previously found seeds to the seed pool.
 746 | 
 747 |         Args:
 748 |             after_timestamp (int): filter for friends added after this timestamp. Default: 0
 749 |         Returns:
 750 |             None
 751 |         """
 752 | 
 753 |         seed_pool_size = len(self.seed_pool)
 754 |         stdout.write("Bootstrapping seeds.\n")
 755 |         stdout.write(f"Old size: {seed_pool_size}. Adding after {after_timestamp} ")
 756 |         stdout.flush()
 757 | 
 758 |         query = f"SELECT id FROM user_details WHERE UNIX_TIMESTAMP(timestamp) >= {after_timestamp}"
 759 | 
 760 |         more_seeds = pd.read_sql(query, self.dbh.engine)
 761 |         more_seeds.columns = [0]  # rename from id to 0 to match the seed pool column
 762 |         self.seed_pool = self.seed_pool.merge(more_seeds, how='outer', on=[0])
 763 | 
 764 |         seed_pool_size = len(self.seed_pool)
 765 |         stdout.write(f"New size: {seed_pool_size}\n")
 766 |         stdout.flush()
 767 | 
 768 |     def lookup_accounts_friend_details(self, account_id, db_connection=None, select="*"):
 769 |         """Looks up and retrieves details from friends of `account_id` via database.
 770 | 
 771 |         Args:
 772 |             account_id (int)
 773 |             db_connection (database connection/engine object)
 774 |             select (str): comma separated list of required fields, defaults to all available ("*")
 775 |         Returns:
 776 |             None, if no friends found.
 777 |             Otherwise DataFrame with all details. Might be empty if language filter is on.
 778 |         """
 779 | 
 780 |         if db_connection is None:
 781 |             db_connection = self.dbh.engine
 782 | 
 783 |         query = f"SELECT target from friends WHERE source = {account_id} AND burned = 0"
 784 |         friends = pd.read_sql(query, db_connection)
 785 | 
 786 |         if len(friends) == 0:
 787 |             return None
 788 |         else:
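 789 |             # Render the ids as plain ints: str() of a numpy tuple is not reliable SQL
 790 |             # (one-element tuples keep a trailing comma, NumPy 2 prints np.int64(...)).
 791 |             friends = friends['target'].values
 792 |             friends = ", ".join(str(int(friend)) for friend in friends)
 793 | 
 794 |             query = f"SELECT {select} from user_details WHERE id IN ({friends})"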
 795 |             friend_detail = pd.read_sql(query, db_connection)
 796 | 
 797 |             return friend_detail
 798 | 
 799 |     def choose_random_new_seed(self, msg, connection):
 800 |         new_seed = self.seed_pool.sample(n=1)
 801 |         new_seed = new_seed[0].values[0]
 802 | 
 803 |         if msg is not None:
 804 |             stdout.write(msg + "\n")
 805 |             stdout.flush()
 806 | 
 807 |         self.token_queue.put(
 808 |             (connection.token, connection.secret,
 809 |              connection.reset_time_dict, connection.calls_dict))
 810 | 
 811 |         self.seed_queue.put(new_seed)
 812 | 
 813 |         return new_seed
 814 | 
 815 |     def write_user_details(self, user_details):
 816 |         """Writes pandas.DataFrame `user_details` to MySQL table 'user_details'
 817 |         """
 818 | 
 819 |         try:
 820 |             user_details.to_sql('user_details', if_exists='append',
 821 |                                 index=False, con=self.dbh.engine)
 822 | 
 823 |         except IntegrityError:  # duplicate id (primary key)
 824 |             temp_tbl_name = self.dbh.make_temp_tbl()
 825 |             user_details.to_sql(temp_tbl_name, if_exists="append", index=False,
 826 |                                 con=self.dbh.engine)
 827 |             query = "REPLACE INTO user_details SELECT * FROM {};".format(
 828 |                     temp_tbl_name)
 829 |             self.dbh.engine.execute(query)
 830 |             self.dbh.engine.execute("DROP TABLE " + temp_tbl_name + ";")
 831 | 
 832 |     @retry_x_times(10)
 833 |     def work_through_seed_get_next_seed(self, seed, select=[], status_lang=None,
 834 |                                         connection=None, fail=False, **kwargs):
 835 |         """Takes a seed and determines the next seed and saves all details collected to db.
 836 | 
 837 |         Args:
 838 |             seed (int)
 839 |             select (list of str): fields to save to database, defaults to all
 840 |             status_lang (str): Twitter language code for language of last status to filter for,
 841 |                 defaults to None
 842 |             connection (collector.Connection object)
 843 |         Returns:
 844 |             seed (int)
 845 |         """
 846 | 
 847 |         # For testing that errors are raised while multithreading
 848 |         if fail is True:
 849 |             raise TestException
 850 | 
 851 |         if 'fail_hidden' in kwargs and kwargs['fail_hidden'] is True:
 852 |             raise TestException
 853 | 
 854 |         language_check_condition = (
 855 |             status_lang is not None and
 856 |             'language_threshold' in kwargs and
 857 |             kwargs['language_threshold'] > 0
 858 |         )
 859 | 
 860 |         keyword_condition = ('keywords' in kwargs and
 861 |                              kwargs['keywords'] is not None and
 862 |                              len(kwargs['keywords']) > 0)
 863 | 
 864 |         if connection is None:
 865 |             connection = Connection(token_queue=self.token_queue)
 866 | 
 867 |         friends_details = None
 868 |         if 'restart' in kwargs and kwargs['restart'] is True:
 869 |             print("No db lookup after restart allowed, accessing Twitter API.")
 870 |         else:
 871 |             try:
 872 |                 friends_details = self.lookup_accounts_friend_details(
 873 |                     seed, self.dbh.engine)
 874 | 
 875 |             except ProgrammingError:
 876 | 
 877 |                 print("Accessing db for friends_details failed. Maybe database does not exist "
 878 |                       "yet. Accessing Twitter API.")
 879 | 
 880 |         if friends_details is None:
 881 |             if 'restart' in kwargs and kwargs['restart'] is True:
 882 |                 pass
 883 |             elif language_check_condition or keyword_condition:
 884 |                 check_exists_query = f"""
 885 |                                         SELECT EXISTS(
 886 |                                             SELECT source FROM result
 887 |                                             WHERE source={seed}
 888 |                                             )
 889 |                                      """
 890 |                 seed_depleted = self.dbh.engine.execute(check_exists_query).scalar()
 891 | 
 892 |                 if seed_depleted == 1:
 893 |                     new_seed = self.choose_random_new_seed(
 894 |                         f'Seed {seed} is depleted. No friends meet conditions. Random new seed.',
 895 |                         connection)
 896 | 
 897 |                     return new_seed
 898 | 
 899 |             collector = Collector(connection, seed,
 900 |                                   following_pages_limit=self.following_pages_limit)
 901 | 
 902 |             try:
 903 |                 friend_list = collector.get_friend_list()
 904 |                 if 'bootstrap' in kwargs and kwargs['bootstrap'] is True:
 905 |                     follower_list = collector.get_friend_list(follower=True)
 906 |             except tweepy.error.TweepError as e:  # if account is protected
 907 |                 if "Not authorized." in e.reason:
 908 | 
 909 |                     new_seed = self.choose_random_new_seed(
 910 |                         "Account {} protected, selecting random seed.".format(seed), connection)
 911 | 
 912 |                     return new_seed
 913 | 
 914 |                 elif "does not exist" in e.reason:
 915 | 
 916 |                     new_seed = self.choose_random_new_seed(
 917 |                         f"Account {seed} does not exist. Selecting random seed.", connection)
 918 | 
 919 |                     return new_seed
 920 | 
 921 |                 else:
 922 |                     raise e
 923 | 
 924 |             if friend_list == []:  # if account follows nobody
 925 | 
 926 |                 new_seed = self.choose_random_new_seed(
 927 |                     "No friends or unburned connections left, selecting random seed.", connection)
 928 | 
 929 |                 return new_seed
 930 | 
 931 |             self.dbh.write_friends(seed, friend_list)
 932 | 
 933 |             friends_details = collector.get_details(friend_list)
 934 |             select = list(set(select + ["id", "followers_count",
 935 |                                         "status_lang", "created_at", "statuses_count"]))
 936 |             friends_details = Collector.make_friend_df(friends_details, select)
 937 | 
 938 |             if 'bootstrap' in kwargs and kwargs['bootstrap'] is True:
 939 |                 follower_details = collector.get_details(follower_list)
 940 |                 follower_details = Collector.make_friend_df(follower_details, select)
 941 | 
 942 |             if status_lang is not None:
 943 | 
 944 |                 if type(status_lang) is str:
 945 |                     status_lang = [status_lang]
 946 |                 friends_details = friends_details[friends_details['status_lang'].isin(status_lang)]
 947 | 
 948 |                 if 'bootstrap' in kwargs and kwargs['bootstrap'] is True:
 949 |                     follower_details = follower_details[follower_details['status_lang'].isin(
 950 |                         status_lang)]
 951 | 
 952 |                 if len(friends_details) == 0:
 953 | 
 954 |                     new_seed = self.choose_random_new_seed(
 955 |                         f"No friends found with language '{status_lang}', selecting random seed.",
 956 |                         connection)
 957 | 
 958 |                     return new_seed
 959 | 
 960 |             self.write_user_details(friends_details)
 961 | 
 962 |             if 'bootstrap' in kwargs and kwargs['bootstrap'] is True:
 963 |                 self.write_user_details(follower_details)
 964 | 
 965 |         if status_lang is not None and len(friends_details) == 0:
 966 | 
 967 |             new_seed = self.seed_pool.sample(n=1)
 968 |             new_seed = new_seed[0].values[0]
 969 | 
 970 |             stdout.write(
 971 |                 "No user details for friends with last status language '{}' found in db.\n".format(
 972 |                     status_lang))
 973 |             stdout.flush()
 974 | 
 975 |             self.token_queue.put(
 976 |                 (connection.token, connection.secret,
 977 |                  connection.reset_time_dict, connection.calls_dict))
 978 | 
 979 |             self.seed_queue.put(new_seed)
 980 | 
 981 |             return new_seed
 982 | 
 983 |         if 'restart' in kwargs and kwargs['restart'] is True:
 984 |             #  lookup just in case we had them already
 985 |             friends_details_db = self.lookup_accounts_friend_details(
 986 |                 seed, self.dbh.engine)
 987 |             if friends_details_db is not None and len(friends_details_db) > 0:
 988 |                 friends_details = friends_details_db
 989 | 
 990 |         double_burned = True
 991 | 
 992 |         while double_burned is True:
 993 |             max_follower_count = friends_details['followers_count'].max()
 994 | 
 995 |             new_seed = friends_details[
 996 |                 friends_details['followers_count'] == max_follower_count]['id'].values[0]
 997 | 
 998 |             while language_check_condition or keyword_condition:
 999 |                 # RETRIEVE AND TEST MORE TWEETS FOR LANGUAGE OR KEYWORDS
1000 |                 try:
1001 |                     latest_tweets = get_latest_tweets(new_seed, connection,
1002 |                                                       fields=['lang', 'full_text'])
1003 |                 except tweepy.error.TweepError as e:  # if account is protected
1004 |                     if "Not authorized." in e.reason:
1005 |                         new_seed = self.choose_random_new_seed(
1006 |                             f"Account {new_seed} protected, selecting random seed.", connection)
1007 | 
1008 |                         return new_seed
1009 |                     elif "does not exist" in e.reason:
1010 |                         new_seed = self.choose_random_new_seed(
1011 |                             f"Account {new_seed} does not exist. Selecting random.", connection)
1012 | 
1013 |                         return new_seed
1014 |                     else:
1015 |                         raise e
1016 | 
1017 |                 threshold_met = True  # set true per default and change to False if not met
1018 |                 keyword_met = True
1019 | 
1020 |                 if language_check_condition:
1021 |                     language_fractions = get_fraction_of_tweets_in_language(latest_tweets)
1022 | 
1023 |                     threshold_met = any(kwargs['language_threshold'] <= fraction
1024 |                                         for fraction in language_fractions.values())
1025 | 
1026 |                 if keyword_condition:
1027 |                     keyword_met = any(latest_tweets['full_text'].str.contains(keyword,
1028 |                                       case=False).any()
1029 |                                       for keyword in kwargs['keywords'])
1030 | 
1031 |                 # THEN REMOVE FROM friends_details DATAFRAME, SEED POOL,
1032 |                 # AND DATABASE IF FALSE POSITIVE
1033 |                 # ACCORDING TO THRESHOLD OR KEYWORD
1034 | 
1035 |                 if threshold_met and keyword_met:
1036 |                     break
1037 |                 else:
1038 |                     friends_details = friends_details[friends_details['id'] != new_seed]
1039 | 
1040 |                     print(
1041 |                         f'seed pool size before removing not matching seed: {len(self.seed_pool)}')
1042 |                     self.seed_pool = self.seed_pool[self.seed_pool[0] != new_seed]
1043 |                     print(
1044 |                         f'seed pool size after removing not matching seed: {len(self.seed_pool)}')
1045 | 
1046 |                     # query = f"DELETE from user_details WHERE id = {new_seed}"
1047 |                     # self.dbh.engine.execute(query)
1048 | 
1049 |                     query = f"DELETE from friends WHERE target = {new_seed}"
1050 |                     self.dbh.engine.execute(query)
1051 | 
1052 |                     # AND REPEAT THE CHECK
1053 |                     try:
1054 |                         new_seed = friends_details.nlargest(  # max of the *remaining* friends
1055 |                             1, 'followers_count')['id'].values[0]
1056 |                     except IndexError:  # no more friends
1057 |                         new_seed = self.choose_random_new_seed(
1058 |                             f'{seed}: No friends meet set conditions. Selecting random.',
1059 |                             connection)
1060 | 
1061 |                         return new_seed
1062 | 
1063 |             check_exists_query = """
1064 |                                     SELECT EXISTS(
1065 |                                         SELECT * FROM friends
1066 |                                         WHERE source={source}
1067 |                                         )
1068 |                                  """.format(source=new_seed)
1069 |             node_exists_as_source = self.dbh.engine.execute(check_exists_query).scalar()
1070 | 
1071 |             if node_exists_as_source == 1:
1072 |                 check_follow_query = """
1073 |                                         SELECT EXISTS(
1074 |                                             SELECT * FROM friends
1075 |                                             WHERE source={source} and target={target}
1076 |                                             )
1077 |                                      """.format(source=new_seed, target=seed)
1078 | 
1079 |                 follows = self.dbh.engine.execute(check_follow_query).scalar()
1080 | 
1081 |             elif node_exists_as_source == 0:
1082 |                 # check on Twitter
1083 | 
1084 |                 # FIXTHIS: dirty workaround because of wacky test
1085 |                 if connection == "fail":
1086 |                     connection = Connection()
1087 | 
1088 |                 try:
1089 |                     collector
1090 |                 except NameError:
1091 |                     collector = Collector(connection, seed)
1092 | 
1093 |                 try:
1094 |                     follows = int(collector.check_follows(source=new_seed, target=seed))
1095 |                 except tweepy.TweepError:
1096 |                     print(f"Follow back undetermined. User {new_seed} not available")
1097 |                     follows = 0
1098 | 
1099 |             if follows == 0:
1100 | 
1101 |                 insert_query = f"""
1102 |                     INSERT INTO result (source, target)
1103 |                     VALUES ({seed}, {new_seed})
1104 |                     ON DUPLICATE KEY UPDATE source = source
1105 |                 """
1106 | 
1107 |                 self.dbh.engine.execute(insert_query)
1108 | 
1109 |                 print('\nno follow back: added ({seed})-->({new_seed})'.format(
1110 |                     seed=seed, new_seed=new_seed
1111 |                 ))
1112 | 
1113 |             if follows == 1:
1114 | 
1115 |                 insert_query = f"""
1116 |                     INSERT INTO result (source, target)
1117 |                     VALUES
1118 |                         ({seed}, {new_seed}),
1119 |                         ({new_seed}, {seed})
1120 |                     ON DUPLICATE KEY UPDATE source = source
1121 |                 """
1122 | 
1123 |                 self.dbh.engine.execute(insert_query)
1124 | 
1125 |                 print('\nfollow back: added ({seed})<-->({new_seed})'.format(
1126 |                     seed=seed, new_seed=new_seed
1127 |                 ))
1128 | 
1129 |             update_query = """
1130 |                             UPDATE friends
1131 |                             SET burned=1
1132 |                             WHERE source={source} AND target={target} AND burned = 0
1133 |                            """.format(source=seed, target=new_seed)
1134 | 
1135 |             update_result = self.dbh.engine.execute(update_query)
1136 | 
1137 |             if update_result.rowcount == 0:
1138 |                 print(f"Connection ({seed})-->({new_seed}) was burned already.")
1139 |                 friends_details = self.lookup_accounts_friend_details(
1140 |                     seed, self.dbh.engine)
1141 | 
1142 |                 if friends_details is None or len(friends_details) == 0:
1143 |                     new_seed = self.choose_random_new_seed(
1144 |                         f"No friends or unburned connections left for {seed}, selecting random.",
1145 |                         connection)
1146 | 
1147 |                     return new_seed
1148 | 
1149 |             else:
1150 |                 print(f"burned ({seed})-->({new_seed})")
1151 |                 double_burned = False
1152 | 
1153 |         self.token_queue.put(
1154 |             (connection.token, connection.secret,
1155 |              connection.reset_time_dict, connection.calls_dict))
1156 | 
1157 |         self.seed_queue.put(new_seed)
1158 | 
1159 |         return new_seed
1160 | 
1161 |     def start_collectors(self, number_of_seeds=None, select=[], status_lang=None, fail=False,
1162 |                          fail_hidden=False, restart=False, retries=10, bootstrap=False,
1163 |                          latest_start_time=0, language_threshold=0, keywords=[]):
1164 |         """Starts `number_of_seeds` collector threads,
1165 |         each collecting the next seed for one seed taken from `self.seed_queue`
1166 |         and putting it back into `self.seed_queue`.
1167 | 
1168 |         Args:
1169 |             number_of_seeds (int): Defaults to `self.number_of_seeds`
1170 |             select (list of strings): fields to save to user_details table in database
1171 |             status_lang (str): language code for latest tweet language to select
1172 |         Returns:
1173 |             list of mp.(dummy.)Process
1174 |         """
1175 | 
1176 |         if bootstrap is True:
1177 | 
1178 |             if restart is True:
1179 |                 latest_start_time = 0
1180 | 
1181 |             self.bootstrap_seed_pool(after_timestamp=latest_start_time)
1182 | 
1183 |         if number_of_seeds is None:
1184 |             number_of_seeds = self.number_of_seeds
1185 | 
1186 |         processes = []
1187 |         seed_list = []
1188 | 
1189 |         print("number of seeds: ", number_of_seeds)
1190 | 
1191 |         for i in range(number_of_seeds):
1192 |             seed = self.seed_queue.get()
1193 |             seed_list += [seed]
1194 |             print("seed ", i, ": ", seed)
1195 |             processes.append(MyProcess(target=self.work_through_seed_get_next_seed,
1196 |                                        kwargs={'seed': seed,
1197 |                                                'select': select,
1198 |                                                'status_lang': status_lang,
1199 |                                                'fail': fail,
1200 |                                                'fail_hidden': fail_hidden,
1201 |                                                'restart': restart,
1202 |                                                'retries': retries,
1203 |                                                'language_threshold': language_threshold,
1204 |                                                'bootstrap': bootstrap,
1205 |                                                'keywords': keywords},
1206 |                                        name=str(seed)))
1207 | 
1208 |         latest_seeds = pd.DataFrame(seed_list)
1209 | 
1210 |         latest_seeds.to_csv('latest_seeds.csv', index=False, header=False)
1211 | 
1212 |         for p in processes:
1213 |             p.start()
1214 |             print(f"Thread {p.name} started.")
1215 | 
1216 |         return processes
1217 | 
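
The friend lookup in `lookup_accounts_friend_details` formats the ids directly into the `IN (...)` clause and special-cases a one-element tuple. Below is a minimal sketch of the same two-step lookup using bound parameters instead. It assumes an in-memory SQLite database and a hypothetical helper named `lookup_friend_details`; neither is part of the repository, and the collector itself talks to MySQL through SQLAlchemy and returns a DataFrame rather than a list of rows.

```python
import sqlite3


def lookup_friend_details(conn, account_id, select="*"):
    """Return detail rows for all unburned friends of `account_id`, or None.

    Hypothetical helper for illustration only. `select` is still interpolated
    into the query, so it must come from trusted code, never from user input.
    """
    targets = [row[0] for row in conn.execute(
        "SELECT target FROM friends WHERE source = ? AND burned = 0",
        (account_id,))]

    if not targets:
        return None

    # One "?" per id, so a single friend needs no special-casing.
    placeholders = ", ".join("?" for _ in targets)
    query = f"SELECT {select} FROM user_details WHERE id IN ({placeholders})"
    return conn.execute(query, targets).fetchall()


# Tiny assumed data set: account 1 follows 2 (unburned) and 3 (burned),
# account 4 follows 5, for whom no details are stored.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE friends (source BIGINT, target BIGINT, burned TINYINT);
    CREATE TABLE user_details (id BIGINT PRIMARY KEY, followers_count BIGINT);
    INSERT INTO friends VALUES (1, 2, 0), (1, 3, 1), (4, 5, 0);
    INSERT INTO user_details VALUES (2, 10), (3, 20);
""")

details = lookup_friend_details(conn, 1, "id, followers_count")
```

As in the original method, the helper returns `None` when the account has no unburned friends and an empty result when friends exist but no details are stored for them.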


--------------------------------------------------------------------------------
/config_template.yml:
--------------------------------------------------------------------------------
  1 | # In the following config file, please fill the fields as you need them.
  2 | # Do not use quotes, just plain text, e.g.:
  3 | # sql:
  4 |  # dbtype: sqlite
  5 | # etc.
  6 | 
  7 | # ================== Database Information =====================
  8 | sql:
  9 |     dbtype:   mysql
 10 |     host:     # if dbtype = mysql, provide host
 11 |     user:     # if dbtype = mysql, provide user
 12 |     passwd:   # if dbtype = mysql, provide password
 13 |     dbname:   # provide a name for the database.
 14 | 
 15 | 
 16 | # ================== Twitter User Details =====================
 17 | # If you wish to save certain twitter user details, please just add the SQL data
 18 | # type you wish to save it as in the SQL database (recommended types are indicated
 19 | # in parentheses). If you do not wish to save a certain detail, just leave it empty
 20 | # like so:
 21 | # twitter_user_details:
 22 |     # contributors_enabled: SMALLINT
 23 |     # created_at:
 24 | # This will save the detail "contributors_enabled" as a boolean (SMALLINT) into the
 25 | # database but it will not save "created_at" at all.
 26 | 
 27 | twitter_user_details:
 28 |     contributors_enabled:  # SMALLINT
 29 |     created_at:  DATETIME
 30 |     default_profile:  # SMALLINT
 31 |     default_profile_image:  # SMALLINT
 32 |     description:  # TEXT (contains a dict)
 33 |     entities_description_urls:  # TEXT
 34 |     entities_url_urls:  # TEXT (contains a dict)
 35 |     favourites_count:  # BIGINT
 36 |     follow_request_sent:  # SMALLINT
 37 |     followers_count:  BIGINT
 38 |     following:  # SMALLINT
 39 |     friends_count:  # BIGINT
 40 |     geo_enabled:  # SMALLINT
 41 |     has_extended_profile:  # SMALLINT
 42 |     id:  BIGINT PRIMARY KEY
 43 |     id_str:  # VARCHAR(30)
 44 |     is_translation_enabled:  # SMALLINT
 45 |     is_translator:  # SMALLINT
 46 |     lang:  # VARCHAR(10)
 47 |     listed_count:  # BIGINT
 48 |     location:  # TEXT
 49 |     name:  # VARCHAR(50)
 50 |     needs_phone_verification:  # SMALLINT
 51 |     notifications:  # SMALLINT
 52 |     profile_background_color:  # CHAR(6) (is a Hex Color Code)
 53 |     profile_background_image_url:  # TEXT
 54 |     profile_background_image_url_https:  # TEXT
 55 |     profile_background_tile:  # SMALLINT
 56 |     profile_banner_url:  # TEXT
 57 |     profile_image_url:  # TEXT
 58 |     profile_image_url_https:  # TEXT
 59 |     profile_link_color:  # CHAR(6) (is a Hex Color Code)
 60 |     profile_sidebar_border_color:  # CHAR(6) (is a Hex Color Code)
 61 |     profile_sidebar_fill_color:  # CHAR(6) (is a Hex Color Code)
 62 |     profile_text_color:  # CHAR(6) (is a Hex Color Code)
 63 |     profile_use_background_image:  # SMALLINT
 64 |     protected:  # SMALLINT
 65 |     screen_name:  # VARCHAR(50)
 66 |     status_contributors:  # TEXT (Rarely available)
 67 |     status_coordinates:  # TEXT (contains a dict)
 68 |     status_coordinates_coordinates:  # TEXT (Rarely available)
 69 |     status_coordinates_type:  # TEXT (Rarely available)
 70 |     status_created_at:  # DATETIME
 71 |     status_entities_hashtags:  # TEXT (contains a dict)
 72 |     status_entities_media:  # TEXT (contains a dict)
 73 |     status_entities_symbols:  # TEXT (contains a dict) # DE FACTO ALWAYS EMPTY
 74 |     status_entities_urls:  # TEXT (contains a dict)
 75 |     status_entities_user_mentions:  # TEXT (contains a dict)
 76 |     status_extended_entities_media:  # TEXT (contains a dict)
 77 |     status_favorite_count:  # INT
 78 |     status_favorited:  # SMALLINT
 79 |     status_geo:  # TEXT (contains a dict)
 80 |     status_geo_coordinates:  # TEXT (Rarely available)
 81 |     status_geo_type:  # TEXT (Rarely available)
 82 |     status_id:  # BIGINT
 83 |     status_id_str: # VARCHAR(50)
 84 |     status_in_reply_to_screen_name:  # VARCHAR(50)
 85 |     status_in_reply_to_status_id:  # BIGINT
 86 |     status_in_reply_to_status_id_str:  # VARCHAR(50)
 87 |     status_in_reply_to_user_id:  # BIGINT
 88 |     status_in_reply_to_user_id_str:  # VARCHAR(30)
 89 |     status_is_quote_status:  # SMALLINT
 90 |     status_lang:  VARCHAR(10)
 91 |     status_place:  # TEXT (contains a dict)
 92 |     status_place_bounding_box_coordinates: # TEXT (Rarely available)
 93 |     status_place_bounding_box_type:  # TEXT (Rarely available)
 94 |     status_place_contained_within:  # TEXT (Rarely available)
 95 |     status_place_country:  # TEXT (Rarely available)
 96 |     status_place_country_code:  # TEXT (Rarely available)
 97 |     status_place_full_name:  # TEXT (Rarely available)
 98 |     status_place_id:  # TEXT (Rarely available)
 99 |     status_place_name: # TEXT (Rarely available)
100 |     status_place_place_type:  # TEXT (Rarely available)
101 |     status_place_url:  # TEXT (Rarely available)
102 |     status_possibly_sensitive:  # SMALLINT
103 |     status_quoted_status_id:  # BIGINT
104 |     status_quoted_status_id_str:  # VARCHAR(50)
105 |     status_retweet_count:  # INT
106 |     status_retweeted:  # SMALLINT
107 |     status_retweeted_status_contributors:  # TEXT (Rarely available)
108 |     status_retweeted_status_coordinates:  # TEXT (contains a dict)
109 |     status_retweeted_status_created_at:  # DATETIME
110 |     status_retweeted_status_entities_hashtags:  # TEXT (contains a dict)
111 |     status_retweeted_status_entities_media:  # TEXT (contains a dict)
112 |     status_retweeted_status_entities_symbols:  # TEXT (Rarely available)
113 |     status_retweeted_status_entities_urls:  # TEXT (contains a dict)
114 |     status_retweeted_status_entities_user_mentions:  # TEXT (contains a dict)
115 |     status_retweeted_status_extended_entities_media: # TEXT (contains a dict)
116 |     status_retweeted_status_favorite_count:  # INT
117 |     status_retweeted_status_favorited:  # SMALLINT
118 |     status_retweeted_status_geo:  # TEXT (contains a dict)
119 |     status_retweeted_status_id:  # BIGINT
120 |     status_retweeted_status_id_str:  # VARCHAR(50)
121 |     status_retweeted_status_in_reply_to_screen_name:  # VARCHAR(30)
122 |     status_retweeted_status_in_reply_to_status_id:  # BIGINT
123 |     status_retweeted_status_in_reply_to_status_id_str:  # VARCHAR(50)
124 |     status_retweeted_status_in_reply_to_user_id:  # BIGINT
125 |     status_retweeted_status_in_reply_to_user_id_str:  # VARCHAR(30)
126 |     status_retweeted_status_is_quote_status:  # SMALLINT
127 |     status_retweeted_status_lang:  # VARCHAR(10)
128 |     status_retweeted_status_place:  # TEXT (contains a dict)
129 |     status_retweeted_status_possibly_sensitive:  # SMALLINT
130 |     status_retweeted_status_quoted_status_id:  # BIGINT
131 |     status_retweeted_status_quoted_status_id_str:  # VARCHAR(50)
132 |     status_retweeted_status_retweet_count:  # INT
133 |     status_retweeted_status_retweeted:  # SMALLINT
134 |     status_retweeted_status_source:  # TEXT
135 |     status_retweeted_status_full_text:  # TEXT
136 |     status_retweeted_status_truncated:  # SMALLINT
137 |     status_source:  # TEXT
138 |     status_full_text:  # TEXT
139 |     status_truncated:  # SMALLINT
140 |     statuses_count:  BIGINT
141 |     suspended:  # SMALLINT
142 |     time_zone: # TEXT (Rarely available)
143 |     translator_type:  # VARCHAR(50)
144 |     url:  # TEXT
145 |     verified:  # BOOLEAN
146 |     utc_offset:  # TEXT (Rarely available)
147 | 
148 | 
149 | # ================== Notification Emails =====================
150 | 
151 | notifications:
152 |     email_to_notify: # user@example.com
153 |     # mailgun details
154 |     # (find them under the respective domain name here: https://mailgun.com/app/domains)
155 |     mailgun_default_smtp_login:
156 |     mailgun_api_base_url:
157 |     mailgun_api_key:
158 | 
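
The `twitter_user_details` block above relies on a YAML detail: a key whose value is only a comment parses as null, so it is skipped when the `user_details` table is built. Below is a minimal sketch of that filtering step, mirroring the loop in `database_handler.py`. The function name `build_user_details_columns` and the hand-written `config` dict (standing in for the parsed YAML) are assumptions made for illustration.

```python
def build_user_details_columns(twitter_user_details):
    """Return "name SQLTYPE" strings for every detail that has a type set."""
    return [f"{detail} {sqltype}"
            for detail, sqltype in twitter_user_details.items()
            if sqltype is not None]


# Assumed stand-in for the parsed config: commented-out types become None.
config = {
    "twitter_user_details": {
        "contributors_enabled": None,   # "contributors_enabled:  # SMALLINT"
        "created_at": "DATETIME",
        "followers_count": "BIGINT",
        "id": "BIGINT PRIMARY KEY",
        "lang": None,                   # "lang:  # VARCHAR(10)"
    }
}

columns = build_user_details_columns(config["twitter_user_details"])
create_sql = ("CREATE TABLE IF NOT EXISTS user_details ("
              + ", ".join(columns) + ");")
```

Only details with a type end up as columns, so removing the `#` in front of a type in the config is all it takes to start storing that field.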


--------------------------------------------------------------------------------
/create_dense_result.sql:
--------------------------------------------------------------------------------
 1 | CREATE VIEW `dense_result` AS
 2 | SELECT DISTINCT source, target FROM friends WHERE source IN
 3 | (SELECT DISTINCT T.id
 4 |             FROM
 5 |                 (SELECT 
 6 |                     result.source AS id
 7 |                 FROM
 8 |                     result UNION SELECT 
 9 |                     result.target AS id
10 |                 FROM
11 |                     result) T)
12 | AND target IN
13 | (SELECT DISTINCT T.id
14 |             FROM
15 |                 (SELECT 
16 |                     result.source AS id
17 |                 FROM
18 |                     result UNION SELECT 
19 |                     result.target AS id
20 |                 FROM
21 |                     result) T)
22 | 
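
The `dense_result` view keeps every `friends` edge whose two endpoints both appear somewhere in `result`. Below is a minimal sketch of that behaviour, assuming an in-memory SQLite database, simplified two-column tables, and made-up ids. The view body is a condensed equivalent of the statement above, not the MySQL original.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE friends (source BIGINT, target BIGINT);
    CREATE TABLE result (source BIGINT, target BIGINT);

    -- The walk visited 1 -> 2 -> 3, so the sampled nodes are {1, 2, 3}.
    INSERT INTO result VALUES (1, 2), (2, 3);

    -- Edges touching node 9 must not show up in the view.
    INSERT INTO friends VALUES (1, 2), (1, 3), (3, 1), (1, 9), (9, 2);

    CREATE VIEW dense_result AS
    SELECT DISTINCT source, target FROM friends
    WHERE source IN (SELECT source FROM result UNION SELECT target FROM result)
      AND target IN (SELECT source FROM result UNION SELECT target FROM result);
""")

edges = conn.execute(
    "SELECT source, target FROM dense_result ORDER BY source, target"
).fetchall()
```

In this example the view adds the edges (1, 3) and (3, 1), which were collected in `friends` but never walked, while dropping everything that involves the unsampled node 9.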


--------------------------------------------------------------------------------
/create_node_view.sql:
--------------------------------------------------------------------------------
 1 | CREATE VIEW `nodes` AS
 2 |     SELECT 
 3 |         `user_details`.`id` AS `id`,
 4 |         `user_details`.`status_lang` AS `status_lang`,
 5 |         `user_details`.`screen_name` AS `screen_name`,
 6 |         `user_details`.`name` AS `name`,
 7 |         `user_details`.`location` AS `location`,
 8 |         `user_details`.`description` AS `description`,
 9 |         `user_details`.`created_at` AS `created_at`,
10 |         `user_details`.`favourites_count` AS `favourites_count`,
11 |         `user_details`.`followers_count` AS `followers_count`,
12 |         `user_details`.`friends_count` AS `friends_count`,
13 |         `user_details`.`listed_count` AS `listed_count`,
14 |         `user_details`.`protected` AS `protected`,
15 |         `user_details`.`statuses_count` AS `statuses_count`,
16 |         `user_details`.`status_created_at` AS `status_created_at`,
17 |         `user_details`.`timestamp` AS `timestamp`,
18 |         `user_details`.`verified` AS `verified`
19 |     FROM
20 |         `user_details`
21 |     WHERE
22 |         `user_details`.`id` IN (SELECT 
23 |                 `T`.`id`
24 |             FROM
25 |                 (SELECT 
26 |                     `result`.`source` AS `id`
27 |                 FROM
28 |                     `result` UNION SELECT 
29 |                     `result`.`target` AS `id`
30 |                 FROM
31 |                     `result`) `T`)


--------------------------------------------------------------------------------
/database_handler.py:
--------------------------------------------------------------------------------
  1 | import sqlite3 as lite
  2 | import uuid
  3 | from sqlite3 import Error
  4 | 
  5 | import pandas as pd
  6 | from sqlalchemy import create_engine
  7 | from sqlalchemy.exc import OperationalError
  8 | 
  9 | from setup import Config
 10 | 
 11 | 
 12 | class DataBaseHandler():
 13 |     def __init__(self, config_path: str = "config.yml", config_dict: dict = None,
 14 |                  create_all: bool = True):
 15 |         """Initializes class by either connecting to an existing database
 16 |         or by creating a new database. Database settings depend on config.yml
 17 | 
 18 |         Args:
 19 |             config_path (str): Path to configuration file. Defaults to "config.yml"
 20 |             config_dict (dict): Dictionary containing the config information (in case
 21 |                                 the dictionary shall be directly passed instead of read
 22 |                                 out of a configuration file).
 23 |             create_all (bool): If set to false, will not attempt to create the friends,
 24 |                                result, and user_details tables.
 25 |         Returns:
 26 |             Nothing
 27 |         """
 28 | 
 29 |         # Prepare user_details configured in config.yml for user_details table creation
 30 |         self.config = Config(config_path, config_dict)
 31 |         user_details_list = []
 32 |         if "twitter_user_details" in self.config.config:
 33 |             for detail, sqldatatype in self.config.config["twitter_user_details"].items():
 34 |                 if sqldatatype is not None:
 35 |                     user_details_list.append(detail + " " + sqldatatype)
 36 |         else:
 37 |             print("""Key "twitter_user_details" could not be found in config.yml. Will not create
 38 |                   a user_details table.""")
 39 | 
 40 |         # Table creation for SQLITE database type.
 41 |         # Note and TODO: the collector does not support sqlite (yet)
 42 |         if self.config.dbtype.lower() == "sqlite":
 43 |             try:
 44 |                 self.engine = lite.connect(self.config.dbname + ".db")
 45 |                 print("Connected to " + self.config.dbname + "!")
 46 |             except Error as e:
 47 |                 raise e
 48 |             if create_all:
 49 |                 try:
 50 |                     create_friends_table_sql = """CREATE TABLE IF NOT EXISTS friends (
 51 |                                                     source BIGINT NOT NULL,
 52 |                                                     target BIGINT NOT NULL,
 53 |                                                     burned TINYINT NOT NULL,
 54 |                                                     timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
 55 |                                                   );"""
 56 |                     create_friends_index_sql_1 = "CREATE INDEX IF NOT EXISTS iFSource ON friends(source);"
 57 |                     create_friends_index_sql_2 = "CREATE INDEX IF NOT EXISTS iFTimestamp ON friends(timestamp);"
 58 |                     create_results_table_sql = """CREATE TABLE IF NOT EXISTS result (
 59 |                                                     source BIGINT NOT NULL,
 60 |                                                     target BIGINT NOT NULL,
 61 |                                                     timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
 62 |                                                   );"""
 63 |                     create_results_index_sql_1 = "CREATE INDEX IF NOT EXISTS iRSource ON result(source);"
 64 |                     create_results_index_sql_2 = "CREATE INDEX IF NOT EXISTS iRTimestamp ON result(timestamp);"
 65 |                     c = self.engine.cursor()
 66 |                     c.execute(create_friends_table_sql)
 67 |                     c.execute(create_friends_index_sql_1)
 68 |                     c.execute(create_friends_index_sql_2)
 69 |                     c.execute(create_results_table_sql)
 70 |                     c.execute(create_results_index_sql_1)
 71 |                     c.execute(create_results_index_sql_2)
 72 |                     if user_details_list != []:
 73 |                         create_user_details_sql = """
 74 |                             CREATE TABLE IF NOT EXISTS user_details
 75 |                             (""" + ", ".join(user_details_list) + """,
 76 |                              timestamp DATETIME DEFAULT CURRENT_TIMESTAMP);"""
 77 |                         create_ud_index = "CREATE INDEX IF NOT EXISTS iUTimestamp ON user_details(timestamp)"
 78 |                         c.execute(create_user_details_sql)
 79 |                         c.execute(create_ud_index)
 80 |                     else:
 81 |                         # TODO: Make this a minimal user_details table?
 82 |                         print('No user_details configured in config.yml. '
 83 |                               'Will not create a user_details table.')
 84 |                 except Error as e:
 85 |                     print(e)
 86 | 
 87 |         # Table creation for mysql database type
 88 |         elif self.config.dbtype.lower() == "mysql":
 89 |             try:
 90 |                 self.engine = create_engine(
 91 |                     f'mysql+pymysql://{self.config.dbuser}:'
 92 |                     f'{self.config.dbpwd}@{self.config.dbhost}/{self.config.dbname}'
 93 |                 )
 94 |                 print('Connected to database "' + self.config.dbname + '" via mySQL!')
 95 |             except OperationalError as e:
 96 |                 raise e
 97 |             if create_all:
 98 |                 try:
 99 |                     create_friends_table_sql = """CREATE TABLE IF NOT EXISTS friends (
100 |                                                     source BIGINT NOT NULL,
101 |                                                     target BIGINT NOT NULL,
102 |                                                     burned TINYINT NOT NULL,
103 |                                                     timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
104 |                                                     ON UPDATE CURRENT_TIMESTAMP,
105 |                                                     UNIQUE INDEX fedge (source, target),
106 |                                                     INDEX(timestamp)
107 |                                                     );"""
108 |                     create_results_table_sql = """CREATE TABLE IF NOT EXISTS result (
109 |                                                     source BIGINT NOT NULL,
110 |                                                     target BIGINT NOT NULL,
111 |                                                     timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
112 |                                                     UNIQUE INDEX redge (source, target),
113 |                                                     INDEX(timestamp)
114 |                                                   );"""
115 |                     self.engine.execute(create_friends_table_sql)
116 |                     self.engine.execute(create_results_table_sql)
117 |                     if user_details_list != []:
118 |                         create_user_details_sql = """
119 |                             CREATE TABLE IF NOT EXISTS user_details
120 |                             (""" + ", ".join(user_details_list) + """, timestamp TIMESTAMP
121 |                             DEFAULT CURRENT_TIMESTAMP,
122 |                             INDEX(timestamp));"""
123 |                         self.engine.execute(create_user_details_sql)
124 |                     else:
125 |                         print('No user_details configured in config.yml. '
126 |                               'Will not create a user_details table.')
127 |                 except OperationalError as e:
128 |                     raise e
129 | 
130 |     def make_temp_tbl(self, type: str = "user_details"):
131 |         """Creates a new temporary table with a random name consisting of a temp_ prefix
132 |            and a uid. The structure of the table depends on the chosen type param. The
133 |            table's structure will be a copy of an existing table, for example, a temporary
134 |            user_details table will have the same columns and attributes (Keys, constraints, etc.)
135 |            as the user_details table.
136 | 
137 |         Args:
138 |             type (str): The table that the temporary table is going to simulate.
139 |                         Possible values are ["friends", "result", "user_details"]
140 |         Returns:
141 |             The name of the temporary table.
142 |         """
143 |         uid = uuid.uuid4()
144 |         temp_tbl_name = "temp_" + str(uid).replace('-', '_')
145 | 
146 |         if self.config.dbtype.lower() == "mysql":
147 |             create_temp_tbl_sql = f"CREATE TABLE {temp_tbl_name} LIKE {type};"
148 |         elif self.config.dbtype.lower() == "sqlite":
149 |             create_temp_tbl_sql = f"CREATE TABLE {temp_tbl_name} AS SELECT * FROM {type} WHERE 0"
150 |         self.engine.execute(create_temp_tbl_sql)
151 |         return temp_tbl_name
152 | 
153 |     def write_friends(self, seed, friendlist):
154 |         """Writes one database entry per (seed, friend) pair for a user and their friends.
155 |         Note that new entries are appended to the friends table and that no existing entries
156 |         are deleted by this method.
157 | 
158 |         Args:
159 |             seed (str): single Twitter ID
160 |             friendlist (list of str): Twitter IDs of seed's friends
161 |         Returns:
162 |             Nothing
163 |         """
164 |         temp_tbl_name = self.make_temp_tbl(type="friends")
165 | 
166 |         friends_df = pd.DataFrame({'target': friendlist})
167 |         friends_df['source'] = seed
168 |         friends_df['burned'] = 0
169 |         friends_df.to_sql(name=temp_tbl_name, con=self.engine, if_exists="replace", index=False)
170 | 
171 |         if self.config.dbtype.lower() == "mysql":
172 |             insert_query = f"""
173 |                 INSERT INTO friends (source, target, burned)
174 |                 SELECT source, target, burned
175 |                 FROM {temp_tbl_name}
176 |                 ON DUPLICATE KEY UPDATE
177 |                     source = {temp_tbl_name}.source
178 |             """
179 |         elif self.config.dbtype.lower() == "sqlite":
180 |             insert_query = f"""
181 |                 INSERT OR IGNORE INTO friends (source, target, burned)
182 |                 SELECT source, target, burned
183 |                 FROM {temp_tbl_name}
184 |             """
185 | 
186 |         self.engine.execute(insert_query)
187 |         self.engine.execute(f"DROP TABLE {temp_tbl_name}")
188 | 
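
The staging pattern used by `write_friends` (bulk-load into a temporary table, copy into `friends` while skipping duplicates, then drop the temporary table) can be sketched with only the standard library. This is a minimal illustration, not the project's implementation: it assumes an in-memory SQLite database and a `UNIQUE` index on `(source, target)`, which the SQLite schema above does not define but which `INSERT OR IGNORE` needs in order to skip duplicates. The helper name `stage_and_insert` is hypothetical.

```python
import sqlite3
import uuid


def stage_and_insert(con, seed, friendlist):
    """Insert (seed, friend) edges into friends, ignoring existing edges."""
    # Random, identifier-safe temporary table name, as in make_temp_tbl.
    temp_tbl_name = "temp_" + str(uuid.uuid4()).replace("-", "_")
    # Empty copy of the column layout of friends (no constraints or defaults).
    con.execute(f"CREATE TABLE {temp_tbl_name} AS SELECT * FROM friends WHERE 0")
    con.executemany(
        f"INSERT INTO {temp_tbl_name} (source, target, burned) VALUES (?, ?, 0)",
        [(seed, friend) for friend in friendlist],
    )
    # Rows that would violate the UNIQUE index are silently skipped.
    con.execute(
        f"""INSERT OR IGNORE INTO friends (source, target, burned)
            SELECT source, target, burned FROM {temp_tbl_name}"""
    )
    con.execute(f"DROP TABLE {temp_tbl_name}")
    con.commit()


con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE friends (
                   source BIGINT NOT NULL,
                   target BIGINT NOT NULL,
                   burned TINYINT NOT NULL,
                   timestamp DATETIME DEFAULT CURRENT_TIMESTAMP)""")
# Assumption: a unique edge index, mirroring `fedge` in the MySQL schema.
con.execute("CREATE UNIQUE INDEX fedge ON friends(source, target)")

stage_and_insert(con, 1, [2, 3])
stage_and_insert(con, 1, [3, 4])  # edge (1, 3) already exists and is ignored
edge_count = con.execute("SELECT COUNT(*) FROM friends").fetchone()[0]
```

Without the unique index, the second call would store the edge `(1, 3)` twice, because `INSERT OR IGNORE` only skips rows that violate a constraint.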


--------------------------------------------------------------------------------
/empty_keys.json:
--------------------------------------------------------------------------------
1 | {"consumer_token": "", "consumer_secret": ""}
2 | 


--------------------------------------------------------------------------------
/exceptions.py:
--------------------------------------------------------------------------------
1 | class TestException(Exception):
2 |     pass
3 | 


--------------------------------------------------------------------------------
/functional_test.py:
--------------------------------------------------------------------------------
  1 | # functional test for network collector
  2 | import argparse
  3 | from datetime import datetime
  4 | import os
  5 | import shutil
  6 | import sys
  7 | import unittest
  8 | import warnings
  9 | from exceptions import TestException
 10 | from subprocess import PIPE, STDOUT, CalledProcessError, Popen, check_output
 11 | 
 12 | import pandas as pd
 13 | import yaml
 14 | from sqlalchemy.exc import InternalError
 15 | 
 16 | import test_helpers
 17 | from collector import Coordinator
 18 | from database_handler import DataBaseHandler
 19 | from start import main_loop
 20 | 
 21 | parser = argparse.ArgumentParser(description='SparseTwitter FunctionalTestSuite')
 22 | parser.add_argument('-w', '--show_resource_warnings',
 23 |                     help='If set, will show possible resource warnings from the requests package.',
 24 |                     required=False,
 25 |                     action='store_true')
 26 | parser.add_argument('unittest_args', nargs='*')
 27 | 
 28 | args = parser.parse_args()
 29 | show_warnings = args.show_resource_warnings
 30 | sys.argv[1:] = args.unittest_args
 31 | 
 32 | mysql_cfg = test_helpers.config_dict_user_details_dtypes_mysql
 33 | 
 34 | 
 35 | def setUpModule():
 36 |     if not show_warnings:
 37 |         warnings.filterwarnings(action="ignore",
 38 |                                 message="unclosed",
 39 |                                 category=ResourceWarning)
 40 |     if os.path.isfile("latest_seeds.csv"):
 41 |         os.rename("latest_seeds.csv",
 42 |                   "{}_latest_seeds.csv".format(datetime.now().isoformat().replace(":", "-")))
 43 | 
 44 | 
 45 | class FirstUseTest(unittest.TestCase):
 46 | 
 47 |     """Functional test for first use of the program."""
 48 | 
 49 |     @classmethod
 50 |     def setUpClass(cls):
 51 |         os.rename("seeds.csv", "seeds.csv.bak")
 52 |         if os.path.exists("latest_seeds.csv"):
 53 |             os.rename("latest_seeds.csv", "latest_seeds.csv.bak")
 54 | 
 55 |     @classmethod
 56 |     def tearDownClass(cls):
 57 |         if os.path.exists("seeds.csv"):
 58 |             os.remove("seeds.csv")
 59 |         os.rename("seeds.csv.bak", "seeds.csv")
 60 |         if os.path.exists("latest_seeds.csv.bak"):
 61 |             os.rename("latest_seeds.csv.bak", "latest_seeds.csv")
 62 | 
 63 |     def setUp(self):
 64 |         if os.path.isfile("config.yml"):
 65 |             os.rename("config.yml", "config.yml.bak")
 66 | 
 67 |     def tearDown(self):
 68 |         if os.path.isfile("config.yml.bak"):
 69 |             os.replace("config.yml.bak", "config.yml")
 70 |         if os.path.isfile("seeds.csv"):
 71 |             os.remove("seeds.csv")
 72 | 
 73 |         dbh = DataBaseHandler(config_dict=mysql_cfg, create_all=False)
 74 | 
 75 |         try:
 76 |             dbh.engine.execute("DROP TABLE friends")
 77 |         except InternalError:
 78 |             pass
 79 |         try:
 80 |             dbh.engine.execute("DROP TABLE user_details")
 81 |         except InternalError:
 82 |             pass
 83 |         try:
 84 |             dbh.engine.execute("DROP TABLE result")
 85 |         except InternalError:
 86 |             pass
 87 | 
 88 |     def test_starts_and_checks_for_necessary_input_seeds_missing(self):
 89 |         if os.path.isfile("seeds.csv"):
 90 |             os.remove("seeds.csv")
 91 | 
 92 |         with open("config.yml", "w") as f:
 93 |             yaml.dump(mysql_cfg, f, default_flow_style=False)
 94 | 
 95 |         # User starts program with `start.py`
 96 |         try:
 97 |             response = str(check_output('python start.py', stderr=STDOUT,
 98 |                                         shell=True), encoding="ascii")
 99 | 
100 |         # ... and encounters an error because the seeds.csv is missing.
101 |         except CalledProcessError as e:
102 |             response = str(e.output)
103 |             self.assertIn('"seeds.csv" could not be found', response)
104 | 
105 |     def test_starts_and_checks_for_necessary_input_seeds_empty(self):
106 |         # User starts program with `start.py`
107 |         shutil.copyfile("seeds_empty.csv", "seeds.csv")
108 | 
109 |         with open("config.yml", "w") as f:
110 |             yaml.dump(mysql_cfg, f, default_flow_style=False)
111 | 
112 |         try:
113 |             response = str(check_output('python start.py', stderr=STDOUT,
114 |                                         shell=True), encoding="ascii")
115 | 
116 |         # ... and encounters an error because the seeds.csv is empty.
117 |         except CalledProcessError as e:
118 |             response = str(e.output)
119 |             self.assertIn('"seeds.csv" is empty', response)
120 | 
121 |     def test_starts_and_checks_for_necessary_input_config_missing(self):
122 |         # user starts program with `start.py`
123 |         if not os.path.exists("seeds.csv"):
124 |             shutil.copyfile("seeds.csv.bak", "seeds.csv")
125 |         try:
126 |             response = str(check_output('python start.py', stderr=STDOUT,
127 |                                         shell=True), encoding="ascii")
128 | 
129 |         # ... and encounters an error because:
130 |         except CalledProcessError as e:
131 |             response = str(e.output)
132 |             # ... the config.yml is missing. Ergo the user creates a new one using make_config.py
133 |             self.assertIn("provide a config.yml", response)
134 |             if "provide a config.yml" in response:
135 |                 # Does make_config.py not make a new config.yml when entered "n"?
136 |                 p = Popen("python make_config.py", stdout=PIPE, stderr=PIPE, stdin=PIPE,
137 |                           shell=True)
138 |                 p.communicate("n\n".encode())
139 |                 self.assertFalse(os.path.isfile("config.yml"))
140 | 
141 |                 # Does make_config.py open a dialogue asking to open the new config.yml?
142 |                 p = Popen("python make_config.py", stdout=PIPE, stderr=PIPE, stdin=PIPE,
143 |                           shell=True)
144 |                 p.communicate("y\n".encode())
145 | 
146 |             self.assertTrue(os.path.exists("config.yml"))
147 | 
148 |             with open("config.yml", "w") as f:
149 |                 yaml.dump(mysql_cfg, f, default_flow_style=False)
150 | 
151 |             DataBaseHandler().engine.execute("DROP TABLES friends, user_details, result;")
152 | 
153 |     def test_starting_collectors_and_writing_to_db(self):
154 | 
155 |         shutil.copyfile("seeds_test.csv", "seeds.csv")
156 | 
157 |         with open("config.yml", "w") as f:
158 |             yaml.dump(mysql_cfg, f, default_flow_style=False)
159 | 
160 |         try:
161 |             response = str(check_output('python start.py -n 2 -t -p 1',
162 |                                         stderr=STDOUT, shell=True))
163 |             print(response)
164 |         except CalledProcessError as e:
165 |             response = str(e.output)
166 |             print(response)
167 |             raise e
168 | 
169 |         dbh = DataBaseHandler()
170 | 
171 |         result = pd.read_sql("result", dbh.engine)
172 | 
173 |         self.assertLessEqual(len(result), 8)
174 | 
175 |         self.assertNotIn(True, result.duplicated().values)
176 | 
177 |         dbh.engine.execute("DROP TABLE friends, user_details, result;")
178 | 
179 |     def test_restarts_after_exception(self):
180 | 
181 |         shutil.copyfile("two_seeds.csv", "seeds.csv")
182 | 
183 |         with open("config.yml", "w") as f:
184 |             yaml.dump(mysql_cfg, f, default_flow_style=False)
185 | 
186 |         with self.assertRaises(TestException):
187 |             main_loop(Coordinator(), test_fail=True)
188 | 
189 |         p = Popen("python start.py -n 2 -t -f -p 1", stdout=PIPE, stderr=PIPE, stdin=PIPE,
190 |                   shell=True)
191 | 
192 |         stdout, stderr = p.communicate()
193 | 
194 |         self.assertIn("Retrying", stdout.decode('utf-8'))  # tries to restart
195 |         self.assertIn("Sent notification to", stdout.decode('utf-8'))
196 | 
197 |         latest_seeds = set(pd.read_csv("latest_seeds.csv", header=None)[0].values)
198 |         seeds = set(pd.read_csv('seeds.csv', header=None)[0].values)
199 | 
200 |         self.assertEqual(latest_seeds, seeds)
201 | 
202 |         q = Popen("python start.py -t --restart -p 1", stdout=PIPE, stderr=PIPE, stdin=PIPE,
203 |                   shell=True)
204 | 
205 |         stdout, stderr = q.communicate()
206 | 
207 |         self.assertIn("Restarting with latest seeds:", stdout.decode('utf-8'),
208 |                       msg=f"{stdout.decode('utf-8')}\n{stderr.decode('utf-8')}")
209 | 
210 |         latest_seeds = set(pd.read_csv("latest_seeds.csv", header=None)[0].values)
211 | 
212 |         self.assertNotEqual(latest_seeds, seeds)
213 | 
214 |         DataBaseHandler().engine.execute("DROP TABLE friends, user_details, result;")
215 | 
216 |     def test_collects_only_requested_number_of_pages_of_friends(self):
217 | 
218 |         shutil.copyfile("seed_with_lots_of_friends.csv", "seeds.csv")
219 | 
220 |         with open("config.yml", "w") as f:
221 |             yaml.dump(mysql_cfg, f, default_flow_style=False)
222 | 
223 |         try:
224 |             response = str(check_output('python start.py -n 1 -t -p 1',
225 |                                         stderr=STDOUT, shell=True))
226 |             print(response)
227 |         except CalledProcessError as e:
228 |             response = str(e.output)
229 |             print(response)
230 |             raise e
231 | 
232 |         dbh = DataBaseHandler()
233 | 
234 |         result = pd.read_sql("SELECT COUNT(*) FROM friends WHERE source = 2343198944", dbh.engine)
235 | 
236 |         result = result['COUNT(*)'][0]
237 | 
238 |         self.assertLessEqual(result, 5000)
239 |         self.assertGreater(result, 4000)
240 | 
241 |         dbh.engine.execute("DROP TABLE friends, user_details, result;")
242 | 
243 | 
244 | if __name__ == '__main__':
245 |     unittest.main()
246 | 
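
The failure-path tests above only assert inside the `except CalledProcessError` branch, so they would pass silently if the process exited successfully. A stricter variant, sketched here under assumptions, checks the exit code explicitly. The child command below is a stand-in that imitates a failing `start.py`; the helper `run_and_capture` is hypothetical and not part of the test suite.

```python
import subprocess
import sys


def run_and_capture(argv):
    """Run a command and return (exit code, combined stdout/stderr as text)."""
    completed = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    return completed.returncode, completed.stdout.decode("utf-8", errors="replace")


# Stand-in for `python start.py` failing on a missing seeds file.
child = [sys.executable, "-c",
         "import sys; print('\"seeds.csv\" could not be found'); sys.exit(1)"]

returncode, output = run_and_capture(child)
```

Inside a `unittest.TestCase`, one could then call `self.assertNotEqual(returncode, 0)` followed by `self.assertIn('"seeds.csv" could not be found', output)`, so that an unexpected clean exit fails the test.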


--------------------------------------------------------------------------------
/helpers.py:
--------------------------------------------------------------------------------
  1 | import numpy as np
  2 | 
  3 | 
  4 | friends_details_dtypes = {
  5 |     "contributors_enabled": np.int8,
  6 |     "created_at": np.datetime64,
  7 |     "default_profile": np.int8,
  8 |     "default_profile_image": np.int8,
  9 |     "description": str,
 10 |     "entities_description_urls": str,
 11 |     "entities_url_urls": str,
 12 |     "favourites_count": np.int64,
 13 |     "follow_request_sent": np.int8,
 14 |     "followers_count": np.int64,
 15 |     "following": np.int8,
 16 |     "friends_count": np.int64,
 17 |     "geo_enabled": np.int8,
 18 |     "has_extended_profile": np.int8,
 19 |     "id": np.int64,
 20 |     "id_str": str,
 21 |     "is_translation_enabled": np.int8,
 22 |     "is_translator": np.int8,
 23 |     "lang": str,
 24 |     "listed_count": np.int64,
 25 |     "location": str,
 26 |     "name": str,
 27 |     "needs_phone_verification": np.int8,
 28 |     "notifications": np.int8,
 29 |     "profile_background_color": str,
 30 |     "profile_background_image_url": str,
 31 |     "profile_background_image_url_https": str,
 32 |     "profile_background_tile": np.int8,
 33 |     "profile_banner_url": str,
 34 |     "profile_image_url": str,
 35 |     "profile_image_url_https": str,
 36 |     "profile_link_color": str,
 37 |     "profile_sidebar_border_color": str,
 38 |     "profile_sidebar_fill_color": str,
 39 |     "profile_text_color": str,
 40 |     "profile_use_background_image": np.int8,
 41 |     "protected": np.int8,
 42 |     "screen_name": str,
 43 |     "status_contributors": str,
 44 |     "status_coordinates": str,
 45 |     "status_coordinates_coordinates": str,
 46 |     "status_coordinates_type": str,
 47 |     "status_created_at": np.datetime64,
 48 |     "status_entities_hashtags": str,
 49 |     "status_entities_media": str,
 50 |     "status_entities_symbols": str,
 51 |     "status_entities_urls": str,
 52 |     "status_entities_user_mentions": str,
 53 |     "status_extended_entities_media": str,
 54 |     "status_favorite_count": np.int64,
 55 |     "status_favorited": np.int8,
 56 |     "status_geo": str,
 57 |     "status_geo_coordinates": str,
 58 |     "status_geo_type": str,
 59 |     "status_id": np.int64,
 60 |     "status_id_str": str,
 61 |     "status_in_reply_to_screen_name": str,
 62 |     "status_in_reply_to_status_id": np.int64,
 63 |     "status_in_reply_to_status_id_str": str,
 64 |     "status_in_reply_to_user_id": np.int64,
 65 |     "status_in_reply_to_user_id_str": str,
 66 |     "status_is_quote_status": np.int8,
 67 |     "status_lang": str,
 68 |     "status_place": str,
 69 |     "status_place_bounding_box_coordinates": str,
 70 |     "status_place_bounding_box_type": str,
 71 |     "status_place_contained_within": str,
 72 |     "status_place_country": str,
 73 |     "status_place_country_code": str,
 74 |     "status_place_full_name": str,
 75 |     "status_place_id": str,
 76 |     "status_place_name": str,
 77 |     "status_place_place_type": str,
 78 |     "status_place_url": str,
 79 |     "status_possibly_sensitive": np.int8,
 80 |     "status_quoted_status_id": np.int64,
 81 |     "status_quoted_status_id_str": str,
 82 |     "status_retweet_count": np.int64,
 83 |     "status_retweeted": np.int8,
 84 |     "status_retweeted_status_contributors": str,
 85 |     "status_retweeted_status_coordinates": str,
 86 |     "status_retweeted_status_created_at": np.datetime64,
 87 |     "status_retweeted_status_entities_hashtags": str,
 88 |     "status_retweeted_status_entities_media": str,
 89 |     "status_retweeted_status_entities_symbols": str,
 90 |     "status_retweeted_status_entities_urls": str,
 91 |     "status_retweeted_status_entities_user_mentions": str,
 92 |     "status_retweeted_status_extended_entities_media": str,
 93 |     "status_retweeted_status_favorite_count": np.int64,
 94 |     "status_retweeted_status_favorited": np.int8,
 95 |     "status_retweeted_status_geo": str,
 96 |     "status_retweeted_status_id": np.int64,
 97 |     "status_retweeted_status_id_str": str,
 98 |     "status_retweeted_status_in_reply_to_screen_name": str,
 99 |     "status_retweeted_status_in_reply_to_status_id": np.int64,
100 |     "status_retweeted_status_in_reply_to_status_id_str": str,
101 |     "status_retweeted_status_in_reply_to_user_id": np.int64,
102 |     "status_retweeted_status_in_reply_to_user_id_str": str,
103 |     "status_retweeted_status_is_quote_status": np.int8,
104 |     "status_retweeted_status_lang": str,
105 |     "status_retweeted_status_place": str,
106 |     "status_retweeted_status_possibly_sensitive": np.int8,
107 |     "status_retweeted_status_quoted_status_id": np.int64,
108 |     "status_retweeted_status_quoted_status_id_str": str,
109 |     "status_retweeted_status_retweet_count": np.int64,
110 |     "status_retweeted_status_retweeted": np.int8,
111 |     "status_retweeted_status_source": str,
112 |     "status_retweeted_status_full_text": str,
113 |     "status_retweeted_status_truncated": np.int8,
114 |     "status_source": str,
115 |     "status_full_text": str,
116 |     "status_truncated": np.int8,
117 |     "statuses_count": np.int64,
118 |     "suspended": np.int8,
119 |     "time_zone": str,
120 |     "translator_type": str,
121 |     "url": str,
122 |     "verified": np.int8,
123 |     "utc_offset": str
124 | }
125 | 
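
The keys of `friends_details_dtypes` read like nested Twitter user JSON paths joined with underscores (for example `status_place_country`). As an illustration only, such column names could be produced by a small recursive flattener; the function `flatten` and the sample record are hypothetical, and the project's own flattening logic is not shown in this file.

```python
def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        new_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys.
            flat.update(flatten(value, new_key, sep))
        else:
            # Scalars and lists are kept as-is under the joined key.
            flat[new_key] = value
    return flat


# Hypothetical, heavily shortened user record.
sample_user = {
    "id": 12345,
    "status": {"lang": "en", "place": {"country": "Germany"}},
}
flat_user = flatten(sample_user)
```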


--------------------------------------------------------------------------------
/make_config.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | import subprocess
 3 | import sys
 4 | from shutil import copyfile
 5 | 
 6 | 
 7 | # creates a new config.yml by copying config_template.yml (the caller opens it)
 8 | def make_config():
 9 |     copyfile('config_template.yml', 'config.yml')
10 | 
11 | 
12 | if __name__ == '__main__':
13 |     i = 0
14 |     while True:
15 |         if i == 0:
16 |             answer = input('This program will create a new config.yml.\n'
17 |                            'After running it, you will be asked which program to use to\n'
18 |                            'open the new file. Please choose your standard text editor.\n'
19 |                            'Do you wish to create a new config.yml now? (y/n): ')
20 |         else:
21 |             answer = input('Sorry, I did not get your input. Do you wish to create\n'
22 |                            'a new config.yml now? Please answer y for yes or n for no: ')
23 |         if answer == "n":
24 |             break
25 |         elif answer == "y":
26 |             make_config()
27 |             if sys.platform.startswith('darwin'):
28 |                 subprocess.call(('open', "config.yml"))
29 |             elif os.name == 'nt':  # For Windows
30 |                 os.startfile("config.yml")
31 |             elif os.name == 'posix':  # For Linux and other POSIX systems (macOS is handled above)
32 |                 subprocess.call(('xdg-open', "config.yml"))
33 |             break
34 |         else:
35 |             i = 1
36 |             pass
37 | 
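
The platform dispatch above can be isolated into a pure function, which makes the branch order (macOS before the generic POSIX check) easy to test without launching an editor. A minimal sketch; `opener_command` is a hypothetical helper, and Windows is represented by `None` because `os.startfile` is a function call rather than an external command.

```python
def opener_command(platform, os_name, path="config.yml"):
    """Return the argv that would open `path`, or None if os.startfile applies."""
    if platform.startswith("darwin"):
        return ("open", path)
    if os_name == "nt":
        return None
    if os_name == "posix":
        return ("xdg-open", path)
    raise OSError("Unsupported platform: " + platform)


mac_cmd = opener_command("darwin", "posix")
linux_cmd = opener_command("linux", "posix")
windows_cmd = opener_command("win32", "nt")
```

In the script itself the function would be called as `opener_command(sys.platform, os.name)`.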


--------------------------------------------------------------------------------
/make_test_tweet_jsons.py:
--------------------------------------------------------------------------------
 1 | import argparse
 2 | import json
 3 | import os
 4 | from collector import Connection, Collector
 5 | from tweepy.error import TweepError
 6 | 
 7 | parser = argparse.ArgumentParser(description='SparseTwitter user_details Downloader')
 8 | parser.add_argument('-s', '--seed',
 9 |                     help='''Provide a seed (=Twitter user ID). Its friends details will
10 |                          be downloaded.''',
11 |                     required=False,
12 |                     type=int,
13 |                     default=1670174994)
14 | 
15 | # Setup the Collector
16 | seed = parser.parse_args().seed  # Pass another Twitter user ID via -s/--seed if you like
17 | if seed == 1670174994:
18 |     print("No seed given. Using default seed " + str(seed) + ".")
19 | else:
20 |     print("Downloading and saving friends' details of user " + str(seed) + ".")
21 | con = Connection()
22 | collector = Collector(con, seed)
23 | 
24 | # Get the friends and details of the specified seed
25 | try:
26 |     friends = collector.get_friend_list()
27 | except TweepError as e:
28 |     if "'code': 34" in e.reason:
29 |         raise TweepError("The seed you have given is not a valid Twitter user ID")
30 |     raise  # re-raise other API errors; `friends` would otherwise be undefined below
31 | friends_details = collector.get_details(friends)
32 | # Make sure the output directory exists (also creates "tests" if it is missing)
33 | os.makedirs(os.path.join("tests", "tweet_jsons"), exist_ok=True)
34 | 
35 | 
36 | # Write details in json files
37 | ct = 1
38 | for friend_details in friends_details:
39 |     with open(os.path.join("tests", "tweet_jsons", "user_" + str(ct) + ".json"), "w") as f:
40 |         json.dump(friend_details._json, f)
41 |     ct += 1
42 | 


--------------------------------------------------------------------------------
/passwords_template.py:
--------------------------------------------------------------------------------
1 | sparsetwittermysqlpw = ""  # mySQL Database Password
2 | 
3 | # Details for Mailgun
4 | email_to_notify = ""
5 | mailgun_default_smtp_login = ""
6 | mailgun_api_base_url = ""
7 | mailgun_api_key = ""
8 | 


--------------------------------------------------------------------------------
/seed_with_lots_of_friends.csv:
--------------------------------------------------------------------------------
1 | 2343198944


--------------------------------------------------------------------------------
/seeds.csv:
--------------------------------------------------------------------------------
  1 | 1160813580
  2 | 387286700
  3 | 1650198780
  4 | 3024495627
  5 | 2578020576
  6 | 1615651381
  7 | 1474424516
  8 | 2369064876
  9 | 2356366477
 10 | 3472134496
 11 | 747694110
 12 | 2823918498
 13 | 3208550097
 14 | 272320418
 15 | 829238504
 16 | 1313754673
 17 | 1058601290
 18 | 4745661921
 19 | 4441230569
 20 | 4318242023
 21 | 1063591824
 22 | 3128010721
 23 | 1127335022
 24 | 1864730334
 25 | 3002360276
 26 | 4471778609
 27 | 130920197
 28 | 349283534
 29 | 3856362255
 30 | 2252523257
 31 | 1444461152
 32 | 4077417616
 33 | 2262505775
 34 | 3024991169
 35 | 3147623579
 36 | 3874002016
 37 | 3671970441
 38 | 1726736760
 39 | 3814280783
 40 | 2914680869
 41 | 1058357965
 42 | 3372766829
 43 | 2538995806
 44 | 2262632932
 45 | 2885623492
 46 | 3418218262
 47 | 2218086659
 48 | 335823104
 49 | 1053116809
 50 | 3401112971
 51 | 3432820259
 52 | 209303905
 53 | 1258184006
 54 | 796851890
 55 | 1006285662
 56 | 220695831
 57 | 637379118
 58 | 4277813061
 59 | 1880484644
 60 | 2743755429
 61 | 2383032568
 62 | 3973389993
 63 | 4259114561
 64 | 718765261
 65 | 1726474417
 66 | 2956222413
 67 | 462397876
 68 | 3296344923
 69 | 884449957
 70 | 3137787919
 71 | 2335721112
 72 | 3027412138
 73 | 498901234
 74 | 2781553545
 75 | 351156021
 76 | 2782879317
 77 | 4135423463
 78 | 3456702916
 79 | 2390022283
 80 | 4051757151
 81 | 203013862
 82 | 2376494468
 83 | 3290275384
 84 | 164215794
 85 | 2505322224
 86 | 490618665
 87 | 941416256
 88 | 368341331
 89 | 2905473880
 90 | 3170869461
 91 | 1423283714
 92 | 348531344
 93 | 4094440878
 94 | 2808133216
 95 | 2572066829
 96 | 913997334
 97 | 2857548747
 98 | 755882400
 99 | 402905400
100 | 863806069
101 | 2859522701
102 | 2798373465
103 | 2842997865
104 | 704562018
105 | 1645850383
106 | 1338687164
107 | 154246966
108 | 2576520419
109 | 2762898040
110 | 629324495
111 | 399662062
112 | 2209566454
113 | 2513383812
114 | 2920605401
115 | 14654198
116 | 452151022
117 | 4689724754
118 | 1323367015
119 | 3502661722
120 | 582949817
121 | 3199922521
122 | 2247300875
123 | 4162142890
124 | 927925566
125 | 239562669
126 | 3368720859
127 | 1433137598
128 | 289021193
129 | 4758403947
130 | 2746563946
131 | 140023188
132 | 588559865
133 | 498903566
134 | 2391508352
135 | 585492814
136 | 269555197
137 | 3401234464
138 | 2322539054
139 | 881006096
140 | 382580898
141 | 485927303
142 | 115634822
143 | 2980339960
144 | 2623612919
145 | 939869593
146 | 4762405155
147 | 3791004803
148 | 3771853829
149 | 1134613718
150 | 81878858
151 | 968468935
152 | 1127549936
153 | 2807449437
154 | 889759956
155 | 83843595
156 | 2187803476
157 | 3207500698
158 | 2604254442
159 | 2172587015
160 | 347856926
161 | 2477017303
162 | 877446548
163 | 438437315
164 | 2919834849
165 | 487816821
166 | 2759768812
167 | 518411707
168 | 2955674980
169 | 1324474825
170 | 86313958
171 | 2956580308
172 | 1355366048
173 | 3204190670
174 | 265279495
175 | 495344962
176 | 2428003101
177 | 198588886
178 | 171061254
179 | 339112582
180 | 2699374110
181 | 2635454777
182 | 1074460009
183 | 612562461
184 | 115398574
185 | 446242313
186 | 2992813781
187 | 2365475859
188 | 609966621
189 | 528335444
190 | 2572785724
191 | 288124721
192 | 3000036442
193 | 3588679635
194 | 2359625452
195 | 2177278912
196 | 3050287239
197 | 98723220
198 | 4872795622
199 | 2841209625
200 | 4003824741
201 | 


--------------------------------------------------------------------------------
/seeds_empty.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/FlxVctr/RADICES/94eb8d91663e58fcc20d1cddbba35b4f51378b0b/seeds_empty.csv


--------------------------------------------------------------------------------
/seeds_template.csv:
--------------------------------------------------------------------------------
  1 | 1160813580
  2 | 387286700
  3 | 1650198780
  4 | 3024495627
  5 | 2578020576
  6 | 1615651381
  7 | 1474424516
  8 | 2369064876
  9 | 2356366477
 10 | 3472134496
 11 | 747694110
 12 | 2823918498
 13 | 3208550097
 14 | 272320418
 15 | 829238504
 16 | 1313754673
 17 | 1058601290
 18 | 4745661921
 19 | 4441230569
 20 | 4318242023
 21 | 1063591824
 22 | 3128010721
 23 | 1127335022
 24 | 1864730334
 25 | 3002360276
 26 | 4471778609
 27 | 130920197
 28 | 349283534
 29 | 3856362255
 30 | 2252523257
 31 | 1444461152
 32 | 4077417616
 33 | 2262505775
 34 | 3024991169
 35 | 3147623579
 36 | 3874002016
 37 | 3671970441
 38 | 1726736760
 39 | 3814280783
 40 | 2914680869
 41 | 1058357965
 42 | 3372766829
 43 | 2538995806
 44 | 2262632932
 45 | 2885623492
 46 | 3418218262
 47 | 2218086659
 48 | 335823104
 49 | 1053116809
 50 | 3401112971
 51 | 3432820259
 52 | 209303905
 53 | 1258184006
 54 | 796851890
 55 | 1006285662
 56 | 220695831
 57 | 637379118
 58 | 4277813061
 59 | 1880484644
 60 | 2743755429
 61 | 2383032568
 62 | 3973389993
 63 | 4259114561
 64 | 718765261
 65 | 1726474417
 66 | 2956222413
 67 | 462397876
 68 | 3296344923
 69 | 884449957
 70 | 3137787919
 71 | 2335721112
 72 | 3027412138
 73 | 498901234
 74 | 2781553545
 75 | 351156021
 76 | 2782879317
 77 | 4135423463
 78 | 3456702916
 79 | 2390022283
 80 | 4051757151
 81 | 203013862
 82 | 2376494468
 83 | 3290275384
 84 | 164215794
 85 | 2505322224
 86 | 490618665
 87 | 941416256
 88 | 368341331
 89 | 2905473880
 90 | 3170869461
 91 | 1423283714
 92 | 348531344
 93 | 4094440878
 94 | 2808133216
 95 | 2572066829
 96 | 913997334
 97 | 2857548747
 98 | 755882400
 99 | 402905400
100 | 863806069
101 | 2859522701
102 | 2798373465
103 | 2842997865
104 | 704562018
105 | 1645850383
106 | 1338687164
107 | 154246966
108 | 2576520419
109 | 2762898040
110 | 629324495
111 | 399662062
112 | 2209566454
113 | 2513383812
114 | 2920605401
115 | 14654198
116 | 452151022
117 | 4689724754
118 | 1323367015
119 | 3502661722
120 | 582949817
121 | 3199922521
122 | 2247300875
123 | 4162142890
124 | 927925566
125 | 239562669
126 | 3368720859
127 | 1433137598
128 | 289021193
129 | 4758403947
130 | 2746563946
131 | 140023188
132 | 588559865
133 | 498903566
134 | 2391508352
135 | 585492814
136 | 269555197
137 | 3401234464
138 | 2322539054
139 | 881006096
140 | 382580898
141 | 485927303
142 | 115634822
143 | 2980339960
144 | 2623612919
145 | 939869593
146 | 4762405155
147 | 3791004803
148 | 3771853829
149 | 1134613718
150 | 81878858
151 | 968468935
152 | 1127549936
153 | 2807449437
154 | 889759956
155 | 83843595
156 | 2187803476
157 | 3207500698
158 | 2604254442
159 | 2172587015
160 | 347856926
161 | 2477017303
162 | 877446548
163 | 438437315
164 | 2919834849
165 | 487816821
166 | 2759768812
167 | 518411707
168 | 2955674980
169 | 1324474825
170 | 86313958
171 | 2956580308
172 | 1355366048
173 | 3204190670
174 | 265279495
175 | 495344962
176 | 2428003101
177 | 198588886
178 | 171061254
179 | 339112582
180 | 2699374110
181 | 2635454777
182 | 1074460009
183 | 612562461
184 | 115398574
185 | 446242313
186 | 2992813781
187 | 2365475859
188 | 609966621
189 | 528335444
190 | 2572785724
191 | 288124721
192 | 3000036442
193 | 3588679635
194 | 2359625452
195 | 2177278912
196 | 3050287239
197 | 98723220
198 | 4872795622
199 | 2841209625
200 | 4003824741
201 | 


--------------------------------------------------------------------------------
/seeds_test.csv:
--------------------------------------------------------------------------------
   1 | 769041127899009024
   2 | 83662933
   3 | 2673261430
   4 | 1016830345889701888
   5 | 19830856
   6 | 1032678888269336576
   7 | 896102591498776578
   8 | 36684736
   9 | 476935296
  10 | 596500117
  11 | 967089795456491522
  12 | 980251691193851904
  13 | 2649412098
  14 | 2328075054
  15 | 2320475550
  16 | 2364445033
  17 | 932738354294161408
  18 | 4820804277
  19 | 39356720
  20 | 190564700
  21 | 806177097265901568
  22 | 595484668
  23 | 35257717
  24 | 54184375
  25 | 26716249
  26 | 593041510
  27 | 951028652770291712
  28 | 1040403366
  29 | 263020833
  30 | 5715752
  31 | 26783152
  32 | 114508061
  33 | 15243812
  34 | 5734902
  35 | 1560587828
  36 | 84550656
  37 | 66823494
  38 | 14553288
  39 | 31087080
  40 | 19108766
  41 | 19767324
  42 | 11013962
  43 | 36970167
  44 | 988320878
  45 | 16100710
  46 | 14389093
  47 | 1855503205
  48 | 30569817
  49 | 214272214
  50 | 2836842186
  51 | 976057131400155136
  52 | 1348771484
  53 | 368379922
  54 | 20463983
  55 | 21494378
  56 | 727875626330378240
  57 | 3076796608
  58 | 201490244
  59 | 335181321
  60 | 3306930087
  61 | 1423906682
  62 | 57925222
  63 | 2308103545
  64 | 4901370903
  65 | 16310263
  66 | 2317351705
  67 | 316389142
  68 | 1464120432
  69 | 453030125
  70 | 1470232105
  71 | 3046696618
  72 | 65607107
  73 | 47375691
  74 | 5483202
  75 | 103973912
  76 | 4307446636
  77 | 3379461005
  78 | 379940359
  79 | 95551818
  80 | 47478759
  81 | 1372037468
  82 | 33434994
  83 | 736541087716630532
  84 | 126046965
  85 | 1337785291
  86 | 78837869
  87 | 88577174
  88 | 936011619804442624
  89 | 2764750474
  90 | 2419140080
  91 | 2828212668
  92 | 398087684
  93 | 2460368252
  94 | 68400053
  95 | 22525751
  96 | 303048766
  97 | 2962250933
  98 | 381187236
  99 | 5994452
 100 | 16689804
 101 | 18812572
 102 | 817386
 103 | 2251623492
 104 | 209811713
 105 | 1582853809
 106 | 4398626122
 107 | 1654188770
 108 | 1451773004
 109 | 316327930
 110 | 23962323
 111 | 717313
 112 | 34743251
 113 | 14647570
 114 | 13298072
 115 | 11348282
 116 | 5988062
 117 | 14677919
 118 | 259771124
 119 | 17248121
 120 | 14075928
 121 | 19658826
 122 | 3459051
 123 | 14361155
 124 | 27830610
 125 | 20596281
 126 | 14499829
 127 | 33933259
 128 | 14700316
 129 | 18213483
 130 | 69231187
 131 | 14159148
 132 | 1289362482
 133 | 38451030
 134 | 3898598357
 135 | 77436536
 136 | 95731075
 137 | 607311335
 138 | 712598138901643264
 139 | 79749280
 140 | 16366472
 141 | 16680571
 142 | 846165733285347328
 143 | 702824612095078400
 144 | 56605612
 145 | 701504334341734400
 146 | 52401341
 147 | 19002346
 148 | 824241575706427392
 149 | 26000689
 150 | 8143682
 151 | 2329921
 152 | 37213193
 153 | 27047369
 154 | 3092629269
 155 | 237670274
 156 | 545156588
 157 | 2233154425
 158 | 187484412
 159 | 818876014390603776
 160 | 16333712
 161 | 470106608
 162 | 15518560
 163 | 3027041582
 164 | 12
 165 | 50393960
 166 | 13
 167 | 82443473
 168 | 27596259
 169 | 12819112
 170 | 17534607
 171 | 314378129
 172 | 8271262
 173 | 361173087
 174 | 426550185
 175 | 21088417
 176 | 708113
 177 | 2259993758
 178 | 264097819
 179 | 288078578
 180 | 288889970
 181 | 277961081
 182 | 123431491
 183 | 22595510
 184 | 388983706
 185 | 4157723231
 186 | 394063385
 187 | 2979097128
 188 | 5605712
 189 | 17675072
 190 | 14684110
 191 | 253603966
 192 | 243896198
 193 | 3889782193
 194 | 18622869
 195 | 16017475
 196 | 2347049341
 197 | 16955870
 198 | 2303751216
 199 | 16753692
 200 | 1374321499
 201 | 13493122
 202 | 2479087773
 203 | 830152026453577728
 204 | 375502494
 205 | 783648878495145984
 206 | 80750540
 207 | 900247404917673984
 208 | 46377338
 209 | 46220856
 210 | 75498935
 211 | 40383352
 212 | 402875257
 213 | 261665936
 214 | 14767515
 215 | 116766613
 216 | 2164132428
 217 | 47571889
 218 | 14773649
 219 | 11178902
 220 | 16228337
 221 | 817442499551756288
 222 | 4816
 223 | 14706139
 224 | 402995665
 225 | 765104120915234816
 226 | 899902687
 227 | 44586323
 228 | 47881149
 229 | 2977580734
 230 | 1377379782
 231 | 337321190
 232 | 2287331420
 233 | 1112553272
 234 | 2771848270
 235 | 114485232
 236 | 17112843
 237 | 27731964
 238 | 358619703
 239 | 94035021
 240 | 93654149
 241 | 36346494
 242 | 822215673812119553
 243 | 2341772972
 244 | 2284174986
 245 | 44196397
 246 | 335671713
 247 | 3094222966
 248 | 142791254
 249 | 120269446
 250 | 48477652
 251 | 3697013177
 252 | 19854920
 253 | 563311832
 254 | 2891858896
 255 | 267944170
 256 | 50762276
 257 | 1539343122
 258 | 2228815292
 259 | 807095
 260 | 822710883935780864
 261 | 2254182852
 262 | 1595615893
 263 | 3536736623
 264 | 2775564497
 265 | 229910053
 266 | 816567039577956352
 267 | 3431605294
 268 | 15806521
 269 | 11663272
 270 | 425267565
 271 | 4876022234
 272 | 723099338629500928
 273 | 4353357197
 274 | 46769303
 275 | 2472108787
 276 | 58841829
 277 | 823367015830323201
 278 | 16426657
 279 | 1320161
 280 | 571603577
 281 | 24362471
 282 | 917910307
 283 | 2179529843
 284 | 183501560
 285 | 153185635
 286 | 14140695
 287 | 15008456
 288 | 775184441090027520
 289 | 3197921
 290 | 875861664
 291 | 2398002414
 292 | 3606164057
 293 | 1605720368
 294 | 3356531254
 295 | 2244994945
 296 | 786491
 297 | 495430242
 298 | 3306527433
 299 | 2493407430
 300 | 22097835
 301 | 69448996
 302 | 53549590
 303 | 2903217440
 304 | 2198690413
 305 | 125695429
 306 | 1398876810
 307 | 207534677
 308 | 202181866
 309 | 123085951
 310 | 121564533
 311 | 716375773800697857
 312 | 370389714
 313 | 434857136
 314 | 434236851
 315 | 161646584
 316 | 35708293
 317 | 111600563
 318 | 3017844172
 319 | 4507964833
 320 | 1198594368
 321 | 2237808535
 322 | 14504859
 323 | 4848860457
 324 | 3588277825
 325 | 1902773580
 326 | 3075333501
 327 | 1927923236
 328 | 77712285
 329 | 15008596
 330 | 1493423918
 331 | 415906140
 332 | 426509606
 333 | 304928205
 334 | 56341402
 335 | 15160540
 336 | 582161546
 337 | 16629994
 338 | 47654804
 339 | 15210983
 340 | 6753702
 341 | 14191640
 342 | 6480152
 343 | 238955118
 344 | 2390331060
 345 | 25053097
 346 | 2323501513
 347 | 1173685621
 348 | 88943180
 349 | 1200763760
 350 | 603877056
 351 | 88060423
 352 | 783942472019931136
 353 | 384650167
 354 | 39428536
 355 | 101770339
 356 | 47438273
 357 | 20778563
 358 | 31384530
 359 | 14893345
 360 | 1526228120
 361 | 1288375663
 362 | 62827149
 363 | 1290351
 364 | 273214252
 365 | 1025580128
 366 | 492327798
 367 | 17324052
 368 | 740238495952736256
 369 | 716348353915686912
 370 | 2828562864
 371 | 273199520
 372 | 23268862
 373 | 1699604941
 374 | 101422142
 375 | 1340011
 376 | 961374817
 377 | 20747796
 378 | 1237373550
 379 | 14892927
 380 | 2248057626
 381 | 1653010081
 382 | 775583414510497792
 383 | 491895975
 384 | 380749300
 385 | 1412345191
 386 | 3190556852
 387 | 25073877
 388 | 2881661080
 389 | 793926
 390 | 14453232
 391 | 54425158
 392 | 30849103
 393 | 763914232786145280
 394 | 917965284
 395 | 24741685
 396 | 2803191
 397 | 74286565
 398 | 125400714
 399 | 18720595
 400 | 259453542
 401 | 68677937
 402 | 2341085688
 403 | 27792033
 404 | 3099501
 405 | 2248872301
 406 | 1618939716
 407 | 89087163
 408 | 2191981764
 409 | 14414286
 410 | 25298569
 411 | 172532664
 412 | 66468564
 413 | 717583922595364865
 414 | 12229772
 415 | 4255361
 416 | 76980293
 417 | 142415193
 418 | 1551184494
 419 | 90524645
 420 | 274626857
 421 | 16669898
 422 | 14824990
 423 | 249765062
 424 | 2312333412
 425 | 754651485980459009
 426 | 3353357193
 427 | 86044873
 428 | 335595328
 429 | 14077832
 430 | 91068586
 431 | 133562519
 432 | 1439685422
 433 | 21558856
 434 | 350700354
 435 | 31543267
 436 | 34247967
 437 | 228055082
 438 | 19530582
 439 | 15140589
 440 | 4727373682
 441 | 2498331294
 442 | 20499665
 443 | 14300508
 444 | 607480698
 445 | 14396027
 446 | 420954778
 447 | 736285300729733120
 448 | 137116288
 449 | 86390214
 450 | 57571700
 451 | 4923437464
 452 | 352510636
 453 | 566812856
 454 | 91804096
 455 | 1273755780
 456 | 2443972061
 457 | 2902651666
 458 | 2786173214
 459 | 106611045
 460 | 51734793
 461 | 14659447
 462 | 15668978
 463 | 67566869
 464 | 170303282
 465 | 76115927
 466 | 1336552399
 467 | 29248745
 468 | 47087325
 469 | 87308833
 470 | 57140128
 471 | 19304187
 472 | 14299624
 473 | 838999716
 474 | 66421943
 475 | 18247347
 476 | 375499570
 477 | 49976458
 478 | 847412334
 479 | 133893663
 480 | 854581958
 481 | 56510427
 482 | 16330659
 483 | 78270570
 484 | 1547101849
 485 | 3245142196
 486 | 3119988399
 487 | 1252144297
 488 | 461379146
 489 | 16403155
 490 | 2859039287
 491 | 26291578
 492 | 20639941
 493 | 50478950
 494 | 3549743295
 495 | 3674645536
 496 | 2576431
 497 | 2608166073
 498 | 1884191208
 499 | 19968025
 500 | 274430113
 501 | 2470512481
 502 | 712749669260984320
 503 | 78688499
 504 | 3500269756
 505 | 2301638324
 506 | 4316593949
 507 | 633
 508 | 2182812798
 509 | 600029236
 510 | 17184309
 511 | 287816744
 512 | 334107188
 513 | 522324711
 514 | 719302825512034304
 515 | 131935861
 516 | 17682232
 517 | 6635422
 518 | 3406855121
 519 | 1922183119
 520 | 19278408
 521 | 2402013658
 522 | 2904886151
 523 | 22036511
 524 | 3356462770
 525 | 1010950338
 526 | 47428795
 527 | 70459168
 528 | 430306392
 529 | 2810902381
 530 | 1257535058
 531 | 20555437
 532 | 1160751655
 533 | 21650075
 534 | 12808972
 535 | 154355992
 536 | 2413352126
 537 | 48289662
 538 | 610659001
 539 | 14247082
 540 | 17682362
 541 | 551687308
 542 | 10451462
 543 | 288605100
 544 | 15806978
 545 | 529830882
 546 | 918151616
 547 | 120813008
 548 | 2251684044
 549 | 279412211
 550 | 216776631
 551 | 1339835893
 552 | 3389944744
 553 | 295713773
 554 | 1606242174
 555 | 92227227
 556 | 4228587674
 557 | 16477634
 558 | 224495471
 559 | 2436389418
 560 | 4023357917
 561 | 763440894
 562 | 3066551830
 563 | 2916305152
 564 | 5943942
 565 | 67915432
 566 | 19662154
 567 | 5974412
 568 | 22888086
 569 | 2846840261
 570 | 50605029
 571 | 1101341
 572 | 822215679726100480
 573 | 1536791610
 574 | 3317084775
 575 | 3525878536
 576 | 272730913
 577 | 3318180745
 578 | 1101399690
 579 | 2262722558
 580 | 20280065
 581 | 13334762
 582 | 63873759
 583 | 45391039
 584 | 607946179
 585 | 105590714
 586 | 2213117090
 587 | 1152122496
 588 | 56365668
 589 | 1891806212
 590 | 51002583
 591 | 464564528
 592 | 11156392
 593 | 3297270913
 594 | 2734713482
 595 | 19054387
 596 | 13566872
 597 | 206717989
 598 | 169426475
 599 | 157981564
 600 | 196994616
 601 | 124690469
 602 | 33838201
 603 | 3355051751
 604 | 2837694282
 605 | 78442404
 606 | 846905922
 607 | 420091534
 608 | 16895951
 609 | 75641981
 610 | 11125942
 611 | 251535413
 612 | 1320998672
 613 | 3023310881
 614 | 15903326
 615 | 1262059087
 616 | 111030963
 617 | 22755223
 618 | 3305381595
 619 | 500704345
 620 | 35147037
 621 | 190583610
 622 | 2367431
 623 | 98873137
 624 | 596316544
 625 | 3297801310
 626 | 204297410
 627 | 3196230661
 628 | 1912617043
 629 | 186910006
 630 | 1946165882
 631 | 3154287162
 632 | 15765018
 633 | 312189104
 634 | 1490902753
 635 | 1652884033
 636 | 897858440
 637 | 1549209552
 638 | 466083800
 639 | 22749856
 640 | 46770438
 641 | 39973389
 642 | 91982547
 643 | 2834511
 644 | 1338019026
 645 | 65078480
 646 | 6822912
 647 | 1621528116
 648 | 1975955528
 649 | 442678902
 650 | 537122882
 651 | 56505125
 652 | 19330099
 653 | 39223985
 654 | 2202920821
 655 | 2710319475
 656 | 40148479
 657 | 2718774725
 658 | 41312733
 659 | 370338250
 660 | 95463894
 661 | 78934252
 662 | 34950452
 663 | 887523506
 664 | 372313088
 665 | 17803524
 666 | 19534629
 667 | 242582424
 668 | 2546210232
 669 | 2176822777
 670 | 1044919213
 671 | 19488465
 672 | 1689053928
 673 | 1295719904
 674 | 1335048846
 675 | 19072286
 676 | 3052069771
 677 | 2595064682
 678 | 30599331
 679 | 69220946
 680 | 719013
 681 | 2846058853
 682 | 14839109
 683 | 2176358690
 684 | 46471465
 685 | 70105352
 686 | 111574040
 687 | 1530889117
 688 | 26974240
 689 | 124434709
 690 | 814881
 691 | 15247023
 692 | 138012644
 693 | 17015268
 694 | 2970420477
 695 | 144078209
 696 | 1051171
 697 | 106241842
 698 | 18202677
 699 | 22461446
 700 | 1132776031
 701 | 3010639332
 702 | 487905834
 703 | 14645160
 704 | 246532728
 705 | 263127148
 706 | 180505807
 707 | 204832963
 708 | 25411758
 709 | 17672825
 710 | 322506796
 711 | 2496529670
 712 | 7190852
 713 | 15861468
 714 | 502531529
 715 | 614884262
 716 | 13088992
 717 | 627987302
 718 | 531788806
 719 | 664883
 720 | 560864000
 721 | 2832189284
 722 | 16213124
 723 | 130649891
 724 | 87532773
 725 | 6253282
 726 | 376825877
 727 | 6844292
 728 | 22465460
 729 | 18313165
 730 | 371823539
 731 | 94787173
 732 | 234785663
 733 | 950802878
 734 | 15181803
 735 | 1536552278
 736 | 78807415
 737 | 22467617
 738 | 13521812
 739 | 428982236
 740 | 1958326711
 741 | 9626642
 742 | 838464523
 743 | 15865878
 744 | 2496667400
 745 | 2270729301
 746 | 218984871
 747 | 1105839134
 748 | 2315178300
 749 | 521368028
 750 | 2317524115
 751 | 5769472
 752 | 2231228510
 753 | 253536357
 754 | 611986351
 755 | 949268624
 756 | 2181826142
 757 | 376267732
 758 | 2154538213
 759 | 1719900512
 760 | 1430400949
 761 | 18780641
 762 | 19923474
 763 | 321629144
 764 | 39021657
 765 | 17896763
 766 | 17462723
 767 | 1397451631
 768 | 312748934
 769 | 18490018
 770 | 235921307
 771 | 557558765
 772 | 16076032
 773 | 577289927
 774 | 141915585
 775 | 234296047
 776 | 14854155
 777 | 17971083
 778 | 14850399
 779 | 77719559
 780 | 1140451
 781 | 42256930
 782 | 14235098
 783 | 281581384
 784 | 105102178
 785 | 275686563
 786 | 18972334
 787 | 15392033
 788 | 97426684
 789 | 15742288
 790 | 21749206
 791 | 21695922
 792 | 11060982
 793 | 407347022
 794 | 177507079
 795 | 79456653
 796 | 17791180
 797 | 97310140
 798 | 775591994
 799 | 22014811
 800 | 10246582
 801 | 582728848
 802 | 711803
 803 | 235696644
 804 | 401922298
 805 | 621403074
 806 | 19704259
 807 | 95255169
 808 | 460533115
 809 | 82383210
 810 | 857813509
 811 | 471872425
 812 | 15006174
 813 | 15028848
 814 | 52696465
 815 | 16527404
 816 | 21704174
 817 | 20915367
 818 | 17389991
 819 | 144950663
 820 | 19180320
 821 | 40517090
 822 | 423364278
 823 | 14640559
 824 | 37712201
 825 | 459298312
 826 | 552895403
 827 | 16867474
 828 | 21440857
 829 | 15211500
 830 | 15775749
 831 | 18239435
 832 | 123578864
 833 | 9793082
 834 | 463700716
 835 | 14176770
 836 | 17803569
 837 | 116193595
 838 | 19242665
 839 | 451586190
 840 | 9993772
 841 | 545825855
 842 | 18942977
 843 | 132660415
 844 | 18588430
 845 | 220000321
 846 | 1741221
 847 | 834544453
 848 | 74448435
 849 | 39982325
 850 | 44576546
 851 | 16475657
 852 | 358428263
 853 | 410955689
 854 | 11979772
 855 | 97237270
 856 | 10253232
 857 | 82971362
 858 | 45370474
 859 | 19004783
 860 | 312705486
 861 | 21257115
 862 | 791083760
 863 | 45163174
 864 | 13068822
 865 | 756741314
 866 | 105554801
 867 | 84111950
 868 | 5582742
 869 | 224407693
 870 | 15473958
 871 | 15998669
 872 | 17782233
 873 | 16745492
 874 | 90596312
 875 | 20597868
 876 | 14497517
 877 | 49411886
 878 | 726912924
 879 | 509064363
 880 | 633793808
 881 | 10876852
 882 | 182055312
 883 | 18466967
 884 | 462775400
 885 | 481456646
 886 | 21209555
 887 | 613657071
 888 | 17492631
 889 | 439281506
 890 | 1217801
 891 | 19759617
 892 | 22036165
 893 | 9773092
 894 | 566298895
 895 | 241067950
 896 | 533076506
 897 | 117051643
 898 | 56118316
 899 | 40227879
 900 | 338891131
 901 | 550843940
 902 | 72814114
 903 | 14764243
 904 | 364488011
 905 | 211544109
 906 | 273468342
 907 | 463122035
 908 | 76700084
 909 | 57350105
 910 | 359919054
 911 | 480266532
 912 | 12421082
 913 | 129026706
 914 | 475910101
 915 | 5746452
 916 | 15129265
 917 | 89593573
 918 | 16569660
 919 | 307009093
 920 | 71277496
 921 | 326776224
 922 | 52687314
 923 | 40283581
 924 | 225235528
 925 | 40227292
 926 | 31030273
 927 | 26015858
 928 | 438532851
 929 | 389946451
 930 | 161723960
 931 | 268464220
 932 | 121431534
 933 | 203781901
 934 | 208149256
 935 | 387942987
 936 | 383797311
 937 | 411560362
 938 | 53901362
 939 | 18295116
 940 | 387750600
 941 | 389532648
 942 | 389393606
 943 | 393577559
 944 | 389298698
 945 | 359959410
 946 | 392675870
 947 | 390160050
 948 | 19299909
 949 | 18687936
 950 | 390945097
 951 | 390622471
 952 | 237651617
 953 | 19726654
 954 | 35142791
 955 | 190730941
 956 | 48272332
 957 | 19187671
 958 | 114504124
 959 | 337299792
 960 | 14208791
 961 | 316516471
 962 | 49713024
 963 | 98220617
 964 | 16838771
 965 | 258428285
 966 | 2960221
 967 | 173155110
 968 | 34872011
 969 | 326117963
 970 | 19016843
 971 | 225421876
 972 | 225712501
 973 | 16490214
 974 | 234343491
 975 | 57136767
 976 | 225663702
 977 | 323242396
 978 | 1344951
 979 | 38464103
 980 | 27159398
 981 | 263096190
 982 | 917131
 983 | 275508478
 984 | 265296096
 985 | 125348034
 986 | 14324983
 987 | 257124818
 988 | 24867956
 989 | 24870628
 990 | 24277906
 991 | 256400038
 992 | 14613514
 993 | 151058995
 994 | 2836581
 995 | 245735979
 996 | 69248730
 997 | 128463002
 998 | 185611778
 999 | 11435642
1000 | 9334352
1001 | 15040978
1002 | 89785537
1003 | 92546663
1004 | 101472891
1005 | 53112669
1006 | 1708841
1007 | 211089576
1008 | 21650481
1009 | 49394939
1010 | 1007251
1011 | 46090688
1012 | 14116588
1013 | 19280591
1014 | 102734360
1015 | 18216581
1016 | 201891140
1017 | 7014182
1018 | 21507142
1019 | 86277392
1020 | 14247715
1021 | 1727601
1022 | 21743933
1023 | 14868991
1024 | 16986371
1025 | 136879221
1026 | 18044580
1027 | 17995778
1028 | 14420009
1029 | 16589206
1030 | 12509262
1031 | 1234801
1032 | 38888944
1033 | 15494684
1034 | 14670903
1035 | 37439097
1036 | 123247764
1037 | 106164576
1038 | 68511500
1039 | 15072786
1040 | 94834350
1041 | 47971303
1042 | 47718696
1043 | 18761526
1044 | 91797748
1045 | 10049712
1046 | 18863815
1047 | 15318271
1048 | 31812497
1049 | 20862865
1050 | 77777801
1051 | 19903466
1052 | 5876652
1053 | 783214
1054 | 813286
1055 | 14454247
1056 | 20536157
1057 | 52407579
1058 | 9655032
1059 | 1879351
1060 | 21657529
1061 | 55836889
1062 | 56624987
1063 | 639643
1064 | 37930051
1065 | 9294012
1066 | 15234407
1067 | 27434239
1068 | 18477798
1069 | 14341194
1070 | 20187015
1071 | 42603105
1072 | 23759730
1073 | 14742504
1074 | 17773851
1075 | 
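The seed files above hold one Twitter user ID per row with no header, and several IDs (for example `769041127899009024`) exceed 2**53, so they cannot pass through a float without risking precision loss. Below is a minimal sketch of reading such a file as exact integers. It uses the standard-library `csv` module instead of the project's pandas-based reader, and `read_seed_ids` is a hypothetical helper name, not part of the repository.

```python
import csv
import io


def read_seed_ids(handle):
    """Return the IDs of a header-less, single-column CSV as Python ints.

    Hypothetical helper for illustration; blank rows are skipped.
    """
    return [int(row[0]) for row in csv.reader(handle) if row and row[0].strip()]


# In-memory stand-in for a seeds file (two IDs followed by a blank line).
sample = io.StringIO("769041127899009024\n83662933\n\n")
seed_ids = read_seed_ids(sample)

# Why floats are unsafe for large IDs: 2**53 + 1 is not representable as a double.
lossy = int(float(2**53 + 1))
```

Python integers are arbitrary precision, so the IDs survive unchanged; the `lossy` value shows what a float round trip does to an integer just above 2**53.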


--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
  1 | import json
  2 | from json import JSONDecodeError
  3 | import pandas as pd
  4 | from requests import post
  5 | import yaml
  6 | 
  7 | 
  8 | # General class for necessary file imports
  9 | class FileImport():
 10 |     def read_app_key_file(self, filename: str = "keys.json") -> tuple:
 11 |         """Reads file with consumer key and consumer secret (JSON)
 12 | 
 13 |         Args:
 14 |             filename (str, optional): Defaults to "keys.json"
 15 | 
 16 |         Returns:
 17 |             Tuple with two strings: (1) being the twitter consumer token and (2) being the
 18 |             twitter consumer secret
 19 |         """
 20 | 
 21 |         # TODO: change return to dictionary
 22 | 
 23 |         try:
 24 |             with open(filename, "r") as f:
 25 |                 self.key_file = json.load(f)
 26 |         except FileNotFoundError:
 27 |             raise FileNotFoundError('"keys.json" could not be found')
 28 |         except JSONDecodeError as e:
 29 |             print("Bad JSON file. Please check that 'keys.json' is formatted "
 30 |                   "correctly and that it is not empty")
 31 |             raise e
 32 |         if "consumer_token" not in self.key_file or "consumer_secret" not in self.key_file:
 33 |             raise KeyError('''"keys.json" does not contain the dictionary keys
 34 |              "consumer_token" and/or "consumer_secret"''')
 35 | 
 36 |         if type(self.key_file["consumer_secret"]) is not str or type(
 37 |                 self.key_file["consumer_token"]) is not str:
 38 |             raise TypeError("Consumer secret is type " +
 39 |                             str(type(self.key_file["consumer_secret"])) +
 40 |                             " and consumer token is type " +
 41 |                             str(type(self.key_file["consumer_token"])) +
 42 |                             ". Both must be of type str.")
 43 | 
 44 |         return (self.key_file["consumer_token"], self.key_file["consumer_secret"])
 45 | 
 46 |     def read_seed_file(self, filename: str = "seeds.csv") -> pd.DataFrame:
 47 |         """Reads file with specified seeds to start from (csv)
 48 | 
 49 |         Args:
 50 |             filename (str, optional): Defaults to "seeds.csv"
 51 | 
 52 |         Returns:
 53 |             A single column pandas DataFrame with one Twitter ID (seed) each row.
 54 |         """
 55 |         try:
 56 |             with open(filename, "r") as f:
 57 |                 self.seeds = pd.read_csv(f, header=None)
 58 |         except FileNotFoundError:
 59 |             raise FileNotFoundError(f'"{filename}" could not be found')
 60 |         except pd.errors.EmptyDataError as e:
 61 |             print(f'"{filename}" is empty!')
 62 |             raise e
 63 |         return self.seeds
 64 | 
 65 |     def read_token_file(self, filename="tokens.csv"):
 66 |         """Reads file with authorized user tokens (csv).
 67 | 
 68 |         Args:
 69 |             filename (str, optional): Defaults to "tokens.csv"
 70 | 
 71 |         Returns:
 72 |             pandas.DataFrame: With columns `token` and `secret`, one line per user
 73 |         """
 74 |         return pd.read_csv(filename)
 75 | 
 76 | 
 77 | # Configuration class. Reads out all information from a given config.yml
 78 | class Config():
 79 |     """Class that handles the SQL and twitter user details configuration.
 80 | 
 81 |     Attributes:
 82 |         config_file (str): Path to configuration file
 83 |         config_dict (dict): Dictionary containing the config information (in case
 84 |                             the dictionary shall be directly passed instead of read
 85 |                             out of a configuration file).
 86 |     """
 87 |     config_template = "config_template.yml"
 88 | 
 89 |     # Initializes class using config.yml
 90 |     def __init__(self, config_file="config.yml", config_dict: dict = None):
 91 |         if config_dict is not None:
 92 |             self.config = config_dict
 93 |         else:
 94 |             self.config_path = config_file
 95 |             try:
 96 |                 with open(self.config_path, 'r') as f:
 97 |                     self.config = yaml.safe_load(f)
 98 |             except FileNotFoundError:
 99 |                 raise FileNotFoundError('Could not find "' + self.config_path + '''".\n
100 |                 Please run "python3 make_config.py" or provide a config.yml''')
101 | 
102 |         # Check if mailgun notifications should be used
103 |         if "notifications" not in self.config:
104 |             self.use_notifications = False
105 |         else:
106 |             self.notif_config = self.config["notifications"]
107 |             notif_config_items = [value for (key, value) in self.notif_config.items()]
108 |             single_items = list(set(notif_config_items))
109 |             if len(single_items) == 1 and single_items[0] is None:
110 |                 self.use_notifications = False
111 |             elif None in single_items:
112 |                 missing = [key for (key, value) in self.notif_config.items() if value is None]
113 |                 raise ValueError(f"""You have not filled all required fields for the notifications
114 |                                  configuration! Fields missing are {missing}""")
115 |             else:
116 |                 self.use_notifications = True
117 | 
118 |         # Check for necessary database information. If no "sql" section is
119 |         # provided, fall back to a default sqlite configuration
120 |         if "sql" not in self.config:
121 |             print("Config file " + config_file + """ does not contain key 'sql'!
122 |                   Will use default sqlite configuration.""")
123 |             self.config["sql"] = dict(dbtype="sqlite",
124 |                                       dbname="new_database")
125 |         self.sql_config = self.config["sql"]
126 | 
127 |         # No db type given in Config
128 |         if self.sql_config["dbtype"] is None:
129 |             print('''Parameter dbtype not set in the "config.yml". Will create
130 |                   an sqlite database.''')
131 |             self.dbtype = "sqlite"
132 |         else:
133 |             self.dbtype = self.sql_config["dbtype"].strip()
134 | 
135 |         # DB type is mysql - checking for all parameters
136 |         if self.dbtype == "mysql":
137 |             try:
138 |                 self.dbhost = str(self.sql_config["host"])
139 |                 self.dbuser = str(self.sql_config["user"])
140 |                 self.dbpwd = str(self.sql_config["passwd"])
141 |                 if self.dbhost == '':
142 |                     raise ValueError("dbhost parameter is empty")
143 |                 if self.dbuser == '':
144 |                     raise ValueError("dbuser parameter is empty")
145 |                 if self.dbpwd == '':
146 |                     raise ValueError("passwd parameter is empty")
147 |             except KeyError as e:
148 |                 raise e
149 |         elif self.dbtype == "sqlite":
150 |             self.dbhost = None
151 |             self.dbuser = None
152 |             self.dbpwd = None
153 |         else:
154 |             raise ValueError('''dbtype parameter is neither "sqlite" nor
155 |                                       "mysql". Please adjust the "config.yml" ''')
156 | 
157 |         # Set db name
158 |         if self.sql_config["dbname"] is not None:
159 |             self.dbname = self.sql_config["dbname"]
160 |         else:
161 |             print('''Parameter "dbname" is missing. New database will have the name
162 |                   "new_database".''')
163 |             self.dbname = "new_database"
164 | 
165 |     # Function to send mail if notifications are turned on in config.yml
166 |     # TODO: finalize this function
167 |     def send_mail(self, message_dict):
168 |         '''Sends an email via Mailgun.
169 |         Args:
170 |             message_dict (dict):
171 |                 {
172 |                 "subject": "your_subject",
173 |                 "text": "message"
174 |                 }
175 |         Uses self.notif_config (the "notifications" section of the config):
176 |                 {
177 |                 "mailgun_api_base_url": "link to mailgun_api_base_url",
178 |                 "mailgun_api_key": "your mailgun_api_key",
179 |                 "mailgun_default_smtp_login": "your mailgun_default_smtp_login",
180 |                 "email_to_notify": "the email_to_notify"
181 |                 }
182 |         Returns:
183 |             requests.Response: Response of the POST request to the Mailgun API.
184 |         '''
185 | 
186 |         api_base_url = self.notif_config["mailgun_api_base_url"] + '/messages'
187 |         auth = ('api', self.notif_config["mailgun_api_key"])
188 | 
189 |         data = {
190 |             "from": f"SparseTwitter <{self.notif_config['mailgun_default_smtp_login']}>",
191 |             "to": self.notif_config["email_to_notify"]
192 |         }
193 | 
194 |         data.update(message_dict)
195 | 
196 |         return post(api_base_url, auth=auth, data=data)
197 | 
198 |         # TODO: Add mailgun config
199 | 
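
The payload assembled by `send_mail` above can be sketched without any network access. This is a minimal illustration, not part of the repository: the function name `build_mailgun_request` and the example config values are hypothetical, and only the dictionary keys are taken from the method above.

```python
def build_mailgun_request(notif_config, message_dict):
    """Return (url, auth, data) in the shape send_mail passes to post().

    Hypothetical helper for illustration; it performs no HTTP request.
    """
    url = notif_config["mailgun_api_base_url"] + "/messages"
    auth = ("api", notif_config["mailgun_api_key"])

    data = {
        "from": f"SparseTwitter <{notif_config['mailgun_default_smtp_login']}>",
        "to": notif_config["email_to_notify"],
    }
    # Keys in message_dict ("subject", "text") are merged into the form data.
    data.update(message_dict)
    return url, auth, data


# Placeholder values, assumed for the example only.
example_config = {
    "mailgun_api_base_url": "https://api.example.invalid/v3/mg.example.org",
    "mailgun_api_key": "key-placeholder",
    "mailgun_default_smtp_login": "postmaster@mg.example.org",
    "email_to_notify": "admin@example.org",
}

url, auth, data = build_mailgun_request(
    example_config, {"subject": "Unexpected Error", "text": "details"}
)
```

Separating payload construction from the HTTP call like this would make the notification logic testable without Mailgun credentials.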


--------------------------------------------------------------------------------
/setup_server.sh:
--------------------------------------------------------------------------------
 1 | wget https://raw.githubusercontent.com/chetankapoor/swap/master/swap.sh -O swap.sh
 2 | sudo sh swap.sh 2G
 3 | sudo apt-get update
 4 | sudo apt-get install mysql-server
 5 | mysql_secure_installation
 6 | sudo mysql -u root -p
 7 | create database sparsetwitter;
 8 | create database sparsetwitter_live;
 9 | set global validate_password_special_char_count = 0;
10 | create user 'sparsetwitter'@'localhost' identified by 'password';
11 | GRANT ALL PRIVILEGES ON *.* TO 'sparsetwitter'@'localhost';
12 | create user 'sparsetwitter_remote'@'%' identified by 'password';
13 | GRANT ALL PRIVILEGES ON sparsetwitter.* TO 'sparsetwitter_remote'@'%';
14 | GRANT ALL PRIVILEGES ON sparsetwitter_live.* TO 'sparsetwitter_remote'@'%';
15 | # follow this guide: https://medium.com/@haotangio/how-to-properly-setup-mysql-5-7-for-production-on-ubuntu-16-04-dd4088286016
16 | exit
17 | git clone https://github.com/FlxVctr/SparseTwitter.git
18 | sudo apt install python-pip
19 | pip install --user pipenv 
20 | sudo apt upgrade
21 | sudo reboot now
22 | # ssh back into machine
23 | cd SparseTwitter
24 | screen
25 | curl -L https://github.com/pyenv/pyenv-installer/raw/master/bin/pyenv-installer | bash
26 | # add stuff to bashrc as prompted by script
27 | source ../.bashrc
28 | pyenv update
29 | sudo apt-get install -y make build-essential libssl-dev zlib1g-dev libbz2-dev \
30 | libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev \
31 | xz-utils tk-dev libffi-dev liblzma-dev
32 | sudo reboot now
33 | # ssh back into machine
34 | screen
35 | cd SparseTwitter
36 | pipenv install
37 | pipenv shell
38 | # follow readme in SparseTwitter
39 | python tests/tests.py -s
40 | python functional_test.py


--------------------------------------------------------------------------------
/start.py:
--------------------------------------------------------------------------------
  1 | import argparse
  2 | from datetime import datetime
  3 | import os
  4 | import time
  5 | import traceback
  6 | from shutil import copyfile
  7 | from sys import stderr, stdout
  8 | 
  9 | import pandas as pd
 10 | 
 11 | from collector import Coordinator
 12 | from setup import Config
 13 | 
 14 | 
 15 | def main_loop(coordinator, select=[], status_lang=None, test_fail=False, restart=False,
 16 |               bootstrap=False, language_threshold=0, keywords=[]):
 17 | 
 18 |     try:
 19 |         latest_start_time = pd.read_sql_table('timetable', coordinator.dbh.engine)
 20 |         latest_start_time = latest_start_time['latest_start_time'][0]
 21 |     except ValueError:
 22 |         latest_start_time = 0
 23 | 
 24 |     if restart is True:
 25 | 
 26 |         update_query = f"""
 27 |                         UPDATE friends
 28 |                         SET burned=0
 29 |                         WHERE UNIX_TIMESTAMP(timestamp) > {latest_start_time}
 30 |                        """
 31 |         coordinator.dbh.engine.execute(update_query)
 32 | 
 33 |     start_time = time.time()
 34 | 
 35 |     pd.DataFrame({'latest_start_time': [start_time]}).to_sql('timetable', coordinator.dbh.engine,
 36 |                                                              if_exists='replace')
 37 | 
 38 |     collectors = coordinator.start_collectors(select=select,
 39 |                                               status_lang=status_lang,
 40 |                                               fail=test_fail,
 41 |                                               restart=restart,
 42 |                                               retries=4,
 43 |                                               latest_start_time=latest_start_time,
 44 |                                               bootstrap=bootstrap,
 45 |                                               language_threshold=language_threshold,
 46 |                                               keywords=keywords)
 47 | 
 48 |     stdout.write("\nstarting {} collectors\n".format(len(collectors)))
 49 |     stdout.write(f"\nKeywords: {keywords}\n")
 50 |     stdout.flush()
 51 | 
 52 |     i = 0
 53 |     timeout = 7200
 54 | 
 55 |     for instance in collectors:
 56 |         instance.join(timeout=timeout)
 57 |         if instance.is_alive():
 58 |             raise RuntimeError(f"Thread {instance.name} took longer than {timeout} seconds \
 59 | to finish.")
 60 |         if instance.err is not None:
 61 |             raise instance.err
 62 |         i += 1
 63 |         stdout.write(f"Thread {instance.name} joined. {i} collector(s) finished\n")
 64 |         stdout.flush()
 65 | 
 66 | 
 67 | if __name__ == "__main__":
 68 | 
 69 |     # Backup latest_seeds.csv if exists
 70 |     if os.path.isfile("latest_seeds.csv"):
 71 |         copyfile("latest_seeds.csv",
 72 |                  "{}_latest_seeds.csv".format(datetime.now().isoformat().replace(":", "-")))
 73 | 
 74 |     # Get arguments from commandline
 75 |     parser = argparse.ArgumentParser()
 76 |     parser.add_argument('-n', '--seeds', type=int, help="specify number of seeds", default=10)
 77 |     parser.add_argument('-l', '--language', nargs="+",
 78 |                         help="specify language codes of last status by users to gather")
 79 |     parser.add_argument('-lt', '--lthreshold', type=float,
 80 |                         help="fraction threshold (0 to 1) of last 200 tweets by an account that \
 81 | must have chosen languages detected (leads to fewer false positives but \
 82 | also more false negatives)", default=0)
 83 |     parser.add_argument('-k', '--keywords', nargs="+",
 84 |                         help="specify keywords contained in last 200 tweets by users to gather")
 85 |     parser.add_argument('-r', '--restart',
 86 |                         help="restart with latest seeds in latest_seeds.csv", action="store_true")
 87 |     parser.add_argument('-p', '--following_pages_limit', type=int,
 88 |                         help='''Define limit for maximum number of recent followings to retrieve per \
 89 | account to determine most followed friend.
 90 | 1 page has a maximum of 5000 followings.
 91 | Lower values speed up collection. Default: 0 (unlimited)''', default=0)
 92 |     parser.add_argument('-b', '--bootstrap', help="at every step, add a seed's friends and followers \
 93 | to the seed pool from which accounts are chosen randomly if walkers are at an impasse",
 94 |                         action="store_true")
 95 |     parser.add_argument('-t', '--test', help="dev only: test for 2 loops only",
 96 |                         action="store_true")
 97 |     parser.add_argument('-f', '--fail', help="dev only: test unexpected exception",
 98 |                         action="store_true")
 99 | 
100 |     args = parser.parse_args()
101 | 
102 |     config = Config()
103 | 
104 |     user_details_list = []
105 |     for detail, sqldatatype in config.config["twitter_user_details"].items():
106 |         if sqldatatype is not None:
107 |             user_details_list.append(detail)
108 | 
109 |     if args.restart:
110 |         latest_seeds_df = pd.read_csv('latest_seeds.csv', header=None)[0]
111 |         latest_seeds = list(latest_seeds_df.values)
112 |         coordinator = Coordinator(seed_list=latest_seeds,
113 |                                   following_pages_limit=args.following_pages_limit)
114 |         print("Restarting with latest seeds:\n")
115 |         print(latest_seeds_df)
116 |     else:
117 |         coordinator = Coordinator(seeds=args.seeds,
118 |                                   following_pages_limit=args.following_pages_limit)
119 | 
120 |     k = 0
121 |     restart_counter = 0
122 | 
123 |     while True:
124 | 
125 |         if args.test:
126 |             k += 1
127 |             if k == 2:
128 |                 args.fail = False
129 |             if k == 3:
130 |                 break
131 |             stdout.write("\nTEST RUN {}\n".format(k))
132 |             stdout.flush()
133 | 
134 |         try:
135 |             if args.restart is True and restart_counter == 0:
136 | 
137 |                 main_loop(coordinator, select=user_details_list,
138 |                           status_lang=args.language, test_fail=args.fail, restart=True,
139 |                           bootstrap=args.bootstrap, language_threshold=args.lthreshold,
140 |                           keywords=args.keywords)
141 |                 restart_counter += 1
142 |             else:
143 |                 main_loop(coordinator, select=user_details_list,
144 |                           status_lang=args.language, test_fail=args.fail, bootstrap=args.bootstrap,
145 |                           language_threshold=args.lthreshold, keywords=args.keywords)
146 |         except Exception:
147 |             stdout.write("Encountered unexpected exception:\n")
148 |             traceback.print_exc()
149 |             try:
150 |                 if config.use_notifications is True:
151 |                     response = config.send_mail({
152 |                         "subject": "Unexpected Error",
153 |                         "text":
154 |                             f"Unexpected Error encountered.\n{traceback.format_exc()}"
155 |                     }
156 |                     )
157 |                     assert '200' in str(response)
158 |                     stdout.write(f"Sent notification to {config.notif_config['email_to_notify']}\n")
159 |                     stdout.flush()
160 |             except Exception:
161 |                 stderr.write('Could not send error-mail: \n')
162 |                 traceback.print_exc(file=stderr)
163 |             stdout.write("Retrying in 5 seconds.\n")
164 |             stdout.flush()
165 |             latest_seeds = list(pd.read_csv('latest_seeds.csv', header=None)[0].values)
166 |             coordinator = Coordinator(seed_list=latest_seeds,
167 |                                       following_pages_limit=args.following_pages_limit)
168 |             args.restart = True
169 |             restart_counter = 0
170 |             time.sleep(5)
171 | 
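
The join-with-timeout pattern used in `main_loop` above can be sketched in isolation. This is a simplified illustration under stated assumptions: `Worker` and `join_all` are hypothetical names, and `Worker` only mimics the `err` attribute that `main_loop` reads from each collector thread; the real collector class lives in `collector.py` and may differ.

```python
import threading


class Worker(threading.Thread):
    """Hypothetical thread that stores an exception instead of losing it."""

    def __init__(self, work, name=None):
        super().__init__(name=name)
        self._work = work
        self.err = None

    def run(self):
        try:
            self._work()
        except Exception as e:  # keep the error for the joining thread
            self.err = e


def join_all(workers, timeout):
    """Join each worker, failing on a timeout or a stored exception."""
    finished = 0
    for w in workers:
        w.join(timeout=timeout)
        if w.is_alive():
            raise RuntimeError(
                f"Thread {w.name} took longer than {timeout} seconds to finish."
            )
        if w.err is not None:
            raise w.err
        finished += 1
    return finished


workers = [Worker(lambda: None, name=f"worker-{i}") for i in range(3)]
for w in workers:
    w.start()
finished_count = join_all(workers, timeout=5)
```

Because an exception raised inside a thread does not propagate to the caller of `join`, storing it on the thread object and re-raising it after joining is one way to let the outer retry loop react to it.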


--------------------------------------------------------------------------------
/test_helpers.py:
--------------------------------------------------------------------------------
  1 | import numpy as np
  2 | import pandas as pd
  3 | 
  4 | import passwords
  5 | 
  6 | config_dict_notifications = {
  7 |     "email_to_notify": passwords.email_to_notify,
  8 |     "mailgun_default_smtp_login": passwords.mailgun_default_smtp_login,
  9 |     "mailgun_api_base_url": passwords.mailgun_api_base_url,
 10 |     "mailgun_api_key": passwords.mailgun_api_key
 11 | }
 12 | 
 13 | config_dict_twitter_details = {
 14 |     "twitter_user_details": {
 15 |         "contributors_enabled": None,
 16 |         "created_at": "DATETIME",
 17 |         "default_profile": None,
 18 |         "default_profile_image": None,
 19 |         "description": None,
 20 |         "entities_description_urls": None,
 21 |         "entities_url_urls": None,
 22 |         "favourites_count": None,
 23 |         "follow_request_sent": None,
 24 |         "followers_count": "INT",
 25 |         "following": None,
 26 |         "friends_count": None,
 27 |         "geo_enabled": None,
 28 |         "has_extended_profile": None,
 29 |         "id": "BIGINT PRIMARY KEY",
 30 |         "id_str": None,
 31 |         "is_translation_enabled": None,
 32 |         "is_translator": None,
 33 |         "lang": None,
 34 |         "listed_count": None,
 35 |         "location": None,
 36 |         "name": None,
 37 |         "needs_phone_verification": None,
 38 |         "notifications": None,
 39 |         "profile_background_color": None,
 40 |         "profile_background_image_url": None,
 41 |         "profile_background_image_url_https": None,
 42 |         "profile_background_tile": None,
 43 |         "profile_banner_url": None,
 44 |         "profile_image_url": None,
 45 |         "profile_image_url_https": None,
 46 |         "profile_link_color": None,
 47 |         "profile_sidebar_border_color": None,
 48 |         "profile_sidebar_fill_color": None,
 49 |         "profile_text_color": None,
 50 |         "profile_use_background_image": None,
 51 |         "protected": None,
 52 |         "screen_name": None,
 53 |         "status_contributors": None,
 54 |         "status_coordinates": None,
 55 |         "status_coordinates_coordinates": None,
 56 |         "status_coordinates_type": None,
 57 |         "status_created_at": None,
 58 |         "status_entities_hashtags": None,
 59 |         "status_entities_media": None,
 60 |         "status_entities_symbols": None,
 61 |         "status_entities_urls": None,
 62 |         "status_entities_user_mentions": None,
 63 |         "status_extended_entities_media": None,
 64 |         "status_favorite_count": None,
 65 |         "status_favorited": None,
 66 |         "status_geo": None,
 67 |         "status_geo_coordinates": None,
 68 |         "status_geo_type": None,
 69 |         "status_id": None,
 70 |         "status_id_str": None,
 71 |         "status_in_reply_to_screen_name": None,
 72 |         "status_in_reply_to_status_id": None,
 73 |         "status_in_reply_to_status_id_str": None,
 74 |         "status_in_reply_to_user_id": None,
 75 |         "status_in_reply_to_user_id_str": None,
 76 |         "status_is_quote_status": None,
 77 |         "status_lang": "VARCHAR(20)",
 78 |         "status_place": None,
 79 |         "status_place_bounding_box_coordinates": None,
 80 |         "status_place_bounding_box_type": None,
 81 |         "status_place_contained_within": None,
 82 |         "status_place_country": None,
 83 |         "status_place_country_code": None,
 84 |         "status_place_full_name": None,
 85 |         "status_place_id": None,
 86 |         "status_place_name": None,
 87 |         "status_place_place_type": None,
 88 |         "status_place_url": None,
 89 |         "status_possibly_sensitive": None,
 90 |         "status_quoted_status_id": None,
 91 |         "status_quoted_status_id_str": None,
 92 |         "status_retweet_count": None,
 93 |         "status_retweeted": None,
 94 |         "status_retweeted_status_contributors": None,
 95 |         "status_retweeted_status_coordinates": None,
 96 |         "status_retweeted_status_created_at": None,
 97 |         "status_retweeted_status_entities_hashtags": None,
 98 |         "status_retweeted_status_entities_media": None,
 99 |         "status_retweeted_status_entities_symbols": None,
100 |         "status_retweeted_status_entities_urls": None,
101 |         "status_retweeted_status_entities_user_mentions": None,
102 |         "status_retweeted_status_extended_entities_media": None,
103 |         "status_retweeted_status_favorite_count": None,
104 |         "status_retweeted_status_favorited": None,
105 |         "status_retweeted_status_geo": None,
106 |         "status_retweeted_status_id": None,
107 |         "status_retweeted_status_id_str": None,
108 |         "status_retweeted_status_in_reply_to_screen_name": None,
109 |         "status_retweeted_status_in_reply_to_status_id": None,
110 |         "status_retweeted_status_in_reply_to_status_id_str": None,
111 |         "status_retweeted_status_in_reply_to_user_id": None,
112 |         "status_retweeted_status_in_reply_to_user_id_str": None,
113 |         "status_retweeted_status_is_quote_status": None,
114 |         "status_retweeted_status_lang": None,
115 |         "status_retweeted_status_place": None,
116 |         "status_retweeted_status_possibly_sensitive": None,
117 |         "status_retweeted_status_quoted_status_id": None,
118 |         "status_retweeted_status_quoted_status_id_str": None,
119 |         "status_retweeted_status_retweet_count": None,
120 |         "status_retweeted_status_retweeted": None,
121 |         "status_retweeted_status_source": None,
122 |         "status_retweeted_status_full_text": None,
123 |         "status_retweeted_status_truncated": None,
124 |         "status_source": None,
125 |         "status_full_text": None,
126 |         "status_truncated": None,
127 |         "statuses_count": "BIGINT",
128 |         "suspended": None,
129 |         "time_zone": None,
130 |         "translator_type": None,
131 |         "url": None,
132 |         "verified": None,
133 |         "utc_offset": None
134 |     }
135 | }
136 | 
137 | config_dict_sqlite = {
138 |     "sql": {
139 |         "dbtype": "sqlite",
140 |         "host": None,
141 |         "user": None,
142 |         "passwd": None,
143 |         "dbname": "test_db"
144 |     },
145 |     "twitter_user_details": config_dict_twitter_details["twitter_user_details"],
146 |     "notifications": config_dict_notifications
147 | }
148 | 
149 | config_dict_mysql = {
150 |     "sql": {
151 |         "dbtype": "mysql",
152 |         "host": "127.0.0.1",
153 |         "user": "sparsetwitter",
154 |         "passwd": passwords.sparsetwittermysqlpw,
155 |         "dbname": "sparsetwitter"
156 |     },
157 |     "twitter_user_details": config_dict_twitter_details["twitter_user_details"],
158 |     "notifications": config_dict_notifications
159 | }
160 | 
161 | # TODO: REMOVE FIELDS THAT NEVER OCCUR?
162 | config_dict_user_details_dtypes_sqlite = {
163 |     "sql": {
164 |         "dbtype": "sqlite",
165 |         "host": None,
166 |         "user": None,
167 |         "passwd": None,
168 |         "dbname": "test_db"
169 |     },
170 |     "twitter_user_details": {
171 |         "contributors_enabled": "SMALLINT",
172 |         "created_at": "DATETIME",
173 |         "default_profile": "SMALLINT",
174 |         "default_profile_image": "SMALLINT",
175 |         "description": "TEXT",
176 |         "entities_description_urls": "TEXT",
177 |         "entities_url_urls": "TEXT",
178 |         "favourites_count": "BIGINT",
179 |         "follow_request_sent": "SMALLINT",
180 |         "followers_count": "BIGINT",
181 |         "following": "SMALLINT",
182 |         "friends_count": "BIGINT",
183 |         "geo_enabled": "SMALLINT",
184 |         "has_extended_profile": "SMALLINT",
185 |         "id": "BIGINT PRIMARY KEY",
186 |         "id_str": "VARCHAR(30)",
187 |         "is_translation_enabled": "SMALLINT",
188 |         "is_translator": "SMALLINT",
189 |         "lang": "VARCHAR(30)",
190 |         "listed_count": "BIGINT",
191 |         "location": "TEXT",
192 |         "name": "VARCHAR(50)",
193 |         "needs_phone_verification": "SMALLINT",
194 |         "notifications": "SMALLINT",
195 |         "profile_background_color": "CHAR(6)",
196 |         "profile_background_image_url": "TEXT",
197 |         "profile_background_image_url_https": "TEXT",
198 |         "profile_background_tile": "SMALLINT",
199 |         "profile_banner_url": "TEXT",
200 |         "profile_image_url": "TEXT",
201 |         "profile_image_url_https": "TEXT",
202 |         "profile_link_color": "CHAR(6)",
203 |         "profile_sidebar_border_color": "CHAR(6)",
204 |         "profile_sidebar_fill_color": "CHAR(6)",
205 |         "profile_text_color": "CHAR(6)",
206 |         "profile_use_background_image": "SMALLINT",
207 |         "protected": "SMALLINT",
208 |         "screen_name": "VARCHAR(50)",
209 |         "status_contributors": "TEXT",  # TODO: SEE IF THIS IS NEEDED
210 |         "status_coordinates": "TEXT",
211 |         "status_coordinates_coordinates": "TEXT",  # TODO: SEE IF THIS IS NEEDED
212 |         "status_coordinates_type": "TEXT",  # TODO: SEE IF THIS IS NEEDED
213 |         "status_created_at": "DATETIME",
214 |         "status_entities_hashtags": "TEXT",
215 |         "status_entities_media": "TEXT",
216 |         "status_entities_symbols": "TEXT",
217 |         "status_entities_urls": "TEXT",
218 |         "status_entities_user_mentions": "TEXT",
219 |         "status_extended_entities_media": "TEXT",
220 |         "status_favorite_count": "INT",
221 |         "status_favorited": "SMALLINT",
222 |         "status_geo": "TEXT",
223 |         "status_geo_coordinates": "TEXT",  # TODO: SEE IF THIS IS NEEDED
224 |         "status_geo_type": "TEXT",  # TODO: SEE IF THIS IS NEEDED
225 |         "status_id": "BIGINT",
226 |         "status_id_str": "VARCHAR(50)",
227 |         "status_in_reply_to_screen_name": "VARCHAR(30)",
228 |         "status_in_reply_to_status_id": "BIGINT",
229 |         "status_in_reply_to_status_id_str": "VARCHAR(50)",
230 |         "status_in_reply_to_user_id": "BIGINT",
231 |         "status_in_reply_to_user_id_str": "VARCHAR(30)",
232 |         "status_is_quote_status": "SMALLINT",
233 |         "status_lang": "VARCHAR(30)",
234 |         "status_place": "TEXT",
235 |         "status_place_bounding_box_coordinates": "TEXT",  # TODO: SEE IF THIS IS NEEDED
236 |         "status_place_bounding_box_type": "TEXT",  # TODO: SEE IF THIS IS NEEDED
237 |         "status_place_contained_within": "TEXT",   # TODO: SEE IF THIS IS NEEDED
238 |         "status_place_country": "TEXT",  # TODO: SEE IF THIS IS NEEDED
239 |         "status_place_country_code": "TEXT",  # TODO: SEE IF THIS IS NEEDED
240 |         "status_place_full_name": "TEXT",  # TODO: SEE IF THIS IS NEEDED
241 |         "status_place_id": "TEXT",  # TODO: SEE IF THIS IS NEEDED
242 |         "status_place_name": "TEXT",  # TODO: SEE IF THIS IS NEEDED
243 |         "status_place_place_type": "TEXT",  # TODO: SEE IF THIS IS NEEDED
244 |         "status_place_url": "TEXT",  # TODO: SEE IF THIS IS NEEDED
245 |         "status_possibly_sensitive": "SMALLINT",
246 |         "status_quoted_status_id": "BIGINT",
247 |         "status_quoted_status_id_str": "VARCHAR(50)",
248 |         "status_retweet_count": "INT",
249 |         "status_retweeted": "SMALLINT",
250 |         "status_retweeted_status_contributors": "TEXT",  # TODO: SEE IF THIS IS NEEDED
251 |         "status_retweeted_status_coordinates": "TEXT",
252 |         "status_retweeted_status_created_at": "DATETIME",
253 |         "status_retweeted_status_entities_hashtags": "TEXT",
254 |         "status_retweeted_status_entities_media": "TEXT",
255 |         "status_retweeted_status_entities_symbols": "TEXT",  # TODO: SEE IF THIS IS NEEDED
256 |         "status_retweeted_status_entities_urls": "TEXT",
257 |         "status_retweeted_status_entities_user_mentions": "TEXT",
258 |         "status_retweeted_status_extended_entities_media": "TEXT",
259 |         "status_retweeted_status_favorite_count": "INT",
260 |         "status_retweeted_status_favorited": "SMALLINT",
261 |         "status_retweeted_status_geo": "TEXT",
262 |         "status_retweeted_status_id": "BIGINT",
263 |         "status_retweeted_status_id_str": "VARCHAR(50)",
264 |         "status_retweeted_status_in_reply_to_screen_name": "VARCHAR(30)",
265 |         "status_retweeted_status_in_reply_to_status_id": "BIGINT",
266 |         "status_retweeted_status_in_reply_to_status_id_str": "VARCHAR(50)",
267 |         "status_retweeted_status_in_reply_to_user_id": "BIGINT",
268 |         "status_retweeted_status_in_reply_to_user_id_str": "VARCHAR(30)",
269 |         "status_retweeted_status_is_quote_status": "SMALLINT",
270 |         "status_retweeted_status_lang": "VARCHAR(30)",
271 |         "status_retweeted_status_place": "TEXT",
272 |         "status_retweeted_status_possibly_sensitive": "SMALLINT",
273 |         "status_retweeted_status_quoted_status_id": "BIGINT",
274 |         "status_retweeted_status_quoted_status_id_str": "VARCHAR(50)",
275 |         "status_retweeted_status_retweet_count": "INT",
276 |         "status_retweeted_status_retweeted": "SMALLINT",
277 |         "status_retweeted_status_source": "TEXT",
278 |         "status_retweeted_status_full_text": "TEXT",
279 |         "status_retweeted_status_truncated": "SMALLINT",
280 |         "status_source": "TEXT",
281 |         "status_full_text": "TEXT",
282 |         "status_truncated": "SMALLINT",
283 |         "statuses_count": "BIGINT",
284 |         "suspended": "SMALLINT",
285 |         "time_zone": "TEXT",  # TODO: SEE IF THIS IS NEEDED
286 |         "translator_type": "VARCHAR(30)",
287 |         "url": "TEXT",
288 |         "verified": "SMALLINT",
289 |         "utc_offset": "TEXT"   # TODO: SEE IF THIS IS NEEDED
290 |     },
291 |     "notifications": config_dict_notifications
292 | }
293 | 
294 | config_dict_user_details_dtypes_mysql = {
295 |     "sql": {
296 |         "dbtype": "mysql",
297 |         "host": "127.0.0.1",
298 |         "user": "sparsetwitter",
299 |         "passwd": passwords.sparsetwittermysqlpw,
300 |         "dbname": "sparsetwitter"
301 |     },
302 |     "twitter_user_details": config_dict_user_details_dtypes_sqlite["twitter_user_details"],
303 |     "notifications": config_dict_notifications
304 | }
305 | 
306 | friends_details_pddf_dtypes = {
307 |     "contributors_enabled": bool,
308 |     "created_at": pd.Timestamp,
309 |     "default_profile": bool,
310 |     "default_profile_image": bool,
311 |     "description": str,
312 |     "entities_description_urls": str,
313 |     "entities_url_urls": str,
314 |     "favourites_count": np.int64,
315 |     "follow_request_sent": bool,
316 |     "followers_count": np.int64,
317 |     "following": bool,
318 |     "friends_count": np.int64,
319 |     "geo_enabled": bool,
320 |     "has_extended_profile": bool,
321 |     "id": np.int64,
322 |     "id_str": str,
323 |     "is_translation_enabled": bool,
324 |     "is_translator": bool,
325 |     "lang": str,
326 |     "listed_count": np.int64,
327 |     "location": str,
328 |     "name": str,
329 |     "needs_phone_verification": bool,
330 |     "notifications": bool,
331 |     "profile_background_color": str,
332 |     "profile_background_image_url": str,
333 |     "profile_background_image_url_https": str,
334 |     "profile_background_tile": bool,
335 |     "profile_banner_url": str,
336 |     "profile_image_url": str,
337 |     "profile_image_url_https": str,
338 |     "profile_link_color": str,
339 |     "profile_sidebar_border_color": str,
340 |     "profile_sidebar_fill_color": str,
341 |     "profile_text_color": str,
342 |     "profile_use_background_image": bool,
343 |     "protected": bool,
344 |     "screen_name": str,
345 |     "status_contributors": str,
346 |     "status_coordinates": str,
347 |     "status_coordinates_coordinates": str,
348 |     "status_coordinates_type": str,
349 |     "status_created_at": pd.Timestamp,
350 |     "status_entities_hashtags": str,
351 |     "status_entities_media": str,
352 |     "status_entities_symbols": str,
353 |     "status_entities_urls": str,
354 |     "status_entities_user_mentions": str,
355 |     "status_extended_entities_media": str,
356 |     "status_favorite_count": np.int64,
357 |     "status_favorited": bool,
358 |     "status_geo": str,
359 |     "status_geo_coordinates": str,
360 |     "status_geo_type": str,
361 |     "status_id": np.int64,
362 |     "status_id_str": str,
363 |     "status_in_reply_to_screen_name": str,
364 |     "status_in_reply_to_status_id": np.int64,
365 |     "status_in_reply_to_status_id_str": str,
366 |     "status_in_reply_to_user_id": np.int64,
367 |     "status_in_reply_to_user_id_str": str,
368 |     "status_is_quote_status": bool,
369 |     "status_lang": str,
370 |     "status_place": str,
371 |     "status_place_bounding_box_coordinates": str,
372 |     "status_place_bounding_box_type": str,
373 |     "status_place_contained_within": str,
374 |     "status_place_country": str,
375 |     "status_place_country_code": str,
376 |     "status_place_full_name": str,
377 |     "status_place_id": str,
378 |     "status_place_name": str,
379 |     "status_place_place_type": str,
380 |     "status_place_url": str,
381 |     "status_possibly_sensitive": bool,
382 |     "status_quoted_status_id": np.int64,
383 |     "status_quoted_status_id_str": str,
384 |     "status_retweet_count": np.int64,
385 |     "status_retweeted": bool,
386 |     "status_retweeted_status_contributors": str,
387 |     "status_retweeted_status_coordinates": str,
388 |     "status_retweeted_status_created_at": pd.Timestamp,
389 |     "status_retweeted_status_entities_hashtags": str,
390 |     "status_retweeted_status_entities_media": str,
391 |     "status_retweeted_status_entities_symbols": str,
392 |     "status_retweeted_status_entities_urls": str,
393 |     "status_retweeted_status_entities_user_mentions": str,
394 |     "status_retweeted_status_extended_entities_media": str,
395 |     "status_retweeted_status_favorite_count": np.int64,
396 |     "status_retweeted_status_favorited": bool,
397 |     "status_retweeted_status_geo": str,
398 |     "status_retweeted_status_id": np.int64,
399 |     "status_retweeted_status_id_str": str,
400 |     "status_retweeted_status_in_reply_to_screen_name": str,
401 |     "status_retweeted_status_in_reply_to_status_id": np.int64,
402 |     "status_retweeted_status_in_reply_to_status_id_str": str,
403 |     "status_retweeted_status_in_reply_to_user_id": np.int64,
404 |     "status_retweeted_status_in_reply_to_user_id_str": str,
405 |     "status_retweeted_status_is_quote_status": bool,
406 |     "status_retweeted_status_lang": str,
407 |     "status_retweeted_status_place": str,
408 |     "status_retweeted_status_possibly_sensitive": bool,
409 |     "status_retweeted_status_quoted_status_id": np.int64,
410 |     "status_retweeted_status_quoted_status_id_str": str,
411 |     "status_retweeted_status_retweet_count": np.int64,
412 |     "status_retweeted_status_retweeted": bool,
413 |     "status_retweeted_status_source": str,
414 |     "status_retweeted_status_full_text": str,
415 |     "status_retweeted_status_truncated": bool,
416 |     "status_source": str,
417 |     "status_full_text": str,
418 |     "status_truncated": bool,
419 |     "statuses_count": np.int64,
420 |     "suspended": bool,
421 |     "time_zone": str,
422 |     "translator_type": str,
423 |     "url": str,
424 |     "verified": bool,
425 |     "utc_offset": str
426 | }
427 | 
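
The config dictionaries above map each Twitter user detail to an SQL data type, or to `None` when the detail should not be stored. A minimal sketch of how such a mapping is reduced to the list of selected details, mirroring the loop in `start.py`, follows. The function name `selected_user_details` and the shortened example mapping are assumptions for illustration.

```python
def selected_user_details(twitter_user_details):
    """Return the names of all details that have an SQL type assigned."""
    return [
        detail
        for detail, sqldatatype in twitter_user_details.items()
        if sqldatatype is not None
    ]


# Shortened example mapping in the style of config_dict_twitter_details.
example_details = {
    "created_at": "DATETIME",
    "description": None,
    "followers_count": "INT",
    "id": "BIGINT PRIMARY KEY",
}

selected = selected_user_details(example_details)
# selected == ['created_at', 'followers_count', 'id']
```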


--------------------------------------------------------------------------------
/test_run.sh:
--------------------------------------------------------------------------------
1 | cp test_config.yml config.yml
2 | cp user_ids_de.csv seeds.csv
3 | python start.py -n 200 -l de -p 1
4 | 


--------------------------------------------------------------------------------
/tests/config_test_empty.yml:
--------------------------------------------------------------------------------
  1 | # In the following config file, please fill the fields as you need them.
  2 | # Do not use quotes, just plain text: e.g.:
  3 | # sql:
  4 |  # dbtype: sqlite
  5 | # etc.
  6 | 
  7 | # ================== Database Information =====================
  8 | sql:
  9 |     dbtype:   # mysql
 10 |     host:     # if dbtype = mysql, provide host
 11 |     user:     # if dbtype = mysql, provide user
 12 |     passwd:   # if dbtype = mysql, provide password
 13 |     dbname:   # provide a name for the database.
 14 | 
 15 | 
 16 | # ================== Twitter User Details =====================
 17 | # If you wish to save certain twitter user details, please just add the SQL data
 18 | # type you wish to save it as in the SQL database (recommended types are indicated
 19 | # in parentheses). If you do not wish to save a certain detail, just leave it empty
 20 | # like so:
 21 | # twitter_user_details:
 22 |     # contributors_enabled: SMALLINT
 23 |     # created_at:
 24 | # This will save the detail "contributors_enabled" as boolean / tinyint into the
 25 | # database but it will not save "created_at" at all.
 26 | 
 27 | twitter_user_details:
 28 |     contributors_enabled:  # SMALLINT
 29 |     created_at:  # DATETIME
 30 |     default_profile:  # SMALLINT
 31 |     default_profile_image:  # SMALLINT
 32 |     description:  # TEXT
 33 |     entities_description_urls:  # TEXT (contains a dict)
 34 |     entities_url_urls:  # TEXT (contains a dict)
 35 |     favourites_count:  # BIGINT
 36 |     follow_request_sent:  # SMALLINT
 37 |     followers_count:  # BIGINT
 38 |     following:  # SMALLINT
 39 |     friends_count:  # BIGINT
 40 |     geo_enabled:  # SMALLINT
 41 |     has_extended_profile:  # SMALLINT
 42 |     id:  # BIGINT PRIMARY KEY
 43 |     id_str:  # VARCHAR(30)
 44 |     is_translation_enabled:  # SMALLINT
 45 |     is_translator:  # SMALLINT
 46 |     lang:  # VARCHAR(10)
 47 |     listed_count:  # BIGINT
 48 |     location:  # TEXT
 49 |     name:  # VARCHAR(50)
 50 |     needs_phone_verification:  # SMALLINT
 51 |     notifications:  # SMALLINT
 52 |     profile_background_color:  # CHAR(6) (is a Hex Color Code)
 53 |     profile_background_image_url:  # TEXT
 54 |     profile_background_image_url_https:  # TEXT
 55 |     profile_background_tile:  # SMALLINT
 56 |     profile_banner_url:  # TEXT
 57 |     profile_image_url:  # TEXT
 58 |     profile_image_url_https:  # TEXT
 59 |     profile_link_color:  # CHAR(6) (is a Hex Color Code)
 60 |     profile_sidebar_border_color:  # CHAR(6) (is a Hex Color Code)
 61 |     profile_sidebar_fill_color:  # CHAR(6) (is a Hex Color Code)
 62 |     profile_text_color:  # CHAR(6) (is a Hex Color Code)
 63 |     profile_use_background_image:  # SMALLINT
 64 |     protected:  # SMALLINT
 65 |     screen_name:  # VARCHAR(50)
 66 |     status_contributors:  # TEXT (Rarely available)
 67 |     status_coordinates:  # TEXT (contains a dict)
 68 |     status_coordinates_coordinates:  # TEXT (Rarely available)
 69 |     status_coordinates_type:  # TEXT (Rarely available)
 70 |     status_created_at:  # DATETIME
 71 |     status_entities_hashtags:  # TEXT (contains a dict)
 72 |     status_entities_media:  # TEXT (contains a dict)
 73 |     status_entities_symbols:  # TEXT (contains a dict) # DE FACTO ALWAYS EMPTY
 74 |     status_entities_urls:  # TEXT (contains a dict)
 75 |     status_entities_user_mentions:  # TEXT (contains a dict)
 76 |     status_extended_entities_media:  # TEXT (contains a dict)
 77 |     status_favorite_count:  # INT
 78 |     status_favorited:  # SMALLINT
 79 |     status_geo:  # TEXT (contains a dict)
 80 |     status_geo_coordinates:  # TEXT (Rarely available)
 81 |     status_geo_type:  # TEXT (Rarely available)
 82 |     status_id:  # BIGINT
 83 |     status_id_str: # VARCHAR(50)
 84 |     status_in_reply_to_screen_name:  # VARCHAR(50)
 85 |     status_in_reply_to_status_id:  # BIGINT
 86 |     status_in_reply_to_status_id_str:  # VARCHAR(50)
 87 |     status_in_reply_to_user_id:  # BIGINT
 88 |     status_in_reply_to_user_id_str:  # VARCHAR(30)
 89 |     status_is_quote_status:  # SMALLINT
 90 |     status_lang:  # VARCHAR(10)
 91 |     status_place:  # TEXT (contains a dict)
 92 |     status_place_bounding_box_coordinates: # TEXT (Rarely available)
 93 |     status_place_bounding_box_type:  # TEXT (Rarely available)
 94 |     status_place_contained_within:  # TEXT (Rarely available)
 95 |     status_place_country:  # TEXT (Rarely available)
 96 |     status_place_country_code:  # TEXT (Rarely available)
 97 |     status_place_full_name:  # TEXT (Rarely available)
 98 |     status_place_id:  # TEXT (Rarely available)
 99 |     status_place_name: # TEXT (Rarely available)
100 |     status_place_place_type:  # TEXT (Rarely available)
101 |     status_place_url:  # TEXT (Rarely available)
102 |     status_possibly_sensitive:  # SMALLINT
103 |     status_quoted_status_id:  # BIGINT
104 |     status_quoted_status_id_str:  # VARCHAR(50)
105 |     status_retweet_count:  # INT
106 |     status_retweeted:  # SMALLINT
107 |     status_retweeted_status_contributors:  # TEXT (Rarely available)
108 |     status_retweeted_status_coordinates:  # TEXT (contains a dict)
109 |     status_retweeted_status_created_at:  # DATETIME
110 |     status_retweeted_status_entities_hashtags:  # TEXT (contains a dict)
111 |     status_retweeted_status_entities_media:  # TEXT (contains a dict)
112 |     status_retweeted_status_entities_symbols:  # TEXT (Rarely available)
113 |     status_retweeted_status_entities_urls:  # TEXT (contains a dict)
114 |     status_retweeted_status_entities_user_mentions:  # TEXT (contains a dict)
115 |     status_retweeted_status_extended_entities_media: # TEXT (contains a dict)
116 |     status_retweeted_status_favorite_count:  # INT
117 |     status_retweeted_status_favorited:  # SMALLINT
118 |     status_retweeted_status_geo:  # TEXT (contains a dict)
119 |     status_retweeted_status_id:  # BIGINT
120 |     status_retweeted_status_id_str:  # VARCHAR(50)
121 |     status_retweeted_status_in_reply_to_screen_name:  # VARCHAR(30)
122 |     status_retweeted_status_in_reply_to_status_id:  # BIGINT
123 |     status_retweeted_status_in_reply_to_status_id_str:  # VARCHAR(50)
124 |     status_retweeted_status_in_reply_to_user_id:  # BIGINT
125 |     status_retweeted_status_in_reply_to_user_id_str:  # VARCHAR(30)
126 |     status_retweeted_status_is_quote_status:  # SMALLINT
127 |     status_retweeted_status_lang:  # VARCHAR(10)
128 |     status_retweeted_status_place:  # TEXT (contains a dict)
129 |     status_retweeted_status_possibly_sensitive:  # SMALLINT
130 |     status_retweeted_status_quoted_status_id:  # BIGINT
131 |     status_retweeted_status_quoted_status_id_str:  # VARCHAR(50)
132 |     status_retweeted_status_retweet_count:  # INT
133 |     status_retweeted_status_retweeted:  # SMALLINT
134 |     status_retweeted_status_source:  # TEXT
135 |     status_retweeted_status_full_text:  # TEXT
136 |     status_retweeted_status_truncated:  # SMALLINT
137 |     status_source:  # TEXT
138 |     status_full_text:  # TEXT
139 |     status_truncated:  # SMALLINT
140 |     statuses_count:  # BIGINT
141 |     suspended:  # SMALLINT
142 |     time_zone: # TEXT (Rarely available)
143 |     translator_type:  # VARCHAR(50)
144 |     url:  # TEXT
145 |     verified:  # BOOLEAN
146 |     utc_offset:  # TEXT (Rarely available)
147 | 
148 | 
149 | # ================== Notification Emails =====================
150 | 
151 | notifications:
152 |     email_to_notify: # user@example.com
153 |     # mailgun details
154 |     # (find them under the respective domain name here: https://mailgun.com/app/domains)
155 |     mailgun_default_smtp_login:
156 |     mailgun_api_base_url:
157 |     mailgun_api_key:
158 | 


--------------------------------------------------------------------------------
/twauth.py:
--------------------------------------------------------------------------------
 1 | from __future__ import unicode_literals
 2 | 
 3 | import csv
 4 | import os
 5 | import webbrowser
 6 | 
 7 | import tweepy as tp
 8 | 
 9 | from setup import FileImport
10 | 
11 | 
12 | class OAuthorizer():
13 |     def __init__(self):
14 |         ctoken, csecret = FileImport().read_app_key_file()
15 |         auth = tp.OAuthHandler(ctoken, csecret)
16 | 
17 |         try:
18 |             redirect_url = auth.get_authorization_url()
19 |         except tp.TweepError as e:
20 |             if '"code":32' in e.reason:
21 |                 raise tp.TweepError("Failed to get the request token. Perhaps the Consumer Key "
22 |                                     "and/or secret in your 'keys.json' is incorrect?")
23 |             else:
24 |                 raise
25 | 
26 |         webbrowser.open(redirect_url)
27 |         token = auth.request_token["oauth_token"]
28 |         verifier = input("Please enter Verifier Code: ")
29 |         auth.request_token = {'oauth_token': token,
30 |                               'oauth_token_secret': verifier}
31 |         try:
32 |             auth.get_access_token(verifier)
33 |         except tp.TweepError as e:
34 |             if "Invalid oauth_verifier parameter" in e.reason:
35 |                 raise tp.TweepError("Failed to get access token! Perhaps the "
36 |                                     "verifier you've entered is wrong.")
37 |             else:
38 |                 raise
39 | 
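40 |         # Append the new token pair to 'tokens.csv'. The header row is
41 |         # written only when the file does not exist yet. The `with` block
42 |         # closes the file on exit, so no explicit f.close() is needed.
43 |         write_header = not os.path.isfile('tokens.csv')
44 |         with open('tokens.csv', 'a', newline='') as f:
45 |             writer = csv.writer(f)
46 |             if write_header:
47 |                 writer.writerow(["token", "secret"])
48 |             # Data row: the access token and its secret.
49 |             writer.writerow([auth.access_token, auth.access_token_secret])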
50 | 
51 | 
52 | if __name__ == "__main__":
53 |     OAuthorizer()
54 | 


--------------------------------------------------------------------------------
/two_seeds.csv:
--------------------------------------------------------------------------------
1 | 83662933
2 | 36476777
3 | 


--------------------------------------------------------------------------------
/wrong_tokens.csv:
--------------------------------------------------------------------------------
1 | token,secret
2 | asdfasdf,asdf


--------------------------------------------------------------------------------