├── .gitignore
├── LICENSE.txt
├── README.md
├── cryptopanic_scraper.py
├── images
│   ├── logo.png
│   └── screenshot.png
├── jupyter
│   ├── Scratchpad.ipynb
│   └── eda.ipynb
├── requirements.txt
└── test.py
/.gitignore:
--------------------------------------------------------------------------------
1 | notes.txt
2 | data/
3 |
4 | # Byte-compiled / optimized / DLL files
5 | __pycache__/
6 | *.py[cod]
7 | *$py.class
8 |
9 | # C extensions
10 | *.so
11 |
12 | # Distribution / packaging
13 | .Python
14 | build/
15 | develop-eggs/
16 | dist/
17 | downloads/
18 | eggs/
19 | .eggs/
20 | lib/
21 | lib64/
22 | parts/
23 | sdist/
24 | var/
25 | wheels/
26 | *.egg-info/
27 | .installed.cfg
28 | *.egg
29 | MANIFEST
30 |
31 | # PyInstaller
32 | # Usually these files are written by a python script from a template
33 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
34 | *.manifest
35 | *.spec
36 |
37 | # Installer logs
38 | pip-log.txt
39 | pip-delete-this-directory.txt
40 |
41 | # Unit test / coverage reports
42 | htmlcov/
43 | .tox/
44 | .coverage
45 | .coverage.*
46 | .cache
47 | nosetests.xml
48 | coverage.xml
49 | *.cover
50 | .hypothesis/
51 | .pytest_cache/
52 |
53 | # Translations
54 | *.mo
55 | *.pot
56 |
57 | # Django stuff:
58 | *.log
59 | local_settings.py
60 | db.sqlite3
61 |
62 | # Flask stuff:
63 | instance/
64 | .webassets-cache
65 |
66 | # Scrapy stuff:
67 | .scrapy
68 |
69 | # Sphinx documentation
70 | docs/_build/
71 |
72 | # PyBuilder
73 | target/
74 |
75 | # Jupyter Notebook
76 | .ipynb_checkpoints
77 |
78 | # pyenv
79 | .python-version
80 |
81 | # celery beat schedule file
82 | celerybeat-schedule
83 |
84 | # SageMath parsed files
85 | *.sage.py
86 |
87 | # Environments
88 | .env
89 | .venv
90 | env/
91 | venv/
92 | ENV/
93 | env.bak/
94 | venv.bak/
95 |
96 | # Spyder project settings
97 | .spyderproject
98 | .spyproject
99 |
100 | # Rope project settings
101 | .ropeproject
102 |
103 | # mkdocs documentation
104 | /site
105 |
106 | # mypy
107 | .mypy_cache/
108 |
109 | .idea
110 | data/
111 | .DS_Store
--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2019 Paul Mendes
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 | [![Contributors][contributors-shield]][contributors-url]
7 | [![Forks][forks-shield]][forks-url]
8 | [![Stargazers][stars-shield]][stars-url]
9 | [![Issues][issues-shield]][issues-url]
10 | [![MIT License][license-shield]][license-url]
11 | [![LinkedIn][linkedin-shield]][linkedin-url]
12 |
13 |
14 |
15 |
16 |
17 |
18 |
19 |
20 |
21 |
22 | # Cryptopanic Scraper
23 | 
24 |
25 | A headless Chrome driver that automatically scrapes CryptoPanic's asynchronous news feed.
26 |
27 | [**Explore the docs »**](https://github.com/grilledchickenthighs/cryptopanic_scraper)
28 |
29 |
30 | [Report Bug](https://github.com/grilledchickenthighs/cryptopanic_scraper/issues)
31 | ·
32 | [Request Feature](https://github.com/grilledchickenthighs/cryptopanic_scraper/issues)
33 |
34 |
35 |
36 |
37 |
38 |
39 | ## Table of Contents
40 |
41 | * [About the Project](#about-the-project)
42 | * [Built With](#built-with)
43 | * [Getting Started](#getting-started)
44 | * [Prerequisites](#prerequisites)
45 | * [Installation](#installation)
46 | * [Usage](#usage)
47 | * [Roadmap](#roadmap)
48 | * [Contributing](#contributing)
49 | * [License](#license)
50 | * [Contact](#contact)
51 |
52 |
53 |
54 |
55 | ## About The Project
56 |
57 | [![Product Name Screen Shot][product-screenshot]](https://cryptopanic.com/)
58 |
59 | CryptoPanic is a crypto news aggregator that offers real-time news feeds of all things crypto, along
60 | with user-submitted votes and ratings.
61 | This project scrapes that data from the website so it can later be analyzed with NLP.
62 |
63 | ### Built With
64 |
65 | * [Python](https://github.com/topics/python)
66 | * [Selenium](https://github.com/topics/selenium)
67 |
68 |
69 |
70 |
71 | ## Getting Started
72 |
73 | To get a local copy up and running, follow these steps.
74 |
75 | ### Prerequisites
76 |
77 |
78 | * python 3
79 | * pip
80 |
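You can check both from a terminal:
```sh
python --version
pip --version
```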
81 |
82 | ### Installation
83 |
84 | 1. Clone the repository
85 | ```sh
86 | git clone https://github.com/grilledchickenthighs/cryptopanic_scraper.git
87 | ```
88 | 2. Change directory
89 | ```sh
90 | cd cryptopanic_scraper
91 | ```
92 | 3. Install packages
93 | ```sh
94 | pip install -r requirements.txt
95 | ```
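
Note: there is no need to download chromedriver manually; `webdriver-manager` downloads a chromedriver binary automatically the first time the script runs. You do need Google Chrome installed.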
96 |
97 |
98 |
99 |
100 | ## Usage
101 | Simply run:
102 | ```sh
103 | python cryptopanic_scraper.py --headless
104 | ```
105 | If you want to see it in action, run the script without any flags.
106 | ```sh
107 | python cryptopanic_scraper.py
108 | ```
109 | If you want to filter the type of news to scrape, add the --filter flag and choose
110 | one of {all,hot,rising,bullish,bearish,lol,commented,important,saved}:
111 | ```sh
112 | python cryptopanic_scraper.py --filter hot
113 | ```
114 | You can always use the --help flag if you forget these commands:
115 | ```sh
116 | python cryptopanic_scraper.py --help
117 |
118 | usage: cryptopanic_scraper.py [-h] [-v]
119 |                               [-f {all,hot,rising,bullish,bearish,lol,commented,important,saved}]
120 |                               [-s] [-l LIMIT]
121 | 
122 | optional arguments:
123 |   -h, --help            show this help message and exit
124 |   -v, --verbose         increase output verbosity
125 |   -f {all,hot,rising,bullish,bearish,lol,commented,important,saved}, --filter {all,hot,rising,bullish,bearish,lol,commented,important,saved}
126 |                         Type of News filter
127 |   -s, --headless        Run Chrome driver headless
128 |   -l LIMIT, --limit LIMIT
129 |                         Maximum number of news rows to scrape
130 | ```
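
For example, to scrape the first 50 rows of hot news without opening a visible browser window:
```sh
python cryptopanic_scraper.py --filter hot --headless --limit 50
```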
129 |
130 | If you're interested in analyzing the data, check out the [jupyter](https://github.com/GrilledChickenThighs/cryptopanic_scraper/tree/master/jupyter) directory to get started.
131 | 
132 | As a minimal sketch, a saved scrape can be loaded into a pandas DataFrame like this (the pickle filename below is illustrative; yours depends on the filter and date range of the run):
133 |
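```python
import pandas as pd

# saveData() writes pickles under data/ named cryptopanic_<filter>_<start>-><end>.pickle
data = pd.read_pickle("data/cryptopanic_all_2019-02-01->2019-02-05.pickle")

# Rows are stored as a dict keyed by an integer index, so build the
# DataFrame and transpose it to get one news item per row.
df = pd.DataFrame(data).T
df.head()
```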
134 |
135 | ## Roadmap
136 |
137 | See the [open issues](https://github.com/grilledchickenthighs/cryptopanic_scraper/issues) for a list of proposed features (and known issues).
138 |
139 |
140 |
141 |
142 | ## Contributing
143 |
144 | Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are **greatly appreciated**.
145 |
146 | 1. Fork the Project
147 | 2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
148 | 3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
149 | 4. Push to the Branch (`git push origin feature/AmazingFeature`)
150 | 5. Open a Pull Request
151 |
152 |
153 |
154 |
155 | ## License
156 |
157 | Distributed under the MIT License. See `LICENSE` for more information.
158 |
159 |
160 |
161 |
162 | ## Contact
163 |
164 | [Paul Mendes](https://grilledchickenthighs.github.io/) - [@BTCTradeNation](https://twitter.com/BTCTradeNation) - [paulseperformance@gmail.com](mailto:paulseperformance@gmail.com)
165 |
166 | Project Link: [https://github.com/grilledchickenthighs/cryptopanic_scraper](https://github.com/grilledchickenthighs/cryptopanic_scraper)
167 |
168 |
169 |
170 |
171 |
172 | [contributors-shield]: https://img.shields.io/github/contributors/grilledchickenthighs/cryptopanic_scraper?style=flat-square
173 | [contributors-url]: https://github.com/GrilledChickenThighs/cryptopanic_scraper/graphs/contributors
174 | [forks-shield]: https://img.shields.io/github/forks/grilledchickenthighs/cryptopanic_scraper?style=flat-square
175 | [forks-url]: https://github.com/GrilledChickenThighs/cryptopanic_scraper/network/members
176 | [stars-shield]: https://img.shields.io/github/stars/grilledchickenthighs/cryptopanic_scraper?style=flat-square
177 | [stars-url]: https://github.com/grilledchickenthighs/cryptopanic_scraper/stargazers
178 | [issues-shield]: https://img.shields.io/github/issues/grilledchickenthighs/cryptopanic_scraper.svg?style=flat-square
179 | [issues-url]: https://github.com/grilledchickenthighs/cryptopanic_scraper/issues
180 | [license-shield]: https://img.shields.io/github/license/grilledchickenthighs/cryptopanic_scraper.svg?style=flat-square
181 | [license-url]: https://github.com/grilledchickenthighs/cryptopanic_scraper/blob/master/LICENSE.txt
182 | [linkedin-shield]: https://img.shields.io/badge/-LinkedIn-black.svg?style=flat-square&logo=linkedin&colorB=555
183 | [linkedin-url]: https://linkedin.com/in/paul-mendes
184 | [product-screenshot]: images/screenshot.png
--------------------------------------------------------------------------------
/cryptopanic_scraper.py:
--------------------------------------------------------------------------------
1 | from selenium import webdriver
2 | import os
3 | import time
4 | import datetime
5 | import re
6 | import pickle
7 | import urllib.parse
8 | import argparse
9 | from webdriver_manager.chrome import ChromeDriverManager
10 | import pathlib
11 |
12 | parser = argparse.ArgumentParser()
13 | parser.add_argument("-v", "--verbose", help="increase output verbosity",
14 | action="store_true")
15 |
16 | parser.add_argument("-f", "--filter",
17 | help="Type of News filter",
18 | default="All",
19 | choices=['all', "hot", "rising", "bullish",
20 | "bearish", "lol", "commented", "important", "saved"])
21 |
22 | parser.add_argument("-s", "--headless", help="Run Chrome driver headless",
23 | action="store_true")
24 |
25 | parser.add_argument("-l", "--limit", help="Amount of pages to scrape",
26 | type=int, default=None)
27 |
28 |
29 | args = parser.parse_args()
30 |
31 | if args.verbose:
32 | print("verbosity turned on")
33 |
34 | # TODO: Create logger for exception handling
35 | # TODO: Replace print with logger
36 | # TODO: Create bash script or cron to automate this script
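# A hypothetical crontab entry for an hourly headless scrape might look like:
#   0 * * * * cd /path/to/cryptopanic_scraper && python cryptopanic_scraper.py --headless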
37 |
38 |
39 |
40 | SCROLL_PAUSE_TIME = 1
41 |
42 |
43 | def setUp():
44 |
45 | url = "https://www.cryptopanic.com/news?filter={}".format(args.filter)
46 |
47 | options = webdriver.ChromeOptions()
48 |
49 | # initialize headless mode
50 | if args.headless:
51 | options.add_argument('headless')
52 |
53 | # Don't load images
54 | prefs = {"profile.managed_default_content_settings.images": 2}
55 | options.add_experimental_option("prefs", prefs)
56 |
57 | # Set the window size
58 |     options.add_argument('window-size=1200,800')  # Chrome expects width,height
59 |
60 | # initialize the driver
61 | print("Initializing chromedriver.\n")
62 | driver = webdriver.Chrome(ChromeDriverManager().install(), options=options)
63 |
64 | print("Navigating to %s\n" % url)
65 | driver.get(url)
66 |
67 | # wait up to 2.5 seconds for the elements to become available
68 | driver.implicitly_wait(2.5)
69 |
70 | return driver
71 |
72 |
73 | def loadMore(len_elements):
74 | # Infinite scroll
75 |
76 | # Load More News
77 | load_more = driver.find_element_by_class_name('btn-outline-primary')
78 | driver.execute_script("arguments[0].scrollIntoView();", load_more)
79 |
80 | time.sleep(SCROLL_PAUSE_TIME)
81 |
82 | elements = driver.find_elements_by_css_selector('div.news-row.news-row-link')
83 | if len_elements < len(elements):
84 | if args.verbose:
85 | print("Loading %s more rows" % (len(elements) - len_elements))
86 | return True
87 | else:
88 | if args.verbose:
89 | print("No more rows to load :/")
90 | print("Total rows loaded: %s\n" % len(elements))
91 | return False
92 |
93 |
94 | def getData():
95 | data = dict()
96 | elements = driver.find_elements_by_css_selector('div.news-row.news-row-link')
97 |
98 |     total_rows = len(elements) - 7  # the element query returns the first 7 rows twice, so skip the duplicates
99 | print("Downloading Data...\n")
100 | start = datetime.datetime.now()
101 | print("Time Start: %s\n" % start)
102 |
103 | for i in range(total_rows):
104 |         if args.limit is not None and i >= args.limit:
105 | print(f'Limit argument of {args.limit} hit.')
106 | break
107 |         time.sleep(.5)  # brief pause between rows; keeps CPU usage down and goes easy on the site
108 | try:
109 | # Get date posted
110 | date_time = elements[i].find_element_by_css_selector('time').get_attribute('datetime')
111 | # string_date = re.sub('-.*', '', date_time)
112 | # date_time = datetime.datetime.strptime(string_date, "%a %b %d %Y %H:%M:%S %Z")
113 | # Get Title of News
114 | title = elements[i].find_element_by_css_selector("span.title-text span:nth-child(1)").text
115 | if title == '':
116 | driver.execute_script("arguments[0].scrollIntoView();",
117 | elements[i].find_element_by_css_selector("span.title-text"))
118 | title = elements[i].find_element_by_css_selector("span.title-text span:nth-child(1)").text
119 |
120 | # Get Source URL
121 | elements[i].find_element_by_css_selector("a.news-cell.nc-title").click()
122 | source_name = elements[i].find_element_by_css_selector("span.si-source-name").text
123 | source_link = driver.find_element_by_xpath("//div/h1/a[2]").get_property('href')
124 | source_url = re.sub(".*=", '', urllib.parse.unquote(source_link))
125 | driver.back()
126 |
127 | # Get Currency Tags
128 | currencies = []
129 | currency_elements = elements[i].find_elements_by_class_name("colored-link")
130 | for currency in currency_elements:
131 | currencies.append(currency.text)
132 |
133 | votes = dict()
134 | nc_votes = elements[i].find_elements_by_css_selector("span.nc-vote-cont")
135 |             for nc_vote in nc_votes:
136 |                 vote = nc_vote.get_attribute('title')  # e.g. "2 positive votes"
137 |                 value, action = vote.split(' ', 1)     # count first, label after (works for any digit count)
138 |                 action = action.replace('votes', '').strip()
139 |                 votes[action] = int(value)
140 |
141 | data[i] = {"Date": date_time,
142 | "Title": title,
143 | "Currencies": currencies,
144 | "Votes": votes,
145 | "Source": source_name,
146 | "URL": source_url}
147 | if args.verbose:
148 | print("Downloaded %s of %s\nPublished: %s\nTitle: %s\nSource: %s\nURL: %s\n" % (i + 1,
149 | total_rows,
150 | data[i]["Date"],
151 | data[i]["Title"],
152 | data[i]["Source"],
153 | data[i]["URL"]))
154 | except Exception as e:
155 | print(e)
156 |             raise  # re-raise while preserving the original traceback
157 |
158 | print("Finished gathering %s rows of data\n" % len(data))
159 | print("Time End: %.19s" % datetime.datetime.now())
160 | print("Elapsed Time Gathering Data: %.7s\n" % (datetime.datetime.now() - start))
161 |
162 | return data
163 |
164 |
165 | def saveData(data):
166 | # Save the website data
167 | file_name = "cryptopanic_{}_{:.10}->{:.10}.pickle".format(args.filter.lower(),
168 | str(data[len(data) - 1]['Date']),
169 | str(data[0]['Date']))
170 | # Make sure directory exists, if not make one.
171 | pathlib.Path("data").mkdir(parents=True, exist_ok=True)
172 |
173 | with open(os.path.join(os.getcwd(), 'data', file_name), 'wb') as f:
174 | pickle.dump(data, f)
175 |
176 | print("Saved data to %s\n" % file_name)
177 |
178 |
179 | def tearDown():
180 | if args.verbose:
181 | print("Exiting Chrome Driver")
182 | driver.quit()
183 |
184 |
185 | if __name__ == "__main__":
186 |     driver = setUp()  # start the browser only when run as a script, not on import
187 |     if args.limit is not None:
188 |         data_limit = args.limit
189 |     else:
190 |         data_limit = 100000  # effectively unlimited
191 | print("Loading News Feed...\n")
192 | while True:
193 |
194 | elements = driver.find_elements_by_css_selector('div.news-row.news-row-link')
195 |
196 | if len(elements) <= data_limit and loadMore(len(elements)):
197 | continue
198 | else:
199 | data = getData()
200 | saveData(data)
201 | tearDown()
202 | break
203 |
--------------------------------------------------------------------------------
/images/logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pAulseperformance/cryptopanic_scraper/a2784d9d1297bd8f1ec5ea32c33f671b35ea85a7/images/logo.png
--------------------------------------------------------------------------------
/images/screenshot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/pAulseperformance/cryptopanic_scraper/a2784d9d1297bd8f1ec5ea32c33f671b35ea85a7/images/screenshot.png
--------------------------------------------------------------------------------
/jupyter/Scratchpad.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 1,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "# df['Currencies'].apply(', '.join)"
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 2,
15 | "metadata": {},
16 | "outputs": [
17 | {
18 | "data": {
19 | "text/html": [
20 | "\n",
21 | " \n",
22 | "
\n",
23 | "
Loading BokehJS ...\n",
24 | "
"
25 | ]
26 | },
27 | "metadata": {},
28 | "output_type": "display_data"
29 | },
30 | {
31 | "data": {
32 | "application/javascript": [
33 | "\n",
34 | "(function(root) {\n",
35 | " function now() {\n",
36 | " return new Date();\n",
37 | " }\n",
38 | "\n",
39 | " var force = true;\n",
40 | "\n",
41 | " if (typeof (root._bokeh_onload_callbacks) === \"undefined\" || force === true) {\n",
42 | " root._bokeh_onload_callbacks = [];\n",
43 | " root._bokeh_is_loading = undefined;\n",
44 | " }\n",
45 | "\n",
46 | " var JS_MIME_TYPE = 'application/javascript';\n",
47 | " var HTML_MIME_TYPE = 'text/html';\n",
48 | " var EXEC_MIME_TYPE = 'application/vnd.bokehjs_exec.v0+json';\n",
49 | " var CLASS_NAME = 'output_bokeh rendered_html';\n",
50 | "\n",
51 | " /**\n",
52 | " * Render data to the DOM node\n",
53 | " */\n",
54 | " function render(props, node) {\n",
55 | " var script = document.createElement(\"script\");\n",
56 | " node.appendChild(script);\n",
57 | " }\n",
58 | "\n",
59 | " /**\n",
60 | " * Handle when an output is cleared or removed\n",
61 | " */\n",
62 | " function handleClearOutput(event, handle) {\n",
63 | " var cell = handle.cell;\n",
64 | "\n",
65 | " var id = cell.output_area._bokeh_element_id;\n",
66 | " var server_id = cell.output_area._bokeh_server_id;\n",
67 | " // Clean up Bokeh references\n",
68 | " if (id != null && id in Bokeh.index) {\n",
69 | " Bokeh.index[id].model.document.clear();\n",
70 | " delete Bokeh.index[id];\n",
71 | " }\n",
72 | "\n",
73 | " if (server_id !== undefined) {\n",
74 | " // Clean up Bokeh references\n",
75 | " var cmd = \"from bokeh.io.state import curstate; print(curstate().uuid_to_server['\" + server_id + \"'].get_sessions()[0].document.roots[0]._id)\";\n",
76 | " cell.notebook.kernel.execute(cmd, {\n",
77 | " iopub: {\n",
78 | " output: function(msg) {\n",
79 | " var id = msg.content.text.trim();\n",
80 | " if (id in Bokeh.index) {\n",
81 | " Bokeh.index[id].model.document.clear();\n",
82 | " delete Bokeh.index[id];\n",
83 | " }\n",
84 | " }\n",
85 | " }\n",
86 | " });\n",
87 | " // Destroy server and session\n",
88 | " var cmd = \"import bokeh.io.notebook as ion; ion.destroy_server('\" + server_id + \"')\";\n",
89 | " cell.notebook.kernel.execute(cmd);\n",
90 | " }\n",
91 | " }\n",
92 | "\n",
93 | " /**\n",
94 | " * Handle when a new output is added\n",
95 | " */\n",
96 | " function handleAddOutput(event, handle) {\n",
97 | " var output_area = handle.output_area;\n",
98 | " var output = handle.output;\n",
99 | "\n",
100 | " // limit handleAddOutput to display_data with EXEC_MIME_TYPE content only\n",
101 | " if ((output.output_type != \"display_data\") || (!output.data.hasOwnProperty(EXEC_MIME_TYPE))) {\n",
102 | " return\n",
103 | " }\n",
104 | "\n",
105 | " var toinsert = output_area.element.find(\".\" + CLASS_NAME.split(' ')[0]);\n",
106 | "\n",
107 | " if (output.metadata[EXEC_MIME_TYPE][\"id\"] !== undefined) {\n",
108 | " toinsert[toinsert.length - 1].firstChild.textContent = output.data[JS_MIME_TYPE];\n",
109 | " // store reference to embed id on output_area\n",
110 | " output_area._bokeh_element_id = output.metadata[EXEC_MIME_TYPE][\"id\"];\n",
111 | " }\n",
112 | " if (output.metadata[EXEC_MIME_TYPE][\"server_id\"] !== undefined) {\n",
113 | " var bk_div = document.createElement(\"div\");\n",
114 | " bk_div.innerHTML = output.data[HTML_MIME_TYPE];\n",
115 | " var script_attrs = bk_div.children[0].attributes;\n",
116 | " for (var i = 0; i < script_attrs.length; i++) {\n",
117 | " toinsert[toinsert.length - 1].firstChild.setAttribute(script_attrs[i].name, script_attrs[i].value);\n",
118 | " }\n",
119 | " // store reference to server id on output_area\n",
120 | " output_area._bokeh_server_id = output.metadata[EXEC_MIME_TYPE][\"server_id\"];\n",
121 | " }\n",
122 | " }\n",
123 | "\n",
124 | " function register_renderer(events, OutputArea) {\n",
125 | "\n",
126 | " function append_mime(data, metadata, element) {\n",
127 | " // create a DOM node to render to\n",
128 | " var toinsert = this.create_output_subarea(\n",
129 | " metadata,\n",
130 | " CLASS_NAME,\n",
131 | " EXEC_MIME_TYPE\n",
132 | " );\n",
133 | " this.keyboard_manager.register_events(toinsert);\n",
134 | " // Render to node\n",
135 | " var props = {data: data, metadata: metadata[EXEC_MIME_TYPE]};\n",
136 | " render(props, toinsert[toinsert.length - 1]);\n",
137 | " element.append(toinsert);\n",
138 | " return toinsert\n",
139 | " }\n",
140 | "\n",
141 | " /* Handle when an output is cleared or removed */\n",
142 | " events.on('clear_output.CodeCell', handleClearOutput);\n",
143 | " events.on('delete.Cell', handleClearOutput);\n",
144 | "\n",
145 | " /* Handle when a new output is added */\n",
146 | " events.on('output_added.OutputArea', handleAddOutput);\n",
147 | "\n",
148 | " /**\n",
149 | " * Register the mime type and append_mime function with output_area\n",
150 | " */\n",
151 | " OutputArea.prototype.register_mime_type(EXEC_MIME_TYPE, append_mime, {\n",
152 | " /* Is output safe? */\n",
153 | " safe: true,\n",
154 | " /* Index of renderer in `output_area.display_order` */\n",
155 | " index: 0\n",
156 | " });\n",
157 | " }\n",
158 | "\n",
159 | " // register the mime type if in Jupyter Notebook environment and previously unregistered\n",
160 | " if (root.Jupyter !== undefined) {\n",
161 | " var events = require('base/js/events');\n",
162 | " var OutputArea = require('notebook/js/outputarea').OutputArea;\n",
163 | "\n",
164 | " if (OutputArea.prototype.mime_types().indexOf(EXEC_MIME_TYPE) == -1) {\n",
165 | " register_renderer(events, OutputArea);\n",
166 | " }\n",
167 | " }\n",
168 | "\n",
169 | " \n",
170 | " if (typeof (root._bokeh_timeout) === \"undefined\" || force === true) {\n",
171 | " root._bokeh_timeout = Date.now() + 5000;\n",
172 | " root._bokeh_failed_load = false;\n",
173 | " }\n",
174 | "\n",
175 | " var NB_LOAD_WARNING = {'data': {'text/html':\n",
176 | " \"\\n\"+\n",
177 | " \"
\\n\"+\n",
178 | " \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n",
179 | " \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n",
180 | " \"
\\n\"+\n",
181 | " \"
\\n\"+\n",
182 | " \"- re-rerun `output_notebook()` to attempt to load from CDN again, or
\\n\"+\n",
183 | " \"- use INLINE resources instead, as so:
\\n\"+\n",
184 | " \"
\\n\"+\n",
185 | " \"
\\n\"+\n",
186 | " \"from bokeh.resources import INLINE\\n\"+\n",
187 | " \"output_notebook(resources=INLINE)\\n\"+\n",
188 | " \"
\\n\"+\n",
189 | " \"
\"}};\n",
190 | "\n",
191 | " function display_loaded() {\n",
192 | " var el = document.getElementById(\"eaa22502-2ee8-4116-b1ce-02e804e16a5a\");\n",
193 | " if (el != null) {\n",
194 | " el.textContent = \"BokehJS is loading...\";\n",
195 | " }\n",
196 | " if (root.Bokeh !== undefined) {\n",
197 | " if (el != null) {\n",
198 | " el.textContent = \"BokehJS \" + root.Bokeh.version + \" successfully loaded.\";\n",
199 | " }\n",
200 | " } else if (Date.now() < root._bokeh_timeout) {\n",
201 | " setTimeout(display_loaded, 100)\n",
202 | " }\n",
203 | " }\n",
204 | "\n",
205 | "\n",
206 | " function run_callbacks() {\n",
207 | " try {\n",
208 | " root._bokeh_onload_callbacks.forEach(function(callback) { callback() });\n",
209 | " }\n",
210 | " finally {\n",
211 | " delete root._bokeh_onload_callbacks\n",
212 | " }\n",
213 | " console.info(\"Bokeh: all callbacks have finished\");\n",
214 | " }\n",
215 | "\n",
216 | " function load_libs(js_urls, callback) {\n",
217 | " root._bokeh_onload_callbacks.push(callback);\n",
218 | " if (root._bokeh_is_loading > 0) {\n",
219 | " console.log(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n",
220 | " return null;\n",
221 | " }\n",
222 | " if (js_urls == null || js_urls.length === 0) {\n",
223 | " run_callbacks();\n",
224 | " return null;\n",
225 | " }\n",
226 | " console.log(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n",
227 | " root._bokeh_is_loading = js_urls.length;\n",
228 | " for (var i = 0; i < js_urls.length; i++) {\n",
229 | " var url = js_urls[i];\n",
230 | " var s = document.createElement('script');\n",
231 | " s.src = url;\n",
232 | " s.async = false;\n",
233 | " s.onreadystatechange = s.onload = function() {\n",
234 | " root._bokeh_is_loading--;\n",
235 | " if (root._bokeh_is_loading === 0) {\n",
236 | " console.log(\"Bokeh: all BokehJS libraries loaded\");\n",
237 | " run_callbacks()\n",
238 | " }\n",
239 | " };\n",
240 | " s.onerror = function() {\n",
241 | " console.warn(\"failed to load library \" + url);\n",
242 | " };\n",
243 | " console.log(\"Bokeh: injecting script tag for BokehJS library: \", url);\n",
244 | " document.getElementsByTagName(\"head\")[0].appendChild(s);\n",
245 | " }\n",
246 | " };var element = document.getElementById(\"eaa22502-2ee8-4116-b1ce-02e804e16a5a\");\n",
247 | " if (element == null) {\n",
248 | " console.log(\"Bokeh: ERROR: autoload.js configured with elementid 'eaa22502-2ee8-4116-b1ce-02e804e16a5a' but no matching script tag was found. \")\n",
249 | " return false;\n",
250 | " }\n",
251 | "\n",
252 | " var js_urls = [\"https://cdn.pydata.org/bokeh/release/bokeh-0.13.0.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-widgets-0.13.0.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-tables-0.13.0.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-gl-0.13.0.min.js\"];\n",
253 | "\n",
254 | " var inline_js = [\n",
255 | " function(Bokeh) {\n",
256 | " Bokeh.set_log_level(\"info\");\n",
257 | " },\n",
258 | " \n",
259 | " function(Bokeh) {\n",
260 | " \n",
261 | " },\n",
262 | " function(Bokeh) {\n",
263 | " console.log(\"Bokeh: injecting CSS: https://cdn.pydata.org/bokeh/release/bokeh-0.13.0.min.css\");\n",
264 | " Bokeh.embed.inject_css(\"https://cdn.pydata.org/bokeh/release/bokeh-0.13.0.min.css\");\n",
265 | " console.log(\"Bokeh: injecting CSS: https://cdn.pydata.org/bokeh/release/bokeh-widgets-0.13.0.min.css\");\n",
266 | " Bokeh.embed.inject_css(\"https://cdn.pydata.org/bokeh/release/bokeh-widgets-0.13.0.min.css\");\n",
267 | " console.log(\"Bokeh: injecting CSS: https://cdn.pydata.org/bokeh/release/bokeh-tables-0.13.0.min.css\");\n",
268 | " Bokeh.embed.inject_css(\"https://cdn.pydata.org/bokeh/release/bokeh-tables-0.13.0.min.css\");\n",
269 | " }\n",
270 | " ];\n",
271 | "\n",
272 | " function run_inline_js() {\n",
273 | " \n",
274 | " if ((root.Bokeh !== undefined) || (force === true)) {\n",
275 | " for (var i = 0; i < inline_js.length; i++) {\n",
276 | " inline_js[i].call(root, root.Bokeh);\n",
277 | " }if (force === true) {\n",
278 | " display_loaded();\n",
279 | " }} else if (Date.now() < root._bokeh_timeout) {\n",
280 | " setTimeout(run_inline_js, 100);\n",
281 | " } else if (!root._bokeh_failed_load) {\n",
282 | " console.log(\"Bokeh: BokehJS failed to load within specified timeout.\");\n",
283 | " root._bokeh_failed_load = true;\n",
284 | " } else if (force !== true) {\n",
285 | " var cell = $(document.getElementById(\"eaa22502-2ee8-4116-b1ce-02e804e16a5a\")).parents('.cell').data().cell;\n",
286 | " cell.output_area.append_execute_result(NB_LOAD_WARNING)\n",
287 | " }\n",
288 | "\n",
289 | " }\n",
290 | "\n",
291 | " if (root._bokeh_is_loading === 0) {\n",
292 | " console.log(\"Bokeh: BokehJS loaded, going straight to plotting\");\n",
293 | " run_inline_js();\n",
294 | " } else {\n",
295 | " load_libs(js_urls, function() {\n",
296 | " console.log(\"Bokeh: BokehJS plotting callback run at\", now());\n",
297 | " run_inline_js();\n",
298 | " });\n",
299 | " }\n",
300 | "}(window));"
301 | ],
302 | "application/vnd.bokehjs_load.v0+json": "\n(function(root) {\n function now() {\n return new Date();\n }\n\n var force = true;\n\n if (typeof (root._bokeh_onload_callbacks) === \"undefined\" || force === true) {\n root._bokeh_onload_callbacks = [];\n root._bokeh_is_loading = undefined;\n }\n\n \n\n \n if (typeof (root._bokeh_timeout) === \"undefined\" || force === true) {\n root._bokeh_timeout = Date.now() + 5000;\n root._bokeh_failed_load = false;\n }\n\n var NB_LOAD_WARNING = {'data': {'text/html':\n \"\\n\"+\n \"
\\n\"+\n \"BokehJS does not appear to have successfully loaded. If loading BokehJS from CDN, this \\n\"+\n \"may be due to a slow or bad network connection. Possible fixes:\\n\"+\n \"
\\n\"+\n \"
\\n\"+\n \"- re-rerun `output_notebook()` to attempt to load from CDN again, or
\\n\"+\n \"- use INLINE resources instead, as so:
\\n\"+\n \"
\\n\"+\n \"
\\n\"+\n \"from bokeh.resources import INLINE\\n\"+\n \"output_notebook(resources=INLINE)\\n\"+\n \"
\\n\"+\n \"
\"}};\n\n function display_loaded() {\n var el = document.getElementById(\"eaa22502-2ee8-4116-b1ce-02e804e16a5a\");\n if (el != null) {\n el.textContent = \"BokehJS is loading...\";\n }\n if (root.Bokeh !== undefined) {\n if (el != null) {\n el.textContent = \"BokehJS \" + root.Bokeh.version + \" successfully loaded.\";\n }\n } else if (Date.now() < root._bokeh_timeout) {\n setTimeout(display_loaded, 100)\n }\n }\n\n\n function run_callbacks() {\n try {\n root._bokeh_onload_callbacks.forEach(function(callback) { callback() });\n }\n finally {\n delete root._bokeh_onload_callbacks\n }\n console.info(\"Bokeh: all callbacks have finished\");\n }\n\n function load_libs(js_urls, callback) {\n root._bokeh_onload_callbacks.push(callback);\n if (root._bokeh_is_loading > 0) {\n console.log(\"Bokeh: BokehJS is being loaded, scheduling callback at\", now());\n return null;\n }\n if (js_urls == null || js_urls.length === 0) {\n run_callbacks();\n return null;\n }\n console.log(\"Bokeh: BokehJS not loaded, scheduling load and callback at\", now());\n root._bokeh_is_loading = js_urls.length;\n for (var i = 0; i < js_urls.length; i++) {\n var url = js_urls[i];\n var s = document.createElement('script');\n s.src = url;\n s.async = false;\n s.onreadystatechange = s.onload = function() {\n root._bokeh_is_loading--;\n if (root._bokeh_is_loading === 0) {\n console.log(\"Bokeh: all BokehJS libraries loaded\");\n run_callbacks()\n }\n };\n s.onerror = function() {\n console.warn(\"failed to load library \" + url);\n };\n console.log(\"Bokeh: injecting script tag for BokehJS library: \", url);\n document.getElementsByTagName(\"head\")[0].appendChild(s);\n }\n };var element = document.getElementById(\"eaa22502-2ee8-4116-b1ce-02e804e16a5a\");\n if (element == null) {\n console.log(\"Bokeh: ERROR: autoload.js configured with elementid 'eaa22502-2ee8-4116-b1ce-02e804e16a5a' but no matching script tag was found. 
\")\n return false;\n }\n\n var js_urls = [\"https://cdn.pydata.org/bokeh/release/bokeh-0.13.0.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-widgets-0.13.0.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-tables-0.13.0.min.js\", \"https://cdn.pydata.org/bokeh/release/bokeh-gl-0.13.0.min.js\"];\n\n var inline_js = [\n function(Bokeh) {\n Bokeh.set_log_level(\"info\");\n },\n \n function(Bokeh) {\n \n },\n function(Bokeh) {\n console.log(\"Bokeh: injecting CSS: https://cdn.pydata.org/bokeh/release/bokeh-0.13.0.min.css\");\n Bokeh.embed.inject_css(\"https://cdn.pydata.org/bokeh/release/bokeh-0.13.0.min.css\");\n console.log(\"Bokeh: injecting CSS: https://cdn.pydata.org/bokeh/release/bokeh-widgets-0.13.0.min.css\");\n Bokeh.embed.inject_css(\"https://cdn.pydata.org/bokeh/release/bokeh-widgets-0.13.0.min.css\");\n console.log(\"Bokeh: injecting CSS: https://cdn.pydata.org/bokeh/release/bokeh-tables-0.13.0.min.css\");\n Bokeh.embed.inject_css(\"https://cdn.pydata.org/bokeh/release/bokeh-tables-0.13.0.min.css\");\n }\n ];\n\n function run_inline_js() {\n \n if ((root.Bokeh !== undefined) || (force === true)) {\n for (var i = 0; i < inline_js.length; i++) {\n inline_js[i].call(root, root.Bokeh);\n }if (force === true) {\n display_loaded();\n }} else if (Date.now() < root._bokeh_timeout) {\n setTimeout(run_inline_js, 100);\n } else if (!root._bokeh_failed_load) {\n console.log(\"Bokeh: BokehJS failed to load within specified timeout.\");\n root._bokeh_failed_load = true;\n } else if (force !== true) {\n var cell = $(document.getElementById(\"eaa22502-2ee8-4116-b1ce-02e804e16a5a\")).parents('.cell').data().cell;\n cell.output_area.append_execute_result(NB_LOAD_WARNING)\n }\n\n }\n\n if (root._bokeh_is_loading === 0) {\n console.log(\"Bokeh: BokehJS loaded, going straight to plotting\");\n run_inline_js();\n } else {\n load_libs(js_urls, function() {\n console.log(\"Bokeh: BokehJS plotting callback run at\", now());\n run_inline_js();\n });\n }\n}(window));"
303 | },
304 | "metadata": {},
305 | "output_type": "display_data"
306 | }
307 | ],
308 | "source": [
309 | "# Now that the data is cleaned and we have a backup of our df let's explore\n",
310 | "from bokeh.plotting import figure, output_notebook, show\n",
311 | "output_notebook()\n",
312 | "from bokeh.models import ColumnDataSource\n",
313 | "from bokeh.models.tools import HoverTool\n",
314 | "from bokeh.transform import factor_cmap\n",
315 | "from bokeh.palettes import Spectral5, Spectral3, inferno, viridis, Category20"
316 | ]
317 | },
318 | {
319 | "cell_type": "code",
320 | "execution_count": 3,
321 | "metadata": {},
322 | "outputs": [
323 | {
324 | "ename": "NameError",
325 | "evalue": "name 'df' is not defined",
326 | "output_type": "error",
327 | "traceback": [
328 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
329 | "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
330 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0msource\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mColumnDataSource\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdf\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0mtypes\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdf\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'Currencies'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0munique\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtolist\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0mcolor_map\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mfactor_cmap\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfield_name\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'Currencies'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpalette\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mviridis\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m18\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfactors\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mtypes\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
331 | "\u001b[0;31mNameError\u001b[0m: name 'df' is not defined"
332 | ]
333 | }
334 | ],
335 | "source": [
336 | "source = ColumnDataSource(df)\n",
337 | "types = df['Currencies'].unique().tolist()\n",
338 | "color_map = factor_cmap(field_name='Currencies', palette=viridis(18), factors=types)"
339 | ]
340 | },
341 | {
342 | "cell_type": "code",
343 | "execution_count": 4,
344 | "metadata": {},
345 | "outputs": [
346 | {
347 | "ename": "NameError",
348 | "evalue": "name 'source' is not defined",
349 | "output_type": "error",
350 | "traceback": [
351 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
352 | "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
353 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mp\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mfigure\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx_axis_type\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'datetime'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcircle\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'Date'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'positive'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msource\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0msource\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msize\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m10\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcolor\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mcolor_map\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 4\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;31m# p.title.text = 'Pokemon Attack vs Speed'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
354 | "\u001b[0;31mNameError\u001b[0m: name 'source' is not defined"
355 | ]
356 | }
357 | ],
358 | "source": [
359 | "p = figure(x_axis_type='datetime')\n",
360 | "\n",
361 | "p.circle(x='Date', y='positive', source=source, size=10, color=color_map)\n",
362 | "\n",
363 | "# p.title.text = 'Pokemon Attack vs Speed'\n",
364 | "# p.xaxis.axis_label = 'Attacking Stats'\n",
365 | "# p.yaxis.axis_label = 'Speed Stats'\n",
366 | "\n",
367 | "hover = HoverTool()\n",
368 | "hover.tooltips=[\n",
369 | " ('Positive', '@positive'),\n",
370 | " ('Negative', '@negative'),\n",
371 | " ('Important', '@{important}'),\n",
372 | " ('Title', '@Title'),\n",
373 | "]\n",
374 | "\n",
375 | "p.add_tools(hover)\n",
376 | "\n",
377 | "show(p)"
378 | ]
379 | },
380 | {
381 | "cell_type": "code",
382 | "execution_count": 5,
383 | "metadata": {},
384 | "outputs": [
385 | {
386 | "ename": "NameError",
387 | "evalue": "name 'df' is not defined",
388 | "output_type": "error",
389 | "traceback": [
390 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
391 | "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
392 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mattribs\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgroupby\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'Currencies'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'positive'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmean\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
393 | "\u001b[0;31mNameError\u001b[0m: name 'df' is not defined"
394 | ]
395 | }
396 | ],
397 | "source": [
398 | "attribs = df.groupby('Currencies')['positive'].mean()"
399 | ]
400 | },
401 | {
402 | "cell_type": "code",
403 | "execution_count": 6,
404 | "metadata": {},
405 | "outputs": [
406 | {
407 | "ename": "NameError",
408 | "evalue": "name 'pd' is not defined",
409 | "output_type": "error",
410 | "traceback": [
411 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
412 | "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
413 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mdf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mconcat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mdf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdrop\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'Currencies'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdf\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'Currencies'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mapply\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mSeries\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfillna\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0mdf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdescribe\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
414 | "\u001b[0;31mNameError\u001b[0m: name 'pd' is not defined"
415 | ]
416 | }
417 | ],
418 | "source": [
419 | "df = pd.concat([df.drop(['Currencies'], axis=1), df['Currencies'].apply(pd.Series)], axis=1).fillna(0)\n",
420 | "df.describe()"
421 | ]
422 | },
423 | {
424 | "cell_type": "code",
425 | "execution_count": null,
426 | "metadata": {},
427 | "outputs": [],
428 | "source": []
429 | }
430 | ],
431 | "metadata": {
432 | "kernelspec": {
433 | "display_name": "Python 3",
434 | "language": "python",
435 | "name": "python3"
436 | },
437 | "language_info": {
438 | "codemirror_mode": {
439 | "name": "ipython",
440 | "version": 3
441 | },
442 | "file_extension": ".py",
443 | "mimetype": "text/x-python",
444 | "name": "python",
445 | "nbconvert_exporter": "python",
446 | "pygments_lexer": "ipython3",
447 | "version": "3.6.6"
448 | }
449 | },
450 | "nbformat": 4,
451 | "nbformat_minor": 2
452 | }
453 |
--------------------------------------------------------------------------------
/jupyter/eda.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 3,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "# I want to look at how each cryptocurrency compares with the number of votes it recieves.\n",
10 | "# I want to look at how big of a response each source gets"
11 | ]
12 | },
13 | {
14 | "cell_type": "code",
15 | "execution_count": 4,
16 | "metadata": {},
17 | "outputs": [
18 | {
19 | "data": {
20 | "text/html": [
21 | "\n",
22 | "\n",
35 | "
\n",
36 | " \n",
37 | " \n",
38 | " | \n",
39 | " Currencies | \n",
40 | " Date | \n",
41 | " Source | \n",
42 | " Title | \n",
43 | " URL | \n",
44 | " Votes | \n",
45 | "
\n",
46 | " \n",
47 | " \n",
48 | " \n",
49 | " 0 | \n",
50 | " [] | \n",
51 | " 2019-02-05 21:17:27 | \n",
52 | " cryptoglobe.com | \n",
53 | " Crypto-Fund Assets at All-Time High Despite Be... | \n",
54 | " https://www.cryptoglobe.com/latest/2019/02/cry... | \n",
55 | " {} | \n",
56 | "
\n",
57 | " \n",
58 | " 1 | \n",
59 | " [BTC] | \n",
60 | " 2019-02-05 21:07:00 | \n",
61 | " cointelegraph.com | \n",
62 | " Crypto Firm Accused of Fraud, Duping Investor ... | \n",
63 | " https://cointelegraph.com/news/crypto-firm-acc... | \n",
64 | " {} | \n",
65 | "
\n",
66 | " \n",
67 | " 2 | \n",
68 | " [] | \n",
69 | " 2019-02-05 20:20:05 | \n",
70 | " cryptoslate.com | \n",
71 | " Analysis of UAE and Saudi Arabia’s Government ... | \n",
72 | " https://cryptoslate.com/uae-saudi-arabia-launc... | \n",
73 | " {} | \n",
74 | "
\n",
75 | " \n",
76 | " 3 | \n",
77 | " [] | \n",
78 | " 2019-02-05 20:00:54 | \n",
79 | " bitcoinist.com | \n",
80 | " Bottom Feeders’ Time to Shine | \n",
81 | " https://bitcoinist.com/bottom-feeders-time-to-... | \n",
82 | " {} | \n",
83 | "
\n",
84 | " \n",
85 | " 4 | \n",
86 | " [BTC] | \n",
87 | " 2019-02-05 19:56:00 | \n",
88 | " cointelegraph.com | \n",
89 | " US Trading Platform LedgerX Introduces Binary ... | \n",
90 | " https://cointelegraph.com/news/us-trading-plat... | \n",
91 | " {'positive': 1, 'like': 1} | \n",
92 | "
\n",
93 | " \n",
94 | "
\n",
95 | "
"
96 | ],
97 | "text/plain": [
98 | " Currencies Date Source \\\n",
99 | "0 [] 2019-02-05 21:17:27 cryptoglobe.com \n",
100 | "1 [BTC] 2019-02-05 21:07:00 cointelegraph.com \n",
101 | "2 [] 2019-02-05 20:20:05 cryptoslate.com \n",
102 | "3 [] 2019-02-05 20:00:54 bitcoinist.com \n",
103 | "4 [BTC] 2019-02-05 19:56:00 cointelegraph.com \n",
104 | "\n",
105 | " Title \\\n",
106 | "0 Crypto-Fund Assets at All-Time High Despite Be... \n",
107 | "1 Crypto Firm Accused of Fraud, Duping Investor ... \n",
108 | "2 Analysis of UAE and Saudi Arabia’s Government ... \n",
109 | "3 Bottom Feeders’ Time to Shine \n",
110 | "4 US Trading Platform LedgerX Introduces Binary ... \n",
111 | "\n",
112 | " URL \\\n",
113 | "0 https://www.cryptoglobe.com/latest/2019/02/cry... \n",
114 | "1 https://cointelegraph.com/news/crypto-firm-acc... \n",
115 | "2 https://cryptoslate.com/uae-saudi-arabia-launc... \n",
116 | "3 https://bitcoinist.com/bottom-feeders-time-to-... \n",
117 | "4 https://cointelegraph.com/news/us-trading-plat... \n",
118 | "\n",
119 | " Votes \n",
120 | "0 {} \n",
121 | "1 {} \n",
122 | "2 {} \n",
123 | "3 {} \n",
124 | "4 {'positive': 1, 'like': 1} "
125 | ]
126 | },
127 | "execution_count": 4,
128 | "metadata": {},
129 | "output_type": "execute_result"
130 | }
131 | ],
132 | "source": [
133 | "import pandas as pd\n",
134 | "\n",
135 | "# Read pickle and transform dataframe\n",
136 | "\n",
137 | "data = pd.read_pickle(\"../data/cryptopanic_all_2019-02-01->2019-02-05.pickle\")\n",
138 | "df = pd.DataFrame(data)\n",
139 | "df = df.T\n",
140 | "df.head()"
141 | ]
142 | },
143 | {
144 | "cell_type": "code",
145 | "execution_count": 5,
146 | "metadata": {},
147 | "outputs": [],
148 | "source": [
149 | "# Make Datetime column the index\n",
150 | "df.index = df['Date']\n",
151 | "df.drop(columns='Date', inplace=True)"
152 | ]
153 | },
154 | {
155 | "cell_type": "code",
156 | "execution_count": 6,
157 | "metadata": {},
158 | "outputs": [
159 | {
160 | "data": {
161 | "text/html": [
162 | "\n",
163 | "\n",
176 | "
\n",
177 | " \n",
178 | " \n",
179 | " | \n",
180 | " Currencies | \n",
181 | " Source | \n",
182 | " Title | \n",
183 | " URL | \n",
184 | " comments | \n",
185 | " dislike | \n",
186 | " important | \n",
187 | " like | \n",
188 | " lol | \n",
189 | " negative | \n",
190 | " positive | \n",
191 | " save | \n",
192 | " saves | \n",
193 | "
\n",
194 | " \n",
195 | " Date | \n",
196 | " | \n",
197 | " | \n",
198 | " | \n",
199 | " | \n",
200 | " | \n",
201 | " | \n",
202 | " | \n",
203 | " | \n",
204 | " | \n",
205 | " | \n",
206 | " | \n",
207 | " | \n",
208 | " | \n",
209 | "
\n",
210 | " \n",
211 | " \n",
212 | " \n",
213 | " 2019-02-05 21:17:27 | \n",
214 | " [] | \n",
215 | " cryptoglobe.com | \n",
216 | " Crypto-Fund Assets at All-Time High Despite Be... | \n",
217 | " https://www.cryptoglobe.com/latest/2019/02/cry... | \n",
218 | " 0.0 | \n",
219 | " 0.0 | \n",
220 | " 0.0 | \n",
221 | " 0.0 | \n",
222 | " 0.0 | \n",
223 | " 0.0 | \n",
224 | " 0.0 | \n",
225 | " 0.0 | \n",
226 | " 0.0 | \n",
227 | "
\n",
228 | " \n",
229 | " 2019-02-05 21:07:00 | \n",
230 | " [BTC] | \n",
231 | " cointelegraph.com | \n",
232 | " Crypto Firm Accused of Fraud, Duping Investor ... | \n",
233 | " https://cointelegraph.com/news/crypto-firm-acc... | \n",
234 | " 0.0 | \n",
235 | " 0.0 | \n",
236 | " 0.0 | \n",
237 | " 0.0 | \n",
238 | " 0.0 | \n",
239 | " 0.0 | \n",
240 | " 0.0 | \n",
241 | " 0.0 | \n",
242 | " 0.0 | \n",
243 | "
\n",
244 | " \n",
245 | " 2019-02-05 20:20:05 | \n",
246 | " [] | \n",
247 | " cryptoslate.com | \n",
248 | " Analysis of UAE and Saudi Arabia’s Government ... | \n",
249 | " https://cryptoslate.com/uae-saudi-arabia-launc... | \n",
250 | " 0.0 | \n",
251 | " 0.0 | \n",
252 | " 0.0 | \n",
253 | " 0.0 | \n",
254 | " 0.0 | \n",
255 | " 0.0 | \n",
256 | " 0.0 | \n",
257 | " 0.0 | \n",
258 | " 0.0 | \n",
259 | "
\n",
260 | " \n",
261 | " 2019-02-05 20:00:54 | \n",
262 | " [] | \n",
263 | " bitcoinist.com | \n",
264 | " Bottom Feeders’ Time to Shine | \n",
265 | " https://bitcoinist.com/bottom-feeders-time-to-... | \n",
266 | " 0.0 | \n",
267 | " 0.0 | \n",
268 | " 0.0 | \n",
269 | " 0.0 | \n",
270 | " 0.0 | \n",
271 | " 0.0 | \n",
272 | " 0.0 | \n",
273 | " 0.0 | \n",
274 | " 0.0 | \n",
275 | "
\n",
276 | " \n",
277 | " 2019-02-05 19:56:00 | \n",
278 | " [BTC] | \n",
279 | " cointelegraph.com | \n",
280 | " US Trading Platform LedgerX Introduces Binary ... | \n",
281 | " https://cointelegraph.com/news/us-trading-plat... | \n",
282 | " 0.0 | \n",
283 | " 0.0 | \n",
284 | " 0.0 | \n",
285 | " 1.0 | \n",
286 | " 0.0 | \n",
287 | " 0.0 | \n",
288 | " 1.0 | \n",
289 | " 0.0 | \n",
290 | " 0.0 | \n",
291 | "
\n",
292 | " \n",
293 | "
\n",
294 | "
"
295 | ],
296 | "text/plain": [
297 | " Currencies Source \\\n",
298 | "Date \n",
299 | "2019-02-05 21:17:27 [] cryptoglobe.com \n",
300 | "2019-02-05 21:07:00 [BTC] cointelegraph.com \n",
301 | "2019-02-05 20:20:05 [] cryptoslate.com \n",
302 | "2019-02-05 20:00:54 [] bitcoinist.com \n",
303 | "2019-02-05 19:56:00 [BTC] cointelegraph.com \n",
304 | "\n",
305 | " Title \\\n",
306 | "Date \n",
307 | "2019-02-05 21:17:27 Crypto-Fund Assets at All-Time High Despite Be... \n",
308 | "2019-02-05 21:07:00 Crypto Firm Accused of Fraud, Duping Investor ... \n",
309 | "2019-02-05 20:20:05 Analysis of UAE and Saudi Arabia’s Government ... \n",
310 | "2019-02-05 20:00:54 Bottom Feeders’ Time to Shine \n",
311 | "2019-02-05 19:56:00 US Trading Platform LedgerX Introduces Binary ... \n",
312 | "\n",
313 | " URL \\\n",
314 | "Date \n",
315 | "2019-02-05 21:17:27 https://www.cryptoglobe.com/latest/2019/02/cry... \n",
316 | "2019-02-05 21:07:00 https://cointelegraph.com/news/crypto-firm-acc... \n",
317 | "2019-02-05 20:20:05 https://cryptoslate.com/uae-saudi-arabia-launc... \n",
318 | "2019-02-05 20:00:54 https://bitcoinist.com/bottom-feeders-time-to-... \n",
319 | "2019-02-05 19:56:00 https://cointelegraph.com/news/us-trading-plat... \n",
320 | "\n",
321 | " comments dislike important like lol negative \\\n",
322 | "Date \n",
323 | "2019-02-05 21:17:27 0.0 0.0 0.0 0.0 0.0 0.0 \n",
324 | "2019-02-05 21:07:00 0.0 0.0 0.0 0.0 0.0 0.0 \n",
325 | "2019-02-05 20:20:05 0.0 0.0 0.0 0.0 0.0 0.0 \n",
326 | "2019-02-05 20:00:54 0.0 0.0 0.0 0.0 0.0 0.0 \n",
327 | "2019-02-05 19:56:00 0.0 0.0 0.0 1.0 0.0 0.0 \n",
328 | "\n",
329 | " positive save saves \n",
330 | "Date \n",
331 | "2019-02-05 21:17:27 0.0 0.0 0.0 \n",
332 | "2019-02-05 21:07:00 0.0 0.0 0.0 \n",
333 | "2019-02-05 20:20:05 0.0 0.0 0.0 \n",
334 | "2019-02-05 20:00:54 0.0 0.0 0.0 \n",
335 | "2019-02-05 19:56:00 1.0 0.0 0.0 "
336 | ]
337 | },
338 | "execution_count": 6,
339 | "metadata": {},
340 | "output_type": "execute_result"
341 | }
342 | ],
343 | "source": [
344 | "# Split Votes list into separate columns and fill NaN values\n",
345 | "df = pd.concat([df.drop(['Votes'], axis=1), df['Votes'].apply(pd.Series)], axis=1).fillna(0)\n",
346 | "df.head()"
347 | ]
348 | },
349 | {
350 | "cell_type": "code",
351 | "execution_count": 7,
352 | "metadata": {},
353 | "outputs": [],
354 | "source": [
355 | "# Remove list from Currency column\n",
356 | "df['Currencies'] = df['Currencies'].apply(', '.join)"
357 | ]
358 | },
359 | {
360 | "cell_type": "code",
361 | "execution_count": 8,
362 | "metadata": {},
363 | "outputs": [],
364 | "source": [
365 | "# find all unique currencies\n",
366 | "unique_currencies = set([c for i in df.Currencies for c in i])"
367 | ]
368 | },
369 | {
370 | "cell_type": "code",
371 | "execution_count": 55,
372 | "metadata": {},
373 | "outputs": [
374 | {
375 | "data": {
376 | "text/html": [
377 | "\n",
378 | "\n",
391 | "
\n",
392 | " \n",
393 | " \n",
394 | " | \n",
395 | " comments | \n",
396 | " dislike | \n",
397 | " important | \n",
398 | " like | \n",
399 | " lol | \n",
400 | " negative | \n",
401 | " positive | \n",
402 | " save | \n",
403 | " saves | \n",
404 | "
\n",
405 | " \n",
406 | " Currencies | \n",
407 | " | \n",
408 | " | \n",
409 | " | \n",
410 | " | \n",
411 | " | \n",
412 | " | \n",
413 | " | \n",
414 | " | \n",
415 | " | \n",
416 | "
\n",
417 | " \n",
418 | " \n",
419 | " \n",
420 | " XRP | \n",
421 | " 28 | \n",
422 | " 28 | \n",
423 | " 28 | \n",
424 | " 28 | \n",
425 | " 28 | \n",
426 | " 28 | \n",
427 | " 28 | \n",
428 | " 28 | \n",
429 | " 28 | \n",
430 | "
\n",
431 | " \n",
432 | " XRP, EOS, XLM | \n",
433 | " 1 | \n",
434 | " 1 | \n",
435 | " 1 | \n",
436 | " 1 | \n",
437 | " 1 | \n",
438 | " 1 | \n",
439 | " 1 | \n",
440 | " 1 | \n",
441 | " 1 | \n",
442 | "
\n",
443 | " \n",
444 | " XRP, ETH | \n",
445 | " 2 | \n",
446 | " 2 | \n",
447 | " 2 | \n",
448 | " 2 | \n",
449 | " 2 | \n",
450 | " 2 | \n",
451 | " 2 | \n",
452 | " 2 | \n",
453 | " 2 | \n",
454 | "
\n",
455 | " \n",
456 | " XRP, ETH, BCH | \n",
457 | " 1 | \n",
458 | " 1 | \n",
459 | " 1 | \n",
460 | " 1 | \n",
461 | " 1 | \n",
462 | " 1 | \n",
463 | " 1 | \n",
464 | " 1 | \n",
465 | " 1 | \n",
466 | "
\n",
467 | " \n",
468 | " XRP, ETH, TRX | \n",
469 | " 1 | \n",
470 | " 1 | \n",
471 | " 1 | \n",
472 | " 1 | \n",
473 | " 1 | \n",
474 | " 1 | \n",
475 | " 1 | \n",
476 | " 1 | \n",
477 | " 1 | \n",
478 | "
\n",
479 | " \n",
480 | " XRP, LTC | \n",
481 | " 1 | \n",
482 | " 1 | \n",
483 | " 1 | \n",
484 | " 1 | \n",
485 | " 1 | \n",
486 | " 1 | \n",
487 | " 1 | \n",
488 | " 1 | \n",
489 | " 1 | \n",
490 | "
\n",
491 | " \n",
492 | " XRP, LTC, TRX | \n",
493 | " 1 | \n",
494 | " 1 | \n",
495 | " 1 | \n",
496 | " 1 | \n",
497 | " 1 | \n",
498 | " 1 | \n",
499 | " 1 | \n",
500 | " 1 | \n",
501 | " 1 | \n",
502 | "
\n",
503 | " \n",
504 | " XRP, TRX, XLM | \n",
505 | " 2 | \n",
506 | " 2 | \n",
507 | " 2 | \n",
508 | " 2 | \n",
509 | " 2 | \n",
510 | " 2 | \n",
511 | " 2 | \n",
512 | " 2 | \n",
513 | " 2 | \n",
514 | "
\n",
515 | " \n",
516 | "
\n",
517 | "
"
518 | ],
519 | "text/plain": [
520 | " comments dislike important like lol negative positive \\\n",
521 | "Currencies \n",
522 | "XRP 28 28 28 28 28 28 28 \n",
523 | "XRP, EOS, XLM 1 1 1 1 1 1 1 \n",
524 | "XRP, ETH 2 2 2 2 2 2 2 \n",
525 | "XRP, ETH, BCH 1 1 1 1 1 1 1 \n",
526 | "XRP, ETH, TRX 1 1 1 1 1 1 1 \n",
527 | "XRP, LTC 1 1 1 1 1 1 1 \n",
528 | "XRP, LTC, TRX 1 1 1 1 1 1 1 \n",
529 | "XRP, TRX, XLM 2 2 2 2 2 2 2 \n",
530 | "\n",
531 | " save saves \n",
532 | "Currencies \n",
533 | "XRP 28 28 \n",
534 | "XRP, EOS, XLM 1 1 \n",
535 | "XRP, ETH 2 2 \n",
536 | "XRP, ETH, BCH 1 1 \n",
537 | "XRP, ETH, TRX 1 1 \n",
538 | "XRP, LTC 1 1 \n",
539 | "XRP, LTC, TRX 1 1 \n",
540 | "XRP, TRX, XLM 2 2 "
541 | ]
542 | },
543 | "execution_count": 55,
544 | "metadata": {},
545 | "output_type": "execute_result"
546 | }
547 | ],
548 | "source": [
549 | "for c in unique_currencies:\n",
550 | " a =df[df.Currencies.str.match('XRP')]\n",
551 | "\n",
552 | "votes = [i for i in df.iloc[:, 4:]]\n",
553 | "a.groupby('Currencies')[votes].count()"
554 | ]
555 | },
556 | {
557 | "cell_type": "code",
558 | "execution_count": 45,
559 | "metadata": {},
560 | "outputs": [],
561 | "source": [
562 | "# df['Currencies']\n",
563 | "# df.groupby(['Currencies']).apply(list)\n",
564 | "# df.Currencies.unique()"
565 | ]
566 | },
567 | {
568 | "cell_type": "code",
569 | "execution_count": 46,
570 | "metadata": {},
571 | "outputs": [
572 | {
573 | "data": {
574 | "text/plain": [
575 | "['comments',\n",
576 | " 'dislike',\n",
577 | " 'important',\n",
578 | " 'like',\n",
579 | " 'lol',\n",
580 | " 'negative',\n",
581 | " 'positive',\n",
582 | " 'save',\n",
583 | " 'saves']"
584 | ]
585 | },
586 | "execution_count": 46,
587 | "metadata": {},
588 | "output_type": "execute_result"
589 | }
590 | ],
591 | "source": [
592 | "votes"
593 | ]
594 | },
595 | {
596 | "cell_type": "code",
597 | "execution_count": null,
598 | "metadata": {},
599 | "outputs": [],
600 | "source": []
601 | }
602 | ],
603 | "metadata": {
604 | "kernelspec": {
605 | "display_name": "Python 3",
606 | "language": "python",
607 | "name": "python3"
608 | },
609 | "language_info": {
610 | "codemirror_mode": {
611 | "name": "ipython",
612 | "version": 3
613 | },
614 | "file_extension": ".py",
615 | "mimetype": "text/x-python",
616 | "name": "python",
617 | "nbconvert_exporter": "python",
618 | "pygments_lexer": "ipython3",
619 | "version": "3.6.6"
620 | }
621 | },
622 | "nbformat": 4,
623 | "nbformat_minor": 2
624 | }
625 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | certifi==2019.9.11
2 | chardet==3.0.4
3 | colorama==0.4.1
4 | configparser==4.0.2
5 | crayons==0.2.0
6 | idna==2.8
7 | requests==2.22.0
8 | selenium==3.141.0
9 | urllib3==1.25.6
10 | webdriver-manager==1.8.2
11 |
--------------------------------------------------------------------------------
/test.py:
--------------------------------------------------------------------------------
1 | import cryptopanic_scraper as cw
2 |
3 | def test_setUp():
4 |     # setUp() returns a configured Chrome driver; getData() reads it from the module global.
5 |     cw.driver = cw.setUp()
6 |     assert cw.driver
7 | 
8 | 
9 | def test_getData():
10 |     assert cw.getData()
9 |
--------------------------------------------------------------------------------