├── .gitignore
├── Cryptocurrency-Pricing-Analysis.html
├── Cryptocurrency-Pricing-Analysis.ipynb
├── Cryptocurrency-Pricing-Analysis.py
├── Dockerfile
├── README.md
└── charts
    ├── html
    │   ├── aggregate-bitcoin-price.html
    │   ├── altcoin_prices_combined.html
    │   ├── combined-exchanges-pricing-clean.html
    │   ├── combined-exchanges-pricing.html
    │   ├── cryptocurrency-correlations-2016-v2.html
    │   ├── cryptocurrency-correlations-2016.html
    │   ├── cryptocurrency-correlations-2017-v2.html
    │   ├── cryptocurrency-correlations-2017.html
    │   └── kraken_price_plot.html
    └── png
        ├── aggregate-bitcoin-price.png
        ├── altcoin_prices_combined.png
        ├── combined-exchanges-pricing-clean.png
        ├── combined-exchanges-pricing.png
        ├── cryptocurrency-correlations-2016-v2.png
        ├── cryptocurrency-correlations-2016.png
        ├── cryptocurrency-correlations-2017-v2.png
        ├── cryptocurrency-correlations-2017.png
        └── kraken_price_plot.png
/.gitignore:
--------------------------------------------------------------------------------
1 | .ipynb_checkpoints/
2 | __pycache__/
3 | data/
4 | .vscode/
5 | .DS_Store
--------------------------------------------------------------------------------
/Cryptocurrency-Pricing-Analysis.py:
--------------------------------------------------------------------------------
1 |
2 | # coding: utf-8
3 |
4 | # ## A Data-Driven Approach To Cryptocurrency Speculation
5 | #
6 | # *How do Bitcoin markets behave? What are the causes of the sudden spikes and dips in cryptocurrency values? Are the markets for different altcoins, such as Litecoin and Ripple, inseparably linked or largely independent? **How can we predict what will happen next?***
7 | #
8 | # Articles on cryptocurrencies, such as Bitcoin and Ethereum, are rife with speculation these days, with hundreds of self-proclaimed experts advocating for the trends that they expect to emerge. What is lacking from many of these analyses is a strong data analysis foundation to back up the claims.
9 | #
10 | # The goal of this article is to provide an easy introduction to cryptocurrency analysis using Python. We will walk through a simple Python script to retrieve, analyze, and visualize data on different cryptocurrencies. In the process, we will uncover an interesting trend in how these volatile markets behave, and how they are evolving.
11 | #
12 | #
13 | #
14 | # This is not a post explaining what cryptocurrencies are (if you want one, I would recommend this great overview), nor is it an opinion piece on which specific currencies will rise and which will fall. Instead, all that we are concerned about in this tutorial is procuring the raw data and uncovering the stories hidden in the numbers.
15 | #
16 | #
17 | # ### Step 1 - Setup Your Data Laboratory
18 | # The tutorial is intended to be accessible to enthusiasts, engineers, and data scientists at all skill levels. The only skills that you will need are a basic understanding of Python and enough knowledge of the command line to set up a project.
19 | #
20 | # ##### Step 1.1 - Install Anaconda
21 | # The easiest way to install the dependencies for this project from scratch is to use Anaconda, a prepackaged Python data science ecosystem and dependency manager.
22 | #
23 | # To set up Anaconda, I would recommend following the official installation instructions - [https://www.continuum.io/downloads](https://www.continuum.io/downloads).
24 | #
25 | # *If you're an advanced user, and you don't want to use Anaconda, that's totally fine; I'll assume you don't need help installing the required dependencies. Feel free to skip to section 2.*
26 | #
27 | # ##### Step 1.2 - Setup an Anaconda Project Environment
28 | #
29 | # Once Anaconda is installed, we'll want to create a new environment to keep our dependencies organized.
30 | #
31 | # Run `conda create --name cryptocurrency-analysis python=3` to create a new Anaconda environment for our project.
32 | #
33 | # Next, run `source activate cryptocurrency-analysis` (on Linux/macOS) or `activate cryptocurrency-analysis` (on Windows) to activate this environment.
34 | #
35 | # Finally, run `conda install numpy pandas nb_conda jupyter plotly quandl` to install the required dependencies in the environment. This could take a few minutes to complete.
36 | #
37 | # *Why use environments? If you plan on developing multiple Python projects on your computer, it is helpful to keep the dependencies (software libraries and packages) separate in order to avoid conflicts. Anaconda will create a special environment directory for the dependencies for each project to keep everything organized and separated.*
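#
# *(If you later want to see which environments you have, or to clean this one up, the standard conda commands `conda env list` and `conda remove --name cryptocurrency-analysis --all` will do it.)*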
38 | #
39 | # ##### Step 1.3 - Start An Interactive Jupyter Notebook
40 | #
41 | # Once the environment and dependencies are all set up, run `jupyter notebook` to start the IPython kernel, and open your browser to `http://localhost:8888/`. Create a new Python notebook, making sure to use the `Python [conda env:cryptocurrency-analysis]` kernel.
42 | #
43 | # 
44 | #
45 |
46 | # ##### Step 1.4 - Import the Dependencies At The Top of The Notebook
47 | # Once you've got a blank Jupyter notebook open, the first thing we'll do is import the required dependencies.
48 |
49 | # In[1]:
50 |
51 |
52 | import os
53 | import numpy as np
54 | import pandas as pd
55 | import pickle
56 | import quandl
57 | from datetime import datetime
58 |
59 |
60 | # We'll also import Plotly and enable the offline mode.
61 |
62 | # In[2]:
63 |
64 |
65 | import plotly.offline as py
66 | import plotly.graph_objs as go
67 | import plotly.figure_factory as ff
68 | py.init_notebook_mode(connected=True)
69 |
70 |
71 | # In[3]:
72 |
73 |
74 | quandl.ApiConfig.api_key = os.environ['QUANDL_API_KEY']
75 |
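# *Note: the cell above expects your Quandl API key to be available in the `QUANDL_API_KEY` environment variable. One way to provide it is to export it before starting Jupyter, e.g. `export QUANDL_API_KEY=your-key-here` (the key value here is just a placeholder).*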
76 |
77 | # ### Step 2 - Retrieve Bitcoin Pricing Data
78 | # Now that everything is set up, we're ready to start retrieving data for analysis. First, we need to get Bitcoin pricing data using [Quandl's free Bitcoin API](https://blog.quandl.com/api-for-bitcoin-data).
79 |
80 | # ##### Step 2.1 - Define Quandl Helper Function
81 | # To assist with this data retrieval we'll define a function to download and cache datasets from Quandl.
82 |
83 | # In[4]:
84 |
85 |
86 | def get_quandl_data(quandl_id):
87 |     '''Download and cache Quandl dataseries'''
88 |     cache_path = '{}.pkl'.format(quandl_id).replace('/','-')
89 |     try:
90 |         f = open(cache_path, 'rb')
91 |         df = pickle.load(f)
92 |         print('Loaded {} from cache'.format(quandl_id))
93 |     except (OSError, IOError) as e:
94 |         print('Downloading {} from Quandl'.format(quandl_id))
95 |         df = quandl.get(quandl_id, returns="pandas")
96 |         df.to_pickle(cache_path)
97 |         print('Cached {} at {}'.format(quandl_id, cache_path))
98 |     return df
99 |
100 |
101 | # We're using `pickle` to serialize and save the downloaded data as a file, which will prevent our script from re-downloading the same data each time we run the script. The function will return the data as a [Pandas](http://pandas.pydata.org/) dataframe. If you're not familiar with dataframes, you can think of them as super-powered Python spreadsheets.
102 |
103 | # ##### Step 2.2 - Pull Kraken Exchange Pricing Data
104 | # Let's first pull the historical Bitcoin exchange rate for the [Kraken](https://www.kraken.com/) Bitcoin exchange.
105 |
106 | # In[5]:
107 |
108 |
109 | # Pull Kraken BTC price exchange data
110 | btc_usd_price_kraken = get_quandl_data('BCHARTS/KRAKENUSD')
111 |
112 |
113 | # We can inspect the first 5 rows of the dataframe using the `head()` method.
114 |
115 | # In[6]:
116 |
117 |
118 | btc_usd_price_kraken.head()
119 |
120 |
121 | # Next, we'll generate a simple chart as a quick visual verification that the data looks correct.
122 |
123 | # In[7]:
124 |
125 |
126 | # Chart the BTC pricing data
127 | btc_trace = go.Scatter(x=btc_usd_price_kraken.index, y=btc_usd_price_kraken['Weighted Price'])
128 | py.iplot([btc_trace])
129 |
130 |
131 | # Here, we're using [Plotly](https://plot.ly/) for generating our visualizations. This is a less traditional choice than some of the more established Python data visualization libraries such as [Matplotlib](https://matplotlib.org/), but I think Plotly is a great choice since it produces fully-interactive charts using [D3.js](https://d3js.org/). These charts have attractive visual defaults, are easy to explore, and are very simple to embed in web pages.
132 | #
133 | # > As a quick sanity check, you should compare the generated chart with publicly available graphs of Bitcoin prices (such as those on [Coinbase](https://www.coinbase.com/dashboard)), to verify that the downloaded data is legit.
134 |
135 | # ##### Step 2.3 - Pull Pricing Data From More BTC Exchanges
136 | # You might have noticed a hitch in this dataset - there are a few notable down-spikes, particularly in late 2014 and early 2016. These spikes are specific to the Kraken dataset, and we obviously don't want them to be reflected in our overall pricing analysis.
137 | #
138 | # The nature of Bitcoin exchanges is that the pricing is determined by supply and demand, hence no single exchange contains a true "master price" of Bitcoin. To solve this issue, along with that of down-spikes, we'll pull data from three more major Bitcoin exchanges to calculate an aggregate Bitcoin price index.
139 | #
140 | # First, we will download the data from each exchange into a dictionary of dataframes.
141 |
142 | # In[8]:
143 |
144 |
145 | # Pull pricing data for 3 more BTC exchanges
146 | exchanges = ['COINBASE','BITSTAMP','ITBIT']
147 |
148 | exchange_data = {}
149 |
150 | exchange_data['KRAKEN'] = btc_usd_price_kraken
151 |
152 | for exchange in exchanges:
153 |     exchange_code = 'BCHARTS/{}USD'.format(exchange)
154 |     btc_exchange_df = get_quandl_data(exchange_code)
155 |     exchange_data[exchange] = btc_exchange_df
156 |
157 |
158 | # ##### Step 2.4 - Merge All Of The Pricing Data Into A Single Dataframe
159 | # Next, we will define a simple function to merge a common column of each dataframe into a new combined dataframe.
160 |
161 | # In[9]:
162 |
163 |
164 | def merge_dfs_on_column(dataframes, labels, col):
165 |     '''Merge a single column of each dataframe into a new combined dataframe'''
166 |     series_dict = {}
167 |     for index in range(len(dataframes)):
168 |         series_dict[labels[index]] = dataframes[index][col]
169 |
170 |     return pd.DataFrame(series_dict)
171 |
172 |
173 | # Now we will merge all of the dataframes together on their "Weighted Price" column.
174 |
175 | # In[10]:
176 |
177 |
178 | # Merge the BTC price dataseries into a single dataframe
179 | btc_usd_datasets = merge_dfs_on_column(list(exchange_data.values()), list(exchange_data.keys()), 'Weighted Price')
180 |
181 |
182 | # Finally, we can preview the last five rows of the result using the `tail()` method, to make sure it looks ok.
183 |
184 | # In[11]:
185 |
186 |
187 | btc_usd_datasets.tail()
188 |
189 |
190 | # ##### Step 2.5 - Visualize The Pricing Datasets
191 | # The next logical step is to visualize how these pricing datasets compare. For this, we'll define a helper function to provide a single-line command to compare each column in the dataframe on a graph using Plotly.
192 |
193 | # In[12]:
194 |
195 |
196 | def df_scatter(df, title, separate_y_axis=False, y_axis_label='', scale='linear', initial_hide=False):
197 |     '''Generate a scatter plot of the entire dataframe'''
198 |     label_arr = list(df)
199 |     series_arr = list(map(lambda col: df[col], label_arr))
200 |
201 |     layout = go.Layout(
202 |         title=title,
203 |         legend=dict(orientation="h"),
204 |         xaxis=dict(type='date'),
205 |         yaxis=dict(
206 |             title=y_axis_label,
207 |             showticklabels=not separate_y_axis,
208 |             type=scale
209 |         )
210 |     )
211 |
212 |     y_axis_config = dict(
213 |         overlaying='y',
214 |         showticklabels=False,
215 |         type=scale)
216 |
217 |     visibility = 'visible'
218 |     if initial_hide:
219 |         visibility = 'legendonly'
220 |
221 |     # Form Trace For Each Series
222 |     trace_arr = []
223 |     for index, series in enumerate(series_arr):
224 |         trace = go.Scatter(
225 |             x=series.index,
226 |             y=series,
227 |             name=label_arr[index],
228 |             visible=visibility
229 |         )
230 |
231 |         # Add separate axis for the series
232 |         if separate_y_axis:
233 |             trace['yaxis'] = 'y{}'.format(index + 1)
234 |             layout['yaxis{}'.format(index + 1)] = y_axis_config
235 |         trace_arr.append(trace)
236 |
237 |     fig = go.Figure(data=trace_arr, layout=layout)
238 |     py.iplot(fig)
239 |
240 |
241 | # In the interest of brevity, I won't go too far into how this helper function works. Check out the documentation for [Pandas](http://pandas.pydata.org/) and [Plotly](https://plot.ly/) if you would like to learn more.
242 |
243 | # With the function defined, we can compare our pricing datasets like so.
244 |
245 | # In[13]:
246 |
247 |
248 | # Plot all of the BTC exchange prices
249 | df_scatter(btc_usd_datasets, 'Bitcoin Price (USD) By Exchange')
250 |
251 |
252 | # ##### Step 2.6 - Clean and Aggregate the Pricing Data
253 | # We can see that, although the four series follow roughly the same path, there are various irregularities in each that we'll want to get rid of.
254 | #
255 | # Let's remove all of the zero values from the dataframe, since we know that the price of Bitcoin has never been equal to zero in the timeframe that we are examining.
256 |
257 | # In[14]:
258 |
259 |
260 | # Remove "0" values
261 | btc_usd_datasets.replace(0, np.nan, inplace=True)
262 |
263 |
264 | # When we re-chart the dataframe, we'll see a much cleaner looking chart without the spikes.
265 |
266 | # In[15]:
267 |
268 |
269 | # Plot the revised dataframe
270 | df_scatter(btc_usd_datasets, 'Bitcoin Price (USD) By Exchange')
271 |
272 |
273 | # We can now calculate a new column, containing the daily average Bitcoin price across all of the exchanges.
274 |
275 | # In[16]:
276 |
277 |
278 | # Calculate the average BTC price as a new column
279 | btc_usd_datasets['avg_btc_price_usd'] = btc_usd_datasets.mean(axis=1)
280 |
281 |
282 | # This new column is our Bitcoin pricing index! Let's chart that column to make sure it looks ok.
283 |
284 | # In[17]:
285 |
286 |
287 | # Plot the average BTC price
288 | btc_trace = go.Scatter(x=btc_usd_datasets.index, y=btc_usd_datasets['avg_btc_price_usd'])
289 | py.iplot([btc_trace])
290 |
291 |
292 | # Yup, looks good. We'll use this aggregate pricing series later on, in order to convert the exchange rates of other cryptocurrencies to USD.
293 |
294 | # ### Step 3 - Retrieve Altcoin Pricing Data
295 | # Now that we have a solid time series dataset for the price of Bitcoin, let's pull in some data on non-Bitcoin cryptocurrencies, commonly referred to as altcoins.
296 |
297 | # ##### Step 3.1 - Define Poloniex API Helper Functions
298 | #
299 | # For retrieving data on cryptocurrencies we'll be using the [Poloniex API](https://poloniex.com/support/api/). To assist in the altcoin data retrieval, we'll define two helper functions to download and cache JSON data from this API.
300 | #
301 | # First, we'll define `get_json_data`, which will download and cache JSON data from a provided URL.
302 |
303 | # In[18]:
304 |
305 |
306 | def get_json_data(json_url, cache_path):
307 |     '''Download and cache JSON data, return as a dataframe.'''
308 |     try:
309 |         f = open(cache_path, 'rb')
310 |         df = pickle.load(f)
311 |         print('Loaded {} from cache'.format(json_url))
312 |     except (OSError, IOError) as e:
313 |         print('Downloading {}'.format(json_url))
314 |         df = pd.read_json(json_url)
315 |         df.to_pickle(cache_path)
316 |         print('Cached {} at {}'.format(json_url, cache_path))
317 |     return df
318 |
319 |
320 | # Next, we'll define a function to format Poloniex API HTTP requests and call our new `get_json_data` function to save the resulting data.
321 |
322 | # In[19]:
323 |
324 |
325 | base_polo_url = 'https://poloniex.com/public?command=returnChartData&currencyPair={}&start={}&end={}&period={}'
326 | start_date = datetime.strptime('2015-01-01', '%Y-%m-%d') # get data from the start of 2015
327 | end_date = datetime.now() # up until today
328 | period = 86400 # pull daily data (86,400 seconds per day)
329 |
330 | def get_crypto_data(poloniex_pair):
331 |     '''Retrieve cryptocurrency data from poloniex'''
332 |     json_url = base_polo_url.format(poloniex_pair, start_date.timestamp(), end_date.timestamp(), period)
333 |     data_df = get_json_data(json_url, poloniex_pair)
334 |     data_df = data_df.set_index('date')
335 |     return data_df
336 |
337 |
338 | # This function will take a cryptocurrency pair string (such as 'BTC_ETH') and return the dataframe containing the historical exchange rate of the two currencies.
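#
# For example, a single pair could be fetched on its own like this (illustrative only; the variable name is mine, and the loop in Step 3.2 below does this for every coin):
#
# ```python
# btc_eth_df = get_crypto_data('BTC_ETH')  # daily BTC/ETH data from the start of 2015 onwards
# btc_eth_df.tail()
# ```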
339 |
340 | # ##### Step 3.2 - Download Trading Data From Poloniex
341 | # Most altcoins cannot be bought directly with USD; to acquire these coins individuals often buy Bitcoins and then trade the Bitcoins for altcoins on cryptocurrency exchanges. For this reason we'll be downloading the exchange rate to BTC for each coin, and then we'll use our existing BTC pricing data to convert this value to USD.
342 |
343 | # We'll download exchange data for nine of the top cryptocurrencies -
344 | # [Ethereum](https://www.ethereum.org/), [Litecoin](https://litecoin.org/), [Ripple](https://ripple.com/), [Ethereum Classic](https://ethereumclassic.github.io/), [Stellar](https://www.stellar.org/), [Dash](https://www.dash.org/), [Siacoin](http://sia.tech/), [Monero](https://getmonero.org/), and [NEM](https://www.nem.io/).
345 |
346 | # In[20]:
347 |
348 |
349 | altcoins = ['ETH','LTC','XRP','ETC','STR','DASH','SC','XMR','XEM']
350 |
351 | altcoin_data = {}
352 | for altcoin in altcoins:
353 |     coinpair = 'BTC_{}'.format(altcoin)
354 |     crypto_price_df = get_crypto_data(coinpair)
355 |     altcoin_data[altcoin] = crypto_price_df
356 |
357 |
358 | # Now we have a dictionary of 9 dataframes, each containing the historical daily average exchange prices between the altcoin and Bitcoin.
359 | #
360 | # We can preview the last few rows of the Ethereum price table to make sure it looks ok.
361 |
362 | # In[21]:
363 |
364 |
365 | altcoin_data['ETH'].tail()
366 |
367 |
368 | # ##### Step 3.3 - Convert Prices to USD
369 | #
370 | # Since we now have the exchange rate for each cryptocurrency to Bitcoin, and we have the Bitcoin/USD historical pricing index, we can directly calculate the USD price series for each altcoin.
371 |
372 | # In[22]:
373 |
374 |
375 | # Calculate USD Price as a new column in each altcoin dataframe
376 | for altcoin in altcoin_data.keys():
377 |     altcoin_data[altcoin]['price_usd'] = altcoin_data[altcoin]['weightedAverage'] * btc_usd_datasets['avg_btc_price_usd']
378 |
379 |
380 | # Here, we've created a new column in each altcoin dataframe with the USD prices for that coin.
381 | #
382 | # Next, we can re-use our `merge_dfs_on_column` function from earlier to create a combined dataframe of the USD price for each cryptocurrency.
383 |
384 | # In[23]:
385 |
386 |
387 | # Merge USD price of each altcoin into single dataframe
388 | combined_df = merge_dfs_on_column(list(altcoin_data.values()), list(altcoin_data.keys()), 'price_usd')
389 |
390 |
391 | # Easy. Now let's also add the Bitcoin prices as a final column to the combined dataframe.
392 |
393 | # In[24]:
394 |
395 |
396 | # Add BTC price to the dataframe
397 | combined_df['BTC'] = btc_usd_datasets['avg_btc_price_usd']
398 |
399 |
400 | # Now we should have a single dataframe containing daily USD prices for the ten cryptocurrencies that we're examining.
401 | #
402 | # Let's reuse our `df_scatter` function from earlier to chart all of the cryptocurrency prices against each other.
403 |
404 | # In[25]:
405 |
406 |
407 | # Chart all of the altcoin prices
408 | df_scatter(combined_df, 'Cryptocurrency Prices (USD)', separate_y_axis=False, y_axis_label='Coin Value (USD)', scale='log')
409 |
410 |
411 | # Nice! This graph gives a pretty solid "big picture" view of how the exchange rates of each currency have varied over the past few years.
412 | #
413 | # > Note that we're using a logarithmic y-axis scale in order to compare all of the currencies on the same plot. You are welcome to try out different parameter values here (such as `scale='linear'`) to get different perspectives on the data.
414 |
415 | # ##### Step 3.4 - Compute Correlation Values of The Cryptocurrencies
416 | # You might notice that the cryptocurrency exchange rates, despite their wildly different values and volatility, seem to be slightly correlated. Especially since the spike in April 2017, even many of the smaller fluctuations appear to be occurring in sync across the entire market.
417 | #
418 | # A visually-derived hunch is not much better than a guess until we have the stats to back it up.
419 | #
420 | # We can test our correlation hypothesis using the Pandas `corr()` method, which computes a Pearson correlation coefficient for each column in the dataframe against each other column.
421 | #
422 | # Computing correlations directly on a non-stationary time series (such as raw pricing data) can give biased correlation values. We will work around this by using the `pct_change()` method, which will convert each cell in the dataframe from an absolute price value to a daily return percentage.
423 | #
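# *Aside: here is a minimal sketch (not part of the original notebook) of why we use returns rather than raw prices: two completely independent random walks will often look strongly "correlated" in levels, but not in their day-to-day changes.*
#
# ```python
# import numpy as np
# import pandas as pd
#
# np.random.seed(42)
# walk_a = pd.Series(np.random.normal(size=1000).cumsum())  # independent random walk #1
# walk_b = pd.Series(np.random.normal(size=1000).cumsum())  # independent random walk #2
#
# print(walk_a.corr(walk_b))                # levels: often far from zero, purely by chance
# print(walk_a.diff().corr(walk_b.diff()))  # daily changes (analogous to pct_change()): close to zero
# ```
#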
424 | # First we'll calculate correlations for 2016.
425 |
426 | # In[26]:
427 |
428 |
429 | # Calculate the pearson correlation coefficients for altcoins in 2016
430 | combined_df_2016 = combined_df[combined_df.index.year == 2016]
431 | combined_df_2016.pct_change().corr(method='pearson')
432 |
433 |
434 | # These correlation coefficients are all over the place. Coefficients close to 1 or -1 mean that the series are strongly correlated or inversely correlated respectively, and coefficients close to zero mean that the values tend to fluctuate independently of each other.
435 | #
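# *For intuition, a tiny illustrative check (again, not from the original notebook): a series is perfectly correlated with itself, perfectly inversely correlated with its negation, and roughly uncorrelated with independent noise.*
#
# ```python
# import numpy as np
# import pandas as pd
#
# np.random.seed(0)
# a = pd.Series(np.random.normal(size=1000))
# b = pd.Series(np.random.normal(size=1000))  # independent of a
#
# print(a.corr(a))   # 1.0
# print(a.corr(-a))  # -1.0
# print(a.corr(b))   # close to 0
# ```
#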
436 | # To help visualize these results, we'll create one more helper visualization function.
437 |
438 | # In[27]:
439 |
440 |
441 | def correlation_heatmap(df, title, absolute_bounds=True):
442 |     '''Plot a correlation heatmap for the entire dataframe'''
443 |     heatmap = go.Heatmap(
444 |         z=df.corr(method='pearson').values,  # .values rather than the deprecated .as_matrix()
445 |         x=df.columns,
446 |         y=df.columns,
447 |         colorbar=dict(title='Pearson Coefficient'),
448 |     )
449 |
450 |     layout = go.Layout(title=title)
451 |
452 |     if absolute_bounds:
453 |         heatmap['zmax'] = 1.0
454 |         heatmap['zmin'] = -1.0
455 |
456 |     fig = go.Figure(data=[heatmap], layout=layout)
457 |     py.iplot(fig)
458 |
459 |
460 | # In[28]:
461 |
462 |
463 | correlation_heatmap(combined_df_2016.pct_change(), "Cryptocurrency Correlations in 2016")
464 |
465 |
466 | # Here, the dark red values represent strong correlations (note that each currency is, obviously, strongly correlated with itself), and the dark blue values represent strong inverse correlations. All of the light blue/orange/gray/tan colors in-between represent varying degrees of weak/non-existent correlations.
467 | #
468 | # What does this chart tell us? Essentially, it shows that there was very little statistically significant linkage between how the prices of different cryptocurrencies fluctuated during 2016.
469 | #
470 | # Now, to test our hypothesis that the cryptocurrencies have become more correlated in recent months, let's repeat the same test using only the data from 2017.
471 |
472 | # In[29]:
473 |
474 |
475 | combined_df_2017 = combined_df[combined_df.index.year == 2017]
476 | combined_df_2017.pct_change().corr(method='pearson')
477 |
478 |
479 | # These are somewhat more significant correlation coefficients. Strong enough to use as the sole basis for an investment? Certainly not.
480 | #
481 | # It is notable, however, that almost all of the cryptocurrencies have become more correlated with each other across the board.
482 |
483 | # In[30]:
484 |
485 |
486 | correlation_heatmap(combined_df_2017.pct_change(), "Cryptocurrency Correlations in 2017")
487 |
488 |
489 | # Huh. That's rather interesting.
490 |
491 | # ### Why is this happening?
492 | #
493 | # Good question. I'm really not sure.
494 | #
495 | # The most immediate explanation that comes to mind is that **hedge funds have recently begun publicly trading in cryptocurrency markets**[^1][^2]. These funds have vastly more capital to play with than the average trader, so if a fund is hedging its bets across multiple cryptocurrencies, and using similar trading strategies for each based on independent variables (say, the stock market), it could make sense that this trend would emerge.
496 | #
497 | # ##### In-Depth - XRP and STR
498 | # For instance, one noticeable trait of the above chart is that XRP (the token for [Ripple](https://ripple.com/)) is the least correlated cryptocurrency. The notable exception here is with STR (the token for [Stellar](https://www.stellar.org/), officially known as "Lumens"), which has a stronger (0.62) correlation with XRP.
499 | #
500 | # What is interesting here is that Stellar and Ripple are both fairly similar fintech platforms aimed at reducing the friction of international money transfers between banks.
501 | #
502 | # It is conceivable that some big-money players and hedge funds might be using similar trading strategies for their investments in Stellar and Ripple, due to the similarity of the blockchain services that use each token. This could explain why XRP is so much more heavily correlated with STR than with the other cryptocurrencies.
503 | #
504 | # > Quick Plug - I'm a contributor to [Chipper](https://www.chipper.xyz/), a (very) early-stage startup using Stellar with the aim of disrupting micro-remittances in Africa.
505 | #
506 | # ### Your Turn
507 | #
508 | # This explanation is, however, largely speculative. **Maybe you can do better**. With the foundation we've made here, there are hundreds of different paths to take to continue searching for stories within the data.
509 | #
510 | # Here are some ideas:
511 | #
512 | # - Add data from more cryptocurrencies to the analysis.
513 | # - Adjust the time frame and granularity of the correlation analysis, for a finer- or coarser-grained view of the trends.
514 | # - Search for trends in trading volume and/or blockchain mining data sets. The buy/sell volume ratios are likely more relevant than the raw price data if you want to predict future price fluctuations.
515 | # - Add pricing data on stocks, commodities, and fiat currencies to determine which of them correlate with cryptocurrencies (but please remember the old adage that "Correlation does not imply causation").
516 | # - Quantify the amount of "buzz" surrounding specific cryptocurrencies using Event Registry, GDELT, and Google Trends.
517 | # - Train a predictive machine learning model on the data to predict tomorrow's prices. If you're more ambitious, you could even try doing this with a recurrent neural network (RNN).
518 | # - Use your analysis to create an automated "Trading Bot" on a trading site such as Poloniex or Coinbase, using their respective trading APIs. Be careful: a poorly optimized trading bot is an easy way to lose your money quickly.
519 | # - **Share your findings!** The best part of Bitcoin, and of cryptocurrencies in general, is that their decentralized nature makes them more free and democratic than virtually any other market. Open source your analysis, participate in the community, maybe write a blog post about it.
520 | #
521 | # Hopefully, now you have the skills to do your own analysis and to think critically about any speculative cryptocurrency articles you might read in the near future, especially those written without any data to back up the provided predictions.
522 | #
523 | # Thanks for reading, and feel free to comment below with any ideas, suggestions, or criticisms regarding this tutorial. I've got a second (and potentially third) part in the works, which will likely follow through on some of the same ideas listed above, so stay tuned for more in the coming weeks.
524 | #
525 | # [^1]: http://fortune.com/2017/07/26/bitcoin-cryptocurrency-hedge-fund-sequoia-andreessen-horowitz-metastable/
526 | # [^2]: https://www.forbes.com/sites/laurashin/2017/07/12/crypto-boom-15-new-hedge-funds-want-in-on-84000-returns/#7946ab0d416a
527 |
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM jupyter/datascience-notebook
2 |
3 | RUN mkdir /home/jovyan/work/data
4 |
5 | ADD . /home/jovyan/work
6 |
7 | RUN pip install quandl plotly
8 |
9 | ENV QUANDL_API_KEY $QUANDL_API_KEY
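10 |
11 | # One possible way to build and run this image (illustrative only; the "crypto-analysis" tag is arbitrary, and the key value is a placeholder):
12 | #   docker build -t crypto-analysis .
13 | #   docker run -it --rm -e QUANDL_API_KEY=your-key-here -p 8888:8888 crypto-analysis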
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Analyzing Cryptocurrency Markets Using Python
2 | ## A Data-Driven Approach To Cryptocurrency Speculation
3 |
4 | *How do Bitcoin markets behave? What are the causes of the sudden spikes and dips in cryptocurrency values? Are the markets for different altcoins inseparably linked or largely independent? **How can we predict what will happen next?***
5 |
6 | Articles on cryptocurrencies, such as Bitcoin and Ethereum, are rife with speculation these days, with hundreds of self-proclaimed experts advocating for the trends that they expect to emerge. What is lacking from many of these analyses is a strong foundation of data and statistics to back up the claims.
7 |
8 | The goal of this article is to provide an easy introduction to cryptocurrency analysis using Python. We will walk through a simple Python script to retrieve, analyze, and visualize data on different cryptocurrencies. In the process, we will uncover an interesting trend in how these volatile markets behave, and how they are evolving.
9 |
10 |
11 |
12 | This is not a post explaining what cryptocurrencies are (if you want one, I would recommend this great overview), nor is it an opinion piece on which specific currencies will rise and which will fall. Instead, all that we are concerned about in this tutorial is procuring the raw data and uncovering the stories hidden in the numbers.
13 |
14 | To read more, visit -
15 | [blog.patricktriest.com/analyzing-cryptocurrencies-python](https://blog.patricktriest.com/analyzing-cryptocurrencies-python/)
16 |
17 | ___
18 |
19 | An HTML version of the entire notebook, with results and visualizations, is available here -
20 | https://cdn.patricktriest.com/blog/images/posts/crypto-markets/Cryptocurrency-Pricing-Analysis.html
21 |
22 | Included in this repository are:
23 | - IPython Notebook
24 | - Notebook Python File
25 | - Notebook HTML Page
26 | - Pre-rendered charts (PNG and HTML)
27 |
28 | This Python notebook is 100% open source; feel free to use the code however you would like.
29 | ```
30 | The MIT License (MIT)
31 |
32 | Copyright (c) 2017 Patrick Triest
33 |
34 | Permission is hereby granted, free of charge, to any person obtaining a copy
35 | of this software and associated documentation files (the "Software"), to deal
36 | in the Software without restriction, including without limitation the rights
37 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
38 | copies of the Software, and to permit persons to whom the Software is
39 | furnished to do so, subject to the following conditions:
40 |
41 | The above copyright notice and this permission notice shall be included in all
42 | copies or substantial portions of the Software.
43 |
44 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
45 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
46 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
47 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
48 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
49 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
50 | SOFTWARE.
51 | ```
52 |
--------------------------------------------------------------------------------
/charts/html/aggregate-bitcoin-price.html:
--------------------------------------------------------------------------------
1 |
2 |
3 |