\n",
14 | "\n",
15 | "___\n",
16 | "\n",
17 | "\n",
18 | "The CAPM describes the relationship between expected returns and volatility (systematic risk). Why does this matter? Because investors expect to be compensated for risk and for the time value of money. The CAPM is therefore used as a theoretical model that adjusts for risk when evaluating the value of a stock.\n",
19 | "\n",
20 | "This model assumes the existence of a market portfolio - all possible investments in the world combined - as well as the existence of a risk-free asset. However, neither truly exists. It also assumes that all investors are rational and therefore hold the optimal portfolio. This is a consequence of the mutual fund theorem: _all investors hold the same portfolio of risky assets, the tangency portfolio_. Therefore, the CAPM assumes that the tangency portfolio is the market portfolio. Again... not necessarily true.\n",
21 | "\n",
22 | "In this context, the tangency portfolio is the portfolio with the largest Sharpe Ratio. But what is the _Sharpe Ratio?_\n",
23 | "\n",
24 | "\n",
25 | "**Sharpe Ratio**: measures the performance of a security compared to a risk-free asset, after adjusting for its risk. This is the excess return per unit of risk of an investment.\n",
26 | "$$\n",
27 | "Sharpe = \\frac{\\overline{r_{i}} - r_f}{\\sigma_{i}}\n",
28 | "$$\n",
29 | " When Sharpe > 1, GOOD risk-adjusted returns\n",
30 | " \n",
31 | " When Sharpe > 2, VERY GOOD risk-adjusted returns\n",
32 | " \n",
33 | " When Sharpe > 3, EXCELLENT risk-adjusted returns\n",
34 | "\n",
35 | "\n",
36 | "_How do we measure risk?_ There are many ways to measure risk, although the standard deviation (the square root of variance) is one of the most common. However, when it comes to the risk that cannot be avoided through diversification, Beta is king!\n",
37 | "\n",
38 | "**Beta**: measures the market risk that cannot be avoided through diversification. This is the relationship between the stock and the market portfolio. In other words, it is a measure of how much risk the investment will add to a portfolio that looks like the market.\n",
39 | "$$ \n",
40 | "\\beta_{i} = \\frac{\\sigma_{i,m}}{\\sigma_{m}^2}\n",
41 | "$$\n",
42 | "\n",
43 | " When beta = 0, it means that there's no relationship.\n",
44 | " \n",
45 | " When beta < 1, it means that the stock is defensive (less prone to high highs and low lows)\n",
46 | " \n",
47 | "    When beta > 1, it means that the stock is aggressive (more prone to high highs and low lows)\n",
48 | " \n",
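"    For example, if a stock's annualized covariance with the market is 0.03 and the market's variance is 0.02, its beta is 0.03/0.02 = 1.5 - an aggressive stock.\n",
"\n",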
49 | "Amazing! We're only one small step away: the risk-adjusted return.\n",
50 | "\n",
51 | "**Expected Return CAPM**: calculates the expected return of a security adjusted for the risk taken. This equates to the return expected from taking the extra risk of purchasing this security.\n",
52 | "$$\n",
53 | "\\overline{r_{i}} = r_f + \\beta_{i}(\\overline{r_{m}} - r_f) \n",
54 | "$$\n",
55 | "\n",
56 | "Awesome! There are a couple more things we will discuss later, but now that we understand the underlying theory of the CAPM, let's get coding!\n",
57 | "\n",
58 | "---\n",
59 | "\n",
60 | "**Step 1**: Import necessary libraries"
61 | ]
62 | },
63 | {
64 | "cell_type": "code",
65 | "execution_count": 1,
66 | "metadata": {},
67 | "outputs": [],
68 | "source": [
69 | "import numpy as np\n",
70 | "import pandas as pd\n",
71 | "from pandas_datareader import data as wb"
72 | ]
73 | },
74 | {
75 | "cell_type": "markdown",
76 | "metadata": {},
77 | "source": [
78 | "**Step 2**: Import data for a stock and the market data. In this case we will use:\n",
79 | " 1. Amazon\n",
80 | " 2. S&P 500 (as a proxy for the market)"
81 | ]
82 | },
83 | {
84 | "cell_type": "code",
85 | "execution_count": 2,
86 | "metadata": {},
87 | "outputs": [],
88 | "source": [
89 | "tickers = ['AMZN','^GSPC']\n",
90 | "data = pd.DataFrame()\n",
91 | "for t in tickers:\n",
92 | " data[t] = wb.DataReader(t, data_source='yahoo', start='2010-1-1')['Adj Close']"
93 | ]
94 | },
95 | {
96 | "cell_type": "markdown",
97 | "metadata": {},
98 | "source": [
99 | "**Step 3**: Compute the logarithmic daily returns of the data.\n",
100 | " \n",
101 | " Why logarithmic and not simple returns?\n",
102 | " \n",
103 | " We usually use logarithmic returns when making calculations about a single asset over time.\n",
104 | " We use simple returns when dealing with multiple assets over the same timeframe."
105 | ]
106 | },
107 | {
108 | "cell_type": "code",
109 | "execution_count": 3,
110 | "metadata": {},
111 | "outputs": [
112 | {
113 | "data": {
172 | "text/plain": [
173 | " AMZN ^GSPC\n",
174 | "Date \n",
175 | "2010-01-04 NaN NaN\n",
176 | "2010-01-05 0.005883 0.003111\n",
177 | "2010-01-06 -0.018282 0.000545\n",
178 | "2010-01-07 -0.017160 0.003993\n",
179 | "2010-01-08 0.026717 0.002878"
180 | ]
181 | },
182 | "execution_count": 3,
183 | "metadata": {},
184 | "output_type": "execute_result"
185 | }
186 | ],
187 | "source": [
188 | "sec_returns = np.log(data / data.shift(1))\n",
189 | "sec_returns.head()"
190 | ]
191 | },
192 | {
193 | "cell_type": "markdown",
194 | "metadata": {},
195 | "source": [
196 | "**Step 4**: Compute covariance and market variance.\n",
197 | "\n",
198 | "    As we can see from the Beta formula in the introduction, we need the covariance between Amazon stock and the market, as well as the variance of the market's returns. We need these annualized, so we will multiply them by 252, the approximate number of trading days in a year. "
199 | ]
200 | },
201 | {
202 | "cell_type": "code",
203 | "execution_count": 4,
204 | "metadata": {},
205 | "outputs": [
206 | {
207 | "data": {
208 | "text/html": [
209 | "
8 |
9 | ___
10 |
11 | # Table of Contents
12 |
13 | In this repository, I will include:
14 |
15 | 1. Introduction to CAPM, Beta and Sharpe Ratio
16 | 2. Introduction to Monte-Carlo Simulations
17 | 3. Portfolio Optimization
18 | 4. Predicting Stock Prices: Monte Carlo Simulations Automated (coded to an easy-to-use function)
19 | 5. Predicting Stock Prices: Monte Carlo Simulations with Cholesky Decomposition (as a means to correlate returns)
20 | 6. Predicting Stock Prices: Monte Carlo Simulations with Cholesky Automated (coded to an easy-to-use function)
21 | 7. Introduction to Time Series: ETS, EWMA, ARIMA, ACF, PACF
22 | 8. Momentum Metrics: Relative Strength Index and MACD
23 | 9. Predicting Option Prices: Introduction to Black-Scholes-Merton
24 | 10. Predicting Option Prices: Monte Carlo Simulations with Euler Discretization
25 | 11. Introduction to Quantopian
26 |
27 | ## Capital Asset Pricing Model
28 | The CAPM describes the relationship between expected returns and volatility (systematic risk). Why does this matter? Because investors expect to be compensated for risk and for the time value of money. The CAPM is therefore used as a theoretical model that adjusts for risk when evaluating the value of a stock.
29 |
30 | This model assumes the existence of a market portfolio - all possible investments in the world combined - as well as the existence of a risk-free asset. However, neither truly exists. It also assumes that all investors are rational and therefore hold the optimal portfolio. This is a consequence of the mutual fund theorem: _all investors hold the same portfolio of risky assets, the tangency portfolio_. Therefore, the CAPM assumes that the tangency portfolio is the market portfolio. Again... not necessarily true. However, the model is great for understanding and conceptualizing the intricacies of risk in investing and the concept of diversification, so let's continue.
31 |
32 | In this context, the tangency portfolio is the portfolio with the largest Sharpe Ratio. But what is the _Sharpe Ratio?_
33 |
34 |
35 | **Sharpe Ratio**: measures the performance of a security compared to a risk-free asset, after adjusting for its risk. This is the excess return per unit of risk of an investment.
36 |
37 | $$Sharpe = \frac{\overline{r_i} - r_f}{\sigma_i}$$
38 |
39 | When Sharpe > 1, GOOD risk-adjusted returns
40 |
41 | When Sharpe > 2, VERY GOOD risk-adjusted returns
42 |
43 | When Sharpe > 3, EXCELLENT risk-adjusted returns
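
For example, a security with an expected annual return of 10%, a risk-free rate of 2.5%, and an annualized volatility of 15% has a Sharpe of (0.10 - 0.025)/0.15 = 0.5.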
44 |
45 |
46 | _How do we measure risk?_ There are many ways to measure risk, although the standard deviation (the square root of variance) is one of the most common. However, when it comes to the risk that cannot be avoided through diversification, Beta is king!
47 |
48 | **Beta**: measures the market risk that cannot be avoided through diversification. This is the relationship between the stock and the market portfolio. In other words, it is a measure of how much risk the investment will add to a portfolio that looks like the market.
49 |
50 | $$\beta_i = \frac{\sigma_{i,m}}{\sigma_m^2}$$
51 |
52 | When beta = 0, it means that there's no relationship.
53 |
54 | When beta < 1, it means that the stock is defensive (less prone to high highs and low lows)
55 |
56 |    When beta > 1, it means that the stock is aggressive (more prone to high highs and low lows)
57 |
58 | Amazing! We're only one small step away: the risk-adjusted return.
59 |
60 | **Expected Return CAPM**: calculates the expected return of a security adjusted for the risk taken. This equates to the return expected from taking the extra risk of purchasing this security.
61 |
62 | $$\overline{r_i} = r_f + \beta_i(\overline{r_m} - r_f)$$
63 |
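For example, with a risk-free rate of 2.5%, a beta of 1.0, and an expected market return of 7.5% (a 5% risk premium), the expected return is 0.025 + 1.0(0.075 - 0.025) = 7.5%.
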
64 | Awesome! There are a couple more things we will discuss later, but now that we understand the underlying theory of the CAPM, let's get coding!
65 |
66 | ```
67 | # Import libraries
68 | import numpy as np
69 | import pandas as pd
70 | from pandas_datareader import data as wb
71 |
72 | # Load stock and market data
73 | tickers = ['AMZN','^GSPC']
74 | data = pd.DataFrame()
75 | for t in tickers:
76 | data[t] = wb.DataReader(t, data_source='yahoo', start='2010-1-1')['Adj Close']
77 |
78 | #Calculate logarithmic daily returns
79 | sec_returns = np.log(data / data.shift(1))
80 |
81 | # To calculate the beta, we need the covariance between the specific stock and the market...
82 | cov = sec_returns.cov() *252 #Annualize by multiplying by 252 (trading days in a year)
83 | cov_with_market = cov.iloc[0,1]
84 | # ...we also need the variance of the daily returns of the market
85 | market_var = sec_returns['^GSPC'].var()*252
86 |
87 | # Calculate Beta
88 | amazon_beta = cov_with_market / market_var
89 | ```
90 |
91 | Before calculating the expected risk-adjusted return, we must clarify a couple of assumptions:
92 | 1. A 10-year US government bond is a good proxy for the risk-free asset; we assume a yield of 2.5%.
93 | 2. The risk premium - the expected return of the market minus the risk-free return - is commonly estimated at between 4.5% and 5.5%, so we will use 5%.
94 |
95 | ```
96 | riskfree = 0.025
97 | riskpremium = 0.05
98 | amazon_capm_return = riskfree + amazon_beta*riskpremium
99 | ```
100 | This yields an annualized risk-adjusted return of 7.52% (as of May 23rd, 2020). Let's try the same procedure with the arithmetic mean of the market's returns, instead of assuming a risk premium of 5%.
101 |
102 | ```
103 | riskfree = 0.025
104 | riskpremium = (sec_returns['^GSPC'].mean()*252) - riskfree
105 | amazon_capm_return = riskfree + amazon_beta*riskpremium
106 | ```
107 | This yields a 9.26% return - a considerable change.
108 |
109 | Last but not least, the Sharpe Ratio!
110 | ```
111 | log_returns = np.log(data / data.shift(1))
112 | sharpe_amazon = (amazon_capm_return-riskfree)/(log_returns['AMZN'].std()*252**0.5)  # Annualize volatility with 252 trading days
113 | ```
114 |
115 | Great! Now that we have demonstrated how to compute the metrics derived from the CAPM, let's wrap the whole procedure into convenient, reusable functions.
116 |
117 | ```
118 | from datetime import datetime
119 | # Import the data of any stock or set of stocks
120 | def import_stock_data(tickers, start = '2010-1-1', end = datetime.today().strftime('%Y-%m-%d')):
121 |     data = pd.DataFrame()
122 |     if isinstance(tickers, str):
123 |         data[tickers] = wb.DataReader(tickers, data_source='yahoo', start = start, end = end)['Adj Close']
124 |         data = pd.DataFrame(data)
125 |     else:
126 |         for t in tickers:
127 |             data[t] = wb.DataReader(t, data_source='yahoo', start = start, end = end)['Adj Close']
128 |     return(data)
129 | 
130 | # Compute beta function
131 | def compute_beta(data, stock, market):
132 |     log_returns = np.log(data / data.shift(1))
133 |     cov = log_returns.cov()*252
134 |     cov_w_market = cov.loc[stock,market]
135 |     market_var = log_returns[market].var()*252
136 |     return cov_w_market/market_var
137 | 
138 | # Compute risk-adjusted return function
139 | def compute_capm(data, stock, market, riskfree = 0.025, riskpremium = 'market'):
140 |     log_returns = np.log(data / data.shift(1))
141 |     if riskpremium == 'market': # Estimate the risk premium from market returns
142 |         riskpremium = (log_returns[market].mean()*252) - riskfree
143 |     beta = compute_beta(data, stock, market)
144 |     return (riskfree + (beta*riskpremium))
145 | 
146 | # Compute Sharpe Ratio
147 | def compute_sharpe(data, stock, market, riskfree = 0.025, riskpremium='market'):
148 |     log_returns = np.log(data / data.shift(1))
149 |     ret = compute_capm(data, stock, market, riskfree, riskpremium)
150 |     return ((ret-riskfree)/(log_returns[stock].std()*252**0.5))
151 | 
152 | # All-in-one function
153 | def stock_CAPM(stock_ticker, market_ticker, start_date = '2010-1-1', riskfree = 0.025, riskpremium = 'market'):
154 |     data = import_stock_data([stock_ticker,market_ticker], start = start_date)
155 |     beta = compute_beta(data, stock_ticker, market_ticker)
156 |     capm = compute_capm(data, stock_ticker, market_ticker, riskfree, riskpremium)
157 |     sharpe = compute_sharpe(data, stock_ticker, market_ticker, riskfree, riskpremium)
158 |     # Assemble Beta, CAPM return, and Sharpe into one summary table
159 |     capmdata = pd.DataFrame([beta,capm,sharpe], columns=[stock_ticker], index=['Beta','Return','Sharpe'])
160 |     return capmdata.T
161 | 
162 | stock_CAPM("AAPL","^GSPC")
163 | ```
164 |
165 |
166 | ## Monte Carlo Simulations
167 | Monte Carlo Simulations are an incredibly powerful tool in numerous contexts, including operations research, game theory, physics, business and finance, among others. It is a technique used to understand the impact of risk and uncertainty when making a decision. Simply put, a Monte Carlo simulation runs an enormous number of trials with different random numbers generated from an underlying distribution for the uncertain variables.
168 |
169 | Here, we will dive into how to predict stock prices using a Monte Carlo simulation!
170 |
171 | **What do we need to understand before we start?**
172 |
173 |
174 | $$S_t = S_{t-1} \times e^{r}$$
175 |
176 | * We know yesterday's price.
177 |
178 | * We want to predict today's price.
179 |
180 | * What we do not know is the rate of return, r, of the share price between yesterday and today.
181 |
182 | This is where the Monte Carlo simulation comes in! But first, how do we compute the return?
183 |
184 | ### Brownian Motion
185 |
186 | Brownian motion will be the main driver for estimating the return. It is a stochastic process used for modeling random behavior over time. For simplicity, we will use regular Brownian motion instead of Geometric Brownian Motion, which is more common and less questionable in stock-pricing applications.
187 |
188 | **Brownian Motion** has two main components:
189 | 1. Drift - the direction that rates of returns have had in the past. That is, the expected return of the stock.
190 | $$drift = \mu - \frac{1}{2}\sigma^2$$
191 |
192 | Why do we subtract half the variance? Because compounding random log returns drags the realized growth rate below the arithmetic mean, so the drift must be adjusted downward by half the variance (a property of log-normal returns).
193 |
194 |
195 | 2. Volatility - random variable. This is the historical volatility multiplied by a random, standard normally distributed variable.
196 |
197 | $$volatility = \sigma \times Z, \quad Z \sim N(0,1)$$
198 |
199 | Therefore, our asset pricing equation ends up looking like this:
200 |
201 |
202 | $$S_t = S_{t-1} \times e^{\left(\mu - \frac{1}{2}\sigma^2\right) + \sigma Z}$$
203 |
204 | This technique will be applied to every day into the future we want to predict, and to however many trials the Monte Carlo simulation will run!
205 |
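To make this concrete, here is a minimal one-step sketch of the update rule above (the mean, variance, and price values are assumed purely for illustration; the full simulation below vectorizes this over many days and trials):

```
import numpy as np
from scipy.stats import norm

# Assumed mean and variance of historical daily log returns (illustrative values)
u, var = 0.0008, 0.0004
drift = u - (0.5*var)            # drift component
stdev = var**0.5                 # historical daily volatility

yesterday = 100.0                # assumed last known price
z = norm.ppf(np.random.rand())   # random standard normal draw
today = yesterday * np.exp(drift + stdev*z)
print(today)
```
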
206 | ---
207 |
208 | First, import required libraries.
209 | ```
210 | #Import libraries
211 | import numpy as np
212 | import pandas as pd
213 | from pandas_datareader import data as wb
214 | import matplotlib.pyplot as plt
215 | from scipy.stats import norm, gmean, cauchy
216 | import seaborn as sns
217 | from datetime import datetime
218 |
219 | %matplotlib inline
220 | ```
221 | Import data for one or multiple stocks from a specified date until the last available date. Data source: Yahoo Finance.
222 |
223 | For this, it helps to define a function that imports the daily data of any publicly traded stock, from a user-defined start date until today. We will use the Adjusted Close price and continue using Amazon as a running example.
224 |
225 | ```
226 | # Import stock data
227 | ticker = 'AMZN'
228 | data = pd.DataFrame()
229 | data[ticker] = wb.DataReader(ticker, data_source='yahoo', start='2010-1-1')['Adj Close']
230 | data.head()
231 |
232 | #Compute log or simple returns
233 | def log_returns(data):
234 | return (np.log(1+data.pct_change()))
235 |
236 | def simple_returns(data):
237 | return ((data/data.shift(1))-1)
238 |
239 | log_return = log_returns(data)
240 |
241 | #Plot returns histogram for AMZN
242 | sns.distplot(log_return.iloc[1:])
243 | plt.xlabel("Daily Return")
244 | plt.ylabel("Frequency")
245 | ```
246 |
247 |
248 | ```
249 | data.plot(figsize=(15,6))
250 | ```
251 |
252 |
253 | Great! Next, we have to calculate the Brownian Motion with randomly generated returns.
254 | ```
255 | #Calculate the Drift
256 | u = log_return.mean()
257 | var = log_return.var()
258 | drift = u - (0.5*var)
259 | ```
260 | Before we generate the variable portion of the returns, I'll show you how to generate uncorrelated random daily returns. Note that correlated returns are very valuable when discussing derivatives that are based on an underlying basket of assets. We will discuss the Cholesky Decomposition method in a later section.
261 | ```
262 | #Returns random variables between 0 and 1
263 | x = np.random.rand(10,2)
264 |
265 | #Percent Point Function - the inverse of a CDF
266 | norm.ppf(x)
267 | ```
268 | Using these, we can generate random returns. For example, we can run 1,000 iterations of random walks consisting of 50 steps (days). Now we can generate the variable part of the Brownian Motion.
269 | ```
270 | #Calculate standard deviation of returns
271 | stddev = log_return.std()
272 | #Calculate expected daily returns for all of the iterations
273 | daily_returns = np.exp(drift.values + stddev.values * norm.ppf(np.random.rand(50,1000)))
274 | ```
275 | So close! Now that we have randomly generated 50 daily returns for every one of the 1,000 trials, all we need is to calculate the price path for each of the trials!
276 | ```
277 | # Create matrix with same size as daily returns matrix
278 | price_list = np.zeros_like(daily_returns)
279 |
280 | # Introduce the last known price of the stock as the first item of every iteration - i.e. Day 0 for every trial in the simulation
281 | price_list[0] = data.iloc[-1]
282 |
283 | # Run a loop to calculate the price today for every simulation based on the daily returns generated
284 | for t in range(1,50):
285 | price_list[t] = price_list[t-1]*daily_returns[t]
286 | ```
287 | Voila! We have officially finished the Monte Carlo simulation to predict stock prices. Let's see how it looks!
288 |
289 | The first 30 simulations:
290 |
291 |
292 |
293 | The histogram of the final prices for each simulation:
294 |
295 |
296 |
297 | With these predictions, we can now calculate Value at Risk, or simply the probability of a certain event occurring, as well as the expected annualized return. We will do this once we create automated versions of this process that can handle multiple stocks and report certain metrics, including the aforementioned and other CAPM metrics!
298 |
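As a quick preview, here is a minimal sketch of how Value at Risk could be read off the simulated final prices (it assumes the `data` and `price_list` variables from the simulation above; the 95% confidence level is an arbitrary choice):

```
# 95% Value at Risk over the simulated horizon: the loss, relative to the
# last known price, that is exceeded in only 5% of simulations
final_prices = pd.DataFrame(price_list).iloc[-1]
last_price = data.iloc[-1, 0]
var_95 = last_price - np.percentile(final_prices, 5)
print(f"95% VaR over the forecast horizon: ${round(var_95, 2)}")
```
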
299 | But first, let's create a highly flexible Monte Carlo simulation function that also returns some basic statistics!
300 |
301 | ## Monte Carlo Simulation Easy-to-Use Functions
302 |
303 | We first import the required libraries
304 | ```
305 | import numpy as np
306 | import pandas as pd
307 | from pandas_datareader import data as wb
308 | import matplotlib.pyplot as plt
309 | from scipy.stats import norm, gmean, cauchy
310 | import seaborn as sns
311 | from datetime import datetime
312 |
313 | %matplotlib inline
314 | ```
315 | Then we create the functions for:
316 | 1. Import stock data
317 | 2. Compute log or simple returns of stocks
318 | 3. Append market data (default S&P) with the imported stock data
319 | 4. Compute the CAPM metrics: Beta, Standard Deviation, Risk-adjusted return, and Sharpe Ratio
320 | 5. Compute Drift - Brownian Motion
321 | 6. Generate Daily Returns - Brownian Motion - for all simulations
322 | 7. Probability Function - computes the probability of a given outcome
323 |
324 | ```
325 | def import_stock_data(tickers, start = '2010-1-1', end = datetime.today().strftime('%Y-%m-%d')):
326 | data = pd.DataFrame()
327 |     if isinstance(tickers, str):
328 |         data[tickers] = wb.DataReader(tickers, data_source='yahoo', start = start, end = end)['Adj Close']
329 |         data = pd.DataFrame(data)
330 |     else:
331 |         for t in tickers:
332 |             data[t] = wb.DataReader(t, data_source='yahoo', start = start, end = end)['Adj Close']
333 | return(data)
334 |
335 | def log_returns(data):
336 | return (np.log(1+data.pct_change()))
337 | def simple_returns(data):
338 | return ((data/data.shift(1))-1)
339 |
340 | def market_data_combination(data, mark_ticker = "^GSPC", start='2010-1-1'):
341 | market_data = import_stock_data(mark_ticker, start)
342 | market_rets = log_returns(market_data).dropna()
343 | ann_return = np.exp(market_rets.mean()*252).values-1
344 | data = data.merge(market_data, left_index=True, right_index=True)
345 | return data, ann_return
346 |
347 | def beta_sharpe(data, mark_ticker = "^GSPC", start='2010-1-1', riskfree = 0.025):
348 |
349 | """
350 | Input:
351 | 1. data: dataframe of stock price data
352 | 2. mark_ticker: ticker of the market data you want to compute CAPM metrics with (default is ^GSPC)
353 |     3. start: date from which to download data (default Jan 1st 2010)
354 | 4. riskfree: the assumed risk free yield (US 10 Year Bond is assumed: 2.5%)
355 |
356 | Output:
357 |     1. Dataframe with CAPM metrics computed against the specified market proxy
358 | """
359 | # Beta
360 | dd, mark_ret = market_data_combination(data, mark_ticker, start)
361 | log_ret = log_returns(dd)
362 | covar = log_ret.cov()*252
363 | covar = pd.DataFrame(covar.iloc[:-1,-1])
364 | mrk_var = log_ret.iloc[:,-1].var()*252
365 | beta = covar/mrk_var
366 |
367 |     stdev_ret = pd.DataFrame(((log_ret.std()*252**0.5)[:-1]), columns=['STD'])
368 | beta = beta.merge(stdev_ret, left_index=True, right_index=True)
369 |
370 | # CAPM
371 | for i, row in beta.iterrows():
372 | beta.at[i,'CAPM'] = riskfree + (row[mark_ticker] * (mark_ret-riskfree))
373 | # Sharpe
374 | for i, row in beta.iterrows():
375 | beta.at[i,'Sharpe'] = ((row['CAPM']-riskfree)/(row['STD']))
376 | beta.rename(columns={"^GSPC":"Beta"}, inplace=True)
377 |
378 | return beta
379 |
380 | def drift_calc(data, return_type='log'):
381 | if return_type=='log':
382 | lr = log_returns(data)
383 | elif return_type=='simple':
384 | lr = simple_returns(data)
385 | u = lr.mean()
386 | var = lr.var()
387 | drift = u-(0.5*var)
388 | try:
389 | return drift.values
390 | except:
391 | return drift
392 |
393 | def daily_returns(data, days, iterations, return_type='log'):
394 | ft = drift_calc(data, return_type)
395 | if return_type == 'log':
396 | try:
397 | stv = log_returns(data).std().values
398 | except:
399 | stv = log_returns(data).std()
400 | elif return_type=='simple':
401 | try:
402 | stv = simple_returns(data).std().values
403 | except:
404 | stv = simple_returns(data).std()
405 |     # Note: empirical return distributions often have fatter tails than the normal
406 |     # (closer to a Cauchy distribution); we sample from the normal here for simplicity
407 | dr = np.exp(ft + stv * norm.ppf(np.random.rand(days, iterations)))
408 | return dr
409 |
410 | def probs_find(predicted, higherthan, ticker = None, on = 'value'):
411 | """
412 |     This function calculates the probability of a stock being above a certain threshold, which can be defined as a value (final stock price) or return rate (percentage change)
413 | Input:
414 | 1. predicted: dataframe with all the predicted prices (days and simulations)
415 |     2. higherthan: specified threshold against which to compute the probability (ex. 0 on return will compute the probability of at least breaking even)
416 | 3. on: 'return' or 'value', the return of the stock or the final value of stock for every simulation over the time specified
417 | 4. ticker: specific ticker to compute probability for
418 | """
419 | if ticker == None:
420 | if on == 'return':
421 | predicted0 = predicted.iloc[0,0]
422 | predicted = predicted.iloc[-1]
423 | predList = list(predicted)
424 | over = [(i*100)/predicted0 for i in predList if ((i-predicted0)*100)/predicted0 >= higherthan]
425 | less = [(i*100)/predicted0 for i in predList if ((i-predicted0)*100)/predicted0 < higherthan]
426 | elif on == 'value':
427 | predicted = predicted.iloc[-1]
428 | predList = list(predicted)
429 | over = [i for i in predList if i >= higherthan]
430 | less = [i for i in predList if i < higherthan]
431 | else:
432 | print("'on' must be either value or return")
433 | else:
434 | if on == 'return':
435 | predicted = predicted[predicted['ticker'] == ticker]
436 | predicted0 = predicted.iloc[0,0]
437 | predicted = predicted.iloc[-1]
438 | predList = list(predicted)
439 | over = [(i*100)/predicted0 for i in predList if ((i-predicted0)*100)/predicted0 >= higherthan]
440 | less = [(i*100)/predicted0 for i in predList if ((i-predicted0)*100)/predicted0 < higherthan]
441 | elif on == 'value':
442 | predicted = predicted.iloc[-1]
443 | predList = list(predicted)
444 | over = [i for i in predList if i >= higherthan]
445 | less = [i for i in predList if i < higherthan]
446 | else:
447 | print("'on' must be either value or return")
448 | return (len(over)/(len(over)+len(less)))
449 | ```
450 | Great! With all these functions we can create yet another function that does a Monte Carlo simulation for each stock.
451 |
452 | How does it work?
453 |
454 | 1. Calculate the daily returns for every day and every iteration (simulation) of the data.
455 | 2. Create an equally large matrix of size [days x iterations] full of zeros.
456 | 3. Input the last stock price value in the first row (day 0) of the "empty" matrix (part 2). This is our starting point.
457 | 4. Calculate "today's price" based on yesterday's multiplied by the daily return generated. That is, multiply the daily return generated for every simulation with the stock price calculated for the previous day (the previous row) for every simulation.
458 |
459 | Does that sound familiar? The fourth step multiplies the generated daily returns by the previous day's stock price!
460 |
461 | ```
462 | def simulate_mc(data, days, iterations, return_type='log', plot=True):
463 | # Generate daily returns
464 | returns = daily_returns(data, days, iterations, return_type)
465 | # Create empty matrix
466 | price_list = np.zeros_like(returns)
467 | # Put the last actual price in the first row of matrix.
468 | price_list[0] = data.iloc[-1]
469 | # Calculate the price of each day
470 | for t in range(1,days):
471 | price_list[t] = price_list[t-1]*returns[t]
472 |
473 | # Plot Option
474 | if plot == True:
475 | x = pd.DataFrame(price_list).iloc[-1]
476 | fig, ax = plt.subplots(1,2, figsize=(14,4))
477 | sns.distplot(x, ax=ax[0])
478 | sns.distplot(x, hist_kws={'cumulative':True},kde_kws={'cumulative':True},ax=ax[1])
479 | plt.xlabel("Stock Price")
480 | plt.show()
481 |
482 | #CAPM and Sharpe Ratio
483 |
484 | # Printing information about stock
485 | try:
486 | [print(nam) for nam in data.columns]
487 | except:
488 | print(data.name)
489 | print(f"Days: {days-1}")
490 | print(f"Expected Value: ${round(pd.DataFrame(price_list).iloc[-1].mean(),2)}")
491 |     print(f"Return: {round(100*(pd.DataFrame(price_list).iloc[-1].mean()-price_list[0,1])/price_list[0,1],2)}%")  # Return relative to the starting price
492 | print(f"Probability of Breakeven: {probs_find(pd.DataFrame(price_list),0, on='return')}")
493 |
494 |
495 | return pd.DataFrame(price_list)
496 |
497 | simulate_mc(data, 252, 1000, 'log')
498 | ```
499 |
500 | Now, let's loop through all the stated securities and generate the visualizations and statistics that will help us understand the expected performance of a stock.
501 |
502 | ```
503 | def monte_carlo(tickers, days_forecast, iterations, start_date = '2000-1-1', return_type = 'log', plotten=False):
504 | data = import_stock_data(tickers, start=start_date)
505 | inform = beta_sharpe(data, mark_ticker="^GSPC", start=start_date)
506 | simulatedDF = []
507 | for t in range(len(tickers)):
508 | y = simulate_mc(data.iloc[:,t], (days_forecast+1), iterations, return_type)
509 | if plotten == True:
510 | forplot = y.iloc[:,0:10]
511 | forplot.plot(figsize=(15,4))
512 | print(f"Beta: {round(inform.iloc[t,inform.columns.get_loc('Beta')],2)}")
513 | print(f"Sharpe: {round(inform.iloc[t,inform.columns.get_loc('Sharpe')],2)}")
514 | print(f"CAPM Return: {round(100*inform.iloc[t,inform.columns.get_loc('CAPM')],2)}%")
515 | y['ticker'] = tickers[t]
516 | cols = y.columns.tolist()
517 | cols = cols[-1:] + cols[:-1]
518 | y = y[cols]
519 | simulatedDF.append(y)
520 | simulatedDF = pd.concat(simulatedDF)
521 | return simulatedDF
522 |
523 | start = "2015-1-1"
524 | days_to_forecast= 252
525 | simulation_trials= 10000
526 | ret_sim_df = monte_carlo(['GOOG','AAPL'], days_to_forecast, simulation_trials, start_date=start, plotten=False)
527 | ```
528 |
529 |
530 | Now, we can do Monte Carlo simulations on individual stocks, assuming they are uncorrelated. But usually, people don't choose between stocks A or B. Investors have to choose from a sea of stocks and other possible securities they can invest in! Investors aim to maximize returns while avoiding risk, and one way an investor can do that is by diversifying their portfolio. Hence, the next two sections are: Portfolio Optimization theory and code, and Cholesky Decomposition to generate correlated returns.
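
As a small preview of the Cholesky idea, here is a minimal sketch of how a correlation structure can be imposed on independent normal draws (the correlation matrix is an assumed, illustrative one; the full treatment comes in the Cholesky section):

```
import numpy as np
from scipy.stats import norm

# Illustrative 2-asset correlation matrix (assumed values)
corr = np.array([[1.0, 0.6],
                 [0.6, 1.0]])
chol = np.linalg.cholesky(corr)  # lower-triangular factor: corr = chol @ chol.T

uncorrelated = norm.ppf(np.random.rand(2, 10000))  # independent standard normal draws
correlated = chol @ uncorrelated                   # draws now correlated at ~0.6

print(np.corrcoef(correlated))   # empirical correlation matrix is close to corr
```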
531 |
532 | ## Portfolio Optimization
533 | To understand portfolio optimization, we must introduce Markowitz and the Efficient Frontier.
534 |
535 | The efficient frontier is the set of optimal portfolios that offer the highest expected return for a given volatility (i.e. risk). Hence, any portfolio that does not lie on the frontier is suboptimal, because another portfolio could provide a higher return for the same amount of risk.
536 |
537 | Let's exemplify with the case of a portfolio that can only hold 2 stocks:
538 | 1. Microsoft (MSFT)
539 | 2. UnitedHealth (UNH)
540 |
541 | ```
542 | import numpy as np
543 | import pandas as pd
544 | from pandas_datareader import data as wb
545 | import matplotlib.pyplot as plt
546 | %matplotlib inline
547 |
548 | assets = ['MSFT','UNH']
549 |
550 | pf_data = pd.DataFrame()
551 | for t in assets:
552 | pf_data[t] = wb.DataReader(t, data_source='yahoo', start='2015-1-1')['Adj Close']
553 |
554 | (pf_data / pf_data.iloc[0]*100).plot(figsize=(15,6))
555 | ```
556 |
557 |
558 | ```
559 | log_returns = np.log(pf_data / pf_data.shift(1))
560 | log_returns.mean()*252
561 |
562 | num_assets = len(assets)
563 |
564 | weights = np.random.random(num_assets)
565 | weights /= np.sum(weights)
566 |
567 | pfolio_returns = []
568 | pfolio_volatilities = []
569 |
570 | for x in range(1000):
571 | weights = np.random.random(num_assets)
572 | weights /= np.sum(weights)
573 |
574 | pfolio_returns.append(np.sum(weights*log_returns.mean())*252)
575 |     pfolio_volatilities.append(np.sqrt(np.dot(weights.T, np.dot(log_returns.cov()*252, weights))))
576 |
577 | pfolio_returns = np.array(pfolio_returns)
578 | pfolio_volatilities = np.array(pfolio_volatilities)
579 |
580 | portfolios = pd.DataFrame({'Return':pfolio_returns,'Volatility':pfolio_volatilities})
581 |
582 | portfolios.plot(x='Volatility',y='Return', kind='scatter', figsize=(10,6))
584 | plt.xlabel('Expected Volatility')
585 | plt.ylabel('Expected Return')
586 |
587 | print(f"Expected Portfolio Return: {round(np.sum(weights * log_returns.mean())*252*100,2)}%")
588 | print(f"Expected Portfolio Variance: {round(100*np.dot(weights.T, np.dot(log_returns.cov() *252, weights)),2)}%")
589 | print(f"Expected Portfolio Volatility: {round(100*np.sqrt(np.dot(weights.T, np.dot(log_returns.cov()*252, weights))),2)}%")
590 | ```
591 |
592 |
593 | Expected Portfolio Return: 26.97%
594 |
595 | Expected Portfolio Variance: 6.78%
596 |
597 | Expected Portfolio Volatility: 26.03%
598 |
599 | The image above shows the efficient frontier, with each dot being a portfolio made of the two stocks. The only difference between the portfolios is the weight attributed to each stock. As we can observe, for the same expected risk (volatility), there are different expected returns. If an investor targets a certain risk and is not on the part of the frontier that maximizes returns, the portfolio is suboptimal.
600 |
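To read concrete numbers off this simulated frontier, a short sketch like the following can pick out the minimum-volatility portfolio and the portfolio with the best return-to-risk ratio (it assumes the `portfolios` DataFrame built above; ranking by return divided by volatility is a simplification that ignores the risk-free rate):

```
# Lowest-risk portfolio among the simulated ones
min_vol = portfolios.loc[portfolios['Volatility'].idxmin()]
# Portfolio with the highest return per unit of volatility
best = portfolios.loc[(portfolios['Return'] / portfolios['Volatility']).idxmax()]

print(f"Min volatility: {round(100*min_vol['Return'],2)}% return at {round(100*min_vol['Volatility'],2)}% volatility")
print(f"Best return/risk: {round(100*best['Return'],2)}% return at {round(100*best['Volatility'],2)}% volatility")
```
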
601 | Next, we will leverage Monte Carlo simulations to calculate the
602 |
--------------------------------------------------------------------------------
/Multivariate MC.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 47,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "import numpy as np\n",
10 | "import pandas as pd\n",
11 | "from pandas_datareader import data as wb\n",
12 | "import matplotlib.pyplot as plt\n",
13 | "from scipy.stats import norm, gmean, cauchy, multivariate_normal\n",
14 | "import seaborn as sns\n",
15 | "# import plotly as py\n",
16 | "# import plotly.graph_objs as go\n",
17 | "# import cufflinks\n",
18 | "# cufflinks.go_offline(connected=True)\n",
19 | "#init_notebook_mode(connected=True)\n",
20 | "\n",
21 | "%matplotlib inline"
22 | ]
23 | },
24 | {
25 | "cell_type": "code",
26 | "execution_count": 521,
27 | "metadata": {},
28 | "outputs": [],
29 | "source": [
30 | "def import_stock_data(tickers, start = '2010-1-1'):\n",
31 | " data = pd.DataFrame()\n",
32 | "    if isinstance(tickers, str):\n",
33 | " data[tickers] = wb.DataReader(tickers, data_source='yahoo', start = start)['Adj Close']\n",
34 | " data = pd.DataFrame(data)\n",
35 | " else:\n",
36 | " for t in tickers:\n",
37 | " data[t] = wb.DataReader(t, data_source='yahoo', start = start)['Adj Close']\n",
38 | " return(data)"
39 | ]
40 | },
41 | {
42 | "cell_type": "code",
43 | "execution_count": 687,
44 | "metadata": {},
45 | "outputs": [],
46 | "source": [
47 | "data = import_stock_data([\"MSFT\",\"BAM\"])"
48 | ]
49 | },
50 | {
51 | "cell_type": "code",
52 | "execution_count": 688,
53 | "metadata": {},
54 | "outputs": [
55 | {
56 | "data": {
57 | "text/html": [
58 | "