├── Chart Pattern Recognition.ipynb ├── README.md └── images ├── chart.png └── table.png /Chart Pattern Recognition.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Chart Pattern Recognition" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "This feature engineering example involves a method for chart pattern recognition. Stemming from the idea that voice recognition is basically looking for patterns in waveforms, it seemed plausible that this could apply to normalized stock returns. This example uses closing price data for Microsoft from January 2000 to April 2018 (data from Quandl)." 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "The following approach takes a rolling one-month window of price data, normalizes the data by converting it to cumulative percent returns over the period, and then stores the pattern. Next, it steps the window forward by one day and repeats the process to collect all possible historical patterns." 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "Then we can compare the current period’s pattern to historical patterns, extract those patterns similar to the current pattern, and use the mean outcome of those similar patterns as the basis for the buy or sell recommendation. This is shown by the dots on the right-hand side of the chart — the green and red dots are the outcomes of the historical patterns, the yellow dot is the mean of those outcomes, which is the recommendation, and the purple dot is the actual outcome of the current pattern being evaluated." 
29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "![alt text](chart.png \"Title\")" 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": {}, 41 | "source": [ 42 | "Testing this across about 200 patterns to evaluate predictive strength yielded 64% accuracy, which is encouraging." 43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "metadata": {}, 48 | "source": [ 49 | "![alt text](table.png \"Title\")" 50 | ] 51 | }, 52 | { 53 | "cell_type": "markdown", 54 | "metadata": {}, 55 | "source": [ 56 | "### Import dependencies" 57 | ] 58 | }, 59 | { 60 | "cell_type": "code", 61 | "execution_count": 3, 62 | "metadata": {}, 63 | "outputs": [], 64 | "source": [ 65 | "import pandas as pd\n", 66 | "import seaborn as sns\n", 67 | "import matplotlib.pyplot as plt\n", 68 | "import numpy as np\n", 69 | "from matplotlib import style\n", 70 | "style.use(\"ggplot\")\n", 71 | "%matplotlib inline" 72 | ] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "metadata": {}, 77 | "source": [ 78 | "### Import data" 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": 5, 84 | "metadata": {}, 85 | "outputs": [], 86 | "source": [ 87 | "stock = pd.read_csv('msft_prices.csv', parse_dates=[0]).set_index('Date') # parse_dates converts column 0 (the Date column) from str to datetime\n", 88 | "price = stock.Close.values # convert closing prices to numpy array" 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "metadata": {}, 94 | "source": [ 95 | "### Set up the rolling window function" 96 | ] 97 | }, 98 | { 99 | "cell_type": "code", 100 | "execution_count": 8, 101 | "metadata": {}, 102 | "outputs": [], 103 | "source": [ 104 | "def strided_app(prices, window_len, stride_step=1):\n", 105 | " ''' Takes a window length of data points, steps forward by the stride_step, and\n", 106 | " repeats to collect all the data points into an array. 
\n", 107 | " \n", 108 | " Parameters\n", 109 | " ----------\n", 110 | " prices = closing price data\n", 111 | " window_len = 21 for 20 trading days or a 1 month window \n", 112 | " stride_step = step the window forward by one day \n", 113 | " \n", 114 | " Output\n", 115 | " ------\n", 116 | " Strided window of determined length and step function ''' \n", 117 | " \n", 118 | " nrows = ((prices.size - window_len)//stride_step) + 1\n", 119 | " n = prices.strides[0]\n", 120 | " \n", 121 | " return np.lib.stride_tricks.as_strided(prices, shape=(nrows, window_len), \n", 122 | " strides=(stride_step * n, n))" 123 | ] 124 | }, 125 | { 126 | "cell_type": "markdown", 127 | "metadata": {}, 128 | "source": [ 129 | "### Collect historical patterns of cumulative percentage price return data" 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": 22, 135 | "metadata": {}, 136 | "outputs": [], 137 | "source": [ 138 | "def stored_cumPatterns(prices, window_len, stride_step=1):\n", 139 | " ''' Collects cumulative pct-change patterns (which normalizes the data) over \n", 140 | " the window period for each step forward (stride) through the price series. ''' \n", 141 | "\n", 142 | " # Array where each row has window_len of closing prices, by calling the strided_app function\n", 143 | " pattern_window = strided_app(prices, window_len, stride_step)\n", 144 | " # Normalize closing prices to cumulative percentage returns\n", 145 | " stored_cum_patts = (pattern_window[:,1:] / pattern_window[:,0:1] - 1) * 100\n", 146 | " \n", 147 | " return stored_cum_patts" 148 | ] 149 | }, 150 | { 151 | "cell_type": "markdown", 152 | "metadata": {}, 153 | "source": [ 154 | "The first two rows of window percentage returns are shown below, each row with 20 days of cumulative return figures." 
155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": 27, 160 | "metadata": {}, 161 | "outputs": [ 162 | { 163 | "data": { 164 | "text/plain": [ 165 | "array([[ -3.38023336, -2.35929993, -5.62800275, -4.39258751,\n", 166 | " -3.69766644, -6.16849691, -9.22271791, -7.50686342,\n", 167 | " -3.69766644, -1.07240906, -8.20178449, -9.05971174,\n", 168 | " -10.99004804, -13.13486616, -11.79649966, -14.74776939,\n", 169 | " -15.27968428, -15.70864791, -16.03466026, -11.68496911],\n", 170 | " [ 1.05665068, -2.32640739, -1.04777127, -0.32853845,\n", 171 | " -2.88581069, -6.04688332, -4.27099982, -0.32853845,\n", 172 | " 2.38856331, -4.99023264, -5.87817439, -7.87604333,\n", 173 | " -10.09589771, -8.71070858, -11.7652282 , -12.31575209,\n", 174 | " -12.75972296, -13.09714083, -8.59527615, -10.48659208]])" 175 | ] 176 | }, 177 | "execution_count": 27, 178 | "metadata": {}, 179 | "output_type": "execute_result" 180 | } 181 | ], 182 | "source": [ 183 | "stored_cumPatterns(prices=price, window_len=21, stride_step=1)[:2]" 184 | ] 185 | }, 186 | { 187 | "cell_type": "code", 188 | "execution_count": 28, 189 | "metadata": {}, 190 | "outputs": [], 191 | "source": [ 192 | "def stored_dailyPatterns(prices, window_len, stride_step=1):\n", 193 | " ''' Gets the daily percent change patterns for the period. 
''' \n", 194 | " \n", 195 | " # Array where each row has window_len of closing prices, by calling the strided_app function\n", 196 | " pattern_window = strided_app(prices, window_len, stride_step)\n", 197 | " # Normalize closing prices to daily percentage returns\n", 198 | " stored_daily_patts = (pattern_window[:,1:] / pattern_window[:,:-1] - 1) * 100\n", 199 | "\n", 200 | " return stored_daily_patts" 201 | ] 202 | }, 203 | { 204 | "cell_type": "markdown", 205 | "metadata": {}, 206 | "source": [ 207 | "### Collect the current pattern of cumulative percentage price return for comparison" 208 | ] 209 | }, 210 | { 211 | "cell_type": "code", 212 | "execution_count": 18, 213 | "metadata": {}, 214 | "outputs": [], 215 | "source": [ 216 | "def curr_cumPattern(prices, curr_patt_start, window_len, stride_step=1):\n", 217 | " ''' Gets the specified cumulative pct-change pattern for the window period. '''\n", 218 | " \n", 219 | " stored_cum_patts = stored_cumPatterns(prices, window_len, stride_step)\n", 220 | " curr_cum_patt = stored_cum_patts[curr_patt_start].reshape(1,-1)\n", 221 | " \n", 222 | " return curr_cum_patt" 223 | ] 224 | }, 225 | { 226 | "cell_type": "markdown", 227 | "metadata": {}, 228 | "source": [ 229 | "A current 20-day pattern is shown below, taken from the 500th trading day within the price history as a starting point for the window period." 
230 | ] 231 | }, 232 | { 233 | "cell_type": "code", 234 | "execution_count": 29, 235 | "metadata": {}, 236 | "outputs": [ 237 | { 238 | "data": { 239 | "text/plain": [ 240 | "array([[ 3.26670644, 2.77446301, 2.2673031 , 3.49045346, 2.49105012,\n", 241 | " 3.34128878, 2.34188544, 2.13305489, 3.74403341, 1.23806683,\n", 242 | " 4.20644391, -1.40214797, -3.84844869, -4.92243437, -3.63961814,\n", 243 | " -4.83293556, -4.80310263, -7.04057279, -6.25 , -4.96718377]])" 244 | ] 245 | }, 246 | "execution_count": 29, 247 | "metadata": {}, 248 | "output_type": "execute_result" 249 | } 250 | ], 251 | "source": [ 252 | "curr_cumPattern(prices=price, curr_patt_start=500, window_len=21, stride_step=1)" 253 | ] 254 | }, 255 | { 256 | "cell_type": "code", 257 | "execution_count": 19, 258 | "metadata": {}, 259 | "outputs": [], 260 | "source": [ 261 | "def curr_dailyPattern(prices, curr_patt_start, window_len, stride_step=1):\n", 262 | " ''' Gets the daily percent change patterns for the period. ''' \n", 263 | " \n", 264 | " stored_daily_patts = stored_dailyPatterns(prices, window_len, stride_step)\n", 265 | " curr_daily_patt = stored_daily_patts[curr_patt_start].reshape(1,-1)\n", 266 | " \n", 267 | " return curr_daily_patt" 268 | ] 269 | }, 270 | { 271 | "cell_type": "markdown", 272 | "metadata": {}, 273 | "source": [ 274 | "### Compare current pattern to historical patterns for recognition" 275 | ] 276 | }, 277 | { 278 | "cell_type": "markdown", 279 | "metadata": {}, 280 | "source": [ 281 | "A simple way to identify similar patterns is to compare the number of up/down days in the window period. This can be done by converting the pct-change patterns to positive/negative days using numpy.sign. Then count those days in the window period where the sign matches the stored pattern and divide by the number of elements in the window period to get a percentage of days matched. 
Once all stored patterns have been compared to the current pattern, a cut-off threshold for a minimum level of similarity can then be applied to extract only those deemed similar enough. Conceptually, this is akin to keeping only the right tail of the similarity distribution.\n", 282 | "\n", 283 | "Neither measure is ideal on its own: matching on daily returns can understate the similarity, while matching on cumulative returns can overstate it. To overcome this, an average of the daily and cumulative comparisons has been used as a compromise." 284 | ] 285 | }, 286 | { 287 | "cell_type": "code", 288 | "execution_count": 20, 289 | "metadata": {}, 290 | "outputs": [], 291 | "source": [ 292 | "def pattComp(prices, curr_patt_start, window_len, stride_step=1):\n", 293 | " ''' Counts the number of same-sign returns where a cut-off threshold for \n", 294 | " similarity can be applied. The result is averaged across the cumulative and\n", 295 | " daily comparisons for a more stable cut-off threshold level. ''' \n", 296 | " \n", 297 | " # Call the current cumulative return pattern and stored cumulative return patterns:\n", 298 | " curr_cum_patt = curr_cumPattern(prices, curr_patt_start, window_len, \n", 299 | " stride_step).reshape(1,-1)\n", 300 | " stored_cum_patts = stored_cumPatterns(prices, window_len, stride_step)\n", 301 | " \n", 302 | " \n", 303 | " # Call the current daily return pattern and stored daily return patterns:\n", 304 | " curr_daily_patt = curr_dailyPattern(prices, curr_patt_start, window_len, \n", 305 | " stride_step).reshape(1,-1)\n", 306 | " stored_daily_patts = stored_dailyPatterns(prices, window_len, stride_step)\n", 307 | " \n", 308 | " # Count the number of matching signs, divide by window elements to get matching percent\n", 309 | " cum_comp = np.sum( np.sign(curr_cum_patt) == \n", 310 | " np.sign(stored_cum_patts), axis=1 ) / curr_cum_patt.size \n", 311 | " daily_comp = np.sum( np.sign(curr_daily_patt) == \n", 312 | " np.sign(stored_daily_patts), axis=1 ) / curr_daily_patt.size\n", 313 | " 
\n", 314 | " # Take the blended average of the two comparison methods above\n", 315 | " both_comp = np.mean((daily_comp, cum_comp), axis=0)\n", 316 | " \n", 317 | " return cum_comp, daily_comp, both_comp" 318 | ] 319 | }, 320 | { 321 | "cell_type": "markdown", 322 | "metadata": {}, 323 | "source": [ 324 | "#### Plot a histogram of pattern similarities to get a sense of what is going on..." 325 | ] 326 | }, 327 | { 328 | "cell_type": "markdown", 329 | "metadata": {}, 330 | "source": [ 331 | "Overlaying the distributions of the pattern comparisons for the cumulative returns, the daily returns and the average of the two gives a clearer picture of how the similarity scores are spread. Change curr_patt_start to anything less than 4565 (the start index of the most recent pattern available for comparison) to see how the distributions vary. " 332 | ] 333 | }, 334 | { 335 | "cell_type": "code", 336 | "execution_count": 30, 337 | "metadata": {}, 338 | "outputs": [], 339 | "source": [ 340 | "def plotHistograms(prices, curr_patt_start, window_len, stride_step=1):\n", 341 | " \n", 342 | " cum_comp, daily_comp, both_comp = pattComp(prices, curr_patt_start, \n", 343 | " window_len, stride_step)\n", 344 | "\n", 345 | " plt.hist(both_comp, label='both_comp', alpha=0.7)\n", 346 | " names = ['cum_comp', 'daily_comp']\n", 347 | " plt.hist([cum_comp, daily_comp], label=names , alpha=0.8)\n", 348 | " \n", 349 | " plt.title('Both Pattern Comparison')\n", 350 | " plt.legend()" 351 | ] 352 | }, 353 | { 354 | "cell_type": "code", 355 | "execution_count": 46, 356 | "metadata": {}, 357 | "outputs": [ 358 | { 359 | "data": { 360 | "image/png": "\n", 361 | "text/plain": [ 362 | "" 363 | ] 364 | }, 365 | "metadata": {}, 366 | "output_type": "display_data" 367 | } 368 | ], 369 | "source": [ 370 | "plotHistograms(prices=price, curr_patt_start=4560, window_len=21, stride_step=1)" 371 | ] 372 | }, 373 | { 374 | "cell_type": "markdown", 375 | "metadata": {}, 376 | "source": [ 377 | "### Retrieve the 
similar patterns" 378 | ] 379 | }, 380 | { 381 | "cell_type": "markdown", 382 | "metadata": {}, 383 | "source": [ 384 | "With the current pattern compared to all stored patterns, we can now extract only the similar ones by defining a minimum cut-off threshold for percentage similarity." 385 | ] 386 | }, 387 | { 388 | "cell_type": "code", 389 | "execution_count": 49, 390 | "metadata": {}, 391 | "outputs": [], 392 | "source": [ 393 | "def retrieve_cumSimPatts(prices, curr_patt_start, window_len, sim_threshold, \n", 394 | " stride_step=1):\n", 395 | " ''' Retrieves the stored cumulative patterns whose blended similarity exceeds\n", 396 | " sim_threshold, using patt_sim_index to look them up in the stored patterns. '''\n", 397 | " \n", 398 | " patt_similarity = pattComp(prices, curr_patt_start, window_len, stride_step)[2] # 2=both_comp\n", 399 | " patt_sim_index = np.where(patt_similarity > sim_threshold)[0]\n", 400 | " sim_patts_cum = stored_cumPatterns(prices, window_len, stride_step)[patt_sim_index]\n", 401 | " \n", 402 | " return sim_patts_cum" 403 | ] 404 | }, 405 | { 406 | "cell_type": "markdown", 407 | "metadata": {}, 408 | "source": [ 409 | "As an example, the cell below shows that the current pattern (starting at index 4560) has 60 historical patterns that match it with more than 80% similarity." 
410 | ] 411 | }, 412 | { 413 | "cell_type": "code", 414 | "execution_count": 55, 415 | "metadata": {}, 416 | "outputs": [ 417 | { 418 | "data": { 419 | "text/plain": [ 420 | "60" 421 | ] 422 | }, 423 | "execution_count": 55, 424 | "metadata": {}, 425 | "output_type": "execute_result" 426 | } 427 | ], 428 | "source": [ 429 | "retrieve_cumSimPatts(prices=price, curr_patt_start=4560, window_len=21, \n", 430 | " sim_threshold=0.8, stride_step=1).shape[0]" 431 | ] 432 | }, 433 | { 434 | "cell_type": "code", 435 | "execution_count": 48, 436 | "metadata": {}, 437 | "outputs": [], 438 | "source": [ 439 | "def retrieve_dailySimPatts(prices, curr_patt_start, window_len, sim_threshold, \n", 440 | " stride_step=1):\n", 441 | " \n", 442 | " patt_similarity = pattComp(prices, curr_patt_start, window_len, stride_step)[2] # 2=both_comp\n", 443 | " patt_sim_index = np.where(patt_similarity > sim_threshold)[0]\n", 444 | " sim_patts_daily = stored_dailyPatterns(prices, window_len, stride_step)[patt_sim_index]\n", 445 | " \n", 446 | " return sim_patts_daily" 447 | ] 448 | }, 449 | { 450 | "cell_type": "markdown", 451 | "metadata": {}, 452 | "source": [ 453 | "### Evaluate the pattern outcomes" 454 | ] 455 | }, 456 | { 457 | "cell_type": "markdown", 458 | "metadata": {}, 459 | "source": [ 460 | "These functions calculate the outcome returns of the patterns that are similar to the current pattern and then use the mean of those returns as the predicted return for the current pattern." 461 | ] 462 | }, 463 | { 464 | "cell_type": "code", 465 | "execution_count": 68, 466 | "metadata": {}, 467 | "outputs": [], 468 | "source": [ 469 | "def stored_patts_returnOutcome(prices, curr_patt_start, window_len, return_period, \n", 470 | " sim_threshold, stride_step=1):\n", 471 | " ''' Gets the start price of each matched pattern in order to calculate \n", 472 | " its outcome return. 
(Use the start price because we\n", 473 | " are extending the cumulative return pattern.) '''\n", 474 | " patt_similarity = pattComp(prices, curr_patt_start, window_len, stride_step)[2] # 2=both_comp\n", 475 | " \n", 476 | " startprice_index = np.where(patt_similarity > sim_threshold)[0]\n", 477 | " stored_patts_startprice = prices[startprice_index]\n", 478 | " \n", 479 | " outcome_prices_index = np.clip(np.where(patt_similarity > sim_threshold)[0] +\n", 480 | " return_period+window_len, a_min=0, \n", 481 | " a_max=len(prices)-window_len)\n", 482 | " stored_patts_outcome_prices = prices[outcome_prices_index]\n", 483 | " \n", 484 | " stored_patts_outcome_returns = (stored_patts_outcome_prices / \n", 485 | " stored_patts_startprice -1 ) * 100\n", 486 | " \n", 487 | " return stored_patts_outcome_returns" 488 | ] 489 | }, 490 | { 491 | "cell_type": "code", 492 | "execution_count": 69, 493 | "metadata": {}, 494 | "outputs": [], 495 | "source": [ 496 | "def predictedOutcome(prices, curr_patt_start, window_len, return_period, \n", 497 | " sim_threshold, stride_step=1):\n", 498 | " ''' Predicted outcome is just the average of the outcomes of past similar patterns. '''\n", 499 | " \n", 500 | " stored_patts_outcome_returns = stored_patts_returnOutcome(prices, curr_patt_start, \n", 501 | " window_len, return_period, \n", 502 | " sim_threshold, stride_step)\n", 503 | " \n", 504 | " return np.mean(stored_patts_outcome_returns)" 505 | ] 506 | }, 507 | { 508 | "cell_type": "markdown", 509 | "metadata": {}, 510 | "source": [ 511 | "This function calculates the actual return of the pattern we are trying to predict." 512 | ] 513 | }, 514 | { 515 | "cell_type": "code", 516 | "execution_count": 70, 517 | "metadata": {}, 518 | "outputs": [], 519 | "source": [ 520 | "def curr_patt_returnOutcome(prices, curr_patt_start, window_len, return_period): # curr_patt_start will be 'i' in the loop\n", 521 | " ''' Actual cumulative return of the current pattern. 
''' \n", 522 | " curr_patt_start_price = prices[curr_patt_start]\n", 523 | " curr_patt_outcome_price = prices[curr_patt_start + window_len + return_period-1]\n", 524 | " curr_patt_outcome_return = (curr_patt_outcome_price / curr_patt_start_price -1 ) * 100\n", 525 | " \n", 526 | " return curr_patt_outcome_return" 527 | ] 528 | }, 529 | { 530 | "cell_type": "markdown", 531 | "metadata": {}, 532 | "source": [ 533 | "### Visualizations" 534 | ] 535 | }, 536 | { 537 | "cell_type": "code", 538 | "execution_count": 74, 539 | "metadata": {}, 540 | "outputs": [], 541 | "source": [ 542 | "def run_charts(prices, curr_patt_start, window_len, return_period, \n", 543 | " sim_threshold, stride_step=1):\n", 544 | " \n", 545 | " fig = plt.figure(figsize=(9,9)) \n", 546 | " \n", 547 | " curr_cum_patt = curr_cumPattern(prices, curr_patt_start, window_len, stride_step)\n", 548 | " curr_daily_patt = curr_dailyPattern(prices, curr_patt_start, window_len, stride_step)\n", 549 | " sim_patts_cum = retrieve_cumSimPatts(prices, curr_patt_start, window_len, \n", 550 | " sim_threshold, stride_step)\n", 551 | " sim_patts_daily = retrieve_dailySimPatts(prices, curr_patt_start, window_len, \n", 552 | " sim_threshold, stride_step)\n", 553 | " \n", 554 | " stored_patts_outcome_returns = stored_patts_returnOutcome(prices, curr_patt_start, \n", 555 | " window_len, return_period, \n", 556 | " sim_threshold, stride_step)\n", 557 | " \n", 558 | " pred_outcome = predictedOutcome(prices, curr_patt_start, window_len, return_period, \n", 559 | " sim_threshold, stride_step) \n", 560 | " \n", 561 | " curr_patt_outcome_return = curr_patt_returnOutcome(prices, curr_patt_start, \n", 562 | " window_len, return_period) \n", 563 | " \n", 564 | "\n", 565 | " # CUMULATIVE PATTERNS\n", 566 | " ax1 = fig.add_subplot(211)\n", 567 | " # Similar patterns:\n", 568 | " counter = 0\n", 569 | " for i in sim_patts_cum: \n", 570 | " if stored_patts_outcome_returns[counter:counter+1] > curr_cum_patt[0][-1]:\n", 571 | " 
dot_color = 'green'\n", 572 | " else: dot_color = 'red' \n", 573 | " plt.scatter( window_len + return_period - 1, \n", 574 | " stored_patts_outcome_returns[counter:counter+1], \n", 575 | " c = dot_color, alpha=0.3) \n", 576 | " plt.plot(sim_patts_cum[counter:counter+1][0], alpha=0.5)\n", 577 | " counter += 1\n", 578 | " # Predicted Outcome:\n", 579 | " plt.scatter( window_len + return_period - 2 , pred_outcome, \n", 580 | " c='yellow', label = 'Predicted Outcome') \n", 581 | " # Actual Outcome:\n", 582 | " plt.scatter( window_len + return_period - 2 , curr_patt_outcome_return , \n", 583 | " c='purple', label = 'Actual Outcome')\n", 584 | " # Current Pattern:\n", 585 | " plt.plot(curr_cum_patt[0], linewidth=3, c='blue', label = 'Current Pattern') \n", 586 | " plt.title('Cumulative Pattern Comparison')\n", 587 | " plt.legend()\n", 588 | "\n", 589 | "\n", 590 | " # DAILY PATTERNS\n", 591 | " ax2 = fig.add_subplot(212)\n", 592 | " plt.scatter(window_len + return_period - 1, 1, c='grey', alpha=0.01) # this is just to keep x-axis same as cumulative chart\n", 593 | " # Similar patterns:\n", 594 | " counter = 0\n", 595 | " for i in sim_patts_daily:\n", 596 | " plt.plot(sim_patts_daily[counter:counter+1][0], alpha=0.5)\n", 597 | " counter += 1\n", 598 | " # Current Pattern:\n", 599 | " plt.plot(curr_daily_patt[0], linewidth=3, c='blue', label = 'Current Pattern')\n", 600 | " plt.title('Daily Pattern Comparison')\n", 601 | " plt.legend()" 602 | ] 603 | }, 604 | { 605 | "cell_type": "code", 606 | "execution_count": 111, 607 | "metadata": {}, 608 | "outputs": [ 609 | { 610 | "data": { 611 | "image/png": "\n", 612 | "text/plain": [ 613 | "" 614 | ] 615 | }, 616 | "metadata": {}, 617 | "output_type": "display_data" 618 | } 619 | ], 620 | "source": [ 621 | "run_charts(prices=price, curr_patt_start=455, window_len=21, return_period=10, \n", 622 | " sim_threshold=0.8)" 623 | ] 624 | }, 625 | { 626 | "cell_type": "markdown", 627 | "metadata": {}, 628 | "source": [] 629 | }, 630 | { 
631 | "cell_type": "code", 632 | "execution_count": 80, 633 | "metadata": {}, 634 | "outputs": [], 635 | "source": [ 636 | "def run_update(prices, window_len, return_period, sim_threshold, stride_step=1): \n", 637 | " \n", 638 | " # Setup empty df to capture scores:\n", 639 | " confusion_matrix = pd.DataFrame(columns = ['Up_pred', 'Down_pred'])\n", 640 | " confusion_matrix['Scores'] = ['Up_outcome', 'Down_outcome', 'Accuracy']\n", 641 | " confusion_matrix = confusion_matrix.set_index('Scores')\n", 642 | " \n", 643 | " true_positive = 0\n", 644 | " false_positive = 0\n", 645 | " true_negative = 0\n", 646 | " false_negative = 0\n", 647 | " \n", 648 | " \n", 649 | " for i in range(1, int(len(prices)/window_len)):\n", 650 | " curr_patt_start = len(prices) - return_period - window_len*i\n", 651 | " \n", 652 | " # Storing patterns:\n", 653 | " stored_cum_patts = stored_cumPatterns(prices, window_len, stride_step=1)\n", 654 | " stored_daily_patts = stored_dailyPatterns(prices, window_len, stride_step=1)\n", 655 | " \n", 656 | " curr_cum_patt = curr_cumPattern(prices, curr_patt_start, window_len, stride_step)\n", 657 | " curr_daily_patt = curr_dailyPattern(prices, curr_patt_start, window_len, stride_step) \n", 658 | " \n", 659 | " # Comparing patterns:\n", 660 | " patt_comps = pattComp(prices, curr_patt_start, window_len, stride_step)[2] # 2=both_comp\n", 661 | " sim_patts_daily = retrieve_dailySimPatts(prices, curr_patt_start, window_len, \n", 662 | " sim_threshold, stride_step)\n", 663 | " sim_patts_cum = retrieve_cumSimPatts(prices, curr_patt_start, window_len, \n", 664 | " sim_threshold, stride_step)\n", 665 | " # Pattern return outcomes and pred:\n", 666 | " stored_patts_outcome_returns = stored_patts_returnOutcome(prices, curr_patt_start, \n", 667 | " window_len, \n", 668 | " return_period, \n", 669 | " sim_threshold, \n", 670 | " stride_step)\n", 671 | " pred_outcome = predictedOutcome(prices, curr_patt_start, window_len, \n", 672 | " return_period, sim_threshold, 
stride_step)\n", 673 | " curr_patt_outcome_return = curr_patt_returnOutcome(prices, curr_patt_start, \n", 674 | " window_len, return_period)\n", 675 | " \n", 676 | " # Scoring: \n", 677 | " if pred_outcome > curr_cum_patt[0][-1] and curr_patt_outcome_return >= curr_cum_patt[0][-1]:\n", 678 | " true_positive += 1\n", 679 | " elif pred_outcome > curr_cum_patt[0][-1] and curr_patt_outcome_return < curr_cum_patt[0][-1]:\n", 680 | " false_positive += 1\n", 681 | " elif pred_outcome < curr_cum_patt[0][-1] and curr_patt_outcome_return <= curr_cum_patt[0][-1]:\n", 682 | " true_negative += 1\n", 683 | " else: false_negative += 1\n", 684 | " \n", 685 | "\n", 686 | " # Record scores in the confusion_matrix df:\n", 687 | " confusion_matrix.iloc[0, 0] = true_positive\n", 688 | " confusion_matrix.iloc[0, 1] = false_positive\n", 689 | " confusion_matrix.iloc[1, 1] = true_negative\n", 690 | " confusion_matrix.iloc[1, 0] = false_negative\n", 691 | " \n", 692 | " correct_preds = true_positive + true_negative\n", 693 | " incorrect_preds = false_positive + false_negative\n", 694 | " accuracy = correct_preds / (correct_preds + incorrect_preds)\n", 695 | " \n", 696 | " confusion_matrix.iloc[2:3, 0:1] = accuracy\n", 697 | "\n", 698 | " return confusion_matrix" 699 | ] 700 | }, 701 | { 702 | "cell_type": "code", 703 | "execution_count": 81, 704 | "metadata": {}, 705 | "outputs": [ 706 | { 707 | "data": { 708 | "text/html": [ 709 | "
\n", 710 | "\n", 723 | "\n", 724 | " \n", 725 | " \n", 726 | " \n", 727 | " \n", 728 | " \n", 729 | " \n", 730 | " \n", 731 | " \n", 732 | " \n", 733 | " \n", 734 | " \n", 735 | " \n", 736 | " \n", 737 | " \n", 738 | " \n", 739 | " \n", 740 | " \n", 741 | " \n", 742 | " \n", 743 | " \n", 744 | " \n", 745 | " \n", 746 | " \n", 747 | " \n", 748 | " \n", 749 | " \n", 750 | " \n", 751 | " \n", 752 | " \n", 753 | "
Scores        Up_pred   Down_pred
Up_outcome    89        39
Down_outcome  40        49
Accuracy      0.635945  NaN
\n", 754 | "
" 755 | ], 756 | "text/plain": [ 757 | " Up_pred Down_pred\n", 758 | "Scores \n", 759 | "Up_outcome 89 39\n", 760 | "Down_outcome 40 49\n", 761 | "Accuracy 0.635945 NaN" 762 | ] 763 | }, 764 | "execution_count": 81, 765 | "metadata": {}, 766 | "output_type": "execute_result" 767 | } 768 | ], 769 | "source": [ 770 | "run_update(prices=price, window_len=21, return_period=10, sim_threshold=0.8, stride_step=1)" 771 | ] 772 | }, 773 | { 774 | "cell_type": "code", 775 | "execution_count": null, 776 | "metadata": {}, 777 | "outputs": [], 778 | "source": [] 779 | } 780 | ], 781 | "metadata": { 782 | "kernelspec": { 783 | "display_name": "Python 3", 784 | "language": "python", 785 | "name": "python3" 786 | }, 787 | "language_info": { 788 | "codemirror_mode": { 789 | "name": "ipython", 790 | "version": 3 791 | }, 792 | "file_extension": ".py", 793 | "mimetype": "text/x-python", 794 | "name": "python", 795 | "nbconvert_exporter": "python", 796 | "pygments_lexer": "ipython3", 797 | "version": "3.7.2" 798 | } 799 | }, 800 | "nbformat": 4, 801 | "nbformat_minor": 2 802 | } 803 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## Chart Pattern Recognition 2 | ### Identifying stock chart patterns and evaluating predicted trade outcomes 3 | 4 | This feature engineering example involves a method for chart pattern recognition. Stemming from the idea that voice recognition is basically looking for patterns in waveforms, it seemed plausible that this could apply to normalized stock returns. This example uses closing price data for Microsoft from January 2000 to April 2018 (data from Quandl). 5 | 6 | The following approach takes a rolling one-month window of price data, normalizes the data by converting it to cumulative percent returns over the period, and then stores the pattern. 
Next, it steps the window forward by one day and repeats the process to collect all possible historical patterns. 7 | 8 | Then we can compare the current period’s pattern to historical patterns, extract those patterns similar to the current pattern, and use the mean outcome of those similar patterns as the basis for the buy or sell recommendation. This is shown by the dots on the right-hand side of the chart — the green and red dots are the outcomes of the historical patterns, the yellow dot is the mean of those outcomes, which is the recommendation, and the purple dot is the actual outcome of the current pattern being evaluated. 9 | 10 | ![My image](https://github.com/footfalcon/Chart_Pattern_Recognition/blob/master/images/chart.png) 11 | 12 | Initial testing across about 200 patterns to evaluate predictive strength generated a 64% accuracy rate. 13 | 14 | ![My image](https://github.com/footfalcon/Chart_Pattern_Recognition/blob/master/images/table.png 15 | ) 16 | -------------------------------------------------------------------------------- /images/chart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/footfalcon/Chart_Pattern_Recognition/7805427ad11ac0870cf8ff967200709eedb78e38/images/chart.png -------------------------------------------------------------------------------- /images/table.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/footfalcon/Chart_Pattern_Recognition/7805427ad11ac0870cf8ff967200709eedb78e38/images/table.png --------------------------------------------------------------------------------
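The windowing and normalization step described in the README can be illustrated with a small, dependency-free sketch. It uses plain Python lists rather than the notebook's strided NumPy array, and the helper names `rolling_windows` and `cumulative_pct_returns` as well as the toy prices are assumptions made for illustration only; they are not part of the repository.

```python
def rolling_windows(prices, window_len, stride_step=1):
    """Return every run of `window_len` consecutive prices, stepping by `stride_step`."""
    last_start = len(prices) - window_len
    return [prices[i:i + window_len] for i in range(0, last_start + 1, stride_step)]


def cumulative_pct_returns(window):
    """Percent change of each later price relative to the first price in the window."""
    base = window[0]
    return [(p / base - 1) * 100 for p in window[1:]]


# Made-up toy prices, purely for illustration (not the MSFT data).
toy_prices = [100.0, 102.0, 101.0, 99.0, 103.0, 104.0]

# A window of 4 prices yields 3 cumulative returns, just as the notebook's
# window_len=21 yields 20 return figures per pattern.
patterns = [cumulative_pct_returns(w) for w in rolling_windows(toy_prices, window_len=4)]

for pattern in patterns:
    print([round(r, 2) for r in pattern])
```

Because every pattern is expressed relative to its own first price, windows taken at very different price levels become directly comparable, which is what makes the later pattern matching possible.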