├── .gitignore
├── LICENSE
├── README.md
├── helper.py
└── markov_chain.py


/.gitignore:
--------------------------------------------------------------------------------
1 | *.pyc
2 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2018 Lucas Calestini
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # Markov Chain applied to baseball
  2 | 
  3 | ## Game Simulation
  4 | 
  5 | The simulation plays 9 innings using the transition matrices of one or more batters. Batters come up in the order of the list that is passed in, and the list can be of any length.
  6 | The code runs one while loop per inning, drawing random state transitions from the starting state (no outs, bases empty) until the inning reaches one of the three-out states (25-28).
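As a minimal sketch of the inning loop described above (not the repository's exact code), the snippet below draws transitions from a row-stochastic matrix until an absorbing state is reached. The toy 3-state matrix `T_toy`, the run table `runs_toy` and the helper `simulate_inning` are hypothetical names used only for illustration.

```python
import numpy as np

def simulate_inning(T, runs, first_absorbing, rng):
    """Walk the chain from state 0 until an absorbing state; return runs scored."""
    state, total = 0, 0
    while state < first_absorbing:
        # Draw the next state, using the current row as a probability vector.
        nxt = rng.choice(len(T[state]), p=T[state])
        total += runs[state][nxt]
        state = nxt
    return total

# Toy chain (assumption): staying in state 0 scores a run, state 2 ends the inning.
T_toy = np.array([[0.2, 0.3, 0.5],
                  [0.0, 0.4, 0.6],
                  [0.0, 0.0, 1.0]])
runs_toy = np.array([[1, 0, 0],
                     [0, 0, 0],
                     [0, 0, 0]])

rng = np.random.default_rng(0)
# 1000 simulated games of 9 innings each.
game_runs = [sum(simulate_inning(T_toy, runs_toy, 2, rng) for _ in range(9))
             for _ in range(1000)]
print(np.mean(game_runs))
```

With the real 28-state matrices, `first_absorbing` would be the zero-based index of state 25, which matches the `while pre_state < 24` loop in `markov_chain.py`.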
  7 | 
  8 | ## Expected Runs
  9 | 
 10 | 
 11 | ## Batting Line Optimization
 12 | 
 13 | If we denote the number of batters as n, then there are n! possible batting lines (the permutations of n batters). For 9 batters, that gives 362,880 possible batting lines. This count covers every possibility under the assumption that each batter's position in the order matters, not only the sequence in which batters follow one another.
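The count can be checked with a few lines of standard-library Python. This is only a sketch with hypothetical batter ids; it also shows that fixing the first batter, permuting the rest and then rotating each sequence yields the same n! lineups.

```python
import itertools
import math

batters = ["b1", "b2", "b3", "b4"]  # hypothetical ids, 4 batters to keep it small

# Direct enumeration: n! orderings.
orders = list(itertools.permutations(batters))
n_orders = math.factorial(len(batters))

# Alternative: fix the first batter, permute the rest, then rotate each sequence.
rotations = set()
for tail in itertools.permutations(batters[1:]):
    seq = (batters[0],) + tail
    for k in range(len(seq)):
        rotations.add(seq[k:] + seq[:k])

print(n_orders, len(orders), len(rotations))
print(math.factorial(9))  # number of lineups for 9 batters
```

Grouping lineups by rotation keeps the lineups that share one cyclic sequence next to each other, which is how `batter_permutation` in `markov_chain.py` orders its output.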
 14 | 
 15 | With this, we can compute the expected runs for a given sequence of batters in the first inning.
 16 | 
 17 | ## Explanation
 18 | 
 19 | There are 28 states in baseball innings:
 20 | 
 21 | |Runners|None|1st|2nd|3rd|1&2|1&3|2&3|1,2,3|
 22 | |-:|-:|-:|-:|-:|-:|-:|-:|-:|
 23 | |Outs|||||||||
 24 | |0|#1|#2|#3|#4|#5|#6|#7|#8|
 25 | |1|#9|#10|#11|#12|#13|#14|#15|#16|
 26 | |2|#17|#18|#19|#20|#21|#22|#23|#24|
 27 | 
 28 | - State 25: 3 out, 0 runs
 29 | - State 26: 3 out, 1 run
 30 | - State 27: 3 out, 2 runs
 31 | - State 28: 3 out, 3 runs
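The numbering in the table follows a simple pattern: eight runner configurations per out count. A small sketch of that mapping is shown below; `RUNNERS`, `state_number` and `end_state` are hypothetical helpers, not part of the repository.

```python
# Runner configurations in the same column order as the table above.
RUNNERS = ["None", "1st", "2nd", "3rd", "1&2", "1&3", "2&3", "1,2,3"]

def state_number(outs, runners):
    """Return the 1-based state number (#1-#24) for 0-2 outs."""
    if outs not in (0, 1, 2):
        raise ValueError("outs must be 0, 1 or 2")
    return outs * 8 + RUNNERS.index(runners) + 1

def end_state(runs):
    """Return the three-out state (#25-#28) listed for 0-3 runs."""
    if runs not in (0, 1, 2, 3):
        raise ValueError("runs must be between 0 and 3")
    return 25 + runs

print(state_number(0, "None"), state_number(1, "1st"), state_number(2, "1,2,3"))
print(end_state(0), end_state(3))
```

The arrays in `helper.py` are zero-based, so state `#k` corresponds to row or column `k - 1`.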
 32 | 
 33 | ```python
 34 | '''
 35 | 24 states (state-space S):
 36 | 
 37 | Runners:|None|1st|2nd|3rd|1&2|1&3|2&3|1,2,3|
 38 | Outs
 39 | 0|#1|#2|#3|#4|#5|#6|#7|#8|
 40 | 1|#9|#10|#11|#12|#13|#14|#15|#16|
 41 | 2|#17|#18|#19|#20|#21|#22|#23|#24|
 42 | 
 43 | 
 44 | 3 outs, 0 runs = #25
 45 | 3 outs, 1 run  = #26
 46 | 3 outs, 2 runs = #27
 47 | 3 outs, 3 runs = #28
 48 | 
 49 | pij is the probability of moving from state i to state j. Therefore the
 50 | transition matrix (stochastic matrix) of pij is:
 51 | 
 52 | 
 53 |     |p1,1   p1,2   p1,3   ...  p1,28 |
 54 |     |p2,1   p2,2   p2,3   ...  p2,28 |
 55 | T = |p3,1   p3,2   p3,3   ...  p3,28 |
 56 |     |(...)  (...)  (...)  ...  (...) |
 57 |     |p28,1  p28,2  p28,3  ...  p28,28|
 58 | 
 59 | T.shape = (28, 28)
 60 | 
 61 | The matrix can also be read as a combination of from-to (pre-transition and
 62 | post-transition) situations, where row-wise it has to add to 1 as it represents
 63 | the same origin state and all end states.
 64 | 
 65 | The matrix above can be represented as a block matrix:
 66 |     __
 67 |     |A0 B0 C0 D0|
 68 | P = |0  A1 B1 E1|
 69 |     |0  0  A2 F2|
 70 |     |0  0  0  1 |
 71 | 
 72 | Where the A blocks (8x8) add no out, the B blocks (8x8) add one out, C (8x8)
 73 | goes from zero to two outs, D, E and F (8x4) end the inning, the 0 blocks are
 74 | zero matrices for impossible transitions, and 1 is 4x4. The latter is of the form:
 75 | 
 76 |     |1 0 0 0|
 77 | 1 = |1 0 0 0|
 78 |     |1 0 0 0|
 79 |     |1 0 0 0|
 80 | 
 81 | This way, all three-out states end in state 25 (the absorbing state).
 82 | 
 83 | 
 84 | Every transition brings one possible number of runs, so we can have a run matrix
 85 | R(28X1) of all expected runs given original state i:
 86 | 
 87 |     |R(1) = p1,1                               |
 88 |     |R(2) = 2*p2,1 + p2,2 + p2,3 + p2,4 + p2,9 |
 89 | R = |(...)                                     |
 90 |     |R(4) = 2*p4,1 + p4,2 + p4,3 + p4,4 + p4,9 |
 91 |     |(...)                                     |
 92 |     |R(28) = 0                                 |
 93 | 
 94 | R.shape = (28, 1)
 95 | 
 96 | The reason why R(1) = p1,1 is that from no outs and no runners (#1), a batter
 97 | can only score a run by returning to the same state. Anything else is not a
 98 | run. Similarly for R(9) and R(17), as in those states there is no one on base.
 99 | 
100 | We can then keep track of the runs/state in the inning by using a matrix U of
101 | 20-25 rows (max runs in the inning) x 28 columns (current state).
102 | '''
103 | ```
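A small numeric sketch of the run vector described in the docstring: each entry is the sum over end states of runs scored times transition probability, which is the same computation as `np.sum(self.runs*Tp, axis=1)` in `markov_chain.py`. The 3-state `T_toy` and `runs_toy` are made-up values for illustration. The last step uses the standard absorbing-chain identity E = (I - Q)^-1 R for expected runs over the rest of the inning, which is an addition here and not code from the repository.

```python
import numpy as np

# Toy chain (assumption): states 0 and 1 are transient, state 2 is absorbing.
T_toy = np.array([[0.1, 0.3, 0.6],
                  [0.0, 0.2, 0.8],
                  [0.0, 0.0, 1.0]])
# Runs scored on each (from, to) transition.
runs_toy = np.array([[1, 0, 0],
                     [2, 1, 0],
                     [0, 0, 0]])

# Every row of a stochastic matrix must sum to 1.
assert np.allclose(T_toy.sum(axis=1), 1.0)

# Expected runs on the next transition, per starting state.
R = (runs_toy * T_toy).sum(axis=1)

# Expected runs until absorption: solve (I - Q) E = R over the transient states.
Q = T_toy[:2, :2]
E = np.linalg.solve(np.eye(2) - Q, R[:2])

print(R)
print(E)
```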
104 | 
105 | Sources
106 |   - [Markov Chain Models: Theoretical Background](http://www.pankin.com/markov/theory.htm)
107 |   - https://www.jstor.org/stable/171922?seq=1#page_scan_tab_contents
108 |   - https://en.wikipedia.org/wiki/Stochastic_matrix#Definition_and_properties
109 |   - http://statshacker.com/the-markov-chain-model-of-baseball#prettyPhoto
110 |   - https://en.wikipedia.org/wiki/State_space
111 |   - https://en.wikipedia.org/wiki/Block_matrix
112 | 


--------------------------------------------------------------------------------
/helper.py:
--------------------------------------------------------------------------------
  1 | 
  2 | import time
  3 | 
  4 | def timeit(method):
  5 | 
  6 |     def timed(*args, **kw):
  7 |         ts = time.time()
  8 |         result = method(*args, **kw)
  9 |         te = time.time()
 10 | 
 11 |         print ('{0} {1:.5f} sec'.format(method.__name__, te-ts))
 12 |         return result
 13 | 
 14 |     return timed
 15 | 
 16 | 
 17 | RUNS=[
 18 |     [1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
 19 |     [2,1,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
 20 |     [2,1,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
 21 |     [2,1,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
 22 |     [3,2,2,2,1,1,1,0,2,1,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0],
 23 |     [3,2,2,2,1,1,1,0,2,1,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0],
 24 |     [3,2,2,2,1,1,1,0,2,1,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0],
 25 |     [4,3,3,3,2,2,2,1,3,2,2,2,1,1,1,0,2,1,1,1,0,0,0,0,0,1,0,0],
 26 |     [0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
 27 |     [0,0,0,0,0,0,0,0,2,1,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0],
 28 |     [0,0,0,0,0,0,0,0,2,1,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0],
 29 |     [0,0,0,0,0,0,0,0,2,1,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0],
 30 |     [0,0,0,0,0,0,0,0,3,2,2,2,1,1,1,0,2,1,1,1,0,0,0,0,0,1,0,0],
 31 |     [0,0,0,0,0,0,0,0,3,2,2,2,1,1,1,0,2,1,1,1,0,0,0,0,0,1,0,0],
 32 |     [0,0,0,0,0,0,0,0,3,2,2,2,1,1,1,0,2,1,1,1,0,0,0,0,0,1,0,0],
 33 |     [0,0,0,0,0,0,0,0,4,3,3,3,2,2,2,1,3,2,2,2,1,1,1,0,0,1,2,0],
 34 |     [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0],
 35 |     [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,1,1,0,0,0,0,0,1,0,0],
 36 |     [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,1,1,0,0,0,0,0,1,0,0],
 37 |     [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,1,1,0,0,0,0,0,1,0,0],
 38 |     [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,2,2,2,1,1,1,0,0,1,2,0],
 39 |     [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,2,2,2,1,1,1,0,0,1,2,0],
 40 |     [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,2,2,2,1,1,1,0,0,1,2,0],
 41 |     [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,3,3,3,2,2,2,1,0,1,2,3]
 42 | #    [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
 43 | #    [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
 44 | #    [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
 45 | #    [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
 46 |     ]
 47 | 
 48 | OUTS=[
 49 |     [0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
 50 |     [0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0],
 51 |     [0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0],
 52 |     [0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0],
 53 |     [0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,0,2,2,2,2,0,0,0,0,3,0,0,0],
 54 |     [0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,0,2,2,2,2,0,0,0,0,3,0,0,0],
 55 |     [0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,0,2,2,2,2,0,0,0,0,3,0,0,0],
 56 |     [0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,0,3,3,0,0],
 57 |     [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0],
 58 |     [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,2,0,0,0],
 59 |     [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,2,0,0,0],
 60 |     [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,2,0,0,0],
 61 |     [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,0,2,2,0,0],
 62 |     [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,0,2,2,0,0],
 63 |     [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,0,2,2,0,0],
 64 |     [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,2,2,2,0],
 65 |     [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0],
 66 |     [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0],
 67 |     [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0],
 68 |     [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0],
 69 |     [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0],
 70 |     [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0],
 71 |     [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0],
 72 |     [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1]
 73 |     #[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
 74 |     #[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
 75 |     #[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
 76 |     #[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
 77 |     ]
 78 | 
 79 | PRIOR = [
 80 |     [0.0238,0.2554,0.0466,0.0075,0.0,0.0,0.0,0.0,0.6668,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0],
 81 |     [0.0211,0.0003,0.0124,0.0073,0.1951,0.0491,0.0328,0.0,0.0006,0.4092,0.1544,0.0028,0.0,0.0,0.0,0.0,0.1148,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0],
 82 |     [0.0171,0.0554,0.0437,0.0076,0.0993,0.0954,0.0054,0.0,0.0035,0.018,0.3803,0.2664,0.0,0.0,0.0,0.0,0.008,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0],
 83 |     [0.0178,0.173,0.047,0.0104,0.0,0.1127,0.0002,0.0,0.206,0.0061,0.0015,0.4197,0.0,0.0,0.0,0.0,0.0057,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0],
 84 |     [0.021,0.0004,0.0122,0.0074,0.0465,0.0322,0.0338,0.1603,0.0006,0.0027,0.0033,0.0016,0.3215,0.0952,0.1583,0.0,0.0003,0.0108,0.012,0.078,0.0,0.0,0.0,0.0,0.0018,0.0,0.0,0.0],
 85 |     [0.022,0.0004,0.0121,0.0077,0.1387,0.0511,0.039,0.0801,0.001,0.182,0.0531,0.003,0.0287,0.2304,0.0417,0.0,0.086,0.0055,0.0034,0.0134,0.0,0.0,0.0,0.0,0.0005,0.0,0.0,0.0],
 86 |     [0.0182,0.0603,0.0485,0.0096,0.0072,0.0803,0.0077,0.1515,0.0047,0.0048,0.0938,0.1358,0.0058,0.0105,0.35,0.0,0.0029,0.0003,0.0032,0.0044,0.0,0.0,0.0,0.0,0.0003,0.0,0.0,0.0],
 87 |     [0.0236,0.0001,0.0123,0.008,0.0565,0.0357,0.0409,0.1634,0.0008,0.0036,0.0039,0.003,0.0783,0.0964,0.0464,0.3144,0.0004,0.0025,0.0029,0.0675,0.0065,0.0037,0.0283,0.0,0.001,0.0002,0.0,0.0],
 88 |     [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0217,0.2532,0.0442,0.007,0.0,0.0,0.0,0.0,0.6739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0],
 89 |     [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0235,0.0004,0.0147,0.0082,0.203,0.0551,0.0326,0.0,0.0008,0.4362,0.0913,0.0029,0.0,0.0,0.0,0.0,0.1312,0.0,0.0,0.0],
 90 |     [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0193,0.0707,0.0503,0.0088,0.1471,0.0612,0.0041,0.0,0.0049,0.014,0.4407,0.1681,0.0,0.0,0.0,0.0,0.0109,0.0,0.0,0.0],
 91 |     [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0175,0.1824,0.0475,0.0094,0.0,0.1378,0.0005,0.0,0.2136,0.0267,0.0074,0.3434,0.0,0.0,0.0,0.0,0.0138,0.0,0.0,0.0],
 92 |     [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0232,0.0004,0.0154,0.0088,0.0603,0.0396,0.0354,0.1416,0.0008,0.0047,0.0048,0.002,0.3579,0.0968,0.0734,0.0,0.1344,0.0002,0.0,0.0],
 93 |     [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0208,0.0005,0.0133,0.0086,0.1405,0.0498,0.0356,0.0812,0.0008,0.1847,0.0687,0.0041,0.0248,0.2085,0.0228,0.0,0.1306,0.0048,0.0,0.0],
 94 |     [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0131,0.0619,0.0434,0.0079,0.0056,0.0637,0.0063,0.273,0.005,0.0056,0.0882,0.1099,0.0085,0.0248,0.2674,0.0,0.0132,0.0025,0.0,0.0],
 95 |     [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0222,0.0002,0.0126,0.0095,0.0541,0.0362,0.0393,0.1502,0.0009,0.0053,0.0053,0.0027,0.0911,0.0921,0.0489,0.2877,0.1368,0.0048,0.0002,0.0],
 96 |     [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0223,0.2574,0.0438,0.0063,0.0,0.0,0.0,0.0,0.6703,0.0,0.0,0.0],
 97 |     [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0239,0.001,0.0209,0.009,0.1944,0.0529,0.0216,0.0,0.6753,0.0009,0.0,0.0],
 98 |     [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0183,0.0923,0.0534,0.0079,0.1711,0.0305,0.0005,0.0,0.6208,0.0051,0.0,0.0],
 99 |     [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0181,0.1503,0.0411,0.0073,0.0,0.1556,0.0,0.0,0.6256,0.002,0.0,0.0],
100 |     [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0209,0.001,0.0226,0.0098,0.0609,0.0445,0.0254,0.1204,0.6863,0.0073,0.0009,0.0],
101 |     [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0206,0.001,0.0205,0.01,0.1133,0.0497,0.0233,0.0931,0.6614,0.006,0.0011,0.0],
102 |     [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0159,0.0824,0.0472,0.0072,0.0008,0.0311,0.0007,0.2222,0.5821,0.0054,0.0051,0.0],
103 |     [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0214,0.0011,0.0218,0.0106,0.0544,0.046,0.0274,0.1205,0.683,0.0054,0.0073,0.001]
104 | ]
105 | 


--------------------------------------------------------------------------------
/markov_chain.py:
--------------------------------------------------------------------------------
  1 | import numpy as np
  2 | import pandas as pd
  3 | import seaborn as sns
  4 | import itertools
  5 | import matplotlib.pyplot as plt
  6 | #import concurrent.futures
  7 | import time
  8 | #from operator import itemgetter
  9 | import datetime
 10 | 
 11 | from helper import PRIOR, OUTS, RUNS, timeit
 12 | 
 13 | class markov():
 14 |     """Markov chain model of a baseball inning, with per-batter transition matrices.
 15 |     """
 16 |     def __init__(self):
 17 |         self.data = None
 18 |         self.initalize_matrices()
 19 |         self.batter_list = ['goldp001','donaj001','cruzn002','vottj001','davic003']
 20 |         self.batting_line = [
 21 |                 ['goldp001', 'Paul Goldschmidt'],
 22 |                 ['donaj001', 'Josh Donaldson'],
 23 |                 ['cruzn002', 'Nelson Cruz'],
 24 |                 ['cabrm001', 'Miguel Cabrera'],
 25 |                 ['vottj001', 'Joey Votto'],
 26 |                 ['mccua001', 'Andrew McCutchen'],
 27 |                 ['machm001', 'Manny Machado'],
 28 |                 ['heywj001', 'Jason Heyward'],
 29 |                 ['davic003', 'Chris Davis']
 30 |         ]
 31 | 
 32 |     #T is a block matrix of A,B,C,D,E,F,0s,1s
 33 |     def initalize_matrices(self):
 34 |         """Build the block layout of T (currently unused) and load the RUNS, OUTS and PRIOR arrays.
 35 |         """
 36 |         A = np.zeros([8,8])
 37 |         B = np.zeros([8,8])
 38 |         C = np.zeros([8,8])
 39 | 
 40 |         D = np.zeros([8,4])
 41 |         E = np.zeros([8,4])
 42 |         F = np.zeros([8,4])
 43 | 
 44 |         P0_1 = np.zeros([8,8])
 45 |         P0_4 = np.zeros([4,8])
 46 |         P1 = np.repeat([[1,0,0,0]],4,axis=0)
 47 | 
 48 |         T = np.block([
 49 |             [ A  , B ,  C  , D ],
 50 |             [P0_1, A ,  B  , E ],
 51 |             [P0_1,P0_1, A  , F ],
 52 |             [P0_4,P0_4,P0_4 ,P1]
 53 |         ])
 54 |         outs = np.array(OUTS)
 55 |         runs = np.array(RUNS)
 56 |         prior = np.array(PRIOR)
 57 | 
 58 |         #self.T = T
 59 | 
 60 |         self.runs = runs
 61 |         self.outs = outs
 62 |         self.prior = prior
 63 | 
 64 |         #return True
 65 | 
 66 | 
 67 |     def get_data(self, path_str = '../markov.csv'):
 68 |         if self.data is None:
 69 |             self.data = pd.read_csv(path_str)
 70 |             return True
 71 |         return False
 72 | 
 73 | 
 74 |     def pad(self, array, shape=[24,28]):
 75 |         rows = shape[0] - array.shape[0]
 76 |         cols = shape[1] - array.shape[1]
 77 | 
 78 |         padded = np.pad(array,  ((rows,0),(0,cols)), constant_values = 0, mode='constant')
 79 |         return padded
 80 | 
 81 | 
 82 |     def transition(self, player_id, precision = 2):
 83 |         """Transition matrix for the player
 84 |         """
 85 |         if self.data is None:
 86 |             self.get_data()
 87 | 
 88 |         pre_post_cnt = self.data[self.data['player_id']==player_id]\
 89 |             .groupby(['pre_state','post_state']).count().reset_index()[['pre_state','post_state','play_runs']]\
 90 |             .pivot(index='pre_state', columns='post_state', values='play_runs').fillna(0).values
 91 | 
 92 |         all_pre_cnt = self.data[self.data['player_id']==player_id].groupby(['pre_state']).count()[['post_state']].values
 93 |         Tp = (pre_post_cnt / all_pre_cnt)
 94 |         Tp = self.pad(Tp, [24,28]) #correct shape for multiplication and broadcasting
 95 | 
 96 |         return np.round(Tp,precision)
 97 | 
 98 | 
 99 |     def er(self, Tp=None):
100 |         """Function to calculate expected run
101 |         """
102 |         Er = np.sum(self.runs*Tp , axis=1).reshape([3,8])
103 |         #print (Er)
104 |         return Er
105 | 
106 | 
107 |     def eo(self, Tp=None):
108 |         """Function to calculate expected out
109 |         """
110 |         Eo = np.sum(self.outs*Tp , axis=1).reshape([3,8])
111 |         #print (Eo)
112 |         return Eo
113 | 
114 | 
115 |     def plot(self, array, title):
116 |         plt.figure(figsize=(16,8))
117 |         plt.title(title)
118 |         sns.heatmap(array, annot=True)
119 |         plt.show()
120 | 
121 | 
122 |     def plot_T(self, Tp=None, player_id=None):
123 |         """Plot transition matrix
124 |         """
125 |         if player_id:
126 |             Tp = self.transition(player_id=player_id)
127 |             self.plot(Tp, title=player_id)
128 |         else: print ('No player id or dataset passed.')
129 | 
130 | 
131 |     def plot_er(self, player_id=None):
132 |         """Plot expected run
133 |         """
134 |         if player_id:
135 |             Tp = self.transition(player_id=player_id)
136 |             Er = self.er(Tp)
137 |             self.plot(Er, title=player_id)
138 |         else: print ('No player id or dataset passed.')
139 | 
140 | 
141 |     def plot_eo(self, player_id=None):
142 |         """Plot expected outs
143 |         """
144 |         if player_id:
145 |             Tp = self.transition(player_id=player_id)
146 |             Eo = self.eo(Tp)
147 |             self.plot(Eo, title=player_id)
148 |         else: print ('No player id or dataset passed.')
149 | 
150 | 
151 |     def batting_line_T(self, batter_list):
152 |         """Returns a 3D array of shape [N, 24, 24], where N is the number of batters
153 |         """
154 |         bT = np.zeros([len(batter_list),24,24], dtype=float)
155 |         #bT = []
156 |         #bT[i].append(self.transition(player_id = batter, precision = 10)[:24,:24])
157 |         for i in range(len(batter_list)):
158 |             bT[i,:,:] = self.transition(player_id = batter_list[i], precision = 10)[:24,:24]
159 |         return bT
160 | 
161 | 
162 |     def move_state(self, pre_state, Tp):
163 |         post_state = None
164 |         random_prob = np.random.uniform()
165 |         Tp_cum = np.cumsum(Tp, axis=1)
166 |         transition = Tp_cum[pre_state]
167 |         while post_state is None:
168 |             try:
169 |                 post_state = int(np.min(np.where(transition >= random_prob)))
170 |             except ValueError: #no state reached random_prob (row sums below 1), so redraw
171 |                 random_prob = np.random.uniform()
172 | 
173 |         return post_state
174 | 
175 |     def play_game(self, T=None, innings=9, N =1000):
176 |         '''Game simulations for one T matrix
177 |         '''
178 |         if T is None:
179 |             T = self.prior
180 |         tot_runs = []
181 |         for n in range(N):
182 |             runs = 0
183 | 
184 |             for i in range(0, innings, 1): #full game
185 |                 pre_state = 0
186 | 
187 |                 while pre_state < 24:
188 | 
189 |                     post_state = self.move_state(pre_state, T)
190 | 
191 |                     runs += self.runs[pre_state, post_state]
192 |                     pre_state = post_state
193 | 
194 |             tot_runs.append(runs)
195 |         return np.mean(tot_runs)
196 | 
197 |     def simulate_games(self, batter_list=['goldp001','donaj001','cruzn002','vottj001','cabrm001','mccua001','machm001','heywj001','davic003'], N = 10, innings = 9):
198 |         """it runs random selection of states for innings
199 |         """
200 |         if batter_list is None:
201 |             batter_list = self.batter_list
202 | 
203 |         batting_line_T = self.batting_line_T(batter_list)
204 | 
205 |         tot_runs = []
206 |         for n in range(N):
207 | 
208 |             runs = 0
209 |             batting = 0
210 | 
211 |             for i in range(0, innings, 1): #full game
212 |                 pre_state = 0
213 | 
214 |                 while pre_state < 24:
215 |                     if batting > len(batting_line_T)-1: #allows for any number of batters, including one
216 |                         batting = 0
217 |                     post_state = self.move_state(pre_state, batting_line_T[batting])
218 | 
219 |                     runs += self.runs[pre_state, post_state]
220 |                     pre_state = post_state
221 | 
222 |                     batting += 1
223 | 
224 |             tot_runs.append(runs)
225 | 
226 |         return np.mean(tot_runs)
227 |         #sns.distplot((tot_runs))
228 |         #plt.title(np.mean(tot_runs))
229 |         #plt.show()
230 | 
231 |     # def batter_permutations(self, batter_list = ['goldp001','donaj001','cruzn002','vottj001','cabrm001','mccua001','machm001','heywj001','davic003']):
232 |     #     """create all possible permutations of batters
233 |     #     """
234 |     #     permutations = itertools.permutations(batter_list)
235 |     #     self.permutations = [i for i in permutations]
236 |     #
237 |     #     return [i for i in range(len(self.permutations))]
238 | 
239 | 
240 |     # def optimize_batting(self, permutations):
241 |     #
242 |     #     results = []
243 |     #     for loop, batter_list in enumerate(permutations):
244 |     #         #print (batter_list)
245 |     #         #print (loop)
246 |     #         avg_runs = self.simulate_games(batter_list=batter_list, N=2)
247 |     #         results.append([batter_list, avg_runs])
248 |     #
249 |     #     #results = sorted(results, key=itemgetter(-1), reverse=True)
250 |     #     #plt.plot(np.array(results)[:,-1])
251 |     #     #plt.show()
252 |     #     #print (results)
253 |     #     return results
254 | 
255 |     # def optimize_batting2(self, index):
256 |     #
257 |     #     batter_list = self.permutations[index]
258 |     #     avg_runs = self.simulate_games(batter_list=batter_list, N=10)
259 |     #     results = [batter_list, avg_runs]
260 |     #     #print (index) if index % 10000 == 0 else None
261 |     #     #results = sorted(results, key=itemgetter(-1), reverse=True)
262 |     #     #plt.plot(np.array(results)[:,-1])
263 |     #     #plt.show()
264 |     #     #print (results)
265 |     #     return results
266 | 
267 |     #def expected_run(self, Tp):
268 |     #    Er = np.sum(self.runs[:24,:24]*Tp , axis=1).reshape([24,1])
269 |     #    return Er
270 | 
271 |     def rotate_batters(self, current_batter_list):
272 |         """rotate batters once, to the left"""
273 |         next_batter_list = current_batter_list[1:]
274 |         next_batter_list.append(current_batter_list[0])
275 |         return next_batter_list
276 | 
277 | 
278 |     def line_runs_transitions(self, batter_list=['goldp001','donaj001','cruzn002','vottj001','cabrm001','mccua001','machm001','heywj001','davic003']):
279 |         """dictionaries with transitions and runs for batting line
280 |         """
281 |         transitions = {}
282 |         runs = {}
283 | 
284 |         for player in batter_list:#9 loops
285 |             transitions[player] = self.transition(player_id = player, precision = 10)[:24,:24]
286 |             runs[player] = np.sum(self.runs[:24,:24]*transitions[player], axis=1)
287 | 
288 |         return transitions, runs
289 | 
290 |     #have to create an ordered permutation to reduce lookup time in optimization
291 |     @timeit
292 |     def batter_permutation(self, batter_list):
293 |         """it creates 3X times the number of batters, to account for potential
294 |         long innings, where a full rotation happens (~0.01% of cases)
295 |         """
296 |         n = len(batter_list)
297 |         #first get all possible permutations of the remaining (n-1) batters
298 |         combinations = itertools.permutations(batter_list[1:])
299 |         ordered_batter_list = []
300 |         for combination in combinations:
301 |             batter_seq = [batter_list[0]] + list(combination)
302 |             ordered_batter_list.append(batter_seq+batter_seq+batter_seq)
303 |             for i in range(n-1):
304 |                 batter_seq = self.rotate_batters(batter_seq)
305 |                 ordered_batter_list.append(batter_seq+batter_seq+batter_seq)
306 | 
307 |         return ordered_batter_list
308 | 
309 |     #@timeit
310 |     def Er_out_Matrix(self, permutations_list, runs, transitions, dim):
311 |         """Returns expected run and out matrix for each batter lineup
312 |         It is the backbone for the optimal batter calculation
313 |         """
314 | 
315 | 
316 |         ER_matrix = np.zeros([len(permutations_list),24])
317 |         Outs_matrix = np.zeros([len(permutations_list),27]) #Out prob, starting from that batter
318 | 
319 |         #batters_list = []
320 |         for loop, batters in enumerate(permutations_list):
321 |             #batters_list.append(list(batters[:9]))
322 | 
323 |             #T_list = np.zeros([len(batters),24,24])
324 |             Er_list = np.zeros([len(batters),24,1])
325 | 
326 |             current_T = np.identity(24)
327 | 
328 |             for i in range(len(batters)):
329 | 
330 |                 current_T = np.dot(current_T, transitions[batters[i]])
331 |                 current_run = runs[batters[i]]
332 | 
333 |                 if i == 0:
334 |                     ER = np.dot(np.identity(24), current_run)
335 |                     current_U = transitions[batters[i]][0,:]
336 |                     prob = 1-np.sum(current_U)
337 |                     Outs_matrix[loop][i] = prob
338 |                 else:
339 |                     ER += np.dot(current_T, current_run)
340 |                     prob = 1-np.sum(np.dot(current_U, transitions[batters[i]]))
341 |                     current_U = np.dot(current_U, transitions[batters[i]])
342 |                     Outs_matrix[loop][i] = prob - np.sum(Outs_matrix[loop][2:])
343 | 
344 |             ER_matrix[loop] = ER.reshape([24,])
345 | 
346 |         Outs_matrix[:,:2] = 0.
347 | 
348 |         return ER_matrix, Outs_matrix
349 | 
350 | 
351 |     #@timeit
352 |     def expected_run(self,batters_out_prob, batters_er_on_00, dim):
353 |         er_total = 0
354 |         new = np.zeros((dim,dim))
355 |         new[:,1:] = batters_out_prob[:,:-1]
356 |         new[:,0] = batters_out_prob[:,-1]
357 | 
358 |         er1 = batters_er_on_00[0,0]
359 | 
360 |         start_second = new[0,:]
361 |         first_ER = batters_er_on_00[:,0]
362 |         er2 = np.dot(start_second, first_ER)
363 | 
364 |         start_third = start_second.reshape(dim,1) * new
365 |         er3 = np.sum(start_third * batters_er_on_00)
366 | 
367 |         er_total = er1 + er2 + er3
368 | 
369 |         start_previous = start_third
370 |         for last_innings in [4,5,6,7,8,9]:
371 |             start_next = np.zeros((dim,1))
372 | 
373 |             for i in range(dim):
374 | 
375 |                 if i == dim - 1: #diagonal 0 = last player = player dim-1
376 |                     x = 0
377 |                 else:
378 |                     x = i + 1
379 | 
380 |                 start_next[i,:] = np.trace(start_previous[:,::-1], -x) + np.trace(start_previous[:,::-1], dim - x) #adding up the diagonals
381 | 
382 |             start_next = start_next.reshape(dim,1) * new
383 |             start_previous = start_next.copy()
384 |             er_total += np.sum(start_next * batters_er_on_00)
385 | 
386 |         return er_total
387 | 
388 | 
389 |     #@timeit
390 |     def out_er(self, ER_matrix, Outs_matrix, batters_list, dim, index_st):
391 |         batters_out_prob = np.zeros([dim,dim])
392 |         batters_er_on_00 = np.zeros([dim,1])
393 | 
394 |         index_end = index_st + dim
395 |         batters_out_prob = Outs_matrix[index_st:index_end,0:dim]
396 |         batters_er_on_00[:,0] = ER_matrix[index_st:index_end,0]
397 | 
398 |         batters_out_prob[:,:2] = 0.
399 | 
400 |         return batters_out_prob, batters_er_on_00
401 | 
402 | 
403 |     @timeit
404 |     def optimize_line(self, batter_list=['goldp001','donaj001','cruzn002','vottj001','cabrm001','mccua001','machm001','heywj001','davic003']):
405 |         dim = 9
406 | 
407 |         transitions, runs = self.line_runs_transitions(batter_list)
408 |         permutations_list = self.batter_permutation(batter_list)
409 | 
410 |         ER_matrix, Outs_matrix = self.Er_out_Matrix(permutations_list=permutations_list, runs=runs, transitions=transitions,dim=dim)
411 | 
412 |         er_total = np.zeros((len(permutations_list), 1))
413 | 
414 |         for loop in range(0,len(permutations_list),dim):
415 | 
416 |             for i in range(dim):
417 |                 batters_out_prob = np.zeros([dim,dim])
418 |                 batters_er_on_00 = np.zeros([dim,1])
419 | 
420 |                 if i == 0:
421 |                     index_st = loop
422 |                     index_end = loop + dim
423 |                     batters_out_prob = Outs_matrix[index_st:index_end,0:dim]
424 |                     batters_er_on_00[:,0] = ER_matrix[index_st:index_end,0]
425 |                     batters_out_prob[:,:2] = 0.
426 | 
427 |                     p_batters_out_prob = batters_out_prob
428 |                     p_batters_er_on_00 = batters_er_on_00
429 |                 else: #rotate matrices for different starting batters
430 |                     batters_out_prob[:,:-i] = p_batters_out_prob[:,i:]
431 |                     batters_out_prob[:,-i:] = p_batters_out_prob[:,:i]
432 | 
433 |                     batters_er_on_00[:-i,0] = p_batters_er_on_00[i:,0]
434 |                     batters_er_on_00[-i:,0] = p_batters_er_on_00[:i:,0]
435 | 
436 |                 er_total[loop+i,0] = self.expected_run(batters_out_prob=batters_out_prob, batters_er_on_00=batters_er_on_00, dim=dim)
437 | 
438 |         best_lines = np.array(permutations_list).reshape([len(permutations_list),27])[:,:9]
439 | 
440 |         return er_total, best_lines
441 | 
442 | 
443 | if __name__ == '__main__':
444 |     mk = markov()
445 | 
446 |     #using parallel code (max 10 cores)
447 |     print ('Starting:\t', datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
448 |     max_cores_to_use = 10
449 |     '''
450 |     chunksize = 10000
451 | 
452 |     print ('Starting:\t', datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
453 | 
454 |     permutations_indices = mk.batter_permutations()
455 |     print (len(permutations_indices))
456 |     results = []
457 |     #for result in map(mk.optimize_batting2,permutations_indices):
458 |     #    results.append(result)
459 | 
460 | 
461 |     with concurrent.futures.ProcessPoolExecutor(max_workers=max_cores_to_use) as executor:
462 |         #for result in executor.map(mk.optimize_batting2, permutations_indices):
463 |         results = zip(executor.map(mk.optimize_batting2, permutations_indices))
464 | 
465 |     #print (results)
466 | 
467 |     #plt.plot(np.array(results)[:,-1])
468 |     #plt.show()
469 |     '''
470 |     #Altuve, Blackmon, J.D. Martinez, Stanton, Inciarte, Jose Ramirez, Votto, Pujols, Trout
471 |     best_2017 = ['altuj001','martj006','blacc001','stanm004','incie001','ramij003','vottj001','pujoa001','troum001']
472 | 
473 |     red_socks = ['bettm001','benia002','martj006','bogax001','holtb002','vazqc001','nunee002','leons001','bradj001']
474 |     er_total, bline = mk.optimize_line(batter_list = best_2017)
475 |     #er_games = mk.simulate_games()
476 | 
477 |     print ('Finishing:\t', datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
478 | 


--------------------------------------------------------------------------------