├── .gitignore ├── LICENSE ├── README.md ├── helper.py └── markov_chain.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.pyc 2 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 Lucas Calestini 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Markov Chain applied to baseball 2 | 3 | ## Game Simulation 4 | 5 | It runs 9 innings using one or multiple batters transition matrices. It will run the transitions in the sequence of the list of batters that is passed (no limits applied). 
The code runs nine iterations (one per inning) of a `while` loop, drawing random state transitions from the empty-bases, no-out state until one of the absorbing three-out states (#25 to #28) is reached.

## Expected Runs


## Batting Line Optimization

If we denote the number of batters as n, then the number of possible batting lines is n! (the permutations of n). For 9 batters, there are 9! = 362,880 possible batting lines. This number covers every possibility under the assumption that each batter's position in the order matters, not only the sequence.

With this, we can compute the expected runs for a given sequence of batters, for the first inning.

## Explanation

There are 28 states in a baseball inning:

|Outs / Runners|None|1st|2nd|3rd|1&2|1&3|2&3|1,2,3|
|-:|-:|-:|-:|-:|-:|-:|-:|-:|
|0|#1|#2|#3|#4|#5|#6|#7|#8|
|1|#9|#10|#11|#12|#13|#14|#15|#16|
|2|#17|#18|#19|#20|#21|#22|#23|#24|

- State 25: 3 outs, 0 runs
- State 26: 3 outs, 1 run
- State 27: 3 outs, 2 runs
- State 28: 3 outs, 3 runs

```python
'''
28 states (state space S): 24 base-out states plus 4 three-out states

Outs / Runners | None | 1st | 2nd | 3rd | 1&2 | 1&3 | 2&3 | 1,2,3
             0 | #1   | #2  | #3  | #4  | #5  | #6  | #7  | #8
             1 | #9   | #10 | #11 | #12 | #13 | #14 | #15 | #16
             2 | #17  | #18 | #19 | #20 | #21 | #22 | #23 | #24


3 outs, 0 runs = #25
3 outs, 1 run  = #26
3 outs, 2 runs = #27
3 outs, 3 runs = #28

pij is the probability of moving from state i to state j. Therefore the
transition matrix (stochastic matrix) of pij is:

     _                                    _
    | p1,1   p1,2   p1,3   ...   p1,28   |
    | p2,1   p2,2   p2,3   ...   p2,28   |
T = | p3,1   p3,2   p3,3   ...   p3,28   |
    | ...    ...    ...    ...   ...     |
    | p28,1  p28,2  p28,3  ...   p28,28  |

T shape = (28, 28)

The matrix can also be read as a combination of from-to (pre-transition and
post-transition) situations. Each row has to add up to 1, as it represents
one origin state and all of its possible end states.

The matrix above can be represented as a block matrix:
     _                _
    | A0  B0  C0  D0 |
P = | 0   A1  B1  E1 |
    | 0   0   A2  F2 |
    | 0   0   0   1  |

Where the A blocks (8x8) hold the transitions that add no out, the B blocks
(8x8) those that add one out, C (8x8) those that go from zero to two outs,
D (8x4), E (8x4) and F (8x4) those that end the inning, the 0 blocks the
impossible scenarios, and 1 is a (4x4) block. The latter, specifically, is of
the form:

    | 1 0 0 0 |
1 = | 1 0 0 0 |
    | 1 0 0 0 |
    | 1 0 0 0 |

The reason is that all three-out states then end in state 25 (the absorbing
state).


Every transition yields one known number of runs, so we can build a run matrix
R (28x1) with the expected runs given the original state i:

    | R(1) = p1,1                                    |
    | R(2) = p2,2                                    |
R = | ...                                            |
    | R(4) = 2*p4,1 + p4,4 + p4,7 + p4,10 + p4,2     |
    | ...                                            |
    | R(28) = 0                                      |

R shape = (28, 1)

The reason why R(1) = p1,1 is that from no outs and no runners (#1), a batter
can only score a run by going back to the same situation. Anything else is not
a run. Similarly for R(2) and R(3) as in those scenarios there is no one on base.

We can then keep track of the runs per state in the inning by using a matrix U
of 20-25 rows (max number of runs in the inning) x 28 columns (current state).
'''
```

Sources
- [Markov Chain Models: Theoretical Background](http://www.pankin.com/markov/theory.htm)
- https://www.jstor.org/stable/171922?seq=1#page_scan_tab_contents
- https://en.wikipedia.org/wiki/Stochastic_matrix#Definition_and_properties
- http://statshacker.com/the-markov-chain-model-of-baseball#prettyPhoto
- https://en.wikipedia.org/wiki/State_space
- https://en.wikipedia.org/wiki/Block_matrix

--------------------------------------------------------------------------------
/helper.py:
--------------------------------------------------------------------------------

import time

def timeit(method):
    """Decorator that prints how long each call to `method` takes."""

    def timed(*args, **kw):
        ts = time.time()
        result = method(*args, **kw)
        te = time.time()

        # method name followed by the elapsed wall-clock time in seconds
        print ('{0} {1:.5f} sec'.format(method.__name__, te-ts))
        return result

    return timed


# Runs scored on each transition: rows are the 24 pre-states, columns the 28 post-states
RUNS=[
    [1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
    [2,1,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
    [2,1,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
    [2,1,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
    [3,2,2,2,1,1,1,0,2,1,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0],
    [3,2,2,2,1,1,1,0,2,1,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0],
    [3,2,2,2,1,1,1,0,2,1,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0],
    [4,3,3,3,2,2,2,1,3,2,2,2,1,1,1,0,2,1,1,1,0,0,0,0,0,1,0,0],
    [0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
    [0,0,0,0,0,0,0,0,2,1,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0],
    [0,0,0,0,0,0,0,0,2,1,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0],
    [0,0,0,0,0,0,0,0,2,1,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0],
    [0,0,0,0,0,0,0,0,3,2,2,2,1,1,1,0,2,1,1,1,0,0,0,0,0,1,0,0],
    [0,0,0,0,0,0,0,0,3,2,2,2,1,1,1,0,2,1,1,1,0,0,0,0,0,1,0,0],
    [0,0,0,0,0,0,0,0,3,2,2,2,1,1,1,0,2,1,1,1,0,0,0,0,0,1,0,0],
    [0,0,0,0,0,0,0,0,4,3,3,3,2,2,2,1,3,2,2,2,1,1,1,0,0,1,2,0],
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0], 35 | [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,1,1,0,0,0,0,0,1,0,0], 36 | [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,1,1,0,0,0,0,0,1,0,0], 37 | [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,1,1,0,0,0,0,0,1,0,0], 38 | [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,2,2,2,1,1,1,0,0,1,2,0], 39 | [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,2,2,2,1,1,1,0,0,1,2,0], 40 | [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,2,2,2,1,1,1,0,0,1,2,0], 41 | [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,3,3,3,2,2,2,1,0,1,2,3] 42 | # [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0], 43 | # [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0], 44 | # [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0], 45 | # [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0] 46 | ] 47 | 48 | OUTS=[ 49 | [0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0], 50 | [0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0], 51 | [0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0], 52 | [0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0], 53 | [0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,0,2,2,2,2,0,0,0,0,3,0,0,0], 54 | [0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,0,2,2,2,2,0,0,0,0,3,0,0,0], 55 | [0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,0,2,2,2,2,0,0,0,0,3,0,0,0], 56 | [0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,0,3,3,0,0], 57 | [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0], 58 | [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,2,0,0,0], 59 | [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,2,0,0,0], 60 | [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,2,0,0,0], 61 | [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,0,2,2,0,0], 62 | [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,0,2,2,0,0], 63 | [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,0,2,2,0,0], 64 | [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,2,2,2,0], 65 | [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0], 66 | [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0], 67 | 
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0], 68 | [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0], 69 | [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0], 70 | [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0], 71 | [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0], 72 | [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1] 73 | #[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0], 74 | #[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0], 75 | #[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0], 76 | #[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0] 77 | ] 78 | 79 | PRIOR = [ 80 | [0.0238,0.2554,0.0466,0.0075,0.0,0.0,0.0,0.0,0.6668,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0], 81 | [0.0211,0.0003,0.0124,0.0073,0.1951,0.0491,0.0328,0.0,0.0006,0.4092,0.1544,0.0028,0.0,0.0,0.0,0.0,0.1148,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0], 82 | [0.0171,0.0554,0.0437,0.0076,0.0993,0.0954,0.0054,0.0,0.0035,0.018,0.3803,0.2664,0.0,0.0,0.0,0.0,0.008,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0], 83 | [0.0178,0.173,0.047,0.0104,0.0,0.1127,0.0002,0.0,0.206,0.0061,0.0015,0.4197,0.0,0.0,0.0,0.0,0.0057,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0], 84 | [0.021,0.0004,0.0122,0.0074,0.0465,0.0322,0.0338,0.1603,0.0006,0.0027,0.0033,0.0016,0.3215,0.0952,0.1583,0.0,0.0003,0.0108,0.012,0.078,0.0,0.0,0.0,0.0,0.0018,0.0,0.0,0.0], 85 | [0.022,0.0004,0.0121,0.0077,0.1387,0.0511,0.039,0.0801,0.001,0.182,0.0531,0.003,0.0287,0.2304,0.0417,0.0,0.086,0.0055,0.0034,0.0134,0.0,0.0,0.0,0.0,0.0005,0.0,0.0,0.0], 86 | [0.0182,0.0603,0.0485,0.0096,0.0072,0.0803,0.0077,0.1515,0.0047,0.0048,0.0938,0.1358,0.0058,0.0105,0.35,0.0,0.0029,0.0003,0.0032,0.0044,0.0,0.0,0.0,0.0,0.0003,0.0,0.0,0.0], 87 | [0.0236,0.0001,0.0123,0.008,0.0565,0.0357,0.0409,0.1634,0.0008,0.0036,0.0039,0.003,0.0783,0.0964,0.0464,0.3144,0.0004,0.0025,0.0029,0.0675,0.0065,0.0037,0.0283,0.0,0.001,0.0002,0.0,0.0], 88 | 
[0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0217,0.2532,0.0442,0.007,0.0,0.0,0.0,0.0,0.6739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0], 89 | [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0235,0.0004,0.0147,0.0082,0.203,0.0551,0.0326,0.0,0.0008,0.4362,0.0913,0.0029,0.0,0.0,0.0,0.0,0.1312,0.0,0.0,0.0], 90 | [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0193,0.0707,0.0503,0.0088,0.1471,0.0612,0.0041,0.0,0.0049,0.014,0.4407,0.1681,0.0,0.0,0.0,0.0,0.0109,0.0,0.0,0.0], 91 | [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0175,0.1824,0.0475,0.0094,0.0,0.1378,0.0005,0.0,0.2136,0.0267,0.0074,0.3434,0.0,0.0,0.0,0.0,0.0138,0.0,0.0,0.0], 92 | [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0232,0.0004,0.0154,0.0088,0.0603,0.0396,0.0354,0.1416,0.0008,0.0047,0.0048,0.002,0.3579,0.0968,0.0734,0.0,0.1344,0.0002,0.0,0.0], 93 | [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0208,0.0005,0.0133,0.0086,0.1405,0.0498,0.0356,0.0812,0.0008,0.1847,0.0687,0.0041,0.0248,0.2085,0.0228,0.0,0.1306,0.0048,0.0,0.0], 94 | [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0131,0.0619,0.0434,0.0079,0.0056,0.0637,0.0063,0.273,0.005,0.0056,0.0882,0.1099,0.0085,0.0248,0.2674,0.0,0.0132,0.0025,0.0,0.0], 95 | [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0222,0.0002,0.0126,0.0095,0.0541,0.0362,0.0393,0.1502,0.0009,0.0053,0.0053,0.0027,0.0911,0.0921,0.0489,0.2877,0.1368,0.0048,0.0002,0.0], 96 | [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0223,0.2574,0.0438,0.0063,0.0,0.0,0.0,0.0,0.6703,0.0,0.0,0.0], 97 | [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0239,0.001,0.0209,0.009,0.1944,0.0529,0.0216,0.0,0.6753,0.0009,0.0,0.0], 98 | [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0183,0.0923,0.0534,0.0079,0.1711,0.0305,0.0005,0.0,0.6208,0.0051,0.0,0.0], 99 | [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0181,0.1503,0.0411,0.0073,0.0,0.1556,0.0,0.0,0.6256,0.002,0.0,0.0], 100 | [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0209,0.001,0.0226,0.0098,0.0609,0.0445,0.0254,0.1204,0.6863,0.0073,0.0009,0.0], 
101 | [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0206,0.001,0.0205,0.01,0.1133,0.0497,0.0233,0.0931,0.6614,0.006,0.0011,0.0], 102 | [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0159,0.0824,0.0472,0.0072,0.0008,0.0311,0.0007,0.2222,0.5821,0.0054,0.0051,0.0], 103 | [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0214,0.0011,0.0218,0.0106,0.0544,0.046,0.0274,0.1205,0.683,0.0054,0.0073,0.001] 104 | ] 105 | -------------------------------------------------------------------------------- /markov_chain.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import pandas as pd 3 | import seaborn as sns 4 | import itertools 5 | import matplotlib.pyplot as plt 6 | #import concurrent.futures 7 | import time 8 | #from operator import itemgetter 9 | import datetime 10 | 11 | from helper import PRIOR, OUTS, RUNS, timeit 12 | 13 | class markov(): 14 | """ 15 | """ 16 | def __init__(self): 17 | self.data = None 18 | self.initalize_matrices() 19 | self.batter_list = ['goldp001','donaj001','cruzn002','vottj001','davic003'] 20 | self.batting_line = [ 21 | ['goldp001', 'Paul GoldSchmidt'], 22 | ['donaj001', 'Josh Donaldson'], 23 | ['cruzn002', 'Nelson Cruz'], 24 | ['cabrm001', 'Migruel Cabrera'], 25 | ['vottj001', 'Joey Votto'], 26 | ['mccua001', 'Andrew McCutchen'], 27 | ['machm001', 'Manny Machado'], 28 | ['heywj001', 'Jason Heyward'], 29 | ['davic003', 'Chris Davis'] 30 | ] 31 | 32 | #T is a block matrix of A,B,C,D,E,F,0s,1s 33 | def initalize_matrices(self): 34 | """ 35 | """ 36 | A = np.zeros([8,8]) 37 | B = np.zeros([8,8]) 38 | C = np.zeros([8,8]) 39 | 40 | D = np.zeros([8,4]) 41 | E = np.zeros([8,4]) 42 | F = np.zeros([8,4]) 43 | 44 | P0_1 = np.zeros([8,8]) 45 | P0_4 = np.zeros([4,8]) 46 | P1 = np.repeat([[1,0,0,0]],4,axis=0) 47 | 48 | T = np.block([ 49 | [ A , B , C , D ], 50 | [P0_1, A , B , E ], 51 | [P0_1,P0_1, A , F ], 52 | [P0_4,P0_4,P0_4 ,P1] 53 | ]) 
        outs = np.array(OUTS)
        runs = np.array(RUNS)
        prior = np.array(PRIOR)

        #self.T = T

        self.runs = runs
        self.outs = outs
        self.prior = prior

        #return True


    def get_data(self, path_str = '../markov.csv'):
        """Load the play-by-play csv once; returns True only when it was read."""
        if self.data is None:
            self.data = pd.read_csv(path_str)
            return True
        return False


    def pad(self, array, shape=(24,28)):
        """Zero-pad `array` to `shape`: missing rows on top, missing columns on the right."""
        rows = shape[0] - array.shape[0]
        cols = shape[1] - array.shape[1]

        padded = np.pad(array, ((rows,0),(0,cols)), constant_values = 0, mode='constant')
        return padded


    def transition(self, player_id, precision = 2):
        """Transition matrix for the player
        """
        if self.data is None:
            self.get_data()

        # count of plays per (pre_state, post_state) pair, pivoted into a matrix;
        # keyword arguments are used because newer pandas rejects positional ones in pivot
        pre_post_cnt = self.data[self.data['player_id']==player_id]\
            .groupby(['pre_state','post_state']).count().reset_index()[['pre_state','post_state','play_runs']]\
            .pivot(index='pre_state', columns='post_state', values='play_runs').fillna(0).values

        # total plays per pre_state, used to turn the counts into row probabilities
        all_pre_cnt = self.data[self.data['player_id']==player_id].groupby(['pre_state']).count()[['post_state']].values
        Tp = (pre_post_cnt / all_pre_cnt)
        Tp = self.pad(Tp, [24,28]) #correct shape for multiplication and broadcasting

        return np.round(Tp,precision)


    def er(self, Tp=None):
        """Function to calculate expected run
        """
        Er = np.sum(self.runs*Tp , axis=1).reshape([3,8])
        #print (Er)
        return Er


    def eo(self, Tp=None):
        """Function to calculate expected out
        """
        Eo = np.sum(self.outs*Tp , axis=1).reshape([3,8])
        #print (Eo)
        return Eo


    def plot(self, array, title):
        plt.figure(figsize=(16,8))
        plt.title(title)
        sns.heatmap(array, annot=True)
        plt.show()


    def plot_T(self, Tp=None, player_id=None):
        """Plot transition matrix
        """
        if player_id:
            Tp = self.transition(player_id=player_id)
            self.plot(Tp, title=player_id)
        else: print ('No player id or dataset passed.')


    def plot_er(self, player_id=None):
        """Plot expected run
        """
        if player_id:
            Tp = self.transition(player_id=player_id)
            Er = self.er(Tp)
            self.plot(Er, title=player_id)
        else: print ('No player id or dataset passed.')


    def plot_eo(self, player_id=None):
        """Plot expected out
        """
        if player_id:
            Tp = self.transition(player_id=player_id)
            Eo = self.eo(Tp)
            self.plot(Eo, title=player_id)
        else: print ('No player id or dataset passed.')


    def batting_line_T(self, batter_list):
        """final results has 3D --> shape[N, 24, 24] where N is no of batters
        """
        bT = np.zeros([len(batter_list),24,24], dtype=float)
        #bT = []
        #bT[i].append(self.transition(player_id = batter, precision = 10)[:24,:24])
        for i in range(len(batter_list)):
            bT[i,:,:] = self.transition(player_id = batter_list[i], precision = 10)[:24,:24]
        return bT


    def move_state(self, pre_state, Tp):
        """Draw the next state from row `pre_state` of Tp by inverse-CDF sampling."""
        post_state = None
        random_prob = np.random.uniform()
        Tp_cum = np.cumsum(Tp, axis=1)
        transition = Tp_cum[pre_state]
        while post_state is None:
            try:
                post_state = int(np.min(np.where(transition >= random_prob)))
            except ValueError:
                # the draw exceeded the row total (e.g. rounding), so draw again
                random_prob = np.random.uniform()

        return post_state

    def play_game(self, T=None, innings=9, N =1000):
        '''Game simulations for one T matrix
        '''
        # `not T` is ambiguous for a numpy array, so test against None explicitly
        if T is None:
            T = self.prior
        tot_runs = []
        for n in range(N):
            runs = 0

            for i in range(0, innings, 1): #full game
                pre_state = 0

                while pre_state < 24:

                    post_state = self.move_state(pre_state, T)

                    runs += self.runs[pre_state, post_state]
                    pre_state = post_state

            tot_runs.append(runs)
        return np.mean(tot_runs)

    def simulate_games(self, batter_list=['goldp001','donaj001','cruzn002','vottj001','cabrm001','mccua001','machm001','heywj001','davic003'], N = 10, innings = 9):
        """it runs random selection of states for innings
        """
        if batter_list is None:
            batter_list = self.batter_list

        # Use the batters that were passed in, and keep the full 24x28 matrices:
        # batting_line_T() drops the three-out columns, and without them
        # move_state() can never return a state >= 24, so an inning would never end.
        batting_line_T = [self.transition(player_id = batter, precision = 10) for batter in batter_list]

        tot_runs = []
        for n in range(N):

            runs = 0
            batting = 0

            for i in range(0, innings, 1): #full game
                pre_state = 0

                while pre_state < 24:
                    if batting > len(batting_line_T)-1: #allows for any number of batters, including one
                        batting = 0
                    post_state = self.move_state(pre_state, batting_line_T[batting])

                    runs += self.runs[pre_state, post_state]
                    pre_state = post_state

                    batting += 1

            tot_runs.append(runs)

        return np.mean(tot_runs)
        #sns.distplot((tot_runs))
        #plt.title(np.mean(tot_runs))
        #plt.show()

#    def batter_permutations(self, batter_list = ['goldp001','donaj001','cruzn002','vottj001','cabrm001','mccua001','machm001','heywj001','davic003']):
#        """create all possible permutations of batters
#        """
#        permutations = itertools.permutations(batter_list)
#        self.permutations = [i for i in permutations]
#
#        return [i for i in range(len(self.permutations))]


#    def optimize_batting(self, permutations):
#
#        results = []
#        for loop, batter_list in enumerate(permutations):
#            #print (batter_list)
#            #print (loop)
#            avg_runs = self.simulate_games(batter_list=batter_list, N=2)
#            results.append([batter_list, avg_runs])
#
#        #results = sorted(results, key=itemgetter(-1), reverse=True)
#        #plt.plot(np.array(results)[:,-1])
#        #plt.show()
#        #print (results)
#        return results

#    def optimize_batting2(self, index):
#
#        batter_list = 
self.permutations[index] 258 | # avg_runs = self.simulate_games(batter_list=batter_list, N=10) 259 | # results = [batter_list, avg_runs] 260 | # #print (index) if index % 10000 == 0 else None 261 | # #results = sorted(results, key=itemgetter(-1), reverse=True) 262 | # #plt.plot(np.array(results)[:,-1]) 263 | # #plt.show() 264 | # #print (results) 265 | # return results 266 | 267 | #def expected_run(self, Tp): 268 | # Er = np.sum(self.runs[:24,:24]*Tp , axis=1).reshape([24,1]) 269 | # return Er 270 | 271 | def rotate_batters(self, current_batter_list): 272 | """rotate batters once, to the left""" 273 | next_batter_list = current_batter_list[1:] 274 | next_batter_list.append(current_batter_list[0]) 275 | return next_batter_list 276 | 277 | 278 | def line_runs_transitions(self, batter_list=['goldp001','donaj001','cruzn002','vottj001','cabrm001','mccua001','machm001','heywj001','davic003']): 279 | """dictionaries with transitions and runs for batting line 280 | """ 281 | transitions = {} 282 | runs = {} 283 | 284 | for player in batter_list:#9 loops 285 | transitions[player] = self.transition(player_id = player, precision = 10)[:24,:24] 286 | runs[player] = np.sum(self.runs[:24,:24]*transitions[player], axis=1) 287 | 288 | return transitions, runs 289 | 290 | #have to create an ordered permutation to reduce lookup time in optimization 291 | @timeit 292 | def batter_permutation(self, batter_list): 293 | """it creates 3X times the number of batters, to account for potential 294 | long innings, where a full rotation happens (~0.01% of cases) 295 | """ 296 | n = len(batter_list) 297 | #first get all possibe sequences permutaitons of (n-1) 298 | combinations = itertools.permutations(batter_list[1:]) 299 | ordered_batter_list = [] 300 | for combination in combinations: 301 | batter_seq = [batter_list[0]] + list(combination) 302 | ordered_batter_list.append(batter_seq+batter_seq+batter_seq) 303 | for i in range(n-1): 304 | batter_seq = self.rotate_batters(batter_seq) 305 | 
ordered_batter_list.append(batter_seq+batter_seq+batter_seq) 306 | 307 | return ordered_batter_list 308 | 309 | #@timeit 310 | def Er_out_Matrix(self, permutations_list, runs, transitions, dim): 311 | """Returns earned run and out matrix for each batter lineup 312 | It is the backbone for the optimal batter calculation 313 | """ 314 | 315 | 316 | ER_matrix = np.zeros([len(permutations_list),24]) 317 | Outs_matrix = np.zeros([len(permutations_list),27]) #Out prob, starting from that batter 318 | 319 | #batters_list = [] 320 | for loop, batters in enumerate(permutations_list): 321 | #batters_list.append(list(batters[:9])) 322 | 323 | #T_list = np.zeros([len(batters),24,24]) 324 | Er_list = np.zeros([len(batters),24,1]) 325 | 326 | current_T = np.identity(24) 327 | 328 | for i in range(len(batters)): 329 | 330 | current_T = np.dot(current_T, transitions[batters[i]]) 331 | current_run = runs[batters[i]] 332 | 333 | if i == 0: 334 | ER = np.dot(np.identity(24), current_run) 335 | current_U = transitions[batters[i]][0,:] 336 | prob = 1-np.sum(current_U) 337 | Outs_matrix[loop][i] = prob 338 | else: 339 | ER += np.dot(current_T, current_run) 340 | prob = 1-np.sum(np.dot(current_U, transitions[batters[i]])) 341 | current_U = np.dot(current_U, transitions[batters[i]]) 342 | Outs_matrix[loop][i] = prob - np.sum(Outs_matrix[loop][2:]) 343 | 344 | ER_matrix[loop] = ER.reshape([24,]) 345 | 346 | Outs_matrix[:,:2] = 0. 
347 | 348 | return ER_matrix, Outs_matrix 349 | 350 | 351 | #@timeit 352 | def expected_run(self,batters_out_prob, batters_er_on_00, dim): 353 | er_total = 0 354 | new = np.zeros((dim,dim)) 355 | new[:,1:] = batters_out_prob[:,:-1] 356 | new[:,0] = batters_out_prob[:,-1] 357 | 358 | er1 = batters_er_on_00[0,0] 359 | 360 | start_second = new[0,:] 361 | first_ER = batters_er_on_00[:,0] 362 | er2 = np.dot(start_second, first_ER) 363 | 364 | start_third = start_second.reshape(dim,1) * new 365 | er3 = np.sum(start_third * batters_er_on_00) 366 | 367 | er_total = er1 + er2 + er3 368 | 369 | start_previous = start_third 370 | for last_innings in [4,5,6,7,8,9]: 371 | start_next = np.zeros((dim,1)) 372 | 373 | for i in range(dim): 374 | 375 | if i == 8: #diaognal 0 = last player = player 8 376 | x = 0 377 | else: 378 | x = i + 1 379 | 380 | start_next[i,:] = np.trace(start_previous[:,::-1], -x) + np.trace(start_previous[:,::-1], dim - x) #adding up the diagonals 381 | 382 | start_next = start_next.reshape(dim,1) * new 383 | start_previous = start_next.copy() 384 | er_total += np.sum(start_next * batters_er_on_00) 385 | 386 | return er_total 387 | 388 | 389 | #@timeit 390 | def out_er(self, ER_matrix, Outs_matrix, batters_list, dim, index_st): 391 | batters_out_prob = np.zeros([dim,dim]) 392 | batters_er_on_00 = np.zeros([dim,1]) 393 | 394 | index_end = index_st + dim 395 | batters_out_prob = Outs_matrix[index_st:index_end,0:dim] 396 | batters_er_on_00[:,0] = ER_matrix[index_st:index_end,0] 397 | 398 | batters_out_prob[:,:2] = 0. 
399 | 400 | return batters_out_prob, batters_er_on_00 401 | 402 | 403 | @timeit 404 | def optimize_line(self, batter_list=['goldp001','donaj001','cruzn002','vottj001','cabrm001','mccua001','machm001','heywj001','davic003']): 405 | dim = 9 406 | 407 | transitions, runs = self.line_runs_transitions(batter_list) 408 | permutations_list = self.batter_permutation(batter_list) 409 | 410 | ER_matrix, Outs_matrix = self.Er_out_Matrix(permutations_list=permutations_list, runs=runs, transitions=transitions,dim=dim) 411 | 412 | er_total = np.zeros((len(permutations_list), 1)) 413 | 414 | for loop in range(0,len(permutations_list),dim): 415 | 416 | for i in range(dim): 417 | batters_out_prob = np.zeros([dim,dim]) 418 | batters_er_on_00 = np.zeros([dim,1]) 419 | 420 | if i == 0: 421 | index_st = loop 422 | index_end = loop + dim 423 | batters_out_prob = Outs_matrix[index_st:index_end,0:dim] 424 | batters_er_on_00[:,0] = ER_matrix[index_st:index_end,0] 425 | batters_out_prob[:,:2] = 0. 426 | 427 | p_batters_out_prob = batters_out_prob 428 | p_batters_er_on_00 = batters_er_on_00 429 | else: #rotate matrices for different starting batters 430 | batters_out_prob[:,:-i] = p_batters_out_prob[:,i:] 431 | batters_out_prob[:,-i:] = p_batters_out_prob[:,:i] 432 | 433 | batters_er_on_00[:-i,0] = p_batters_er_on_00[i:,0] 434 | batters_er_on_00[-i:,0] = p_batters_er_on_00[:i:,0] 435 | 436 | er_total[loop+i,0] = self.expected_run(batters_out_prob=batters_out_prob, batters_er_on_00=batters_er_on_00, dim=dim) 437 | 438 | best_lines = np.array(permutations_list).reshape([len(permutations_list),27])[:,:9] 439 | 440 | return er_total, best_lines 441 | 442 | 443 | if __name__ == '__main__': 444 | mk = markov() 445 | 446 | #using parallel code (max 10 cores)''' 447 | print ('Starting:\t', datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')) 448 | max_cores_to_use = 10 449 | ''' 450 | chunksize = 10000 451 | 452 | print ('Starting:\t', datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')) 453 | 
454 | permutations_indices = mk.batter_permutations() 455 | print (len(permutations_indices)) 456 | results = [] 457 | #for result in map(mk.optimize_batting2,permutations_indices): 458 | # results.append(result) 459 | 460 | 461 | with concurrent.futures.ProcessPoolExecutor(max_workers=max_cores_to_use) as executor: 462 | #for result in executor.map(mk.optimize_batting2, permutations_indices): 463 | results = zip(executor.map(mk.optimize_batting2, permutations_indices)) 464 | 465 | #print (results) 466 | 467 | #plt.plot(np.array(results)[:,-1]) 468 | #plt.show() 469 | ''' 470 | #Altuve, Blackmon, J.D. Martinez, Stanton, Inciarte, Jose Ramirez, Votto, Pujols, Trout 471 | best_2017 = ['altuj001','martj006','blacc001','stanm004','incie001','ramij003','vottj001','pujoa001','troum001'] 472 | 473 | red_socks = ['bettm001','benia002','martj006','bogax001','holtb002','vazqc001','nunee002','leons001','bradj001'] 474 | er_total, bline = mk.optimize_line(batter_list = best_2017) 475 | #er_games = mk.simulate_games() 476 | 477 | print ('Finishing:\t', datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')) 478 | --------------------------------------------------------------------------------
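
The `er` method in `markov_chain.py` sums `runs * Tp` per row, which gives the expected runs of a single plate appearance from each state. The following is a minimal sketch, not part of the repository, of how those one-step values could be extended to expected runs over the rest of the inning using the standard absorbing-chain identity E = (I - Q)^-1 r. The function name `expected_runs_rest_of_inning` and the three-state toy chain are hypothetical and chosen only for illustration.

```python
import numpy as np


def expected_runs_rest_of_inning(T, runs):
    """Expected runs from each transient state until the inning ends.

    T    : (n, m) transition probabilities from the n transient states to all
           m states (transient states first, absorbing states last).
    runs : (n, m) runs scored on each transition, laid out like T.
    """
    T = np.asarray(T, dtype=float)
    runs = np.asarray(runs, dtype=float)
    n = T.shape[0]

    # One-step expected runs per starting state (the same row sum used in er()).
    r = np.sum(runs * T, axis=1)

    # Q keeps only the transient -> transient moves; solve (I - Q) E = r.
    Q = T[:, :n]
    return np.linalg.solve(np.eye(n) - Q, r)


# Hypothetical toy chain: states are 0, 1 and 2 outs plus one absorbing state.
# Each batter homers with probability h (same state, 1 run) or makes an out.
h = 0.1
toy_T = np.array([
    [h, 1 - h, 0.0, 0.0],
    [0.0, h, 1 - h, 0.0],
    [0.0, 0.0, h, 1 - h],
])
toy_runs = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
])

toy_expected = expected_runs_rest_of_inning(toy_T, toy_runs)
print(toy_expected)
```

In the toy chain each out state contributes h / (1 - h) expected home runs, so the first entry equals 3 * h / (1 - h). With the repository's data, `T` would presumably be a 24x28 matrix such as `PRIOR` and `runs` the `RUNS` table, so the first entry would be the expected runs for an inning that starts with no outs and empty bases.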