├── LICENSE ├── README.md ├── bot.py ├── data_prep.py ├── model-fg ├── example.js └── model-fg.js ├── model_train.py ├── package.json ├── plays.py ├── requirements.txt └── winprob.py /LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2015 Trey Causey, Josh Katz, and Kevin Quealy 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | Fourth Down Bot 2 | =============== 3 | 4 | This is the code that powers the [New York Times 4th Down Bot](http://nyt4thdownbot.com/). 
Using 5 | NFL play-by-play data from [Armchair Analysis](http://armchairanalysis.com/), this code will: 6 | 7 | - Munge the raw play-by-play data and transform it into a form suitable for modeling 8 | - Create a win probability model and serialize that model 9 | - Provide all of the functions to make optimal 4th down decisions for a given play 10 | 11 | The Armchair Analysis data is *not free*. It costs $49 (one-time) to gain access to play-by-play 12 | data from the 2000-2014 NFL seasons. There is also a professional package that will provide 13 | weekly updates. Without the Armchair Analysis data, you will not be able to use much of this code. 14 | 15 | This code currently requires Python 2.7 and is not Python 3 compliant to our knowledge. Questions about the Python code can be directed to [Trey Causey](mailto:trey@thespread.us). 16 | 17 | Please note that none of the file operations are supported on Windows. 18 | 19 | NOTE: If you are unable to purchase the Armchair Analysis data, [Ben Dilday](https://github.com/bdilday) has created a fork of this code that uses freely available play-by-play data. There are no guarantees that this fork is current with the production version of the 4th Down Bot and this fork is not affiliated in any way with or supported by *The Upshot* or Trey Causey. You can find that fork [here](https://github.com/bdilday/4thdownbot-model). Questions about that fork should be directed to Ben Dilday. 20 | 21 | ## Python package requirements 22 | 23 | - click 24 | - matplotlib (if you want to visually diagnose your model's performance) 25 | - naked 26 | - numpy 27 | - pandas 28 | - scikit-learn 29 | 30 | ## Usage 31 | 32 | Unzip the play-by-play data into a directory. Run the following code from the directory 33 | where you want the Fourth Down Bot code to live. It will create the subdirectories 34 | `models` and `data` to store files. 
35 | 36 | ```bash 37 | python data_prep.py path/to/play-by-play-data 38 | python model_train.py 39 | ``` 40 | 41 | If you wish to view the calibration plots and ROC curves for the model, run 42 | `model_train` with the `--plot` flag, like so: 43 | 44 | ```bash 45 | python model_train.py --plot 46 | ``` 47 | 48 | There is a rudimentary command line interface for interactively querying 49 | the bot's model, although the model was built to be queried programmatically. 50 | Feel free to improve upon this. To query the model interactively, use 51 | the following syntax at the command line and follow the prompts: 52 | 53 | ```bash 54 | python bot.py 55 | ``` 56 | 57 | #### Field goal model 58 | 59 | The bot's field goal model is also accessible as a separate module, via either a node script (see `model-fg/example.js` for details) or the command line. A sample query: 60 | 61 | ```bash 62 | node model-fg/model-fg.js --kicker_code=AH-2600 --temp=40 --wind=10 --yfog=67 --chanceOfRain=10 --is_dome=1 --is_turf=0 63 | ``` 64 | 65 | As an alternative to supplying the [Armchair Analysis](http://armchairanalysis.com/) player code, you can instead specify the team on offense (team codes are fairly standard, but see `model-fg.js` for a lookup table). 
Similarly, you can supply the home team instead of `is_dome` and `is_turf` arguments: 66 | 67 | ```bash 68 | node model-fg/model-fg.js --offense=PHI --home=NE --temp=40 --wind=10 --yfog=67 --chanceOfRain=10 69 | ``` 70 | 71 | -------------------------------------------------------------------------------- /bot.py: -------------------------------------------------------------------------------- 1 | from collections import OrderedDict 2 | 3 | import click 4 | import pandas as pd 5 | 6 | from sklearn.externals import joblib 7 | from Naked.toolshed.shell import muterun_js 8 | 9 | import winprob as wp 10 | 11 | 12 | def load_data(): 13 | click.echo('Loading data and setting up model.') 14 | data = {} 15 | data['fgs'] = pd.read_csv('data/fgs_grouped.csv') 16 | data['punts'] = pd.read_csv('data/punts_grouped.csv') 17 | data['fd_open_field'] = pd.read_csv('data/fd_open_field.csv') 18 | data['fd_inside_10'] = pd.read_csv('data/fd_inside_10.csv') 19 | data['final_drives'] = pd.read_csv('data/final_drives.csv') 20 | data['decisions'] = pd.read_csv('data/coaches_decisions.csv') 21 | data['scaler'] = joblib.load('models/scaler.pkl') 22 | data['features'] = ['dwn', 'yfog', 'secs_left', 23 | 'score_diff', 'timo', 'timd', 'spread', 24 | 'kneel_down', 'qtr', 'qtr_scorediff'] 25 | 26 | model = joblib.load('models/win_probability.pkl') 27 | return data, model 28 | 29 | def fg_make_prob(situation): 30 | args = ' '.join("--%s=%r" % (key,val) for (key,val) in situation.iteritems()) 31 | model_fg = muterun_js('model-fg/model-fg.js', args) 32 | return model_fg.stdout.split()[-1] 33 | 34 | @click.command() 35 | def run_bot(): 36 | click.echo("\n\n*** Hit CTRL-C to leave the program. 
*** \n\n") 37 | while True: 38 | situation = OrderedDict.fromkeys(data['features']) 39 | 40 | situation['dwn'] = int(raw_input('Down: ')) 41 | situation['ytg'] = int(raw_input('Yards to go: ')) 42 | situation['yfog'] = int(raw_input('Yards from own goal: ')) 43 | situation['secs_left'] = int(raw_input('Seconds remaining in game: ')) 44 | situation['score_diff'] = int(raw_input("Offense's lead (can be " 45 | "negative): ")) 46 | situation['timo'] = int(raw_input("Timeouts remaining, offense: ")) 47 | situation['timd'] = int(raw_input("Timeouts remaining, defense: ")) 48 | situation['spread'] = float(raw_input("Spread in terms of offensive " 49 | "team (can be negative, enter " 50 | "0 if you don't know): ")) 51 | 52 | situation['dome'] = int(raw_input('Is game in dome? 1 for yes, ' 53 | '0 for no. ')) 54 | situation['offense'] = str(raw_input('Team on offense (eg, PHI): ')) 55 | situation['home'] = str(raw_input('Home team: ')) 56 | situation['temp'] = float(raw_input('Temperature: ')) 57 | situation['wind'] = float(raw_input('Windspeed (mph): ')) 58 | situation['chanceOfRain'] = float(raw_input('Chance of rain (percent): ')) 59 | situation['fg_make_prob'] = float(fg_make_prob(situation)) 60 | 61 | response = wp.generate_response(situation, data, model) 62 | 63 | click.echo(response) 64 | 65 | if __name__ == '__main__': 66 | data, model = load_data() 67 | run_bot() 68 | -------------------------------------------------------------------------------- /data_prep.py: -------------------------------------------------------------------------------- 1 | from __future__ import division, print_function 2 | 3 | import os 4 | 5 | import click 6 | import numpy as np 7 | import pandas as pd 8 | 9 | 10 | def load_games(game_data_fname, remove_ties=False): 11 | """Load data containing results of each game and return a DataFrame. 
12 | 13 | Parameters 14 | ---------- 15 | game_data_fname : str, filename of Armchair Analysis GAME table 16 | remove_ties : boolean, optional 17 | 18 | Returns 19 | ------- 20 | games : DataFrame 21 | """ 22 | games = pd.read_csv(game_data_fname, index_col=0) 23 | 24 | # Data from 2000 import is less reliable, omit this season 25 | # and use regular season games only. 26 | 27 | games = (games.query('seas >= 2001 & wk <= 17') 28 | .drop(['stad', 'temp', 'humd', 'wspd', 29 | 'wdir', 'cond', 'surf'], 30 | axis='columns')) 31 | 32 | games['winner'] = games.apply(winner, axis=1) 33 | if remove_ties: 34 | games = games[games['winner'] != 'TIE'] 35 | return games 36 | 37 | 38 | def winner(row): 39 | """Returns the team name that won the game, otherwise returns 'TIE'""" 40 | if row.ptsv > row.ptsh: 41 | return row.v 42 | elif row.ptsh > row.ptsv: 43 | return row.h 44 | else: 45 | return 'TIE' 46 | 47 | 48 | def load_pbp(pbp_data_fname, games, remove_knees=False): 49 | """Load the play by play data and return a DataFrame. 
50 | 51 | Parameters 52 | ---------- 53 | pbp_data_fname : str, location of play by play data 54 | games : DataFrame, game-level DataFrame created by load_games 55 | remove_knees : boolean, optional 56 | 57 | Returns 58 | ------- 59 | pbp : DataFrame 60 | """ 61 | pbp = pd.read_csv(pbp_data_fname, index_col=1, low_memory=False, 62 | usecols=['gid', 'pid', 'off', 'def', 'type', 'qtr', 63 | 'min', 'sec', 'kne', 'ptso', 'ptsd', 'timo', 64 | 'timd', 'dwn', 'ytg', 'yfog', 'yds', 'fd', 65 | 'fgxp', 'good', 'pnet', 'pts', 'detail']) 66 | 67 | # Remove overtime 68 | pbp = pbp[pbp.qtr <= 4] 69 | 70 | # pid 183134 should have a value of 0 for min, but has "0:00" 71 | pbp['min'] = pbp['min'].replace({'0:00': 0}) 72 | pbp['min'] = pbp['min'].astype(np.int64) 73 | 74 | # Restrict to regular season games after 2000 75 | pbp = pbp[pbp.gid.isin(games.index)] 76 | 77 | if remove_knees: 78 | pbp = pbp[pbp.kne.isnull()] 79 | return pbp 80 | 81 | 82 | def switch_offense(df): 83 | """Swap game state columns for offense & defense dependent variables. 84 | The play by play data has some statistics on punts and kickoffs in terms 85 | of the receiving team. Switch these to reflect the game state for 86 | the kicking team.""" 87 | 88 | df.loc[(df['type'] == 'PUNT') | (df['type'] == 'KOFF'), 89 | ['off', 'def', 'ptso', 'ptsd', 'timo', 'timd']] = df.loc[ 90 | (df['type'] == 'PUNT') | (df['type'] == 'KOFF'), 91 | ['def', 'off', 'ptsd', 'ptso', 'timd', 'timo']].values 92 | 93 | # If any points are scored on a PUNT/KOFF, they are given in terms 94 | # of the receiving team -- switch this. 95 | 96 | df.loc[(df['type'] == 'PUNT') | (df['type'] == 'KOFF'), 'pts'] = ( 97 | -1 * df.loc[(df['type'] == 'PUNT') | (df['type'] == 'KOFF'), 98 | 'pts'].values) 99 | return df 100 | 101 | 102 | def code_fourth_downs(df): 103 | """Parse all fourth downs and determine if teams intended to go for it, 104 | punt, or attempt a field goal. If intent is not clear, do not include 105 | the play. 
106 | """ 107 | 108 | fourths = df.loc[df.dwn == 4, :].copy() 109 | fourths['goforit'] = np.zeros(fourths.shape[0]) 110 | fourths['punt'] = np.zeros(fourths.shape[0]) 111 | fourths['kick'] = np.zeros(fourths.shape[0]) 112 | 113 | # Omit false start, delay of game, encroachment, neutral zone infraction 114 | # We cannot infer from these plays if the offense was going to 115 | # go for it or not. 116 | 117 | omitstring = (r'encroachment|false start|delay of game|neutral zone ' 118 | 'infraction') 119 | fourths = fourths[-(fourths.detail.str.contains(omitstring, case=False))] 120 | 121 | # Ran a play 122 | fourths.loc[(fourths['type'] == 'RUSH') | 123 | (fourths['type'] == 'PASS'), 'goforit'] = 1 124 | 125 | fourths.loc[(fourths['type'] == 'RUSH') | 126 | (fourths['type'] == 'PASS'), 'punt'] = 0 127 | 128 | fourths.loc[(fourths['type'] == 'RUSH') | 129 | (fourths['type'] == 'PASS'), 'kick'] = 0 130 | 131 | # Field goal attempts and punts 132 | fourths.loc[(fourths['type'] == 'FGXP') | 133 | (fourths['type'] == 'PUNT'), 'goforit'] = 0 134 | 135 | fourths.loc[(fourths['type'] == 'FGXP'), 'kick'] = 1 136 | fourths.loc[(fourths['type'] == 'PUNT'), 'punt'] = 1 137 | 138 | # Punted, but penalty on play 139 | puntstring = r'punts|out of bounds' 140 | fourths.loc[(fourths['type'] == 'NOPL') & 141 | (fourths.detail.str.contains(puntstring, case=False)), 142 | 'punt'] = 1 143 | 144 | # Kicked, but penalty on play 145 | kickstring = r'field goal is|field goal attempt' 146 | fourths.loc[(fourths['type'] == 'NOPL') & 147 | (fourths.detail.str.contains(kickstring, case=False)), 148 | 'kick'] = 1 149 | 150 | # Went for it, but penalty on play 151 | gostring = (r'pass to|incomplete|sacked|left end|up the middle|' 152 | 'pass interference|right tackle|right guard|right end|' 153 | 'pass intended|left tackle|left guard|pass deep|' 154 | 'pass short|up the middle') 155 | 156 | fourths.loc[(fourths['type'] == 'NOPL') & 157 | (fourths.detail.str.contains(gostring, case=False)) & 158 | 
~(fourths.detail.str.contains(puntstring, case=False)) & 159 | ~(fourths.detail.str.contains(kickstring, case=False)), 160 | 'goforit'] = 1 161 | 162 | fourths = fourths[fourths[['goforit', 'punt', 'kick']].sum(axis=1) == 1] 163 | return fourths 164 | 165 | 166 | def fg_success_rate(fg_data_fname, out_fname, min_pid=473957): 167 | """Historical field goal success rates by field position. 168 | 169 | By default, uses only attempts from >= 2011 season to reflect 170 | improved kicker performance. 171 | 172 | Returns and writes results to a CSV. 173 | 174 | NOTE: These are somewhat sparse and irregular at longer FG ranges. 175 | This is because kickers who attempt long FGs are not selected at 176 | random -- they are either in situations which require a long FG 177 | attempt or are kickers with a known long range. The NYT model 178 | uses a logistic regression kicking model developed by Josh Katz 179 | to smooth out these rates. 180 | """ 181 | fgs = pd.read_csv(fg_data_fname) 182 | 183 | fgs = fgs.loc[(fgs.fgxp == 'FG') & (fgs.pid >= min_pid)].copy() 184 | 185 | fgs_grouped = fgs.groupby('dist')['good'].agg( 186 | {'N': len, 'average': np.mean}).reset_index() 187 | 188 | fgs_grouped['yfog'] = 100 - (fgs_grouped.dist - 17) 189 | fgs_grouped[['yfog', 'average']].to_csv(out_fname, index=False) 190 | 191 | return fgs_grouped 192 | 193 | 194 | def nyt_fg_model(fname, outname): 195 | """Sub in simple logit for field goal success rates.""" 196 | fgs = pd.read_csv(fname) 197 | fgs['yfog'] = 100 - (fgs.fg_distance - 17) 198 | fgs.to_csv(outname) 199 | return fgs 200 | 201 | 202 | def punt_averages(punt_data_fname, out_fname, joined): 203 | """Group punts by kicking field position to get average net punt distance. 204 | Currently does not incorporate the possibility of a muffed punt 205 | or punt returned for a TD. 
206 | """ 207 | 208 | punts = pd.read_csv(punt_data_fname, index_col=0) 209 | 210 | punts = pd.merge(punts, joined[['yfog']], 211 | left_index=True, right_index=True) 212 | 213 | punts_dist = pd.DataFrame(punts.groupby('yfog')['pnet'] 214 | .mean().reset_index()) 215 | 216 | punts_dist.to_csv(out_fname, index=False) 217 | return punts_dist 218 | 219 | 220 | def group_coaches_decisions(fourths): 221 | """Group 4th down decisions by score difference and field 222 | position to get coarse historical comparisons. 223 | 224 | Writes these to a CSV and returns them.""" 225 | 226 | df = fourths.copy() 227 | 228 | df['down_by_td'] = (df.score_diff <= -4).astype(np.uint8) 229 | df['up_by_td'] = (df.score_diff >= 4).astype(np.uint8) 230 | df['yfog_bin'] = df.yfog // 20 231 | df['short'] = (df.ytg <= 3).astype(np.uint8) 232 | df['med'] = ((df.ytg >= 4) & (df.ytg <= 7)).astype(np.uint8) 233 | df['long'] = (df.ytg > 7).astype(np.uint8) 234 | 235 | grouped = df.groupby(['down_by_td', 'up_by_td', 'yfog_bin', 236 | 'short', 'med', 'long']) 237 | 238 | goforit = grouped['goforit'].agg({'proportion_went': np.mean, 239 | 'sample_size': len}) 240 | punt = grouped['punt'].agg({'proportion_punted': np.mean, 241 | 'sample_size': len}) 242 | kick = grouped['kick'].agg({'proportion_kicked': np.mean, 243 | 'sample_size': len}) 244 | 245 | decisions = goforit.merge( 246 | punt.merge(kick, left_index=True, right_index=True, 247 | suffixes=['_punt', '_kick']), 248 | left_index=True, right_index=True, suffixes=['_goforit', 'punt']) 249 | 250 | decisions.to_csv('data/coaches_decisions.csv') 251 | return decisions 252 | 253 | 254 | def first_down_rates(df_plays, yfog): 255 | """Find the mean 1st down success rate at a given point in the field. 256 | 257 | Parameters 258 | ---------- 259 | df_plays : DataFrame 260 | yfog : str, must be 'yfog' or 'yfog_bin' 261 | If yfog, use the actual yards from own goal 262 | If yfog_bin, use the decile of the field instead. 
263 | """ 264 | 265 | downs = df_plays.copy() 266 | if yfog == 'yfog_bin': 267 | # Break the field into deciles 268 | downs[yfog] = downs.yfog // 10 269 | downs = downs.loc[downs.yfog < 90].copy() 270 | else: 271 | downs = downs.loc[downs.yfog >= 90].copy() 272 | 273 | # For each segment, find the average first down rate by dwn & ytg 274 | grouped = (downs.groupby([yfog, 'dwn', 'ytg'])['first_down'] 275 | .agg({'fdr': np.mean, 'N': len}) 276 | .reset_index()) 277 | 278 | # Just keep 3rd & 4th downs 279 | grouped = grouped.loc[grouped.dwn >= 3].copy() 280 | merged = grouped.merge(grouped, on=[yfog, 'ytg'], how='left') 281 | 282 | # Note this will lose scenarios that have *only* ever seen a 4th down 283 | # This matches to one play since 2001. 284 | merged = merged.loc[(merged.dwn_x == 4) & (merged.dwn_y == 3)].copy() 285 | 286 | # Compute a weighted mean of FDR on 3rd & 4th down to deal with sparsity 287 | merged['weighted_N_x'] = (merged.fdr_x * merged.N_x) 288 | merged['weighted_N_y'] = (merged.fdr_y * merged.N_y) 289 | merged['weighted_total'] = (merged.weighted_N_x + merged.weighted_N_y) 290 | merged['total_N'] = (merged.N_x + merged.N_y) 291 | merged['weighted_fdr'] = (merged.weighted_total / merged.total_N) 292 | merged = merged.drop(labels=['weighted_N_x', 'weighted_N_y', 293 | 'weighted_total', 'total_N'], axis='columns') 294 | merged = merged.rename(columns={'dwn_x': 'dwn'}) 295 | 296 | # Need to fill in any missing combinations where possible 297 | merged = merged.set_index([yfog, 'dwn', 'ytg']) 298 | p = pd.MultiIndex.from_product(merged.index.levels, 299 | names=merged.index.names) 300 | merged = merged.reindex(p, fill_value=None).reset_index() 301 | merged = merged.rename(columns={'weighted_fdr': 'fdr'}) 302 | 303 | # Eliminate impossible combinations 304 | if yfog == 'yfog_bin': 305 | # Sparse situations, just set to p(success) = 0.1 306 | merged.loc[merged.ytg > 13, 'fdr'] = 0.10 307 | 308 | # Missing values inside -10 because no one goes for it here 
309 | merged.loc[(merged.fdr_x.isnull()) & (merged.ytg <= 3), 310 | 'fdr'] = .2 311 | merged.loc[(merged.fdr_x.isnull()) & (merged.ytg > 3), 312 | 'fdr'] = .1 313 | 314 | # Fill in missing values 315 | merged['fdr'] = merged['fdr'].interpolate() 316 | merged.to_csv('data/fd_open_field.csv', index=False) 317 | else: 318 | merged = merged.loc[(merged.yfog + merged.ytg <= 100)] 319 | merged.loc[(merged.yfog == 99) & (merged.ytg == 1), 'fdr'] = ( 320 | merged.loc[(merged.yfog == 99) & (merged.ytg == 1), 'fdr_x']) 321 | merged['fdr'] = merged['fdr'].interpolate() 322 | merged.to_csv('data/fd_inside_10.csv', index=False) 323 | return merged 324 | 325 | 326 | def join_df_first_down_rates(df, fd_open_field, fd_inside_10): 327 | """Join the computed first down rates with the play by play data.""" 328 | open_field = df.loc[df.yfog < 90].reset_index().copy() 329 | open_field['yfog_bin'] = open_field.yfog // 10 330 | open_field = open_field.merge( 331 | fd_open_field, on=['yfog_bin', 'dwn', 'ytg'], how='left') 332 | open_field = open_field.drop('yfog_bin', axis='columns') 333 | inside_10 = df.loc[df.yfog >= 90].reset_index().copy() 334 | inside_10 = inside_10.merge( 335 | fd_inside_10, on=['dwn', 'ytg', 'yfog'], how='left') 336 | new_df = pd.concat([open_field, inside_10]).set_index('pid').sort_index() 337 | return new_df 338 | 339 | 340 | def kneel_down(df): 341 | """Code a situation a 1 if the offense can kneel to end the game 342 | based on time remaining, defensive timeouts remaining, 343 | down, and score difference. 
344 | """ 345 | df['kneel_down'] = np.zeros(df.shape[0]) 346 | 347 | df.loc[(df.timd == 0) & (df.secs_left <= 120) & (df.dwn == 1) & 348 | (df.score_diff > 0), 'kneel_down'] = 1 349 | df.loc[(df.timd == 1) & (df.secs_left <= 84) & (df.dwn == 1) & 350 | (df.score_diff > 0), 'kneel_down'] = 1 351 | df.loc[(df.timd == 2) & (df.secs_left <= 48) & (df.dwn == 1) & 352 | (df.score_diff > 0), 'kneel_down'] = 1 353 | 354 | df.loc[(df.timd == 0) & (df.secs_left <= 84) & (df.dwn == 2) & 355 | (df.score_diff > 0), 'kneel_down'] = 1 356 | df.loc[(df.timd == 1) & (df.secs_left <= 45) & (df.dwn == 2) & 357 | (df.score_diff > 0), 'kneel_down'] = 1 358 | 359 | df.loc[(df.timd == 0) & (df.secs_left <= 42) & (df.dwn == 3) & 360 | (df.score_diff > 0), 'kneel_down'] = 1 361 | 362 | df.loc[(df.score_diff <= 0) | (df.dwn == 4), 'kneel_down'] = 0 363 | return df 364 | 365 | 366 | def calculate_prob_poss(drive_fname, out_name, games): 367 | """Determine the starting point for the final possession in each 368 | non-overtime, regular season game to use as a proxy for the probability 369 | that the team will have another possession in the game during the 370 | 4th quarter. 371 | 372 | Used to weight the win probabilities in the 4th quarter. 
373 | """ 374 | 375 | drives = pd.read_csv(drive_fname, index_col=1) 376 | drives = drives.merge(games[['seas', 'wk']], 377 | left_index=True, right_index=True) 378 | 379 | # Restrict to non-overtime games 380 | 381 | final_qtr = drives.reset_index().groupby('gid')[['qtr']].max() 382 | overtime_games = final_qtr[final_qtr.qtr > 4].index 383 | drives_reduced = drives[~drives.index.isin(overtime_games)].copy() 384 | 385 | # Starting time of final drive of game 386 | 387 | drives_reduced = drives_reduced.loc[drives_reduced.qtr == 4] 388 | final_drive = drives_reduced.reset_index().groupby('gid').last() 389 | final_drive['secs'] = (final_drive['min'] * 60) + final_drive.sec 390 | 391 | # Group and get summary statistics 392 | final_drives = (final_drive.groupby('secs')['fpid'] 393 | .agg({'n': len}) 394 | .reset_index()) 395 | 396 | final_drives['pct'] = final_drives.n / final_drives.n.sum() 397 | final_drives['cum_pct'] = final_drives.pct.cumsum() 398 | 399 | final_drives.to_csv(out_name) 400 | return final_drives 401 | 402 | @click.command() 403 | @click.argument('pbp_data_location') 404 | def main(pbp_data_location): 405 | pd.set_option('display.max_columns', 200) 406 | pd.set_option('display.max_colwidth', 200) 407 | pd.set_option('display.width', 200) 408 | 409 | if not os.path.exists('data'): 410 | click.echo('Making data directory.') 411 | os.mkdir('data') 412 | 413 | click.echo('Loading game data.') 414 | games = load_games('{}/GAME.csv'.format(pbp_data_location)) 415 | click.echo('Loading play by play data.') 416 | pbp = load_pbp('{}/PBP.csv'.format(pbp_data_location), 417 | games, remove_knees=False) 418 | 419 | click.echo('Joining game and play by play data.') 420 | joined = pbp.merge(games, left_on='gid', right_index=True) 421 | 422 | # Switch offensive and defensive stats on PUNT/KOFF 423 | click.echo('Munging data...') 424 | joined = switch_offense(joined) 425 | 426 | # Modify the spread so that the sign is negative when the offense 427 | # is favored and 
positive otherwise 428 | 429 | joined['spread'] = joined.sprv 430 | joined.loc[joined.off != joined.v, 'spread'] = ( 431 | -1.0 * joined.loc[joined.off != joined.v, 'spread']) 432 | 433 | # For model purposes, touchdowns are "first downs" (successful conversion) 434 | 435 | joined['first_down'] = (joined.fd.notnull()) | (joined.pts >= 6) 436 | 437 | # Add winners for classification task 438 | joined['win'] = (joined.off == joined.winner).astype(np.uint8) 439 | 440 | # Features needed for the win probability model 441 | joined['score_diff'] = joined.ptso - joined.ptsd 442 | joined['secs_left'] = (((4 - joined.qtr) * 15.0) * 60 + 443 | (joined['min'] * 60) + joined.sec) 444 | 445 | # Group all fourth downs that indicate if the team went for it or not 446 | # by down, yards to go, and yards from own goal 447 | 448 | click.echo('Processing fourth downs.') 449 | fourths = code_fourth_downs(joined) 450 | 451 | # Merge the goforit column back into all plays, not just fourth downs 452 | joined = joined.merge(fourths[['goforit']], left_index=True, 453 | right_index=True, how='left') 454 | 455 | click.echo('Grouping and saving historical 4th down decisions.') 456 | decisions = group_coaches_decisions(fourths) 457 | fourths_grouped = fourths.groupby(['dwn', 'ytg', 'yfog'])['goforit'].agg( 458 | {'N': len, 'mean': np.mean}) 459 | fourths_grouped.to_csv('data/fourths_grouped.csv', index=False) 460 | 461 | # Remove kickoffs and extra points, retain FGs 462 | joined = joined[(joined['type'] != 'KOFF') & (joined.fgxp != 'XP')] 463 | 464 | click.echo('Grouping and saving field goal attempts and punts.') 465 | fgs_grouped = fg_success_rate('{}/FGXP.csv'.format(pbp_data_location), 466 | 'data/fgs_grouped.csv') 467 | punt_dist = punt_averages('{}/PUNT.csv'.format(pbp_data_location), 468 | 'data/punts_grouped.csv', joined) 469 | 470 | # Code situations where the offense can take a knee(s) to win 471 | click.echo('Coding kneel downs.') 472 | joined = kneel_down(joined) 473 | 474 | 
click.echo('Computing first down rates.') 475 | 476 | # Only rush & pass plays that were actually executed are eligible 477 | # for computing first down success rates. 478 | 479 | df_plays = joined.loc[joined['type'].isin(['PASS', 'RUSH']), :].copy() 480 | fd_open_field = first_down_rates(df_plays, 'yfog_bin') 481 | fd_inside_10 = first_down_rates(df_plays, 'yfog') 482 | joined = join_df_first_down_rates(joined, fd_open_field, fd_inside_10) 483 | 484 | click.echo('Calculating final drive statistics.') 485 | final_drives = calculate_prob_poss( 486 | '{}/DRIVE.csv'.format(pbp_data_location), 487 | 'data/final_drives.csv', games) 488 | 489 | click.echo('Writing cleaned play-by-play data.') 490 | joined.to_csv('data/pbp_cleaned.csv') 491 | 492 | if __name__ == '__main__': 493 | main() 494 | -------------------------------------------------------------------------------- /model-fg/example.js: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env node 2 | var modelFG = require('../model-fg/model-fg'); 3 | 4 | var exampleData = { 5 | kicker_code: "AH-2600", 6 | temp: 40, // temperature (degrees F) 7 | wind: 10, // wind speed (mph) 8 | yfog: 67, // a 50-yard field goal 9 | chanceOfRain: 10, // percentage 10 | is_dome: 1, // binary indicator (1 == dome) 11 | is_turf: 1 // binary indicator (1 == turf) 12 | }; 13 | 14 | // if in a dome, bot ignores weather variables 15 | var prob1 = modelFG.calculateProb(exampleData); 16 | 17 | // second argument to alter specific parts of previously defined scenario 18 | // let's move this game to Denver... 
19 | var prob2 = modelFG.calculateProb(exampleData, { home: "DEN" }); 20 | // note: supplying 'home' key automatically populates 'is_dome' and 'is_turf' 21 | 22 | // supply 'offense' to have bot find the kicker code for you 23 | var prob3 = modelFG.calculateProb(exampleData, { 24 | temp: 10, 25 | wind: 20, 26 | chanceOfRain: 100, 27 | home: "GB", 28 | offense: "BAL" 29 | }) 30 | 31 | console.log([prob1, prob2, prob3]) -------------------------------------------------------------------------------- /model-fg/model-fg.js: -------------------------------------------------------------------------------- 1 | (function() { 2 | 3 | function init(_) { 4 | 5 | var modelFG = { 6 | 7 | calculateProb: function(d, situation) { 8 | var situation = situation || {}; 9 | var cloned = _.chain(d) 10 | .clone() 11 | .extend(situation) 12 | .value(); 13 | // override roof/surface variables if object has 'home' key 14 | if ( _.has(cloned, "home") ) cloned.is_dome = this.lookup[cloned.home].roofType !== "open"; 15 | if ( _.has(cloned, "home") ) cloned.is_turf = this.lookup[cloned.home].surfaceType === "turf"; 16 | // assign kicker_code if object has 'offense' key 17 | if ( _.has(cloned, "offense") ) cloned.kicker_code = this.lookup[cloned.offense].kickerCode; 18 | 19 | var kickerTerm = this.kickerAdjust[cloned.kicker_code] || 0; 20 | var par = this.terms.parametric; 21 | var smoothTerm = _.findWhere(this.terms.smooth, {yfog: Number(cloned.yfog)}).term; 22 | var weather = { 23 | temp: [0, 100, cloned.temp].sort()[1], 24 | wind: cloned.wind, 25 | chanceOfRain: Math.min(50, cloned.chanceOfRain) 26 | }; 27 | // put it all together 28 | var linearPredictor = kickerTerm + smoothTerm + 29 | par.isDomeTRUE * cloned.is_dome + 30 | par.isTurfTRUE * cloned.is_turf + 31 | par.sqrtGameTemp * ((1 - cloned.is_dome) * Math.sqrt(weather.temp)) + 32 | par.sqrtWindSpeed * ((1 - cloned.is_dome) * Math.sqrt(weather.wind)) + 33 | par.isRainingTRUE * ((1 - cloned.is_dome) * weather.chanceOfRain / 50) + 34 | 
par.highAltitudeTRUE * (cloned.home == "DEN"); 35 | // convert from log-odds to probability 36 | var prob = Math.exp(linearPredictor) / (1 + Math.exp(linearPredictor)) 37 | return Math.round(100000*prob) / 100000; // round to 5 decimal places 38 | }, 39 | 40 | // team/kicker lookup table // 41 | lookup: { 42 | "ARI": { 43 | "teamCode": "ARI", 44 | "city": "Arizona", 45 | "teamName": "Cardinals", 46 | "fullName": "Arizona Cardinals", 47 | "roofType": "retractable", 48 | "surfaceType": "grass", 49 | "kickerName": "Chandler Catanzaro", 50 | "kickerCode": "CC-1150" 51 | }, 52 | "ATL": { 53 | "teamCode": "ATL", 54 | "city": "Atlanta", 55 | "teamName": "Falcons", 56 | "fullName": "Atlanta Falcons", 57 | "roofType": "dome", 58 | "surfaceType": "turf", 59 | "kickerName": "Matt Bryant", 60 | "kickerCode": "MB-4600" 61 | }, 62 | "BAL": { 63 | "teamCode": "BAL", 64 | "city": "Baltimore", 65 | "teamName": "Ravens", 66 | "fullName": "Baltimore Ravens", 67 | "roofType": "open", 68 | "surfaceType": "turf", 69 | "kickerName": "Justin Tucker", 70 | "kickerCode": "JT-3950" 71 | }, 72 | "BUF": { 73 | "teamCode": "BUF", 74 | "city": "Buffalo", 75 | "teamName": "Bills", 76 | "fullName": "Buffalo Bills", 77 | "roofType": "open", 78 | "surfaceType": "turf", 79 | "kickerName": "Dan Carpenter", 80 | "kickerCode": "DC-0500" 81 | }, 82 | "CAR": { 83 | "teamCode": "CAR", 84 | "city": "Carolina", 85 | "teamName": "Panthers", 86 | "fullName": "Carolina Panthers", 87 | "roofType": "open", 88 | "surfaceType": "grass", 89 | "kickerName": "Graham Gano", 90 | "kickerCode": "GG-0100" 91 | }, 92 | "CHI": { 93 | "teamCode": "CHI", 94 | "city": "Chicago", 95 | "teamName": "Bears", 96 | "fullName": "Chicago Bears", 97 | "roofType": "open", 98 | "surfaceType": "grass", 99 | "kickerName": "Robbie Gould", 100 | "kickerCode": "RG-1500" 101 | }, 102 | "CIN": { 103 | "teamCode": "CIN", 104 | "city": "Cincinnati", 105 | "teamName": "Bengals", 106 | "fullName": "Cincinnati Bengals", 107 | "roofType": "open", 108 
| "surfaceType": "turf", 109 | "kickerName": "Mike Nugent", 110 | "kickerCode": "MN-0800" 111 | }, 112 | "CLE": { 113 | "teamCode": "CLE", 114 | "city": "Cleveland", 115 | "teamName": "Browns", 116 | "fullName": "Cleveland Browns", 117 | "roofType": "open", 118 | "surfaceType": "grass", 119 | "kickerName": "Travis Coons", 120 | "kickerCode": "TC-2450" 121 | }, 122 | "DAL": { 123 | "teamCode": "DAL", 124 | "city": "Dallas", 125 | "teamName": "Cowboys", 126 | "fullName": "Dallas Cowboys", 127 | "roofType": "retractable", 128 | "surfaceType": "turf", 129 | "kickerName": "Dan Bailey", 130 | "kickerCode": "DB-0200" 131 | }, 132 | "DEN": { 133 | "teamCode": "DEN", 134 | "city": "Denver", 135 | "teamName": "Broncos", 136 | "fullName": "Denver Broncos", 137 | "roofType": "open", 138 | "surfaceType": "grass", 139 | "kickerName": "Brandon McManus", 140 | "kickerCode": "BM-1650" 141 | }, 142 | "DET": { 143 | "teamCode": "DET", 144 | "city": "Detroit", 145 | "teamName": "Lions", 146 | "fullName": "Detroit Lions", 147 | "roofType": "dome", 148 | "surfaceType": "turf", 149 | "kickerName": "Matt Prater", 150 | "kickerCode": "MP-2100" 151 | }, 152 | "GB": { 153 | "teamCode": "GB", 154 | "city": "Green Bay", 155 | "teamName": "Packers", 156 | "fullName": "Green Bay Packers", 157 | "roofType": "open", 158 | "surfaceType": "grass", 159 | "kickerName": "Mason Crosby", 160 | "kickerCode": "MC-3000" 161 | }, 162 | "HOU": { 163 | "teamCode": "HOU", 164 | "city": "Houston", 165 | "teamName": "Texans", 166 | "fullName": "Houston Texans", 167 | "roofType": "retractable", 168 | "surfaceType": "grass", 169 | "kickerName": "Randy Bullock", 170 | "kickerCode": "RB-4650" 171 | }, 172 | "IND": { 173 | "teamCode": "IND", 174 | "city": "Indianapolis", 175 | "teamName": "Colts", 176 | "fullName": "Indianapolis Colts", 177 | "roofType": "retractable", 178 | "surfaceType": "turf", 179 | "kickerName": "Adam Vinatieri", 180 | "kickerCode": "AV-0400" 181 | }, 182 | "JAX": { 183 | "teamCode": "JAX", 184 | 
"city": "Jacksonville", 185 | "teamName": "Jaguars", 186 | "fullName": "Jacksonville Jaguars", 187 | "roofType": "open", 188 | "surfaceType": "grass", 189 | "kickerName": "Jason Myers", 190 | "kickerCode": "JM-7000" 191 | }, 192 | "KC": { 193 | "teamCode": "KC", 194 | "city": "Kansas City", 195 | "teamName": "Chiefs", 196 | "fullName": "Kansas City Chiefs", 197 | "roofType": "open", 198 | "surfaceType": "grass", 199 | "kickerName": "Cairo Santos", 200 | "kickerCode": "CS-0250" 201 | }, 202 | "MIA": { 203 | "teamCode": "MIA", 204 | "city": "Miami", 205 | "teamName": "Dolphins", 206 | "fullName": "Miami Dolphins", 207 | "roofType": "open", 208 | "surfaceType": "grass", 209 | "kickerName": "Andrew Franks", 210 | "kickerCode": "AF-1150" 211 | }, 212 | "MIN": { 213 | "teamCode": "MIN", 214 | "city": "Minnesota", 215 | "teamName": "Vikings", 216 | "fullName": "Minnesota Vikings", 217 | "roofType": "open", 218 | "surfaceType": "turf", 219 | "kickerName": "Blair Walsh", 220 | "kickerCode": "BW-0350" 221 | }, 222 | "NE": { 223 | "teamCode": "NE", 224 | "city": "New England", 225 | "teamName": "Patriots", 226 | "fullName": "New England Patriots", 227 | "roofType": "open", 228 | "surfaceType": "turf", 229 | "kickerName": "Stephen Gostkowski", 230 | "kickerCode": "SG-0800" 231 | }, 232 | "NO": { 233 | "teamCode": "NO", 234 | "city": "New Orleans", 235 | "teamName": "Saints", 236 | "fullName": "New Orleans Saints", 237 | "roofType": "dome", 238 | "surfaceType": "turf", 239 | "kickerName": "Zach Hocker", 240 | "kickerCode": "ZH-0150" 241 | }, 242 | "NYG": { 243 | "teamCode": "NYG", 244 | "city": "New York", 245 | "teamName": "Giants", 246 | "fullName": "New York Giants", 247 | "roofType": "open", 248 | "surfaceType": "turf", 249 | "kickerName": "Josh Brown", 250 | "kickerCode": "JB-7100" 251 | }, 252 | "NYJ": { 253 | "teamCode": "NYJ", 254 | "city": "New York", 255 | "teamName": "Jets", 256 | "fullName": "New York Jets", 257 | "roofType": "open", 258 | "surfaceType": "turf", 259 
| "kickerName": "Nick Folk", 260 | "kickerCode": "NF-0300" 261 | }, 262 | "OAK": { 263 | "teamCode": "OAK", 264 | "city": "Oakland", 265 | "teamName": "Raiders", 266 | "fullName": "Oakland Raiders", 267 | "roofType": "open", 268 | "surfaceType": "grass", 269 | "kickerName": "Sebastian Janikowski", 270 | "kickerCode": "SJ-0300" 271 | }, 272 | "PHI": { 273 | "teamCode": "PHI", 274 | "city": "Philadelphia", 275 | "teamName": "Eagles", 276 | "fullName": "Philadelphia Eagles", 277 | "roofType": "open", 278 | "surfaceType": "grass", 279 | "kickerName": "Cody Parkey", 280 | "kickerCode": "CP-0575" 281 | }, 282 | "PIT": { 283 | "teamCode": "PIT", 284 | "city": "Pittsburgh", 285 | "teamName": "Steelers", 286 | "fullName": "Pittsburgh Steelers", 287 | "roofType": "open", 288 | "surfaceType": "grass", 289 | "kickerName": "Josh Scobee", 290 | "kickerCode": "JS-1100" 291 | }, 292 | "SD": { 293 | "teamCode": "SD", 294 | "city": "San Diego", 295 | "teamName": "Chargers", 296 | "fullName": "San Diego Chargers", 297 | "roofType": "open", 298 | "surfaceType": "grass", 299 | "kickerName": "Josh Lambo", 300 | "kickerCode": "JL-0207" 301 | }, 302 | "SEA": { 303 | "teamCode": "SEA", 304 | "city": "Seattle", 305 | "teamName": "Seahawks", 306 | "fullName": "Seattle Seahawks", 307 | "roofType": "open", 308 | "surfaceType": "turf", 309 | "kickerName": "Steven Hauschka", 310 | "kickerCode": "SH-0400" 311 | }, 312 | "SF": { 313 | "teamCode": "SF", 314 | "city": "San Francisco", 315 | "teamName": "49ers", 316 | "fullName": "San Francisco 49ers", 317 | "roofType": "open", 318 | "surfaceType": "grass", 319 | "kickerName": "Phil Dawson", 320 | "kickerCode": "PD-0200" 321 | }, 322 | "STL": { 323 | "teamCode": "STL", 324 | "city": "St. Louis", 325 | "teamName": "Rams", 326 | "fullName": "St. 
Louis Rams", 327 | "roofType": "dome", 328 | "surfaceType": "turf", 329 | "kickerName": "Greg Zuerlein", 330 | "kickerCode": "GZ-2000" 331 | }, 332 | "TB": { 333 | "teamCode": "TB", 334 | "city": "Tampa Bay", 335 | "teamName": "Buccaneers", 336 | "fullName": "Tampa Bay Buccaneers", 337 | "roofType": "open", 338 | "surfaceType": "grass", 339 | "kickerName": "Kyle Brindza", 340 | "kickerCode": "KB-1850" 341 | }, 342 | "TEN": { 343 | "teamCode": "TEN", 344 | "city": "Tennessee", 345 | "teamName": "Titans", 346 | "fullName": "Tennessee Titans", 347 | "roofType": "open", 348 | "surfaceType": "grass", 349 | "kickerName": "Ryan Succop", 350 | "kickerCode": "RS-3400" 351 | }, 352 | "WAS": { 353 | "teamCode": "WAS", 354 | "city": "Washington", 355 | "teamName": "Redskins", 356 | "fullName": "Washington Redskins", 357 | "roofType": "open", 358 | "surfaceType": "grass", 359 | "kickerName": "Dustin Hopkins", 360 | "kickerCode": "DH-3970" 361 | } 362 | }, 363 | 364 | // model coefficients // 365 | kickerAdjust: 
{"AE-0700":-0.1951,"AH-2600":-0.0836,"AP-1000":-0.2041,"AV-0400":0.2272,"BC-2300":0.0041,"BC-2600":-0,"BC-3000":-0.3186,"BG-1300":-0.1757,"BM-1650":-0.301,"BW-0350":0.1633,"CB-0700":0.2317,"CC-1150":0.0342,"CH-2900":0.0484,"CP-0575":0.0436,"CS-0250":-0.0485,"CS-4000":-0.0167,"CS-4250":-0.2075,"DA-0300":0.0659,"DB-0200":0.3344,"DB-3500":0.0167,"DB-3900":0.0193,"DB-5500":-0.0601,"DC-0500":0.2197,"DR-0600":-0.2617,"GA-0300":0.0973,"GG-0100":-0.1554,"GH-0600":-0.1825,"GZ-2000":0.1333,"HE-0100":-0.2257,"JB-7100":0.1269,"JC-0900":0.042,"JC-2100":-0.1896,"JC-5000":-0.3189,"JE-0200":0.0056,"JF-0900":0.1023,"JH-0500":-0.0306,"JH-0900":0.335,"JH-3800":-0.1195,"JK-0200":0.2664,"JM-4000":-0.2487,"JN-0600":0.3579,"JP-3850":0.0272,"JR-1100":0.0911,"JS-1100":0.1124,"JT-1000":-0.0011,"JT-3950":0.4197,"JT-4400":-0.2207,"JW-3300":0.2515,"KB-2300":0.0114,"KF-0250":0.1381,"LT-1100":-0.1541,"MA-0700":0.1338,"MB-4600":0.1103,"MC-3000":0.0307,"MG-1200":-0.2808,"MH-3100":0.1039,"MH-3900":-0.0013,"MK-1100":-0.088,"MK-1200":-0.5071,"MN-0800":-0.2053,"MP-2100":-0.1346,"MS-0600":0.0385,"MS-5200":0.1906,"MV-0100":0.1394,"NF-0300":-0.1343,"NF-0400":-0.3396,"NK-0200":0.2193,"NN-0200":-0.09,"NR-0100":0.3045,"OK-0100":-0.2002,"OM-0100":-0.148,"OP-0200":-0.3293,"PD-0200":0.3593,"PE-0100":-0.0503,"PM-1300":0.0802,"RB-2200":0.394,"RB-4650":-0.0603,"RC-2900":0.0278,"RG-1500":0.3412,"RL-0900":-0.0026,"RL-1300":0.2714,"RS-0600":-0.0788,"RS-3400":-0.0567,"SA-1100":-0.2379,"SC-0700":-0.1198,"SG-0800":0.0828,"SG-1100":0.2684,"SH-0400":0.0414,"SJ-0300":0.4307,"SM-0600":-0.4275,"SS-3100":0.0324,"TD-2400":-0.2218,"TF-1200":0.0302,"TM-2400":-0.0693,"TP-1200":0.0444,"TS-1100":-0.1911,"WR-0500":0.0342,"WW-0200":0.0585,"none":0}, 366 | terms: 
{"smooth":[{"yfog":100,"term":4.249286},{"yfog":99,"term":4.037548},{"yfog":98,"term":3.826787},{"yfog":97,"term":3.618663},{"yfog":96,"term":3.414203},{"yfog":95,"term":3.214431},{"yfog":94,"term":3.020376},{"yfog":93,"term":2.833062},{"yfog":92,"term":2.653517},{"yfog":91,"term":2.482766},{"yfog":90,"term":2.321668},{"yfog":89,"term":2.17005},{"yfog":88,"term":2.027343},{"yfog":87,"term":1.892978},{"yfog":86,"term":1.766389},{"yfog":85,"term":1.647005},{"yfog":84,"term":1.534258},{"yfog":83,"term":1.42758},{"yfog":82,"term":1.326393},{"yfog":81,"term":1.230019},{"yfog":80,"term":1.137725},{"yfog":79,"term":1.048778},{"yfog":78,"term":0.962444},{"yfog":77,"term":0.87799},{"yfog":76,"term":0.794683},{"yfog":75,"term":0.711788},{"yfog":74,"term":0.628527},{"yfog":73,"term":0.543361},{"yfog":72,"term":0.454118},{"yfog":71,"term":0.358609},{"yfog":70,"term":0.254644},{"yfog":69,"term":0.140034},{"yfog":68,"term":0.012588},{"yfog":67,"term":-0.129883},{"yfog":66,"term":-0.289593},{"yfog":65,"term":-0.469523},{"yfog":64,"term":-0.673581},{"yfog":63,"term":-0.905733},{"yfog":62,"term":-1.16994},{"yfog":61,"term":-1.470168},{"yfog":60,"term":-1.81038},{"yfog":59,"term":-2.19454},{"yfog":58,"term":-2.626573},{"yfog":57,"term":-3.107103},{"yfog":56,"term":-3.630938},{"yfog":55,"term":-4.19229},{"yfog":54,"term":-4.785376},{"yfog":53,"term":-5.404408},{"yfog":52,"term":-6.043601},{"yfog":51,"term":-6.697169},{"yfog":50,"term":-7.359331},{"yfog":49,"term":-8.02562},{"yfog":48,"term":-8.695036},{"yfog":47,"term":-9.367138},{"yfog":46,"term":-10.041483},{"yfog":45,"term":-10.717632},{"yfog":44,"term":-11.395143},{"yfog":43,"term":-12.073575},{"yfog":42,"term":-12.752487},{"yfog":41,"term":-13.431492},{"yfog":40,"term":-14.110496},{"yfog":39,"term":-14.789501},{"yfog":38,"term":-15.468505},{"yfog":37,"term":-16.14751},{"yfog":36,"term":-16.826515},{"yfog":35,"term":-17.505519},{"yfog":34,"term":-18.184524},{"yfog":33,"term":-18.863529},{"yfog":32,"term":-19.542533},{"yfog":31,"te
rm":-20.221538},{"yfog":30,"term":-20.900542},{"yfog":29,"term":-21.579547},{"yfog":28,"term":-22.258552},{"yfog":27,"term":-22.937556},{"yfog":26,"term":-23.616561},{"yfog":25,"term":-24.295566},{"yfog":24,"term":-24.97457},{"yfog":23,"term":-25.653575},{"yfog":22,"term":-26.332579},{"yfog":21,"term":-27.011584},{"yfog":20,"term":-27.690589},{"yfog":19,"term":-28.369593},{"yfog":18,"term":-29.048598},{"yfog":17,"term":-29.727603},{"yfog":16,"term":-30.406607},{"yfog":15,"term":-31.085612},{"yfog":14,"term":-31.764616},{"yfog":13,"term":-32.443621},{"yfog":12,"term":-33.122626},{"yfog":11,"term":-33.80163},{"yfog":10,"term":-34.480635},{"yfog":9,"term":-35.15964},{"yfog":8,"term":-35.838644},{"yfog":7,"term":-36.517649},{"yfog":6,"term":-37.196653},{"yfog":5,"term":-37.875658},{"yfog":4,"term":-38.554663},{"yfog":3,"term":-39.233667},{"yfog":2,"term":-39.912672},{"yfog":1,"term":-40.591677},{"yfog":0,"term":-41.270681}],"parametric":{"sqrtGameTemp":0.135653,"sqrtWindSpeed":-0.127033,"isDomeTRUE":0.727479,"isTurfTRUE":0.284092,"highAltitudeTRUE":0.680362,"isRainingTRUE":-0.285889}} 367 | 368 | }; 369 | 370 | return modelFG; 371 | 372 | } 373 | 374 | // if called from command line or python, write probability to stdout 375 | if (!module.parent) { 376 | var argv = require('minimist')(process.argv.slice(2)); 377 | var fgMakeProb = init(require('underscore')).calculateProb(argv); 378 | console.log("prob of making FG: ") 379 | process.stdout.write(fgMakeProb.toString()) 380 | console.log("") 381 | } 382 | 383 | if (typeof define === "function" && define.amd) define(['underscore'], init); 384 | else if (typeof module === "object" && module.exports) { 385 | module.exports = init(require('underscore')); 386 | } 387 | 388 | })(); -------------------------------------------------------------------------------- /model_train.py: -------------------------------------------------------------------------------- 1 | from __future__ import division, print_function 2 | 3 | import os 
4 | 5 | import click 6 | import matplotlib.pyplot as plt 7 | import numpy as np 8 | import pandas as pd 9 | 10 | from sklearn.cross_validation import train_test_split 11 | from sklearn.externals import joblib 12 | from sklearn.linear_model import LogisticRegression 13 | from sklearn.metrics import (auc, classification_report, 14 | f1_score, log_loss, roc_curve) 15 | from sklearn.preprocessing import StandardScaler 16 | 17 | 18 | def calibration_plot(preds, truth): 19 | """Produces a calibration plot for the win probability model. 20 | 21 | Splits the predictions into percentiles and calculates the 22 | percentage of predictions per percentile that were wins. A perfectly 23 | calibrated model means that plays with a win probability of n% 24 | win about n% of the time. 25 | """ 26 | cal_df = pd.DataFrame({'pred': preds, 'win': truth}) 27 | cal_df['pred_bin'] = pd.cut(cal_df.pred, 100, labels=False) 28 | 29 | win_means = cal_df.groupby('pred_bin')['win'].mean() 30 | 31 | plt.figure() 32 | plt.plot(win_means.index.values, 33 | [100 * v for v in win_means.values], color='SteelBlue') 34 | plt.plot(np.arange(0, 100), np.arange(0, 100), 'k--', alpha=0.3) 35 | plt.xlim([0.0, 100]) 36 | plt.ylim([0.0, 100]) 37 | plt.xlabel('Estimated win probability') 38 | plt.ylabel('True win percentage') 39 | plt.title('Win probability calibration, binned by percent') 40 | plt.show() 41 | 42 | return 43 | 44 | 45 | def plot_roc(fpr, tpr, roc_auc): 46 | """Plots the ROC curve for the win probability model along with 47 | the AUC. 
48 | """ 49 | fig, ax = plt.subplots() 50 | ax.set(title='Receiver Operating Characteristic', 51 | xlim=[0, 1], ylim=[0, 1], xlabel='False Positive Rate', 52 | ylabel='True Positive Rate') 53 | ax.plot(fpr, tpr, 'b', label='AUC = %0.2f' % roc_auc) 54 | ax.plot([0, 1], [0, 1], 'k--') 55 | ax.legend(loc='lower right') 56 | plt.show() 57 | 58 | 59 | @click.command() 60 | @click.option('--plot/--no-plot', default=False) 61 | def main(plot): 62 | pd.set_option('display.max_columns', 200) 63 | 64 | # Only train on actual plays, remove 2pt conversion attempts 65 | click.echo('Reading play by play data.') 66 | df = pd.read_csv('data/pbp_cleaned.csv', index_col=0) 67 | df_plays = df.loc[(df['type'] != 'CONV')].copy() 68 | 69 | # Custom features 70 | # Interaction between qtr & score difference -- late score differences 71 | # are more important than early ones. 72 | df_plays['qtr_scorediff'] = df_plays.qtr * df_plays.score_diff 73 | 74 | # Decay effect of spread over course of game 75 | df_plays['spread'] = df_plays.spread * (df_plays.secs_left / 3600) 76 | 77 | # Features to use in the model 78 | features = ['dwn', 'yfog', 'secs_left', 79 | 'score_diff', 'timo', 'timd', 'spread', 80 | 'kneel_down', 'qtr', 81 | 'qtr_scorediff'] 82 | target = 'win' 83 | 84 | click.echo('Splitting data into train/test sets.') 85 | (train_X, test_X, train_y, test_y) = train_test_split(df_plays[features], 86 | df_plays[target], 87 | test_size=0.1) 88 | 89 | click.echo('Scaling features.') 90 | scaler = StandardScaler() 91 | scaler.fit(train_X) 92 | train_X_scaled = scaler.transform(train_X) 93 | 94 | click.echo('Training model.') 95 | logit = LogisticRegression() 96 | logit.fit(train_X_scaled, train_y) 97 | 98 | click.echo('Making predictions on test set.') 99 | test_X_scaled = scaler.transform(test_X) 100 | preds = logit.predict_proba(test_X_scaled)[:, 1] 101 | 102 | click.echo('Evaluating model performance.') 103 | fpr, tpr, thresholds = roc_curve(test_y, preds) 104 | roc_auc = auc(fpr, tpr) 
105 | click.echo('AUC: {}'.format(roc_auc)) 106 | click.echo('Log loss: {}'.format(log_loss(test_y, preds))) 107 | 108 | pred_outcomes = logit.predict(test_X_scaled) 109 | click.echo(classification_report(test_y, pred_outcomes)) 110 | click.echo('F1 score: {}'.format(f1_score(test_y, pred_outcomes))) 111 | 112 | if plot: 113 | click.echo('Plotting ROC curve and calibration plot.') 114 | click.echo('Note: plots may appear behind current active window.') 115 | plot_roc(fpr, tpr, roc_auc) 116 | calibration_plot(preds, test_y) 117 | 118 | click.echo('Pickling model and scaler.') 119 | if not os.path.exists('models'): 120 | os.mkdir('models') 121 | 122 | joblib.dump(logit, 'models/win_probability.pkl') 123 | joblib.dump(scaler, 'models/scaler.pkl') 124 | 125 | if __name__ == '__main__': 126 | main() 127 | -------------------------------------------------------------------------------- /package.json: -------------------------------------------------------------------------------- 1 | { 2 | "name": "4thdownbot-model", 3 | "version": "0.0.1", 4 | "description": "Analyzes fourth down decisions. 
With math.", 5 | "homepage": "http://nyt4thdownbot.com/", 6 | "license": "MIT", 7 | "repository": { 8 | "type": "git", 9 | "url": "http://github.com/TheUpshot/4thdownbot-model/" 10 | }, 11 | "devDependencies": { 12 | "minimist": "~1.2.0", 13 | "underscore": "~1.8.0" 14 | } 15 | } 16 | -------------------------------------------------------------------------------- /plays.py: -------------------------------------------------------------------------------- 1 | from __future__ import division, print_function 2 | 3 | from collections import OrderedDict 4 | 5 | import numpy as np 6 | 7 | 8 | def kneel_down(score_diff, timd, secs_left, dwn): 9 | """Return 1 if the offense can definitely kneel out the game, 10 | else return 0.""" 11 | 12 | if score_diff <= 0 or dwn == 4: 13 | return 0 14 | 15 | if timd == 0 and secs_left <= 120 and dwn == 1: 16 | return 1 17 | if timd == 1 and secs_left <= 87 and dwn == 1: 18 | return 1 19 | if timd == 2 and secs_left <= 48 and dwn == 1: 20 | return 1 21 | 22 | if timd == 0 and secs_left <= 84 and dwn == 2: 23 | return 1 24 | if timd == 1 and secs_left <= 45 and dwn == 2: 25 | return 1 26 | 27 | if timd == 0 and secs_left <= 42 and dwn == 3: 28 | return 1 29 | 30 | return 0 31 | 32 | 33 | def change_poss(situation, play_type, features, **kwargs): 34 | """Handles situation updating for all plays that involve 35 | a change of possession, including punts, field goals, 36 | missed field goals, touchdowns, turnover on downs. 37 | 38 | Parameters 39 | ---------- 40 | situation : OrderedDict 41 | play_type : function 42 | features : list[str] 43 | 44 | Returns 45 | ------- 46 | new_situation : OrderedDict 47 | """ 48 | 49 | new_situation = OrderedDict.fromkeys(features) 50 | 51 | # Nearly all changes of possession result in a 1st & 10 52 | # Doesn't cover the edge case of a turnover within own 10 yardline. 
53 | new_situation['dwn'] = 1 54 | new_situation['ytg'] = 10 55 | 56 | # Assumes 10 seconds of game clock have elapsed per play 57 | # Could tune this. 58 | new_situation['secs_left'] = max([situation['secs_left'] - 10, 0]) 59 | new_situation['qtr'] = qtr(new_situation['secs_left']) 60 | 61 | # Assign timeouts to the correct teams 62 | new_situation['timo'], new_situation['timd'] = ( 63 | situation['timd'], situation['timo']) 64 | 65 | # Valid types are turnover_downs, punt, field_goal, 66 | # missed_field_goal, touchdown 67 | 68 | # Any score changes are handled here 69 | new_situation = play_type(situation, new_situation, **kwargs) 70 | 71 | # Change sign on spread, recompute over-under 72 | new_situation['spread'] = -1 * situation['spread'] + 0 73 | 74 | # Avoid negative zeros 75 | if new_situation.get('score_diff') is None: 76 | new_situation['score_diff'] = int(-1 * situation['score_diff']) 77 | else: 78 | new_situation['score_diff'] = int(-1 * new_situation['score_diff']) 79 | 80 | new_situation['kneel_down'] = kneel_down(new_situation['score_diff'], 81 | new_situation['timd'], 82 | new_situation['secs_left'], 83 | new_situation['dwn']) 84 | 85 | new_situation['qtr_scorediff'] = ( 86 | new_situation['qtr'] * new_situation['score_diff']) 87 | 88 | return new_situation 89 | 90 | 91 | def field_goal(situation, new_situation, **kwargs): 92 | new_situation['score_diff'] = situation['score_diff'] + 3 93 | 94 | # Assume the starting field position will be own 25, accounts 95 | # for touchbacks and some run backs. 
96 | 97 | new_situation['yfog'] = 25 98 | 99 | return new_situation 100 | 101 | 102 | def missed_field_goal(situation, new_situation, **kwargs): 103 | """Opponent takes over from the spot of the kick.""" 104 | new_situation['yfog'] = 100 - (situation['yfog'] - 8) 105 | return new_situation 106 | 107 | 108 | def touchdown(situation, new_situation, **kwargs): 109 | """Assumes successful XP and no 2PC -- revisit this for 2015?""" 110 | new_situation['score_diff'] = situation['score_diff'] + 7 111 | new_situation['yfog'] = 25 112 | return new_situation 113 | 114 | 115 | def turnover_downs(situation, new_situation, **kwargs): 116 | new_situation['yfog'] = 100 - situation['yfog'] 117 | return new_situation 118 | 119 | 120 | def punt(situation, new_situation, **kwargs): 121 | """Use the average net punt distance (punt distance - return yards). 122 | 123 | Not all situations have historical data, especially very 124 | close to opponent's end zone. Use a net punt distance of 125 | 5 yards here. 126 | """ 127 | 128 | default_punt = 5 129 | 130 | try: 131 | pnet = kwargs['data'].loc[kwargs['data'].yfog == situation['yfog'], 132 | 'pnet'].values[0] 133 | 134 | except IndexError: 135 | pnet = default_punt 136 | 137 | new_yfog = np.floor(100 - (situation['yfog'] + pnet)) 138 | 139 | # Touchback 140 | new_situation['yfog'] = new_yfog if new_yfog > 0 else 25 141 | 142 | return new_situation 143 | 144 | 145 | def first_down(situation): 146 | new_situation = OrderedDict() 147 | new_situation['dwn'] = 1 148 | 149 | yfog = situation['yfog'] + situation['ytg'] 150 | new_situation['ytg'] = min([10, yfog]) 151 | new_situation['yfog'] = yfog 152 | 153 | # 10 seconds of clock time elapsed, or game over. 
154 | new_situation['secs_left'] = max([situation['secs_left'] - 10, 0]) 155 | 156 | # These values don't change 157 | new_situation['score_diff'] = situation['score_diff'] 158 | new_situation['timo'], new_situation['timd'] = ( 159 | situation['timo'], situation['timd']) 160 | new_situation['spread'] = situation['spread'] 161 | 162 | new_situation['kneel_down'] = kneel_down(new_situation['score_diff'], 163 | new_situation['timd'], 164 | new_situation['secs_left'], 165 | new_situation['dwn']) 166 | 167 | new_situation['qtr'] = qtr(new_situation['secs_left']) 168 | new_situation['qtr_scorediff'] = ( 169 | new_situation['qtr'] * new_situation['score_diff']) 170 | 171 | return new_situation 172 | 173 | 174 | def qtr(secs_left): 175 | if secs_left <= 900: 176 | return 4 177 | if secs_left <= 1800: 178 | return 3 179 | if secs_left <= 2700: 180 | return 2 181 | return 1 182 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | click==5.1 2 | matplotlib==1.4.3 3 | numpy==1.9.3 4 | pandas==0.16.2 5 | scikit-learn==0.16.1 6 | scipy==0.16.0 7 | wheel==0.24.0 8 | -------------------------------------------------------------------------------- /winprob.py: -------------------------------------------------------------------------------- 1 | from __future__ import division, print_function 2 | 3 | import logging 4 | import random 5 | import sys 6 | 7 | from collections import OrderedDict 8 | 9 | import plays as p 10 | 11 | 12 | logging.basicConfig(stream=sys.stderr) 13 | 14 | 15 | def generate_response(situation, data, model): 16 | """Parent function called by the bot to make decisions on 4th downs. 
17 | 18 | Parameters 19 | ---------- 20 | situation : OrderedDict 21 | data : dict, contains historical data 22 | model : LogisticRegression 23 | 24 | Returns 25 | ------- 26 | payload : dict 27 | """ 28 | 29 | situation = calculate_features(situation, data) 30 | 31 | # Generate the game state for each possible outcome 32 | scenarios = simulate_scenarios(situation, data) 33 | 34 | # Calculate the win probability for each scenario 35 | probs = generate_win_probabilities(situation, scenarios, model, data) 36 | 37 | # Calculate breakeven points and choose the optimal play 38 | decision, probs = generate_decision(situation, data, probs) 39 | 40 | payload = {'decision': decision, 'probs': probs, 'situation': situation} 41 | 42 | return payload 43 | 44 | 45 | def calculate_features(situation, data): 46 | """Generate features needed for the win probability model that are 47 | not contained in the general game state information passed via API. 48 | 49 | Parameters 50 | ---------- 51 | situation : OrderedDict 52 | 53 | Returns 54 | ------- 55 | situation : The same OrderedDict, with new keys and values. 
56 | """ 57 | 58 | situation['kneel_down'] = p.kneel_down(situation['score_diff'], 59 | situation['timd'], 60 | situation['secs_left'], 61 | situation['dwn']) 62 | 63 | situation['qtr'] = qtr(situation['secs_left']) 64 | situation['qtr_scorediff'] = situation['qtr'] * situation['score_diff'] 65 | 66 | situation['spread'] = ( 67 | situation['spread'] * (situation['secs_left'] / 3600)) 68 | 69 | cum_pct = ( 70 | (situation['secs_left'] - data['final_drives'].secs).abs().argmin()) 71 | 72 | situation['poss_prob'] = data['final_drives'].ix[cum_pct].cum_pct 73 | 74 | return situation 75 | 76 | 77 | def qtr(secs_left): 78 | """Given the seconds left in the game, determine the current quarter.""" 79 | if secs_left <= 900: 80 | return 4 81 | if secs_left <= 1800: 82 | return 3 83 | if secs_left <= 2700: 84 | return 2 85 | return 1 86 | 87 | 88 | def simulate_scenarios(situation, data): 89 | """Simulate game state after each possible outcome. 90 | 91 | Possible scenarios are: touchdown, first down, turnover on downs, 92 | field goal attempt (success or failure), and punt. 93 | """ 94 | 95 | features = data['features'] 96 | scenarios = dict() 97 | 98 | # If it's 4th & goal, success is a touchdown, otherwise a 1st down. 
99 | 100 | if situation['ytg'] + situation['yfog'] >= 100: 101 | scenarios['touchdown'] = p.change_poss(situation, p.touchdown, features) 102 | else: 103 | scenarios['first_down'] = p.first_down(situation) 104 | 105 | scenarios['fail'] = p.change_poss(situation, p.turnover_downs, features) 106 | 107 | scenarios['punt'] = p.change_poss(situation, p.punt, features, 108 | data=data['punts']) 109 | 110 | scenarios['fg'] = p.change_poss(situation, p.field_goal, features) 111 | scenarios['missed_fg'] = p.change_poss(situation, p.missed_field_goal, 112 | features) 113 | 114 | return scenarios 115 | 116 | 117 | def generate_win_probabilities(situation, scenarios, model, data, **kwargs): 118 | """For each of the possible scenarios, estimate the win probability 119 | for that game state.""" 120 | 121 | probs = dict.fromkeys([k + '_wp' for k in scenarios.keys()]) 122 | 123 | features = data['features'] 124 | 125 | # Pre-play win probability calculation 126 | # Note there is more information in situation than just model features. 127 | 128 | feature_vec = [val for key, val in situation.items() if key in features] 129 | feature_vec = data['scaler'].transform(feature_vec) 130 | 131 | probs['pre_play_wp'] = model.predict_proba(feature_vec)[0][1] 132 | 133 | for scenario, outcome in scenarios.items(): 134 | feature_vec = [val for key, val in outcome.items() if key in features] 135 | feature_vec = data['scaler'].transform(feature_vec) 136 | pred_prob = model.predict_proba(feature_vec)[0][1] 137 | 138 | # Change of possessions require 1 - WP 139 | if scenario in ('fg', 'fail', 'punt', 'missed_fg', 'touchdown'): 140 | pred_prob = 1 - pred_prob 141 | 142 | probs[str(scenario + '_wp')] = pred_prob 143 | 144 | # Account for situations in which an opponent's field goal can end 145 | # the game, driving win probability down to 0. 
146 | 147 | if (situation['secs_left'] < 40 and (0 <= situation['score_diff'] <= 2) 148 | and situation['timo'] == 0): 149 | # Estimate the opponent's probability of a successful field goal 150 | # and scale the win probability of failing to convert a 4th down 151 | # by the probability that the kick misses. 152 | 153 | if situation['dome'] > 0: 154 | prob_opp_fg = (data['fgs'].loc[ 155 | data['fgs'].yfog == scenarios['fail']['yfog'], 156 | 'dome_rate'].values[0]) 157 | else: 158 | prob_opp_fg = (data['fgs'].loc[ 159 | data['fgs'].yfog == scenarios['fail']['yfog'], 160 | 'open_rate'].values[0]) 161 | 162 | probs['fail_wp'] = ((1 - prob_opp_fg) * probs['fail_wp']) 163 | 164 | # Teams may not get the ball back during the 4th quarter 165 | 166 | if situation['qtr'] == 4: 167 | probs['fail_wp'] = probs['fail_wp'] * situation['poss_prob'] 168 | probs['punt_wp'] = probs['punt_wp'] * situation['poss_prob'] 169 | 170 | # Always have a 'success_wp' field, regardless of TD or 1st down 171 | 172 | if 'touchdown_wp' in probs: 173 | probs['success_wp'] = probs['touchdown_wp'] 174 | else: 175 | probs['success_wp'] = probs['first_down_wp'] 176 | return probs 177 | 178 | 179 | def generate_decision(situation, data, probs, **kwargs): 180 | """Decide on optimal play based on game states and their associated 181 | win probabilities. Note that currently the 'best play' is based purely 182 | on the outcome with the highest expected win probability. This 183 | does not account for uncertainty of these estimates. 184 | 185 | For example, the win probability added by a certain play may be 186 | very small (0.0001), but that may be the 'best play.' 
187 | """ 188 | 189 | decision = {} 190 | 191 | decision['prob_success'] = calc_prob_success(situation, data) 192 | 193 | # Expected value of win probability of going for it 194 | wp_ev_goforit = expected_win_prob(decision['prob_success'], 195 | probs['success_wp'], 196 | probs['fail_wp']) 197 | probs['wp_ev_goforit'] = wp_ev_goforit 198 | 199 | # Expected value of kick factors in probability of FG 200 | probs['prob_success_fg'], probs['fg_ev_wp'] = expected_wp_fg( 201 | situation, probs, data) 202 | 203 | # If the offense can end the game with a field goal, set the 204 | # expected win probability for a field goal attempt to the 205 | # probability of a successful field goal kick. 206 | 207 | if (situation['secs_left'] < 40 and (-2 <= situation['score_diff'] <= 0) 208 | and situation['timd'] == 0): 209 | probs['fg_wp'] = probs['prob_success_fg'] 210 | probs['fg_ev_wp'] = probs['prob_success_fg'] 211 | 212 | # If down by more than a field goal in the 4th quarter, need to 213 | # incorporate the probability that you will get the ball back. 214 | 215 | if situation['qtr'] == 4 and situation['score_diff'] < -3: 216 | probs['fg_ev_wp'] = probs['fg_ev_wp'] * situation['poss_prob'] 217 | 218 | # Breakeven success probabilities 219 | decision['breakeven_punt'], decision['breakeven_fg'] = breakeven(probs) 220 | 221 | # Of the kicking options, pick the one with the highest E(WP) 222 | decision['kicking_option'], decision['wpa_going_for_it'] = ( 223 | best_kicking_option(probs, wp_ev_goforit)) 224 | 225 | # Make the final call on kick / punt / go for it 226 | # If a win is unlikely in any circumstance, favor going for it. 
227 | 228 | # if probs['pre_play_wp'] < .05: 229 | # decision['best_play'] = 'go for it' 230 | # else: 231 | decision['best_play'] = decide_best_play(decision) 232 | 233 | # Only provide historical data outside of two-minute warning 234 | decision = get_historical_decision(situation, data, decision) 235 | 236 | return decision, probs 237 | 238 | 239 | def get_historical_decision(situation, data, decision): 240 | """Compare current game situation to historically similar situations. 241 | 242 | Currently uses score difference and field position to provide 243 | rough guides to what coaches have done in the past. 244 | """ 245 | 246 | historical_data = data['decisions'] 247 | 248 | down_by_td = situation['score_diff'] <= -4 249 | up_by_td = situation['score_diff'] >= 4 250 | yfog_bin = situation['yfog'] // 20 251 | short_tg = int(situation['ytg'] <= 3) 252 | med_tg = int((situation['ytg'] >= 4) and (situation['ytg'] <= 7)) 253 | long_tg = int(situation['ytg'] > 7) 254 | 255 | history = historical_data.loc[(historical_data.down_by_td == down_by_td) & 256 | (historical_data.up_by_td == up_by_td) & 257 | (historical_data.yfog_bin == yfog_bin) & 258 | (historical_data.short == short_tg) & 259 | (historical_data.med == med_tg) & 260 | (historical_data['long'] == long_tg)] 261 | 262 | # Check whether the filter matched any similar situations 263 | if history.shape[0] == 0: 264 | decision['historical_goforit_pct'] = 'None' 265 | decision['historical_punt_pct'] = 'None' 266 | decision['historical_kick_pct'] = 'None' 267 | decision['historical_N'] = 'None' 268 | else: 269 | decision['historical_punt_pct'] = (history.proportion_punted.values[0]) 270 | decision['historical_kick_pct'] = (history.proportion_kicked.values[0]) 271 | decision['historical_goforit_pct'] = (history.proportion_went.values[0]) 272 | decision['historical_goforit_N'] = (history.sample_size.values[0]) 273 | return decision 274 | 275 | 276 | def expected_win_prob(pos_prob, pos_win_prob, neg_win_prob): 277 | """Expected 
value of win probability, factoring in p(success).""" 278 | return (pos_prob * pos_win_prob) + ((1 - pos_prob) * neg_win_prob) 279 | 280 | 281 | def expected_wp_fg(situation, probs, data): 282 | """Expected WP from kicking, factoring in p(FG made).""" 283 | if 'fg_make_prob' in situation and isinstance(situation['fg_make_prob'], float): 284 | pos = situation['fg_make_prob'] 285 | else: 286 | fgs = data['fgs'] 287 | 288 | # Set the probability of success of implausibly long kicks to 0. 289 | if situation['yfog'] < 42: 290 | pos = 0 291 | else: 292 | # Account for indoor vs. outdoor kicking 293 | if situation['dome'] > 0: 294 | pos = fgs.loc[fgs.yfog == situation['yfog'], 'dome_rate'].values[0] 295 | else: 296 | pos = fgs.loc[fgs.yfog == situation['yfog'], 'open_rate'].values[0] 297 | 298 | return pos, expected_win_prob(pos, probs['fg_wp'], probs['missed_fg_wp']) 299 | return pos, expected_win_prob(pos, probs['fg_wp'], probs['missed_fg_wp']) 300 | 301 | 302 | def breakeven(probs): 303 | """Calculates the breakeven point for making the decision. 304 | 305 | The breakeven is the point at which a coach should be indifferent 306 | between two options. We compare the expected win probability 307 | of going for it on 4th down to the next best kicking option 308 | and determine what the probability of converting the 4th down 309 | needs to be in order to make the coach indifferent to going for it 310 | or kicking. 311 | """ 312 | 313 | denom = probs['success_wp'] - probs['fail_wp'] 314 | 315 | breakeven_punt = (probs['punt_wp'] - probs['fail_wp']) / denom 316 | breakeven_fg = (probs['fg_ev_wp'] - probs['fail_wp']) / denom 317 | 318 | # Coerce breakevens to be in the range [0, 1] 319 | breakeven_punt = max(min(1, breakeven_punt), 0) 320 | breakeven_fg = max(min(1, breakeven_fg), 0) 321 | 322 | return breakeven_punt, breakeven_fg 323 | 324 | 325 | def calc_prob_success(situation, data): 326 | """Use historical first down rates. 
When inside the opponent's 10, 327 | use dwn, ytg, yfog specific rates. Otherwise, use binned yfog where 328 | field is broken into 10 segments""" 329 | 330 | fd_open = data['fd_open_field'] 331 | fd_inside = data['fd_inside_10'] 332 | 333 | if situation['yfog'] < 90: 334 | try: 335 | yfog_bin = situation['yfog'] // 10 336 | p_success = fd_open.loc[(fd_open.dwn == situation['dwn']) & 337 | (fd_open.ytg == situation['ytg']) & 338 | (fd_open.yfog_bin == yfog_bin), 339 | 'fdr'].values[0] 340 | except IndexError: 341 | # Arbitrary, set the probability of success for very long 342 | # 4th downs to be 0.1 343 | p_success = 0.1 344 | 345 | else: 346 | p_success = fd_inside.loc[(fd_inside.dwn == situation['dwn']) & 347 | (fd_inside.ytg == situation['ytg']) & 348 | (fd_inside.yfog == situation['yfog']), 349 | 'fdr'].values[0] 350 | return p_success 351 | 352 | 353 | def best_kicking_option(probs, wp_ev_goforit): 354 | """Use the expected win probabilities to determine best kicking option""" 355 | 356 | # Account for end of game situations where FG WP is higher 357 | if probs['fg_ev_wp'] > probs['punt_wp'] and probs['prob_success_fg'] > .3: 358 | decision = 'kick' 359 | win_prob_added = wp_ev_goforit - probs['fg_ev_wp'] 360 | 361 | else: 362 | decision = 'punt' 363 | win_prob_added = wp_ev_goforit - probs['punt_wp'] 364 | 365 | return decision, win_prob_added 366 | 367 | 368 | def decide_best_play(decision): 369 | if (decision['kicking_option'] == 'punt' and 370 | decision['prob_success'] < decision['breakeven_punt']): 371 | return 'punt' 372 | 373 | elif (decision['kicking_option'] == 'kick' and 374 | decision['prob_success'] < decision['breakeven_fg']): 375 | return 'kick' 376 | 377 | else: 378 | return 'go for it' 379 | 380 | 381 | def random_play(data): 382 | """Generate a random play with plausible values for debugging purposes.""" 383 | 384 | features = data['features'] 385 | situation = OrderedDict.fromkeys(features) 386 | 387 | situation['dwn'] = 4 388 | 
situation['ytg'] = random.randint(1, 10) 389 | situation['yfog'] = random.randint(1, (100 - situation['ytg'])) 390 | situation['secs_left'] = random.randint(1, 3600) 391 | situation['score_diff'] = random.randint(-20, 20) 392 | situation['timo'] = random.randint(0, 3) 393 | situation['timd'] = random.randint(0, 3) 394 | situation['spread'] = 0 395 | 396 | situation = calculate_features(situation, data) 397 | 398 | situation['dome'] = random.randint(0, 1) 399 | return situation 400 | --------------------------------------------------------------------------------
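The breakeven logic in the file above can be checked with a small worked example. This is a minimal sketch, not part of the repository: `breakeven_prob` is a hypothetical single-option restatement of `breakeven`, and the win probabilities are made-up numbers chosen only for illustration. It shows that the breakeven conversion probability is exactly the value at which going for it has the same expected win probability as the kicking alternative.

```python
def expected_win_prob(pos_prob, pos_win_prob, neg_win_prob):
    """Expected win probability, weighting success and failure outcomes."""
    return (pos_prob * pos_win_prob) + ((1 - pos_prob) * neg_win_prob)


def breakeven_prob(alt_wp, success_wp, fail_wp):
    """Hypothetical helper: conversion probability at which going for it
    yields the same expected win probability as the alternative `alt_wp`.

    Solves p * success_wp + (1 - p) * fail_wp == alt_wp for p, then
    clamps the result to [0, 1] as the original `breakeven` does.
    """
    p = (alt_wp - fail_wp) / (success_wp - fail_wp)
    return max(min(1, p), 0)


# Made-up win probabilities for illustration only.
success_wp = 0.60  # WP after converting the 4th down
fail_wp = 0.40     # WP after failing to convert
punt_wp = 0.45     # WP after punting

p_star = breakeven_prob(punt_wp, success_wp, fail_wp)
wp_at_breakeven = expected_win_prob(p_star, success_wp, fail_wp)

print("breakeven conversion probability: %.2f" % p_star)
print("expected WP when going for it at breakeven: %.2f" % wp_at_breakeven)
```

With these inputs the breakeven is about 0.25: a team that converts more often than one time in four does better, in expectation, by going for it than by punting. Note that the formula divides by `success_wp - fail_wp`, so it is undefined when converting and failing lead to the same win probability.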