├── .coveragerc ├── .gitignore ├── .travis.yml ├── LICENSE ├── README.md ├── baseball_field_hitloc.jpg ├── logging.json ├── main.py ├── requirements.txt └── retrosheet ├── __init__.py ├── archive.py ├── event.py ├── game.py ├── helpers.py ├── info.txt ├── parser.py ├── statistics.txt └── version.py /.coveragerc: -------------------------------------------------------------------------------- 1 | [run] 2 | branch = True 3 | 4 | source = retrosheet 5 | 6 | [report] 7 | # Regexes for lines to exclude from consideration 8 | 9 | include = 10 | retrosheet/__init__.py 11 | retrosheet/[A-z]*.py 12 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *.py[cod] 2 | *.ipynb 3 | *.csv 4 | *.zip 5 | *.EVA 6 | *.EVN 7 | *.ROS 8 | .DS_Store 9 | *.egg-info 10 | dist 11 | *-jgrover* 12 | *.log 13 | .coverage 14 | coverage.xml 15 | htmlcov 16 | -------------------------------------------------------------------------------- /.travis.yml: -------------------------------------------------------------------------------- 1 | language: python 2 | python: 3 | - "3.6" 4 | script: nosetests 5 | before_install: 6 | pip install -r requirements.txt 7 | after_success: 8 | codecov 9 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 Lucas Calestini 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 
12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # retrosheet 2 | 3 | [![Build Status](https://travis-ci.org/calestini/retrosheet.svg?branch=master)](https://travis-ci.org/calestini/retrosheet) [![codecov](https://codecov.io/gh/calestini/retrosheet/branch/master/graph/badge.svg)](https://codecov.io/gh/calestini/retrosheet) [![Python 3.6](https://img.shields.io/badge/python-3.6-blue.svg)](https://www.python.org/downloads/release/python-360/) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![Version: 0.1.0](https://img.shields.io/badge/version-0.1.0-green.svg)](https://img.shields.io/badge/version-0.1.0-green.svg) 4 | 5 | 6 | A project to parse [retrosheet](https://www.retrosheet.org) baseball data in python. All data contained at Retrosheet site is copyright © 1996-2003 by Retrosheet. All Rights Reserved. 7 | 8 | _The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at "www.retrosheet.org"_ 9 | 10 | ## Motivation 11 | 12 | The motivation behind this project is to enhance python-based baseball analytics, from data collection to advanced predictive modeling techniques. 
13 | 14 | --- 15 | ## Before you start 16 | 17 | If you are looking for a complete solution out of the box, check [Chadwick Bureau](http://chadwick-bureau.com/) 18 | 19 | If you are looking for a quick way to check stats, see [Baseball-Reference](https://www.baseball-reference.com) 20 | 21 | If you want a web-scraping solution, check [pybaseball](https://github.com/jldbc/pybaseball) 22 | 23 | ## Getting Started 24 | 25 | ### Downloading Package 26 | 27 | Run the following command to clone the repository and create the folder structure 28 | ```bash 29 | git clone https://github.com/calestini/retrosheet.git 30 | ``` 31 | 32 | ### Downloading historical data to csv 33 | 34 | **Note: This package is a work in progress; the files are not yet fully parsed and the statistics are not fully validated.** 35 | 36 | The code below will save data from 1921 to 2017 on your machine. Be careful, as it will take some time to download it all (about 10 minutes with a decent machine and a decent internet connection). The final datasets add up to about 3 GB. 37 | 38 | ```python 39 | from retrosheet import Retrosheet 40 | rs = Retrosheet() 41 | rs.batch_parse(yearFrom=1921, yearTo=2017, batchsize=10) #10 files at a time 42 | ``` 43 | ```bash 44 | [========================================] 100.0% ... Completed 1921-1930 45 | [========================================] 100.0% ... Completed 1931-1940 46 | [========================================] 100.0% ... Completed 1941-1950 47 | [========================================] 100.0% ... Completed 1951-1960 48 | [========================================] 100.0% ... Completed 1961-1970 49 | [========================================] 100.0% ... Completed 1971-1980 50 | [========================================] 100.0% ... Completed 1981-1990 51 | [========================================] 100.0% ... Completed 1991-2000 52 | [========================================] 100.0% ... Completed 2001-2010 53 | [========================================] 100.0% ... 
Completed 2011-2017 54 | ``` 55 | 56 | ## Files it will download / create: 57 | 58 | - plays.csv 59 | - teams.csv 60 | - rosters.csv 61 | - lineup.csv 62 | - pitching.csv 63 | - fielding.csv 64 | - batting.csv 65 | - running.csv 66 | - info.csv 67 | 68 | --- 69 | ## Useful Links / References 70 | 71 | - Our own summary of Retrosheet terminology can be found [here](retrosheet/info.txt) 72 | - For the events file, the pitches field is sometimes repeated on the following row, whenever there was a play (CS, SB, etc.). In these cases, the code needs to remove the duplication. 73 | - Main baseball statistics --> [here](https://en.wikipedia.org/wiki/Baseball_statistics) 74 | - Hit location diagrams are [here](https://www.retrosheet.org/location.htm) 75 | - Link to downloads [here](https://www.retrosheet.org/game.htm) 76 | - [Glossary of Baseball](https://en.wikipedia.org/wiki/Glossary_of_baseball) 77 | - Information about the event files can be found [here](https://www.retrosheet.org/eventfile.htm) 78 | - Documentation on the datasets can be found [here](https://www.retrosheet.org/datause.txt) 79 | - Putouts and Assists [rules](https://baseballscoring.wordpress.com/site-index/putouts-and-assists/) 80 | 81 | 82 | ### Play Field in Event File: 83 | 84 | - What does 'BF' in '1/BF' stand for? Bunt fly? 85 | - Why are some modifier codes as specific as 2R / 2RF / 8RM / 8RS / 8RXD / L9Ls / RNT? 
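The play field packs three things into one string: the basic play, any `/`-separated modifiers, and, after a `.`, the `;`-separated runner advances. The sketch below shows that split, modeled on the string handling in `Event.parse_event` and `Event.parse_advance` in `retrosheet/archive.py`. The function name `split_play` and the sample play string are assumptions made for illustration; they are not part of this package.

```python
def split_play(play):
    """Split a Retrosheet play string into (basic play, modifiers, advances).

    Hypothetical helper for illustration only; it is not part of the package.
    """
    parts = play.split('.')
    # Everything before the first '.' describes the play itself.
    pieces = parts[0].split('/')
    # Drop the annotation characters '!', '#' and '?', as parse_event does.
    basic = pieces[0].replace('!', '').replace('#', '').replace('?', '')
    modifiers = pieces[1:]
    # As in parse_advance, the advances are the last '.'-separated segment.
    advance_part = parts[-1] if len(parts) > 1 else ''
    # Advances such as '2-H' (safe) or '1X3' (out) are separated by ';'.
    advances = advance_part.split(';') if advance_part else []
    return basic, modifiers, advances


print(split_play('S7/L.2-H;1-3'))  # ('S7', ['L'], ['2-H', '1-3'])
```

Keeping the modifiers as a plain list makes it easy to tabulate unusual codes such as the ones questioned above before deciding how to interpret them.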
86 | 87 | ## TODO 88 | 89 | - [ ] Finish parsing pitches 90 | - [ ] Clean-up code and logic 91 | - [ ] Test primary stats with [game logs](https://www.retrosheet.org/gamelogs/) 92 | - [X] Test innings ending in 3 outs 93 | - [ ] Playoff files 94 | - [ ] [Parks files](https://www.retrosheet.org/parkcode.txt) 95 | - [ ] Player files 96 | - [ ] Create sql export option 97 | - [ ] Aggregate more advanced metrics 98 | - [ ] Map out location 99 | - [ ] Add additional data if possible 100 | - [ ] Load [game-log data](https://www.retrosheet.org/gamelogs/) 101 | - [ ] Load [player / manager/ umpire data](https://www.retrosheet.org/retroID.htm) 102 | 103 | ## Validating Career Stats - Spot Checks 104 | 105 | ### Batting + Fielding 106 | 107 | - Josh Donaldson (player_id = donaj001) 108 | 109 | | Source | R | H | HR | SB | 110 | |-|:-:|:-:|:-:|:-:| 111 | | Official | 526 | 860 | 174 | 32 | 112 | | ThisPackage | 524 | 853 | 173 | 32 | 113 | 114 | - Nelson Cruz (player_id = cruzn002) 115 | 116 | | Source | R | H | HR | SB | 117 | |-|:-:|:-:|:-:|:-:| 118 | | Official | 768 | 1447 | 317 | 75 | 119 | | ThisPackage | 767 | 1427 | 317 | 75 | 120 | -------------------------------------------------------------------------------- /baseball_field_hitloc.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/calestini/retrosheet/dc95f79f48e25e5b8f75959c363b430ccc581d08/baseball_field_hitloc.jpg -------------------------------------------------------------------------------- /logging.json: -------------------------------------------------------------------------------- 1 | { 2 | "version": 1, 3 | "disable_existing_loggers": false, 4 | "formatters": { 5 | "simple": { 6 | "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s" 7 | } 8 | }, 9 | 10 | "handlers": { 11 | "file_handler": { 12 | "class": "logging.FileHandler", 13 | "level": "DEBUG", 14 | "formatter": "simple", 15 | "filename": "python_logging.log", 16 | 
"encoding": "utf8" 17 | } 18 | }, 19 | 20 | "root": { 21 | "level": "DEBUG", 22 | "handlers": ["file_handler"] 23 | } 24 | } 25 | -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | from retrosheet import Retrosheet 2 | from argparse import ArgumentParser 3 | 4 | 5 | if __name__ == '__main__': 6 | 7 | parser = ArgumentParser() 8 | parser.add_argument("-s", "--start", dest="year_start", help="Start year", type=int) 9 | parser.add_argument("-e", "--end", dest="year_end", help="End year for the parser", type=int) 10 | 11 | args = parser.parse_args() 12 | 13 | rs = Retrosheet() 14 | rs.batch_parse(yearFrom=args.year_start, yearTo=args.year_end, batchsize=10) 15 | 16 | ''' 17 | rs.get_data(yearFrom=args.year_start, yearTo=args.year_end) 18 | rs.to_df() 19 | rs.save_csv(path_str='') 20 | ''' 21 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | pandas 2 | urllib3 3 | -------------------------------------------------------------------------------- /retrosheet/__init__.py: -------------------------------------------------------------------------------- 1 | import os 2 | import json 3 | import logging.config 4 | 5 | from .parser import Retrosheet 6 | from .version import __version__ 7 | 8 | def setup_logging(default_path='logging.json', default_level=logging.INFO, env_key='LOG_CFG'): 9 | 10 | """Setup logging configuration 11 | """ 12 | 13 | path = default_path 14 | value = os.getenv(env_key, None) 15 | 16 | if value: 17 | path = value 18 | 19 | if os.path.exists(path): 20 | with open(path, 'rt') as f: 21 | config = json.load(f) 22 | logging.config.dictConfig(config) 23 | 24 | requests_log = logging.getLogger("requests") 25 | requests_log.setLevel(logging.WARNING) 26 | 27 | else: 28 | 
logging.basicConfig(level=default_level) 29 | 30 | 31 | setup_logging() 32 | -------------------------------------------------------------------------------- /retrosheet/archive.py: -------------------------------------------------------------------------------- 1 | 2 | class Parser(object): 3 | """docstring for Parser.""" 4 | 5 | endpoint = 'https://www.retrosheet.org/events/' 6 | extension = '.zip' 7 | 8 | def __init__(self): 9 | #self.endpoint = 10 | #self.extension = '.zip' 11 | self.errors = [] 12 | self.info = pd.DataFrame() 13 | self.starting = pd.DataFrame() 14 | self.plays = pd.DataFrame() 15 | self.er = pd.DataFrame() 16 | self.subs = pd.DataFrame() 17 | self.comments = pd.DataFrame() 18 | self.rosters = pd.DataFrame() 19 | self.teams = pd.DataFrame() 20 | self.metadata = pd.DataFrame() 21 | 22 | 23 | def _pitch_count(self, string, current_count): 24 | """ 25 | For now it is including pickoffs 26 | """ 27 | #simplest idea: 28 | clean_pitches = string.replace('>','').replace('+','').replace('*','').replace('??','') 29 | splits = clean_pitches.split('.') #results in a list 30 | count = current_count + len(splits[len(splits)-1]) 31 | 32 | return count 33 | 34 | 35 | def parse_file(self, year): 36 | 37 | """ 38 | Will parse the file respective for one year. 
39 | - It will first look for the file in the current directory 40 | - Otherwise, it will download it from the web (without making a copy) 41 | """ 42 | 43 | event = Event() 44 | filename = '{0}eve{1}'.format(year, self.extension) 45 | 46 | try: #the files locally: 47 | zipfile = ZipFile(filename) 48 | self.log.debug("Found locally") 49 | except Exception: #take from the web 50 | resp = urlopen(self.endpoint + filename) 51 | zipfile = ZipFile(BytesIO(resp.read())) 52 | self.log.debug("Downloading from the web") 53 | 54 | infos, starting, plays, er, subs, comments, rosters, teams, metadata = ([] for i in range(9)) 55 | 56 | for file in zipfile.namelist(): 57 | 58 | metadata.append([file, datetime.datetime.now(), __version__]) 59 | 60 | if file[:4] == 'TEAM': 61 | 62 | for row in zipfile.open(file).readlines(): 63 | row = row.decode("utf-8") 64 | team_piece = [] 65 | for i in range(4): team_piece.append(row.rstrip('\n').split(',')[i].replace('\r','')) 66 | teams.append([year]+team_piece) 67 | 68 | elif file[-3:] == 'ROS': #roster file 69 | 70 | for row in zipfile.open(file, 'r').readlines(): 71 | row = row.decode("utf-8") 72 | roster_piece = [] 73 | for i in range(7): roster_piece.append(row.rstrip('\n').split(',')[i].replace('\r','')) 74 | rosters.append([year]+roster_piece) 75 | 76 | else: #event file 77 | order, game_id, version, runs = (0 for i in range(4)) 78 | inning = '1' 79 | team = '0' 80 | 81 | file_lines = zipfile.open(file, 'r').readlines() 82 | for loop, row in enumerate(file_lines): 83 | 84 | row = row.decode("utf-8") 85 | row_type = row.rstrip('\n').split(',')[0] 86 | 87 | if row_type == 'id': 88 | 89 | #initialize variables 90 | order = 0 91 | game_id = row.rstrip('\n').split(',')[1].strip('\r') 92 | event.play = {'B': 1,'1': 0,'2': 0,'3': 0,'H': 0, 'out': 3, 'run': 0} 93 | home_team_score = 0 94 | away_team_score = 0 95 | 96 | infos.append([game_id, '__version__', __version__]) # parsing version 97 | infos.append([game_id, 'file', file]) # file name 98 | 99 | if row_type == 
'version': 100 | version = row.rstrip('\n').split(',')[1].strip('\r') 101 | 102 | if row_type == 'info': 103 | var = row.rstrip('\n').split(',')[1] 104 | value = row.rstrip('\n').split(',')[2].replace('\r','').replace('"','') 105 | value = None if value == 'unknown' else value 106 | value = None if value == '0' and var == 'temp' else value #value is a string here, so compare against '0' 107 | value = None if value == '-1' and var == 'windspeed' else value #same for '-1' 108 | 109 | infos.append([game_id, var, value]) 110 | 111 | if row_type == 'start': 112 | #starting pitchers 113 | if row.rstrip('\n').split(',')[5].strip('\r') == '1': 114 | if row.rstrip('\n').split(',')[3] == '1': 115 | home_pitcher_id = row.rstrip('\n').split(',')[1] 116 | home_pitch_count = 0 117 | else: #away pitcher 118 | away_pitcher_id = row.rstrip('\n').split(',')[1] 119 | away_pitch_count = 0 120 | 121 | start_piece = [] 122 | for i in range(1,6,1): start_piece.append(row.rstrip('\n').split(',')[i].replace('"','').replace('\r','')) 123 | ''' 124 | start_piece = [ 125 | row.rstrip('\n').split(',')[1], 126 | row.rstrip('\n').split(',')[2].strip('"'), 127 | row.rstrip('\n').split(',')[3], 128 | row.rstrip('\n').split(',')[4], 129 | row.rstrip('\n').split(',')[5].strip('\r') 130 | ] 131 | ''' 132 | starting.append([game_id, version]+start_piece) 133 | 134 | if row_type == 'play': 135 | 136 | if team != row.rstrip('\n').split(',')[2]: #if previous team != current team 137 | runs = 0 138 | if not row.rstrip('\n').split(',')[2] == '0' and not row.rstrip('\n').split(',')[1] == '1': #if not first obs 139 | #assert (event.play['out'] == 3),"Game: {3} Inning {0} team {4} ended with {1} outs [{2}]".format(inning,event.play['out'], event.str, game_id, team) 140 | if event.play['out'] != 3: 141 | self.errors.append("Game: {3} Inning {0} team {4} ended with {1} outs [{2}]".format(inning,event.play['out'], event.str, game_id, team)) 142 | event.play['out'] = 0 143 | 144 | 145 | event.str = row.rstrip('\n').split(',')[6].strip('\r') 146 | event.decipher() 147 | 148 | if 
row.rstrip('\n').split(',')[2] == '0': #the opposite team is pitching 149 | pitcher_id = home_pitcher_id 150 | home_pitch_count = self._pitch_count(row.rstrip('\n').split(',')[5], home_pitch_count) 151 | pitch_count = home_pitch_count 152 | away_team_score = away_team_score + event.play['run'] - runs 153 | 154 | 155 | elif row.rstrip('\n').split(',')[2] == '1': #away 156 | pitcher_id = away_pitcher_id 157 | away_pitch_count = self._pitch_count(row.rstrip('\n').split(',')[5], away_pitch_count) 158 | pitch_count = away_pitch_count 159 | home_team_score = home_team_score + event.play['run'] - runs 160 | 161 | 162 | inning = row.rstrip('\n').split(',')[1] 163 | team = row.rstrip('\n').split(',')[2] 164 | runs = event.play['run'] 165 | 166 | 167 | play_piece = [ 168 | inning, team, pitcher_id, pitch_count, 169 | row.rstrip('\n').split(',')[3], 170 | row.rstrip('\n').split(',')[4], 171 | row.rstrip('\n').split(',')[5], 172 | row.rstrip('\n').split(',')[6].strip('\r'), 173 | event.play['B'], 174 | event.play['1'], 175 | event.play['2'], 176 | event.play['3'], 177 | event.play['H'], 178 | event.play['run'], 179 | event.play['out'], 180 | away_team_score, 181 | home_team_score 182 | ] 183 | plays.append([order, game_id, version] + play_piece) 184 | 185 | order += 1 186 | 187 | if row_type == 'sub': 188 | if row.rstrip('\n').split(',')[5].strip('\r') == '1': 189 | if row.rstrip('\n').split(',')[3] == '1': 190 | home_pitcher_id = row.rstrip('\n').split(',')[1] 191 | #print ('sub: home pitcher: ', home_pitcher_id) 192 | home_pitch_count = 0 193 | else: #away pitcher 194 | away_pitcher_id = row.rstrip('\n').split(',')[1] 195 | #print ('sub: away pitcher: ', away_pitcher_id) 196 | away_pitch_count = 0 197 | sub_piece = [ 198 | row.rstrip('\n').split(',')[1], 199 | row.rstrip('\n').split(',')[2].strip('"'), 200 | row.rstrip('\n').split(',')[3], 201 | row.rstrip('\n').split(',')[4], 202 | row.rstrip('\n').split(',')[5].strip('\r') 203 | ] 204 | subs.append([order, game_id, 
version] + sub_piece) 205 | order += 1 206 | 207 | if row_type == 'com': #comments 208 | com_piece = [ 209 | row.rstrip('\n').split('"')[1] 210 | ] 211 | comments.append([order, game_id, version] + com_piece) 212 | 213 | if row_type == 'data': 214 | 215 | #add info of game that just finished #check 216 | infos.append([game_id, 'hometeam_score', home_team_score]) 217 | infos.append([game_id, 'awayteam_score', away_team_score]) 218 | 219 | data_piece = [ 220 | row.rstrip('\n').split(',')[1], 221 | row.rstrip('\n').split(',')[2], 222 | row.rstrip('\n').split(',')[3].strip('\r') 223 | ] 224 | er.append([game_id, version] + data_piece) 225 | 226 | rosters_df = pd.DataFrame(rosters, columns = ['year','player_id','last_name','first_name','batting_hand','throwing_hand','team_abbr_1','position']) 227 | teams_df = pd.DataFrame(teams, columns=['year','team_abbr','league','city','name']) 228 | 229 | 230 | info = pd.DataFrame(infos, columns = ['game_id','var','value']) 231 | games = info[~info.duplicated(subset=['game_id','var'], keep='last')].pivot('game_id','var','value').reset_index() 232 | #self.log.warning('{0}: Error on pivoting games'.format(year)) 233 | #games = pd.DataFrame() 234 | 235 | starting_df = pd.DataFrame(starting, columns = ['game_id','version','player_id','player_name','home_team','batting_position','fielding_position']) 236 | subs_df = pd.DataFrame(subs, columns = ['order','game_id','version', 'player_id','player_name','home_team','batting_position','position']) 237 | plays_df = pd.DataFrame(plays, columns = [ 238 | 'order','game_id','version','inning','home_team','pitcher_id','pitch_count','batter_id','count_on_batter','pitches','play', 239 | 'B','1','2','3','H','run','out','away_score','home_score' 240 | ]) 241 | comments_df = pd.DataFrame(comments, columns = ['order','game_id','version','comment']) 242 | er_df = pd.DataFrame(er, columns = ['game_id','version','earned_run','player_id','variable']) 243 | metadata_df = pd.DataFrame(metadata, columns = 
['file', 'datetime', 'version']) 244 | 245 | return games, starting_df, plays_df, er_df, subs_df, comments_df, rosters_df, teams_df, metadata_df 246 | 247 | 248 | def get_data(self, yearFrom=2017, yearTo=None): 249 | 250 | if yearTo is None: 251 | yearTo = yearFrom 252 | 253 | self.log.warning('Parsing Files. Looking locally or downloading from retrosheet.org ...') 254 | 255 | for count, year in enumerate(range(yearFrom,yearTo+1,1), 0): #+1 for inclusive 256 | 257 | total = yearTo-yearFrom+1 258 | progress(count, total, status=year) 259 | 260 | info_temp, starting_temp, plays_temp, er_temp, subs_temp, comments_temp, rosters_temp, teams_temp, meta = self.parse_file(year) 261 | 262 | self.info = self.info.append(info_temp) 263 | self.starting = self.starting.append(starting_temp) 264 | self.plays = self.plays.append(plays_temp) 265 | self.er = self.er.append(er_temp) 266 | self.subs = self.subs.append(subs_temp) 267 | self.comments = self.comments.append(comments_temp) 268 | self.rosters = self.rosters.append(rosters_temp) 269 | self.teams = self.teams.append(teams_temp) 270 | self.metadata = self.metadata.append(meta) 271 | 272 | progress(100,100, status="Files Parsed") 273 | self.log.warning(self.errors) 274 | self.log.warning('Total errors: {0}'.format(len(self.errors))) 275 | 276 | return True #info, starting, plays, er, subs, comments, rosters, teams 277 | 278 | 279 | def save_csv(self, path=''): 280 | self.log.warning('Saving files to csv ({0}) ...'.format(path)) 281 | 282 | self.info.to_csv('{0}info.csv'.format(path), index=False) 283 | self.starting.to_csv('{0}starting.csv'.format(path), index=False) 284 | self.plays.to_csv('{0}plays.csv'.format(path), index=False) 285 | self.er.to_csv('{0}er.csv'.format(path), index=False) 286 | self.subs.to_csv('{0}subs.csv'.format(path), index=False) 287 | self.comments.to_csv('{0}comments.csv'.format(path), index=False) 288 | self.rosters.to_csv('{0}rosters.csv'.format(path), index=False) 289 | 
self.teams.to_csv('{0}teams.csv'.format(path), index=False) 290 | self.metadata.to_csv('{0}metadata.csv'.format(path), index=False) 291 | 292 | self.log.warning('Saved ...') 293 | 294 | 295 | 296 | 297 | class Event(object): 298 | 299 | """Events 300 | Parameters: 301 | - event_string (NP = No Play) 302 | - play = {'B': 1,'1': 0,'2': 0,'3': 0,'H': 0, 'out': 0, 'run': 0} 303 | 304 | TODO: 305 | - clean code, make it less redundant, potentially in a module only. 306 | """ 307 | 308 | def __init__(self): 309 | self.log = logging.getLogger(__name__) 310 | self.str = 'NP' 311 | self.play = {'B': 1,'1': 0,'2': 0,'3': 0,'H': 0, 'out': 0, 'run': 0} 312 | 313 | def _print_diamond(self): 314 | """ 315 | This function prints the diamond for the specific event for easier visualization 316 | e.g: 317 | Play: 318 | |----------[ 2B ]----------| 319 | |--------------------------| 320 | |----[ 3B ]-----[ 1B ]-----| 321 | |--------------------------| 322 | |------[ H ]---[ B ]-------| 323 | |--------------------------| 324 | Runs: [%] Outs: [%] 325 | 326 | TODO: 327 | - Log instead of print 328 | """ 329 | diamond = '''Play: {0}\n|---------[ {3} ]-----------|\n|-------------------------|\n|----[ {4} ]------[ {2} ]-----|\n|-------------------------|\n|------[ {5} ]--[ {1} ]-------|\n|-------------------------|\nRuns: {7}\tOuts: {6}\n''' 330 | print (diamond.format(self.str, self.play['B'], self.play['1'], self.play['2'], 331 | self.play['3'], self.play['H'], self.play['out'], self.play['run'])) 332 | 333 | 334 | def parse_advance(self): 335 | """ 336 | This portion parses the explicit advancements 337 | """ 338 | self.play = {'B': 1,'1': 0,'2': 0,'3': 0,'H': 0, 'out': 0, 'run': 0} if self.play['out'] >= 3 else self.play 339 | ##################################### 340 | #this_play = {'B': 1,'1': 0,'2': 0,'3': 0,'H': 0, 'out': 0, 'run': 0} 341 | self.play['B'] = 1 342 | 343 | self.advances = self.str.split('.')[len(self.str.split('.'))-1] if len(self.str.split('.'))>1 else '' 344 | 345 
| if re.search('\.', self.str): #there was an advance: 346 | #test using regular expressions 347 | #Step2: Understanding advances / outs in advances 348 | out_in_advance = re.findall('[1-3B]X[1-3H](?:\([^\)]+\))*', self.advances) 349 | 350 | #two type of PLAYER ERRORS on advances: 351 | #a) notation is out ($X$) but error negates the out. 'X' becomes '-' 352 | error_out = re.findall('[1-3B]X[1-3H](?:\([^\)]+\))*(?:\([^\)]*E[^\)]+\))(?:\([^\)]+\))?', self.advances) #error a 353 | 354 | #b) notation is advance ($-$) and parenthesis explain the error that generated the advance 355 | ## no action needed. Error is an explanation of play, like all others 356 | 357 | advanced = re.findall('[1-3B]\-[1-3H](?:\([^\)]+\))*', self.advances) 358 | 359 | #element 0 is where they come from 360 | #element 2 is where they are/were headed 361 | for oia in out_in_advance: 362 | self.play[oia[0]] = 0 363 | self.play['out'] += 1 364 | 365 | for error in error_out: 366 | if not re.findall('(?:\([1-9U/TH]+\))+', error) or re.findall('^[1-3B]X[1-3H](?:\(TH\))', error): # 'BX3(36)(E5/TH)' and 'BXH(TH)(E2/TH)(8E2)(NR)(UR)' are not errors. 
367 | self.play['out'] -= 1 368 | if error[2] == 'H': 369 | self.play[error[0]] = 0 #decrease from where they left 370 | self.play[error[2]] += 1 #increase where they touched 371 | self.play['run'] += 1 372 | else: 373 | self.play[error[0]] = 0 #decrease from where they left 374 | self.play[error[2]] = 1 #increase where they touched 375 | 376 | 377 | for advance in advanced: 378 | if advance[2] == 'H': 379 | self.play[advance[0]] = 0 #decrease from where they left 380 | self.play[advance[2]] += 1 #increase where they touched 381 | self.play['run'] += 1 382 | else: 383 | self.play[advance[0]] = 0 #decrease from where they left 384 | self.play[advance[2]] = 1 #increase where they touched 385 | 386 | return True 387 | return False 388 | 389 | 390 | def _left_base(self, arriving_base): 391 | if arriving_base == 'H': 392 | self.play['3'] = 0 393 | elif arriving_base == '1': 394 | self.play['B'] = 0 395 | else: 396 | self.play[str(int(arriving_base)-1)] = 0 397 | 398 | return True 399 | 400 | 401 | def _advance(self, arriving_base): 402 | if arriving_base == 'H': 403 | self.play[arriving_base] += 1 404 | else: 405 | self.play[arriving_base] = 1 406 | self._left_base(arriving_base) 407 | return True 408 | 409 | 410 | def _out_in_advance(self, arriving_base): 411 | self.play['out'] += 1 412 | self._left_base(arriving_base) 413 | return True 414 | 415 | 416 | def _secondary_event(self, secondary_event): 417 | """ 418 | Events happening with K or Walks. 
This can be merged with parse_event() if written well 419 | """ 420 | if re.findall('^CS[23H](?:\([1-9]+\))+',secondary_event): 421 | #print ('CAUGHT STEALING') 422 | for cs in secondary_event.split(';'): 423 | self._out_in_advance(cs[2]) 424 | 425 | ##caught stealing errors --> calls reversed: 426 | elif re.findall('^CS[23H](?:\([1-9]*E[1-9]+)+',secondary_event): # removed last ')' as some observations didnt have it 427 | for cs in secondary_event.split(';'): 428 | self._advance(cs[2]) 429 | 430 | elif re.findall('^[EOPF][1-3ABI]$',secondary_event): 431 | pass #explicit event 432 | 433 | elif re.findall('^WP$',secondary_event): 434 | pass #explicit event (?) 435 | 436 | elif re.findall('^PO[123](?:\([1-9]+\))',secondary_event): 437 | self.play[secondary_event[2]] = 0 438 | self.play['out'] += 1 439 | 440 | #only the errors (allowing to stay) 441 | elif re.findall('^PO[123](?:\([1-9]*E[1-9]+)',secondary_event): 442 | pass #will keep explicit for now, but it usually shows one base advance. 
443 | 444 | 445 | #POCS%($$) picked off off base % (2, 3 or H) with the runner charged with a caught stealing 446 | #without errors 447 | elif re.findall('^POCS[23H](?:\([1-9]+\))',secondary_event): 448 | for split in secondary_event.split(';'): #there are CS events together with POCS 449 | if split[0:2] == 'CS': 450 | self.play[split[3]] = 0 451 | self.play['out'] += 1 452 | else: #POCS 453 | self.play[split[4]] = 0 454 | self.play['out'] += 1 455 | 456 | #only the errors (allowing advances) 457 | elif re.findall('^POCS[23H](?:\([1-9]*E[1-9]+)', secondary_event): 458 | pass #will wait for explicit advances 459 | 460 | 461 | elif re.findall('^SB[23H]',secondary_event): 462 | for sb in secondary_event.split(';'): 463 | if sb[0:2] == 'SB': 464 | self._advance(sb[2]) 465 | 466 | elif re.findall('^[1-9]*E[1-9]*$',secondary_event): #errors 467 | pass #wait for explicit change or B-1 468 | 469 | return True 470 | 471 | 472 | def parse_event(self): 473 | 474 | if self.str is None: 475 | pass#return False 476 | 477 | result = '' 478 | 479 | a = self.str.split('.')[0].split('/')[0].replace('!','').replace('#','').replace('?','') 480 | modifiers = self.str.split('.')[0].split('/')[1:] if len(self.str.split('.')[0].split('/'))>1 else [] 481 | 482 | #play['B'] = 1 483 | 484 | #at least one out: 485 | if re.findall('^[1-9](?:[1-9]*(?:\([B123]\))?)*\+?\-?$',a): 486 | result = 'out' 487 | if re.findall('(?:\([B123]\))',a): #double or triple play 488 | outs = len(re.findall('(?:\([B123]\))',a)) 489 | #check if there is a double play or tripple play 490 | ################ MODIFIER ##################### 491 | if modifiers: 492 | for modifier in modifiers: 493 | if re.findall('^[B]?[PUGFL]?DP',modifier): #double play 494 | outs = 2 495 | elif re.findall('^[B]?[PUGFL]?TP',modifier): #tripple play 496 | outs = 3 497 | ############################################### 498 | if re.search('(?:\([B]\))',a): #at-bat explicit out 499 | for out in re.findall('(?:\([B123]\))',a): 500 | if out[1] 
!= 'B': 501 | self._out_in_advance(str(int(out[1])+1)) 502 | else: 503 | self._out_in_advance('1') 504 | 505 | elif len(re.findall('(?:\([B123]\))',a)) != outs: #B is implicit 506 | # new addition - ad hoc 507 | if len(re.findall('(?:\([B123]\))',a)) == 1 and outs == 3 and not re.findall('[B]X[1-3H](?:\([^\)]+\))*', self.advances) : 508 | self.play['out'] += 1 509 | ### 510 | 511 | if not re.findall('[1-3B]X[1-3H](?:\([^\)]+\))*', self.advances): #out is implicit in advances too 512 | self._out_in_advance('1') 513 | 514 | elif len(re.findall('[1-3B]X[1-3H](?:\([^\)]+\))*', self.advances)) == len(re.findall('[1-3B]X[1-3H](?:\([^\)]+\))*(?:\([^\)]*E[^\)]+\))(?:\([^\)]+\))?', self.advances)): 515 | self._out_in_advance('1') 516 | 517 | for out in re.findall('(?:\([B123]\))',a): 518 | self._out_in_advance(str(int(out[1])+1)) 519 | 520 | elif not re.search('(?:\([B]\))',a) and len(re.findall('(?:\([B123]\))',a)) == outs: 521 | for out in re.findall('(?:\([B123]\))',a): 522 | self._out_in_advance(str(int(out[1])+1)) 523 | self._advance('1') 524 | 525 | else: 526 | self._out_in_advance('1') 527 | 528 | #out + error: #out is negated 529 | elif re.findall('^[1-9][1-9]*E[1-9]*$',a): 530 | result = 'out error' 531 | if not re.findall('[B]\-[1-3H](?:\([^\)]+\))*', self.advances) or not re.findall('[B]X[1-3H](?:\([^\)]+\))*(?:\([^\)]*E[^\)]+\))(?:\([^\)]+\))?', self.advances): 532 | self.play['B'] = 0 533 | self.play['1'] = 1 534 | 535 | #wait for explicit change or B-1 536 | 537 | 538 | ##caught stealing (except errors): 539 | elif re.findall('^CS[23H](?:\([1-9]+\))+', a): 540 | result = 'cs' 541 | for cs in a.split(';'): 542 | self._out_in_advance(cs[2]) 543 | 544 | 545 | ##caught stealing errors --> calls reversed: 546 | elif re.findall('^CS[23H](?:\([1-9]*E[1-9]+)+',a): # removed last ')' as some observations didnt have it 547 | result = 'cs error' 548 | for cs in a.split(';'): 549 | self._advance(cs[2]) 550 | 551 | 552 | ##if its a balk 553 | elif re.findall('^BK$', a):# 
balk (batter remains but all other get one base) 554 | result = 'balk' 555 | pass #will test for explicit 556 | 557 | ##double 558 | elif re.findall('^D[0-9]*\??$', a): 559 | result = 'double' 560 | if not self.play['B'] == 0: 561 | self.play['2'] = 1 562 | self.play['B'] = 0 563 | 564 | ##ground rule double (two bases for everyone as ball went out after being in) 565 | elif re.findall('^DGR[0-9]*$', a): 566 | result = 'dgr' 567 | #will keep other advancements explicit. check. 568 | if not self.play['B'] == 0: 569 | self.play['2'] = 1 570 | self.play['B'] = 0 571 | 572 | ## defensive indifference 573 | elif re.findall('^DI$', a): 574 | result = 'di' 575 | pass #explicit advancements 576 | 577 | ## error allowing batter to get on base (B-1 implicit or not) 578 | elif re.findall('^E[1-9]\??$', a): 579 | result = 'single' 580 | if not self.play['B'] == 0: 581 | self._advance('1') 582 | 583 | # fielders choice (also implicit B-1) 584 | elif re.findall('^FC[1-9]?\??$',a): 585 | result = 'single' 586 | if not self.play['B'] == 0: 587 | self._advance('1') 588 | 589 | # error on foul fly play (error given to the play but no advances) 590 | elif re.findall('^FLE[1-9]+$',a): 591 | result = 'fle' 592 | pass 593 | 594 | # home run 595 | elif re.findall('^H[R]?[1-9]*[D]?$',a): 596 | result = 'hr' 597 | #will keep other advancements explicit. check. 598 | if not self.play['B'] == 0: 599 | self.play['H'] += 1 600 | self.play['B'] = 0 601 | self.play['run'] += 1 602 | 603 | ## hit by pitch 604 | elif re.findall('^HP$', a): 605 | result = 'single' 606 | if not self.play['B'] == 0: 607 | self.play['1'] = 1 608 | self.play['B'] = 0 609 | 610 | ## intentional walks can happen + SB%, CS%, PO%, PB, WP and E$. 
611 | ## b-1 is implicit + whatever else happens in the play 612 | elif re.findall('^I[W]?\+?(?:WP)?(?:OA)?(?:SB[23H])?(?:CS[23H](?:\([1-9]+\)))?(?:PO[1-3](?:\([1-9]+\)))?$',a): 613 | result = 'single' 614 | if not self.play['B'] == 0: 615 | self.play['1'] = 1 616 | self.play['B'] = 0 617 | 618 | other_event = a.split('+')[1] if len(a.split('+'))>1 else [] 619 | if other_event: 620 | self._secondary_event(other_event) 621 | 622 | 623 | #walks. B-1 implicit + other plays 624 | elif re.findall('^W(?!P)',a): 625 | result = 'single' 626 | if not self.play['B'] == 0: 627 | self.play['1'] = 1 628 | self.play['B'] = 0 629 | 630 | other_event = a.split('+')[1] if len(a.split('+'))>1 else [] 631 | if other_event: 632 | self._secondary_event(other_event) 633 | 634 | #elif re.findall('^K$',a): 635 | # result = 'out' 636 | # self._out_in_advance('1') 637 | # if re.findall('[B]X[1-3H](?:\([1-9]+\))*', self.advances): 638 | # explicit strikeout 639 | # self.play['out'] -= 1 640 | 641 | ## Strikeouts. 
Events can happen too: SB%, CS%, OA, PO%, PB, WP and E$ 642 | elif re.findall('^K',a): 643 | result = 'out' 644 | self._out_in_advance('1') 645 | #if it's a strikeout with a force out or an explicit out, remove the out here to avoid double-counting 646 | #if re.findall('[B]X[1-3H](?:\([1-9]+\))*', self.advances): 647 | #explicit strikeout 648 | # self.play['out'] -= 1 649 | 650 | if modifiers: 651 | if modifiers[0] == 'FO' and re.findall('[1-3B]X[1-3H](?:\([^\)]+\))*', self.advances): 652 | self.play['out'] -= 1 653 | 654 | if modifiers[0] == 'NDP' and re.findall('[B]\-[1-3H](?:\([^\)]+\))*', self.advances) and re.findall('[1-3B]X[1-3H](?:\([^\)]+\))*', self.advances): #no double play credited 655 | self.play['out'] -= 1 656 | 657 | if modifiers[0] == 'TH' and re.findall('[B]X[1-3H](?:\([^\)]+\))*', self.advances): 658 | self.play['out'] -= 1 659 | 660 | 661 | if modifiers[0] == 'C' and re.findall('[B]X[1-3H](?:\([^\)]+\))*', self.advances): 662 | self.play['out'] -= 1 663 | 664 | if modifiers[0] == 'C' and re.findall('[B]\-[1-3H](?:\([^\)]+\))*(?:\([^\)]*E[^\)]+\))(?:\([^\)]+\))?', self.advances): 665 | self.play['out'] -= 1 666 | 667 | if modifiers[0] =='DP' and re.findall('[B]X[1-3H](?:\([^\)]+\))*', self.advances) and not re.findall('[B]X[1-3H](?:\([^\)]+\))*(?:\([^\)]*E[^\)]+\))(?:\([^\)]+\))?', self.advances): 668 | self.play['out'] -= 1 669 | 670 | if modifiers[0] =='AP' and re.findall('[B]X[1-3H](?:\([^\)]+\))*', self.advances) and not re.findall('[B]X[1-3H](?:\([^\)]+\))*(?:\([^\)]*E[^\)]+\))(?:\([^\)]+\))?', self.advances): 671 | self.play['out'] -= 1 672 | #total_out = 2 673 | #advances_out = len(re.findall('[1-3B]X[1-3H](?:\([^\)]+\))*', self.advances)) - len(re.findall('[1-3B]X[1-3H](?:\([^\)]+\))*(?:\([^\)]*E[^\)]+\))(?:\([^\)]+\))?', self.advances)) 674 | #for out in range(total_out - advances_out): 675 | # self.play['out'] -= 1 676 | 677 | 678 | if re.findall('^K$',a) and modifiers[0] in ('MREV', 'UREV'): #manager or umpire review 679 | if re.findall('[B]\-[1-3H]', 
self.advances): #base runner explicit, so no strikeout 680 | self.play['out'] -= 1 681 | 682 | 683 | elif re.findall('[B]X[1-3H](?:\([^\)]+\))*', self.advances) and not re.findall('[1-3B]X[1-3H](?:\([^\)]+\))*(?:\([^\)]*E[^\)]+\))(?:\([^\)]+\))?', self.advances): #Base explicit out 684 | self.play['out'] -= 1 685 | 686 | elif re.findall('^K$',a) and (re.findall('[B]\-[1-3H](?:\([^\)]+\))*', self.advances) or re.findall('[B]X[1-3H](?:\([^\)]+\))*(?:\([^\)]*E[^\)]+\))(?:\([^\)]+\))?', self.advances)): #strike but base runner advanced 687 | self.play['out'] -= 1 688 | 689 | other_event = a.split('+')[1] if len(a.split('+'))>1 else [] 690 | if other_event: 691 | # if its a wild pitch and base runner moves explicitly, decrease the out as its no longer a strike: 692 | if re.findall('[B]\-[1-3H](?:\([^\)]+\))*', self.advances): #Base advanced 693 | self.play['out'] -= 1 694 | base_advanced = re.findall('[B]\-[1-3H](?:\([^\)]+\))*', self.advances)[0][2] 695 | self.play[base_advanced] = 1 696 | 697 | elif re.findall('[B]X[1-3H](?:\([^\)]+\))*(?:\([^\)]*E[^\)]+\))(?:\([^\)]+\))?', self.advances): 698 | self.play['out'] -= 1 699 | base_advanced = re.findall('[B]X[1-3H](?:\([^\)]+\))*(?:\([^\)]*E[^\)]+\))(?:\([^\)]+\))?', self.advances)[0][2] 700 | self.play[base_advanced] = 1 701 | #elif re.findall('[B]X[1-3H](?:\([^\)]+\))*', self.advances): #Base advanced 702 | # self.play['out'] -= 1 703 | 704 | 705 | self._secondary_event(other_event) 706 | 707 | #No Play == substitution 708 | elif re.findall('^NP$',a): 709 | result = 'np' 710 | pass 711 | 712 | #Unkown Play 713 | elif re.findall('^(?:OA)?(?:99)?$',a): 714 | result = 'unkown' 715 | pass 716 | 717 | ## passed ball - B-1 implicit 718 | elif re.findall('^PB$', a): 719 | result = 'single' 720 | """keeping advancements explicit 721 | if not play['B'] == 0: 722 | play['1'] = 1 723 | play['B'] = 0 724 | """ 725 | ## PO%($$) picked off of base %(players sequence) (pickoff). 
Errors negate the out, and the runner advances 726 | #without errors: 727 | elif re.findall('^PO[123](?:\([1-9]+\))',a): 728 | result = 'out' 729 | self.play[a[2]] = 0 730 | self.play['out'] += 1 731 | 732 | #only the errors (allowing the runner to stay) 733 | elif re.findall('^PO[123](?:\([1-9]*E[1-9]+)',a): 734 | result = 'out error' 735 | pass #will keep explicit for now, but it usually shows one base advance. 736 | 737 | 738 | #POCS%($$) picked off base % (2, 3 or H) with the runner charged with a caught stealing 739 | #without errors 740 | elif re.findall('^POCS[23H](?:\([1-9]+\))',a): 741 | result = 'out' 742 | for split in a.split(';'): #there are CS events together with POCS 743 | if split[0:2] == 'CS': 744 | self.play[split[3]] = 0 745 | self.play['out'] += 1 746 | else: #POCS 747 | self.play[split[4]] = 0 748 | self.play['out'] += 1 749 | 750 | #only the errors (allowing advances) 751 | elif re.findall('^POCS[23H](?:\([1-9]*E[1-9]+)',a): 752 | pass #will wait for explicit advances 753 | 754 | #single 755 | elif re.findall('^S[0-9]*\??\+?$',a): 756 | result = 'single' 757 | #print ('single') 758 | if not self.play['B'] == 0: 759 | self.play['1'] = 1 760 | self.play['B'] = 0 761 | 762 | #stolen base 763 | elif re.findall('^SB[23H]',a): 764 | result = 'sb' 765 | for sb in a.split(';'): 766 | if sb[0:2] == 'SB': 767 | self._advance(sb[2]) 768 | 769 | #triple 770 | elif re.findall('^T[0-9]*\??\+?$',a): 771 | result = 'tripple' 772 | if not self.play['B'] == 0: 773 | #other advances explicit 774 | self.play['3'] = 1 775 | self.play['B'] = 0 776 | 777 | ## wild pitch - base runner advances 778 | elif re.findall('^WP', a): 779 | result = 'single' 780 | if not self.play['B'] == 0: 781 | self.play['1'] = 1 782 | self.play['B'] = 0 783 | 784 | elif re.findall('^C$', a): 785 | #What is "C"? A strikeout?
786 | pass 787 | #result = 'single' 788 | #self.play['out'] += 1 789 | #self.play['B'] = 0 790 | 791 | else: 792 | raise EventNotFoundError('Event Not Known', a) 793 | 794 | #self._print_diamond() 795 | #print (result) 796 | return True 797 | 798 | 799 | def decipher(self): 800 | self.parse_advance() 801 | self.parse_event() 802 | -------------------------------------------------------------------------------- /retrosheet/event.py: -------------------------------------------------------------------------------- 1 | # encoding: utf-8 2 | 3 | import logging 4 | import re 5 | from .helpers import ( 6 | out_in_advance, advance_base, 7 | PREVIOUS_BASE, NEXT_BASE, pitch_count, 8 | move_base, leave_base 9 | ) 10 | 11 | 12 | class event(object): 13 | """ 14 | New event parsing class. It handles only the current event string. 15 | Contextual information (player_id, pitcher, etc.) is handled by the Game class. 16 | The objective is to map everything that happened, by all players, for quick 17 | reference. 
18 | 19 | TODO: remove redundancies 20 | """ 21 | 22 | def __init__(self): 23 | 24 | self.log = logging.getLogger(__name__) 25 | self.str = 'NP' 26 | self.base = {'B': None,'1': None,'2': None,'3': None, 'H':[]} 27 | self.advances={'B': 1,'1': 0,'2': 0,'3': 0,'H': 0, 'out': 0,'run': 0} 28 | 29 | 30 | #def _initialize_modifiers(self): 31 | def _is_explicit(self, bfrom='B'): 32 | for em in self.em: 33 | if em[0][0]==bfrom: 34 | #self.log.debug('{0} is explicit'.format(bfrom)) 35 | return True 36 | #self.log.debug('{0} is not explicit'.format(bfrom)) 37 | return False 38 | 39 | def _modifiers(self, modifiers): 40 | """Record the play modifiers of the current event in self.modifiers and self.stats. 41 | """ 42 | ### Play Modifier: 43 | for mpm in modifiers: 44 | mpm = mpm.replace('#','').replace('-','').replace('+','')\ 45 | .replace('!','').replace('?','').upper() 46 | 47 | if re.findall('^[B]?[PUGFL]?DP$',mpm): #double play 48 | #self.main_play['out'] = 2 49 | self.modifiers['DP'] = True 50 | self.modifiers['bunt'] = 1 if mpm[0]=='B' else 0 51 | if self.modifiers['trajectory'] == '': 52 | self.modifiers['trajectory'] = mpm[1] if mpm[0] == 'B' and mpm[1] in 'PGFL' else '' #trajectory follows the bunt prefix 53 | self.modifiers['trajectory'] = mpm[0] if mpm[0] in 'PGFL' else self.modifiers['trajectory'] #no bunt prefix: trajectory comes first 54 | elif re.findall('^[B]?[PUGFL]?TP$',mpm): #triple play 55 | #self.main_play['out'] = 3 56 | self.modifiers['TP'] = True 57 | elif re.findall('^U[1-9]+', mpm): 58 | self.modifiers['passes'].append(mpm) 59 | elif re.findall('^[B]$',mpm): #bunt 60 | self.modifiers['bunt'] = 1 61 | elif re.findall('^COU[BFR]$',mpm): #courtesy batter, fielder, runner 62 | self.modifiers['courtesy'] = mpm[3] 63 | elif re.findall('^[BFRU]?INT$', mpm): #interference 64 | self.modifiers['interference'] = mpm[0] if mpm[0] in ['B','F','R','U'] else '' 65 | elif re.findall('^[MU]REV$', mpm): #review 66 | self.modifiers['review'] = mpm[0] 67 | elif re.findall('^FL$', mpm): #foul 68 | self.modifiers['foul'] = 1 69 | elif re.findall('^FO$', mpm): #force out 70 | self.modifiers['force out']= 1 71 | elif re.findall('^TH[H]?[1-9\)]*$', mpm): 
#throw 72 | self.modifiers['throw']= 1 73 | elif re.findall('^S[FH]$', mpm): #sacrifice hit or fly 74 | self.modifiers['sacrifice']= mpm[1] 75 | self.modifiers['bunt'] = 1 if mpm[1]=='H' else 0 #sacrifice hit is a bunt 76 | elif re.findall('^[U]?[6]?R[0-9URNHBS]*(?:\(TH\))?$', mpm): #relay 77 | self.modifiers['relay'] = 1 78 | self.modifiers['passes'].append(mpm) 79 | if re.findall('TH',mpm): 80 | self.modifiers['throw'] = 1 81 | elif re.findall('^E[1-9]*$', mpm): #error on $ 82 | error = re.findall('^E[1-9]*$', mpm) 83 | if 'TH' in mpm: 84 | self.stats['fielding'].append(['E(TH)', error[0][1]]) 85 | else: 86 | self.stats['fielding'].append(['E', error[0][1]]) 87 | 88 | #self.modifiers['errors'].append(mpm[1]) if len(mpm)>1 else '' 89 | elif mpm in ['AP','BOOT','IPHR','NDP','BR','IF','OBS','PASS','C','U','RNT']: #other #U for unkown 90 | self.modifiers['other'].append(mpm) 91 | elif re.findall('^B?[PGFL][1-9MLRDXSF]?[1-9LRMXDSFW]*$',mpm): 92 | self.modifiers['bunt'] = 1 if mpm[0] =='B' else 0 93 | if self.modifiers['trajectory'] =='': 94 | self.modifiers['trajectory'] = mpm[1] if mpm[0] =='B' else mpm[0] 95 | if self.modifiers['location'] == '': 96 | self.modifiers['location'] = mpm[2:] if mpm[0] =='B' else mpm[1:] 97 | elif re.findall('^[BU]?[1-9MLRDXSF][1-9LRMXDSFW]*$' ,mpm): 98 | self.modifiers['bunt'] = 1 if mpm[0] =='B' else 0 99 | self.modifiers['location'] = mpm 100 | elif mpm == '' or mpm=='U4U1': 101 | pass 102 | else: 103 | self.log.debug('Event Not Known: {0}'.format(mpm)) 104 | 105 | 106 | def _advances(self): 107 | ### Explicit advances 108 | self.ad_out = 0 109 | 110 | for loop, move in enumerate(self.em): 111 | error = None 112 | error_loop = None 113 | #each element is a list 114 | move = move[0] #--> retrieve string 115 | bfrom = move[0] 116 | bto = move[2] 117 | if re.findall('X', move): 118 | #it could be on error or not 119 | was_out = None 120 | #if describer is numbers only, it was not an error. 
121 | if self.ad[loop]: #there is a modifier 122 | 123 | for desc_loop, desc in enumerate(self.ad[loop]): 124 | if re.findall('^[1-9U]+$', desc): 125 | was_out = True 126 | was_out_loop =desc_loop 127 | 128 | if was_out:#re.findall('^[1-9U]+$', self.ad[loop][0]): 129 | #print ('was out') 130 | #print (bfrom, bto) 131 | self.advances = out_in_advance(self.advances, bfrom=bfrom, bto=bto) 132 | self.ad_out +=1 133 | 134 | self.base = leave_base(self.base, bfrom=bfrom) 135 | 136 | ########################### stats ############################## 137 | PO = re.findall('[1-9U]$',self.ad[loop][was_out_loop]) 138 | if PO: 139 | self.stats['fielding'].append(['PO',PO[-1]]) 140 | 141 | As = re.findall('[1-9U]+', self.ad[loop][was_out_loop]) 142 | if As: 143 | As = As[0] 144 | for a in As: 145 | self.stats['fielding'].append(['A',a]) if a not in PO else None 146 | 147 | self.stats['running'].append(['PO',bfrom, bto]) 148 | ########################### end ############################## 149 | passes = re.findall('[1-9U]+',self.ad[loop][was_out_loop]) 150 | #append pass sequence (for location purposes) 151 | self.modifiers['passes'].append(passes[0]) if passes else None 152 | 153 | else: 154 | for describer_loop, describer in enumerate(self.ad[loop]): 155 | if re.findall('[1-9]*E[1-9]',describer): 156 | error = re.findall('E[1-9]',describer)[0] 157 | error_loop = describer_loop 158 | 159 | if error: 160 | self.move_on_error.append(bto) 161 | self.advances = advance_base(self.advances, bfrom=bfrom, bto=bto) 162 | self.base = move_base( self.base, bfrom=bfrom, bto=bto) 163 | 164 | ########################### stats ############################## 165 | #error describer 166 | error_modifier = self.am[loop][error_loop][0] if self.am[loop][error_loop] else '' 167 | if re.sub('[1-9U]','', error_modifier) == 'TH': 168 | self.stats['fielding'].append(['E(TH)', error[-1]]) 169 | else: 170 | self.stats['fielding'].append(['E', error[-1]]) 171 | 172 | #append pass sequence (for location 
purposes) 173 | passes = re.sub('[^0-9]','', error+error_modifier) 174 | self.modifiers['passes'].append(passes) if passes else None 175 | 176 | if move[2] == 'H': 177 | run_describer = 'R' 178 | run_describer += '(UR)' if 'UR' in self.ad[loop] else '' 179 | run_describer += '(NR)' if 'NR' in self.ad[loop] else '' 180 | run_describer += '(RBI)' if 'RBI' in self.ad[loop] else '' 181 | run_describer += '(NORBI)' if 'NORBI' in self.ad[loop] else '' 182 | run_describer += '(TUR)' if 'TUR' in self.ad[loop] else '' 183 | 184 | self.stats['running'].append([run_describer,bfrom, bto]) 185 | 186 | ########################### end ############################## 187 | 188 | else: 189 | self.advances = out_in_advance(self.advances, bfrom=bfrom, bto=bto) 190 | self.base = leave_base(self.base, bfrom=bfrom) 191 | self.ad_out +=1 192 | ########################### stats ############################## 193 | PO = re.findall('[1-9U]$',self.ad[loop][0]) if self.ad[loop] else None 194 | if PO: 195 | self.stats['fielding'].append(['PO',PO[0]]) 196 | 197 | As = re.findall('[1-9U]+', self.ad[loop][0]) if self.ad[loop] else None 198 | if As: 199 | As = As[0] 200 | for a in As: 201 | self.stats['fielding'].append(['A',a]) if a not in PO else None 202 | 203 | self.stats['running'].append(['PO',bfrom, bto]) 204 | ########################### end ############################## 205 | passes = re.findall('[1-9U]+',self.ad[loop][0]) if self.ad[loop] else None 206 | #append pass sequence (for location purposes) 207 | self.modifiers['passes'].append(passes[0]) if passes else None 208 | 209 | #map other errors, if existing (remove error modifier loop) 210 | 211 | if len(self.ad[loop]) > 1: 212 | for loop2, describer in enumerate(self.ad[loop][1:]): 213 | if loop2 != error_loop: 214 | other_error = re.findall('E[1-9]',describer) 215 | if other_error: 216 | 217 | error_modifier = self.am[loop][loop2][0] if self.am[loop][loop2] else '' 218 | #print ('error modifier', error_modifier) 219 | if 
re.sub('[1-9U]','', error_modifier) == 'TH': 220 | self.stats['fielding'].append(['E(TH)', other_error[0][-1]]) 221 | else: 222 | self.stats['fielding'].append(['E', other_error[0][-1]]) 223 | 224 | #append pass sequence (for location purposes) 225 | passes = re.sub('[^0-9]','', other_error[0]+error_modifier) 226 | self.modifiers['passes'].append(passes) if passes else None 227 | ''' 228 | for describer_loop, describer in enumerate(self.ad[loop]): 229 | if re.findall('E[1-9]',describer): 230 | error = re.findall('E[1-9]',describer)[0] 231 | error_loop = describer_loop 232 | 233 | if error: 234 | self.advances = advance_base(self.advances, bfrom=bfrom, bto=bto) 235 | 236 | ########################### stats ############################## 237 | #error describer 238 | error_modifier = self.am[loop][error_loop][0] if self.am[loop][error_loop] else '' 239 | if re.sub('[1-9U]','', error_modifier) == 'TH': 240 | self.stats['fielding'].append(['E(TH)', error[1]]) 241 | else: 242 | self.stats['fielding'].append(['E', error[1]]) 243 | 244 | #append pass sequence (for location purposes) 245 | passes = re.sub('[^0-9]','', error+error_modifier) 246 | self.modifiers['passes'].append(passes) if passes else None 247 | 248 | if move[2] == 'H': 249 | run_describer = 'R' 250 | run_describer += '(UR)' if 'UR' in self.ad[loop] else '' 251 | run_describer += '(NR)' if 'NR' in self.ad[loop] else '' 252 | run_describer += '(RBI)' if 'RBI' in self.ad[loop] else '' 253 | run_describer += '(NORBI)' if 'NORBI' in self.ad[loop] else '' 254 | run_describer += '(TUR)' if 'TUR' in self.ad[loop] else '' 255 | 256 | self.stats['running'].append([run_describer,bfrom, bto]) 257 | 258 | ########################### end ############################## 259 | 260 | 261 | else: 262 | self.advances = out_in_advance(self.advances, bfrom=bfrom, bto=bto) 263 | 264 | ########################### stats ############################## 265 | PO = re.findall('[1-9U]$',self.ad[loop][0]) if self.ad[loop] else None 266 | 
if PO: 267 | self.stats['fielding'].append(['PO',PO[0]]) 268 | 269 | As = re.findall('[1-9U]+', self.ad[loop][0]) if self.ad[loop] else None 270 | if As: 271 | As = As[0] 272 | for a in As: 273 | self.stats['fielding'].append(['A',a]) if a not in PO else None 274 | 275 | self.stats['running'].append(['PO',bfrom, bto]) 276 | ########################### end ############################## 277 | passes = re.findall('[1-9U]+',self.ad[loop][0]) if self.ad[loop] else None 278 | #append pass sequence (for location purposes) 279 | self.modifiers['passes'].append(passes[0]) if passes else None 280 | 281 | #map other errors, if existing (remove error modifier loop) 282 | if len(self.ad[loop]) > 1: 283 | for loop2, describer in enumerate(self.ad[loop][1:]): 284 | other_error = re.findall('[1-9]*E[1-9]',describer) 285 | if other_error: 286 | 287 | error_modifier = self.am[loop][loop2][0] if self.am[loop][loop2] else '' 288 | #print ('error modifier', error_modifier) 289 | if re.sub('[1-9U]','', error_modifier) == 'TH': 290 | self.stats['fielding'].append(['E(TH)', other_error[0][-1]]) 291 | else: 292 | self.stats['fielding'].append(['E', other_error[0][-1]]) 293 | 294 | #append pass sequence (for location purposes) 295 | passes = re.sub('[^0-9]','', other_error[0]+error_modifier) 296 | self.modifiers['passes'].append(passes) if passes else None 297 | 298 | ''' 299 | elif re.findall('\-', move): 300 | bfrom = move[0] 301 | bto = move[2] 302 | self.advances = advance_base(self.advances, bfrom=bfrom, bto=bto) 303 | self.base = move_base(self.base, bfrom=bfrom, bto=bto) 304 | ########################### stats ############################## 305 | if bto == 'H': 306 | run_describer = 'R' 307 | run_describer += '(UR)' if 'UR' in self.ad[loop] else '' 308 | run_describer += '(NR)' if 'NR' in self.ad[loop] else '' 309 | run_describer += '(RBI)' if 'RBI' in self.ad[loop] else '' 310 | run_describer += '(NORBI)' if 'NORBI' in self.ad[loop] else '' 311 | run_describer += '(TUR)' if 'TUR' 
in self.ad[loop] else '' 312 | 313 | self.stats['running'].append([run_describer,bfrom, bto]) 314 | 315 | for describer_loop, describer in enumerate(self.ad[loop]): 316 | if re.findall('[1-9]*E[1-9]',describer): 317 | error = re.findall('E[1-9]',describer)[0] 318 | error_loop = describer_loop 319 | #print ('loop', loop,'error', error,'error loop', error_loop) 320 | 321 | if error: 322 | error_modifier = self.am[loop][error_loop][0] if self.am[loop][error_loop] else '' 323 | #print (self.am, self.str) 324 | 325 | if re.sub('[1-9U]','', error_modifier) == 'TH': 326 | self.stats['fielding'].append(['E(TH)', error[-1]]) 327 | else: 328 | self.stats['fielding'].append(['E', error[-1]]) 329 | 330 | #append pass sequence (for location purposes) 331 | passes = re.sub('[^0-9]','', error[0]+error_modifier) 332 | self.modifiers['passes'].append(passes) if passes else None 333 | 334 | ########################### end ############################## 335 | 336 | #map other errors, if existing (remove error modifier loop) 337 | if len(self.ad[loop]) > 1: 338 | 339 | for loop2, describer in enumerate(self.ad[loop]): 340 | if loop2 != error_loop: 341 | other_error = re.findall('[1-9]*E[1-9]',describer) 342 | if other_error: 343 | 344 | error_modifier = self.am[loop][loop2][0] if self.am[loop][loop2] else '' 345 | #print ('error modifier', error_modifier) 346 | if re.sub('[1-9U]','', error_modifier) == 'TH': 347 | self.stats['fielding'].append(['E(TH)', other_error[0][-1]]) 348 | else: 349 | self.stats['fielding'].append(['E', other_error[0][-1]]) 350 | 351 | #append pass sequence (for location purposes) 352 | passes = re.sub('[^0-9]','', other_error[0]+error_modifier) 353 | self.modifiers['passes'].append(passes) if passes else None 354 | 355 | else: 356 | self.log.debug('Explicit move not found: {0}'.format(move)) 357 | 358 | """""" 359 | def _play_null(self): 360 | self.main_play = out_in_advance(self.main_play, bfrom='B') if not self._is_explicit() else self.main_play#at bat is 
out 361 | self.base = leave_base(self.base, bfrom='B') if not self._is_explicit() else self.base 362 | 363 | def _play_flyout(self): 364 | if 'FO' not in mpm: 365 | self.main_play = out_in_advance(self.main_play, bfrom='B') if not self._is_explicit() else self.main_play#at bat is out 366 | self.base = leave_base(self.base, bfrom='B') if not self._is_explicit() else self.base 367 | 368 | if 'FO' in mpm and not re.findall('B', mp): 369 | self.main_play = advance_base(self.main_play, bfrom='B') if not self._is_explicit() else self.main_play #B-1 except if explicily moving on advances 370 | self.base = move_base(self.base, bfrom='B', bto='1') 371 | 372 | PO = mp[-1] 373 | As = mp[:-1] 374 | if As: 375 | for a in As: self.stats['fielding'].append(['A',a]) 376 | 377 | self.stats['batting'].append(['SF','']) if 'SF' in mpm else None 378 | self.stats['batting'].append(['SH','']) if 'SH' in mpm else None 379 | self.stats['batting'].append(['GDP','']) if 'GDP' in mpm else None 380 | 381 | passes = re.sub('(?:\([^\)]+\))','',mp) 382 | self.modifiers['passes'].append(passes) 383 | 384 | def _play_pass_outs(self): 385 | for base_out in re.findall('(?:\([B123]\))', mp): 386 | self.main_play = out_in_advance(self.main_play, bfrom=base_out[1]) #excluding at bat 387 | self.base = leave_base(self.base, bfrom=base_out[1]) 388 | 389 | if 'FO' in mpm and not re.findall('B', mp): 390 | self.main_play = advance_base(self.main_play, bfrom='B') if not self._is_explicit() else self.main_play #B-1 except if explicily moving on advances 391 | self.base = move_base(self.base, bfrom='B', bto='1') 392 | 393 | 394 | #Testing for double play 395 | double_play = False 396 | triple_play = False 397 | 398 | if 'BGDP' in mpm or 'BPDP' in mpm or 'DP' in mpm or 'FDP'in mpm or 'GDP' in mpm or 'LDP' in mpm: 399 | double_play = True 400 | 401 | if 'BGTP' in mpm or 'BPTP' in mpm or 'TP' in mpm or 'FTP' in mpm or 'GTP' in mpm or 'LTP' in mpm: 402 | triple_play = True 403 | 404 | 405 | if double_play and 
self.main_play['out'] + self.ad_out < 2: 406 | if 'FO' not in mpm: 407 | self.main_play = out_in_advance(self.main_play, bfrom='B') if not self._is_explicit() else self.main_play#at bat is out 408 | self.base = leave_base(self.base, bfrom='B') if not self._is_explicit() else self.base 409 | else: 410 | self.base = move_base(self.base, bfrom='B', bto='1') if not self._is_explicit() else self.base 411 | 412 | if triple_play and self.main_play['out'] + self.ad_out < 3: 413 | if 'FO' not in mpm: 414 | self.main_play = out_in_advance(self.main_play, bfrom='B') if not self._is_explicit() else self.main_play#at bat is out 415 | self.base = leave_base(self.base, bfrom='B') if not self._is_explicit() else self.base 416 | else: 417 | self.base = move_base(self.base, bfrom='B', bto='1') if not self._is_explicit() else self.base 418 | 419 | ########################### stats ############################## 420 | fielder1 = re.findall('^[1-9]$', mp) #flyball, not always present 421 | fielders2 = re.findall('[1-9]\(', mp)#$$()$ play, with explicit outs 422 | fielders2 = [x.replace('(','') for x in fielders2] if fielders2 else [] 423 | fielders3 = re.findall('^[1-9][1-9]+$', mp) #when its a sequence and out 424 | fielders3 = [fielders3[0][-1]] if fielders3 else [] 425 | fielders4 = [mp[-1]] if re.findall('[1-9]$', mp) and 'GDP' in mpm else [] #it was a Ground into Double Play 426 | 427 | POs = fielder1 + fielders2 + fielders3 + fielders4 428 | 429 | double_play = False 430 | triple_play = False 431 | 432 | if 'BGDP' in mpm or 'BPDP' in mpm or 'DP' in mpm or 'FDP'in mpm or 'GDP' in mpm or 'LDP' in mpm: 433 | double_play = True 434 | 435 | if 'BGTP' in mpm or 'BPTP' in mpm or 'TP' in mpm or 'FTP' in mpm or 'GTP' in mpm or 'LTP' in mpm: 436 | triple_play = True 437 | 438 | for po in POs: 439 | self.stats['fielding'].append(['PO',po[0]]) 440 | self.stats['fielding'].append(['DP',po[0]]) if double_play else None 441 | self.stats['fielding'].append(['TP',po[0]]) if triple_play else None 
442 | 443 | all_fielders_touched = re.sub(r'\([^)]*\)', '', mp) 444 | for fielder in all_fielders_touched: 445 | if fielder not in POs: 446 | self.stats['fielding'].append(['A',fielder]) 447 | 448 | 449 | self.stats['batting'].append(['SF','']) if 'SF' in mpm else None 450 | self.stats['batting'].append(['SH','']) if 'SH' in mpm else None 451 | self.stats['batting'].append(['GDP','']) if 'GDP' in mpm else None 452 | 453 | passes = re.sub('(?:\([^\)]+\))','',mp) 454 | self.modifiers['passes'].append(passes) 455 | 456 | def _play_error_on_out(self): 457 | self.main_play = advance_base(self.main_play, bfrom='B') if not self._is_explicit() else self.main_play #B-1 except if explicily moving on advances 458 | self.base = move_base(self.base, bfrom='B', bto='1') if not self._is_explicit() else self.base #B-1 except if explicily moving on advances 459 | 460 | ########################### stats ############################## 461 | error_fielder = re.findall('E[1-9]$', mp)[0] 462 | self.stats['fielding'].append(['E',error_fielder[1]]) 463 | 464 | def _play_cs(self): 465 | for cs in mp.split(';'): 466 | bto = cs[2] 467 | bfrom = PREVIOUS_BASE[cs[2]] 468 | self.main_play = out_in_advance(self.main_play, bto=cs[2]) 469 | self.base = leave_base(self.base, bfrom=bfrom) if not self._is_explicit() else self.base 470 | 471 | ########################### stats ############################## 472 | self.stats['running'].append(['CS',bfrom, bto]) 473 | 474 | PO = re.findall('[1-9]\)', cs) 475 | if PO: 476 | PO = PO[0].replace(')','') 477 | self.stats['fielding'].append(['PO',PO[0]]) 478 | 479 | As = re.findall('(?:\([^\(]+\))', cs) 480 | if As: 481 | As = As[0].replace('(','').replace(')','') 482 | for a in As: 483 | if a not in PO: 484 | self.stats['fielding'].append(['A',a]) 485 | 486 | passes = re.sub('CS[23H]','', cs).replace('(','').replace(')','').replace('E','') 487 | if passes: 488 | self.modifiers['passes'].append(passes) 489 | ########################### end 
################################ 490 | 491 | def _play_cs_error(self): 492 | #the advance could also be explicit given the error, for more than one base. 493 | for cs in mp.split(';'): 494 | bto = cs[2] 495 | bfrom = PREVIOUS_BASE[cs[2]] 496 | self.main_play = advance_base(self.main_play, bto=bto) if not self._is_explicit(bfrom=bfrom) else self.main_play 497 | self.base = move_base(self.base, bfrom=bfrom, bto=bto) if not self._is_explicit(bfrom=bfrom) else self.base #B-1 except if explicily moving on advances 498 | 499 | ########################### stats ############################## 500 | self.stats['running'].append(['CS(E)',bfrom, bto]) #caught stealing w error 501 | 502 | As = re.findall('^(?:\([1-9]+E)+', cs) 503 | if As: 504 | As = As[0].replace('E','').replace('(','') 505 | for a in As: 506 | self.stats['fielding'].append(['A',a]) 507 | 508 | 509 | error_fielder = re.findall('E[1-9]', cs)[0] 510 | self.stats['fielding'].append(['E',error_fielder[1]]) 511 | 512 | passes = re.sub('CS[23H]','', cs).replace('(','').replace(')','').replace('E','') 513 | if passes: 514 | self.modifiers['passes'].append(passes) 515 | ########################### end ################################ 516 | 517 | def _play_balk(self): 518 | self.stats['pitching'].append(['BK','1']) 519 | 520 | def _play_double(self): 521 | self.main_play = advance_base(self.main_play, bto='2',bfrom='B') if not self._is_explicit() else self.main_play 522 | self.base = move_base(self.base, bfrom='B', bto='2') if not self._is_explicit() else self.base #B-1 except if explicily moving on advances 523 | ########################### stats ############################## 524 | self.stats['batting'].append(['2B','']) 525 | self.stats['batting'].append(['H','']) #hit 526 | self.stats['pitching'].append(['H','1']) 527 | 528 | passes = re.findall('[0-9]', mp) 529 | if passes: 530 | self.modifiers['passes'].append(passes[0]) 531 | ########################### end ################################ 532 | 533 | def 
_play_grd(self): #ground rule double
        self.main_play = advance_base(self.main_play, bto='2',bfrom='B') if not self._is_explicit() else self.main_play
        self.base = move_base(self.base, bfrom='B', bto='2') if not self._is_explicit() else self.base #B-2 except if explicitly moving on advances
        ########################### stats ##############################
        self.stats['batting'].append(['DGR',''])
        self.stats['batting'].append(['H','']) #hit
        self.stats['pitching'].append(['H','1'])

        passes = re.findall('[0-9]+', mp)
        if passes:
            self.modifiers['passes'].append(passes[0])

    def _play_di(self): #defensive indifference
        ########################### stats ##############################
        for explicit_move in self.em:
            bto = explicit_move[0][2]
            bfrom = explicit_move[0][0]
            self.stats['running'].append(['DI',bfrom, bto])
        ########################### end ################################

    def _play_error2(self):
        self.main_play = advance_base(self.main_play, bfrom='B') if not self._is_explicit() else self.main_play
        self.base = move_base(self.base, bfrom='B',bto='1') if not self._is_explicit() else self.base #B-1 except if explicitly moving on advances
        ########################### stats ##############################
        error_fielder = re.findall('E[1-9]$', mp)[0]
        if 'TH' in mpm: #throwing error
            self.stats['fielding'].append(['E(TH)',error_fielder[1]])
        else:
            self.stats['fielding'].append(['E',error_fielder[1]])

        passes = re.findall('[0-9]+', mp)
        if passes:
            self.modifiers['passes'].append(passes[0])
        ########################### end ################################

    def _play_fc(self): #fielder's choice
        self.main_play = advance_base(self.main_play, bfrom='B') if not self._is_explicit() else self.main_play
        self.base = move_base(self.base, bfrom='B', bto='1') if not self._is_explicit() else self.base #B-1 except if explicitly moving on advances
        ########################### stats ##############################
        self.stats['batting'].append(['FC',''])
        if len(mp) > 2:
            self.stats['fielding'].append(['FC',mp[2]])
            self.modifiers['passes'].append(mp[2])
        ########################### end ################################

    def _play_fle(self): # error on foul fly play (error given to the play but no advances)
        ########################### stats ##############################
        self.stats['fielding'].append(['FLE',mp[3]])
        ########################### end ################################

    def _play_home_run(self):
        self.main_play = advance_base(self.main_play, bto='H',bfrom='B') if not self._is_explicit() else self.main_play
        self.base = move_base(self.base, bfrom='B', bto='H') if not self._is_explicit() else self.base #B-H except if explicitly moving on advances
        ########################### stats ##############################
        self.stats['running'].append(['R','B', 'H'])

        self.stats['batting'].append(['HR','']) #home run
        self.stats['pitching'].append(['HR','1'])

        self.stats['batting'].append(['H','']) #hit
        self.stats['pitching'].append(['H','1'])

        self.stats['batting'].append(['R','']) #run

        if 'IPHR' in mpm:
            self.stats['batting'].append(['IPHR',''])
        ########################### end ################################

    def _play_hb(self):
        self.main_play = advance_base(self.main_play, bfrom='B') if not self._is_explicit() else self.main_play
        self.base = move_base(self.base, bfrom='B', bto='1') if not self._is_explicit() else self.base #B-1 except if explicitly moving on advances
        ########################### stats ##############################
        self.stats['batting'].append(['HBP','']) #hit by pitch
        ########################### end ################################

    def _play_walk(self):
        self.main_play = advance_base(self.main_play, bfrom='B') if not self._is_explicit() else self.main_play
        self.base = move_base(self.base, bfrom='B', bto='1') if not self._is_explicit() else self.base #B-1 except if explicitly moving on advances
        ########################### stats ##############################
        self.stats['batting'].append(['BB','']) #base on balls
        self.stats['pitching'].append(['BB','1']) #base on balls
        ########################### end ################################

    def _play_iwalk(self):
        self.main_play = advance_base(self.main_play, bfrom='B') if not self._is_explicit() else self.main_play
        self.base = move_base(self.base, bfrom='B', bto='1') if not self._is_explicit() else self.base #B-1 except if explicitly moving on advances
        ########################### stats ##############################
        self.stats['batting'].append(['IBB','']) #intentional base on balls
        self.stats['pitching'].append(['IBB','1']) #intentional base on balls
        ########################### end ################################

    def _play_strikeout(self):
        self.main_play = out_in_advance(self.main_play, bfrom='B') if not self._is_explicit() else self.main_play
        self.base = leave_base(self.base, bfrom='B') if not self._is_explicit() else self.base

        ########################### stats ##############################
        self.stats['batting'].append(['K','']) #strikeout
        self.stats['fielding'].append(['PO','2']) #strikeout
        self.stats['pitching'].append(['K','1']) #strikeout
        self.stats['batting'].append(['SF','']) if 'SF' in mpm else None
        self.stats['batting'].append(['SH','']) if 'SH' in mpm else None
        ########################### end ################################

    def _play_pb(self):
        ########################### stats ##############################
        self.stats['fielding'].append(['PB','2'])
        ########################### end ################################

    def _play_po(self):
        bfrom = mp[2]
        bto = NEXT_BASE[mp[2]]

        self.main_play = out_in_advance(self.main_play, bfrom=bfrom) if not self._is_explicit(bfrom) else self.main_play
        self.base = leave_base(self.base, bfrom=bfrom) if not self._is_explicit() else self.base

        ########################### stats ##############################
        PO = re.findall(r'[1-9]\)', mp)
        if PO:
            PO = PO[0].replace(')','')
            self.stats['fielding'].append(['PO',PO[0]])

        As = re.findall(r'(?:\([^\(]+\))', mp)
        if As:
            As = As[0].replace('(','').replace(')','')
            for a in As:
                if a not in PO:
                    self.stats['fielding'].append(['A',a])

        passes = re.sub(r'PO[123]\(','', mp).replace(')','').replace('E','')
        self.modifiers['passes'].append(passes)


        self.stats['running'].append(['PO',bfrom, bfrom]) #player never moved base
        ########################### end ################################

    def _play_po_error(self):
        ########################### stats ##############################
        bfrom = mp[2]
        bto = NEXT_BASE[mp[2]]
        self.stats['running'].append(['PO(E)',bfrom, bto])
        As = re.findall(r'^(?:\([1-9]+E)+', mp) #assists to other players
        if As:
            As = As[0].replace('E','').replace('(','')
            for a in As:
                self.stats['fielding'].append(['A',a])

        passes = re.sub(r'PO[123]\(','', mp).replace(')','').replace('E','')
        self.modifiers['passes'].append(passes)

        error_fielder = re.findall('E[1-9]', mp)[0]
        self.stats['fielding'].append(['E',error_fielder[1]])
        ########################### end ################################

    def _play_pocs(self):
        for split in mp.split(';'):
            if split[0:2] == 'CS':
                bto = split[2]
                bfrom = PREVIOUS_BASE[split[2]]
                self.main_play = out_in_advance(self.main_play, bto=bto) if not self._is_explicit(bfrom) else self.main_play
                self.base = leave_base(self.base, bfrom=bfrom) if not self._is_explicit() else self.base
                self.stats['running'].append(['CS',bfrom, bto])
            else:
                bto = split[4]
                bfrom = PREVIOUS_BASE[split[4]]
                self.main_play = out_in_advance(self.main_play, bto=bto) if not self._is_explicit(bfrom) else self.main_play #there are CS events together with POCS
                self.base = leave_base(self.base, bfrom=bfrom) if not self._is_explicit() else self.base
                self.stats['running'].append(['CS',bfrom, bto])

            ########################### stats ##############################

            PO = re.findall(r'[1-9]\)', split)
            if PO:
                PO = PO[0].replace(')','')
                self.stats['fielding'].append(['PO',PO[0]])

            As = re.findall(r'(?:\([^\(]+\))', split)
            if As:
                As = As[0].replace('(','').replace(')','')
                for a in As:
                    if a not in PO:
                        self.stats['fielding'].append(['A',a])

            passes = re.sub(r'POCS[23H]\(','', mp).replace(')','').replace('E','')
            self.modifiers['passes'].append(passes)
            ########################### end ################################

    def _play_pocs_error(self):
        ########################### stats ##############################
        bto = mp[4]
        bfrom = PREVIOUS_BASE[mp[4]]
        self.stats['running'].append(['CS(E)',bfrom, bto])

        As = re.findall(r'^(?:\([1-9]+E)+', mp) #assists to other players
        if As:
            As = As[0].replace('E','').replace('(','')
            for a in As:
                self.stats['fielding'].append(['A',a])

        error_fielder = re.findall('E[1-9]', mp)[0]
        self.stats['fielding'].append(['E',error_fielder[1]])

        passes = re.sub(r'POCS[23H]\(','', mp).replace(')','').replace('E','')
        self.modifiers['passes'].append(passes)
        ########################### end ################################

    def _play_single(self):
        self.main_play = advance_base(self.main_play, bfrom='B') if not self._is_explicit() else self.main_play
        self.base = move_base(self.base, bfrom='B', bto='1') if not self._is_explicit() else self.base #B-1 except if explicitly moving on advances
        ########################### stats ##############################
        self.stats['batting'].append(['1B','']) #single
        self.stats['batting'].append(['H','']) #hit
        self.stats['pitching'].append(['H','1'])

        passes = re.findall('[0-9]', mp)
        if passes:
            self.modifiers['passes'].append(passes[0])
        ########################### end ################################

    def _play_stolen_base(self):
        for sb in mp.split(';'):
            if sb[0:2] == 'SB':
                bto = sb[2]
                bfrom = PREVIOUS_BASE[sb[2]]
                self.main_play = advance_base(self.main_play, bto=sb[2]) if not self._is_explicit(bfrom) else self.main_play
                self.base = move_base(self.base, bfrom=bfrom, bto=bto) if not self._is_explicit(bfrom) else self.base #one base forward except if explicitly moving on advances
                ########################### stats ##############################
                self.stats['running'].append(['SB',bfrom, bto])
                self.stats['running'].append(['R',bfrom, bto]) if sb[2] == 'H' else None
                ########################### end ################################

    def _play_triple(self):
        self.main_play = advance_base(self.main_play, bfrom='B', bto='3') if not self._is_explicit() else self.main_play
        self.base = move_base(self.base, bfrom='B', bto='3') if not self._is_explicit() else self.base #B-3 except if explicitly moving on advances
        ########################### stats ##############################
        self.stats['batting'].append(['3B',''])
        self.stats['batting'].append(['H','']) #hit

        passes = re.findall('[0-9]', mp)
        if passes:
            self.modifiers['passes'].append(passes[0])
        ########################### end ################################

    def _play_wp(self):
        ########################### stats ##############################
        self.stats['pitching'].append(['WP','1'])
        ########################### end ################################

    def _play_ci(self):
        if 'E1' in mpm :
            ########################### stats ##############################
            self.stats['fielding'].append(['E','1'])
            ########################### end ################################
        elif 'E2' in mpm:
            ########################### stats ##############################
            self.stats['fielding'].append(['CI','2'])
            ########################### end ################################
        elif 'E3' in mpm:
            ########################### stats ##############################
            self.stats['fielding'].append(['E','3'])
            ########################### end ################################



    def _main_play(self, mp, mpm):
        """Parse main play"""

        if mp == '99': #error or unknown --> usually out
            self.main_play = out_in_advance(self.main_play, bfrom='B') if not self._is_explicit() else self.main_play #at bat is out
            self.base = leave_base(self.base, bfrom='B') if not self._is_explicit() else self.base

        elif re.findall('^[1-9]', mp) and not re.findall(r'\(', mp) and not re.findall('E', mp):

            #single out, or without multiple plays

            if 'FO' not in mpm:
                self.main_play = out_in_advance(self.main_play, bfrom='B') if not self._is_explicit() else self.main_play #at bat is out
                self.base = leave_base(self.base, bfrom='B') if not self._is_explicit() else self.base

            if 'FO' in mpm and not re.findall('B', mp):
                self.main_play = advance_base(self.main_play, bfrom='B') if not self._is_explicit() else self.main_play #B-1 except if explicitly moving on advances
                self.base = move_base(self.base, bfrom='B', bto='1')

            PO = mp[-1]
            As = mp[:-1]
            if As:
                for a in As:
                    self.stats['fielding'].append(['A',a])

            self.stats['batting'].append(['SF','']) if 'SF' in mpm else None
            self.stats['batting'].append(['SH','']) if 'SH' in mpm else None
            self.stats['batting'].append(['GDP','']) if 'GDP' in mpm else None

            passes = re.sub(r'(?:\([^\)]+\))','',mp)
            self.modifiers['passes'].append(passes)


        elif re.findall(r'^[1-9](?:[1-9]*(?:\([B123]\))?)*\+?\-?$', mp): # implicit B out or not

            for base_out in re.findall(r'(?:\([B123]\))', mp):
                expression = r'[\-]{0}'.format(base_out[1])
                moves = self.str.split('.')[-1] #advances section of the event string
                if not re.findall(expression, moves) and base_out[1] not in self.move_on_error: #no player moved to that base in advances
                    self.main_play = out_in_advance(self.main_play, bfrom=base_out[1]) #excluding at bat
                    self.base = leave_base(self.base, bfrom=base_out[1])
                else:
                    self.main_play['out'] += 1
                    #self.base = leave_base(self.base, bfrom=base_out[1])

            if 'FO' in mpm and not re.findall('B', mp):
                self.main_play = advance_base(self.main_play, bfrom='B') if not self._is_explicit() else self.main_play #B-1 except if explicitly moving on advances
                self.base = move_base(self.base, bfrom='B', bto='1') if not self._is_explicit() else self.base


            #Testing for double play
            double_play = False
            triple_play = False

            if 'BGDP' in mpm or 'BPDP' in mpm or 'DP' in mpm or 'FDP' in mpm or 'GDP' in mpm or 'LDP' in mpm:
                double_play = True

            if 'BGTP' in mpm or 'BPTP' in mpm or 'TP' in mpm or 'FTP' in mpm or 'GTP' in mpm or 'LTP' in mpm:
                triple_play = True

            if double_play and not re.findall('B', mp) and (self.main_play['out'] + self.ad_out) == 2:
                # E.G: 5(2)4(1)/GDP --> b advanced to first
                self.main_play = advance_base(self.main_play, bfrom='B') if not self._is_explicit() else self.main_play
                self.base = move_base(self.base, bfrom='B',bto='1') if not self._is_explicit() else self.base

            if not double_play and not re.findall('B', mp):
                # E.G.: 16(1)/F
                self.main_play = advance_base(self.main_play, bfrom='B') if not self._is_explicit() else self.main_play
                self.base = move_base(self.base, bfrom='B',bto='1') if not self._is_explicit() else self.base

            if double_play and self.main_play['out'] + self.ad_out < 2:
                if 'FO' not in mpm:
                    self.main_play = out_in_advance(self.main_play, bfrom='B') if not self._is_explicit() else self.main_play #at bat is out
                    self.base = leave_base(self.base, bfrom='B') if not self._is_explicit() else self.base
                else:
                    self.base = move_base(self.base, bfrom='B', bto='1') if not self._is_explicit() else self.base


            if triple_play and self.main_play['out'] + self.ad_out < 3:
                if 'FO' not in mpm:
                    self.main_play = out_in_advance(self.main_play, bfrom='B') if not self._is_explicit() else self.main_play #at bat is out
                    self.base = leave_base(self.base, bfrom='B') if not self._is_explicit() else self.base
                else:
                    self.base = move_base(self.base, bfrom='B', bto='1') if not self._is_explicit() else self.base

            ########################### stats ##############################
            fielder1 = re.findall('^[1-9]$', mp) #flyball, not always present
            fielders2 = re.findall(r'[1-9]\(', mp) #$$()$ play, with explicit outs
            fielders2 = [x.replace('(','') for x in fielders2] if fielders2 else []
            fielders3 = re.findall('^[1-9][1-9]+$', mp) #when it's a sequence and out
            fielders3 = [fielders3[0][-1]] if fielders3 else []
            fielders4 = [mp[-1]] if re.findall('[1-9]$', mp) and 'GDP' in mpm else [] #it was a Ground into Double Play

            POs = fielder1 + fielders2 + fielders3 + fielders4

            double_play = False
            triple_play = False

            if 'BGDP' in mpm or 'BPDP' in mpm or 'DP' in mpm or 'FDP' in mpm or 'GDP' in mpm or 'LDP' in mpm:
                double_play = True

            if 'BGTP' in mpm or 'BPTP' in mpm or 'TP' in mpm or 'FTP' in mpm or 'GTP' in mpm or 'LTP' in mpm:
                triple_play = True

            for po in POs:
                self.stats['fielding'].append(['PO',po[0]])
                self.stats['fielding'].append(['DP',po[0]]) if double_play else None
                self.stats['fielding'].append(['TP',po[0]]) if triple_play else None

            all_fielders_touched = re.sub(r'\([^)]*\)', '', mp)
            for fielder in all_fielders_touched:
                if fielder not in POs:
                    self.stats['fielding'].append(['A',fielder])


            self.stats['batting'].append(['SF','']) if 'SF' in mpm else None
            self.stats['batting'].append(['SH','']) if 'SH' in mpm else None
            self.stats['batting'].append(['GDP','']) if 'GDP' in mpm else None

            passes = re.sub(r'(?:\([^\)]+\))','',mp)
            self.modifiers['passes'].append(passes)
            ########################### end ##############################

        elif re.findall('^[1-9][1-9]*E[1-9]*$', mp): #error on out, B-1 implicit if not explicit
            self.main_play = advance_base(self.main_play, bfrom='B') if not self._is_explicit() else self.main_play #B-1 except if explicitly moving on advances
            self.base = move_base(self.base, bfrom='B', bto='1') if not self._is_explicit() else self.base #B-1 except if explicitly moving on advances

            ########################### stats ##############################
            error_fielder = re.findall('E[1-9]$', mp)[0]
            self.stats['fielding'].append(['E',error_fielder[1]])
            ########################### end ##############################

        elif re.findall(r'^CS[23H](?:\([1-9]+\))+', mp): #caught stealing (except errors)
            for cs in mp.split(';'):
                bto = cs[2]
                bfrom = PREVIOUS_BASE[cs[2]]

                if re.findall(r'[\-X]{0}'.format(bfrom), self.str.split('.')[-1]):

                    self.main_play['out'] += 1
                else:

                    self.main_play = out_in_advance(self.main_play, bto=cs[2])
                    self.base = leave_base(self.base, bfrom=bfrom) if not self._is_explicit(bfrom) else self.base

                ########################### stats ##############################
                self.stats['running'].append(['CS',bfrom, bto])

                PO = re.findall(r'[1-9]\)', cs)
                if PO:
                    PO = PO[0].replace(')','')
                    self.stats['fielding'].append(['PO',PO[0]])

                As = re.findall(r'(?:\([^\(]+\))', cs)
                if As:
                    As = As[0].replace('(','').replace(')','')
                    for a in As:
                        if a not in PO:
                            self.stats['fielding'].append(['A',a])

                passes = re.sub('CS[23H]','', cs).replace('(','').replace(')','').replace('E','')
                if passes:
                    self.modifiers['passes'].append(passes)
                ########################### end ################################

        elif re.findall(r'^CS[23H](?:\([1-9]*E[1-9]+)+', mp): #caught stealing errors
            #the advance could also be explicit given the error, for more than one base.
            for cs in mp.split(';'):
                bto = cs[2]
                bfrom = PREVIOUS_BASE[cs[2]]

                if not self._is_explicit(bfrom):
                    if re.findall(r'[\-X]{0}'.format(bfrom), self.str.split('.')[-1]):
                        self.main_play[bto] = 1
                        if bto == 'H' or bfrom == '3':
                            self.main_play['run'] += 1

                        if bto=='H':
                            self.base[bto].append(self.base[bfrom])
                        else:
                            self.base[bto] = self.base[bfrom]

                    else:
                        self.main_play = advance_base(self.main_play, bto=bto)
                        self.base = move_base(self.base, bfrom=bfrom, bto=bto)

                ########################### stats ##############################
                self.stats['running'].append(['CS(E)',bfrom, bto]) #caught stealing with error

                As = re.findall(r'^(?:\([1-9]+E)+', cs)
                if As:
                    As = As[0].replace('E','').replace('(','')
                    for a in As:
                        self.stats['fielding'].append(['A',a])


                error_fielder = re.findall('E[1-9]', cs)[0]
                self.stats['fielding'].append(['E',error_fielder[1]])

                passes = re.sub('CS[23H]','', cs).replace('(','').replace(')','').replace('E','')
                if passes:
                    self.modifiers['passes'].append(passes)
                ########################### end ################################


        elif re.findall('^BK$', mp): #balk (batter remains but all others get one base)

            ########################### stats ##############################
            self.stats['pitching'].append(['BK','1'])
            ########################### end ################################

        elif re.findall(r'^D[0-9]*\??$', mp): #double
            self.main_play = advance_base(self.main_play, bto='2',bfrom='B') if not self._is_explicit() else self.main_play
            self.base = move_base(self.base, bfrom='B', bto='2') if not self._is_explicit() else self.base #B-2 except if explicitly moving on advances
            ########################### stats ##############################
            self.stats['batting'].append(['2B',''])
            self.stats['batting'].append(['H','']) #hit
            self.stats['pitching'].append(['H','1'])

            passes = re.findall('[0-9]', mp)
            if passes:
                self.modifiers['passes'].append(passes[0])
            ########################### end ################################

        elif re.findall('^DGR[0-9]*$', mp): #ground rule double (two bases for everyone as ball went out after being in)
            self.main_play = advance_base(self.main_play, bto='2',bfrom='B') if not self._is_explicit() else self.main_play
            self.base = move_base(self.base, bfrom='B', bto='2') if not self._is_explicit() else self.base #B-2 except if explicitly moving on advances
            ########################### stats ##############################
            self.stats['batting'].append(['DGR',''])
            self.stats['batting'].append(['H','']) #hit
            self.stats['pitching'].append(['H','1'])

            passes = re.findall('[0-9]+', mp)
            if passes:
                self.modifiers['passes'].append(passes[0])
            ########################### end ################################

        elif re.findall('^DI$', mp): #defensive indifference

            ########################### stats ##############################
            for explicit_move in self.em:
                bto = explicit_move[0][2]
                bfrom = explicit_move[0][0]
                self.stats['running'].append(['DI',bfrom, bto])
            ########################### end ################################

        elif re.findall(r'^E[1-9]+\??$', mp): #error allowing batter to get on base (B-1 implicit or not)

            if not re.findall('K', self.mp[0]): #it is an error but not on second event following strike
                self.main_play = advance_base(self.main_play, bfrom='B') if not self._is_explicit() else self.main_play
                self.base = move_base(self.base, bfrom='B',bto='1') if not self._is_explicit() else self.base #B-1 except if explicitly moving on advances
            ########################### stats ##############################
            error_fielder = re.findall('E[1-9]$', mp)[0]
            if 'TH' in mpm: #throwing error
                self.stats['fielding'].append(['E(TH)',error_fielder[1]])
            else:
                self.stats['fielding'].append(['E',error_fielder[1]])

            passes = re.findall('[0-9]+', mp)
            if passes:
                self.modifiers['passes'].append(passes[0])
            ########################### end ################################

        elif re.findall(r'^FC[1-9]?\??$',mp): #fielder's choice (also implicit B-1)
            self.main_play = advance_base(self.main_play, bfrom='B') if not self._is_explicit() else self.main_play
            self.base = move_base(self.base, bfrom='B', bto='1') if not self._is_explicit() else self.base #B-1 except if explicitly moving on advances
            ########################### stats ##############################
            self.stats['batting'].append(['FC',''])
            if len(mp) > 2:
                self.stats['fielding'].append(['FC',mp[2]])
                self.modifiers['passes'].append(mp[2])
            ########################### end ################################

        elif re.findall('^FLE[1-9]+$',mp): # error on foul fly play (error given to the play but no advances)

            ########################### stats ##############################
            self.stats['fielding'].append(['FLE',mp[3]])
            ########################### end ################################

        elif re.findall('^H[R]?[1-9]*[D]?$', mp): #home run
            self.main_play = advance_base(self.main_play, bto='H',bfrom='B') if not self._is_explicit() else self.main_play
            self.base = move_base(self.base, bfrom='B', bto='H') if not self._is_explicit() else self.base #B-H except if explicitly moving on advances
            ########################### stats ##############################
            self.stats['running'].append(['R','B', 'H'])

            self.stats['batting'].append(['HR','']) #home run
            self.stats['pitching'].append(['HR','1'])

            self.stats['batting'].append(['H','']) #hit
            self.stats['pitching'].append(['H','1'])

            self.stats['batting'].append(['R','']) #run

            if 'IPHR' in mpm:
                self.stats['batting'].append(['IPHR',''])
            ########################### end ################################


        elif re.findall('^HP$', mp): #hit by pitch
            self.main_play = advance_base(self.main_play, bfrom='B') if not self._is_explicit() else self.main_play
            self.base = move_base(self.base, bfrom='B', bto='1') if not self._is_explicit() else self.base #B-1 except if explicitly moving on advances
            ########################### stats ##############################
            self.stats['batting'].append(['HBP','']) #hit by pitch
            ########################### end ################################

        elif re.findall('^W[^P]',mp) or mp=='W': # walk
            self.main_play = advance_base(self.main_play, bfrom='B') if not self._is_explicit() else self.main_play
            self.base = move_base(self.base, bfrom='B', bto='1') if not self._is_explicit() else self.base #B-1 except if explicitly moving on advances
            ########################### stats ##############################
            self.stats['batting'].append(['BB','']) #base on balls
            self.stats['pitching'].append(['BB','1']) #base on balls
            ########################### end ################################

        elif re.findall('^I[W]?',mp): # intentional walk
            self.main_play = advance_base(self.main_play, bfrom='B') if not self._is_explicit() else self.main_play
            self.base = move_base(self.base, bfrom='B', bto='1') if not self._is_explicit() else self.base #B-1 except if explicitly moving on advances
            ########################### stats ##############################
            self.stats['batting'].append(['IBB','']) #intentional base on balls
            self.stats['pitching'].append(['IBB','1']) #intentional base on balls
            ########################### end ################################

        elif re.findall('^K',mp): #strikeout
            self.main_play = out_in_advance(self.main_play, bfrom='B') if not self._is_explicit() else self.main_play
            self.base = leave_base(self.base, bfrom='B') if not self._is_explicit() else self.base

            ########################### stats ##############################
            self.stats['batting'].append(['K','']) #strikeout
            self.stats['fielding'].append(['PO','2']) #strikeout
            self.stats['pitching'].append(['K','1']) #strikeout
            self.stats['batting'].append(['SF','']) if 'SF' in mpm else None
            self.stats['batting'].append(['SH','']) if 'SH' in mpm else None
            ########################### end ################################

        elif re.findall('^NP$',mp): #no play
            pass

        elif re.findall('^(?:OA)?(?:99)?$',mp): #unknown play
            pass

        elif re.findall('^PB$', mp): #passed ball
            #will keep any advancement to explicit for now. Otherwise uncomment below
            #self.main_play = advance_base(self.main_play, bfrom='B') if not self._is_explicit() else self.main_play
            #self.base = move_base(self.base, bfrom='B', bto='1') if not self._is_explicit() else self.base #B-1 except if explicitly moving on advances
            ########################### stats ##############################
            self.stats['fielding'].append(['PB','2'])
            ########################### end ################################

        elif re.findall(r'^PO[123](?:\([1-9]+\))',mp): #picked off of base (without error)
            bfrom = mp[2]
            bto = NEXT_BASE[mp[2]]

            if re.findall(r'[\-X]{0}'.format(bfrom), self.str.split('.')[-1]):
                self.main_play['out'] += 1
            else:
                self.main_play = out_in_advance(self.main_play, bto=bto) if not self._is_explicit(bfrom) else self.main_play
                self.base = leave_base(self.base, bfrom=bfrom) if not self._is_explicit(bfrom) else self.base

            #self.main_play = out_in_advance(self.main_play, bfrom=bfrom) if not self._is_explicit(bfrom) else self.main_play
            #self.base = leave_base(self.base, bfrom=bfrom) if not self._is_explicit(bfrom) else self.base

            ########################### stats ##############################
            PO = re.findall(r'[1-9]\)', mp)
            if PO:
                PO = PO[0].replace(')','')
                self.stats['fielding'].append(['PO',PO[0]])

            As = re.findall(r'(?:\([^\(]+\))', mp)
            if As:
                As = As[0].replace('(','').replace(')','')
                for a in As:
                    if a not in PO:
                        self.stats['fielding'].append(['A',a])

            passes = re.sub(r'PO[123]\(','', mp).replace(')','').replace('E','')
            self.modifiers['passes'].append(passes)


            self.stats['running'].append(['PO',bfrom, bfrom]) #player never moved base
            ########################### end ################################

        elif re.findall(r'^PO[123](?:\([1-9]*E[1-9]+)',mp): #pick off with pass error (no out, nothing implicit)

            ########################### stats ##############################
            bfrom = mp[2]
            bto = NEXT_BASE[mp[2]]
            self.stats['running'].append(['PO(E)',bfrom, bto])
            As = re.findall(r'^(?:\([1-9]+E)+', mp) #assists to other players
            if As:
                As = As[0].replace('E','').replace('(','')
                for a in As:
                    self.stats['fielding'].append(['A',a])

            passes = re.sub(r'PO[123]\(','', mp).replace(')','').replace('E','')
            self.modifiers['passes'].append(passes)

            error_fielder = re.findall('E[1-9]', mp)[0]
            self.stats['fielding'].append(['E',error_fielder[1]])
            ########################### end ################################

        elif re.findall(r'^POCS[23H](?:\([1-9]+\))',mp): #POCS%($$) picked off of base % (2, 3 or H) with the runner charged with a caught stealing

            for split in mp.split(';'):
                if split[0:2] == 'CS':
                    bto = split[2]
                    bfrom = PREVIOUS_BASE[split[2]]

                    if re.findall(r'[\-]{0}'.format(bfrom), self.str.split('.')[-1]) or bfrom in self.move_on_error:
                        self.main_play['out'] += 1
                    else:
                        self.main_play = out_in_advance(self.main_play, bto=bto) if not self._is_explicit(bfrom) else self.main_play
                        self.base = leave_base(self.base, bfrom=bfrom) if not self._is_explicit(bfrom) else self.base

                    #self.main_play = out_in_advance(self.main_play, bto=bto) if not self._is_explicit(bfrom) else self.main_play
                    #self.base = leave_base(self.base, bfrom=bfrom) if not self._is_explicit(bfrom) else self.base


                    self.stats['running'].append(['CS',bfrom, bto])
                else:
                    bto = split[4]
                    bfrom = PREVIOUS_BASE[split[4]]

                    if re.findall(r'[\-]{0}'.format(bfrom), self.str.split('.')[-1]) or bfrom in self.move_on_error:
                        self.main_play['out'] += 1
                    else:
                        self.main_play = out_in_advance(self.main_play, bto=bto) if not self._is_explicit(bfrom) else self.main_play
                        self.base = leave_base(self.base, bfrom=bfrom) if not self._is_explicit(bfrom) else self.base

                    #out_in_advance( self.main_play, bto=bto) if not self._is_explicit(bfrom) else self.main_play #there are CS events together with POCS
                    #self.base = leave_base(self.base, bfrom=bfrom) if not self._is_explicit(bfrom) else self.base

                    self.stats['running'].append(['CS',bfrom, bto])

                ########################### stats ##############################

                PO = re.findall(r'[1-9]\)', split)
                if PO:
                    PO = PO[0].replace(')','')
                    self.stats['fielding'].append(['PO',PO[0]])

                As = re.findall(r'(?:\([^\(]+\))', split)
                if As:
                    As = As[0].replace('(','').replace(')','')
                    for a in As:
                        if a not in PO:
                            self.stats['fielding'].append(['A',a])

                passes = re.sub(r'POCS[23H]\(','', mp).replace(')','').replace('E','')
                self.modifiers['passes'].append(passes)
                ########################### end ################################

        elif re.findall(r'^POCS[23H](?:\([1-9]*E[1-9]+)',mp): #POCS errors

            for split in mp.split(';'):
                bto = split[4]
                bfrom = PREVIOUS_BASE[split[4]]

                if not self._is_explicit(bfrom):
                    if re.findall(r'[\-]{0}'.format(bfrom), self.str.split('.')[-1]) or bfrom in self.move_on_error:
                        self.main_play[bto] = 1
                        if bto == 'H' or bfrom == '3':
                            self.main_play['run'] += 1

                        if bto=='H':
                            self.base[bto].append(self.base[bfrom])
                        else:
                            self.base[bto] = self.base[bfrom]

                    else:
                        self.main_play = advance_base(self.main_play, bto=bto)
                        self.base = move_base(self.base, bfrom=bfrom, bto=bto)


                #self.base = move_base(self.base, bfrom=bfrom, bto=bto) if not self._is_explicit(bfrom) else self.base
                #self.advances = advance_base(self.advances, bfrom=bfrom, bto=bto) if not self._is_explicit(bfrom) else self.advances

            ########################### stats ##############################
            bto = mp[4]
            bfrom = PREVIOUS_BASE[mp[4]]
            self.stats['running'].append(['CS(E)',bfrom, bto])

            As = re.findall(r'^(?:\([1-9]+E)+', mp) #assists to other players
            if As:
                As = As[0].replace('E','').replace('(','')
                for a in As:
                    self.stats['fielding'].append(['A',a])

            error_fielder = re.findall('E[1-9]', mp)[0]
            self.stats['fielding'].append(['E',error_fielder[1]])

            passes = re.sub(r'POCS[23H]\(','', mp).replace(')','').replace('E','')
            self.modifiers['passes'].append(passes)
            ########################### end ################################

        elif re.findall(r'^S[0-9]*\??\+?$',mp): #single
            self.main_play = advance_base(self.main_play, bfrom='B') if not self._is_explicit() else self.main_play
            self.base = move_base(self.base, bfrom='B', bto='1') if not self._is_explicit() else self.base #B-1 except if explicitly moving on advances
            ########################### stats ##############################
            self.stats['batting'].append(['1B','']) #single
            self.stats['batting'].append(['H','']) #hit
            self.stats['pitching'].append(['H','1'])

            passes = re.findall('[0-9]', mp)
            if passes:
                self.modifiers['passes'].append(passes[0])
            ########################### end ################################

        elif re.findall('^SB[23H]',mp): #stolen base
            sbs = []
            for sb in mp.split(';'):
                if sb[0:2] == 'SB':
                    sbs.append(sb)

            sbs.sort(key = lambda item: (['1','2','3','H'].index(item[2]), item), reverse=True)

            for sb in sbs:
                bto = sb[2]
                bfrom = PREVIOUS_BASE[sb[2]]

if not self._is_explicit(bfrom): 1314 | #check if explicit moved, so wont zero out the base left 1315 | if re.findall('[\-]{0}'.format(bfrom), self.str.split('.')[len(self.str.split('.'))-1]) or bfrom in self.move_on_error: 1316 | self.main_play[bto] = 1 1317 | if bto == 'H' or bfrom == '3': 1318 | self.main_play['run'] += 1 1319 | 1320 | if bto=='H': 1321 | self.base[bto].append(self.base[bfrom]) 1322 | else: 1323 | self.base[bto] = self.base[bfrom] 1324 | 1325 | 1326 | else: 1327 | self.main_play = advance_base(self.main_play, bto=sb[2]) 1328 | self.base = move_base(self.base, bfrom=bfrom, bto=bto) 1329 | 1330 | 1331 | ########################### stats ############################## 1332 | self.stats['running'].append(['SB',bfrom, bto]) 1333 | self.stats['running'].append(['R',bfrom, bto]) if sb[2] == 'H' else None 1334 | ########################### end ################################ 1335 | 1336 | elif re.findall('^T[0-9]*\??\+?$',mp): #triple 1337 | self.main_play = advance_base(self.main_play, bfrom='B', bto='3') if not self._is_explicit() else self.main_play 1338 | self.base = move_base(self.base, bfrom='B', bto='3') if not self._is_explicit() else self.base #B-1 except if explicily moving on advances 1339 | ########################### stats ############################## 1340 | self.stats['batting'].append(['3B','']) 1341 | self.stats['batting'].append(['H','']) #hit 1342 | 1343 | passes = re.findall('[0-9]', mp) 1344 | if passes: 1345 | self.modifiers['passes'].append(passes[0]) 1346 | ########################### end ################################ 1347 | 1348 | elif re.findall('^WP', mp): ## wild pitch - base runner advances 1349 | #the advance should only be explicit. 
If not, uncomment below 1350 | #self.main_play = advance_base(self.main_play, bfrom='B') if not self._is_explicit() else self.main_play 1351 | #self.base = move_base(self.base, bfrom='B', bto='1') if not self._is_explicit() else self.base #B-1 except if explicily moving on advances 1352 | 1353 | ########################### stats ############################## 1354 | self.stats['pitching'].append(['WP','1']) 1355 | ########################### end ################################ 1356 | 1357 | elif re.findall('^C$', mp): #catcher interference or pitcher or first baseman 1358 | if 'E1' in mpm : 1359 | ########################### stats ############################## 1360 | self.main_play = advance_base(self.main_play, bfrom='B', bto='1') if not self._is_explicit() else self.main_play 1361 | self.base = move_base(self.base, bfrom='B', bto='1') if not self._is_explicit() else self.base 1362 | 1363 | self.stats['fielding'].append(['E','1']) 1364 | ########################### end ################################ 1365 | elif 'E2' in mpm: 1366 | ########################### stats ############################## 1367 | self.main_play = advance_base(self.main_play, bfrom='B', bto='1') if not self._is_explicit() else self.main_play 1368 | self.base = move_base(self.base, bfrom='B', bto='1') if not self._is_explicit() else self.base 1369 | 1370 | self.stats['fielding'].append(['CI','2']) 1371 | ########################### end ################################ 1372 | elif 'E3' in mpm: 1373 | ########################### stats ############################## 1374 | self.main_play = advance_base(self.main_play, bfrom='B', bto='1') if not self._is_explicit() else self.main_play 1375 | self.base = move_base(self.base, bfrom='B', bto='1') if not self._is_explicit() else self.base 1376 | 1377 | self.stats['fielding'].append(['E','3']) 1378 | ########################### end ################################ 1379 | 1380 | 1381 | else: 1382 | self.log.debug('Main event not known: 
{0}'.format(mp)) 1383 | #raise eventNotFoundError('Event Not Known', mp) 1384 | 1385 | 1386 | def _split_plays(self): 1387 | """ 1388 | split the play into: 1389 | - main play --> main string 1390 | - implicit advances --> calculated 1391 | - main play modifiers --> separated by '/' 1392 | - secondary_play --> (for K+ and [I]W+ events) 1393 | 1394 | - explicit advances --> separated from main play by '.'. It is = explicit move + advance description + advance modifiers 1395 | - explicit move --> the move of players, without modifiers. base-base or baseXbase 1396 | - advance description --> descriptors only, enclosed by '()' 1397 | - advance modifiers --> modifiers for the description, separated by '/' 1398 | """ 1399 | self.mp = [] # main play 1400 | self.mpm= [] # main play modifiers, preceeded by '/' 1401 | self.mpd = [] # main play describers, inside '()' 1402 | 1403 | self.mpdm = []# main play describer modifiers, preceeded by '/' #not in use for now 1404 | 1405 | self.sp = [] # secondary play 1406 | self.spm = [] # secondary play modifiers, preceeded by '/' 1407 | 1408 | self.ea = [] # explicit advances 1409 | self.em = [] # explicit move 1410 | self.ad = [] # advance descriptions 1411 | self.am = [] # advance modifiers 1412 | 1413 | #main part: 1414 | self.mp = re.findall('^(?:[^\.^\+^/]+)', self.str.split('.')[0].split('+')[0])#self.str.split('.')[0] 1415 | #print ('\nmp:\t', self.mp) 1416 | 1417 | #secondary play 1418 | self.sp = re.findall('(?<=\+)(?:[^\.^\+^/]+)', self.str.split('.')[0]) 1419 | 1420 | #'+' could be a string in a location or a separator of plays (second play) 1421 | if not self.sp: 1422 | self.mpm = re.findall('(?<=/)[^\+^/]+', self.str.split('.')[0].replace('#','').replace('+','')) 1423 | else: 1424 | self.mpm = re.findall('(?<=/)[^\+^/]+', self.str.split('.')[0].split('+')[0].replace('#','').replace('+','')) 1425 | #print ('\nmpm:\t', self.mpm) 1426 | 1427 | self.mpd = re.findall('(?<=\()(?:[^\)^/])+', 
self.str.split('.')[0].split('+')[0]) 1428 | #print ('\nmpd\t', self.mpd) 1429 | 1430 | 1431 | str_spm = self.str.split('.')[0].split('+',1)[1] if len(self.str.split('.')[0].split('+',1)) > 1 else '' 1432 | self.spm = re.findall('(?<=/)(?:[^/^\+]+)', str_spm) 1433 | #print ('\nspm:\t', self.spm) 1434 | 1435 | #advances: 1436 | self.ea = self.str.split('.')[len(self.str.split('.'))-1].split(';') if len(self.str.split('.'))>1 else [] 1437 | self.ea.sort(key = lambda item: (['B','1','2','3'].index(item[0]), item), reverse=True) 1438 | 1439 | for advance in self.ea: self.em.append(re.findall('[1-3B][\-X][1-3H]', advance)) 1440 | for advance in self.ea: self.ad.append(re.findall('(?<=\()(?:[^\)^/]+)', advance)) 1441 | for advance in self.ea: 1442 | describers = re.findall('(?<=\()(?:[^\)]+)', advance) 1443 | if not describers: 1444 | self.am.append([[]]) 1445 | else: 1446 | temp = [] 1447 | for describer in describers: 1448 | temp.append(re.findall('(?<=/)[^/^\)]+', describer)) 1449 | self.am.append(temp) 1450 | 1451 | #print ('\nea:\t', self.ea) 1452 | #print ('\nem:\t', self.em) 1453 | #print ('\nad:\t', self.ad) 1454 | #print ('\nam:\t', self.am) 1455 | 1456 | 1457 | def final_moves(self): 1458 | """Combine main play with explicit advances. 
1459 | Also, it needs to check to make sure bases are correct based on previous 1460 | play (previous_advances) 1461 | """ 1462 | 1463 | for key, value in self.main_play.items(): 1464 | if key in ['out', 'run','H']: 1465 | self.advances[key] += value 1466 | else: #bases 1467 | self.advances[key] = value 1468 | 1469 | 1470 | def decipher(self): 1471 | """Parse baseball play 1472 | """ 1473 | self.move_on_error = [] 1474 | #initialize this play 1475 | self.modifiers = { 1476 | 'out': 0, 1477 | 'run': 0, 1478 | 'bunt': 0, 1479 | 'trajectory': '', 1480 | 'location': '', 1481 | 'interference':'', 1482 | 'review': '', 1483 | 'foul': 0, 1484 | 'force out': 0, 1485 | 'throw':0, 1486 | 'sacrifice': '', 1487 | 'relay':0, 1488 | 'other':[], 1489 | 'courtesy':'', 1490 | 'passes': [], 1491 | 'DP': False, 1492 | 'TP': False, 1493 | } 1494 | 1495 | self.stats = { 1496 | 'batting': [], #event, player (left blank as batter is contextual) 1497 | 'fielding': [], #event, fielder position 1498 | 'running':[], #event, base_from, base_to 1499 | 'pitching':[], #event, player 1500 | } 1501 | 1502 | self.main_play={'out': 0,'run': 0} 1503 | #self._initialize_modifiers() 1504 | 1505 | #take the pieces of the play (main play, secondary, advances, modifiers, describers) 1506 | self._split_plays() 1507 | mp = self.mp[0].replace('#','').replace('!','').replace('?','') 1508 | mpm= self.mpm 1509 | 1510 | #read advance first (Explicit moves) 1511 | self._advances() 1512 | 1513 | #read main play 1514 | self._main_play(mp = mp, mpm=mpm) 1515 | self._modifiers(modifiers = self.mpm) 1516 | 1517 | #read secondary play if there 1518 | if self.sp: 1519 | sp = self.sp[0].replace('#','').replace('!','').replace('?','') 1520 | spm = self.spm 1521 | self._main_play(mp = sp, mpm=spm) 1522 | self._modifiers(modifiers= self.spm) 1523 | 1524 | #combine explicit + implicit moves 1525 | self.final_moves() 1526 | 1527 | 1528 | class eventNotFoundError(Exception): 1529 | """ Exception that is raised when an event is not 
recognized 1530 | """ 1531 | def __init__(self, error, event): 1532 | self.log = logging.getLogger(__name__) 1533 | self.log.debug("Event not found: {0}".format(event)) 1534 | super(eventNotFoundError, self).__init__(event) 1535 | -------------------------------------------------------------------------------- /retrosheet/game.py: -------------------------------------------------------------------------------- 1 | # encoding: utf-8 2 | 3 | import logging 4 | import re 5 | from io import BytesIO 6 | from zipfile import ZipFile 7 | from collections import OrderedDict 8 | import pandas as pd 9 | from urllib.request import urlopen 10 | import os.path 11 | 12 | from .helpers import pitch_count, progress, game_state 13 | from .version import __version__ 14 | from .event import event 15 | 16 | 17 | class parse_row(object): 18 | """ Parse one single row 19 | - A row can return only one type of data (id, version, start, play, sub, com, data) 20 | """ 21 | def __init__(self): 22 | self.log = logging.getLogger(__name__) #initialize logging 23 | self.row_str ='' 24 | self.row_values = [] 25 | self.row_results = {} 26 | self.row_data = [] 27 | self.row_id = [] 28 | 29 | def _clean_row(self): 30 | self.row_str = self.row_str.decode("utf-8") 31 | self.row_values = self.row_str.rstrip('\n').split(',') 32 | self.row_values = [x.replace('\r','').replace('"','') for x in self.row_values] 33 | 34 | def read_row(self): 35 | self._clean_row() 36 | self.row_id = self.row_values[0] #string 37 | self.row_data = self.row_values[1:] #list 38 | 39 | 40 | class parse_game(parse_row): 41 | 42 | """"Object for each baseball game, subclass. 
43 | - Data is expected to be sequentially passed (Retrosheet format) 44 | - When this class is initialized, it restarts all stats for the game 45 | """ 46 | 47 | def __init__(self, id=''): 48 | self.log = logging.getLogger(__name__) #initialize logging 49 | parse_row.__init__(self) 50 | self.location = 0 51 | self.has_started = False 52 | self.has_finished = False 53 | self.current_inning = '1' #starting of game 54 | self.current_team = '0' #starting of game 55 | self.score = {'1':0,'0':0} 56 | self.current_pitcher = {'1':'','0':''} #1 for home, 0 for away 57 | self.pitch_count = {'1':0,'0':0} #1 for home, 0 for away 58 | self.game = { 59 | 'meta': {'filename': '', '__version__': __version__, 'events':''}, 60 | 'id': id, 61 | 'version': '', 62 | 'starting_lineup':{'1': {}, '0': {}}, #static for each game 63 | 'playing_lineup':{'1': {}, '0': {}}, #dynamic, based on subs 64 | 'info': [], #'start': [], 65 | 'play_data': [], 66 | 'play_player': [], #pitching_id | batter_id | player_id | event | value | 67 | #'sub': [], 68 | 'com': [], 69 | 'data': [], 70 | 'stats': {'pitching':[], 'batting':[], 'fielding': [], 'running':[]} 71 | } 72 | self.event = event() 73 | self.event.base = {'B': None,'1': None,'2': None,'3': None,'H': []} 74 | self.event.advances = {'B': 1,'1': 0,'2': 0,'3': 0,'H': 0, 'out': 3, 'run': 0} 75 | 76 | 77 | def parse_start(self, start_sub = 'start'): 78 | """ This will happen before the game starts""" 79 | fielding_position = self.row_values[5] 80 | player_id = self.row_values[1] 81 | home_away = self.row_values[-3][-1] #some entires are '01' 82 | 83 | try: 84 | self.current_pitcher[home_away] = player_id if fielding_position == '1' else self.current_pitcher[home_away] 85 | self.pitch_count[home_away] = 0 if fielding_position == '1' else self.pitch_count[home_away] 86 | except: 87 | self.log.debug('Something wrong with {0} home_away pitcher in {1}, {2}'.format(self.game['id'], start_sub, self.row_values)) 88 | 89 | 
self.game['playing_lineup'][home_away][fielding_position] = player_id 90 | 91 | if start_sub == 'start': 92 | self.game['starting_lineup'][home_away][fielding_position] = player_id 93 | 94 | 95 | def parse_play(self): 96 | """ 97 | ----------------------------------------------------------------------------------------- 98 | field format: "play | inning | home_away | player_id | count on batter | pitches | play "| 99 | index counts: 0 1 2 3 4 5 6 | 100 | ------------------------------------------------------------------------------------------ 101 | """ 102 | 103 | self.event.str = self.row_values[6] #pass string to parse values 104 | 105 | if self.current_team != self.row_values[2]: 106 | self.score[self.current_team] += self.event.advances['run'] 107 | self.event.base = {'B': None,'1': None,'2': None,'3': None, 'H': []} #players on base 108 | 109 | if self.event.advances['out'] != 3: #catching errors 110 | self.log.warning('INNING NO 3RD OUT:\tGame: {0}\tteam: {1}\tinning{2}\tout: {3}'.format(self.game['id'], self.current_team, self.current_inning, self.event.advances['out'])) 111 | 112 | self.event.advances={'B': 1,'1': 0,'2': 0,'3': 0,'H': 0, 'out': 0,'run': 0} if self.event.advances['out'] >= 3 else self.event.advances 113 | 114 | self.event.base['B'] = self.row_values[3] #current at bat 115 | base_before_play = self.event.base.copy() 116 | 117 | pre_event = self.event.advances.copy() 118 | self.event.decipher() 119 | post_event = self.event.advances.copy() 120 | this_play_runs = post_event['run'] - pre_event['run'] 121 | this_play_outs = post_event['out'] - pre_event['out'] 122 | pre_state, post_state = game_state(pre_event, post_event) 123 | 124 | if post_state == 25: 125 | states = [25,26,27,28] 126 | post_state = states[this_play_runs] 127 | 128 | 129 | pitcher_home_away = '1' if self.row_values[2] == '0' else '0' #remember picher is defense 130 | pitch_string = self.row_values[3] 131 | 132 | 133 | self.pitch_count[pitcher_home_away] = 
pitch_count(self.row_values[5], self.pitch_count[pitcher_home_away]) 134 | self.current_inning = self.row_values[1] 135 | self.current_team = self.row_values[2] if self.row_values[2] in ['0','1'] else self.current_team 136 | 137 | if self.event.str != 'NP': #only append if plays happened (skip subs(NP) on play file) 138 | self.game['play_data'].append({ 139 | 'game_id': self.game['id'], 140 | 'order': self.location, 141 | 'pitcher': self.current_pitcher[pitcher_home_away], 142 | 'pitch_count': self.pitch_count[pitcher_home_away], 143 | 'inning': self.current_inning, 144 | 'team': self.current_team, 145 | 'player_id': self.row_values[3], 146 | 'count_on_batter': self.row_values[4], 147 | 'pitch_str': self.row_values[5], 148 | 'play_str': self.row_values[6], 149 | 'B': self.event.advances['B'], 150 | '1': self.event.advances['1'], 151 | '2': self.event.advances['2'], 152 | '3': self.event.advances['3'], 153 | 'H': self.event.advances['H'], 154 | 'run': self.event.advances['run'], 155 | 'out': self.event.advances['out'], 156 | 'on-B': self.event.base['B'], 157 | 'on-1': self.event.base['1'], 158 | 'on-2': self.event.base['2'], 159 | 'on-3': self.event.base['3'], 160 | 'on-H': self.event.base['H'], 161 | 'hometeam_score': self.score['1'], 162 | 'awayteam_score': self.score['0'], 163 | 'trajectory': self.event.modifiers['trajectory'], 164 | 'passes': self.event.modifiers['passes'], 165 | 'location': self.event.modifiers['location'], 166 | 'pre_state': pre_state, 167 | 'post_state': post_state, 168 | 'play_runs': this_play_runs, 169 | 'play_outs': this_play_outs 170 | }) 171 | 172 | 173 | #import stats for the play 174 | #batting 175 | for bat_stat in self.event.stats['batting']: 176 | bat_stat[1] = self.row_values[3] 177 | self.game['stats']['batting'].append([self.game['id'], self.location] + bat_stat) 178 | 179 | #pitching 180 | for pit_stat in self.event.stats['pitching']: 181 | pit_stat[1] = self.current_pitcher[pitcher_home_away] 182 | 
self.game['stats']['pitching'].append([self.game['id'], self.location] + pit_stat) 183 | 184 | #running -- > need to track player together with base 185 | for run_stat in self.event.stats['running']: 186 | run_stat.append(base_before_play[run_stat[1]])#bfrom 187 | self.game['stats']['running'].append([self.game['id'], self.location] + run_stat) 188 | 189 | 190 | #fielding --> use current positions 191 | fld_home_away = '1' if self.current_team == '0' else '0' #defense is the opposite team 192 | for fld_stat in self.event.stats['fielding']: 193 | try: 194 | fld_stat[1] = self.game['playing_lineup'][fld_home_away][fld_stat[1]] 195 | except: 196 | self.log.debug(fld_stat) 197 | self.game['stats']['fielding'].append([self.game['id'], self.location] + fld_stat) 198 | 199 | self.location += 1 200 | 201 | 202 | def parse_com(self): 203 | self.game['com'].append([self.game['id'], self.location] + self.row_data) 204 | 205 | 206 | def parse_event(self, row_str): 207 | self.row_str = row_str 208 | self.read_row() 209 | if self.row_id == 'id' or self.row_id == 'version': 210 | self.game[self.row_id] = self.row_data[0] 211 | self.has_started = True 212 | elif self.row_id == 'info': 213 | self.game[self.row_id].append([self.game['id'],self.row_values[1], self.row_values[2]]) 214 | elif self.row_id == 'data': 215 | self.has_finished=True 216 | self.game['meta']['events'] = self.location + 1 #0 index 217 | if not self.game['data']: 218 | self.game['info'].append([self.game['id'], 'hometeam_score', self.score['1']]) 219 | self.game['info'].append([self.game['id'], 'awayteam_score', self.score['0']]) 220 | self.game[self.row_id].append([self.game['id'], self.game['meta']['events']]+self.row_data) 221 | else: 222 | self.parse_start(self.row_id) if self.row_id in ['start','sub'] else None 223 | self.parse_play() if self.row_id == 'play' else None 224 | self.parse_com() if self.row_id == 'com' else None 225 | 226 | 227 | class parse_games(object): 228 | """ 229 | """ 230 | def 
__init__(self): 231 | self.log = logging.getLogger(__name__) #initialize logging 232 | self.file = None 233 | self.game_list = [] 234 | self.zipfile = None 235 | 236 | 237 | def get_games(self): 238 | game = parse_game() #files in 1991 start with something other than id 239 | for loop, row in enumerate(self.zipfile.open(self.file).readlines()): 240 | if row.decode("utf-8").rstrip('\n').split(',')[0] == 'id': 241 | game_id = row.decode("utf-8").rstrip('\n').split(',')[1].rstrip('\r') 242 | #start new game 243 | self.game_list.append(game.game) if loop > 0 else None 244 | game = parse_game(game_id) 245 | else: 246 | game.parse_event(row) 247 | 248 | 249 | def debug_game(self, game_id): 250 | diamond = '''Play: {2}, Inning: {0}, Team: {1} \n|---------[ {5} ]-----------|\n|-------------------------|\n|----[ {6} ]------[ {4} ]-----|\n|-------------------------|\n|------[ {7} ]--[ {3} ]-------|\n|-------------------------|\nRuns: {8}\tOuts: {9}\n''' 251 | 252 | for game in self.game_list: 253 | if game['id'] == game_id: 254 | for play in game['play_data']: 255 | print (diamond.format( 256 | play['inning'], play['team'], play['play_str'], 257 | play['B'], 258 | play['1'], 259 | play['2'], 260 | play['3'], 261 | play['H'], 262 | play['run'], 263 | play['out'] 264 | )) 265 | 266 | 267 | class parse_files(parse_games): 268 | 269 | endpoint = 'https://www.retrosheet.org/events/' 270 | extension = '.zip' 271 | 272 | def __init__(self): 273 | parse_games.__init__(self) 274 | self.log = logging.getLogger(__name__) 275 | self.teams_list = [] 276 | self.rosters_list = [] 277 | 278 | def read_files(self): 279 | try: #the files locally: 280 | zipfile = ZipFile(self.filename) 281 | #self.log.debug("Found locally") 282 | except: #take from the web 283 | resp = urlopen(self.endpoint + self.filename) 284 | zipfile = ZipFile(BytesIO(resp.read())) 285 | #self.log.debug("Donwloading from the web") 286 | 287 | self.zipfile = zipfile 288 | 289 | teams = [] 290 | rosters = [] 291 | 292 | for 
file in self.zipfile.namelist(): 293 | if file[-3:] in ['EVA','EVN']: 294 | self.file = file 295 | self.get_games() 296 | 297 | elif file[:4] == 'TEAM': 298 | year = file[4:8] 299 | for row in zipfile.open(file).readlines(): 300 | row = row.decode("utf-8") 301 | team_piece = [] 302 | for i in range(4): team_piece.append(row.rstrip('\n').split(',')[i].replace('\r','')) 303 | self.teams_list.append([year]+team_piece) 304 | 305 | elif file[-3:] == 'ROS': #roster file 306 | year = file[3:7] 307 | for row in zipfile.open(file, 'r').readlines(): 308 | row = row.decode("utf-8") 309 | roster_piece = [] 310 | for i in range(7): roster_piece.append(row.rstrip('\n').split(',')[i].replace('\r','')) 311 | self.rosters_list.append([year]+roster_piece) 312 | 313 | 314 | def get_data(self, yearFrom = None, yearTo = None): 315 | """Read and parse the event files for each year from yearFrom to yearTo (inclusive integers). 316 | """ 317 | yearTo = yearTo if yearTo else 2017 #integer default so that range() below works 318 | yearFrom = yearFrom if yearFrom else yearTo 319 | 320 | for loop, year in enumerate(range(yearFrom, yearTo+1, 1)): 321 | progress(loop, (yearTo - yearFrom+1), status='Year: {0}'.format(year)) 322 | self.log.debug('Getting data for {0}...'.format(year)) 323 | self.filename = '{0}eve{1}'.format(year, self.extension) 324 | self.read_files() 325 | 326 | progress(1,1,'Completed {0}-{1}'.format(yearFrom, yearTo)) 327 | return True 328 | 329 | 330 | def to_df(self): 331 | """Build pandas DataFrames from the parsed games, rosters and teams. 332 | """ 333 | plays = [] 334 | infos = [] 335 | datas = [] 336 | lineups = [] 337 | battings = [] 338 | fieldings = [] 339 | pitchings = [] 340 | runnings = [] 341 | 342 | for loop, game in enumerate(self.game_list): 343 | plays += game['play_data'] 344 | infos += game['info'] 345 | datas += game['data'] 346 | 347 | battings += game['stats']['batting'] 348 | fieldings += game['stats']['fielding'] 349 | pitchings += game['stats']['pitching'] 350 | runnings += game['stats']['running'] 351 | 352 | game['starting_lineup']['1']['game_id'] = game['id'] 353 | game['starting_lineup']['1']['home_away'] = 'home' 354 | 
game['starting_lineup']['0']['game_id'] = game['id'] 355 | game['starting_lineup']['0']['home_away'] = 'away' 356 | 357 | lineups.append(game['starting_lineup']['1']) 358 | lineups.append(game['starting_lineup']['0']) 359 | 360 | self.plays = pd.DataFrame(plays) 361 | self.info = pd.DataFrame(infos, columns = ['game_id', 'var', 'value']) 362 | #self.info = self.info[~self.info.duplicated(subset=['game_id','var'], keep='last')].pivot('game_id','var','value').reset_index() 363 | 364 | self.lineup = pd.DataFrame(lineups) 365 | self.fielding = pd.DataFrame(fieldings, columns = ['game_id','order','stat','player_id']) 366 | 367 | data_df = pd.DataFrame(datas, columns = ['game_id','order','stat','player_id','value']) 368 | self.pitching = pd.DataFrame(pitchings, columns = ['game_id','order','stat','player_id']) 369 | self.pitching['value'] = 1 370 | self.pitching = pd.concat([self.pitching, data_df], axis = 0) 371 | 372 | self.batting = pd.DataFrame(battings, columns = ['game_id','order','stat','player_id']) 373 | self.running = pd.DataFrame(runnings, columns = ['game_id','order','stat','bfrom','bto','player_id']) 374 | 375 | self.rosters = pd.DataFrame(self.rosters_list, columns = ['year','player_id','last_name','first_name','batting_hand','throwing_hand','team_abbr_1','position']) 376 | self.teams = pd.DataFrame(self.teams_list, columns=['year','team_abbr','league','city','name']) 377 | 378 | return True 379 | 380 | 381 | def save_csv(self, path_str='', append=True): 382 | """save dataframes to csv 383 | append = True for large downloads 384 | """ 385 | if path_str: 386 | path_str = path_str + '/' if path_str[-1] != '/' else path_str #ensure trailing slash 387 | 388 | datasets = { 389 | 'plays': self.plays, 390 | 'info': self.info, 391 | 'lineup': self.lineup, 392 | 'fielding': self.fielding, 393 | 'pitching': self.pitching, 394 | 'batting': self.batting, 395 | 'running': self.running, 396 | 'rosters': self.rosters, 397 | 'teams': self.teams, 398 | } 399 | 400 | for key, dataset in datasets.items(): 401 
| filename = path_str + key + '.csv' 402 | if not os.path.isfile(filename): 403 | dataset.to_csv(filename, mode='w', index=False, header=True) 404 | elif os.path.isfile(filename) and append==True: 405 | dataset.to_csv(filename, mode='a', index=False, header=False) 406 | else: 407 | dataset.to_csv(filename, mode='w', index=False, header=True) 408 | 409 | return True 410 | -------------------------------------------------------------------------------- /retrosheet/helpers.py: -------------------------------------------------------------------------------- 1 | # encoding: utf-8 2 | import sys 3 | 4 | PREVIOUS_BASE = {'H':'3','3':'2','2':'1','1':'B'} 5 | NEXT_BASE = {'B':'1','1':'2','2':'3','3':'H'} 6 | 7 | def move_base(bases_dict, bfrom, bto): 8 | if bto=='H': 9 | bases_dict[bto].append(bases_dict[bfrom]) 10 | bases_dict[bfrom] = None if bfrom != bto else bases_dict[bfrom] 11 | return bases_dict 12 | bases_dict[bto] = bases_dict[bfrom] 13 | bases_dict[bfrom] = None 14 | return bases_dict 15 | 16 | 17 | def leave_base(bases_dict, bfrom): 18 | bases_dict[bfrom] = None 19 | return bases_dict 20 | 21 | 22 | def pitch_count(string, current_count): 23 | """ 24 | For now it is including pickoffs 25 | """ 26 | #simplest idea: 27 | clean_pitches = string.replace('>','').replace('+','').replace('*','').replace('??','') 28 | splits = clean_pitches.split('.') #results in a list 29 | count = current_count + len(splits[len(splits)-1]) 30 | 31 | return count 32 | 33 | def out_in_advance(play_dict, bto=None, bfrom=None): 34 | """runner out when advancing by next base 35 | - play_dict: play dictionary 36 | - bto : base to, heading to 37 | - bfrom: base coming from, previous base 38 | """ 39 | bto = '1' if not bto and not bfrom else bto 40 | if bfrom: 41 | play_dict[bfrom] = 0 42 | play_dict['out'] += 1 43 | return play_dict 44 | elif bto: 45 | play_dict[PREVIOUS_BASE[bto]] = 0 46 | play_dict['out'] += 1 47 | return play_dict 48 | #play_dict[bfrom] = 0 49 | #play_dict['out'] += 1 50 | 
51 | #return play_dict 52 | 53 | 54 | def advance_base(play_dict, bto=None, bfrom=None): 55 | """runner advanced to next base 56 | - play_dict: play dictionary 57 | - bto : base to, heading to 58 | - bfrom: base coming from, previous base 59 | """ 60 | bto = '1' if not bto and not bfrom else bto 61 | if bto == 'H': 62 | play_dict['run'] += 1 63 | if bto and not bfrom: 64 | play_dict.update(dict(zip([PREVIOUS_BASE[bto],bto],(0,1)))) 65 | elif bfrom and not bto: 66 | play_dict.update(dict(zip([bfrom,NEXT_BASE[bfrom]],(0,1)))) 67 | else: #bto and bfrom explicit 68 | play_dict.update(dict(zip([bfrom,bto],(0,1)))) 69 | return play_dict 70 | 71 | def progress(count, total, status=''): 72 | """ 73 | Adapted from https://gist.github.com/vladignatyev/06860ec2040cb497f0f3 74 | """ 75 | bar_len = 60 76 | filled_len = int(round(bar_len * count / float(total))) 77 | 78 | percents = round(100.0 * count / float(total), 1) 79 | bar = '=' * filled_len + '-' * (bar_len - filled_len) 80 | 81 | sys.stdout.write('[{0}] {1}{2} ... 
{3}\r'.format(bar, percents, '%', status)) 82 | sys.stdout.flush() 83 | if count == total: 84 | print ('') 85 | 86 | def game_state(pre, post): 87 | """Return (pre_state, post_state) as 1-based indices into the base-out table below. 88 | Expected format of pre/post: {'B': 1,'1': 0,'2': 0,'3': 0,'H': 0, 'out': 0,'run': 0} 89 | """ 90 | pre_list = [int(pre['1']),int(pre['2']),int(pre['3']),int(pre['out'])] 91 | post_list = [int(post['1']),int(post['2']),int(post['3']),int(post['out'])] 92 | 93 | #first, second, third, out 94 | states = [ 95 | [0,0,0,0],[1,0,0,0],[0,1,0,0],[0,0,1,0],[1,1,0,0],[1,0,1,0],[0,1,1,0],[1,1,1,0], 96 | [0,0,0,1],[1,0,0,1],[0,1,0,1],[0,0,1,1],[1,1,0,1],[1,0,1,1],[0,1,1,1],[1,1,1,1], 97 | [0,0,0,2],[1,0,0,2],[0,1,0,2],[0,0,1,2],[1,1,0,2],[1,0,1,2],[0,1,1,2],[1,1,1,2], 98 | [0,0,0,3] #25th state 99 | ] 100 | 101 | for loop, item in enumerate(states): 102 | if post['out'] == 3: 103 | post_state = 25 104 | 105 | if pre_list == item: 106 | pre_state = loop + 1 107 | 108 | if post_list == item: 109 | post_state = loop + 1 110 | 111 | return pre_state, post_state 112 | 113 | 114 | 115 | def position_name(position_number): 116 | """Map a position number (as a string) to its abbreviation; unknown values are returned unchanged. 117 | """ 118 | position_dic = { 119 | '1':'P', #pitcher 120 | '2':'C', #catcher 121 | '3':'1B', #first baseman 122 | '4':'2B', #second baseman 123 | '5':'3B', #third baseman 124 | '6':'SS', #shortstop 125 | '7':'LF', #left fielder 126 | '8':'CF', #center fielder 127 | '9':'RF', #right fielder 128 | '10':'DH', #designated hitter 129 | '11':'PH', #pinch hitter 130 | '12':'PR', #pinch runner 131 | } 132 | if position_number in position_dic: 133 | return position_dic[position_number] 134 | return position_number 135 | 136 | 137 | def field_conditions(string): 138 | """ 139 | fieldcond: dry, soaked, wet, unknown; 140 | precip: drizzle, none, rain, showers, snow, unknown; 141 | sky: cloudy, dome, night, overcast, sunny, unknown; 142 | winddir: fromcf, fromlf, fromrf, ltor, rtol, tocf, tolf, torf, unknown; 143 | temp: (0 is unknown) 144 | windspeed: (-1 is unknown) 145 | """ 146 | pass 147 | 
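The base-out lookup performed by `game_state` in helpers.py can also be computed arithmetically, because its table is ordered as three blocks of eight base patterns (one block per out count) followed by a single three-out state. The sketch below is illustrative only and is not part of the repository: `BASE_PATTERNS` and `base_out_state` are hypothetical names, and it assumes the base entries of the play dictionary are truthy when occupied.

```python
# Hypothetical helper, not part of retrosheet/helpers.py.
# Occupancy patterns for (first, second, third), in the same order
# that game_state lists them within each out count.
BASE_PATTERNS = [
    (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1),
    (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1),
]


def base_out_state(play_dict):
    """Return the 1-based base-out state (1-24), or 25 once three outs are recorded.

    Assumes play_dict has keys '1', '2', '3' and 'out', as in the
    advances dictionaries used by the parser.
    """
    outs = int(play_dict['out'])
    if outs >= 3:
        return 25  # end-of-inning state, regardless of runners left on base
    bases = tuple(int(bool(play_dict[base])) for base in ('1', '2', '3'))
    # Each out count contributes a block of 8 states.
    return outs * 8 + BASE_PATTERNS.index(bases) + 1


if __name__ == '__main__':
    print(base_out_state({'1': 1, '2': 0, '3': 1, 'out': 2}))
```

Computing one state at a time avoids scanning the whole table and cannot leave a result unassigned when a dictionary matches no row, at the cost of treating any truthy base value as an occupied base.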
-------------------------------------------------------------------------------- /retrosheet/info.txt: -------------------------------------------------------------------------------- 1 | 2 | ################################################################################ 3 | Below is a summary of play notations 4 | ################################################################################ 5 | 6 | # Distinct markers: 7 | ### '!' Exceptional play 8 | ### '?' uncertainty about the play 9 | ### '#' uncertainty about the play 10 | ### '+' modifying a trajectory (it is also a separator) 11 | ### '-' modifying a trajectory 12 | 13 | # Separators: 14 | ### '/' for modifier of main play 15 | ### '.' for advances 16 | ### ';' for splitting plays or advances. 17 | ### '+' for secondary plays in the same event 18 | ### (): for explanation of/on advances 19 | 20 | # Attacking players / positions: 21 | ### 'B' = batter 22 | ### '1 /2 /3' = base runner 23 | ### 'H' = home 24 | 25 | # Defensive positions: 26 | ### same naming as position_dic (1 for pitcher, 2 for catcher, etc.) 27 | 28 | # Ball trajectories: 29 | ### G for ground ball 30 | ### L for line drive 31 | ### P for pop up 32 | ### F for a fly ball 33 | ### BG for bunt grounder 34 | ### BP for bunt pop up 35 | 36 | # Play codes involving batter: 37 | ### GDP Grounded into Double Play (e.g. 64(1)3/GDP/G6) 38 | ### G ground ball 39 | ### FO force out 40 | ### SH sacrifice hit or bunt 41 | ### 99 unknown plays 42 | ### SF sacrifice fly 43 | ### C/E2 catcher interference (implicit B-1) 44 | ### C/E1 or C/E3 interference by pitcher or first baseman (batter not charged with at bat) 45 | ### S$ single 46 | ### D$ double 47 | ### T$ triple 48 | ### S / D / T single, double, triple (implicit B-1, B-2, B-3) 49 | ### DGR ground rule double (when the ball leaves play after fair hit. Two bases awarded to every player) 50 | ### E$ error allowing batter to get on base. B-1 can be implicit. 
$ indicates position (1 for pitcher, 2 for catcher, etc) 51 | ### FC$ Fielder's choice (offensive player reaching a base due to the defense's attempt to put out another baserunner). B-1 might be implicit 52 | ### FLE$ Error on foul fly ball 53 | ### H or HR home run leaving the ball park (e.g. HR/F78XD.2-H;1-H) 54 | ### H$ or HR$ indicates an inside-the-park home run by giving a fielder as part of the code (e.g. HR9/F9LS.3-H;1-H) 55 | ### HP batter hit by pitch. B-1 implicit 56 | ### K strike-out 57 | ### K+event On third strikes various base running plays may also occur. The event can be SB%, CS%, OA, PO%, PB, WP and E$ 58 | ### NP no play, substitutions happening 59 | ### I or IW intentional walk 60 | ### W walk. B-1 implicit 61 | ### W+event, IW+event On ball four various base running plays may also occur. The event can be SB%, CS%, PO%, PB, WP and E$ 62 | 63 | # Play codes not involving batter 64 | ### BK balk (pitcher illegal move). Other advances might occur but batter remains at the plate 65 | ### CS%($$) caught stealing (e.g. 'CSH(12)','CS2(24).2-3'). An error might nullify the caught stealing (e.g. 'CS2(24).2-3') 66 | ### DI defensive indifference. When there is no attempt to prevent a stolen base. Advances explicit 67 | ### OA baserunner advance not covered by other codes. 68 | ### PB passed ball - catcher is unable to handle a pitch and a base runner advances 69 | ### WP wild pitch - catcher is unable to handle a pitch and a base runner advances 70 | ### PO%($$) - picked off of base % (1,2, or 3) with the ($$) indicating the throw(s) and fielder making the putout (e.g. 'PO2(14)', 'PO1(E3).1-2') 71 | ### POCS%($$) picked off of base % (2, 3 or H) with the runner charged with a caught stealing. The ($$) is the sequence of throws resulting in the out 72 | ### SB% stolen base. 
Bases for % can be 2,3, or H 73 | 74 | # Modifiers (preceded by '/') 75 | ### AP appeal play 76 | ### BP pop up bunt 77 | ### BG ground ball bunt 78 | ### BGDP bunt grounded into double play 79 | ### BINT batter interference 80 | ### BL line drive bunt 81 | ### BOOT batting out of turn 82 | ### BP bunt pop up 83 | ### BPDP bunt popped into double play 84 | ### BR runner hit by batted ball 85 | ### C called third strike 86 | ### COUB courtesy batter 87 | ### COUF courtesy fielder 88 | ### COUR courtesy runner 89 | ### DP unspecified double play 90 | ### E$ error on $ 91 | ### F fly 92 | ### FDP fly ball double play 93 | ### FINT fan interference 94 | ### FL foul 95 | ### FO force out 96 | ### G ground ball 97 | ### GDP ground ball double play 98 | ### GTP ground ball triple play 99 | ### IF infield fly rule 100 | ### INT interference 101 | ### IPHR inside the park home run 102 | ### L line drive 103 | ### LDP lined into double play 104 | ### LTP lined into triple play 105 | ### MREV manager challenge of call on the field 106 | ### NDP no double play credited for this play 107 | ### OBS obstruction (fielder obstructing a runner) 108 | ### P pop fly 109 | ### PASS a runner passed another runner and was called out 110 | ### R$ relay throw from the initial fielder to $ with no out made 111 | ### RINT runner interference 112 | ### SF sacrifice fly 113 | ### SH sacrifice hit (bunt) 114 | ### TH throw 115 | ### TH% throw to base % 116 | ### TP unspecified triple play 117 | ### UINT umpire interference 118 | ### UREV umpire review of call on the field 119 | -------------------------------------------------------------------------------- /retrosheet/parser.py: -------------------------------------------------------------------------------- 1 | # encoding: utf-8 2 | 3 | from .event import event 4 | from .game import parse_files 5 | from .version import __version__ 6 | import logging 7 | 8 | class Retrosheet(event, parse_files): 9 | 10 | """A python object to parse retrosheet 
data""" 11 | 12 | def __init__(self): 13 | self.__version__ = __version__ 14 | self.log = logging.getLogger(__name__) 15 | event.__init__(self) 16 | parse_files.__init__(self) 17 | 18 | 19 | def batch_parse(self, yearFrom = None, yearTo = None, batchsize=10, append=True): 20 | """Parse and save the data in batches of `batchsize` years. 21 | """ 22 | yearTo = yearTo if yearTo else 2017 23 | yearFrom = yearFrom if yearFrom else yearTo 24 | 25 | if yearFrom < 1921 or yearTo > 2017 or yearTo < yearFrom: 26 | raise InvalidYearError('Invalid Years', (yearFrom, yearTo)) 27 | 28 | batches = (yearTo - yearFrom + batchsize) // batchsize #ceiling division, so no empty trailing batch 29 | 30 | 31 | 32 | for loop, batch in enumerate(range(batches)): 33 | start_year = yearFrom if loop == 0 else end_year + 1 34 | 35 | end_year = start_year + batchsize-1 if (start_year + batchsize-1) <= yearTo else yearTo 36 | 37 | self.get_data(yearFrom=start_year, yearTo=end_year) 38 | self.to_df() 39 | self.save_csv(path_str='', append = False) if loop == 0 else self.save_csv(path_str='', append = True) 40 | 41 | #empty datasets to free up memory 42 | self.file = None 43 | self.game_list = [] 44 | self.zipfile = None 45 | self.teams_list = [] 46 | self.rosters_list = [] 47 | self.plays = None 48 | self.info = None 49 | self.lineup = None 50 | self.fielding = None 51 | self.pitching = None 52 | self.batting = None 53 | self.running = None 54 | self.rosters = None 55 | self.teams = None 56 | 57 | 58 | class InvalidYearError(Exception): 59 | """ Exception that is raised when years are not within possible range 60 | """ 61 | def __init__(self, error, years): 62 | self.log = logging.getLogger(__name__) 63 | self.log.debug("Invalid Year Passed: {0}-{1}".format(years[0], years[1])) 64 | super(InvalidYearError, self).__init__(years) 65 | -------------------------------------------------------------------------------- /retrosheet/statistics.txt: -------------------------------------------------------------------------------- 1 | Batting Statistics 2 | 3 | 1B – Single: hits on which the batter reaches 
first base safely without the contribution of a fielding error. 4 | 2B – Double: hits on which the batter reaches second base safely without the contribution of a fielding error. 5 | 3B – Triple: hits on which the batter reaches third base safely without the contribution of a fielding error. 6 | AB – At bat: Plate appearances, not including bases on balls, being hit by pitch, sacrifices, interference, or obstruction. 7 | BB – Base on balls (also called a "walk"): hitter not swinging at four pitches called out of the strike zone and awarded first base. 8 | FC – Fielder's choice: times reaching base safely because a fielder chose to try for an out on another runner 9 | GDP or GIDP – Ground into double play: number of ground balls hit that became double plays 10 | H – Hits: times reached base because of a batted, fair ball without error by the defense 11 | HBP – Hit by pitch: times touched by a pitch and awarded first base as a result 12 | HR – Home runs: hits on which the batter successfully touched all four bases, without the contribution of a fielding error. 13 | ITPHR – Inside-the-park home run: hits on which the batter successfully touched all four bases, without the contribution of a fielding error or the ball going outside the ball park. 14 | IBB – Intentional base on balls: times awarded first base on balls (see BB above) deliberately thrown by the pitcher. Also known as IW (intentional walk). 15 | K – Strike out (also abbreviated SO): number of times that a third strike is taken or swung at and missed, or bunted foul. Catcher must catch the third strike or batter may attempt to run to first base. 16 | LOB – Left on base: number of runners neither out nor scored at the end of an inning. 
17 | PA – Plate appearance: number of completed batting appearances 18 | R – Runs scored: number of times a player crosses home plate 19 | RBI – Run batted in: number of runners who score due to a batter's action, except when batter grounded into double play or reached on an error 20 | ROE - Reach on Error: Advance to a base due to a fielding error. 21 | SF – Sacrifice fly: Fly balls hit to the outfield which, although caught for an out, allow a baserunner to advance 22 | SH – Sacrifice hit: number of sacrifice bunts which allow runners to advance on the basepaths 23 | 24 | 25 | Baserunning statistics 26 | 27 | SB – Stolen base: number of bases advanced by the runner while the ball is in the possession of the defense. 28 | CS – Caught stealing: times tagged out while attempting to steal a base 29 | DI – Defensive Indifference: if the catcher does not attempt to throw out a runner (usually because the base would be insignificant), the runner is not awarded a steal. Scored as a fielder's choice. 30 | R – Runs scored: times reached home plate legally and safely 31 | 32 | 33 | Pitching statistics 34 | 35 | BB – Base on balls (also called a "walk"): times pitching four balls, allowing the batter to take first base 36 | BF – Total batters faced: opponent team's total plate appearances 37 | BK – Balk: number of times pitcher commits an illegal pitching action while in contact with the pitching rubber as judged by umpire, resulting in baserunners advancing one base 38 | **BS – Blown save: number of times entering the game in a save situation, and being charged the run (earned or not) which eliminates his team's lead 39 | CG – Complete game: number of games where player was the only pitcher for his team 40 | ER – Earned run: number of runs that did not occur as a result of errors or passed balls 41 | FPOM – First pitch outs made: Number of outs earned where the batter grounds or flies out on the first pitch. 
42 | G – Games (AKA "appearances"): number of times a pitcher pitches in a season 43 | GF – Games finished: number of games pitched where player was the final pitcher for his team as a relief pitcher 44 | **GIDP – Double plays induced: number of double play groundouts induced 45 | GIR - Games in relief: games as a non starting pitcher 46 | GS – Starts: number of games pitched where player was the first pitcher for his team 47 | H (or HA) – Hits allowed: total hits allowed 48 | **HLD (or H) – Hold: number of games entered in a save situation, recorded at least one out, did not surrender the lead, and did not complete the game 49 | HR (or HRA) – Home runs allowed: total home runs allowed 50 | IBB – Intentional base on balls allowed 51 | IR – Inherited runners: number of runners on base when the pitcher enters the game 52 | K (or SO) – Strikeout: number of batters who received strike three 53 | LOB% – Left-on-base percentage: LOB% represents the percentage of baserunners a pitcher does not allow to score. LOB% tends to regress toward 70–72% over time, so unusually high or low percentages could indicate that pitcher's ERA could be expected to rise or fall in the future. An occasional exception to this logic is a pitcher with a very high strikeout rate. 54 | PIT (or NP) – Pitches thrown (Pitch count) 55 | **RRA – Relief run average: A function of how many inherited base runners a relief pitcher allowed to score. 
56 | **SV – Save: number of games where the pitcher enters a game led by the pitcher's team, finishes the game without surrendering the lead, is not the winning pitcher, and either (a) the lead was three runs or fewer when the pitcher entered the game; (b) the potential tying run was on base, at bat, or on deck; or (c) the pitcher pitched three or more innings 57 | **SVO – Save opportunity: When a pitcher 1) enters the game with a lead of three or fewer runs and pitches at least one inning, 2) enters the game with the potential tying run on base, at bat, or on deck, or 3) pitches three or more innings with a lead and is credited with a save by the official scorer 58 | **W – Win: number of games where pitcher was pitching while his team took the lead and went on to win, also the starter needs to pitch at least 5 innings of work (also related: winning percentage) 59 | **whiff rate: a term, usually used in reference to pitchers, that divides the number of pitches swung at and missed by the total number of swings in a given sample. If a pitcher throws 100 pitches at which batters swing, and the batters fail to make contact on 26 of them, the pitcher's whiff rate is 26%. 60 | WP – Wild pitches: charged when a pitch is too high, low, or wide of home plate for the catcher to field, thereby allowing one or more runners to advance or score 61 | 62 | 63 | Fielding statistics 64 | 65 | A – Assists: number of outs recorded on a play where a fielder touched the ball, except if such touching is the putout 66 | CI – Catcher's Interference (e.g., catcher makes contact with bat) 67 | DP – Double plays: one for each double play during which the fielder recorded a putout or an assist. 
68 | E – Errors: number of times a fielder fails to make a play he should have made with common effort, and the offense benefits as a result 69 | INN – Innings: number of innings that a player is at one certain position 70 | PB – Passed ball: charged to the catcher when the ball is dropped and one or more runners advance 71 | PO – Putout: number of times the fielder tags, forces, or appeals a runner and he is called out as a result 72 | **TC – Total chances: assists plus putouts plus errors 73 | TP – Triple play: one for each triple play during which the fielder recorded a putout or an assist 74 | -------------------------------------------------------------------------------- /retrosheet/version.py: -------------------------------------------------------------------------------- 1 | # encoding: utf-8 2 | 3 | __version__ = '0.1.0' 4 | --------------------------------------------------------------------------------
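The year batching performed by `batch_parse` in /retrosheet/parser.py can be isolated into a small generator that yields inclusive (start, end) year pairs. This is only a sketch under stated assumptions: `year_batches` is a hypothetical name that does not exist in the package, and the years are assumed to be integers.

```python
def year_batches(year_from, year_to, batchsize=10):
    """Yield inclusive (start_year, end_year) pairs covering year_from..year_to.

    Hypothetical helper, not part of retrosheet; years are assumed to be ints.
    """
    if batchsize < 1:
        raise ValueError('batchsize must be at least 1')
    if year_to < year_from:
        raise ValueError('year_to must not be earlier than year_from')

    start = year_from
    while start <= year_to:
        # The last batch is cut short so it never runs past year_to.
        end = min(start + batchsize - 1, year_to)
        yield start, end
        start = end + 1


for start_year, end_year in year_batches(2000, 2017, batchsize=10):
    print(start_year, end_year)
```

Because the loop stops as soon as `start` passes `year_to`, a range that divides evenly by `batchsize` produces no empty trailing batch.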