├── LICENSE
├── README.md
├── mapping_nba_ids
│   ├── .gitignore
│   ├── README.md
│   ├── mapnbaid.py
│   ├── mapping_nba_ids.csv
│   └── requirements.txt
└── sat_logo.jpeg
/LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution."
61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 
179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 |
<div align="center">
2 | <img src="sat_logo.jpeg" alt="Sport Analytics Tools logo"/>
3 | </div>
4 | 
5 | <h1 align="center">Sport Analytics Tools</h1>
6 | 7 | **Sport Analytics Tools** is a project dedicated to publishing information (code, tutorials, media) to help data scientists and sports enthusiasts work with sports data effectively. 8 | 9 | ## Motivation 10 | 11 | - I believe that open-source solutions are superior to proprietary technologies. 12 | - I believe that if you've solved a problem or possess valuable knowledge, you should share it. 13 | 14 | ## Objective 15 | 16 | The goal of this project is to assist people interested in sports data science in tackling data analysis tasks. 17 | 18 | - I've developed **nba_data**, a repository of NBA data that can be downloaded in seconds, rather than spending hours collecting it through the NBA API. 19 | - I am the author of the **nba-on-court** library, which simplifies working with NBA data. 20 | - I am a contributor to well-known sports libraries such as **nba_api**, **hoopR**, and **worldfootballR**. 21 | 22 | Through this project, I aim to create a comprehensive knowledge base of tools and resources to enhance the workflow with sports data. 23 | 24 | ## List of Projects 25 | 26 | ### 1. NBA Player ID Mapping Tool 🏀 27 | Tool for mapping player IDs between NBA Stats API and Basketball Reference. 28 | 29 | #### Features 30 | - Automated ID mapping between different basketball data sources 31 | - Multiple matching algorithms for high accuracy 32 | - Handles special cases and non-English names 33 | - Easy-to-use Python interface 34 | 35 | #### Requirements 36 | - Python 3.8+ 37 | - Core dependencies: beautifulsoup4, numpy, pandas, requests, nba_api, Levenshtein, lxml 38 | 39 | [Learn more about NBA Player ID Mapping Tool →](https://github.com/shufinskiy/sport_analytics_tools/tree/main/mapping_nba_ids) 40 | 41 | ## Project Table 42 | 43 | |Name|Description| 44 | |------|---------| 45 | |[NBA Player ID Mapping Tool](https://github.com/shufinskiy/sport_analytics_tools/tree/main/mapping_nba_ids)| Code automating the mapping of player IDs between the NBA Stats website and Basketball Reference| 46 | 47 | ## Installation 48 | 49 | ```bash 50 | # Clone the repository 51 | git clone https://github.com/shufinskiy/sport_analytics_tools.git 52 | cd sport_analytics_tools 53 | 54 | # Install dependencies 55 | pip install -r requirements.txt 56 | ``` 57 | 58 | ## Contributing 🤝 59 | Contributions are welcome! Please feel free to submit pull requests, particularly for: 60 | 61 | - Adding new tools for sports analytics 62 | - Improving existing functionality 63 | - Adding documentation and tutorials 64 | - Bug fixes and optimizations 65 | 66 | ## License 📄 67 | Apache License 2.0 68 | 69 | ## Contact 📫 70 | 71 | Create an issue in this repository. 72 | -------------------------------------------------------------------------------- /mapping_nba_ids/.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | share/python-wheels/ 24 | *.egg-info/ 25 | .installed.cfg 26 | *.egg 27 | MANIFEST 28 | 29 | # PyInstaller 30 | # Usually these files are written by a python script from a template 31 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
32 | *.manifest 33 | *.spec 34 | 35 | # Installer logs 36 | pip-log.txt 37 | pip-delete-this-directory.txt 38 | 39 | # Unit test / coverage reports 40 | htmlcov/ 41 | .tox/ 42 | .nox/ 43 | .coverage 44 | .coverage.* 45 | .cache 46 | nosetests.xml 47 | coverage.xml 48 | *.cover 49 | *.py,cover 50 | .hypothesis/ 51 | .pytest_cache/ 52 | cover/ 53 | 54 | # Translations 55 | *.mo 56 | *.pot 57 | 58 | # Django stuff: 59 | *.log 60 | local_settings.py 61 | db.sqlite3 62 | db.sqlite3-journal 63 | 64 | # Flask stuff: 65 | instance/ 66 | .webassets-cache 67 | 68 | # Scrapy stuff: 69 | .scrapy 70 | 71 | # Sphinx documentation 72 | docs/_build/ 73 | 74 | # PyBuilder 75 | .pybuilder/ 76 | target/ 77 | 78 | # Jupyter Notebook 79 | .ipynb_checkpoints 80 | 81 | # IPython 82 | profile_default/ 83 | ipython_config.py 84 | 85 | # pyenv 86 | # For a library or package, you might want to ignore these files since the code is 87 | # intended to run in multiple environments; otherwise, check them in: 88 | # .python-version 89 | 90 | # pipenv 91 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 92 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 93 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 94 | # install all needed dependencies. 95 | #Pipfile.lock 96 | 97 | # UV 98 | # Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control. 99 | # This is especially recommended for binary packages to ensure reproducibility, and is more 100 | # commonly ignored for libraries. 101 | #uv.lock 102 | 103 | # poetry 104 | # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control. 105 | # This is especially recommended for binary packages to ensure reproducibility, and is more 106 | # commonly ignored for libraries. 107 | # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control 108 | #poetry.lock 109 | 110 | # pdm 111 | # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control. 112 | #pdm.lock 113 | # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it 114 | # in version control. 115 | # https://pdm.fming.dev/latest/usage/project/#working-with-version-control 116 | .pdm.toml 117 | .pdm-python 118 | .pdm-build/ 119 | 120 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm 121 | __pypackages__/ 122 | 123 | # Celery stuff 124 | celerybeat-schedule 125 | celerybeat.pid 126 | 127 | # SageMath parsed files 128 | *.sage.py 129 | 130 | # Environments 131 | .env 132 | .venv 133 | env/ 134 | venv/ 135 | ENV/ 136 | env.bak/ 137 | venv.bak/ 138 | 139 | # Spyder project settings 140 | .spyderproject 141 | .spyproject 142 | 143 | # Rope project settings 144 | .ropeproject 145 | 146 | # mkdocs documentation 147 | /site 148 | 149 | # mypy 150 | .mypy_cache/ 151 | .dmypy.json 152 | dmypy.json 153 | 154 | # Pyre type checker 155 | .pyre/ 156 | 157 | # pytype static type analyzer 158 | .pytype/ 159 | 160 | # Cython debug symbols 161 | cython_debug/ 162 | 163 | # PyCharm 164 | # JetBrains specific template is maintained in a separate JetBrains.gitignore that can 165 | # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore 166 | # and can be added to the global gitignore or merged into this file.
For a more nuclear 167 | # option (not recommended) you can uncomment the following to ignore the entire idea folder. 168 | #.idea/ 169 | -------------------------------------------------------------------------------- /mapping_nba_ids/README.md: -------------------------------------------------------------------------------- 1 | # NBA Player ID Mapping Tool 🏀 2 | 3 | A Python tool for mapping player IDs between NBA Stats API and Basketball Reference. This tool helps solve the common challenge of matching player data across different basketball data sources. 4 | 5 | ## Why This Tool? 🤔 6 | 7 | When working with basketball data, analysts often need to combine data from multiple sources. Two of the most popular sources are: 8 | - NBA Stats API (official NBA statistics) 9 | - Basketball Reference (comprehensive historical data) 10 | 11 | However, these sources use different ID systems for players, making it difficult to merge data. This tool creates a mapping between these IDs, allowing for seamless data integration. 12 | 13 | ## How It Works 🛠️ 14 | 15 | The tool uses a multi-step matching algorithm to ensure the highest possible accuracy: 16 | 17 | 1. **Exact Name Matching** 📋 18 | - First attempts to match players by their exact names 19 | - Separates cases with no matches and multiple matches for further processing 20 | 21 | 2. **Multiple Match Resolution** 🔄 22 | - For players with multiple potential matches, uses additional criteria like active years 23 | - Creates separate handling for special cases 24 | 25 | 3. **Non-English Character Handling** 🌐 26 | - Processes names containing non-English characters 27 | - Attempts various transliterations to find matches 28 | 29 | 4. **Surname-Based Matching** 👥 30 | - Matches players using surnames when full names don't match 31 | - Includes additional verification using career years 32 | 33 | 5. **Fuzzy Matching** 🔍 34 | - Removes punctuation and special characters 35 | - Uses Levenshtein distance for approximate string matching 36 | 37 | 6. **Manual Dictionary Mapping** 📘 38 | - Falls back to a pre-defined mapping for special cases 39 | - Handles edge cases that automated matching can't resolve 40 | 41 | ## Usage 💻 42 | 43 | ```python 44 | from mapping_nba_ids.mapnbaid import mapping_nba_id 45 | 46 | # Basic usage with default parameters 47 | mapped_players = mapping_nba_id() 48 | 49 | # Advanced usage with custom parameters 50 | mapped_players = mapping_nba_id( 51 | verbose=True, # Print progress information 52 | letters='abcde', # Only process players whose names start with these letters 53 | base_url='https://www.basketball-reference.com/players' # Custom base URL 54 | ) 55 | ``` 56 | 57 | ## Requirements 📦 58 | 59 | ### Python Version 60 | - Python 3.8 or higher 61 | 62 | ### Required Libraries 63 | ```txt 64 | nba_api>=1.4.0 65 | numpy>=1.22.2,<2.0.0 66 | pandas>=2.0.0 67 | Levenshtein==0.26.1 68 | beautifulsoup4>=4.10.0 69 | requests>=2.31.0 70 | lxml>=5.2.0 71 | ``` 72 | 73 | ## Output 📊 74 | The tool returns a pandas DataFrame containing: 75 | 76 | - NBA Stats API Player ID 77 | - Player Name 78 | - Basketball Reference ID 79 | - Basketball Reference URL 80 | 81 | **The ID mapping table is located in the mapping_nba_ids.csv file and will be updated periodically. You can run the code locally or simply download this file.** 82 | 83 | ## Contributing 🤝 84 | Contributions are welcome! Here's how you can help: 85 | 86 | 1. Fork the repository 87 | 2. Create your feature branch (`git checkout -b feature/amazing-feature`) 88 | 3.
Commit your changes (`git commit -m 'Add some amazing feature'`) 89 | 4. Push to the branch (`git push origin feature/amazing-feature`) 90 | 5. Open a Pull Request 91 | 92 | ## Author ✍️ 93 | shufinskiy - [GitHub Profile](https://github.com/shufinskiy) 94 | 95 | - 📫 How to reach me: Create an issue in this repository 96 | - 🌟 If you find this tool useful, please consider giving it a star! 97 | 98 | -------------------------------------------------------------------------------- /mapping_nba_ids/mapnbaid.py: -------------------------------------------------------------------------------- 1 | """ 2 | Module for mapping NBA player IDs between different data sources. 3 | This module provides functionality to map player IDs between NBA Stats API and Basketball Reference. 4 | """ 5 | 6 | from string import ascii_lowercase 7 | from pathlib import Path 8 | from typing import Dict, List, Optional, Union 9 | from itertools import product 10 | import re 11 | 12 | import requests 13 | from bs4 import BeautifulSoup 14 | import numpy as np 15 | import pandas as pd 16 | from nba_api.stats.endpoints import CommonAllPlayers 17 | from Levenshtein import distance 18 | 19 | 20 | ENGLISH = np.hstack((np.arange(65, 91), np.arange(97, 123), np.array([32, 45, 46]))) 21 | 22 | MAPPING_DICT = { 23 | 202392: 'blakema01', 24 | 1629129: 'bluietr01', 25 | 1642486: 'buiebo01', 26 | 202221: 'butchbr01', 27 | 1642382: 'carlsbr01', 28 | 1642269: 'cartede02', 29 | 1642353: 'chrisca02', 30 | 1642384: 'crawfis01', 31 | 1642368: 'nfalyda01', 32 | 76521: 'davisdw01', 33 | 1642399: 'edwarje01', 34 | 1642348: 'edwarju01', 35 | 203543: 'favervi01', 36 | 1642280: 'flowetr01', 37 | 202070: 'gaffnto01', 38 | 1641945: 'galloja01', 39 | 1619: 'garriki01', 40 | 2775: 'seungha01', 41 | 202238: 'hasbrke01', 42 | 1641747: 'holmeda01', 43 | 1630258: 'homesca01', 44 | 77082: 'hundlho01', 45 | 201998: 'jerrecu01', 46 | 1642352: 'johnske10', 47 | 77199: 'joneswa01', 48 | 1641752: 'klintbo01', 49 | 1630249: 'krejcvi01', 50 | 986: 'mannma01', 51 | 1641970: 'pereima01', 52 | 77510: 'mcclate01', 53 | 1641755: 'mcculke01', 54 | 203183: 'mitchto03', 55 | 203502: 'mitchto02', 56 | 1642439: 'olivaqu01', 57 | 1629341: 'phillta01', 58 | 1642366: 'postqu01', 59 | 202375: 'rollema01', 60 | 202067: 'simpsdi01', 61 | 1630569: 'stewadj01', 62 | 1630597: 'stewadj02', 63 | 78302: 'taylofa01', 64 | 1642260: 'topicni01', 65 | 201987: 'vadenro01', 66 | 78409: 'vaughch01', 67 | 1630492: 'vildolu01', 68 | 202358: 'whitete01', 69 | 78539: 'williar01', 70 | 1629624: 'wooteke01', 71 | 1642385: 'cuiyo01', 72 | 1631322: 'mccoyja01' 73 | } 74 | 75 | 76 | class PlayerDataBBref(object): 77 | """Class for scraping player data from Basketball Reference website. 78 | 79 | This class handles the scraping of player data from basketball-reference.com, 80 | organizing it by player name's first letter. 81 | 82 | Attributes: 83 | base_url (str): Base URL for basketball-reference player pages. 84 | letters (str): Letters to iterate through for player lookup. 85 | verbose (bool): If True, prints progress information during scraping. 86 | bbref_players (list): List of dictionaries containing player information. 87 | """ 88 | 89 | def __init__(self, 90 | base_url: str="https://www.basketball-reference.com/players", 91 | letters: str=ascii_lowercase, 92 | verbose: bool=False) -> None: 93 | """Initialize PlayerDataBBref. 94 | 95 | Args: 96 | base_url (str, optional): Base URL for basketball-reference.com player pages. 97 | Defaults to "https://www.basketball-reference.com/players".
98 | letters (str, optional): Letters to iterate through. Defaults to ascii_lowercase. 99 | verbose (bool, optional): Whether to print progress information. Defaults to False. 100 | """ 101 | self.base_url = base_url 102 | self.letters = letters 103 | self.verbose = verbose 104 | self.bbref_players: List[Dict[str, Union[str, int]]] = [] 105 | 106 | def bbref_player_data(self) -> pd.DataFrame: 107 | """Scrape player data for all specified letters. 108 | 109 | Returns: 110 | pd.DataFrame: DataFrame containing player information from Basketball Reference. 111 | """ 112 | for letter in self.letters: 113 | self.scrape_player_data(letter) 114 | if self.verbose: 115 | print(f"Letter: {letter} finished") 116 | return pd.DataFrame(self.bbref_players) 117 | 118 | def scrape_player_data(self, letter: str) -> None: 119 | """Scrape player data for a specific letter. 120 | 121 | Args: 122 | letter (str): The letter to scrape player data for. 123 | 124 | Raises: 125 | ValueError: If no player information is found on the page. 126 | """ 127 | url = f"{self.base_url}/{letter}/" 128 | response = requests.get(url) 129 | soup = BeautifulSoup(response.content, 'lxml') 130 | table = soup.find('table', {'id': 'players'}) 131 | if table: 132 | rows = table.find('tbody').find_all('tr') 133 | 134 | if len(rows) != 0: 135 | for row in rows: 136 | player_name = row.find('th').get_text() 137 | player_url = row.find('th').find('a')['href'] if row.find('th').find('a') else None 138 | from_year = row.find("td", {"data-stat": "year_min"}).get_text() if row.find("td", {"data-stat": "year_min"}) else None 139 | to_year = row.find("td", {"data-stat": "year_max"}).get_text() if row.find("td", {"data-stat": "year_max"}) else None 140 | 141 | self.bbref_players.append({ 142 | 'name': player_name.replace("*", ""), 143 | 'url': f"https://www.basketball-reference.com{player_url}" if player_url else None, 144 | 'bbref_id': Path(player_url).stem if player_url else None, 145 | 'from_year': int(from_year) - 1 if from_year else None, 146 | 'to_year': int(to_year) - 1 if to_year else None 147 | }) 148 | else: 149 | raise ValueError(f"No player information found on page {url}") 150 | 151 | 152 | class MergePlayerID(object): 153 | """Class for merging player IDs between NBA Stats and Basketball Reference. 154 | 155 | This class implements various methods to match and merge player identifiers 156 | between NBA Stats API and Basketball Reference data sources. 157 | 158 | Attributes: 159 | nbastats (pd.DataFrame): DataFrame containing NBA Stats API player data. 160 | bbref (pd.DataFrame): DataFrame containing Basketball Reference player data. 161 | zero_df (pd.DataFrame, optional): Players with no matches. 162 | double_df (pd.DataFrame, optional): Players with multiple matches. 163 | non_merge_bbref (pd.DataFrame, optional): Unmatched Basketball Reference players. 164 | non_merge_nbastats (pd.DataFrame, optional): Unmatched NBA Stats players. 165 | full_coincidence_df (pd.DataFrame, optional): Players with full information matches. 166 | """ 167 | 168 | def __init__(self, 169 | nbastats: pd.DataFrame, 170 | bbref: pd.DataFrame) -> None: 171 | """Initialize MergePlayerID. 172 | 173 | Args: 174 | nbastats (pd.DataFrame): NBA Stats API player data. 175 | bbref (pd.DataFrame): Basketball Reference player data.
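 
            Example:
                A minimal sketch of the intended call order, mirroring
                MappingBasketID.__call__ below (the input DataFrames are assumed
                to come from CommonAllPlayers and PlayerDataBBref; variable
                names are illustrative)::
 
                    merger = MergePlayerID(nbastats_df, bbref_df)
                    players = merger.merge_by_name()
                    players = merger.merge_double(players)
                    players = merger.merge_non_english(players)
                    players = merger.merge_surname(players)
                    players = merger.merge_wo_punctuation(players)
                    players = merger.merge_from_dict(players)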
176 | """ 177 | self.nbastats = ( 178 | nbastats 179 | .assign(FIRST_LETTER=lambda df_: [x[:1].lower() for x in df_.DISPLAY_LAST_COMMA_FIRST]) 180 | ) 181 | self.bbref = bbref 182 | self.zero_df: Optional[pd.DataFrame] = None 183 | self.double_df: Optional[pd.DataFrame] = None 184 | self.non_merge_bbref: Optional[pd.DataFrame] = None 185 | self.non_merge_nbastats: Optional[pd.DataFrame] = None 186 | self.full_coincidence_df: Optional[pd.DataFrame] = None 187 | 188 | def merge_by_name(self) -> pd.DataFrame: 189 | """Merge players by exact name matches. 190 | 191 | Returns: 192 | pd.DataFrame: DataFrame of matched players by name. 193 | """ 194 | merge_index = [] 195 | zero_index = [] 196 | double_index = [] 197 | for idx, nba_name in enumerate(self.nbastats.loc[:, "DISPLAY_FIRST_LAST"]): 198 | tmp_df = self.bbref.loc[self.bbref["name"] == nba_name] 199 | if tmp_df.shape[0] == 1: 200 | merge_index.append(idx) 201 | elif tmp_df.shape[0] == 0: 202 | zero_index.append(idx) 203 | elif tmp_df.shape[0] > 1: 204 | double_index.append(idx) 205 | else: 206 | raise ValueError(f"Unexpected match count for player {nba_name}") 207 | 208 | self.zero_df = self.nbastats.iloc[zero_index].reset_index(drop=True) 209 | self.double_df = self.nbastats.iloc[double_index].reset_index(drop=True) 210 | 211 | merge_df = ( 212 | self.nbastats 213 | .iloc[merge_index] 214 | .reset_index(drop=True) 215 | .pipe(lambda df_: df_.loc[:, ["PERSON_ID", "DISPLAY_FIRST_LAST", "FROM_YEAR", "TO_YEAR"]]) 216 | .pipe(lambda df_: df_.merge(self.bbref, how="inner", left_on="DISPLAY_FIRST_LAST", right_on="name")) 217 | ) 218 | 219 | self.upd_non_merge(merge_df) 220 | 221 | return merge_df 222 | 223 | def merge_double(self, merge_df: pd.DataFrame) -> pd.DataFrame: 224 | """Merge players with multiple potential matches. 225 | 226 | Args: 227 | merge_df (pd.DataFrame): Previously merged player data. 228 | 229 | Returns: 230 | pd.DataFrame: Updated DataFrame with additional matches.
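 
            Note:
                Ambiguous names are first re-joined on name plus
                FROM_YEAR/TO_YEAR. Players whose name and career years still
                match more than one Basketball Reference row are set aside in
                ``full_coincidence_df`` rather than merged blindly, and the
                remaining unmatched rows are re-joined on name alone.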
231 | """ 232 | merge_double = ( 233 | self.double_df 234 | .pipe(lambda df_: df_.loc[:, ["PERSON_ID", "DISPLAY_FIRST_LAST", "FROM_YEAR", "TO_YEAR"]]) 235 | .astype({'FROM_YEAR': 'int', 'TO_YEAR': 'int'}) 236 | .pipe(lambda df_: df_.merge(self.non_merge_bbref, 237 | how="left", 238 | left_on=["DISPLAY_FIRST_LAST", "FROM_YEAR", "TO_YEAR"], 239 | right_on=["name", "from_year", "to_year"] 240 | )) 241 | ) 242 | 243 | non_match = merge_double.loc[pd.isna(merge_double.name), ["PERSON_ID", "DISPLAY_FIRST_LAST", "FROM_YEAR", "TO_YEAR"]] 244 | self.full_coincidence_df = ( 245 | merge_double 246 | .pipe(lambda df_: df_.merge(( 247 | df_ 248 | .groupby(["PERSON_ID"], as_index=False)["TO_YEAR"] 249 | .count() 250 | .pipe(lambda df_: df_.loc[df_.TO_YEAR > 1, "PERSON_ID"]) 251 | ), how="inner", on="PERSON_ID")) 252 | .loc[:, ["PERSON_ID", "DISPLAY_FIRST_LAST", "FROM_YEAR", "TO_YEAR"]] 253 | .drop_duplicates() 254 | .reset_index(drop=True) 255 | ) 256 | 257 | non_ids = non_match.PERSON_ID.to_list() + self.full_coincidence_df.PERSON_ID.to_list() 258 | 259 | merge_df = pd.concat([merge_df, merge_double.loc[~merge_double.PERSON_ID.isin(non_ids)]], 260 | axis=0, ignore_index=True) 261 | self.upd_non_merge(merge_df) 262 | 263 | merge_non_match = non_match.merge(self.non_merge_bbref, how="left", left_on="DISPLAY_FIRST_LAST", right_on="name") 264 | merge_df = pd.concat([merge_df, merge_non_match], axis=0, ignore_index=True) 265 | 266 | self.upd_non_merge(merge_df) 267 | 268 | return merge_df 269 | 270 | def merge_non_english(self, merge_df: pd.DataFrame) -> pd.DataFrame: 271 | """Merge players with non-English characters in their names. 272 | 273 | Args: 274 | merge_df (pd.DataFrame): Previously merged player data. 275 | 276 | Returns: 277 | pd.DataFrame: Updated DataFrame with additional matches. 
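 
            Note:
                After a direct join attempt, each non-English character in a
                Basketball Reference name is replaced, brute force, with
                candidate letters a-z (a cartesian product for names with
                several such characters) until the lower-cased spelling matches
                an NBA Stats name; a single accented letter therefore yields up
                to 26 candidate spellings to test.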
278 | """ 279 | non_eng_idx = np.array([self._detect_non_english(x) for x in self.non_merge_bbref.name]) 280 | non_eng = self.non_merge_bbref.iloc[non_eng_idx].reset_index(drop=True) 281 | non_eng["non_english_count"] = [self._count_non_english(x) for x in non_eng.name] 282 | non_eng["name_lower"] = [x.lower() for x in non_eng["name"]] 283 | 284 | check_non_eng = ( 285 | self.non_merge_nbastats 286 | .pipe(lambda df_: df_.merge(non_eng, how="inner", left_on="DISPLAY_FIRST_LAST", right_on="name")) 287 | .loc[:, ["PERSON_ID", "DISPLAY_FIRST_LAST", "FROM_YEAR", "TO_YEAR", 288 | "name", "url", "bbref_id", "from_year", "to_year"]] 289 | ) 290 | 291 | if check_non_eng.shape[0] != 0: 292 | merge_df = pd.concat([merge_df, check_non_eng], axis=0, ignore_index=True) 293 | self.upd_non_merge(merge_df) 294 | 295 | transform_nbastats = ( 296 | self.non_merge_nbastats 297 | .assign( 298 | name_lower=lambda df_: [x.lower() for x in df_.DISPLAY_FIRST_LAST], 299 | ) 300 | .loc[:, ["PERSON_ID", "DISPLAY_FIRST_LAST", "FROM_YEAR", "TO_YEAR", "name_lower"]] 301 | ) 302 | 303 | cnt_sym = np.sort(np.unique(non_eng.non_english_count)) 304 | prod_dict = {key: list(product(*[list(ascii_lowercase) for _ in range(key)])) for key in cnt_sym} 306 | 307 | for i, row in enumerate(non_eng.itertuples()): 308 | n = np.array([ord(x) in ENGLISH for x in row.name_lower]) 309 | replace_idx = np.where(~n)[0] 313 | for sym_cand in prod_dict[row.non_english_count]: 314 | name_ = list(row.name_lower) 315 | for pos in range(len(sym_cand)): 316 | name_[replace_idx[pos]] = sym_cand[pos] 317 | new_name = "".join(name_) 318 | check_idx = transform_nbastats.loc[transform_nbastats["name_lower"] == new_name].index 319 | if len(check_idx) == 0: 320 | continue 321 | else: 322 | non_eng.loc[i, "name_lower"] = new_name 323 | break 324 | 325 | merge_non_eng = ( 326 | non_eng 327 | .pipe(lambda df_: df_.merge(transform_nbastats, how="inner", on="name_lower")) 328 | .loc[:, ["PERSON_ID", "DISPLAY_FIRST_LAST", "FROM_YEAR", "TO_YEAR", 329 | "name", "url", "bbref_id", "from_year", "to_year"]] 330 | ) 331 | merge_df = pd.concat([merge_df, merge_non_eng], axis=0, ignore_index=True) 332 | self.upd_non_merge(merge_df) 333 | 334 | return merge_df 335 | 336 | def merge_surname(self, merge_df: pd.DataFrame) -> pd.DataFrame: 337 | """Merge players based on surname matches. 338 | 339 | Args: 340 | merge_df (pd.DataFrame): Previously merged player data. 341 | 342 | Returns: 343 | pd.DataFrame: Updated DataFrame with additional matches.
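 
            Note:
                Two passes are made: first a join on surnames that occur
                exactly once in both unmatched pools, then a join on surname
                plus FROM_YEAR/TO_YEAR. PERSON_IDs 203183 and 203502 are
                excluded here and resolved through MAPPING_DICT instead.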
344 | """ 345 | nbastats_surname = set( 346 | self.non_merge_nbastats 347 | .assign( 348 | CNT_PART = lambda df_: [len(x.split()) for x in df_.DISPLAY_FIRST_LAST], 349 | SURNAME=lambda df_: [x.split()[1] if y > 1 else None for x, y in zip(df_.DISPLAY_FIRST_LAST, df_.CNT_PART)] 350 | ) 351 | .groupby("SURNAME", as_index=False)["PERSON_ID"].count() 352 | .pipe(lambda df_: df_.loc[df_.PERSON_ID == 1]) 353 | .reset_index(drop=True) 354 | .iloc[:, 0] 355 | .to_list() 356 | ) 357 | 358 | bbref_surname = ( 359 | self.non_merge_bbref 360 | .assign(SURNAME=lambda df_: [x.split()[1] for x in df_.name]) 361 | .groupby("SURNAME", as_index=False)["bbref_id"].count() 362 | .pipe(lambda df_: df_.loc[df_.bbref_id == 1]) 363 | .reset_index(drop=True) 364 | .iloc[:, 0] 365 | .to_list() 366 | ) 367 | surname_set = nbastats_surname.intersection(bbref_surname) 368 | 369 | comp_surname = ( 370 | self.non_merge_nbastats 371 | .assign(SURNAME=lambda df_: [x.split()[1] for x in df_.DISPLAY_FIRST_LAST]) 372 | .pipe(lambda df_: df_.loc[df_.SURNAME.isin(surname_set)]) 373 | .reset_index(drop=True) 374 | .pipe(lambda df_: df_.merge( 375 | ( 376 | self.non_merge_bbref 377 | .assign(SURNAME=lambda df_: [x.split()[1] for x in df_.name]) 378 | .pipe(lambda df_: df_.loc[df_.SURNAME.isin(surname_set)]) 379 | .reset_index(drop=True) 380 | ), 381 | how="inner", 382 | on="SURNAME" 383 | )) 384 | .pipe(lambda df_: df_.loc[:, ["PERSON_ID", "DISPLAY_FIRST_LAST", "FROM_YEAR", "TO_YEAR", 385 | "name", "url", "bbref_id", "from_year", "to_year"]]) 386 | ) 387 | 388 | merge_df = pd.concat([merge_df, comp_surname], axis=0, ignore_index=True) 389 | 390 | self.upd_non_merge(merge_df) 391 | 392 | nbastats_surname_year = ( 393 | self.non_merge_nbastats 394 | .assign( 395 | CNT_PART = lambda df_: [len(x.split()) for x in df_.DISPLAY_FIRST_LAST], 396 | SURNAME=lambda df_: [x.split()[1] if y > 1 else None for x, y in zip(df_.DISPLAY_FIRST_LAST, df_.CNT_PART)] 397 | ) 398 | .drop(columns="CNT_PART") 399 | ) 400 | 401 | bbref_surname_year = ( 402 | self.non_merge_bbref 403 | .assign( 404 | CNT_PART=lambda df_: [len(x.split()) for x in df_.name], 405 | SURNAME=lambda df_: [x.split()[1] if y > 1 else None for x, y in 406 | zip(df_.name, df_.CNT_PART)] 407 | ) 408 | .drop(columns="CNT_PART") 409 | ) 410 | 411 | comp_surname_year = ( 412 | nbastats_surname_year 413 | .astype({'FROM_YEAR': 'int', 'TO_YEAR': 'int'}) 414 | .pipe(lambda df_: df_.merge( 415 | bbref_surname_year, 416 | how="inner", 417 | left_on=["SURNAME", "FROM_YEAR", "TO_YEAR"], 418 | right_on=["SURNAME", "from_year", "to_year"] 419 | )) 420 | .pipe(lambda df_: df_.loc[~df_.PERSON_ID.isin([203183, 203502]), 421 | ["PERSON_ID", "DISPLAY_FIRST_LAST", "FROM_YEAR", "TO_YEAR", 422 | "name", "url", "bbref_id", "from_year", "to_year"]]) 423 | .reset_index(drop=True) 424 | ) 425 | 426 | merge_df = pd.concat([merge_df, comp_surname_year], axis=0, ignore_index=True) 427 | 428 | self.upd_non_merge(merge_df) 429 | 430 | return merge_df 431 | 432 | def merge_wo_punctuation(self, merge_df: pd.DataFrame) -> pd.DataFrame: 433 | """Merge players after removing punctuation from names. 434 | 435 | Args: 436 | merge_df (pd.DataFrame): Previously merged player data. 437 | 438 | Returns: 439 | pd.DataFrame: Updated DataFrame with additional matches. 
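 
            Note:
                Names are compared after stripping punctuation and Roman
                numeral suffixes (I-V). Players still unmatched are then paired
                by Levenshtein distance, and a pairing is accepted only when
                the closest candidate is at distance 2 or less.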
440 | """ 441 | nba_letters = ( 442 | self.non_merge_nbastats 443 | .assign( 444 | ONLY_LETTER=lambda df_: [re.sub(r'[^a-zA-Z]', "", re.sub(" I$| II$| III$| IV$| V$", "", x)).lower() for 445 | x in df_.DISPLAY_FIRST_LAST]) 446 | ) 447 | 448 | bbref_letters = ( 449 | self.non_merge_bbref 450 | .assign(only_letter=lambda df_: [re.sub(r'[^a-zA-Z]', '', x).lower() for x in df_.name]) 451 | ) 452 | 453 | comp_letter = ( 454 | nba_letters 455 | .pipe(lambda df_: df_.loc[:, ["PERSON_ID", "DISPLAY_FIRST_LAST", "FROM_YEAR", "TO_YEAR", "ONLY_LETTER"]]) 456 | .pipe(lambda df_: df_.merge(bbref_letters, how="inner", left_on="ONLY_LETTER", right_on="only_letter")) 457 | .pipe(lambda df_: df_.loc[~df_["PERSON_ID"].isin([203183, 203502])]) 458 | .reset_index(drop=True) 459 | ) 460 | 461 | merge_df = pd.concat([merge_df, comp_letter.drop(columns=["ONLY_LETTER", "only_letter"])], axis=0, ignore_index=True) 462 | 463 | self.upd_non_merge(merge_df) 464 | 465 | bbref_letters = ( 466 | bbref_letters 467 | .pipe(lambda df_: df_.loc[~df_.bbref_id.isin(comp_letter.bbref_id)]) 468 | .reset_index(drop=True) 469 | ) 470 | 471 | nba_letters = ( 472 | nba_letters 473 | .pipe(lambda df_: df_.loc[~df_.PERSON_ID.isin(comp_letter.PERSON_ID)]) 474 | .reset_index(drop=True) 475 | ) 476 | 477 | list_nba_names = nba_letters.ONLY_LETTER.to_list() 478 | list_bbref_names = bbref_letters.only_letter.to_list() 479 | 480 | best = [] 481 | idx_best = [] 482 | for player in list_nba_names: 483 | min_dist = 10000 484 | second_min_dist = 10000 485 | idx = 0 486 | for idx_comp, player_comp in enumerate(list_bbref_names): 487 | dist = distance(player, player_comp) 488 | if dist <= min_dist: 489 | min_dist = dist 490 | idx = idx_comp 491 | elif dist < second_min_dist: 492 | second_min_dist = dist 493 | else: 494 | pass 495 | best.append(min_dist) 496 | idx_best.append(idx) 497 | 498 | comp_lev = ( 499 | nba_letters 500 | .pipe(lambda df_: df_.loc[:, ["PERSON_ID", "DISPLAY_FIRST_LAST", "FROM_YEAR", "TO_YEAR", "ONLY_LETTER"]]) 501 | .assign( 502 | LEN_LETTER=lambda df_: [len(x) for x in df_.ONLY_LETTER], 503 | BEST_LEV=best, 504 | BEST_IDX=idx_best 505 | ) 506 | .pipe(lambda df_: df_.loc[(df_.BEST_LEV <= 2) & (~df_.PERSON_ID.isin([203183, 203502])), 507 | ["PERSON_ID", "DISPLAY_FIRST_LAST", 508 | "FROM_YEAR", "TO_YEAR", "BEST_IDX"]]) 509 | .reset_index(drop=True) 510 | .pipe(lambda df_: df_.merge( 511 | bbref_letters.assign(IDX=lambda df_: df_.index), 512 | how="inner", 513 | left_on="BEST_IDX", right_on="IDX" 514 | )) 515 | .drop(columns=["BEST_IDX", "only_letter", "IDX"]) 516 | ) 517 | 518 | merge_df = pd.concat([merge_df, comp_lev], axis=0, ignore_index=True) 519 | 520 | self.upd_non_merge(merge_df) 521 | 522 | return merge_df 523 | 524 | def merge_from_dict(self, merge_df: pd.DataFrame) -> pd.DataFrame: 525 | """Merge players using predefined mapping dictionary. 526 | 527 | Args: 528 | merge_df (pd.DataFrame): Previously merged player data. 529 | 530 | Returns: 531 | pd.DataFrame: Final merged DataFrame with all matches. 
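 
            Note:
                This is the last-resort step: each remaining NBA Stats
                PERSON_ID is looked up in the hand-curated MAPPING_DICT, and
                the Basketball Reference URL is rebuilt from the first letter
                of the mapped id; ids absent from the dictionary remain
                unmatched.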
532 | """ 533 | comp_dict = ( 534 | self.non_merge_nbastats 535 | .assign(bbref_id = lambda df_: [self._mapping_dict(x) for x in df_.PERSON_ID]) 536 | .pipe(lambda df_: df_.merge(self.non_merge_bbref, how="left", on="bbref_id")) 537 | .pipe(lambda df_: df_.loc[:, ["PERSON_ID", "DISPLAY_FIRST_LAST", "FROM_YEAR", "TO_YEAR", "name", "url", 538 | "bbref_id", "from_year", "to_year"]]) 539 | .assign( 540 | url=lambda df_: [ 541 | "/".join(["https://www.basketball-reference.com/players", x[0], x + ".html"]) 542 | if isinstance(x, str) else None 543 | for x in df_.bbref_id 544 | ] 545 | ) 546 | ) 547 | 548 | merge_df = pd.concat([merge_df, comp_dict], axis=0, ignore_index=True) 549 | 550 | self.upd_non_merge(merge_df) 551 | 552 | return merge_df.drop(columns=["FROM_YEAR", "TO_YEAR", "from_year", "to_year"]) 553 | 554 | def upd_non_merge(self, merge_df: pd.DataFrame) -> None: 555 | """Update non-merged player lists after each merge operation. 556 | 557 | Args: 558 | merge_df (pd.DataFrame): Current merged player data. 559 | """ 560 | merge_bbref_id = merge_df.bbref_id 561 | merge_person_id = merge_df.PERSON_ID 562 | 563 | if self.non_merge_bbref is not None: 564 | self.non_merge_bbref = self.non_merge_bbref[~self.non_merge_bbref.bbref_id.isin(merge_bbref_id)].reset_index(drop=True) 565 | else: 566 | self.non_merge_bbref = self.bbref.loc[~self.bbref.bbref_id.isin(merge_bbref_id)].reset_index(drop=True) 567 | 568 | if self.non_merge_nbastats is not None: 569 | self.non_merge_nbastats = self.non_merge_nbastats[~self.non_merge_nbastats.PERSON_ID.isin(merge_person_id)].reset_index(drop=True) 570 | else: 571 | self.non_merge_nbastats = self.nbastats.loc[~self.nbastats.PERSON_ID.isin(merge_person_id)].reset_index(drop=True) 572 | 573 | @staticmethod 574 | def _detect_non_english(names: str) -> bool: 575 | """Detect if a name contains non-English characters. 576 | 577 | Args: 578 | names (str): Player name to check. 579 | 580 | Returns: 581 | bool: True if name contains non-English characters, False otherwise. 582 | """ 583 | ord_name = not all([ord(x) in ENGLISH for x in names]) 584 | return ord_name 585 | 586 | @staticmethod 587 | def _count_non_english(names: str) -> int: 588 | """Count number of non-English characters in a name. 589 | 590 | Args: 591 | names (str): Player name to check. 592 | 593 | Returns: 594 | int: Number of non-English characters found. 595 | """ 596 | return np.sum([ord(x) not in ENGLISH for x in names]) 597 | 598 | @staticmethod 599 | def _mapping_dict(person_id: int) -> Optional[str]: 600 | """Get Basketball Reference ID from mapping dictionary. 601 | 602 | Args: 603 | person_id (int): NBA Stats API player ID. 604 | 605 | Returns: 606 | Optional[str]: Basketball Reference ID if found, None otherwise. 607 | """ 608 | try: 609 | bbref_id = MAPPING_DICT[person_id] 610 | except KeyError: 611 | bbref_id = None 612 | return bbref_id 613 | 614 | class MappingBasketID(object): 615 | """Main class for mapping basketball player IDs between different sources. 616 | 617 | This class orchestrates the entire process of mapping player IDs between 618 | NBA Stats API and Basketball Reference data sources. 619 | """ 620 | 621 | def __init__(self): 622 | """Initialize MappingBasketID.""" 623 | pass 624 | 625 | def __call__(self, *args, **kwargs): 626 | """Execute the complete ID mapping process. 627 | 628 | Args: 629 | **kwargs: Keyword arguments including: 630 | verbose (bool): Whether to print progress information. 631 | bbref (pd.DataFrame): Existing Basketball Reference data. 
632 | nbastats (pd.DataFrame): Existing NBA Stats data. 633 | letters (str): Letters to scrape from Basketball Reference. 634 | base_url (str): Base URL for Basketball Reference. 635 | 636 | Returns: 637 | pd.DataFrame: Complete mapping between NBA Stats and Basketball Reference IDs. 638 | """ 639 | 640 | self.verbose = kwargs.get("verbose", False) 641 | self.bbref = kwargs.get("bbref", None) 642 | self.nbastats = kwargs.get("nbastats", None) 643 | self.letters = kwargs.get("letters", ascii_lowercase) 644 | self.base_url = kwargs.get("base_url", "https://www.basketball-reference.com/players") 645 | if self.bbref is None: 646 | bbref_players = PlayerDataBBref(verbose=self.verbose, letters=self.letters, base_url=self.base_url) 647 | self.bbref = bbref_players.bbref_player_data() 648 | if self.nbastats is None: 649 | self.nbastats = CommonAllPlayers().get_data_frames()[0] 650 | merge_players = MergePlayerID(self.nbastats, self.bbref) 651 | players_df = merge_players.merge_by_name() 652 | players_df = merge_players.merge_double(players_df) 653 | players_df = merge_players.merge_non_english(players_df) 654 | players_df = merge_players.merge_surname(players_df) 655 | players_df = merge_players.merge_surname(players_df) 656 | players_df = merge_players.merge_wo_punctuation(players_df) 657 | players_df = merge_players.merge_from_dict(players_df) 658 | 659 | return players_df 660 | 661 | mapping_nba_id = MappingBasketID() 662 | -------------------------------------------------------------------------------- /mapping_nba_ids/requirements.txt: -------------------------------------------------------------------------------- 1 | nba_api>=1.4.0 2 | numpy>=1.22.2,<2.0.0 3 | pandas>=2.0.0 4 | Levenshtein==0.26.1 5 | beautifulsoup4>=4.10.0 6 | requests>=2.31.0 7 | lxml>=5.2.0 -------------------------------------------------------------------------------- /sat_logo.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shufinskiy/sport_analytics_tools/c3b1172790725630953800b3878a472d55153b7b/sat_logo.jpeg --------------------------------------------------------------------------------