├── LICENSE
├── README.md
├── mapping_nba_ids
│   ├── .gitignore
│   ├── README.md
│   ├── mapnbaid.py
│   ├── mapping_nba_ids.csv
│   └── requirements.txt
└── sat_logo.jpeg
/LICENSE:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 |
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 |
7 | 1. Definitions.
8 |
9 | "License" shall mean the terms and conditions for use, reproduction,
10 | and distribution as defined by Sections 1 through 9 of this document.
11 |
12 | "Licensor" shall mean the copyright owner or entity authorized by
13 | the copyright owner that is granting the License.
14 |
15 | "Legal Entity" shall mean the union of the acting entity and all
16 | other entities that control, are controlled by, or are under common
17 | control with that entity. For the purposes of this definition,
18 | "control" means (i) the power, direct or indirect, to cause the
19 | direction or management of such entity, whether by contract or
20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
21 | outstanding shares, or (iii) beneficial ownership of such entity.
22 |
23 | "You" (or "Your") shall mean an individual or Legal Entity
24 | exercising permissions granted by this License.
25 |
26 | "Source" form shall mean the preferred form for making modifications,
27 | including but not limited to software source code, documentation
28 | source, and configuration files.
29 |
30 | "Object" form shall mean any form resulting from mechanical
31 | transformation or translation of a Source form, including but
32 | not limited to compiled object code, generated documentation,
33 | and conversions to other media types.
34 |
35 | "Work" shall mean the work of authorship, whether in Source or
36 | Object form, made available under the License, as indicated by a
37 | copyright notice that is included in or attached to the work
38 | (an example is provided in the Appendix below).
39 |
40 | "Derivative Works" shall mean any work, whether in Source or Object
41 | form, that is based on (or derived from) the Work and for which the
42 | editorial revisions, annotations, elaborations, or other modifications
43 | represent, as a whole, an original work of authorship. For the purposes
44 | of this License, Derivative Works shall not include works that remain
45 | separable from, or merely link (or bind by name) to the interfaces of,
46 | the Work and Derivative Works thereof.
47 |
48 | "Contribution" shall mean any work of authorship, including
49 | the original version of the Work and any modifications or additions
50 | to that Work or Derivative Works thereof, that is intentionally
51 | submitted to Licensor for inclusion in the Work by the copyright owner
52 | or by an individual or Legal Entity authorized to submit on behalf of
53 | the copyright owner. For the purposes of this definition, "submitted"
54 | means any form of electronic, verbal, or written communication sent
55 | to the Licensor or its representatives, including but not limited to
56 | communication on electronic mailing lists, source code control systems,
57 | and issue tracking systems that are managed by, or on behalf of, the
58 | Licensor for the purpose of discussing and improving the Work, but
59 | excluding communication that is conspicuously marked or otherwise
60 | designated in writing by the copyright owner as "Not a Contribution."
61 |
62 | "Contributor" shall mean Licensor and any individual or Legal Entity
63 | on behalf of whom a Contribution has been received by Licensor and
64 | subsequently incorporated within the Work.
65 |
66 | 2. Grant of Copyright License. Subject to the terms and conditions of
67 | this License, each Contributor hereby grants to You a perpetual,
68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69 | copyright license to reproduce, prepare Derivative Works of,
70 | publicly display, publicly perform, sublicense, and distribute the
71 | Work and such Derivative Works in Source or Object form.
72 |
73 | 3. Grant of Patent License. Subject to the terms and conditions of
74 | this License, each Contributor hereby grants to You a perpetual,
75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76 | (except as stated in this section) patent license to make, have made,
77 | use, offer to sell, sell, import, and otherwise transfer the Work,
78 | where such license applies only to those patent claims licensable
79 | by such Contributor that are necessarily infringed by their
80 | Contribution(s) alone or by combination of their Contribution(s)
81 | with the Work to which such Contribution(s) was submitted. If You
82 | institute patent litigation against any entity (including a
83 | cross-claim or counterclaim in a lawsuit) alleging that the Work
84 | or a Contribution incorporated within the Work constitutes direct
85 | or contributory patent infringement, then any patent licenses
86 | granted to You under this License for that Work shall terminate
87 | as of the date such litigation is filed.
88 |
89 | 4. Redistribution. You may reproduce and distribute copies of the
90 | Work or Derivative Works thereof in any medium, with or without
91 | modifications, and in Source or Object form, provided that You
92 | meet the following conditions:
93 |
94 | (a) You must give any other recipients of the Work or
95 | Derivative Works a copy of this License; and
96 |
97 | (b) You must cause any modified files to carry prominent notices
98 | stating that You changed the files; and
99 |
100 | (c) You must retain, in the Source form of any Derivative Works
101 | that You distribute, all copyright, patent, trademark, and
102 | attribution notices from the Source form of the Work,
103 | excluding those notices that do not pertain to any part of
104 | the Derivative Works; and
105 |
106 | (d) If the Work includes a "NOTICE" text file as part of its
107 | distribution, then any Derivative Works that You distribute must
108 | include a readable copy of the attribution notices contained
109 | within such NOTICE file, excluding those notices that do not
110 | pertain to any part of the Derivative Works, in at least one
111 | of the following places: within a NOTICE text file distributed
112 | as part of the Derivative Works; within the Source form or
113 | documentation, if provided along with the Derivative Works; or,
114 | within a display generated by the Derivative Works, if and
115 | wherever such third-party notices normally appear. The contents
116 | of the NOTICE file are for informational purposes only and
117 | do not modify the License. You may add Your own attribution
118 | notices within Derivative Works that You distribute, alongside
119 | or as an addendum to the NOTICE text from the Work, provided
120 | that such additional attribution notices cannot be construed
121 | as modifying the License.
122 |
123 | You may add Your own copyright statement to Your modifications and
124 | may provide additional or different license terms and conditions
125 | for use, reproduction, or distribution of Your modifications, or
126 | for any such Derivative Works as a whole, provided Your use,
127 | reproduction, and distribution of the Work otherwise complies with
128 | the conditions stated in this License.
129 |
130 | 5. Submission of Contributions. Unless You explicitly state otherwise,
131 | any Contribution intentionally submitted for inclusion in the Work
132 | by You to the Licensor shall be under the terms and conditions of
133 | this License, without any additional terms or conditions.
134 | Notwithstanding the above, nothing herein shall supersede or modify
135 | the terms of any separate license agreement you may have executed
136 | with Licensor regarding such Contributions.
137 |
138 | 6. Trademarks. This License does not grant permission to use the trade
139 | names, trademarks, service marks, or product names of the Licensor,
140 | except as required for reasonable and customary use in describing the
141 | origin of the Work and reproducing the content of the NOTICE file.
142 |
143 | 7. Disclaimer of Warranty. Unless required by applicable law or
144 | agreed to in writing, Licensor provides the Work (and each
145 | Contributor provides its Contributions) on an "AS IS" BASIS,
146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 | implied, including, without limitation, any warranties or conditions
148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 | PARTICULAR PURPOSE. You are solely responsible for determining the
150 | appropriateness of using or redistributing the Work and assume any
151 | risks associated with Your exercise of permissions under this License.
152 |
153 | 8. Limitation of Liability. In no event and under no legal theory,
154 | whether in tort (including negligence), contract, or otherwise,
155 | unless required by applicable law (such as deliberate and grossly
156 | negligent acts) or agreed to in writing, shall any Contributor be
157 | liable to You for damages, including any direct, indirect, special,
158 | incidental, or consequential damages of any character arising as a
159 | result of this License or out of the use or inability to use the
160 | Work (including but not limited to damages for loss of goodwill,
161 | work stoppage, computer failure or malfunction, or any and all
162 | other commercial damages or losses), even if such Contributor
163 | has been advised of the possibility of such damages.
164 |
165 | 9. Accepting Warranty or Additional Liability. While redistributing
166 | the Work or Derivative Works thereof, You may choose to offer,
167 | and charge a fee for, acceptance of support, warranty, indemnity,
168 | or other liability obligations and/or rights consistent with this
169 | License. However, in accepting such obligations, You may act only
170 | on Your own behalf and on Your sole responsibility, not on behalf
171 | of any other Contributor, and only if You agree to indemnify,
172 | defend, and hold each Contributor harmless for any liability
173 | incurred by, or claims asserted against, such Contributor by reason
174 | of your accepting any such warranty or additional liability.
175 |
176 | END OF TERMS AND CONDITIONS
177 |
178 | APPENDIX: How to apply the Apache License to your work.
179 |
180 | To apply the Apache License to your work, attach the following
181 | boilerplate notice, with the fields enclosed by brackets "[]"
182 | replaced with your own identifying information. (Don't include
183 | the brackets!) The text should be enclosed in the appropriate
184 | comment syntax for the file format. We also recommend that a
185 | file or class name and description of purpose be included on the
186 | same "printed page" as the copyright notice for easier
187 | identification within third-party archives.
188 |
189 | Copyright [yyyy] [name of copyright owner]
190 |
191 | Licensed under the Apache License, Version 2.0 (the "License");
192 | you may not use this file except in compliance with the License.
193 | You may obtain a copy of the License at
194 |
195 | http://www.apache.org/licenses/LICENSE-2.0
196 |
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | # Sport Analytics Tools
6 |
7 | **Sport Analytics Tools** is a project dedicated to publishing information (code, tutorials, media) to help data scientists and sports enthusiasts work with sports data effectively.
8 |
9 | ## Motivation
10 |
11 | - I believe that open-source solutions are superior to proprietary technologies.
12 | - I believe that if you've solved a problem or possess valuable knowledge, you should share it.
13 |
14 | ## Objective
15 |
16 | The goal of this project is to assist people interested in sports data science in tackling data analysis tasks.
17 |
18 | - I've developed **nba_data**, a repository of NBA data that can be accessed in seconds instead of hours via the NBA API.
19 | - I am the author of the **nba-on-court** library, which simplifies working with NBA data.
20 | - I am a contributor to well-known sports libraries such as **nba_api**, **hoopR**, and **worldfootballR**.
21 |
22 | Through this project, I aim to create a comprehensive knowledge base of tools and resources to enhance the workflow with sports data.
23 |
24 | ## List of Projects
25 |
26 | ### 1. NBA Player ID Mapping Tool 🏀
27 | Tool for mapping player IDs between NBA Stats API and Basketball Reference.
28 |
29 | #### Features
30 | - Automated ID mapping between different basketball data sources
31 | - Multiple matching algorithms for high accuracy
32 | - Handles special cases and non-English names
33 | - Easy-to-use Python interface
34 |
35 | #### Requirements
36 | - Python 3.8+
37 | - Core dependencies: beautifulsoup4, numpy, pandas, requests, nba_api, python-Levenshtein
38 |
39 | [Learn more about NBA Player ID Mapping Tool →](https://github.com/shufinskiy/sport_analytics_tools/tree/main/mapping_nba_ids)
40 |
41 | ## Project Table
42 |
43 | |Name|Description|
44 | |------|---------|
45 | |[NBA Player ID Mapping Tool](https://github.com/shufinskiy/sport_analytics_tools/tree/main/mapping_nba_ids)| Code automating the mapping of player IDs between the NBA website and Basketball Reference|
46 |
47 | ## Installation
48 |
49 | ```bash
50 | # Clone the repository
51 | git clone https://github.com/shufinskiy/sport_analytics_tools.git
52 | cd sport_analytics_tools
53 |
54 | # Install dependencies
55 | pip install -r requirements.txt
56 | ```
57 |
58 | ## Contributing 🤝
59 | Contributions are welcome! Please feel free to submit pull requests, particularly for:
60 |
61 | - Adding new tools for sports analytics
62 | - Improving existing functionality
63 | - Adding documentation and tutorials
64 | - Bug fixes and optimizations
65 |
66 | ## License 📄
67 | Apache License 2.0
68 |
69 | ## Contact 📫
70 |
71 |
--------------------------------------------------------------------------------
/mapping_nba_ids/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 |
6 | # C extensions
7 | *.so
8 |
9 | # Distribution / packaging
10 | .Python
11 | build/
12 | develop-eggs/
13 | dist/
14 | downloads/
15 | eggs/
16 | .eggs/
17 | lib/
18 | lib64/
19 | parts/
20 | sdist/
21 | var/
22 | wheels/
23 | share/python-wheels/
24 | *.egg-info/
25 | .installed.cfg
26 | *.egg
27 | MANIFEST
28 |
29 | # PyInstaller
30 | # Usually these files are written by a python script from a template
31 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
32 | *.manifest
33 | *.spec
34 |
35 | # Installer logs
36 | pip-log.txt
37 | pip-delete-this-directory.txt
38 |
39 | # Unit test / coverage reports
40 | htmlcov/
41 | .tox/
42 | .nox/
43 | .coverage
44 | .coverage.*
45 | .cache
46 | nosetests.xml
47 | coverage.xml
48 | *.cover
49 | *.py,cover
50 | .hypothesis/
51 | .pytest_cache/
52 | cover/
53 |
54 | # Translations
55 | *.mo
56 | *.pot
57 |
58 | # Django stuff:
59 | *.log
60 | local_settings.py
61 | db.sqlite3
62 | db.sqlite3-journal
63 |
64 | # Flask stuff:
65 | instance/
66 | .webassets-cache
67 |
68 | # Scrapy stuff:
69 | .scrapy
70 |
71 | # Sphinx documentation
72 | docs/_build/
73 |
74 | # PyBuilder
75 | .pybuilder/
76 | target/
77 |
78 | # Jupyter Notebook
79 | .ipynb_checkpoints
80 |
81 | # IPython
82 | profile_default/
83 | ipython_config.py
84 |
85 | # pyenv
86 | # For a library or package, you might want to ignore these files since the code is
87 | # intended to run in multiple environments; otherwise, check them in:
88 | # .python-version
89 |
90 | # pipenv
91 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
92 | # However, in case of collaboration, if having platform-specific dependencies or dependencies
93 | # having no cross-platform support, pipenv may install dependencies that don't work, or not
94 | # install all needed dependencies.
95 | #Pipfile.lock
96 |
97 | # UV
98 | # Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
99 | # This is especially recommended for binary packages to ensure reproducibility, and is more
100 | # commonly ignored for libraries.
101 | #uv.lock
102 |
103 | # poetry
104 | # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
105 | # This is especially recommended for binary packages to ensure reproducibility, and is more
106 | # commonly ignored for libraries.
107 | # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
108 | #poetry.lock
109 |
110 | # pdm
111 | # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
112 | #pdm.lock
113 | # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
114 | # in version control.
115 | # https://pdm.fming.dev/latest/usage/project/#working-with-version-control
116 | .pdm.toml
117 | .pdm-python
118 | .pdm-build/
119 |
120 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
121 | __pypackages__/
122 |
123 | # Celery stuff
124 | celerybeat-schedule
125 | celerybeat.pid
126 |
127 | # SageMath parsed files
128 | *.sage.py
129 |
130 | # Environments
131 | .env
132 | .venv
133 | env/
134 | venv/
135 | ENV/
136 | env.bak/
137 | venv.bak/
138 |
139 | # Spyder project settings
140 | .spyderproject
141 | .spyproject
142 |
143 | # Rope project settings
144 | .ropeproject
145 |
146 | # mkdocs documentation
147 | /site
148 |
149 | # mypy
150 | .mypy_cache/
151 | .dmypy.json
152 | dmypy.json
153 |
154 | # Pyre type checker
155 | .pyre/
156 |
157 | # pytype static type analyzer
158 | .pytype/
159 |
160 | # Cython debug symbols
161 | cython_debug/
162 |
163 | # PyCharm
164 | # JetBrains specific template is maintained in a separate JetBrains.gitignore that can
165 | # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
166 | # and can be added to the global gitignore or merged into this file. For a more nuclear
167 | # option (not recommended) you can uncomment the following to ignore the entire idea folder.
168 | #.idea/
169 |
--------------------------------------------------------------------------------
/mapping_nba_ids/README.md:
--------------------------------------------------------------------------------
1 | # NBA Player ID Mapping Tool 🏀
2 |
3 | A Python tool for mapping player IDs between NBA Stats API and Basketball Reference. This tool helps solve the common challenge of matching player data across different basketball data sources.
4 |
5 | ## Why This Tool? 🤔
6 |
7 | When working with basketball data, analysts often need to combine data from multiple sources. Two of the most popular sources are:
8 | - NBA Stats API (official NBA statistics)
9 | - Basketball Reference (comprehensive historical data)
10 |
11 | However, these sources use different ID systems for players, making it difficult to merge data. This tool creates a mapping between these IDs, allowing for seamless data integration.
12 |
13 | ## How It Works 🛠️
14 |
15 | The tool uses a multi-step matching algorithm to ensure the highest possible accuracy:
16 |
17 | 1. **Exact Name Matching** 📋
18 | - First attempts to match players by their exact names
19 | - Separates cases with no matches and multiple matches for further processing
20 |
21 | 2. **Multiple Match Resolution** 🔄
22 | - For players with multiple potential matches, uses additional criteria like active years
23 | - Creates separate handling for special cases
24 |
25 | 3. **Non-English Character Handling** 🌐
26 | - Processes names containing non-English characters
27 | - Attempts various transliterations to find matches
28 |
29 | 4. **Surname-Based Matching** 👥
30 | - Matches players using surnames when full names don't match
31 | - Includes additional verification using career years
32 |
33 | 5. **Fuzzy Matching** 🔍
34 | - Removes punctuation and special characters
35 | - Uses Levenshtein distance for approximate string matching
36 |
37 | 6. **Manual Dictionary Mapping** 📘
38 | - Falls back to a pre-defined mapping for special cases
39 | - Handles edge cases that automated matching can't resolve
40 |
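The fuzzy-matching step (5) can be sketched as below. `lev` and `fuzzy_match` are illustrative helpers, not part of the tool's API; the tool itself relies on the python-Levenshtein package, while this sketch inlines a plain dynamic-programming distance so it runs standalone:

```python
import re

def lev(a: str, b: str) -> int:
    """Levenshtein distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def fuzzy_match(nba_name, bbref_names, max_distance=2):
    """Return the closest candidate within max_distance edits, else None.

    Punctuation is stripped and case is folded before comparison,
    mirroring step 5 above.
    """
    clean = lambda s: re.sub(r"[^\w\s]", "", s).lower()
    target = clean(nba_name)
    best = min(bbref_names, key=lambda c: lev(target, clean(c)))
    return best if lev(target, clean(best)) <= max_distance else None

print(fuzzy_match("J.R. Smith", ["JR Smith", "Joe Smith"]))  # → JR Smith
```

A small `max_distance` keeps false positives rare; the genuinely ambiguous cases are what the manual dictionary in step 6 exists for.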
41 | ## Usage 💻
42 |
43 | ```python
44 | from mapping_nba_ids import mapping_nba_id
45 |
46 | # Basic usage with default parameters
47 | mapped_players = mapping_nba_id()
48 |
49 | # Advanced usage with custom parameters
50 | mapped_players = mapping_nba_id(
51 | verbose=True, # Print progress information
52 | letters='abcde', # Only process players whose names start with these letters
53 | base_url='https://www.basketball-reference.com/players' # Custom base URL
54 | )
55 | ```
56 |
57 | ## Requirements 📦
58 |
59 | ### Python Version
60 | - Python 3.8 or higher
61 |
62 | ### Required Libraries
63 | ```txt
64 | nba_api>=1.4.0
65 | numpy>=1.22.2,<2.0.0
66 | pandas>=2.0.0
67 | Levenshtein==0.26.1
68 | beautifulsoup4>=4.10.0
69 | requests>=2.31.0
70 | lxml>=5.2.0
71 | ```
72 |
73 | ## Output 📊
74 | The tool returns a pandas DataFrame containing:
75 |
76 | - NBA Stats API Player ID
77 | - Player Name
78 | - Basketball Reference ID
79 | - Basketball Reference URL
80 |
81 | **The ID mapping table is stored in the mapping_nba_ids.csv file and is updated periodically. You can run the code locally or simply download this file.**
82 |
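Once generated (or downloaded), the mapping table plugs into a standard pandas merge. A minimal sketch with toy data; the column names here follow the output described above but are assumptions, so check mapping_nba_ids.csv for the exact headers:

```python
import pandas as pd

# Toy stand-ins for the real tables (hypothetical column names).
mapping = pd.DataFrame({
    "PERSON_ID": [2544, 201939],                 # NBA Stats API player ID
    "DISPLAY_FIRST_LAST": ["LeBron James", "Stephen Curry"],
    "bbref_id": ["jamesle01", "curryst01"],      # Basketball Reference ID
})
bbref_stats = pd.DataFrame({
    "bbref_id": ["jamesle01", "curryst01"],
    "ws": [273.1, 134.0],                        # some per-player stat
})

# A single merge on bbref_id attaches NBA Stats IDs to Basketball Reference rows.
combined = bbref_stats.merge(mapping, how="left", on="bbref_id")
print(combined[["PERSON_ID", "DISPLAY_FIRST_LAST", "ws"]])
```

With the IDs aligned, any NBA Stats endpoint output can be joined to Basketball Reference data the same way.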
83 | ## Contributing 🤝
84 | Contributions are welcome! Here's how you can help:
85 |
86 | 1. Fork the repository
87 | 2. Create your feature branch (`git checkout -b feature/amazing-feature`)
88 | 3. Commit your changes (`git commit -m 'Add some amazing feature'`)
89 | 4. Push to the branch (`git push origin feature/amazing-feature`)
90 | 5. Open a Pull Request
91 |
92 | ## Author ✍️
93 | shufinskiy - [GitHub Profile](https://github.com/shufinskiy)
94 |
95 | - 📫 How to reach me: Create an issue in this repository
96 | - 🌟 If you find this tool useful, please consider giving it a star!
97 |
98 |
--------------------------------------------------------------------------------
/mapping_nba_ids/mapnbaid.py:
--------------------------------------------------------------------------------
1 | """
2 | Module for mapping NBA player IDs between different data sources.
3 | This module provides functionality to map player IDs between NBA Stats API and Basketball Reference.
4 | """
5 |
6 | from string import ascii_lowercase
7 | from pathlib import Path
8 | from typing import Optional, Union
9 | from itertools import product
10 | import re
11 |
12 | import requests
13 | from bs4 import BeautifulSoup
14 | import numpy as np
15 | import pandas as pd
16 | from nba_api.stats.endpoints import CommonAllPlayers
17 | from Levenshtein import distance
18 |
19 |
20 | ENGLISH = np.hstack((np.arange(65, 91), np.arange(97, 123), np.array([32, 45, 46])))  # ASCII codes: A-Z, a-z, space, hyphen, period
21 |
22 | MAPPING_DICT = {
23 | 202392: 'blakema01',
24 | 1629129: 'bluietr01',
25 | 1642486: 'buiebo01',
26 | 202221: 'butchbr01',
27 | 1642382: 'carlsbr01',
28 | 1642269: 'cartede02',
29 | 1642353: 'chrisca02',
30 | 1642384: 'crawfis01',
31 | 1642368: 'nfalyda01',
32 | 76521: 'davisdw01',
33 | 1642399: 'edwarje01',
34 | 1642348: 'edwarju01',
35 | 203543: 'favervi01',
36 | 1642280: 'flowetr01',
37 | 202070: 'gaffnto01',
38 | 1641945: 'galloja01',
39 | 1619: 'garriki01',
40 | 2775: 'seungha01',
41 | 202238: 'hasbrke01',
42 | 1641747: 'holmeda01',
43 | 1630258: 'homesca01',
44 | 77082: 'hundlho01',
45 | 201998: 'jerrecu01',
46 | 1642352: 'johnske10',
47 | 77199: 'joneswa01',
48 | 1641752: 'klintbo01',
49 | 1630249: 'krejcvi01',
50 | 986: 'mannma01',
51 | 1641970: 'pereima01',
52 | 77510: 'mcclate01',
53 | 1641755: 'mcculke01',
54 | 203183: 'mitchto03',
55 | 203502: 'mitchto02',
56 | 1642439: 'olivaqu01',
57 | 1629341: 'phillta01',
58 | 1642366: 'postqu01',
59 | 202375: 'rollema01',
60 | 202067: 'simpsdi01',
61 | 1630569: 'stewadj01',
62 | 1630597: 'stewadj02',
63 | 78302: 'taylofa01',
64 | 1642260: 'topicni01',
65 | 201987: 'vadenro01',
66 | 78409: 'vaughch01',
67 | 1630492: 'vildolu01',
68 | 202358: 'whitete01',
69 | 78539: 'williar01',
70 | 1629624: 'wooteke01',
71 | 1642385: 'cuiyo01',
72 | 1631322: "mccoyja01"
73 | }
74 |
75 |
76 | class PlayerDataBBref(object):
77 | """Class for scraping player data from Basketball Reference website.
78 |
79 | This class handles the scraping of player data from basketball-reference.com,
80 | organizing it by player name's first letter.
81 |
82 | Attributes:
83 | base_url (str): Base URL for basketball-reference player pages.
84 | letters (str): Letters to iterate through for player lookup.
85 | verbose (bool): If True, prints progress information during scraping.
86 | bbref_players (list): List of dictionaries containing player information.
87 | """
88 |
89 | def __init__(self,
90 | base_url: str="https://www.basketball-reference.com/players",
91 | letters: str=ascii_lowercase,
92 | verbose: bool=False) -> None:
93 | """Initialize PlayerDataBBref.
94 |
95 | Args:
96 | base_url (str, optional): Base URL for basketball-reference.com player pages.
97 | Defaults to "https://www.basketball-reference.com/players".
98 | letters (str, optional): Letters to iterate through. Defaults to ascii_lowercase.
99 | verbose (bool, optional): Whether to print progress information. Defaults to False.
100 | """
101 | self.base_url = base_url
102 | self.letters = letters
103 | self.verbose = verbose
104 | self.bbref_players: list[dict[str, Union[str, int]]] = []
105 |
106 | def bbref_player_data(self) -> pd.DataFrame:
107 | """Scrape player data for all specified letters.
108 |
109 | Returns:
110 | pd.DataFrame: DataFrame containing player information from Basketball Reference.
111 | """
112 | for letter in self.letters:
113 | self.scrape_player_data(letter)
114 | if self.verbose:
115 | print(f"Letter: {letter} finished")
116 | return pd.DataFrame(self.bbref_players)
117 |
118 | def scrape_player_data(self, letter: str) -> None:
119 | """Scrape player data for a specific letter.
120 |
121 | Args:
122 | letter (str): The letter to scrape player data for.
123 |
124 | Raises:
125 | ValueError: If no player information is found on the page.
126 | """
127 | url = f"{self.base_url}/{letter}/"
128 | response = requests.get(url, timeout=30)
129 | soup = BeautifulSoup(response.content, 'lxml')
130 | table = soup.find('table', {'id': 'players'})
131 | if table:
132 | rows = table.find('tbody').find_all('tr')
133 |
134 | if len(rows) != 0:
135 | for row in rows:
136 | player_name = row.find('th').get_text()
137 | player_url = row.find('th').find('a')['href'] if row.find('th').find('a') else None
138 | from_year = row.find("td", {"data-stat": "year_min"}).get_text() if row.find("td", {"data-stat": "year_min"}) else None
139 | to_year = row.find("td", {"data-stat": "year_max"}).get_text() if row.find("td", {"data-stat": "year_max"}) else None
140 |
141 | self.bbref_players.append({
142 | 'name': player_name.replace("*", ""),
143 | 'url': f"https://www.basketball-reference.com{player_url}" if player_url else None,
144 | 'bbref_id': Path(player_url).stem if player_url else None,
145 | 'from_year': int(from_year) - 1 if from_year else None,
146 | 'to_year': int(to_year) - 1 if to_year else None
147 | })
148 | else:
149 | raise ValueError(f"No player information found on page {url}")
150 |
151 |
152 | class MergePlayerID(object):
153 | """Class for merging player IDs between NBA Stats and Basketball Reference.
154 |
155 | This class implements various methods to match and merge player identifiers
156 | between NBA Stats API and Basketball Reference data sources.
157 |
158 | Attributes:
159 | nbastats (pd.DataFrame): DataFrame containing NBA Stats API player data.
160 | bbref (pd.DataFrame): DataFrame containing Basketball Reference player data.
161 | zero_df (pd.DataFrame, optional): Players with no matches.
162 | double_df (pd.DataFrame, optional): Players with multiple matches.
163 | non_merge_bbref (pd.DataFrame, optional): Unmatched Basketball Reference players.
164 | non_merge_nbastats (pd.DataFrame, optional): Unmatched NBA Stats players.
165 | full_coincidence_df (pd.DataFrame, optional): Players with full information matches.
166 | """
167 |
168 | def __init__(self,
169 | nbastats: pd.DataFrame,
170 | bbref: pd.DataFrame) -> None:
171 | """Initialize MergePlayerID.
172 |
173 | Args:
174 | nbastats (pd.DataFrame): NBA Stats API player data.
175 | bbref (pd.DataFrame): Basketball Reference player data.
176 | """
177 | self.nbastats = (
178 | nbastats
179 | .assign(FIRST_LETTER = lambda df_: [x[:1].lower() for x in df_.DISPLAY_LAST_COMMA_FIRST])
180 | )
181 | self.bbref = bbref
182 | self.zero_df: Optional[pd.DataFrame] = None
183 | self.double_df: Optional[pd.DataFrame] = None
184 | self.non_merge_bbref: Optional[pd.DataFrame] = None
185 | self.non_merge_nbastats: Optional[pd.DataFrame] = None
186 | self.full_coincidence_df: Optional[pd.DataFrame] = None
187 |
188 | def merge_by_name(self) -> pd.DataFrame:
189 | """Merge players by exact name matches.
190 |
191 | Returns:
192 | pd.DataFrame: DataFrame of matched players by name.
193 | """
194 | merge_index = []
195 | zero_index = []
196 | double_index = []
197 | for idx, nba_name in enumerate(self.nbastats.loc[:, "DISPLAY_FIRST_LAST"]):
198 | tmp_df = self.bbref.loc[self.bbref["name"] == nba_name]
199 | if tmp_df.shape[0] == 1:
200 | merge_index.append(idx)
201 | elif tmp_df.shape[0] == 0:
202 | zero_index.append(idx)
203 | elif tmp_df.shape[0] > 1:
204 | double_index.append(idx)
205 | else:
206 | raise ValueError(f"Unexpected number of matches for {nba_name}")
207 |
208 | self.zero_df = self.nbastats.iloc[zero_index].reset_index(drop=True)
209 | self.double_df = self.nbastats.iloc[double_index].reset_index(drop=True)
210 |
211 | merge_df = (
212 | self.nbastats
213 | .iloc[merge_index]
214 | .reset_index(drop=True)
215 | .pipe(lambda df_: df_.loc[:, ["PERSON_ID", "DISPLAY_FIRST_LAST", "FROM_YEAR", "TO_YEAR"]])
216 | .pipe(lambda df_: df_.merge(self.bbref, how="inner", left_on="DISPLAY_FIRST_LAST", right_on="name"))
217 | )
218 |
219 | self.upd_non_merge(merge_df)
220 |
221 | return merge_df
222 |
223 | def merge_double(self, merge_df: pd.DataFrame) -> pd.DataFrame:
224 | """Merge players with multiple potential matches.
225 |
226 | Args:
227 | merge_df (pd.DataFrame): Previously merged player data.
228 |
229 | Returns:
230 | pd.DataFrame: Updated DataFrame with additional matches.
231 | """
232 | merge_double = (
233 | self.double_df
234 | .pipe(lambda df_: df_.loc[:, ["PERSON_ID", "DISPLAY_FIRST_LAST", "FROM_YEAR", "TO_YEAR"]])
235 | .astype({'FROM_YEAR': 'int', 'TO_YEAR': 'int'})
236 | .pipe(lambda df_: df_.merge(self.non_merge_bbref,
237 | how="left",
238 | left_on=["DISPLAY_FIRST_LAST", "FROM_YEAR", "TO_YEAR"],
239 | right_on=["name", "from_year", "to_year"]
240 | ))
241 | )
242 |
243 | non_match = merge_double.loc[pd.isna(merge_double.name), ["PERSON_ID", "DISPLAY_FIRST_LAST", "FROM_YEAR", "TO_YEAR"]]
244 | self.full_coincidence_df = (
245 | merge_double
246 | .pipe(lambda df_: df_.merge((
247 | df_
248 | .groupby(["PERSON_ID"], as_index=False)["TO_YEAR"]
249 | .count()
250 | .pipe(lambda df_: df_.loc[df_.TO_YEAR > 1, "PERSON_ID"])
251 | ), how="inner", on="PERSON_ID"))
252 | .loc[:, ["PERSON_ID", "DISPLAY_FIRST_LAST", "FROM_YEAR", "TO_YEAR"]]
253 | .drop_duplicates()
254 | .reset_index(drop=True)
255 | )
256 |
257 | non_ids = non_match.PERSON_ID.to_list() + self.full_coincidence_df.PERSON_ID.to_list()
258 |
259 | merge_df = pd.concat([merge_df, merge_double.loc[~merge_double.PERSON_ID.isin(non_ids)]],
260 | axis=0, ignore_index=True)
261 | self.upd_non_merge(merge_df)
262 |
263 | merge_non_match = non_match.merge(self.non_merge_bbref, how="left", left_on="DISPLAY_FIRST_LAST", right_on="name")
264 | merge_df = pd.concat([merge_df, merge_non_match], axis=0, ignore_index=True)
265 |
266 | self.upd_non_merge(merge_df)
267 |
268 | return merge_df
269 |
270 | def merge_non_english(self, merge_df: pd.DataFrame) -> pd.DataFrame:
271 | """Merge players with non-English characters in their names.
272 |
273 | Args:
274 | merge_df (pd.DataFrame): Previously merged player data.
275 |
276 | Returns:
277 | pd.DataFrame: Updated DataFrame with additional matches.
278 | """
279 | non_eng_idx = np.array([self._detect_non_english(x) for x in self.non_merge_bbref.name])
280 | non_eng = self.non_merge_bbref.iloc[non_eng_idx].reset_index(drop=True)
281 | non_eng["non_english_count"] = [self._count_non_english(x) for x in non_eng.name]
282 | non_eng["name_lower"] = [x.lower() for x in non_eng["name"]]
283 |
284 | check_non_eng = (
285 | self.non_merge_nbastats
286 | .pipe(lambda df_: df_.merge(non_eng, how="inner", left_on="DISPLAY_FIRST_LAST", right_on="name"))
287 | .loc[:, ["PERSON_ID", "DISPLAY_FIRST_LAST", "FROM_YEAR", "TO_YEAR",
288 | "name", "url", "bbref_id", "from_year", "to_year"]]
289 | )
290 |
291 | if not check_non_eng.empty:
292 | merge_df = pd.concat([merge_df, check_non_eng], axis=0, ignore_index=True)
293 | self.upd_non_merge(merge_df)
294 |
295 | transform_nbastats = (
296 | self.non_merge_nbastats
297 | .assign(
298 | name_lower=lambda df_: [x.lower() for x in df_.DISPLAY_FIRST_LAST],
299 | )
300 | .loc[:, ["PERSON_ID", "DISPLAY_FIRST_LAST", "FROM_YEAR", "TO_YEAR", "name_lower"]]
301 | )
302 |
303 | cnt_sym = np.sort(np.unique(non_eng.non_english_count))
304 | prod_dict = {key: list(product(*[list(ascii_lowercase) for _ in range(key)])) for key in cnt_sym}
305 | eng = np.hstack((np.arange(65, 91), np.arange(97, 123), np.array([32, 45, 46])))
306 |
307 | for i, row in enumerate(non_eng.itertuples()):
308 | # flag which characters of the lowercased name fall outside the ASCII set
309 | n = np.array([ord(x) in eng for x in row.name_lower])
310 | replace_idx = np.where(~n)[0]
311 | # try every combination of ASCII letters at the non-English positions
312 | # until a candidate matches an unmatched NBA Stats name
313 | for sym_cand in prod_dict[row.non_english_count]:
314 | name_ = list(row.name_lower)
315 | for pos, sym in enumerate(sym_cand):
316 | name_[replace_idx[pos]] = sym
317 | new_name = "".join(name_)
318 | if (transform_nbastats["name_lower"] == new_name).any():
319 | non_eng.iloc[i, non_eng.columns.get_loc("name_lower")] = new_name
320 | break
324 |
325 | merge_non_eng = (
326 | non_eng
327 | .pipe(lambda df_: df_.merge(transform_nbastats, how="inner", on="name_lower"))
328 | .loc[:, ["PERSON_ID", "DISPLAY_FIRST_LAST", "FROM_YEAR", "TO_YEAR",
329 | "name", "url", "bbref_id", "from_year", "to_year"]]
330 | )
331 | merge_df = pd.concat([merge_df, merge_non_eng], axis=0, ignore_index=True)
332 | self.upd_non_merge(merge_df)
333 |
334 | return merge_df
335 |
336 | def merge_surname(self, merge_df: pd.DataFrame) -> pd.DataFrame:
337 | """Merge players based on surname matches.
338 |
339 | Args:
340 | merge_df (pd.DataFrame): Previously merged player data.
341 |
342 | Returns:
343 | pd.DataFrame: Updated DataFrame with additional matches.
344 | """
345 | nbastats_surname = set(
346 | self.non_merge_nbastats
347 | .assign(
348 | CNT_PART = lambda df_: [len(x.split()) for x in df_.DISPLAY_FIRST_LAST],
349 | SURNAME=lambda df_: [x.split()[1] if y > 1 else None for x, y in zip(df_.DISPLAY_FIRST_LAST, df_.CNT_PART)]
350 | )
351 | .groupby("SURNAME", as_index=False)["PERSON_ID"].count()
352 | .pipe(lambda df_: df_.loc[df_.PERSON_ID == 1])
353 | .reset_index(drop=True)
354 | .iloc[:, 0]
355 | .to_list()
356 | )
357 |
358 | bbref_surname = (
359 | self.non_merge_bbref
360 | .assign(SURNAME=lambda df_: [x.split()[1] for x in df_.name])
361 | .groupby("SURNAME", as_index=False)["bbref_id"].count()
362 | .pipe(lambda df_: df_.loc[df_.bbref_id == 1])
363 | .reset_index(drop=True)
364 | .iloc[:, 0]
365 | .to_list()
366 | )
367 | surname_set = nbastats_surname.intersection(bbref_surname)
368 |
369 | comp_surname = (
370 | self.non_merge_nbastats
371 | .assign(SURNAME=lambda df_: [x.split()[1] for x in df_.DISPLAY_FIRST_LAST])
372 | .pipe(lambda df_: df_.loc[df_.SURNAME.isin(surname_set)])
373 | .reset_index(drop=True)
374 | .pipe(lambda df_: df_.merge(
375 | (
376 | self.non_merge_bbref
377 | .assign(SURNAME=lambda df_: [x.split()[1] for x in df_.name])
378 | .pipe(lambda df_: df_.loc[df_.SURNAME.isin(surname_set)])
379 | .reset_index(drop=True)
380 | ),
381 | how="inner",
382 | on="SURNAME"
383 | ))
384 | .pipe(lambda df_: df_.loc[:, ["PERSON_ID", "DISPLAY_FIRST_LAST", "FROM_YEAR", "TO_YEAR",
385 | "name", "url", "bbref_id", "from_year", "to_year"]])
386 | )
387 |
388 | merge_df = pd.concat([merge_df, comp_surname], axis=0, ignore_index=True)
389 |
390 | self.upd_non_merge(merge_df)
391 |
392 | nbastats_surname_year = (
393 | self.non_merge_nbastats
394 | .assign(
395 | CNT_PART = lambda df_: [len(x.split()) for x in df_.DISPLAY_FIRST_LAST],
396 | SURNAME=lambda df_: [x.split()[1] if y > 1 else None for x, y in zip(df_.DISPLAY_FIRST_LAST, df_.CNT_PART)]
397 | )
398 | .drop(columns="CNT_PART")
399 | )
400 |
401 | bbref_surname_year = (
402 | self.non_merge_bbref
403 | .assign(
404 | CNT_PART=lambda df_: [len(x.split()) for x in df_.name],
405 | SURNAME=lambda df_: [x.split()[1] if y > 1 else None for x, y in
406 | zip(df_.name, df_.CNT_PART)]
407 | )
408 | .drop(columns="CNT_PART")
409 | )
410 |
411 | comp_surname_year = (
412 | nbastats_surname_year
413 | .astype({'FROM_YEAR': 'int', 'TO_YEAR': 'int'})
414 | .pipe(lambda df_: df_.merge(
415 | bbref_surname_year,
416 | how="inner",
417 | left_on=["SURNAME", "FROM_YEAR", "TO_YEAR"],
418 | right_on=["SURNAME", "from_year", "to_year"]
419 | ))
420 | .pipe(lambda df_: df_.loc[~df_.PERSON_ID.isin([203183, 203502]),
421 | ["PERSON_ID", "DISPLAY_FIRST_LAST", "FROM_YEAR", "TO_YEAR",
422 | "name", "url", "bbref_id", "from_year", "to_year"]])
423 | .reset_index(drop=True)
424 | )
425 |
426 | merge_df = pd.concat([merge_df, comp_surname_year], axis=0, ignore_index=True)
427 |
428 | self.upd_non_merge(merge_df)
429 |
430 | return merge_df
431 |
432 | def merge_wo_punctuation(self, merge_df: pd.DataFrame) -> pd.DataFrame:
433 | """Merge players after removing punctuation from names.
434 |
435 | Args:
436 | merge_df (pd.DataFrame): Previously merged player data.
437 |
438 | Returns:
439 | pd.DataFrame: Updated DataFrame with additional matches.
440 | """
441 | nba_letters = (
442 | self.non_merge_nbastats
443 | .assign(
444 | ONLY_LETTER=lambda df_: [re.sub(r'[^a-zA-Z]', "", re.sub(" I$| II$| III$| IV$| V$", "", x)).lower() for
445 | x in df_.DISPLAY_FIRST_LAST])
446 | )
447 |
448 | bbref_letters = (
449 | self.non_merge_bbref
450 | .assign(only_letter=lambda df_: [re.sub(r'[^a-zA-Z]', '', x).lower() for x in df_.name])
451 | )
452 |
453 | comp_letter = (
454 | nba_letters
455 | .pipe(lambda df_: df_.loc[:, ["PERSON_ID", "DISPLAY_FIRST_LAST", "FROM_YEAR", "TO_YEAR", "ONLY_LETTER"]])
456 | .pipe(lambda df_: df_.merge(bbref_letters, how="inner", left_on="ONLY_LETTER", right_on="only_letter"))
457 | .pipe(lambda df_: df_.loc[~df_["PERSON_ID"].isin([203183, 203502])])
458 | .reset_index(drop=True)
459 | )
460 |
461 | merge_df = pd.concat([merge_df, comp_letter.drop(columns=["ONLY_LETTER", "only_letter"])], axis=0, ignore_index=True)
462 |
463 | self.upd_non_merge(merge_df)
464 |
465 | bbref_letters = (
466 | bbref_letters
467 | .pipe(lambda df_: df_.loc[~df_.bbref_id.isin(comp_letter.bbref_id)])
468 | .reset_index(drop=True)
469 | )
470 |
471 | nba_letters = (
472 | nba_letters
473 | .pipe(lambda df_: df_.loc[~df_.PERSON_ID.isin(comp_letter.PERSON_ID)])
474 | .reset_index(drop=True)
475 | )
476 |
477 | list_nba_names = nba_letters.ONLY_LETTER.to_list()
478 | list_bbref_names = bbref_letters.only_letter.to_list()
479 |
480 | best = []
481 | idx_best = []
482 | # for each unmatched NBA Stats name, find the index of the closest
483 | # unmatched Basketball Reference name by Levenshtein distance
484 | for player in list_nba_names:
485 | min_dist = float("inf")
486 | idx = 0
487 | for idx_comp, player_comp in enumerate(list_bbref_names):
488 | dist = distance(player, player_comp)
489 | if dist <= min_dist:
490 | min_dist = dist
491 | idx = idx_comp
492 | best.append(min_dist)
493 | idx_best.append(idx)
497 |
498 | comp_lev = (
499 | nba_letters
500 | .pipe(lambda df_: df_.loc[:, ["PERSON_ID", "DISPLAY_FIRST_LAST", "FROM_YEAR", "TO_YEAR", "ONLY_LETTER"]])
501 | .assign(
502 | BEST_LEV=best,
503 | BEST_IDX=idx_best
504 | )
506 | .pipe(lambda df_: df_.loc[(df_.BEST_LEV <= 2) & (~df_.PERSON_ID.isin([203183, 203502])),
507 | ["PERSON_ID", "DISPLAY_FIRST_LAST",
508 | "FROM_YEAR", "TO_YEAR", "BEST_IDX"]])
509 | .reset_index(drop=True)
510 | .pipe(lambda df_: df_.merge(
511 | bbref_letters.assign(IDX=lambda df_: df_.index),
512 | how="inner",
513 | left_on="BEST_IDX", right_on="IDX"
514 | ))
515 | .drop(columns=["BEST_IDX", "only_letter", "IDX"])
516 | )
517 |
518 | merge_df = pd.concat([merge_df, comp_lev], axis=0, ignore_index=True)
519 |
520 | self.upd_non_merge(merge_df)
521 |
522 | return merge_df
523 |
524 | def merge_from_dict(self, merge_df: pd.DataFrame) -> pd.DataFrame:
525 | """Merge players using predefined mapping dictionary.
526 |
527 | Args:
528 | merge_df (pd.DataFrame): Previously merged player data.
529 |
530 | Returns:
531 | pd.DataFrame: Final merged DataFrame with all matches.
532 | """
533 | comp_dict = (
534 | self.non_merge_nbastats
535 | .assign(bbref_id = lambda df_: [self._mapping_dict(x) for x in df_.PERSON_ID])
536 | .pipe(lambda df_: df_.merge(self.non_merge_bbref, how="left", on="bbref_id"))
537 | .pipe(lambda df_: df_.loc[:, ["PERSON_ID", "DISPLAY_FIRST_LAST", "FROM_YEAR", "TO_YEAR", "name", "url",
538 | "bbref_id", "from_year", "to_year"]])
539 | .assign(
540 | url=lambda df_: [
541 | "/".join(["https://www.basketball-reference.com/players", x[0], x + ".html"])
542 | if isinstance(x, str) else None
543 | for x in df_.bbref_id
544 | ]
545 | )
546 | )
547 |
548 | merge_df = pd.concat([merge_df, comp_dict], axis=0, ignore_index=True)
549 |
550 | self.upd_non_merge(merge_df)
551 |
552 | return merge_df.drop(columns=["FROM_YEAR", "TO_YEAR", "from_year", "to_year"])
553 |
554 | def upd_non_merge(self, merge_df: pd.DataFrame) -> None:
555 | """Update non-merged player lists after each merge operation.
556 |
557 | Args:
558 | merge_df (pd.DataFrame): Current merged player data.
559 | """
560 | merge_bbref_id = merge_df.bbref_id
561 | merge_person_id = merge_df.PERSON_ID
562 |
563 | if self.non_merge_bbref is not None:
564 | self.non_merge_bbref = self.non_merge_bbref[~self.non_merge_bbref.bbref_id.isin(merge_bbref_id)].reset_index(drop=True)
565 | else:
566 | self.non_merge_bbref = self.bbref.loc[~self.bbref.bbref_id.isin(merge_bbref_id)].reset_index(drop=True)
567 |
568 | if self.non_merge_nbastats is not None:
569 | self.non_merge_nbastats = self.non_merge_nbastats[~self.non_merge_nbastats.PERSON_ID.isin(merge_person_id)].reset_index(drop=True)
570 | else:
571 | self.non_merge_nbastats = self.nbastats.loc[~self.nbastats.PERSON_ID.isin(merge_person_id)].reset_index(drop=True)
572 |
573 | @staticmethod
574 | def _detect_non_english(names: str) -> bool:
575 | """Detect if a name contains non-English characters.
576 |
577 | Args:
578 | names (str): Player name to check.
579 |
580 | Returns:
581 | bool: True if name contains non-English characters, False otherwise.
582 | """
583 | # True when any character's code point falls outside the ASCII whitelist
584 | return not all(ord(x) in ENGLISH for x in names)
585 |
586 | @staticmethod
587 | def _count_non_english(names: str) -> int:
588 | """Count number of non-English characters in a name.
589 |
590 | Args:
591 | names (str): Player name to check.
592 |
593 | Returns:
594 | int: Number of non-English characters found.
595 | """
596 | return sum(ord(x) not in ENGLISH for x in names)
597 |
598 | @staticmethod
599 | def _mapping_dict(person_id: int) -> Optional[str]:
600 | """Get Basketball Reference ID from mapping dictionary.
601 |
602 | Args:
603 | person_id (int): NBA Stats API player ID.
604 |
605 | Returns:
606 | Optional[str]: Basketball Reference ID if found, None otherwise.
607 | """
608 | # dict.get returns None for IDs without a manual mapping entry
609 | return MAPPING_DICT.get(person_id)
613 |
614 | class MappingBasketID:
615 | """Main class for mapping basketball player IDs between different sources.
616 |
617 | This class orchestrates the entire process of mapping player IDs between
618 | NBA Stats API and Basketball Reference data sources.
619 | """
620 |
621 | def __init__(self):
622 | """Initialize MappingBasketID."""
623 | pass
624 |
625 | def __call__(self, *args, **kwargs):
626 | """Execute the complete ID mapping process.
627 |
628 | Args:
629 | **kwargs: Keyword arguments including:
630 | verbose (bool): Whether to print progress information.
631 | bbref (pd.DataFrame): Existing Basketball Reference data.
632 | nbastats (pd.DataFrame): Existing NBA Stats data.
633 | letters (str): Letters to scrape from Basketball Reference.
634 | base_url (str): Base URL for Basketball Reference.
635 |
636 | Returns:
637 | pd.DataFrame: Complete mapping between NBA Stats and Basketball Reference IDs.
638 | """
639 |
640 | self.verbose = kwargs.get("verbose", False)
641 | self.bbref = kwargs.get("bbref", None)
642 | self.nbastats = kwargs.get("nbastats", None)
643 | self.letters = kwargs.get("letters", ascii_lowercase)
644 | self.base_url = kwargs.get("base_url", "https://www.basketball-reference.com/players")
645 | if self.bbref is None:
646 | bbref_players = PlayerDataBBref(verbose=self.verbose, letters=self.letters, base_url=self.base_url)
647 | self.bbref = bbref_players.bbref_player_data()
648 | if self.nbastats is None:
649 | self.nbastats = CommonAllPlayers().get_data_frames()[0]
650 | merge_players = MergePlayerID(self.nbastats, self.bbref)
651 | players_df = merge_players.merge_by_name()
652 | players_df = merge_players.merge_double(players_df)
653 | players_df = merge_players.merge_non_english(players_df)
654 | players_df = merge_players.merge_surname(players_df)
655 | players_df = merge_players.merge_surname(players_df)  # second pass: more surnames become unique after the unmatched pools shrink
656 | players_df = merge_players.merge_wo_punctuation(players_df)
657 | players_df = merge_players.merge_from_dict(players_df)
658 |
659 | return players_df
660 |
661 | mapping_nba_id = MappingBasketID()
662 |
--------------------------------------------------------------------------------
/mapping_nba_ids/requirements.txt:
--------------------------------------------------------------------------------
1 | nba_api>=1.4.0
2 | numpy>=1.22.2,<2.0.0
3 | pandas>=2.0.0
4 | Levenshtein==0.26.1
5 | beautifulsoup4>=4.10.0
6 | requests>=2.31.0
7 | lxml>=5.2.0
--------------------------------------------------------------------------------
/sat_logo.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/shufinskiy/sport_analytics_tools/c3b1172790725630953800b3878a472d55153b7b/sat_logo.jpeg
--------------------------------------------------------------------------------