├── .Python ├── .gitignore ├── LICENSE.rst ├── README ├── __init__.py ├── manage.py ├── mbb ├── __init__.py ├── admin.py ├── models.py ├── scrapers.py ├── tests.py └── views.py ├── pip-selfcheck.json ├── requirements.txt └── utils.py /.Python: -------------------------------------------------------------------------------- 1 | /System/Library/Frameworks/Python.framework/Versions/2.7/Python -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *.pyc 2 | settings.py 3 | bin/ 4 | include/ 5 | lib/ 6 | settings.py 7 | manage.py 8 | -------------------------------------------------------------------------------- /LICENSE.rst: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 
25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. 
For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. 
If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. 
You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | Copyright 2011-2013 Derek Willis 179 | 180 | Licensed under the Apache License, Version 2.0 (the "License"); 181 | you may not use this file except in compliance with the License. 
182 | You may obtain a copy of the License at 183 | 184 | http://www.apache.org/licenses/LICENSE-2.0 185 | 186 | Unless required by applicable law or agreed to in writing, software 187 | distributed under the License is distributed on an "AS IS" BASIS, 188 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 189 | See the License for the specific language governing permissions and 190 | limitations under the License. 191 | -------------------------------------------------------------------------------- /README: -------------------------------------------------------------------------------- 1 | ==== NCAA API ==== 2 | 3 | A Python application that turns the NCAA's Web-based statistics into an API. Relies on Django, BeautifulSoup and madness. And, in an ironic twist, the NCAA's stat site is a Rails app. The initial application covers men's basketball, but in theory (and my dreams) it could be extended to other sports. 4 | 5 | What works: once you create a Season object with the appropriate NCAA id (see the models in the mbb app for details), you can run team_parser and roster_parser from mbb/scrapers.py to fill in information about teams and players. Still to come: season stats for teams and players first, then individual games.
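The parsers mentioned above read pages whose addresses combine a season's NCAA id with a team's NCAA id. The following is a minimal sketch of how those addresses are put together, using the URL patterns from mbb/scrapers.py and the example ids from the docstrings in mbb/models.py. The helper names `team_list_url` and `team_index_url` are hypothetical and are not part of the repository; the sketch needs neither Django nor network access.

```python
# Example ids taken from the docstrings in mbb/models.py.
SEASON_2010_11 = 10440  # NCAA id for the 2010-11 season
PITTSBURGH = 545        # NCAA id for Pittsburgh

BASE = "http://stats.ncaa.org"


def team_list_url(season_ncaa_id, division=1):
    """Hypothetical helper: the page that lists every team in a division
    for one season (the page team_parser reads)."""
    return "%s/team/inst_team_list/%s?division=%s" % (BASE, season_ncaa_id, division)


def team_index_url(season_ncaa_id, team_ncaa_id):
    """Hypothetical helper: one team's page for one season
    (the page roster_parser and schedule_parser read)."""
    return "%s/team/index/%s?org_id=%s" % (BASE, season_ncaa_id, team_ncaa_id)


print(team_list_url(SEASON_2010_11))
print(team_index_url(SEASON_2010_11, PITTSBURGH))
```

Whether these addresses still resolve on the live site is not something this sketch checks; it only shows how the two ids fit into the patterns the scrapers use.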
-------------------------------------------------------------------------------- /__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dwillis/NCAA-API/3401bdc0dc69f74a806cfe4ca79bae87ba8ad38f/__init__.py -------------------------------------------------------------------------------- /manage.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | import os 3 | import sys 4 | 5 | if __name__ == "__main__": 6 | os.environ.setdefault("DJANGO_SETTINGS_MODULE", "settings") 7 | 8 | from django.core.management import execute_from_command_line 9 | 10 | execute_from_command_line(sys.argv) 11 | -------------------------------------------------------------------------------- /mbb/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dwillis/NCAA-API/3401bdc0dc69f74a806cfe4ca79bae87ba8ad38f/mbb/__init__.py -------------------------------------------------------------------------------- /mbb/admin.py: -------------------------------------------------------------------------------- 1 | from django.contrib import admin 2 | from mbb.models import Team, Season, TeamSeason, Player, PlayerSeason, Game 3 | 4 | class SeasonAdmin(admin.ModelAdmin): 5 | list_display = ('season', 'start_year', 'end_year', 'ncaa_id') 6 | 7 | class TeamAdmin(admin.ModelAdmin): 8 | list_display = ('name', 'ncaa_id') 9 | 10 | class TeamSeasonAdmin(admin.ModelAdmin): 11 | pass 12 | 13 | class PlayerAdmin(admin.ModelAdmin): 14 | pass 15 | 16 | class PlayerSeasonAdmin(admin.ModelAdmin): 17 | list_display = ('player', 'team_season', 'position', 'year') 18 | 19 | class GameAdmin(admin.ModelAdmin): 20 | list_display = ('ncaa_id', 'home_team', 'home_team_score', 'visiting_team', 'visiting_team_score', 'datetime') 21 | 22 | admin.site.register(Season, SeasonAdmin) 23 | admin.site.register(Team, TeamAdmin) 24 | 
admin.site.register(TeamSeason, TeamSeasonAdmin) 25 | admin.site.register(Player, PlayerAdmin) 26 | admin.site.register(PlayerSeason, PlayerSeasonAdmin) 27 | admin.site.register(Game, GameAdmin) 28 | -------------------------------------------------------------------------------- /mbb/models.py: -------------------------------------------------------------------------------- 1 | from django.db import models 2 | from django.template.defaultfilters import slugify 3 | from django.utils.encoding import smart_unicode 4 | from decimal import * 5 | 6 | class Team(models.Model): 7 | """ 8 | Represents a college with a basketball team. The NCAA id is the one used by the 9 | stats.ncaa.org site to denote a team. For example, Pittsburgh's id is 545. 10 | """ 11 | ncaa_id = models.IntegerField() 12 | name = models.CharField(max_length=125) 13 | slug = models.SlugField(max_length=125) 14 | 15 | def __unicode__(self): 16 | return smart_unicode(self.name) 17 | 18 | def save(self, *args, **kwargs): 19 | self.slug = slugify(self.name) 20 | super(Team, self).save(*args, **kwargs) 21 | 22 | 23 | class Season(models.Model): 24 | """ 25 | Represents a single basketball season which spans two years. The NCAA id is the one used by the 26 | stats.ncaa.org site to denote a season. For example, the 2010-11 season has an id of 10440. 27 | """ 28 | season = models.CharField(max_length=7) 29 | start_year = models.IntegerField() 30 | end_year = models.IntegerField() 31 | ncaa_id = models.IntegerField() 32 | 33 | def __unicode__(self): 34 | return smart_unicode(self.season) 35 | 36 | class TeamSeason(models.Model): 37 | """ 38 | Represents a team during a particular season, along with information about that team. Since 39 | a team can change divisions from one season to the next, the division information is here, not 40 | in Team. 
41 | """ 42 | team = models.ForeignKey(Team) 43 | season = models.ForeignKey(Season) 44 | division = models.IntegerField() 45 | wins = models.IntegerField(default=0) 46 | losses = models.IntegerField(default=0) 47 | minutes = models.IntegerField(default=0) 48 | field_goals_made = models.IntegerField(default=0) 49 | field_goals_attempted = models.IntegerField(default=0) 50 | field_goals_percentage = models.DecimalField(max_digits=5, decimal_places=2, default=Decimal('0.0')) 51 | three_point_fg_made = models.IntegerField(default=0) 52 | three_point_fg_attempted = models.IntegerField(default=0) 53 | three_point_fg_percentage = models.DecimalField(max_digits=5, decimal_places=2, default=Decimal('0.0')) 54 | free_throws_made = models.IntegerField(default=0) 55 | free_throws_attempted = models.IntegerField(default=0) 56 | free_throws_percentage = models.DecimalField(max_digits=5, decimal_places=2, default=Decimal('0.0')) 57 | points = models.IntegerField(default=0) 58 | scoring_average = models.DecimalField(max_digits=4, decimal_places=2, default=Decimal('0.0')) 59 | offensive_rebounds = models.IntegerField(default=0) 60 | defensive_rebounds = models.IntegerField(default=0) 61 | total_rebounds = models.IntegerField(default=0) 62 | rebounds_average = models.DecimalField(max_digits=4, decimal_places=2, default=Decimal('0.0')) 63 | assists = models.IntegerField(default=0) 64 | turnovers = models.IntegerField(default=0) 65 | steals = models.IntegerField(default=0) 66 | blocks = models.IntegerField(default=0) 67 | fouls = models.IntegerField(default=0) 68 | opp_field_goals_made = models.IntegerField(default=0) 69 | opp_field_goals_attempted = models.IntegerField(default=0) 70 | opp_field_goals_percentage = models.DecimalField(max_digits=5, decimal_places=2, default=Decimal('0.0')) 71 | opp_three_point_fg_made = models.IntegerField(default=0) 72 | opp_three_point_fg_attempted = models.IntegerField(default=0) 73 | opp_three_point_fg_percentage = 
models.DecimalField(max_digits=5, decimal_places=2, default=Decimal('0.0')) 74 | opp_free_throws_made = models.IntegerField(default=0) 75 | opp_free_throws_attempted = models.IntegerField(default=0) 76 | opp_free_throws_percentage = models.DecimalField(max_digits=5, decimal_places=2, default=Decimal('0.0')) 77 | opp_points = models.IntegerField(default=0) 78 | opp_scoring_average = models.DecimalField(max_digits=4, decimal_places=2, default=Decimal('0.0')) 79 | opp_offensive_rebounds = models.IntegerField(default=0) 80 | opp_defensive_rebounds = models.IntegerField(default=0) 81 | opp_total_rebounds = models.IntegerField(default=0) 82 | opp_rebounds_average = models.DecimalField(max_digits=4, decimal_places=2, default=Decimal('0.0')) 83 | opp_assists = models.IntegerField(default=0) 84 | opp_turnovers = models.IntegerField(default=0) 85 | opp_steals = models.IntegerField(default=0) 86 | opp_blocks = models.IntegerField(default=0) 87 | opp_fouls = models.IntegerField(default=0) 88 | 89 | def __unicode__(self): 90 | return smart_unicode('%s in %s') % (self.team, self.season) 91 | 92 | def ncaa_url(self): 93 | return "http://stats.ncaa.org/team/index/%s?org_id=%s" % (self.season.ncaa_id, self.team.ncaa_id) 94 | 95 | 96 | class Player(models.Model): 97 | """ 98 | Represents a college basketball player as identified by the NCAA. The ncaa_id is the unique one used by 99 | the stats.ncaa.org site. For example, Ashton Gibbs of Pittsburgh has an id of 904890.0, of which we only 100 | store the integer, since that's all that seems to matter. Not every player has an ID, however, including 101 | some transfers who are not eligible in a given year. 
102 | """ 103 | name = models.CharField(max_length=255) 104 | slug = models.SlugField(max_length=255) 105 | ncaa_id = models.IntegerField() 106 | 107 | def __unicode__(self): 108 | return smart_unicode(self.name) 109 | 110 | def save(self, *args, **kwargs): 111 | self.slug = slugify(self.name) 112 | super(Player, self).save(*args, **kwargs) 113 | 114 | 115 | class PlayerSeason(models.Model): 116 | """ 117 | Represents a college basketball player during a particular season. Since player information such as uniform 118 | number and year change from season to season, this information is retained here rather than in Player. The 119 | height is broken into two fields to enable comparisons. Not all players have positions, heights, years or 120 | jersey numbers that are integers. 121 | """ 122 | player = models.ForeignKey(Player) 123 | team_season = models.ForeignKey(TeamSeason) 124 | position = models.CharField(max_length=7) 125 | feet = models.IntegerField(null=True) 126 | inches = models.IntegerField(null=True) 127 | year = models.CharField(max_length=5, null=True) 128 | jersey = models.IntegerField(null=True) 129 | games_played = models.IntegerField(default=0) 130 | games_started = models.IntegerField(default=0) 131 | minutes = models.IntegerField(default=0) 132 | field_goals_made = models.IntegerField(default=0) 133 | field_goals_attempted = models.IntegerField(default=0) 134 | field_goals_percentage = models.DecimalField(max_digits=5, decimal_places=2, default=Decimal('0.0')) 135 | three_point_fg_made = models.IntegerField(default=0) 136 | three_point_fg_attempted = models.IntegerField(default=0) 137 | three_point_fg_percentage = models.DecimalField(max_digits=5, decimal_places=2, default=Decimal('0.0')) 138 | free_throws_made = models.IntegerField(default=0) 139 | free_throws_attempted = models.IntegerField(default=0) 140 | free_throws_percentage = models.DecimalField(max_digits=5, decimal_places=2, default=Decimal('0.0')) 141 | points = 
models.IntegerField(default=0) 142 | scoring_average = models.DecimalField(max_digits=4, decimal_places=2, default=Decimal('0.0')) 143 | offensive_rebounds = models.IntegerField(default=0) 144 | defensive_rebounds = models.IntegerField(default=0) 145 | total_rebounds = models.IntegerField(default=0) 146 | rebounds_average = models.DecimalField(max_digits=4, decimal_places=2, default=Decimal('0.0')) 147 | assists = models.IntegerField(default=0) 148 | turnovers = models.IntegerField(default=0) 149 | steals = models.IntegerField(default=0) 150 | blocks = models.IntegerField(default=0) 151 | fouls = models.IntegerField(default=0) 152 | double_doubles = models.IntegerField(default=0) 153 | triple_doubles = models.IntegerField(default=0) 154 | 155 | def __unicode__(self): 156 | return smart_unicode('%s, %s') % (self.player, self.team_season) 157 | 158 | def height(self): 159 | return smart_unicode('%s-%s') % (self.feet, self.inches) 160 | 161 | def ncaa_url(self): 162 | return "http://stats.ncaa.org/player?game_sport_year_ctl_id=%s&stats_player_seq=%s" % (self.team_season_id, self.player_id) 163 | 164 | class Game(models.Model): 165 | """ 166 | Represents a game between two teams. Has an NCAA-generated unique ID and can have multiple 167 | TeamGamePeriods. 
168 | """ 169 | ncaa_id = models.IntegerField() 170 | home_team = models.ForeignKey(TeamSeason, related_name="home_team") 171 | visiting_team = models.ForeignKey(TeamSeason, related_name="visiting_team") 172 | datetime = models.DateTimeField() 173 | location = models.CharField(max_length=255) 174 | attendance = models.IntegerField(null=True) 175 | officials = models.CharField(max_length=255) 176 | home_team_score = models.IntegerField() 177 | visiting_team_score = models.IntegerField() 178 | 179 | def __unicode__(self): 180 | return smart_unicode("%d") % self.ncaa_id 181 | 182 | def box_score_url(self): 183 | return "http://stats.ncaa.org/game/box_score/%s" % self.ncaa_id 184 | 185 | def period_stats_url(self): 186 | return "http://stats.ncaa.org/game/period_stats/%s" % self.ncaa_id 187 | 188 | def play_by_play_url(self): 189 | return "http://stats.ncaa.org/game/play_by_play/%s" % self.ncaa_id 190 | 191 | 192 | class TeamGamePeriod(models.Model): 193 | """ 194 | Represents a period within a game for a team - typically the first or second halves, 195 | or final, but games may have multiple overtime periods as well. Times of possession 196 | fields are in seconds, not minutes. 
197 | """ 198 | game = models.ForeignKey(Game) 199 | team = models.ForeignKey(TeamSeason) 200 | is_home_team = models.BooleanField() 201 | game_period = models.CharField(max_length=5) 202 | field_goals_made = models.IntegerField(default=0) 203 | field_goals_attempted = models.IntegerField(default=0) 204 | three_point_fg_made = models.IntegerField(default=0) 205 | three_point_fg_attempted = models.IntegerField(default=0) 206 | free_throws_made = models.IntegerField(default=0) 207 | free_throws_attempted = models.IntegerField(default=0) 208 | points = models.IntegerField(default=0) 209 | offensive_rebounds = models.IntegerField(default=0) 210 | defensive_rebounds = models.IntegerField(default=0) 211 | total_rebounds = models.IntegerField(default=0) 212 | assists = models.IntegerField(default=0) 213 | turnovers = models.IntegerField(default=0) 214 | steals = models.IntegerField(default=0) 215 | blocks = models.IntegerField(default=0) 216 | fouls = models.IntegerField(default=0) 217 | time_of_possession = models.IntegerField(default=0) 218 | scoring_time_of_possession = models.IntegerField(default=0) 219 | times_took_lead = models.IntegerField(default=0) 220 | largest_lead = models.IntegerField(default=0) 221 | time_of_largest_lead = models.IntegerField(default=0) 222 | bench_points = models.IntegerField(default=0) 223 | times_tied_score = models.IntegerField(default=0) 224 | second_chance_points = models.IntegerField(default=0) 225 | points_off_turnovers = models.IntegerField(default=0) 226 | fastbreak_points = models.IntegerField(default=0) 227 | points_in_paint = models.IntegerField(default=0) 228 | 229 | 230 | 231 | 232 | -------------------------------------------------------------------------------- /mbb/scrapers.py: -------------------------------------------------------------------------------- 1 | import datetime 2 | from dateutil.parser import * 3 | from utils import soupify 4 | from django.utils.safestring import SafeUnicode 5 | from mbb.models import Game, 
Season, Team, TeamSeason, Player, PlayerSeason 6 | 7 | def load_team_schedules(season_id): 8 | teams = Team.objects.all() 9 | for team in teams: 10 | schedule_parser(season_id, team.ncaa_id) 11 | 12 | def game_parser(game_id, season_id=2011): 13 | url = "http://stats.ncaa.org/game/box_score/%s" % game_id 14 | soup = soupify(url) 15 | season = Season.objects.get(end_year=season_id) 16 | visit_id, home_id = [int(x['href'].split('=')[1]) for x in soup.findAll('table')[0].findAll('a')] 17 | try: 18 | visit = TeamSeason.objects.select_related().get(team__ncaa_id=visit_id, season=season) 19 | except: 20 | v_team, created = Team.objects.get_or_create(ncaa_id=visit_id, name=soup.findAll('table')[0].findAll('a')[0].renderContents()) 21 | visit = TeamSeason.objects.create(team=v_team, season=season, division=0) 22 | home = TeamSeason.objects.select_related().get(team__ncaa_id=home_id, season=season) 23 | game_details = soup.findAll('table')[2] 24 | dt = parse(game_details.findAll('td')[1].contents[0]) 25 | loc = game_details.findAll('td')[3].contents[0] 26 | try: 27 | attend = int(game_details.findAll('td')[5].contents[0].replace(',','')) 28 | except: 29 | attend = None 30 | officials = soup.findAll('table')[3].findAll('td')[1].contents[0].strip() 31 | scores = soup.findAll('table')[0].findAll('td', attrs={'align':'right'}) 32 | visit_team_scores = [int(x.renderContents()) for x in scores[0:len(scores)/2]] 33 | home_team_scores = [int(x.renderContents()) for x in scores[len(scores)/2:len(scores)]] # second team listed is considered home team 34 | home_final = home_team_scores[(len(scores)/2)-1] 35 | visit_final = visit_team_scores[(len(scores)/2)-1] 36 | game, created = Game.objects.get_or_create(ncaa_id=game_id, home_team=home, visiting_team=visit, datetime=dt, location=SafeUnicode(loc), attendance=attend, officials=SafeUnicode(officials), home_team_score=home_final, visiting_team_score=visit_final) 37 | 38 | def team_parser(season_id=2011, division="1"): 39 | # defaults to 
division 1, but also supports division 3 40 | season = Season.objects.get(end_year=season_id) 41 | url = "http://stats.ncaa.org/team/inst_team_list/%s?division=%s" % (season.ncaa_id, division) 42 | soup = soupify(url) 43 | team_links = [x.find('a') for x in soup.findAll('td')] 44 | for team in team_links: 45 | ncaa_id = int(team["href"].split("=")[1]) 46 | name = SafeUnicode(team.contents[0]) 47 | t, created = Team.objects.get_or_create(ncaa_id = ncaa_id, name = name) 48 | team_season, created = TeamSeason.objects.get_or_create(team=t, season=season, division=int(division)) # store the requested division, not a hard-coded 1 49 | 50 | def schedule_parser(season_id, team_id): 51 | season = Season.objects.get(ncaa_id=season_id) 52 | url = "http://stats.ncaa.org/team/index/%s?org_id=%s" % (season_id, team_id) 53 | soup = soupify(url) 54 | game_ids = [] 55 | links = soup.findAll('table')[1].findAll(lambda tag: tag.name == 'a' and tag.findParent('td', attrs={'class':'smtext'})) 56 | for link in links: 57 | if not link.has_key('onclick'): 58 | game_ids.append(int(link["href"].split("?")[0].split("/")[3])) 59 | for game_id in game_ids: 60 | game_parser(game_id, season.end_year) # game_parser looks the season up by end_year 61 | 62 | 63 | def roster_parser(season_id, team_id, division=1): 64 | team_season = TeamSeason.objects.select_related().get(team__ncaa_id=team_id, season__end_year=season_id) 65 | url = "http://stats.ncaa.org/team/index/%s?org_id=%s" % (team_season.season.ncaa_id, team_id) 66 | soup = soupify(url) 67 | rows = soup.findAll('table')[2].findAll('tr') 68 | player_links = rows[2:len(rows)] 69 | for p in player_links: 70 | try: 71 | ncaa_id = int(float(p.findAll('td')[1].find('a')['href'].split('=', 2)[2])) 72 | name = extract_player_name(p.findAll('td')[1].find('a').contents[0].split(',')) 73 | except: 74 | ncaa_id = -1 75 | name = extract_player_name(p.findAll('td')[1].contents[0].split(',')) 76 | player, player_created = Player.objects.get_or_create(name=name, ncaa_id = ncaa_id) 77 | player_season, ps_created = PlayerSeason.objects.get_or_create(player=player,
team_season=team_season) 78 | if ps_created: 79 | try: 80 | player_season.jersey = int(p.findAll('td')[0].contents[0]) 81 | except: 82 | player_season.jersey = None 83 | try: 84 | player_season.position = SafeUnicode(p.findAll('td')[2].contents[0]) 85 | player_season.feet = int(p.findAll('td')[3].contents[0].split('-')[0]) 86 | player_season.inches = int(p.findAll('td')[3].contents[0].split('-')[1]) 87 | player_season.year = SafeUnicode(p.findAll('td')[4].contents[0]) 88 | except: 89 | pass 90 | player_season.save() 91 | 92 | def extract_player_name(name_text): 93 | try: 94 | last, first = [x.strip() for x in name_text] 95 | name = first + u' '+ last 96 | except ValueError: 97 | last, rest, first = [x.strip() for x in name_text] 98 | name = first + u' '+ last + u' '+ rest 99 | return name 100 | -------------------------------------------------------------------------------- /mbb/tests.py: -------------------------------------------------------------------------------- 1 | """ 2 | This file demonstrates writing tests using the unittest module. These will pass 3 | when you run "manage.py test". 4 | 5 | Replace this with more appropriate tests for your application. 6 | """ 7 | 8 | from django.test import TestCase 9 | 10 | 11 | class SimpleTest(TestCase): 12 | def test_basic_addition(self): 13 | """ 14 | Tests that 1 + 1 always equals 2. 15 | """ 16 | self.assertEqual(1 + 1, 2) 17 | -------------------------------------------------------------------------------- /mbb/views.py: -------------------------------------------------------------------------------- 1 | # Create your views here. 
2 | -------------------------------------------------------------------------------- /pip-selfcheck.json: -------------------------------------------------------------------------------- 1 | {"last_check":"2015-09-08T00:18:35Z","pypi_version":"7.1.2"} -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | BeautifulSoup==3.2.1 2 | Django==1.8.4 3 | psycopg2==2.6.1 4 | requests==2.7.0 5 | -------------------------------------------------------------------------------- /utils.py: -------------------------------------------------------------------------------- 1 | import requests 2 | from BeautifulSoup import BeautifulSoup 3 | from mbb.models import Season 4 | 5 | def soupify(url): 6 | """ 7 | Takes a url and returns parsed html via BeautifulSoup and requests. Used by the scrapers. 8 | """ 9 | r = requests.get(url) 10 | soup = BeautifulSoup(r.text) 11 | return soup 12 | 13 | 14 | def create_initial_seasons(): 15 | """ 16 | Creates Season records for 2010-11 and 2009-10 using their stats.ncaa.org ids. 17 | """ 18 | Season.objects.create(season='2010-11', start_year=2010, end_year=2011, ncaa_id=10440) 19 | Season.objects.create(season='2009-10', start_year=2009, end_year=2010, ncaa_id=10260) 20 | --------------------------------------------------------------------------------
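The scrapers take the parsed HTML that soupify returns and pull numeric ids out of link hrefs by splitting on "=", "?" and "/". Below is a rough sketch of the same idea using the standard library's URL parser, which does not depend on the position of a separator. It is written for Python 3, unlike the Python 2 code in this repository. The function names and the sample hrefs are assumptions made for illustration; the real hrefs on stats.ncaa.org may be shaped differently.

```python
from urllib.parse import urlsplit, parse_qs


def org_id_from_href(href):
    """Hypothetical helper: return the team id carried in the org_id
    query parameter of a link."""
    query = parse_qs(urlsplit(href).query)
    return int(query["org_id"][0])


def game_id_from_href(href):
    """Hypothetical helper: return the last path segment of a game link
    as an integer, ignoring any query string."""
    path = urlsplit(href).path
    return int(path.rstrip("/").split("/")[-1])


# Assumed href shapes, modeled on the splits done in mbb/scrapers.py.
print(org_id_from_href("/team/index/10440?org_id=545"))
print(game_id_from_href("/game/index/123456?org_id=545"))
```

Looking the parameter up by name means an extra query parameter or a reordered query string would not silently yield the wrong id, which is the main reason to prefer it over positional splitting.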