├── README.md
├── __init__.py
├── data_download.py
├── playerank
│   ├── __init__.py
│   ├── conf
│   │   └── features_weigths.json
│   ├── features
│   │   ├── __init__.py
│   │   ├── abstract.py
│   │   ├── centerOfPerformanceFeature.py
│   │   ├── goalScoredFeatures.py
│   │   ├── matchPlayedFeatures.py
│   │   ├── plainAggregation.py
│   │   ├── playerankFeatures.py
│   │   ├── qualityFeatures.py
│   │   ├── relativeAggregation.py
│   │   ├── requirements.txt
│   │   ├── roleFeatures.py
│   │   └── wyscoutEventsDefinition.py
│   ├── models
│   │   ├── Clusterer.py
│   │   ├── Rater.py
│   │   ├── Weighter.py
│   │   └── __init__.py
│   ├── setup.py
│   └── utils
│       ├── __init__.py
│       ├── compute_features_weight.py
│       ├── compute_features_weight.py~
│       ├── compute_playerank.py
│       ├── compute_playerank.py~
│       └── compute_roles.py
└── playerank_schema_tist.png
/README.md:
--------------------------------------------------------------------------------
1 | # PlayeRank
2 |
3 | PlayeRank is a data-driven framework that offers a principled, multi-dimensional, and role-aware evaluation of the performance of soccer players.
4 |
5 | PlayeRank is designed to work with [soccer-logs](https://www.nature.com/articles/s41597-019-0247-7), in which a match consists of a sequence of events encoded as tuples `id`, `type`, `position`, `timestamp`, where `id` is the identifier of the player that originated the event, `type` is the event type (e.g., pass, shot, goal, tackle), and `position` and `timestamp` are the spatio-temporal coordinates of the event on the soccer field. PlayeRank assumes that soccer-logs are stored in a database, which is updated with new events after each soccer match.
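
For reference, a single event record in these soccer-logs looks roughly as follows. The numeric identifiers are made up for illustration; the field names are the ones read by the extractors in `playerank/features`, and the event/sub-event/tag codes are those defined in `wyscoutEventsDefinition.py`:

```python
# an illustrative soccer-logs event: an accurate simple pass
event = {
    "matchId": 2500001,        # unique identifier of the match (made-up value)
    "matchPeriod": "1H",       # half of play
    "eventSec": 2.4,           # timestamp within the half, in seconds
    "teamId": 3161,            # team performing the event (made-up value)
    "playerId": 25413,         # player who originated the event (made-up value)
    "eventId": 8,              # event type: 8 = Pass
    "eventName": "Pass",
    "subEventId": 85,          # sub-event type: 85 = Simple pass
    "subEventName": "Simple pass",
    "tags": [{"id": 1801}],    # outcome tags: 1801 = accurate
    "positions": [{"x": 50, "y": 48}, {"x": 47, "y": 50}],  # start/end field coordinates
}
```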
6 |
7 | As described by the figure below, the PlayeRank framework consists of four main components:
8 |
9 | - soccer-logs database
10 | - rating module
11 | - learning module
12 | - ranking module
13 |
14 |
15 | 
16 | **Schema of the PlayeRank framework.** Starting from a database of soccer-logs **(a)**, the framework consists of three main phases. The learning phase **(c)** is an "offline" procedure: it must be executed at least once before the other phases, since it generates information used by them, but it can then be updated separately. The rating **(b)** and ranking **(d)** phases are online procedures, i.e., they are executed every time a new match becomes available in the database of soccer-logs.
17 |
18 | An exhaustive description of the PlayeRank framework is available in the paper below; if you use PlayeRank, please cite it:
19 |
20 | Pappalardo, Luca, Cintia, Paolo, Ferragina, Paolo, Massucco, Emanuele, Pedreschi, Dino & Giannotti, Fosca (2019) PlayeRank: Data-driven Performance Evaluation and Player Ranking in Soccer via a Machine Learning Approach. ACM Transactions on Intelligent Systems and Technology 10(5), DOI: https://doi.org/10.1145/3343172
21 |
22 | Bibtex:
23 | ```
24 | @article{10.1145/3343172,
25 | author = {Pappalardo, Luca and Cintia, Paolo and Ferragina, Paolo and Massucco, Emanuele and Pedreschi, Dino and Giannotti, Fosca},
26 | title = {PlayeRank: Data-Driven Performance Evaluation and Player Ranking in Soccer via a Machine Learning Approach},
27 | year = {2019},
28 | issue_date = {November 2019},
29 | publisher = {Association for Computing Machinery},
30 | address = {New York, NY, USA},
31 | volume = {10},
32 | number = {5},
33 | issn = {2157-6904},
34 | url = {https://doi.org/10.1145/3343172},
35 | doi = {10.1145/3343172},
36 | journal = {ACM Trans. Intell. Syst. Technol.},
37 | month = sep,
38 | articleno = {Article 59},
39 | numpages = {27},
40 | keywords = {data science, soccer analytics, clustering, searching, multi-dimensional analysis, football analytics, predictive modelling, ranking, big data, Sports analytics}
41 | }
42 | ```
43 |
44 | To build player rankings from soccer-logs data, the following steps are required (a minimal usage sketch follows the list):
45 |
46 | 1. compute feature weights (learning)
47 | 2. compute roles (learning)
48 | 3. compute performance scores (rating)
49 | 4. aggregate performance scores (ranking)
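
For illustration, the sketch below chains the modules in `playerank/features` to perform steps 3 and 4, assuming the open dataset has been downloaded with `data_download.py` into `data/` and that the package is importable (e.g., when run from the repository root). The weights are made-up values: in a real run they are produced by the learning phase (steps 1 and 2) and must use the same feature names generated by the extractors.

```python
import json
from playerank.features.qualityFeatures import qualityFeatures
from playerank.features.matchPlayedFeatures import matchPlayedFeatures
from playerank.features.playerankFeatures import playerankFeatures
from playerank.features.plainAggregation import plainAggregation

# per-player event counts and minutes played, match by match (rating input)
quality = qualityFeatures().createFeature("data/events/*.json", "data/players.json", entity='player')
played = matchPlayedFeatures().createFeature("data/matches/*.json", "data/players.json")

# made-up weights for illustration, written to disk because playerankFeatures
# reads a {feature_name: weight} json file
toy_weights = {"Pass-Simple pass-accurate": 0.02, "Pass-Simple pass-not accurate": -0.02,
               "Shot-Shot-accurate": 0.15, "Duel-Ground defending duel-accurate": 0.03}
with open("toy_weights.json", "w") as f:
    json.dump(toy_weights, f)

# step 3 (rating): weighted sum of features = one performance score per player and match
pr = playerankFeatures()
pr.set_features([quality])
scores = pr.createFeature("toy_weights.json")

# step 4 (ranking): merge scores and match metadata, then average per player
agg = plainAggregation()
agg.set_features([scores, played])
df = agg.aggregate(to_dataframe=True)
print(df.groupby('entity')['playerankScore'].mean().sort_values(ascending=False).head())
```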
50 |
51 | The code to reproduce the PlayeRank framework is available as a Google Colab notebook here:
52 | http://bit.ly/playerank_Tutorial
53 |
54 | It works on a public soccer-logs dataset, which can be used upon citation of the following paper:
55 |
56 | Pappalardo, L., Cintia, P., Rossi, A., Massucco, E., Ferragina, P., Pedreschi, D. & Giannotti, F. (2019) A public data set of spatio-temporal match events in soccer competitions. Scientific Data 6, 236, doi:10.1038/s41597-019-0247-7
57 |
58 | Bibtex:
59 | ```
60 | @article{10.1038/s41597-019-0247-7,
61 | Abstract = {Soccer analytics is attracting increasing interest in academia and industry, thanks to the availability of sensing technologies that provide high-fidelity data streams for every match. Unfortunately, these detailed data are owned by specialized companies and hence are rarely publicly available for scientific research. To fill this gap, this paper describes the largest open collection of soccer-logs ever released, containing all the spatio-temporal events (passes, shots, fouls, etc.) that occured during each match for an entire season of seven prominent soccer competitions. Each match event contains information about its position, time, outcome, player and characteristics. The nature of team sports like soccer, halfway between the abstraction of a game and the reality of complex social systems, combined with the unique size and composition of this dataset, provide an ideal ground for tackling a wide range of data science problems, including the measurement and evaluation of performance, both at individual and at collective level, and the determinants of success and failure.},
62 | Author = {Pappalardo, Luca and Cintia, Paolo and Rossi, Alessio and Massucco, Emanuele and Ferragina, Paolo and Pedreschi, Dino and Giannotti, Fosca},
63 | Da = {2019/10/28},
64 | Date-Added = {2019-12-29 16:44:01 +0000},
65 | Date-Modified = {2019-12-29 16:44:01 +0000},
66 | Doi = {10.1038/s41597-019-0247-7},
67 | Id = {Pappalardo2019},
68 | Isbn = {2052-4463},
69 | Journal = {Scientific Data},
70 | Number = {1},
71 | Pages = {236},
72 | Title = {A public data set of spatio-temporal match events in soccer competitions},
73 | Ty = {JOUR},
74 | Url = {https://doi.org/10.1038/s41597-019-0247-7},
75 | Volume = {6},
76 | Year = {2019},
77 | Bdsk-Url-1 = {https://doi.org/10.1038/s41597-019-0247-7},
78 | Bdsk-Url-2 = {http://dx.doi.org/10.1038/s41597-019-0247-7}}
79 |
80 | ```
81 |
82 |
83 |
--------------------------------------------------------------------------------
/__init__.py:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/data_download.py:
--------------------------------------------------------------------------------
1 | """
2 |
3 | Downloading script for soccer logs public open dataset:
4 | https://figshare.com/collections/Soccer_match_event_dataset/4415000/2
5 |
6 | Data description available here:
7 | https://www.nature.com/articles/s41597-019-0247-7
8 |
9 | Please cite the source as:
10 | Pappalardo, L. et al. (2019) A public data set of spatio-temporal match events in soccer competitions. Scientific Data 6, 236. doi:10.1038/s41597-019-0247-7
11 | """
12 |
13 | import requests, zipfile, json, io
14 |
15 |
16 | dataset_links = {
17 |
18 | 'matches' : 'https://ndownloader.figshare.com/files/14464622',
19 | 'events' : 'https://ndownloader.figshare.com/files/14464685',
20 | 'players' : 'https://ndownloader.figshare.com/files/15073721',
21 | 'teams': 'https://ndownloader.figshare.com/files/15073697',
22 | }
23 |
24 |
25 | r = requests.get(dataset_links['matches'], stream=True)
26 | z = zipfile.ZipFile(io.BytesIO(r.content))
27 | z.extractall("data/matches")
28 |
29 | r = requests.get(dataset_links['events'], stream=True)
30 | z = zipfile.ZipFile(io.BytesIO(r.content))
31 | z.extractall("data/events")
32 | #
33 | r = requests.get(dataset_links['teams'], stream=False)
34 | print (r.text, file=open('data/teams.json','w'))
35 |
36 |
37 | r = requests.get(dataset_links['players'], stream=False)
38 | print (r.text, file=open('data/players.json','w'))
39 |
--------------------------------------------------------------------------------
/playerank/__init__.py:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/playerank/conf/features_weigths.json:
--------------------------------------------------------------------------------
1 | [{"feature_name": "Duel-Air duel", "weight": -0.006539158031210258}, {"feature_name": "Duel-Ground attacking duel", "weight": -0.010449840258461175}, {"feature_name": "Duel-Ground defending duel", "weight": -0.011199884240897105}, {"feature_name": "Duel-Ground loose ball duel", "weight": -0.006563558749954084}, {"feature_name": "Foul-Foul", "weight": -0.0024418993059094306}, {"feature_name": "Foul-Hand foul", "weight": -4.691059387662471e-05}, {"feature_name": "Foul-Late card foul", "weight": -4.627009973076854e-06}, {"feature_name": "Free Kick-Corner", "weight": -0.000770251646940783}, {"feature_name": "Free Kick-Free Kick", "weight": -0.0016080494936435326}, {"feature_name": "Free Kick-Free kick cross", "weight": -0.00027956399045302253}, {"feature_name": "Free Kick-Free kick shot", "weight": -2.4486972707553348e-05}, {"feature_name": "Free Kick-Goal kick", "weight": -0.001562937886724554}, {"feature_name": "Free Kick-Penalty", "weight": -2.6608471224568587e-06}, {"feature_name": "Free Kick-Throw in", "weight": -0.00335808302308207}, {"feature_name": "Goalkeeper leaving line-Goalkeeper leaving line", "weight": -0.0001439884391271861}, {"feature_name": "Others on the ball-Acceleration", "weight": -0.0012001055794066166}, {"feature_name": "Others on the ball-Clearance", "weight": -0.0024385592601346476}, {"feature_name": "Others on the ball-Touch", "weight": -0.006009466163102354}, {"feature_name": "Pass-Cross", "weight": -0.0027830544469876215}, {"feature_name": "Pass-Hand pass", "weight": -0.00032481786643807177}, {"feature_name": "Pass-Head pass", "weight": -0.0042851410847089535}, {"feature_name": "Pass-High pass", "weight": -0.005054173369198981}, {"feature_name": "Pass-Launch", "weight": -0.0018547584405906774}, {"feature_name": "Pass-Simple pass", "weight": -0.04355855466251161}, {"feature_name": "Pass-Smart pass", "weight": -0.0007643856879811938}, {"feature_name": "Save attempt-Reflexes", "weight": -0.00032000870309680147}, {"feature_name": "Save attempt-Save attempt", "weight": -0.00030415717728838655}, {"feature_name": "Shot-Shot", "weight": -0.0017280657576526214}, {"feature_name": "entity", "weight": -0.03312147947261782}, {"feature_name": "goal-scored", "weight": -0.8512573718382006}]
--------------------------------------------------------------------------------
/playerank/features/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mesosbrodleto/playerank/79dd6464be98bbc35f48f99b4a3b626ea43e9a7e/playerank/features/__init__.py
--------------------------------------------------------------------------------
/playerank/features/abstract.py:
--------------------------------------------------------------------------------
1 | import abc
2 |
3 | class Feature(object):
4 | __metaclass__ = abc.ABCMeta
5 | """
6 |     class that wraps all the scripts/methods used to aggregate features from the database
7 | """
8 | @abc.abstractmethod
9 | def createFeature(self,collectionName,param):
10 | """
11 | Method to define how a feature/set of features is computed.
12 |         param contains optional parameters for querying the database (competition, subset of teams, etc.)
13 |         Best practice:
14 |         features should be stored in a collection of documents of the form:
15 |         {_id: {match: (numeric) unique identifier of the match,
16 |                name: (string) name of the feature,
17 |                entity: (string) name of the entity targeted by the aggregation; it could be teamId, playerId, teamId + role, or whatever is significant for the aggregation},
18 |          value: (numeric) the count for the feature}
19 |
20 |         returns the name of the collection where the features have been stored
21 | """
22 | return
23 |
24 | class Aggregation(object):
25 | __metaclass__ = abc.ABCMeta
26 |
27 | """
28 |     defines the methods to aggregate one or more collections of features for each match;
29 |     it has to provide the results as a dataframe.
30 |
31 |     e.g.
32 |     it is used to compute relative features for each match:
33 |     match -> team (or entity) -> featureTeam - featureOpponent
34 |
35 | """
36 |
37 |     @abc.abstractmethod
38 |     def get_features(self):
39 |
40 |         return 'Should never get here'
41 |     @abc.abstractmethod
42 |     def set_features(self, collection_list):
43 |         """
44 |         set the list of collections to use for relative feature computation
45 |         e.g.
46 |         we could have one collection of quality features, one of quantity features, one of goals scored, etc.
47 | """
48 | return
49 |
50 | @abc.abstractmethod
51 | def aggregate(self):
52 | """
53 |         merge the collections of features and aggregate by match and team, computing the relative value for each team
54 |         e.g.
55 |         match -> team (or entity) -> featureTeam - featureOpponent
56 |
57 | returns a dataframe
58 | """
59 | return
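
A toy illustration (not part of the library) of the `Feature` contract defined above: it counts shots per player and match from an in-memory list of already-parsed soccer-logs event dicts and returns documents in the suggested format.

```python
from collections import defaultdict

from playerank.features.abstract import Feature


class shotCountFeature(Feature):
    """Counts shot events per player and match (illustrative example)."""

    def createFeature(self, events, param=None):
        counts = defaultdict(int)
        for evt in events:
            if evt['eventName'] == 'Shot':
                counts[(evt['matchId'], evt['playerId'])] += 1
        # one document per (match, player), as suggested by the Feature docstring
        return [{'match': m, 'entity': p, 'feature': 'shots', 'value': v}
                for (m, p), v in counts.items()]
```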
--------------------------------------------------------------------------------
/playerank/features/centerOfPerformanceFeature.py:
--------------------------------------------------------------------------------
1 | from .abstract import Feature
2 | from .wyscoutEventsDefinition import *
3 | import json
4 | import glob
5 | import numpy as np
6 | from collections import defaultdict
7 |
8 | class centerOfPerformanceFeature(Feature):
9 |
10 |
11 | def createFeature(self, events_path, players_file, select = None):
12 |
13 | """
14 |         compute the center-of-performance features: the average position of a
15 |         player's events in a match, plus the number of events generated
16 |         parameters:
17 |         -events_path: glob path of the events files
18 |         -players_file: file path of the players data file
19 |         -select: function for filtering the events collection. Default: aggregate over all events
20 |
21 |         Note: goalkeepers are filtered out, and players with fewer than MIN_EVENTS
22 |         events in a match are skipped.
23 |
24 |         Output:
25 |         list of json documents (dictionaries) in the format:
26 |         {match: int, entity: int (playerId), feature: string, value: numeric}
27 | """
28 |
29 |
30 | events = []
31 | for file in glob.glob("%s"%events_path):
32 | data = json.load(open(file))
33 | if select:
34 | data = list(filter(select,data))
35 | events += data
36 | print ("[centerOfPerformanceFeature] added %s events from %s"%(len(data),file))
37 | events = filter(lambda x: x['playerId']!=0,events) #filtering out referee
38 | if select:
39 | events = filter(select,events)
40 | players = json.load(open(players_file))
41 |
42 | goalkeepers_ids = {player['wyId']:'GK' for player in players
43 | if player['role']['name']=='Goalkeeper'}
44 | events = filter(lambda x: x['playerId'] not in goalkeepers_ids,events )
45 | aggregated_features = defaultdict(lambda : defaultdict(lambda: defaultdict(int)))
46 |
47 | MIN_EVENTS = 10
48 | players_positions = defaultdict(lambda : defaultdict(list))
49 | for evt in events:
50 | if 'positions' in evt:
51 | player = evt['playerId']
52 | match = evt['matchId']
53 | position = (evt['positions'][0]['x'],evt['positions'][0]['y'])
54 | players_positions[match][player].append(position)
55 |
56 |
57 | #formatting features as json document
58 | results = []
59 | for match,players_pos in players_positions.items():
60 | for p in players_pos:
61 | positions = players_pos[p]
62 | x,y,count = np.mean([x[0] for x in positions]),np.mean([x[1] for x in positions]),len(positions)
63 | if count>MIN_EVENTS:
64 | documents = [
65 | {'feature':'avg_x','entity':p,'match':match,'value':int(x)},
66 | {'feature':'avg_y','entity':p,'match':match,'value':int(y)},
67 | {'feature':'n_events','entity':p,'match':match,'value':count},
68 |
69 | ]
70 | results+=documents
71 |
72 | return results
73 |
--------------------------------------------------------------------------------
/playerank/features/goalScoredFeatures.py:
--------------------------------------------------------------------------------
1 | from .abstract import Feature
2 | from .wyscoutEventsDefinition import *
3 | import json
4 | from collections import defaultdict
5 | import glob
6 |
7 |
8 | class goalScoredFeatures(Feature):
9 | """
10 | goals scored by each team in each match
11 | """
12 | def createFeature(self,matches_path,select = None):
13 | """
14 |         computes the goals scored by each team in each match
15 |         parameters:
16 |         -matches_path: glob path of the matches files
17 | -select: function for filtering matches collection. Default: aggregate over all matches
18 |
19 | Output:
20 | list of documents in the format: match: matchId, entity: team, feature: feature, value: value
21 | """
22 | matches =[]
23 | for file in glob.glob("%s"%matches_path):
24 | data = json.load(open(file))
25 | matches += data
26 | print ("[GoalScored features] added %s matches"%len(data))
27 | if select:
28 | matches = filter(select,matches)
29 | result =[]
30 |
31 | for match in matches:
32 | if 'teamsData' in match:
33 | for team in match['teamsData']:
34 | document = {}
35 | document['match'] = match['wyId']
36 | document['entity'] = team
37 | document['feature'] = 'goal-scored'
38 | document['value'] = match['teamsData'][team]['score']
39 | result.append(document)
40 |
41 |
42 | return result
43 |
--------------------------------------------------------------------------------
/playerank/features/matchPlayedFeatures.py:
--------------------------------------------------------------------------------
1 | from .abstract import Feature
2 | from .wyscoutEventsDefinition import *
3 | import json
4 | import glob
5 |
6 |
7 | class matchPlayedFeatures(Feature):
8 |
9 | def createFeature(self,matches_path,players_file,select = None):
10 | """
11 |         It computes, for each player and match, the total time played (in minutes),
12 |         the goals scored, the team and the match timestamp.
13 |
14 |
15 |         Input:
16 |         -matches_path: glob path of the json files containing the matches data
17 |         -players_file: file path of the players data file (used to filter out goalkeepers)
18 |         -select: function for filtering the matches collection. Default: aggregate over all matches
19 |
20 |         Output:
21 |
22 |         a collection of documents in the format:
23 |         {'match': matchId, 'entity': playerId,
24 |          'feature': 'minutesPlayed'|'team'|'goalScored'|'timestamp', 'value': value}
25 | """
26 | players = json.load(open(players_file))
27 | # filtering out all the events from goalkeepers
28 | goalkeepers_ids = [player['wyId'] for player in players
29 | if player['role']['name']=='Goalkeeper']
30 | matches= []
31 | for file in glob.glob("%s"%matches_path):
32 | matches += json.load(open(file))
33 | if select:
34 | matches = list(filter(select,matches))
35 |
36 | print ("[matchPlayedFeatures] processing %s matches"%len(matches))
37 | result = []
38 | for match in matches:
39 | matchId= match['wyId']
40 | duration = 90
41 | if match['duration'] != 'Regular':
42 | duration = 120
43 |
44 | timestamp = match['dateutc']
45 |
46 | for team in match['teamsData']:
47 | minutes_played = {}
48 | goals_scored = {}
49 | if match['teamsData'][team]['hasFormation']==1 and 'substitutions' in match['teamsData'][team]['formation']:
50 | for sub in match['teamsData'][team]['formation']['substitutions']:
51 | if type(sub) == dict:
52 | minute = sub['minute']
53 | minutes_played[sub['playerOut']] = minute
54 | minutes_played[sub['playerIn']] = duration - minute
55 | if match['teamsData'][team]['hasFormation']==1 and 'lineup' in match['teamsData'][team]['formation']:
56 | for player in match['teamsData'][team]['formation']['lineup']:
57 | goals_scored[player['playerId']] = player['goals']
58 | if player['playerId'] not in minutes_played:
59 | #player not substituted
60 | minutes_played[player['playerId']] = duration
61 | if match['teamsData'][team]['hasFormation']==1 and 'bench' in match['teamsData'][team]['formation']:
62 | for player in match['teamsData'][team]['formation']['bench']:
63 | goals_scored[player['playerId']] = player['goals']
64 | if player['playerId'] not in minutes_played:
65 | #player not substituted
66 | minutes_played[player['playerId']] = duration
67 | for player,min in minutes_played.items():
68 | if player not in goalkeepers_ids:
69 | document = {'match':matchId,'entity':player,'feature':'minutesPlayed',
70 | 'value': min}
71 | result.append (document)
72 |
73 | for player,gs in goals_scored.items():
74 | if player not in goalkeepers_ids:
75 | try:
76 | gs = int(gs)
77 | except:
78 | gs = 0
79 | document = {'match':matchId,'entity':player,'feature':'goalScored',
80 | 'value': gs}
81 | result.append (document)
82 | ## adding timestamp and team for each player
83 | document = {'match':matchId,'entity':player,'feature':'timestamp',
84 | 'value': timestamp}
85 |
86 | result.append (document)
87 | ## adding timestamp and team for each player
88 | document = {'match':matchId,'entity':player,'feature':'team',
89 | 'value': team}
90 | result.append (document)
91 | print ("[matchPlayedFeatures] matches features computed. %s features processed"%(len(result)))
92 | return result
93 |
--------------------------------------------------------------------------------
/playerank/features/plainAggregation.py:
--------------------------------------------------------------------------------
1 | from .abstract import Aggregation
2 | from .wyscoutEventsDefinition import *
3 | import json
4 | import pandas as pd
5 | from collections import defaultdict
6 |
7 | class plainAggregation(Aggregation):
8 | """
9 | merge features for each player and return a data frame
10 | match -> team (or entity) -> feature (playerank, timestamp, team, etc..)
11 |
12 | """
13 | def set_features(self,collection_list):
14 | self.collections=collection_list
15 |
16 | def get_features(self):
17 | return self.collections
18 | def set_aggregated_collection(self, collection):
19 | self.aggregated_collection = collection
20 | def get_aggregated_collection(self):
21 | return self.aggregated_collection
22 | def aggregate(self, to_dataframe = False):
23 |
24 |
25 |
26 | ###
27 | # prior to aggregation, we merge all the features collections
28 | featdata = []
29 | for collection in self.collections:
30 | featdata+=collection
31 | print ("[plainAggregation] added %s features"%len(collection))
32 | """
33 | single stage: transform aggregated feature per match into a collection of the form:
34 | match -> player -> {feature:value}
35 | """
36 | aggregated = defaultdict(lambda : defaultdict(dict))
37 |
38 |
39 |
40 | for document in featdata:
41 | match = document['match']
42 | entity = int(document['entity'])
43 | feature = document['feature']
44 | value = document['value']
45 | aggregated[match][entity].update({feature:value})
46 |
47 | result = []
48 |
49 | for match in aggregated:
50 | for entity in aggregated[match]:
51 |
52 | document = {'match':match,'entity':entity}
53 | document.update(aggregated[match][entity])
54 | result.append(document)
55 |
56 |
57 | print ("[plainAggregation] matches aggregated: %s"%len(result))
58 | if to_dataframe :
59 |
60 |
61 | df=pd.DataFrame(result).fillna(0)
62 | return df
63 | else:
64 | return result
65 |
--------------------------------------------------------------------------------
/playerank/features/playerankFeatures.py:
--------------------------------------------------------------------------------
1 | from .abstract import Feature
2 | from .wyscoutEventsDefinition import *
3 | from collections import defaultdict
4 | import json
5 |
6 |
7 | class playerankFeatures(Feature):
8 | """
9 | Given a method to aggregate features and the corresponding weight of each feature,
10 | it computes playerank for each player and match
11 |     input:
12 |     -- weights_file: json mapping of feature names to weights, computed within the learning phase of the playerank framework
13 |     -- the feature collections to be combined, set via set_features()
14 | output:
15 | -- a collection of json documents in the format:
16 |         {match: match_id, entity: player_id, feature: 'playerankScore',
17 |          value: playerank score (float)}
18 | """
19 | def set_features(self,collection_list):
20 | self.collections=collection_list
21 |
22 | def get_features(self):
23 | return self.collections
24 | def createFeature(self,weights_file):
25 |
26 | weights=json.load(open(weights_file))
27 | playerank_scores = defaultdict(lambda: defaultdict(float))
28 | for feature_list in self.get_features():
29 | for f in feature_list:
30 |             if f['feature'] in weights: #check that the feature has not been filtered out
31 | playerank_scores[f['match']][f['entity']]+=f['value']*weights[f['feature']]
32 |
33 | result = []
34 | for match in playerank_scores:
35 | for player in playerank_scores[match]:
36 | document = {
37 | 'match': match,
38 | 'entity': player,
39 | 'feature' : 'playerankScore',
40 | 'value' : float(playerank_scores[match][player])
41 | }
42 | result.append(document)
43 | print ("[playerankFeatures] playerank scores computed. %s performance processed"%len(result))
44 | return result
45 |
--------------------------------------------------------------------------------
/playerank/features/qualityFeatures.py:
--------------------------------------------------------------------------------
1 | from .abstract import Feature
2 | from .wyscoutEventsDefinition import *
3 | import json
4 | from collections import defaultdict
5 | import glob
6 |
7 |
8 | class qualityFeatures(Feature):
9 | """
10 | Quality features are the count of events with outcomes.
11 | E.g.
12 | - number of accurate passes
13 | - number of wrong passes
14 | ...
15 | """
16 | def createFeature(self,events_path,players_file,entity = 'team',select = None):
17 | """
18 | compute qualityFeatures
19 | parameters:
20 | -events_path: file path of events file
21 | -select: function for filtering events collection. Default: aggregate over all events
22 | -entity: it could either 'team' or 'player'. It selects the aggregation for qualityFeatures among teams or players qualityfeatures
23 |
24 | Output:
25 | list of dictionaries in the format: matchId -> entity -> feature -> value
26 | """
27 | event2subevent2outcome={
28 | 1:{10: [1801, 1802],
29 | 11: [1801, 1802],
30 | 12: [1801, 1802],
31 | 13: [1801, 1802]},
32 | 2: [1702, 1703, 1701], #fouls aggregated into macroevent
33 |
34 | 3 :{30: [1801, 1802],
35 | 31: [1801, 1802],
36 | 32: [1801, 1802],
37 | 33: [1801, 1802],
38 | 34: [1801, 1802],
39 | 35: [1802],
40 | 36: [1801, 1802]},
41 | 4: {40: [1801, 1802]},
42 | 6: {60: []},
43 | 7: {70: [1801, 1802,101],
44 | 71: [1801, 1802,101],
45 | 72: [1401, 1302, 201, 1901, 1301, 2001, 301]},
46 | 8: {80: [1801, 1802,302,301],
47 | 81: [1801, 1802,302,301],
48 | 82: [1801, 1802,302,301],
49 | 83: [1801, 1802,302,301],
50 | 84: [1801, 1802,302,301],
51 | 85: [1801, 1802,302,301],
52 | 86: [1801, 1802,302,301]},
53 | #90: [1801, 1802],
54 | #91: [1801, 1802],
55 | 10: {100: [1801, 1802]}}
56 |
57 | aggregated_features = defaultdict(lambda : defaultdict(lambda: defaultdict(int)))
58 |
59 | players = json.load(open(players_file))
60 | # filtering out all the events from goalkeepers
61 | goalkeepers_ids = [player['wyId'] for player in players
62 | if player['role']['name']=='Goalkeeper']
63 |
64 | events = []
65 | for file in glob.glob("%s"%events_path):
66 | data = json.load(open(file))
67 | if select:
68 | data = list(filter(select,data))
69 |             events += list(filter(lambda x: x['matchPeriod'] in ['1H','2H'] and x['playerId'] not in goalkeepers_ids,data)) #keeping regular-time events only and excluding goalkeepers
70 | print ("[qualityFeatures] added %s events from %s"%(len(data), file))
71 |
72 |
73 | for evt in events:
74 | if evt['eventId'] in event2subevent2outcome:
75 | ent = evt['teamId'] #default
76 | if entity == 'player':
77 | ent = evt['playerId']
78 |
79 | evtName =evt['eventName']
80 |
81 | if type(event2subevent2outcome[evt['eventId']]) == dict:
82 | #hierarchy as event->subevent->tags
83 | if evt['subEventId'] not in event2subevent2outcome[evt['eventId']]:
84 | #malformed events
85 | continue #skip to next event
86 | tags = [x for x in evt['tags'] if x['id'] in event2subevent2outcome[evt['eventId']][evt['subEventId']]]
87 |
88 | evtName+="-%s"%evt['subEventName']
89 | else:
90 | #hierarchy as event->tags
91 | tags = [x for x in evt['tags'] if x['id'] in event2subevent2outcome[evt['eventId']]]
92 |
93 | if len(tags)>0:
94 | for tag in tags:
95 | aggregated_features[evt['matchId']][ent]["%s-%s"%(evtName,tag2name[tag['id']])]+=1
96 |
97 | else:
98 | aggregated_features[evt['matchId']][ent]["%s"%(evtName)]+=1
99 | result =[]
100 | for match in aggregated_features:
101 | for entity in aggregated_features[match]:
102 | for feature in aggregated_features[match][entity]:
103 | document = {}
104 | document['match'] = match
105 | document['entity'] = entity
106 | document['feature'] = feature
107 | document['value'] = aggregated_features[match][entity][feature]
108 | result.append(document)
109 |
110 | return result
111 |
--------------------------------------------------------------------------------
/playerank/features/relativeAggregation.py:
--------------------------------------------------------------------------------
1 | from .abstract import Aggregation
2 | from .wyscoutEventsDefinition import *
3 | import json
4 | import pandas as pd
5 | from collections import defaultdict
6 |
7 | class relativeAggregation(Aggregation):
8 | """
9 | compute relative feature for each match
10 | match -> team (or entity) -> featureTeam - featureOpponents
11 | """
12 | def set_features(self,collection_list):
13 | self.collections=collection_list
14 |
15 | def get_features(self):
16 | return self.collections
17 | def aggregate(self,to_dataframe = False):
18 | """
19 |         compute the relative aggregation: given a set of features, it computes the A-B
20 |         value of each feature for each team in each match.
21 |         Ex:
22 |         passes for team A in match 111 : 500
23 |         passes for team B in match 111 : 300
24 |         leads to the output:
25 |         {'passes': 200}
26 |
27 |         this method is used in the feature-weight estimation (learning) phase of the playerank framework.
28 | param
29 |
30 | - to_dataframe : return a dataframe instead of a list of documents
31 |
32 | """
33 |
34 | featdata = []
35 | for collection in self.collections:
36 | featdata+=collection
37 | print ("[relativeAggregation] added %s features"%len(collection))
38 |         #selecting teamA and teamB as teams[0] and teams[1]
39 | aggregated = defaultdict(lambda : defaultdict(lambda: defaultdict(int)))
40 |         #format of aggregation: match,team,feature,valueTeam-valueOpponent
41 | result = []
42 | for document in featdata:
43 | match = document['match']
44 | entity = int(document['entity'])
45 | feature = document['feature']
46 | value = document['value']
47 | aggregated[match][entity][feature] = int(value)
48 |
49 |
50 | for match in aggregated:
51 | for entity in aggregated[match]:
52 | for feature in aggregated[match][entity]:
53 | opponents = [x for x in aggregated[match] if x!=entity][0]
54 |
55 | result_doc = {}
56 | result_doc['match'] = match
57 | result_doc['entity'] = entity
58 | result_doc['name'] = feature
59 | value = aggregated[match][entity][feature]
60 | if feature in aggregated[match][opponents]:
61 | result_doc['value'] = value - aggregated[match][opponents][feature]
62 | else:
63 | result_doc['value'] = value
64 | result.append(result_doc)
65 |
66 | if to_dataframe :
67 |
68 | featlist = defaultdict(dict)
69 | for data in result:
70 |
71 | featlist["%s-%s"%(data['match'],data['entity'])].update({data['name']:data['value']})
72 | print ("[relativeAggregation] matches aggregated: %s"%len(featlist.keys()))
73 |
74 | df=pd.DataFrame(list(featlist.values())).fillna(0)
75 |
76 | return df
77 | else:
78 | return result
79 |
--------------------------------------------------------------------------------
/playerank/features/requirements.txt:
--------------------------------------------------------------------------------
1 | pandas==0.23.4
2 | numpy==1.11.0
3 |
--------------------------------------------------------------------------------
/playerank/features/roleFeatures.py:
--------------------------------------------------------------------------------
1 | from .abstract import Feature
2 | from .wyscoutEventsDefinition import *
3 | import json
4 | from collections import defaultdict
5 |
6 | class roleFeatures(Feature):
7 |
8 | def set_features(self,collection_list):
9 | self.collections=collection_list
10 |
11 | def get_features(self):
12 | return self.collections
13 | def createFeature(self,matrix_role_file):
14 | """
15 | Given the matrix for roles, it computes, for each player and match,
16 | the role of a player
17 |
18 | A role matrix is a data structure where, given x and y (between 0 and 100),
19 |         it contains the corresponding role for a player whose center of performance is (x, y).
20 |         The role matrix is computed within the learning phase of the playerank framework.
21 |
22 |         Input:
23 |         matrix_role_file: file path of a dictionary in the format x -> y -> role
24 |         the feature collections (set via set_features) describe each player's
25 |         average position in each match
26 |
27 | """
28 |
29 | role_matrix = json.load(open(matrix_role_file,"r"))
30 | roles = defaultdict(lambda: defaultdict(dict))
31 | for feature_list in self.get_features():
32 | for f in feature_list:
33 | roles[f['match']][f['entity']].update({f['feature']: f['value']})
34 | ## for each match and player we have
35 | ## avg_x,avg_y,n_events
36 | results = []
37 | for match in roles:
38 | for player in roles[match]:
39 | match_data = roles[match][player]
40 | role_label = role_matrix[str(match_data['avg_x'])][str(match_data['avg_y'])]
41 |                 #note: string conversion because JSON object keys are always loaded as strings
42 | document = {'match':match, 'entity':player, 'feature':'roleCluster','value':role_label}
43 | results.append(document)
44 | return results
45 |
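
An illustrative chaining of `centerOfPerformanceFeature` into `roleFeatures` (assuming the dataset is in `data/`). The role matrix is normally a learning-phase artefact; here a toy matrix with three made-up role labels is built so the snippet is self-contained.

```python
import json

from playerank.features.centerOfPerformanceFeature import centerOfPerformanceFeature
from playerank.features.roleFeatures import roleFeatures

# toy role matrix covering every (x, y) cell of the field with made-up labels
toy_matrix = {str(x): {str(y): ("defender" if x < 34 else "midfielder" if x < 67 else "forward")
                       for y in range(101)} for x in range(101)}
with open("toy_role_matrix.json", "w") as f:
    json.dump(toy_matrix, f)

# centers of performance (avg_x, avg_y, n_events) per player and match
centers = centerOfPerformanceFeature().createFeature("data/events/*.json", "data/players.json")

rf = roleFeatures()
rf.set_features([centers])
roles = rf.createFeature("toy_role_matrix.json")
print(roles[:3])  # [{'match': ..., 'entity': ..., 'feature': 'roleCluster', 'value': ...}, ...]
```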
--------------------------------------------------------------------------------
/playerank/features/wyscoutEventsDefinition.py:
--------------------------------------------------------------------------------
1 | ## events name definition for wyscout data
2 |
3 |
4 | ## CONSTANTS
5 | VICTORY = 1
6 | DEFEAT = 2
7 | DRAW = 0
8 | DRAW_DEFEAT = 0
9 | VICTORY_DRAW = 1
10 |
11 | ## CONSTANTS IDENTIFYING MACRO EVENTS
12 | DUEL = 1
13 | FOUL = 2
14 | FREE_KICK = 3
15 | GOALKEEPER_LEAVING_LINE = 4
16 | INTERRUPTION = 5
17 | OFFSIDE = 6
18 | OTHERS = 7
19 | PASS = 8
20 | SAVE = 9
21 | SHOT = 10
22 |
23 | macroevent2name = {DUEL: 'duel', FOUL: 'foul', FREE_KICK: 'free kick',
24 | GOALKEEPER_LEAVING_LINE: 'goalkeeper leaving line',
25 | INTERRUPTION: 'interruption',
26 | OFFSIDE: 'offside', OTHERS: 'others on the ball',
27 | PASS: 'pass', SAVE: 'save attempt', SHOT: 'shot'}
28 |
29 | macroevent2positions_index = {DUEL: 0, FOUL: 0, FREE_KICK: 0,
30 | GOALKEEPER_LEAVING_LINE: 0, INTERRUPTION: 0,
31 | OFFSIDE: 0, OTHERS: 0, PASS: 0, SAVE: 1, SHOT: 0
32 | }
33 |
34 | ## CONSTANTS IDENTIFYING SUB EVENTS
35 | # duels
36 | AIR_DUEL = 10
37 | GROUND_ATTACKING_DUEL = 11
38 | GROUND_DEFENDING_DUEL = 12
39 | GROUND_LOOSE_BALL_DUEL = 13
40 |
41 | # fouls
42 | NORMAL_FOUL = 20
43 | HAND_FOUL = 21
44 | LATE_CARD_FOUL = 22
45 | OUT_OF_GAME_FOUL = 23
46 | PROTEST_FOUL = 24
47 | SIMULATION_FOUL = 25
48 | TIME_LOST_FOUL = 26
49 | VIOLENT_FOUL = 27
50 |
51 | #free kicks
52 | CORNER_FREE_KICK = 30
53 | NORMAL_FREE_KICK = 31
54 | CROSS_FREE_KICK = 32
55 | SHOT_FREE_KICK = 33
56 | GOAL_FREE_KICK = 34
57 | PENALTY_FREE_KICK = 35
58 | THROW_IN_FREE_KICK = 36
59 | # goalkeeping leaving line
60 | GOALKEEPING_LEAVING_LINE = 40
61 | # interruption
62 | BALL_OUT_INTERRUPTION = 50
63 | WHISTLE_INTERRUPTION = 51
64 | # offside
65 | NORMAL_OFFSIDE=60
66 | # others on the ball
67 | ACCELERATION_OTHERS = 70
68 | CLEARANCE_OTHERS = 71
69 | TOUCH_OTHERS = 72
70 | # pass
71 | CROSS_PASS = 80
72 | HAND_PASS = 81
73 | HEAD_PASS = 82
74 | HIGH_PASS = 83
75 | LAUNCH_PASS = 84
76 | SIMPLE_PASS = 85
77 | SMART_PASS = 86
78 | # save
79 | REFLEXES_SAVE = 90
80 | NORMAL_SAVE = 91
81 | # shot
82 | NORMAL_SHOT = 100
83 |
84 | macroevent2subevents = {DUEL: [AIR_DUEL, GROUND_ATTACKING_DUEL, GROUND_DEFENDING_DUEL, GROUND_LOOSE_BALL_DUEL],
85 | FOUL: [NORMAL_FOUL, HAND_FOUL, LATE_CARD_FOUL, OUT_OF_GAME_FOUL, PROTEST_FOUL, SIMULATION_FOUL,
86 | TIME_LOST_FOUL, VIOLENT_FOUL],
87 | FREE_KICK: [CORNER_FREE_KICK, NORMAL_FREE_KICK, CROSS_FREE_KICK, SHOT_FREE_KICK, GOAL_FREE_KICK,
88 | PENALTY_FREE_KICK, THROW_IN_FREE_KICK],
89 | INTERRUPTION: [BALL_OUT_INTERRUPTION, WHISTLE_INTERRUPTION],
90 | OFFSIDE: [NORMAL_OFFSIDE],
91 | OTHERS: [ACCELERATION_OTHERS, CLEARANCE_OTHERS, TOUCH_OTHERS],
92 | PASS: [CROSS_PASS, HAND_PASS, HEAD_PASS, HIGH_PASS, LAUNCH_PASS, SIMPLE_PASS, SMART_PASS],
93 | SAVE: [REFLEXES_SAVE, NORMAL_SAVE],
94 | SHOT: [NORMAL_SHOT]
95 | }
96 |
97 | subevents2name = {
98 |
99 | AIR_DUEL:'air duel', GROUND_ATTACKING_DUEL:'ground attacking duel',
100 | GROUND_DEFENDING_DUEL:'ground defending duel', GROUND_LOOSE_BALL_DUEL:'ground loose ball duel',
101 |
102 |     NORMAL_FOUL:'normal foul', HAND_FOUL:'hand foul', LATE_CARD_FOUL:'late card foul',
103 |     OUT_OF_GAME_FOUL:'out of game foul', PROTEST_FOUL:'protest foul', SIMULATION_FOUL:'simulation foul',
104 |     TIME_LOST_FOUL:'time lost foul', VIOLENT_FOUL:'violent foul',
106 |
107 | CORNER_FREE_KICK:'corner free kick', NORMAL_FREE_KICK:'normal free kick', CROSS_FREE_KICK:'cross free kick',
108 | SHOT_FREE_KICK:'shot free kick', GOAL_FREE_KICK:'goal free kick', PENALTY_FREE_KICK:'penalty free kick',
109 | THROW_IN_FREE_KICK:'throw in free kick',
110 |
111 | BALL_OUT_INTERRUPTION:'ball out interruption', WHISTLE_INTERRUPTION:'whistle interruption',
112 |
113 |     ACCELERATION_OTHERS:'acceleration', CLEARANCE_OTHERS:'clearance', TOUCH_OTHERS:'touch',
114 |
115 | CROSS_PASS:'cross pass', HAND_PASS:'hand pass', HEAD_PASS:'head pass', HIGH_PASS:'high pass',
116 | LAUNCH_PASS:'launch pass', SIMPLE_PASS:'simple pass', SMART_PASS:'smart pass',
117 |
118 | REFLEXES_SAVE:'reflexes save', NORMAL_SAVE:'normal save',
119 |
120 | NORMAL_SHOT: 'shot',
121 |
122 | NORMAL_OFFSIDE: 'offside',
123 |
124 | GOALKEEPING_LEAVING_LINE: 'goalkeeping leaving line'
125 | }
126 |
127 |
128 | ## CONSTANTS FOR TAGS
129 | GOAL_TAG = 101
130 | OWN_GOAL_TAG = 102
131 | ASSIST_TAG = 301
132 | KEY_PASS_TAG = 302
133 | COUNTER_ATTACK_TAG = 1901
134 | LEFT_FOOT_TAG = 401
135 | RIGHT_FOOT_TAG = 402
136 | HEAD_BODY_TAG = 403
137 | DIRECT_TAG = 1101
138 | INDIRECT_TAG = 1102
139 | DANGEROUS_BALL_LOST_TAG = 2001
140 | BLOCKED_TAG = 2101
141 | HIGH_TAG = 801
142 | LOW_TAG = 802
143 | INTERCEPTION_TAG = 1401
144 | CLEARANCE_TAG = 1501
145 | OPPORTUNITY_TAG = 201
146 | FEINT_TAG = 1301
147 | MISSED_BALL_TAG = 1302
148 | FREE_SPACE_RIGHT_TAG = 501
149 | FREE_SPACE_LEFT_TAG = 502
150 | TAKE_ON_LEFT_TAG = 503
151 | TAKE_ON_RIGHT_TAG = 504
152 | SLIDING_TACKLE_TAG = 1601
153 | ANTICIPATED_TAG = 601
154 | ANTICIPATION_TAG = 602
155 | RED_CARD_TAG = 1701
156 | YELLOW_CARD_TAG = 1702
157 | SECOND_YELLOW_CARD_TAG = 1703
158 | THROUGH_TAG = 901
159 | FAIRPLAY_TAG = 1001
160 | LOST_TAG = 701
161 | NEUTRAL_TAG = 702
162 | WON_TAG = 703
163 | ACCURATE_TAG = 1801
164 | NOT_ACCURATE_TAG = 1802
165 | NO_TAG = -1
166 | tag2name = {
167 | GOAL_TAG: 'GOAL', OWN_GOAL_TAG: 'OWN GOAL', ASSIST_TAG: 'assist', KEY_PASS_TAG:'key pass',
168 | COUNTER_ATTACK_TAG:'counter attack', LEFT_FOOT_TAG:'left foot', RIGHT_FOOT_TAG:'right foot',
169 | HEAD_BODY_TAG:'head_body', DIRECT_TAG:'direct', INDIRECT_TAG:'indirect',
170 | DANGEROUS_BALL_LOST_TAG:'dangerous ball lost', BLOCKED_TAG:'blocked', HIGH_TAG:'high',
171 | LOW_TAG:'low', INTERCEPTION_TAG: 'interception', CLEARANCE_TAG:'clearance', OPPORTUNITY_TAG:'opportunity',
172 | FEINT_TAG:'feint', MISSED_BALL_TAG:'missed ball', FREE_SPACE_RIGHT_TAG:'free space right',
173 | FREE_SPACE_LEFT_TAG:'free space left', TAKE_ON_LEFT_TAG:'takeon left', TAKE_ON_RIGHT_TAG:'takeon right',
174 | SLIDING_TACKLE_TAG:'sliding tackle', ANTICIPATED_TAG:'anticipated', ANTICIPATION_TAG:'anticipation',
175 | RED_CARD_TAG:'red card', YELLOW_CARD_TAG: 'yellow card', SECOND_YELLOW_CARD_TAG: 'second yellow card',
176 | THROUGH_TAG:'through', FAIRPLAY_TAG:'fairplay', LOST_TAG:'lost', NEUTRAL_TAG:'neutral',
177 | WON_TAG:'won', ACCURATE_TAG:'accurate', NOT_ACCURATE_TAG:'not accurate', NO_TAG: 'no tag'
178 | }
179 |
180 |
181 | subevent2tags = {
182 |
183 | ## OFFSIDE
184 | NORMAL_OFFSIDE: [],
185 |
186 | ## LEAVING LINE
187 | GOALKEEPING_LEAVING_LINE: [],
188 |
189 | ## SHOT
190 | NORMAL_SHOT:
191 | [LEFT_FOOT_TAG, OPPORTUNITY_TAG, NOT_ACCURATE_TAG, ACCURATE_TAG, HEAD_BODY_TAG, RIGHT_FOOT_TAG, BLOCKED_TAG, GOAL_TAG, INTERCEPTION_TAG, COUNTER_ATTACK_TAG, ASSIST_TAG],
192 |
193 | ##### DUELS ########
194 |
195 | GROUND_ATTACKING_DUEL:
196 | [LOST_TAG, NOT_ACCURATE_TAG, WON_TAG, ACCURATE_TAG, TAKE_ON_RIGHT_TAG, ANTICIPATION_TAG, FREE_SPACE_LEFT_TAG, TAKE_ON_LEFT_TAG, NEUTRAL_TAG, FREE_SPACE_RIGHT_TAG, DANGEROUS_BALL_LOST_TAG, INTERCEPTION_TAG, COUNTER_ATTACK_TAG, SLIDING_TACKLE_TAG, OPPORTUNITY_TAG],
197 |
198 | AIR_DUEL:
199 | [LOST_TAG, NOT_ACCURATE_TAG, WON_TAG, ACCURATE_TAG, NEUTRAL_TAG, COUNTER_ATTACK_TAG, KEY_PASS_TAG, ASSIST_TAG],
200 |
201 | GROUND_LOOSE_BALL_DUEL:
202 | [LOST_TAG, NOT_ACCURATE_TAG, WON_TAG, ACCURATE_TAG, NEUTRAL_TAG, SLIDING_TACKLE_TAG, COUNTER_ATTACK_TAG, INTERCEPTION_TAG, DANGEROUS_BALL_LOST_TAG],
203 |
204 | GROUND_DEFENDING_DUEL: [SLIDING_TACKLE_TAG, WON_TAG, ACCURATE_TAG, LOST_TAG, NOT_ACCURATE_TAG, TAKE_ON_LEFT_TAG, ANTICIPATED_TAG, FREE_SPACE_RIGHT_TAG, TAKE_ON_RIGHT_TAG, NEUTRAL_TAG, FREE_SPACE_LEFT_TAG, COUNTER_ATTACK_TAG],
205 |
206 | ######### FREE KICKS ###########
207 |
208 | SHOT_FREE_KICK:
209 | [RIGHT_FOOT_TAG, DIRECT_TAG, OPPORTUNITY_TAG, ACCURATE_TAG, BLOCKED_TAG, NOT_ACCURATE_TAG, LEFT_FOOT_TAG, INDIRECT_TAG, GOAL_TAG],
210 |
211 | CROSS_FREE_KICK:
212 | [HIGH_TAG, NOT_ACCURATE_TAG, ASSIST_TAG, ACCURATE_TAG, KEY_PASS_TAG],
213 |
214 | NORMAL_FREE_KICK:
215 | [ACCURATE_TAG, NOT_ACCURATE_TAG, KEY_PASS_TAG],
216 |
217 | CORNER_FREE_KICK:
218 | [HIGH_TAG, NOT_ACCURATE_TAG, ACCURATE_TAG, KEY_PASS_TAG, OPPORTUNITY_TAG, ASSIST_TAG],
219 |
220 | THROW_IN_FREE_KICK:
221 | [ACCURATE_TAG, NOT_ACCURATE_TAG, FAIRPLAY_TAG],
222 |
223 | PENALTY_FREE_KICK:
224 | [GOAL_TAG, RIGHT_FOOT_TAG, ACCURATE_TAG, LEFT_FOOT_TAG, NOT_ACCURATE_TAG],
225 |
226 | GOAL_FREE_KICK: [NO_TAG],
227 |
228 |
229 | #### FOULS ####
230 |
231 | PROTEST_FOUL:
232 | [YELLOW_CARD_TAG, RED_CARD_TAG, SECOND_YELLOW_CARD_TAG],
233 |
234 | SIMULATION_FOUL:
235 | [YELLOW_CARD_TAG, SECOND_YELLOW_CARD_TAG, RED_CARD_TAG],
236 |
237 | TIME_LOST_FOUL:
238 | [YELLOW_CARD_TAG, SECOND_YELLOW_CARD_TAG, RED_CARD_TAG],
239 |
240 | VIOLENT_FOUL:
241 | [YELLOW_CARD_TAG, SECOND_YELLOW_CARD_TAG, RED_CARD_TAG],
242 |
243 | NORMAL_FOUL:
244 | [YELLOW_CARD_TAG, SECOND_YELLOW_CARD_TAG, RED_CARD_TAG],
245 |
246 | HAND_FOUL:
247 | [YELLOW_CARD_TAG, SECOND_YELLOW_CARD_TAG, RED_CARD_TAG],
248 |
249 | LATE_CARD_FOUL:
250 | [YELLOW_CARD_TAG, SECOND_YELLOW_CARD_TAG, RED_CARD_TAG],
251 |
252 | OUT_OF_GAME_FOUL:
253 | [YELLOW_CARD_TAG, SECOND_YELLOW_CARD_TAG, RED_CARD_TAG],
254 |
255 | #WHISTLE_INTERRUPTION: [],
256 |
257 | #BALL_OUT_INTERRUPTION: [],
258 |
259 | ### TOUCHS ####
260 |
261 | TOUCH_OTHERS:
262 | [INTERCEPTION_TAG, MISSED_BALL_TAG, OPPORTUNITY_TAG, COUNTER_ATTACK_TAG, FEINT_TAG, DANGEROUS_BALL_LOST_TAG, ASSIST_TAG, OWN_GOAL_TAG],
263 |
264 | CLEARANCE_OTHERS:
265 | [INTERCEPTION_TAG, NOT_ACCURATE_TAG, ACCURATE_TAG, COUNTER_ATTACK_TAG, FAIRPLAY_TAG, MISSED_BALL_TAG, OWN_GOAL_TAG],
266 |
267 | ACCELERATION_OTHERS:
268 | [NOT_ACCURATE_TAG, ACCURATE_TAG, COUNTER_ATTACK_TAG, INTERCEPTION_TAG],
269 |
270 | ##### SAVES ####
271 |
272 | NORMAL_SAVE:
273 | [ACCURATE_TAG, GOAL_TAG, NOT_ACCURATE_TAG, COUNTER_ATTACK_TAG],
274 |
275 | REFLEXES_SAVE:
276 | [ACCURATE_TAG, GOAL_TAG, NOT_ACCURATE_TAG, COUNTER_ATTACK_TAG],
277 |
278 |
279 | #### PASSES #####################
280 |
281 | HEAD_PASS:
282 | [INTERCEPTION_TAG, ACCURATE_TAG, NOT_ACCURATE_TAG, ASSIST_TAG, COUNTER_ATTACK_TAG, KEY_PASS_TAG, DANGEROUS_BALL_LOST_TAG],
283 |
284 | HIGH_PASS:
285 | [NOT_ACCURATE_TAG, ACCURATE_TAG, KEY_PASS_TAG, THROUGH_TAG, COUNTER_ATTACK_TAG, INTERCEPTION_TAG, ASSIST_TAG],
286 |
287 | CROSS_PASS:
288 | [LEFT_FOOT_TAG, HIGH_TAG, NOT_ACCURATE_TAG, RIGHT_FOOT_TAG, ACCURATE_TAG, KEY_PASS_TAG, ASSIST_TAG, BLOCKED_TAG, COUNTER_ATTACK_TAG, INTERCEPTION_TAG],
289 |
290 | HAND_PASS:
291 | [ACCURATE_TAG, NOT_ACCURATE_TAG, INTERCEPTION_TAG, COUNTER_ATTACK_TAG, FAIRPLAY_TAG],
292 |
293 | SMART_PASS:
294 | [NOT_ACCURATE_TAG, ACCURATE_TAG, THROUGH_TAG, KEY_PASS_TAG, ASSIST_TAG, COUNTER_ATTACK_TAG, INTERCEPTION_TAG],
295 |
296 | LAUNCH_PASS:
297 | [ACCURATE_TAG, NOT_ACCURATE_TAG, INTERCEPTION_TAG, FAIRPLAY_TAG, DANGEROUS_BALL_LOST_TAG],
298 |
299 | SIMPLE_PASS: [ACCURATE_TAG, NOT_ACCURATE_TAG, INTERCEPTION_TAG, COUNTER_ATTACK_TAG, KEY_PASS_TAG, FAIRPLAY_TAG, DANGEROUS_BALL_LOST_TAG, ASSIST_TAG, OWN_GOAL_TAG]
300 |
301 | }
302 |
303 | #subevent2outcome={PROTEST_FOUL:
304 | # subevent2tags[PROTEST_FOUL]+ [NO_TAG],
305 |
306 | #SIMULATION_FOUL:
307 | # subevent2tags[SIMULATION_FOUL]+[ NO_TAG],
308 |
309 | #TIME_LOST_FOUL:
310 | # subevent2tags[TIME_LOST_FOUL]+[NO_TAG],
311 |
312 | #VIOLENT_FOUL:
313 | # subevent2tags[VIOLENT_FOUL]+[NO_TAG],
314 |
315 | #NORMAL_FOUL:
316 | # subevent2tags[NORMAL_FOUL]+[NO_TAG],
317 |
318 | #HAND_FOUL:
319 | # subevent2tags[HAND_FOUL]+[NO_TAG ],
320 |
321 | #LATE_CARD_FOUL:
322 | # subevent2tags[LATE_CARD_FOUL]+[ NO_TAG],
323 |
324 | #OUT_OF_GAME_FOUL:
325 | # subevent2tags[OUT_OF_GAME_FOUL]+[NO_TAG],
326 | # 'default':[ACCURATE_TAG, NOT_ACCURATE_TAG],
327 | # TOUCH_OTHERS:subevent2tags[TOUCH_OTHERS],
328 | #}
329 |
330 |
331 |
332 |
333 |
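
A quick illustration of how the lookup tables above decode an event's numeric codes (constants taken from this module):

```python
from playerank.features.wyscoutEventsDefinition import (PASS, SIMPLE_PASS, ACCURATE_TAG,
                                                        macroevent2name, subevents2name, tag2name)

print(macroevent2name[PASS])        # 'pass'
print(subevents2name[SIMPLE_PASS])  # 'simple pass'
print(tag2name[ACCURATE_TAG])       # 'accurate'
```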
--------------------------------------------------------------------------------
/playerank/models/Clusterer.py:
--------------------------------------------------------------------------------
1 | #!/usr/local/bin/python
2 | from collections import defaultdict, OrderedDict, Counter
3 | import numpy as np
4 | from scipy import optimize
5 | from scipy.stats import gaussian_kde
6 | #from utils import *
7 | from sklearn.base import BaseEstimator
8 | from sklearn.svm import LinearSVC
9 | from sklearn.model_selection import cross_val_score
10 | from sklearn.dummy import DummyClassifier
11 | from sklearn.feature_selection import VarianceThreshold
12 | from sklearn.model_selection import GridSearchCV, StratifiedKFold
13 | from sklearn.feature_selection import RFECV
14 | from scipy.spatial.distance import euclidean
15 | from sklearn.preprocessing import StandardScaler, LabelEncoder
16 | from sklearn.metrics import silhouette_score, silhouette_samples
17 | from sklearn.cluster import KMeans, MiniBatchKMeans
18 | from sklearn.base import BaseEstimator, ClusterMixin
19 | from joblib import Parallel, delayed
20 |
21 | from sklearn.metrics.pairwise import pairwise_distances
22 | from itertools import combinations
23 | from sklearn.utils import check_random_state
24 | from sklearn.preprocessing import MinMaxScaler
25 | import json
26 |
27 | def scalable_silhouette_score(X, labels, metric='euclidean', sample_size=None,
28 | random_state=None, n_jobs=1, **kwds):
29 | """
30 | Compute the mean Silhouette Coefficient of all samples.
31 |     The Silhouette Coefficient is computed using the mean intra-cluster distance (a)
32 |     and the mean nearest-cluster distance (b) for each sample.
33 |
34 |     The Silhouette Coefficient for a sample is $(b - a) / max(a, b)$.
35 |     To clarify, b is the mean distance between a sample and the nearest cluster
36 |     that the sample is not a part of.
37 |
38 |     This function returns the mean Silhouette Coefficient over all samples.
39 | To obtain the values for each sample, it uses silhouette_samples.
40 |
41 | The best value is 1 and the worst value is -1. Values near 0 indicate
42 | overlapping clusters. Negative values generally indicate that a sample has
43 | been assigned to the wrong cluster, as a different cluster is more similar.
44 |
45 | Parameters
46 | ----------
47 | X : array [n_samples_a, n_features]
48 | the Feature array.
49 |
50 | labels : array, shape = [n_samples]
51 | label values for each sample
52 |
53 | metric : string, or callable
54 | The metric to use when calculating distance between instances in a
55 | feature array. If metric is a string, it must be one of the options
56 | allowed by metrics.pairwise.pairwise_distances. If X is the distance
57 | array itself, use "precomputed" as the metric.
58 |
59 | sample_size : int or None
60 | The size of the sample to use when computing the Silhouette
61 | Coefficient. If sample_size is None, no sampling is used.
62 |
63 | random_state : integer or numpy.RandomState, optional
64 | The generator used to initialize the centers. If an integer is
65 | given, it fixes the seed. Defaults to the global numpy random
66 | number generator.
67 |
68 | **kwds : optional keyword parameters
69 | Any further parameters are passed directly to the distance function.
70 | If using a scipy.spatial.distance metric, the parameters are still
71 | metric dependent. See the scipy docs for usage examples.
72 |
73 | Returns
74 | -------
75 | silhouette : float
76 | the Mean Silhouette Coefficient for all samples.
77 |
78 | References
79 | ----------
80 | Peter J. Rousseeuw (1987). "Silhouettes: a Graphical Aid to the
81 | Interpretation and Validation of Cluster Analysis". Computational
82 | and Applied Mathematics 20: 53-65. doi:10.1016/0377-0427(87)90125-7.
83 | http://en.wikipedia.org/wiki/Silhouette_(clustering)
84 | """
85 | if sample_size is not None:
86 | random_state = check_random_state(random_state)
87 | indices = random_state.permutation(X.shape[0])[:sample_size]
88 | if metric == "precomputed":
89 | raise ValueError('Distance matrix cannot be precomputed')
90 | else:
91 | X, labels = X[indices], labels[indices]
92 |
93 | return np.mean(scalable_silhouette_samples(
94 | X, labels, metric=metric, n_jobs=n_jobs, **kwds))
95 |
96 |
97 | def scalable_silhouette_samples(X, labels, metric='euclidean', n_jobs=1, **kwds):
98 | """
99 |     Compute the Silhouette Coefficient for each sample. The Silhouette Coefficient
100 | is a measure of how well samples are clustered with samples that are similar to themselves.
101 | Clustering models with a high Silhouette Coefficient are said to be dense,
102 | where samples in the same cluster are similar to each other, and well separated,
103 | where samples in different clusters are not very similar to each other.
104 |
105 | The Silhouette Coefficient is calculated using the mean intra-cluster
106 | distance (a) and the mean nearest-cluster distance (b) for each sample.
107 |
108 | The Silhouette Coefficient for a sample is $(b - a) / max(a, b)$.
109 |     This function returns the Silhouette Coefficient for each sample.
110 | The best value is 1 and the worst value is -1. Values near 0 indicate
111 | overlapping clusters.
112 |
113 | Parameters
114 | ----------
115 | X : array [n_samples_a, n_features]
116 | Feature array.
117 |
118 | labels : array, shape = [n_samples]
119 | label values for each sample
120 |
121 | metric : string, or callable
122 | The metric to use when calculating distance between instances in a
123 | feature array. If metric is a string, it must be one of the options
124 | allowed by metrics.pairwise.pairwise_distances. If X is the distance
125 | array itself, use "precomputed" as the metric.
126 |
127 | **kwds : optional keyword parameters
128 | Any further parameters are passed directly to the distance function.
129 | If using a scipy.spatial.distance metric, the parameters are still
130 | metric dependent. See the scipy docs for usage examples.
131 |
132 | Returns
133 | -------
134 | silhouette : array, shape = [n_samples]
135 | Silhouette Coefficient for each samples.
136 |
137 | References
138 | ----------
139 | Peter J. Rousseeuw (1987). "Silhouettes: a Graphical Aid to the
140 | Interpretation and Validation of Cluster Analysis". Computational
141 | and Applied Mathematics 20: 53-65. doi:10.1016/0377-0427(87)90125-7.
142 | http://en.wikipedia.org/wiki/Silhouette_(clustering)
143 | """
144 | A = _intra_cluster_distances_block(X, labels, metric, n_jobs=n_jobs,
145 | **kwds)
146 | B = _nearest_cluster_distance_block(X, labels, metric, n_jobs=n_jobs,
147 | **kwds)
148 | sil_samples = (B - A) / np.maximum(A, B)
149 | # nan values are for clusters of size 1, and should be 0
150 | return np.nan_to_num(sil_samples)
151 |
152 | def _intra_cluster_distances_block(X, labels, metric, n_jobs=1, **kwds):
153 | """
154 | Calculate the mean intra-cluster distance for sample i.
155 |
156 | Parameters
157 | ----------
158 | X : array [n_samples_a, n_features]
159 | Feature array.
160 |
161 | labels : array, shape = [n_samples]
162 | label values for each sample
163 |
164 | metric : string, or callable
165 | The metric to use when calculating distance between instances in a
166 | feature array. If metric is a string, it must be one of the options
167 | allowed by metrics.pairwise.pairwise_distances. If X is the distance
168 | array itself, use "precomputed" as the metric.
169 |
170 | **kwds : optional keyword parameters
171 | Any further parameters are passed directly to the distance function.
172 | If using a scipy.spatial.distance metric, the parameters are still
173 | metric dependent. See the scipy docs for usage examples.
174 |
175 | Returns
176 | -------
177 | a : array [n_samples_a]
178 | Mean intra-cluster distance
179 | """
180 | intra_dist = np.zeros(labels.size, dtype=float)
181 | values = Parallel(n_jobs=n_jobs)(
182 | delayed(_intra_cluster_distances_block_)
183 | (X[np.where(labels == label)[0]], metric, **kwds)
184 | for label in np.unique(labels))
185 | for label, values_ in zip(np.unique(labels), values):
186 | intra_dist[np.where(labels == label)[0]] = values_
187 | return intra_dist
188 |
189 |
190 | def _nearest_cluster_distance_block(X, labels, metric, n_jobs=1, **kwds):
191 | """Calculate the mean nearest-cluster distance for sample i.
192 |
193 | Parameters
194 | ----------
195 | X : array [n_samples_a, n_features]
196 | Feature array.
197 |
198 | labels : array, shape = [n_samples]
199 | label values for each sample
200 |
201 | metric : string, or callable
202 | The metric to use when calculating distance between instances in a
203 | feature array. If metric is a string, it must be one of the options
204 | allowed by metrics.pairwise.pairwise_distances. If X is the distance
205 | array itself, use "precomputed" as the metric.
206 |
207 | **kwds : optional keyword parameters
208 | Any further parameters are passed directly to the distance function.
209 | If using a scipy.spatial.distance metric, the parameters are still
210 | metric dependent. See the scipy docs for usage examples.
211 |
212 | X : array [n_samples_a, n_features]
213 | Feature array.
214 |
215 | Returns
216 | -------
217 | b : float
218 | Mean nearest-cluster distance for sample i
219 | """
220 | inter_dist = np.empty(labels.size, dtype=float)
221 | inter_dist.fill(np.inf)
222 | # Compute cluster distance between pairs of clusters
223 | unique_labels = np.unique(labels)
224 |
225 | values = Parallel(n_jobs=n_jobs)(
226 | delayed(_nearest_cluster_distance_block_)(
227 | X[np.where(labels == label_a)[0]],
228 | X[np.where(labels == label_b)[0]],
229 | metric, **kwds)
230 | for label_a, label_b in combinations(unique_labels, 2))
231 |
232 | for (label_a, label_b), (values_a, values_b) in \
233 | zip(combinations(unique_labels, 2), values):
234 |
235 | indices_a = np.where(labels == label_a)[0]
236 | inter_dist[indices_a] = np.minimum(values_a, inter_dist[indices_a])
237 | del indices_a
238 | indices_b = np.where(labels == label_b)[0]
239 | inter_dist[indices_b] = np.minimum(values_b, inter_dist[indices_b])
240 | del indices_b
241 | return inter_dist
242 |
243 | def _intra_cluster_distances_block_(subX, metric, **kwds):
244 | distances = pairwise_distances(subX, metric=metric, **kwds)
245 | return distances.sum(axis=1) / (distances.shape[0] - 1)
246 |
247 | def _nearest_cluster_distance_block_(subX_a, subX_b, metric, **kwds):
248 | dist = pairwise_distances(subX_a, subX_b, metric=metric, **kwds)
249 | dist_a = dist.mean(axis=1)
250 | dist_b = dist.mean(axis=0)
251 | return dist_a, dist_b
252 |
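
A small self-contained check (illustrative only) of the silhouette helpers above on toy data:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

from playerank.models.Clusterer import scalable_silhouette_score

# 200 random 2-d points clustered into 3 groups
X = np.random.RandomState(0).rand(200, 2)
labels = MiniBatchKMeans(n_clusters=3, n_init=1, random_state=0).fit(X).labels_
print(scalable_silhouette_score(X, labels, n_jobs=1))
```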
253 | class Clusterer(BaseEstimator, ClusterMixin):
254 | """Performance clustering
255 |
256 | Parameters
257 | ----------
258 | k_range: tuple (pair)
259 | the minimum and the maximum $k$ to try when choosing the best value of $k$
260 | (the one having the best silhouette score)
261 |
262 | border_threshold: float
263 | the threshold to use for selecting the borderline.
264 | It indicates the max silhouette for a borderline point.
265 |
266 | verbose: boolean
267 | verbosity mode.
268 | default: False
269 |
270 | random_state : int
271 | RandomState instance or None, optional, default: None
272 | If int, random_state is the seed used by the random number generator;
273 | If RandomState instance, random_state is the random number generator;
274 | If None, the random number generator is the RandomState instance used
275 | by `np.random`.
276 |
277 | sample_size : int
278 | the number of samples (rows) to use when computing the silhouette score (the silhouette_score function is computationally expensive and raises a MemoryError when the number of samples is too high)
279 | default: None
280 |
281 | max_rows : int
282 | the maximum number of samples (rows) to consider for the clustering task (the silhouette_samples function is computationally expensive and raises a MemoryError when the input matrix has too many rows)
283 | default: 40000
284 |
285 |
286 | Attributes
287 | ----------
288 | cluster_centers_ : array, [n_clusters, n_features]
289 | Coordinates of cluster centers
290 | n_clusters_: int
291 | number of clusters found by the algorithm
292 | labels_ : array or list of lists
293 | Labels of each point (a list of labels per point when kind='multi')
294 | k_range: tuple
295 | minimum and maximum number of clusters to try
296 | verbose: boolean
297 | whether or not to show details of the execution
298 | random_state: int
299 | RandomState instance or None, optional, default: None
300 | If int, random_state is the seed used by the random number generator;
301 | If RandomState instance, random_state is the random number generator;
302 | If None, the random number generator is the RandomState instance used
303 | by 'np.random'.
304 | sample_size: int or None
305 | kmeans_: scikit-learn MiniBatchKMeans object fitted with the best k
306 | """
307 |
308 | def __init__(self, k_range=(2, 15), border_threshold=0.2, verbose=False, random_state=42,
309 | sample_size=None):
310 | self.k_range = k_range
311 | self.border_threshold = border_threshold
312 | self.verbose = verbose
313 | self.sample_size = sample_size
314 | # initialize attributes
315 | self.labels_ = []
316 | self.random_state = random_state
317 |
318 | def _find_clusters(self, X, make_plot=True):
319 | if self.verbose:
320 | print ('FITTING kmeans...\n')
321 | print ('n_clust\t|silhouette')
322 | print ('---------------------')
323 |
324 | self.k2silhouettes_ = {}
325 | kmin, kmax = self.k_range
326 | range_n_clusters = range(kmin, kmax + 1)
327 | best_k, best_silhouette = kmin, -1.0  # silhouette lies in [-1, 1]
328 | for k in range_n_clusters:
329 |
330 | # computation
331 | kmeans = MiniBatchKMeans(n_clusters=k, init='k-means++', max_iter=1000, n_init=1,
332 | random_state =self.random_state)
333 | kmeans.fit(X)
334 | cluster_labels = kmeans.labels_
335 |
336 | silhouette = scalable_silhouette_score(X, cluster_labels,
337 | sample_size=self.sample_size,
338 | random_state=self.random_state)
339 | if self.verbose:
340 | print ('%s\t|%s' % (k, round(silhouette, 4)))
341 |
342 | if silhouette >= best_silhouette:
343 | best_silhouette = silhouette
344 | best_k = k
345 | #best_silhouette_samples = ss
346 |
347 | self.k2silhouettes_[k] = silhouette
348 |
349 | kmeans = MiniBatchKMeans(n_clusters=best_k, init='k-means++', max_iter=10000, n_init=1,
350 | random_state=self.random_state)
351 | kmeans.fit(X)
352 | self.kmeans_ = kmeans
353 | self.n_clusters_ = best_k
354 | self.cluster_centers_ = kmeans.cluster_centers_
355 | self.labels_ = kmeans.labels_
356 | if self.verbose:
357 | print ('Best: n_clust=%s (silhouette=%s)\n' % (best_k, round(best_silhouette, 4)))
358 |
359 | def _cluster_borderline(self, X):
360 | """
361 | Assign clusters to borderline points, according to the borderline_threshold
362 | specified in the constructor
363 | """
364 | if self.verbose:
365 | print ('FINDING hybrid centers of performance...\n')
366 |
367 | self.labels_ = [[] for i in range(len(X))]
368 |
369 | ss = scalable_silhouette_samples(X, self.kmeans_.labels_)
370 | for i, (row, silhouette, cluster_label) in enumerate(zip(X, ss, self.kmeans_.labels_)):
371 | if silhouette >= self.border_threshold:
372 | self.labels_[i].append(cluster_label)
373 | else:
374 | intra_silhouette = euclidean(row, self.kmeans_.cluster_centers_[cluster_label])
375 | for label in set(self.kmeans_.labels_):
376 | inter_silhouette = euclidean(row, self.kmeans_.cluster_centers_[label])
377 | silhouette = (inter_silhouette - intra_silhouette) / max(inter_silhouette, intra_silhouette)
378 | if silhouette <= self.border_threshold:
379 | self.labels_[i].append(label)
380 |
381 | return ss
382 |
383 | def _generate_matrix(self, ss, kind = 'multi'):
384 | """
385 | Precompute the cluster label(s) of every integer (x, y) position on a 0-100 grid, so that predict can work as a simple lookup
386 | """
387 | matrix = {}
388 | X = []
389 |
390 | for i in range(0, 101):
391 | for j in range(0, 101):
392 | X.append([i, j])
393 | if kind == 'multi':
394 | multi_labels = self._predict_with_silhouette(X, ss)
395 | for row, labels in zip(X, multi_labels):
396 | matrix[tuple(row)] = labels
397 | else:
398 | for row, labels in zip(X, self.kmeans_.predict(X)):
399 | matrix[tuple(row)] = labels
400 | self._matrix = matrix
401 |
402 | def get_clusters_matrix(self, kind = 'single'):
403 | roles_matrix = {}
404 | m= self._matrix.items()
405 | # if kind != 'single':
406 | # m= self._matrix.items()
407 | #
408 | # else:
409 | # m = self._matrix_single.items()
410 |
411 | for k,v in m:
412 | x,y = int(k[0]),int(k[1])
413 | if k[0] not in roles_matrix:
414 | roles_matrix[x] = {}
415 | roles_matrix[x][y] = "-".join(map(str,v)) if kind !='single' else int(v) #casting with python int, otherwise it's not json serializable
416 | return roles_matrix
417 |
418 | def fit(self, player_ids, match_ids, dataframe, y=None, kind='single', filename='clusters'):
419 | """
420 | Compute performance clustering.
421 |
422 | Parameters
423 | ----------
424 | player_ids, match_ids : array-like, shape=(n_samples,)
425 | identifiers of the player and of the match of each performance (currently unused by fit)
426 | dataframe : pandas DataFrame, shape=(n_samples, n_features)
427 | training instances to cluster (e.g., the avg_x and avg_y centers of performance)
428 | kind: str
429 | 'single': each performance is assigned to a single cluster
430 | 'multi': borderline performances may be assigned to multiple clusters
431 | y: ignored
432 | """
433 | self.kind_ = kind
434 | X = dataframe.values
435 |
436 | self._find_clusters(X) # find the clusters with kmeans
437 | if kind != 'single':
438 |
439 |
440 | silhouette_scores = self._cluster_borderline(X) # assign multiclusters to borderline performances
441 | self._generate_matrix(silhouette_scores) # generate the matrix for optimizing the predict function
442 | else:
443 | self._generate_matrix(None, kind = 'single') #no silhouette scores if kind single
444 | if self.verbose:
445 | print ("DONE.")
446 |
447 |
448 |
449 |
450 | return self
451 |
452 | def _predict_with_silhouette(self, X, ss):
453 | cluster_labels, threshold = self.kmeans_.predict(X), self.border_threshold
454 | multicluster_labels = [[] for _ in cluster_labels]
455 | if len(set(cluster_labels)) == 1:
456 | return [[cluster_label] for cluster_label in cluster_labels]
457 | for i, (row, silhouette, cluster_label) in enumerate(zip(X, ss, cluster_labels)):
458 | if silhouette >= threshold:
459 | multicluster_labels[i].append(cluster_label)
460 | else:
461 | intra_silhouette = euclidean(row, self.cluster_centers_[cluster_label])
462 | for label in set(cluster_labels):
463 | inter_silhouette = euclidean(row, self.cluster_centers_[label])
464 | silhouette = (inter_silhouette - intra_silhouette) / max(inter_silhouette, intra_silhouette)
465 | if silhouette <= threshold:
466 | multicluster_labels[i].append(label)
467 |
468 | return np.array(multicluster_labels)
469 |
470 | def predict(self, X, y=None):
471 | """
472 | Predict the closest cluster each sample in X belongs to.
473 | In the vector quantization literature, `cluster_centers_` is called
474 | the code book and each value returned by `predict` is the index of
475 | the closest code in the code book.
476 |
477 | Parameters
478 | ----------
479 | X : {array-like, sparse matrix}, shape = [n_samples, n_features]
480 | New data to predict.
481 |
482 | Returns
483 | -------
484 | multi_labels : array, shape [n_samples,]
485 | Index of the cluster each sample belongs to (a list of indices per sample when kind='multi').
486 | """
487 | if self.kind_ == 'single':
488 | return self.kmeans_.predict(X)
489 | else:
490 | multi_labels = []
491 | for row in X:
492 | x, y = tuple(row)
493 | labels = self._matrix[(int(x), int(y))]
494 | multi_labels.append(labels)
495 | return multi_labels
496 |
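Below, a minimal usage sketch for the Clusterer (illustrative only; it assumes the package is installed and uses synthetic average-position data on the 0-100 grid assumed by `_generate_matrix`). The script playerank/utils/compute_roles.py, further down, shows the real pipeline with `kind='multi'`:

```python
import numpy as np
import pandas as pd
from playerank.models import Clusterer

# synthetic centres of performance, one (avg_x, avg_y) pair per player-match,
# expressed on the 0-100 pitch grid used by _generate_matrix
rng = np.random.RandomState(42)
df = pd.DataFrame({'avg_x': rng.uniform(0, 100, 500),
                   'avg_y': rng.uniform(0, 100, 500)})
player_ids = np.arange(500)            # placeholder identifiers (not used by fit)
match_ids = np.zeros(500, dtype=int)   # placeholder identifiers (not used by fit)

clusterer = Clusterer.Clusterer(verbose=True, k_range=(2, 5))
clusterer.fit(player_ids, match_ids, df[['avg_x', 'avg_y']], kind='single')

print(clusterer.n_clusters_)                   # best k according to the silhouette
role_matrix = clusterer.get_clusters_matrix()  # {x: {y: cluster label}} on the 0-100 grid
print(role_matrix[50][50])
```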
--------------------------------------------------------------------------------
/playerank/models/Rater.py:
--------------------------------------------------------------------------------
1 | #!/usr/local/bin/python
2 | from collections import defaultdict, OrderedDict, Counter
3 | import numpy as np
4 |
5 | from sklearn.preprocessing import MinMaxScaler
6 |
7 | class Rater():
8 | """Performance rating
9 |
10 | Parameters
11 | ----------
12 | alpha_goal: float
13 | importance of goals scored in the evaluation of a performance, in the range [0, 1]
14 | default=0.0
15 |
16 | Attributes
17 | ----------
18 | ratings_: numpy array
19 | the ratings of the performances
20 | """
21 | def __init__(self, alpha_goal=0.0):
22 | self.alpha_goal = alpha_goal
23 | self.ratings_ = []
24 |
25 | def get_rating(self, weighted_sum, goals):
26 | return weighted_sum * (1 - self.alpha_goal) + self.alpha_goal * goals
27 |
28 | def predict(self, dataframe, goal_feature, score_feature, filename='ratings'):
29 | """
30 | Compute the rating of each performance in X
31 |
32 | Parameters
33 | ----------
34 | dataframe: pandas DataFrame of PlayeRank scores
35 | goal_feature: str, name of the column containing the goals scored
36 | score_feature: str, name of the column containing the PlayeRank score
37 |
38 |
39 | Returns
40 | -------
41 | ratings_: numpy array
42 | """
43 | feature_names = dataframe.columns
44 | X = dataframe.values
45 |
46 | # look up the column indexes once, outside the loop
47 | goal_index = feature_names.get_loc(goal_feature)
48 | pr_index = feature_names.get_loc(score_feature)
49 |
50 | for row in X:
51 | rating = self.get_rating(float(row[pr_index]), float(row[goal_index]))
52 | self.ratings_.append(rating)
53 | self.ratings_ = MinMaxScaler().fit_transform(np.array(self.ratings_).reshape(-1, 1))[:, 0]
54 |
55 |
56 |
57 | return self.ratings_
58 |
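A small worked example (illustrative; the column names are placeholders, not the ones produced by the feature modules): with `alpha_goal=0.5`, a weighted feature sum of 0.8 and 1 goal scored give a raw rating of 0.8 * 0.5 + 0.5 * 1 = 0.9, and `predict` then rescales all raw ratings to [0, 1] with MinMaxScaler:

```python
import pandas as pd
from playerank.models.Rater import Rater

# toy per-performance scores; column names are illustrative placeholders
df = pd.DataFrame({
    'playerankScore': [0.8, 0.2, 0.5],   # weighted sum of features
    'goalScored':     [1.0, 0.0, 0.0],
})

rater = Rater(alpha_goal=0.5)
# raw ratings: 0.8*0.5 + 0.5*1 = 0.9, 0.2*0.5 = 0.1, 0.5*0.5 = 0.25
ratings = rater.predict(df, goal_feature='goalScored', score_feature='playerankScore')
print(ratings)  # min-max scaled to [0, 1]: [1.0, 0.0, 0.1875]
```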
--------------------------------------------------------------------------------
/playerank/models/Weighter.py:
--------------------------------------------------------------------------------
1 | #!/usr/local/bin/python
2 | from sklearn.base import BaseEstimator
3 | from sklearn.svm import LinearSVC
4 | from sklearn.model_selection import cross_val_score
5 |
6 | from sklearn.preprocessing import StandardScaler, LabelEncoder
7 | from sklearn.utils import check_random_state
8 | from sklearn.preprocessing import MinMaxScaler
9 | from sklearn.feature_selection import VarianceThreshold
10 | import json
11 | import pandas as pd
12 | import numpy as np
13 |
14 | class Weighter(BaseEstimator):
15 | """Automatic weighting of performance features
16 |
17 | Parameters
18 | ----------
19 | label_type: str
20 | the label type associated to the game outcome.
21 | options: w-dl (victory vs draw or defeat), wd-l (victory or draw vs defeat),
22 | w-d-l (victory, draw, defeat)
23 | random_state : int
24 | RandomState instance or None, optional, default: None
25 | If int, random_state is the seed used by the random number generator;
26 | If RandomState instance, random_state is the random number generator;
27 | If None, the random number generator is the RandomState instance used
28 | by `np.random`.
29 |
30 | Attributes
31 | ----------
32 | feature_names_ : array, [n_features]
33 | names of the features
34 | label_type_: str
35 | the label type associated to the game outcome.
36 | options: w-dl (victory vs draw or defeat), wd-l (victory or draw vs defeat),
37 | w-d-l (victory, draw, defeat)
38 | clf_: LinearSVC object
39 | the object of the trained classifier
40 | weights_ : array, [n_features]
41 | weights of the features computed by the classifier
42 | random_state_: int
43 | RandomState instance or None, optional, default: None
44 | If int, random_state is the seed used by the random number generator;
45 | If RandomState instance, random_state is the random number generator;
46 | If None, the random number generator is the RandomState instance used
47 | by 'np.random'.
48 | """
49 | def __init__(self, label_type='w-dl', random_state=42):
50 | self.label_type_ = label_type
51 | self.random_state_ = random_state
52 |
53 | def fit(self, dataframe, target, scaled=False, var_threshold = 0.001 , filename='weights.json'):
54 | """
55 | Compute weights of features.
56 |
57 | Parameters
58 | ----------
59 | dataframe : pandas DataFrame
60 | a dataframe containing the feature values and the target values
61 |
62 | target: str
63 | a string indicating the name of the target variable in the dataframe
64 |
65 | scaled: boolean
66 | True if X must be normalized, False otherwise
67 | (optional)
68 |
69 | filename: str
70 | the name of the json file where the feature weights are saved
71 | (optional)
72 | default: "weights.json"
73 | """
74 | ## feature selection by variance, to drop near-constant features
75 | feature_names = list(dataframe.columns)
76 | # eliminate the features whose variance is below var_threshold
77 | sel = VarianceThreshold(var_threshold)
78 | X = sel.fit_transform(dataframe)
79 | selected_feature_names = [feature_names[i] for i, var in enumerate(list(sel.variances_)) if var > var_threshold]
80 | print ("[Weighter] filtered features:", [(feature_names[i],var) for i, var in enumerate(list(sel.variances_)) if var <= var_threshold])
81 | dataframe = pd.DataFrame(X, columns=selected_feature_names)
82 | if self.label_type_ == 'w-dl':
83 | y = dataframe[target].apply(lambda x: 1 if x > 0 else -1)
84 | elif self.label_type_ == 'wd-l':
85 | y = dataframe[target].apply(lambda x: 1 if x >= 0 else -1 )
86 | else:
87 | y = dataframe[target].apply(lambda x: 1 if x > 0 else 0 if x==0 else 2)
88 | X = dataframe.loc[:, dataframe.columns != target].values
89 | y = y.values
90 |
91 | if scaled:
92 | X = StandardScaler().fit_transform(X)
93 |
94 | self.feature_names_ = dataframe.loc[:, dataframe.columns != target].columns
95 | self.clf_ = LinearSVC(fit_intercept=True, dual = False, max_iter = 50000,random_state=self.random_state_)
96 |
97 | #f1_score = np.mean(cross_val_score(self.clf_, X, y, cv=2, scoring='f1_weighted'))
98 | #self.f1_score_ = f1_score
99 |
100 | self.clf_.fit(X, y)
101 |
102 | outcome = 0
103 | if self.label_type_ == 'w-d-l':
104 | outcome = 1
105 |
106 | importances = self.clf_.coef_[outcome]
107 |
108 | sum_importances = sum(np.abs(importances))
109 | self.weights_ = importances / sum_importances
110 |
111 | ## Save the computed weights into a json file
112 | features_and_weights = {}
113 | for feature, weight in sorted(zip(self.feature_names_, self.weights_),key = lambda x: x[1]):
114 | features_and_weights[feature]= weight
115 | json.dump(features_and_weights, open('%s' %filename, 'w'))
116 | ## Save the object
117 | #pkl.dump(self, open('%s.pkl' %filename, 'wb'))
118 |
119 | def get_weights(self):
120 | return self.weights_
121 |
122 | def get_feature_names(self):
123 | return self.feature_names_
124 |
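A minimal fitting sketch (illustrative; the feature names and the random data are placeholders, and the output filename is made up), mirroring how playerank/utils/compute_features_weight.py calls the class on a dataframe of relative team features plus a 'goal-scored' target:

```python
import numpy as np
import pandas as pd
from playerank.models.Weighter import Weighter

rng = np.random.RandomState(0)
n = 200
# toy relative (team minus opponent) feature values and goal-difference target
df = pd.DataFrame({
    'passes':      rng.normal(0, 10, n),
    'shots':       rng.normal(0, 3, n),
    'tackles':     rng.normal(0, 5, n),
    'goal-scored': rng.randint(-3, 4, n),
})

weighter = Weighter(label_type='wd-l')   # win/draw vs loss
weighter.fit(df, 'goal-scored', scaled=True, filename='weights_demo.json')

for name, w in zip(weighter.get_feature_names(), weighter.get_weights()):
    print(name, round(float(w), 3))      # weights sum to 1 in absolute value
```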
--------------------------------------------------------------------------------
/playerank/models/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mesosbrodleto/playerank/79dd6464be98bbc35f48f99b4a3b626ea43e9a7e/playerank/models/__init__.py
--------------------------------------------------------------------------------
/playerank/setup.py:
--------------------------------------------------------------------------------
1 | from setuptools import setup  # install_requires needs setuptools, not distutils
2 |
3 | setup(
4 | name='playerank',
5 | version='1.0',
6 | packages=['playerank', 'playerank.features', 'playerank.models', 'playerank.utils'],
7 | install_requires=[
8 | 'pandas==0.23.4',
9 | 'scipy==0.17.1',
10 | 'numpy==1.11.0',
11 | 'scikit_learn==0.21.3',
12 | 'joblib'
13 | ],
14 | license='Creative Commons Attribution-Noncommercial-Share Alike license',
15 | long_description=open('README.md').read(),
16 | )
17 |
--------------------------------------------------------------------------------
/playerank/utils/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mesosbrodleto/playerank/79dd6464be98bbc35f48f99b4a3b626ea43e9a7e/playerank/utils/__init__.py
--------------------------------------------------------------------------------
/playerank/utils/compute_features_weight.py:
--------------------------------------------------------------------------------
1 | #import .models
2 | from ..models import Weighter
3 |
4 | from ..features import qualityFeatures, relativeAggregation,goalScoredFeatures
5 |
6 | def compute_feature_weights(output_path):
7 |
8 | qualityFeat = qualityFeatures.qualityFeatures()
9 | quality= qualityFeat.createFeature(events_path = 'playerank/data/events',
10 | players_file='playerank/data/players.json' ,entity = 'team')
11 | #computing goals scored for each team in each match
12 | gs=goalScoredFeatures.goalScoredFeatures()
13 | goals=gs.createFeature('playerank/data/matches')
14 | #merging of quality features and goals scored
15 | aggregation = relativeAggregation.relativeAggregation()
16 | aggregation.set_features([quality,goals])
17 | df = aggregation.aggregate(to_dataframe = True)
18 |
19 | weighter = Weighter.Weighter(label_type='wd-l')
20 | weighter.fit(df, 'goal-scored', filename=output_path)
21 | print ("features weights stored in %s"%output_path)
22 |
23 |
24 | compute_feature_weights('playerank/conf/features_weights.json')
25 |
--------------------------------------------------------------------------------
/playerank/utils/compute_features_weight.py~:
--------------------------------------------------------------------------------
1 | #import .models
2 | from ..models import Weighter
3 |
4 | from ..features import qualityFeatures, relativeAggregation,goalScoredFeatures
5 | import sys
6 | #computing all quality features (passes accurate, passes failed, shots, etc.)
7 |
8 | def compute_feature_weights(output_path):
9 |
10 | qualityFeat = qualityFeatures.qualityFeatures()
11 | quality= qualityFeat.createFeature('playerank/data/events.json', entity = 'team')
12 | #computing goals scored for each team in each match
13 | gs=goalScoredFeatures.goalScoredFeatures()
14 | goals=gs.createFeature('playerank/data/matches.json')
15 | #merging of quality features and goals scored
16 | aggregation = relativeAggregation.relativeAggregation()
17 | aggregation.set_features([quality,goals])
18 | df = aggregation.aggregate(to_dataframe = True)
19 |
20 | weighter = Weighter.Weighter(label_type='w-dl')
21 | weighter.fit(df, 'goal-scored', filename=output_path)
22 | print ("features weights stored in %s"%output_path)
23 |
24 |
25 | compute_feature_weights('playerank/conf/features_weigths.json')
26 |
--------------------------------------------------------------------------------
/playerank/utils/compute_playerank.py:
--------------------------------------------------------------------------------
1 | from ..models import Weighter
2 |
3 | from ..features import centerOfPerformanceFeature,qualityFeatures,playerankFeatures, plainAggregation, matchPlayedFeatures,roleFeatures
4 | import sys,json
5 |
6 | weigths_file ='playerank/conf/features_weights.json'
7 |
8 | qualityFeat = qualityFeatures.qualityFeatures()
9 | quality= qualityFeat.createFeature(events_path = 'playerank/data/events',
10 | players_file='playerank/data/players.json' ,entity = 'player')
11 |
12 |
13 | prFeat = playerankFeatures.playerankFeatures()
14 | prFeat.set_features([quality])
15 | pr= prFeat.createFeature(weigths_file)
16 |
17 |
18 | matchPlayedFeat = matchPlayedFeatures.matchPlayedFeatures()
19 | matchplayed = matchPlayedFeat.createFeature(matches_path = 'playerank/data/matches',
20 | players_file='playerank/data/players.json' )
21 |
22 | center_performance = centerOfPerformanceFeature.centerOfPerformanceFeature()
23 |
24 | center_performance = center_performance.createFeature(events_path = 'playerank/data/events',
25 | players_file = 'playerank/data/players.json' )
26 |
27 |
28 | roleFeat = roleFeatures.roleFeatures()
29 | roleFeat.set_features([center_performance])
30 | roles= roleFeat.createFeature(matrix_role_file = 'playerank/conf/role_matrix.json')
31 |
32 |
33 | aggregation = plainAggregation.plainAggregation()
34 |
35 | aggregation.set_features([matchplayed,pr,roles])
36 |
37 | df = aggregation.aggregate(to_dataframe = True)
38 |
39 | print (df.head())
40 |
--------------------------------------------------------------------------------
/playerank/utils/compute_playerank.py~:
--------------------------------------------------------------------------------
1 | from ..models import Weighter
2 |
3 | from ..features import playerankFeatures, plainAggregation, matchPlayedFeatures,roleFeatures
4 | import sys,json
5 |
6 | weigths_file = sys.argv[1]
7 | prFeat = playerankFeatures.playerankFeatures()
8 |
9 | pr= prFeat.createFeature(weigths_file,param={'competitionId': 524},limit = 5)
10 |
11 | print ("PlayeRank Score Computed. \n %s performance processed"%pr.count())
12 | mFeat= matchPlayedFeatures.matchPlayedFeatures()
13 |
14 | mins= mFeat.createFeature(param={'competitionId': 524})
15 |
16 | matrix_role = json.load(open('playerlib/conf/role_matrix.json'))
17 | roleFeat = roleFeatures.roleFeatures()
18 |
19 | roles= roleFeat.createFeature(matrix_role, param={'competitionId': 524})
20 |
21 | aggregation = plainAggregation.plainAggregation()
22 |
23 | aggregation.set_features([mins,pr])
24 |
25 | df = aggregation.aggregate(to_dataframe = True, stored_collection_name = 'playerank_scores')
26 |
27 | df.to_csv('playerank.csv')
28 |
--------------------------------------------------------------------------------
/playerank/utils/compute_roles.py:
--------------------------------------------------------------------------------
1 | #import .models
2 | from ..models import Clusterer
3 |
4 | from ..features import centerOfPerformanceFeature, plainAggregation
5 | import sys,json
6 | #computing all quality features (passes accurate, passes failed, shots, etc.)
7 |
8 | def compute_roleMatrix(output_path):
9 | #getting average position for each player in each match
10 | centerfeat = centerOfPerformanceFeature.centerOfPerformanceFeature()
11 | centerfeat = centerfeat.createFeature(events_path = 'playerank/data/events',
12 | players_file='playerank/data/players.json')
13 |
14 | #plain aggregation to get a dataframe
15 | aggregation = plainAggregation.plainAggregation()
16 | aggregation.set_features([centerfeat])
17 | df = aggregation.aggregate(to_dataframe = True )
18 |
19 | #use clustering object to get the best fit
20 | clusterer = Clusterer.Clusterer(verbose=True, k_range=(8, 9))
21 | clusterer.fit(df.entity, df.match, df[['avg_x', 'avg_y']], kind='multi')
22 |
23 | matrix_role = clusterer.get_clusters_matrix(kind = 'multi')
24 |
25 |
26 | json.dump(matrix_role,open(output_path,'w'))
27 |
28 |
29 | compute_roleMatrix('playerank/conf/role_matrix.json')
30 |
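The generated role_matrix.json maps every integer (x, y) position of the 0-100 grid to the cluster label(s) of that zone; with `kind='multi'` a borderline zone carries several labels joined by '-'. An illustrative lookup sketch (the label values and the player position are made up, and how roleFeatures actually consumes the matrix may differ):

```python
import json

# role_matrix is keyed by the stringified x and y coordinates of the 0-100 grid
role_matrix = json.load(open('playerank/conf/role_matrix.json'))

avg_x, avg_y = 63.4, 21.7   # hypothetical centre of performance of a player
labels = role_matrix[str(int(avg_x))][str(int(avg_y))].split('-')
print(labels)               # e.g. ['1', '4'] for a borderline zone
```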
--------------------------------------------------------------------------------
/playerank_schema_tist.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mesosbrodleto/playerank/79dd6464be98bbc35f48f99b4a3b626ea43e9a7e/playerank_schema_tist.png
--------------------------------------------------------------------------------