├── README.md
├── inference.py
├── input
│   ├── hero2ix.csv
│   ├── map2ix.csv
│   ├── map_teams.csv
│   └── teams.csv
├── model
│   ├── __init__.py
│   ├── hero2vec.py
│   └── map2vec.py
├── output
│   ├── hero
│   │   ├── hero_embddings_2d.png
│   │   ├── hero_embeddings.npy
│   │   ├── loss_hitory.png
│   │   └── model.p
│   └── map
│       ├── loss_hitory.png
│       ├── map_embddings_2d.png
│       └── map_embeddings.npy
├── setup
│   ├── install _without_torch.sh
│   └── install.sh
├── train_hero.py
├── train_map.py
└── utils
    ├── __init__.py
    ├── dataset.py
    ├── evaluation.py
    └── prediction.py


/README.md:
--------------------------------------------------------------------------------
  1 | # Hero2Vec
  2 | A machine learning model for understanding the game design and player experience of the video game Overwatch.
  3 | 
  4 | # Table of Contents
  5 | 1. [Introduction](README.md#introduction)
  6 | 2. [Challenges](README.md#challenges)
  7 | 3. [Motivations](README.md#motivations)
  8 | 4. [Model Selection](README.md#model-selection)
  9 | 5. [Model Architecture](README.md#model-architecture)
 10 | 6. [Input](README.md#input)
 11 | 7. [Output](README.md#output)
 12 | 8. [Repo Structure](README.md#repo-structure)
 13 | 9. [Setup](README.md#setup)
 14 | 10. [Usage](README.md#usage)
 15 | 
 16 | # Introduction
 17 | 
 18 | The goal of this project is to predict the outcome (winning-rate) of a team in a video game, particularly multiplayer online games like Overwatch/Dota2/LoL, given the status at a certain time in the game, like kills/deaths/team-compositions, etc.
 19 | 
 20 | This repo focuses on part of the project, namely, modeling the team compositions (or the heroes) and maps in the game.
 21 | 
 22 | # Challenges
 23 | 
 24 | 1. The dataset is not large enough. We only have the results from fewer than 300 games.
 25 | 
 26 | 2. The dataset consists of many categorical features, such as team compositions and maps. A simple one-hot encoding results in high-dimensional sparse input, and unfortunately we don't have enough data to overcome the **curse of dimensionality**. Moreover, the team composition and the map play an extremely important role in the game, so we can't simply drop them.
 27 | 
 28 | # Motivations
 29 | 
 30 | 1. Just like humans (or words), heroes have their own characteristics and also share some **similarities**. So rather than one-hot orthogonal vectors, they can be represented by **distributed representations**. What's more, just like words in a sentence, heroes in a team also have strong **co-occurrence**. So heroes can be modeled in a very similar fashion as **word2vec**, i.e., **hero2vec**.
 31 | 
 32 | 2. Team compositions are widely available online. They are independent of my own dataset and can be used to train hero2vec, just as a Wiki corpus is used for word2vec.
 33 | 
 34 | 3. By modeling the heroes in the game with distributed representations, I can not only address the curse of dimensionality, but also gain valuable information on the game design of the heroes as well as on how players appreciate these designs.
 35 | 
 36 | 4. All the above motivations apply to the maps similarly, i.e., **map2vec**.
 37 | 
 38 | # Model Selection
 39 | 
 40 | 1. As mentioned above, heroes in a team have strong co-occurrence, i.e., the conditional probability P(h1|h2, ..., h6) (6 heroes in a team) is high. h1 doesn't have to be a specific hero; any hero in the team can be the center hero. This is well suited to the **Continuous Bag of Words (CBOW)** model, since the attributes of a team (or of the 5 context heroes) really are a sum of the attributes of the individuals, whereas the sum of the context words in a sentence is not always intuitive.
 41 | 
 42 | 2. The map in the game can be modeled in a similar way. The conditional probability P(map|team) is high, so the weights of the last affine layer of the classifier serve as the embeddings for the maps.
 43 | 
 44 | # Model Architecture
 45 | 
 46 | 1. hero2vec. The model pipeline is as follows:
 47 | `input context heroes (5 heroes)` -> `embeddings` -> `sum` -> `fully connected layers` -> `softmax (center hero)`
 48 | 
 49 | 2. map2vec. The model pipeline is as follows:
 50 | `input team (6 heroes)` -> `hero2vec embeddings` -> `sum` -> `fully connected layers` -> `map embeddings` -> `softmax (map)`
 51 | 
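To make the hero2vec pipeline concrete, here is a dependency-free sketch of one forward pass with a single affine layer (the simplest variant). The weights are random placeholders rather than trained values, the function names are hypothetical, and the sizes (26 heroes, 10 dimensions) are taken from `hero2ix.csv` and `train_hero.py`.

```python
import math
import random
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cbow_forward(context_ids, embeddings, weight, bias):
    """context ids -> embeddings -> sum -> affine -> softmax."""
    dim = len(embeddings[0])
    # Sum the embeddings of the context heroes, dimension by dimension.
    summed = [sum(embeddings[i][d] for i in context_ids) for d in range(dim)]
    # Affine layer: one score per hero in the pool.
    scores = [sum(w * x for w, x in zip(row, summed)) + b
              for row, b in zip(weight, bias)]
    return softmax(scores)


random.seed(0)
heropool_size, embedding_dim = 26, 10
# Placeholder parameters (assumed values, not trained weights).
embeddings = [[random.uniform(-0.05, 0.05) for _ in range(embedding_dim)]
              for _ in range(heropool_size)]
weight = [[random.uniform(-0.1, 0.1) for _ in range(embedding_dim)]
          for _ in range(heropool_size)]
bias = [0.0] * heropool_size

# IDs of dva, genji, tracer, lucio, winston in hero2ix.csv.
probs = cbow_forward([3, 4, 21, 7, 23], embeddings, weight, bias)
predicted_id = max(range(heropool_size), key=probs.__getitem__)
print(predicted_id, probs[predicted_id])
```

The map2vec pipeline follows the same shape, except that the final scores come from a product with the map embedding matrix instead of a hero-sized affine layer.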
 52 | # Input
 53 | 
 54 | 1. `teams.csv` under the `input` folder. This is a csv table that contains the team compositions. It can easily be adapted to other team-based games like Dota2/LoL.
 55 | 
 56 | 2. `map_teams.csv` under the `input` folder. This is the csv table that contains both the team composition and the map.
 57 | 
 58 | 3. `hero2ix.csv` under the `input` folder. This is a csv table that maps the input hero names to their int IDs and, through those, to the embeddings. It can easily be customized if a different name is used for the same hero, e.g., 'dva' (used here) is written as 'D.Va'.
 59 | 
 60 | 4. `map2ix.csv` under the `input` folder. This is a csv table that maps the input map names to their int IDs and, through those, to the embeddings.
 61 | 
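The name-to-ID lookup described above can be sketched with the standard library alone. The sample below copies the first rows of `hero2ix.csv`; the `normalize` rule is a hypothetical convention for mapping spellings such as 'D.Va' onto the names used in the table, not something the repo provides.

```python
import csv
import io

# First rows of input/hero2ix.csv, inlined so the sketch is self-contained.
HERO2IX_SAMPLE = """,hero,ID
0,ana,0
1,bastion,1
2,doomfist,2
3,dva,3
"""


def load_hero2ix(csv_text: str) -> dict:
    """Build a {hero name: int ID} dictionary from the csv text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["hero"]: int(row["ID"]) for row in reader}


def normalize(name: str) -> str:
    """Hypothetical rule: lowercase and keep only letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


hero2ix = load_hero2ix(HERO2IX_SAMPLE)
print(hero2ix[normalize("D.Va")])
```

The training scripts build the same kind of dictionary with pandas; normalizing names before the lookup is one way to accept alternative spellings without editing the csv file.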
 62 | # Output
 63 | 
 64 | 1. `hero` folder
 65 | The output contains a graph showing the embeddings of the heroes (projected to 2D with PCA) `hero_embeddings_2d.png`, a numpy array containing the embeddings `hero_embeddings.npy`, a graph of the training loss `loss_history.png` and the pickled model `model.p`. For example, `hero_embeddings_2d.png` looks like:
 66 | 
 67 | <img src="https://github.com/ybw9000/hero2vec/blob/master/output/hero/hero_embddings_2d.png" align="center">
 68 | 
 69 | 2. `map` folder
 70 | The output contains a graph showing the embeddings of the maps (projected to 2D with PCA) `map_embeddings_2d.png`, a numpy array containing the embeddings `map_embeddings.npy` and a graph of the training loss `loss_history.png`. For example, `map_embeddings_2d.png` looks like:
 71 | 
 72 | <img src="https://github.com/ybw9000/hero2vec/blob/master/output/map/map_embddings_2d.png" align="center">
 73 | 
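One way to use the saved embeddings is to look up the nearest neighbours of a hero by cosine similarity. The sketch below uses made-up 3-dimensional vectors purely for illustration; they are not the trained embeddings, and the function names are hypothetical. In practice the rows of `hero_embeddings.npy` (loaded with `numpy.load`) would take their place.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def most_similar(name, vectors, top_k=2):
    """Return the top_k (hero, similarity) pairs closest to `name`."""
    query = vectors[name]
    scored = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != name]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors (assumed values, NOT the trained embeddings).
toy_vectors = {
    "dva": [0.9, 0.1, 0.0],
    "winston": [0.8, 0.2, 0.1],
    "mercy": [0.0, 0.9, 0.3],
    "lucio": [0.1, 0.8, 0.4],
}
neighbours = most_similar("dva", toy_vectors)
print(neighbours)
```

With these toy values the closest hero to `dva` is `winston`; whether the trained embeddings show similar groupings has to be checked against the actual output files.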
 74 | # Repo Structure
 75 | 
 76 | The directory structure for the repo looks like this:
 77 | 
 78 |     ├── README.md
 79 |     ├── train_hero.py
 80 |     ├── train_map.py
 81 |     ├── inference.py
 82 |     ├── setup
 83 |     │   └── install.sh
 84 |     │   └── install_without_torch.sh
 85 |     ├── model
 86 |     │   └── __init__.py    
 87 |     │   └── hero2vec.py
 88 |     │   └── map2vec.py
 89 |     ├── utils
 90 |     │   └── __init__.py
 91 |     │   └── dataset.py
 92 |     │   └── evaluation.py
 93 |     │   └── prediction.py
 94 |     ├── input
 95 |     │   └── hero2ix.csv
 96 |     │   └── map2ix.csv
 97 |     │   └── teams.csv
 98 |     │   └── map_teams.csv
 99 |     └── output
100 |         ├── hero
101 |         │   └── hero_embeddings_2d.png
102 |         │   └── hero_embeddings.npy
103 |         │   └── loss_history.png
104 |         │   └── model.p
105 |         └── map
106 |             └── map_embeddings_2d.png
107 |             └── map_embeddings.npy
108 |             └── loss_history.png
109 | # Setup
110 | 
111 | Under `setup` folder, run:
112 | 
113 | `bash install.sh`
114 | 
115 | If issues occur while installing PyTorch, please refer to http://pytorch.org/ for installation instructions. Then run:
116 | 
117 | `bash install_without_torch.sh`
118 | 
119 | # Usage
120 | 
121 | 1. Train hero2vec. Run: `python train_hero.py ./input/teams.csv ./input/hero2ix.csv`
122 | 
123 | 2. Train map2vec. Run: `python train_map.py ./input/map_teams.csv ./input/hero2ix.csv ./input/map2ix.csv`
124 | 
125 | 3. Predict the center hero given five other heroes. Run: `python inference.py <heroes>`, where `<heroes>` is the names of the five known team members. For example: `python inference.py dva genji tracer lucio winston`. Note: hero names must appear in the `hero` column of `hero2ix.csv` in the `input` folder.
126 | 


--------------------------------------------------------------------------------
/inference.py:
--------------------------------------------------------------------------------
 1 | import sys
 2 | import pickle
 3 | 
 4 | import numpy as np
 5 | import pandas as pd
 6 | 
 7 | from utils.prediction import Predictor
 8 | 
 9 | def main():
10 | 
11 |     model_dir = './output/hero/model.p'
12 |     model = pickle.load(open(model_dir, 'rb'))
13 | 
14 |     hero2ix_dir = './input/hero2ix.csv'
15 |     hero2ix_df = pd.read_csv(hero2ix_dir, index_col=0)
16 | 
17 |     indicator = Predictor(model, hero2ix_df)
18 | 
19 |     heroes = sys.argv[1:]
20 |     center_hero = indicator.predict(heroes)
21 |     print('suggested hero: ', center_hero)
22 | 
23 | if __name__ == '__main__':
24 |     main()
25 | 


--------------------------------------------------------------------------------
/input/hero2ix.csv:
--------------------------------------------------------------------------------
 1 | ,hero,ID
 2 | 0,ana,0
 3 | 1,bastion,1
 4 | 2,doomfist,2
 5 | 3,dva,3
 6 | 4,genji,4
 7 | 5,hanzo,5
 8 | 6,junkrat,6
 9 | 7,lucio,7
10 | 8,mccree,8
11 | 9,mei,9
12 | 10,mercy,10
13 | 11,moira,11
14 | 12,orisa,12
15 | 13,pharah,13
16 | 14,reaper,14
17 | 15,reinhardt,15
18 | 16,roadhog,16
19 | 17,soldier76,17
20 | 18,sombra,18
21 | 19,symmetra,19
22 | 20,torbjorn,20
23 | 21,tracer,21
24 | 22,widowmaker,22
25 | 23,winston,23
26 | 24,zarya,24
27 | 25,zenyatta,25
28 | 


--------------------------------------------------------------------------------
/input/map2ix.csv:
--------------------------------------------------------------------------------
 1 | ,map,ID
 2 | 0,Dorado_Attack,0
 3 | 1,Dorado_Defense,1
 4 | 2,Eichenwalde_Attack,2
 5 | 3,Eichenwalde_Defense,3
 6 | 4,Hanamura_Attack,4
 7 | 5,Hanamura_Defense,5
 8 | 6,Hollywood_Attack,6
 9 | 7,Hollywood_Defense,7
10 | 8,Horizon Lunar Colony_Attack,8
11 | 9,Horizon Lunar Colony_Defense,9
12 | 10,Ilios_Lighthouse,10
13 | 11,Ilios_Ruins,11
14 | 12,Ilios_Well,12
15 | 13,Junkertown_Attack,13
16 | 14,Junkertown_Defense,14
17 | 15,King's Row_Attack,15
18 | 16,King's Row_Defense,16
19 | 17,Lijiang Tower_Control Center,17
20 | 18,Lijiang Tower_Garden,18
21 | 19,Lijiang Tower_Night Market,19
22 | 20,Nepal_Sanctum,20
23 | 21,Nepal_Shrine,21
24 | 22,Nepal_Village,22
25 | 23,Numbani_Attack,23
26 | 24,Numbani_Defense,24
27 | 25,Oasis_City Center,25
28 | 26,Oasis_Gardens,26
29 | 27,Oasis_University,27
30 | 28,Route 66_Attack,28
31 | 29,Route 66_Defense,29
32 | 30,Temple of Anubis_Attack,30
33 | 31,Temple of Anubis_Defense,31
34 | 32,Volskaya Industries_Attack,32
35 | 33,Volskaya Industries_Defense,33
36 | 34,Watchpoint: Gibraltar_Attack,34
37 | 35,Watchpoint: Gibraltar_Defense,35
38 | 


--------------------------------------------------------------------------------
/model/__init__.py:
--------------------------------------------------------------------------------
1 | from . import hero2vec
2 | from . import map2vec
3 | 


--------------------------------------------------------------------------------
/model/hero2vec.py:
--------------------------------------------------------------------------------
  1 | import torch
  2 | import torch.autograd as autograd
  3 | import torch.nn as nn
  4 | import torch.nn.functional as F
  5 | import torch.optim as optim
  6 | from torch.nn import init
  7 | 
  8 | class CBOH(nn.Module):
  9 | 
 10 |     def __init__(self, heropool_size, embedding_dim):
 11 |         """
 12 |         Initialize an NN with one hidden layer. Weight of the hidden layer is
 13 |         the embedding.
 14 |         inputs:
 15 |             heropool_size: int
 16 |             embedding_dim: int
 17 |         """
 18 |         super().__init__()
 19 |         self.embedding_dim = embedding_dim
 20 |         self.embeddings = nn.Embedding(heropool_size, embedding_dim)
 21 |         self.affine = nn.Linear(embedding_dim, heropool_size)
 22 |         self.init_emb()
 23 | 
 24 |     def init_emb(self):
 25 |         """
 26 |         init embeddings and affine layer
 27 |         """
 28 |         initrange = 0.5 / self.embedding_dim
 29 |         self.embeddings.weight.data.uniform_(-initrange, initrange)
 30 |         self.affine.weight.data.uniform_(-0, 0)
 31 |         self.affine.bias.data.zero_()
 32 | 
 33 |     def forward(self, inputs):
 34 |         """
 35 |         inputs:
 36 |             inputs: torch.autograd.Variable, size = (N, 5)
 37 |         returns:
 38 |             out: torch.autograd.Variable, size = (N, heropool_size)
 39 |         """
 40 |         embeds = self.embeddings(inputs).sum(dim=1)  # sum the context embeddings (continuous bag)
 41 |         out = self.affine(embeds)
 42 |         return out
 43 | 
 44 | class CBOHBilayer(nn.Module):
 45 | 
 46 |     def __init__(self, heropool_size, embedding_dim, hidden_dim=10):
 47 |         """
 48 |         Initialize an NN with two hidden layers. Weight of the first hidden
 49 |         layer is the embedding.
 50 |         inputs:
 51 |             heropool_size: int
 52 |             embedding_dim: int
 53 |             hidden_dim: int
 54 |         """
 55 |         super().__init__()
 56 |         self.embedding_dim = embedding_dim
 57 |         self.hidden_dim = hidden_dim
 58 |         self.embeddings = nn.Embedding(heropool_size, embedding_dim)
 59 |         #Initialize 2nd hidden layer with dimension = hidden_dim
 60 |         self.linear1 = nn.Linear(embedding_dim, hidden_dim)
 61 |         self.relu1 = nn.ReLU()
 62 |         self.affine = nn.Linear(hidden_dim, heropool_size)
 63 |         self.init_emb()
 64 | 
 65 |     def init_emb(self):
 66 |         """
 67 |         init embeddings and affine layer. The weight of the 2nd hidden layer is
 68 |         initialized by Kaiming_norm.
 69 |         """
 70 |         initrange = 0.5 / self.embedding_dim
 71 |         self.embeddings.weight.data.uniform_(-initrange, initrange)
 72 |         init.kaiming_normal(self.linear1.weight.data)
 73 |         self.linear1.bias.data.zero_()
 74 |         self.affine.weight.data.uniform_(-0, 0)
 75 |         self.affine.bias.data.zero_()
 76 | 
 77 |     def forward(self, inputs):
 78 |         """
 79 |         inputs:
 80 |             inputs: torch.autograd.Variable, size = (N, 5)
 81 |         returns:
 82 |             out: torch.autograd.Variable, size = (N, heropool_size)
 83 |         """
 84 |         embeds = self.embeddings(inputs).sum(dim=1)  # sum the context embeddings (continuous bag)
 85 |         pipe = nn.Sequential(self.linear1, self.relu1, self.affine)
 86 |         out = pipe(embeds)
 87 |         return out
 88 | 
 89 | class CBOHTrilayer(nn.Module):
 90 | 
 91 |     def __init__(self, heropool_size, embedding_dim, hidden_dim=10,
 92 |                  affine_dim=10):
 93 |         """
 94 |         Initialize an NN with three hidden layers. Weight of the first hidden
 95 |         layer is the embedding.
 96 |         inputs:
 97 |             heropool_size: int
 98 |             embedding_dim: int
 99 |             hidden_dim: int
100 |             affine_dim: int
101 |         """
102 |         super().__init__()
103 |         self.embedding_dim = embedding_dim
104 |         self.affine_dim = affine_dim
105 |         self.embeddings = nn.Embedding(heropool_size, embedding_dim)
106 |         #Initialize 2nd hidden layer with dimension = hidden_dim
107 |         self.linear1 = nn.Linear(embedding_dim, hidden_dim)
108 |         self.relu1 = nn.ReLU()
109 |         #Initialize 3rd hidden layer with dimension = affine_dim
110 |         self.linear2 = nn.Linear(hidden_dim, affine_dim)
111 |         self.relu2 = nn.ReLU()
112 |         self.affine = nn.Linear(affine_dim, heropool_size)
113 |         self.init_emb()
114 | 
115 |     def init_emb(self):
116 |         """
117 |         init embeddings and affine layer. The weights of the 2nd and 3rd hidden
118 |         layers are initialized by Kaiming_norm.
119 |         """
120 |         initrange = 0.5 / self.embedding_dim
121 |         self.embeddings.weight.data.uniform_(-initrange, initrange)
122 |         init.kaiming_normal(self.linear1.weight.data)
123 |         self.linear1.bias.data.zero_()
124 |         init.kaiming_normal(self.linear2.weight.data)
125 |         self.linear2.bias.data.zero_()
126 |         self.affine.weight.data.uniform_(-0, 0)
127 |         self.affine.bias.data.zero_()
128 | 
129 |     def forward(self, inputs):
130 |         """
131 |         inputs:
132 |             inputs: torch.autograd.Variable, size = (N, 5)
133 |         returns:
134 |             out: torch.autograd.Variable, size = (N, heropool_size)
135 |         """
136 |         embeds = self.embeddings(inputs).sum(dim=1)
137 |         pipe = nn.Sequential(self.linear1, self.relu1, self.linear2, self.relu2)
138 |         # skip connection to assist gradient flow
139 |         if self.embedding_dim == self.affine_dim:
140 |             out = self.affine(pipe(embeds) + embeds)
141 |         else:
142 |             out = self.affine(pipe(embeds))
143 |         return out
144 | 


--------------------------------------------------------------------------------
/model/map2vec.py:
--------------------------------------------------------------------------------
  1 | import torch
  2 | import torch.autograd as autograd
  3 | import torch.nn as nn
  4 | import torch.nn.functional as F
  5 | import torch.optim as optim
  6 | from torch.nn import init
  7 | 
  8 | class CBOM(nn.Module):
  9 | 
 10 |     def __init__(self, hero_embeddings, mappool_size):
 11 |         """
 12 |         Initialize an NN with one hidden layer.
 13 |         inputs:
 14 |             hero_embeddings: numpy array
 15 |             mappool_size: int
 16 |         """
 17 |         super().__init__()
 18 |         self.mappool_size = mappool_size
 19 |         self.hero_embeddings_data = hero_embeddings
 20 |         #initialize hero_embeddings from the numpy array
 21 |         self.heropool_size, self.hero_embedding_dim = hero_embeddings.shape
 22 |         self.hero_embeddings = nn.Embedding(self.heropool_size, self.hero_embedding_dim)
 23 | 
 24 |         # with one hidden layer, the map embedding_dim has to be the same as that of the heroes
 25 |         self.map_embedding_dim = self.hero_embedding_dim
 26 |         self.map_embeddings = nn.Embedding(self.mappool_size, self.map_embedding_dim)
 27 |         self.init_emb()
 28 | 
 29 |     def init_emb(self):
 30 |         """
 31 |         initialize with kaiming_normal
 32 |         """
 33 |         self.hero_embeddings.weight.data = torch.Tensor(self.hero_embeddings_data)
 34 |         init.kaiming_normal(self.map_embeddings.weight.data)
 35 | 
 36 |     def forward(self, inputs):
 37 |         """
 38 |         inputs:
 39 |             inputs: torch.autograd.Variable, size = (N, 6)
 40 |         returns:
 41 |             out: torch.autograd.Variable, size = (N, mappool_size)
 42 |         """
 43 |         # read all the embeddings out from map_embeddings
 44 |         indexes = autograd.Variable(torch.arange(0, self.mappool_size).long())
 45 |         hero_embeds = self.hero_embeddings(inputs).sum(dim=1)
 46 |         map_embeds = self.map_embeddings(indexes)
 47 |         out = torch.matmul(hero_embeds, map_embeds.t())
 48 |         return out
 49 | 
 50 | class CBOMTrilayer(nn.Module):
 51 |     
 52 |     def __init__(self, hero_embeddings, mappool_size, map_embedding_dim=10, hidden_dim=20):
 53 |         """
 54 |         Initialize an NN with three hidden layers.
 55 |         inputs:
 56 |             hero_embeddings: numpy array
 57 |             mappool_size: int
 58 |             map_embedding_dim: int
 59 |             hidden_dim: int
 60 |         """
 61 |         super().__init__()
 62 |         self.hero_embeddings_data = hero_embeddings
 63 |         self.mappool_size = mappool_size
 64 |         self.map_embedding_dim = map_embedding_dim
 65 |         self.hidden_dim = hidden_dim
 66 | 
 67 |         #initialize hero_embeddings from the numpy array
 68 |         self.heropool_size, self.hero_embedding_dim = hero_embeddings.shape
 69 |         self.hero_embeddings = nn.Embedding(self.heropool_size, self.hero_embedding_dim)
 70 | 
 71 |         self.linear1 = nn.Linear(self.hero_embedding_dim, self.hidden_dim)
 72 |         self.relu1 = nn.ReLU()
 73 |         self.linear2 = nn.Linear(self.hidden_dim, self.map_embedding_dim)
 74 |         self.relu2 = nn.ReLU()
 75 |         self.map_embeddings = nn.Embedding(self.mappool_size, self.map_embedding_dim)
 76 |         self.init_emb()
 77 | 
 78 |     def init_emb(self):
 79 |         """
 80 |         initialize with kaiming_normal
 81 |         """
 82 |         self.hero_embeddings.weight.data = torch.Tensor(self.hero_embeddings_data)
 83 |         init.kaiming_normal(self.map_embeddings.weight.data)
 84 |         init.kaiming_normal(self.linear1.weight.data)
 85 |         self.linear1.bias.data.zero_()
 86 |         init.kaiming_normal(self.linear2.weight.data)
 87 |         self.linear2.bias.data.zero_()
 88 | 
 89 |     def forward(self, inputs):
 90 |         """
 91 |         inputs:
 92 |             inputs: torch.autograd.Variable, size = (N, 6)
 93 |         returns:
 94 |             out: torch.autograd.Variable, size = (N, mappool_size)
 95 |         """
 96 |         # read all the embeddings out from map_embeddings
 97 |         indexes = autograd.Variable(torch.arange(0, self.mappool_size).long())
 98 |         hero_embeds = self.hero_embeddings(inputs).sum(dim=1)
 99 |         pipe = nn.Sequential(self.linear1, self.relu1, self.linear2, self.relu2)
100 |         last = pipe(hero_embeds)
101 |         # skip connection like resnet
102 |         if self.hero_embedding_dim == self.map_embedding_dim:
103 |             last = last + hero_embeds
104 |         map_embeds = self.map_embeddings(indexes)
105 |         out = torch.matmul(last, map_embeds.t())
106 |         return out
107 | 


--------------------------------------------------------------------------------
/output/hero/hero_embddings_2d.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bowenyang008/hero2vec/f31fa9346d0db00890b8fb36398cee49e9e4e180/output/hero/hero_embddings_2d.png


--------------------------------------------------------------------------------
/output/hero/hero_embeddings.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bowenyang008/hero2vec/f31fa9346d0db00890b8fb36398cee49e9e4e180/output/hero/hero_embeddings.npy


--------------------------------------------------------------------------------
/output/hero/loss_hitory.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bowenyang008/hero2vec/f31fa9346d0db00890b8fb36398cee49e9e4e180/output/hero/loss_hitory.png


--------------------------------------------------------------------------------
/output/hero/model.p:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bowenyang008/hero2vec/f31fa9346d0db00890b8fb36398cee49e9e4e180/output/hero/model.p


--------------------------------------------------------------------------------
/output/map/loss_hitory.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bowenyang008/hero2vec/f31fa9346d0db00890b8fb36398cee49e9e4e180/output/map/loss_hitory.png


--------------------------------------------------------------------------------
/output/map/map_embddings_2d.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bowenyang008/hero2vec/f31fa9346d0db00890b8fb36398cee49e9e4e180/output/map/map_embddings_2d.png


--------------------------------------------------------------------------------
/output/map/map_embeddings.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bowenyang008/hero2vec/f31fa9346d0db00890b8fb36398cee49e9e4e180/output/map/map_embeddings.npy


--------------------------------------------------------------------------------
/setup/install _without_torch.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | pip install numpy
3 | pip install pandas
4 | pip install scipy
5 | pip install scikit-learn
6 | pip install matplotlib
7 | 


--------------------------------------------------------------------------------
/setup/install.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | pip3 install http://download.pytorch.org/whl/cu80/torch-0.3.0.post4-cp35-cp35m-linux_x86_64.whl
3 | pip3 install torchvision
4 | pip install numpy
5 | pip install pandas
6 | pip install scipy
7 | pip install scikit-learn
8 | pip install matplotlib
9 | 


--------------------------------------------------------------------------------
/train_hero.py:
--------------------------------------------------------------------------------
  1 | import sys
  2 | import pickle
  3 | import pandas as pd
  4 | import numpy as np
  5 | import matplotlib.pyplot as plt
  6 | 
  7 | from model.hero2vec import *
  8 | from utils.evaluation import *
  9 | from utils.dataset import DataFrameIterator
 10 | 
 11 | import torch
 12 | import torch.autograd as autograd
 13 | import torch.nn as nn
 14 | import torch.optim as optim
 15 | from torch.utils.data import DataLoader
 16 | from torch.utils.data import sampler
 17 | from torch.utils.data import Dataset
 18 | 
 19 | def train(model, dataloader, loss_function=nn.CrossEntropyLoss(),
 20 |           init_lr=0.1, epochs=100, lr_decay_epoch = 30,
 21 |           print_epoch = 10, gpu=False):
 22 | 
 23 |     # CUDA is not critical for this task with low-dimensional inputs
 24 |     if gpu and torch.cuda.is_available():
 25 |         model.cuda()
 26 | 
 27 |     losses = []
 28 |     for epoch in range(epochs):
 29 | 
 30 |         # learning rate decay
 31 |         div, mod = divmod(epoch, lr_decay_epoch)
 32 |         if mod == 0:
 33 |             optimizer = optim.SGD(model.parameters(), lr=init_lr*(0.1)**div)
 34 | 
 35 |         total_loss = torch.Tensor([0])
 36 | 
 37 |         # iterate the dataset to load context heroes(team) and center hero(target)
 38 |         for teams, targets in dataloader:
 39 | 
 40 |             if gpu and torch.cuda.is_available():
 41 |                 teams = teams.cuda()
 42 |                 targets = targets.cuda()
 43 | 
 44 |             # wrap the hero indices of the team and the target center hero in Variables
 45 |             inputs = autograd.Variable(teams)
 46 |             targets = autograd.Variable(targets.view(-1))
 47 | 
 48 |             # zero out the accumulated gradients
 49 |             model.zero_grad()
 50 | 
 51 |             # Run the forward pass
 52 |             out = model(inputs)
 53 | 
 54 |             # Compute your loss function.
 55 |             loss = loss_function(out, targets)
 56 | 
 57 |             # backpropagate and update the embeddings
 58 |             loss.backward()
 59 |             optimizer.step()
 60 | 
 61 |             # record total loss in this epoch
 62 |             total_loss += loss.cpu().data
 63 | 
 64 |         if epoch % print_epoch == 0:
 65 |             print('epoch: %d, loss: %.3f' % (epoch, total_loss/len(dataloader)))
 66 | 
 67 |         losses.append(total_loss/len(dataloader))
 68 |     # return losses for plot
 69 |     return np.array(losses)
 70 | 
 71 | def save_embeddings(model, filename = 'embeddings.npy'):
 72 |     embeddings = model.embeddings.weight.cpu().data.numpy()
 73 |     np.save(file = filename, arr=embeddings)
 74 | 
 75 | def main():
 76 | 
 77 |     data_dir = sys.argv[1]
 78 |     hero2ix_dir = sys.argv[2]
 79 | 
 80 |     # import DataFrame and hero2ix dictionary
 81 |     heroes_df = pd.read_csv(data_dir, index_col=0)
 82 |     hero2ix_df = pd.read_csv(hero2ix_dir, index_col=0)
 83 |     heroes_df = heroes_df.dropna().reset_index(drop=True)
 84 |     hero2ix = dict(zip(hero2ix_df.hero, hero2ix_df.ID))
 85 |     # heroes = hero2ix_df['hero'].values
 86 | 
 87 |     # train test split
 88 |     split = int(len(heroes_df)*0.9)
 89 |     heroes_train = heroes_df.iloc[:split]
 90 |     heroes_test = heroes_df.iloc[split:]
 91 | 
 92 |     # build dataset generator
 93 |     train_gen = DataFrameIterator(heroes_train, hero2ix)
 94 |     test_gen = DataFrameIterator(heroes_test, hero2ix)
 95 | 
 96 |     # Use Dataloader class in pytorch to generate batched data
 97 |     batch_size = 16
 98 |     loader_train = DataLoader(train_gen, batch_size=batch_size,
 99 |                               sampler=sampler.RandomSampler(train_gen),
100 |                               num_workers=4)
101 |     loader_test = DataLoader(test_gen, batch_size=batch_size,
102 |                               sampler=sampler.SequentialSampler(test_gen),
103 |                               num_workers=4)
104 | 
105 |     # define model; there are three models in total in hero2vec.py
106 |     model = CBOH(embedding_dim=10, heropool_size=len(hero2ix))
107 | 
108 |     # define loss function
109 |     loss_function = nn.CrossEntropyLoss()
110 | 
111 |     # run train
112 |     losses = train(model=model, dataloader=loader_train, loss_function=loss_function,
113 |                    init_lr=0.1, epochs=20, lr_decay_epoch=8, print_epoch=2, gpu=False)
114 | 
115 |     # check test accuracy
116 |     print('accuracy: ', accuracy(model, dataloader=loader_test,
117 |                                  batch_size=batch_size, gpu=False))
118 | 
119 |     # save embeddings as numpy arrays
120 |     output_dir = './output/hero/hero_embeddings.npy'
121 |     save_embeddings(model, filename=output_dir)
122 | 
123 |     # pickle model
124 |     pickle_dir = './output/hero/model.p'
125 |     pickle.dump(obj=model, file=open(pickle_dir, 'wb'))
126 | 
127 |     # plot loss vs epoch
128 |     plot_loss(losses, './output/hero/loss_hitory.png')
129 | 
130 |     # project embeddings to 2d plane
131 |     plot_embeddings(model, hero2ix)
132 | 
133 | if __name__ == '__main__':
134 |     main()
135 | 


--------------------------------------------------------------------------------
/train_map.py:
--------------------------------------------------------------------------------
  1 | import sys
  2 | import pandas as pd
  3 | import numpy as np
  4 | import matplotlib.pyplot as plt
  5 | import os
  6 | 
  7 | from model.map2vec import *
  8 | from utils.evaluation import *
  9 | from utils.dataset import MapDataFrameIterator
 10 | 
 11 | import torch
 12 | import torch.autograd as autograd
 13 | import torch.nn as nn
 14 | import torch.optim as optim
 15 | from torch.utils.data import DataLoader
 16 | from torch.utils.data import sampler
 17 | from torch.utils.data import Dataset
 18 | 
 19 | def train(model, dataloader, loss_function=nn.CrossEntropyLoss(),
 20 |           init_lr=0.1, epochs=100, lr_decay_epoch = 30,
 21 |           print_epoch = 10, gpu=False):
 22 | 
 23 |     # CUDA is not critical for this task with low-dimensional inputs
 24 |     if gpu and torch.cuda.is_available():
 25 |         model.cuda()
 26 | 
 27 |     losses = []
 28 |     for epoch in range(epochs):
 29 | 
 30 |         # learning rate decay
 31 |         div, mod = divmod(epoch, lr_decay_epoch)
 32 |         if mod == 0:
 33 |             optimizer = optim.SGD(model.parameters(), lr=init_lr*(0.1)**div)
 34 | 
 35 |         total_loss = torch.Tensor([0])
 36 | 
 37 |         # iterate the dataset to load the heroes (team) and the map (target)
 38 |         for teams, targets in dataloader:
 39 | 
 40 |             if gpu and torch.cuda.is_available():
 41 |                 teams = teams.cuda()
 42 |                 targets = targets.cuda()
 43 | 
 44 |             # wrap the hero indices of the team and the target map in Variables
 45 |             inputs = autograd.Variable(teams)
 46 |             targets = autograd.Variable(targets.view(-1))
 47 | 
 48 |             # zero out the accumulated gradients
 49 |             model.zero_grad()
 50 | 
 51 |             # Run the forward pass
 52 |             out = model(inputs)
 53 | 
 54 |             # Compute your loss function.
 55 |             loss = loss_function(out, targets)
 56 | 
 57 |             # backpropagate and update the embeddings
 58 |             loss.backward()
 59 |             optimizer.step()
 60 | 
 61 |             # record total loss in this epoch
 62 |             total_loss += loss.cpu().data
 63 | 
 64 |         if epoch % print_epoch == 0:
 65 |             print('epoch: %d, loss: %.3f' % (epoch, total_loss/len(dataloader)))
 66 | 
 67 |         losses.append(total_loss/len(dataloader))
 68 |     # return losses for plot
 69 |     return np.array(losses)
 70 | 
 71 | def save_embeddings_map(model, filename = 'map_embeddings.npy'):
 72 |     embeddings = model.map_embeddings.weight.cpu().data.numpy()
 73 |     np.save(file = filename, arr=embeddings)
 74 | 
 75 | def main():
 76 | 
 77 |     data_dir = sys.argv[1]
 78 |     hero2ix_dir = sys.argv[2]
 79 |     map2ix_dir = sys.argv[3]
 80 | 
 81 |     # import DataFrame and drop rows with missing values
 82 |     df = pd.read_csv(data_dir, index_col=0)
 83 |     df = df.dropna().reset_index(drop=True)
 84 | 
 85 |     # hero2ix dictionary
 86 |     hero2ix_df = pd.read_csv(hero2ix_dir, index_col=0)
 87 |     hero2ix = dict(zip(hero2ix_df.hero, hero2ix_df.ID))
 88 | 
 89 |     # map2ix dictionary
 90 |     map2ix_df = pd.read_csv(map2ix_dir, index_col=0)
 91 |     map2ix = dict(zip(map2ix_df.map, map2ix_df.ID))
 92 | 
 93 |     # train test split
 94 |     split = int(len(df)*0.9)
 95 |     map_train = df.iloc[:split]
 96 |     map_test = df.iloc[split:]
 97 | 
 98 |     # build dataset generator
 99 |     train_gen = MapDataFrameIterator(map_train, hero2ix, map2ix)
100 |     test_gen = MapDataFrameIterator(map_test, hero2ix, map2ix)
101 | 
102 |     # Use Dataloader class in pytorch to generate batched data
103 |     batch_size = 16
104 |     loader_train = DataLoader(train_gen, batch_size=batch_size,
105 |                               sampler=sampler.RandomSampler(train_gen),
106 |                               num_workers=4)
107 |     loader_test = DataLoader(test_gen, batch_size=batch_size,
108 |                               sampler=sampler.SequentialSampler(test_gen),
109 |                               num_workers=4)
110 | 
111 |     hero_emb_dir = './output/hero/hero_embeddings.npy'
112 |     # define model; map2vec.py provides two models
113 |     assert os.path.isfile(hero_emb_dir), "hero_embeddings.npy doesn't exist"
114 | 
115 |     hero_embeddings = np.load(hero_emb_dir)
116 |     model = CBOMTrilayer(hero_embeddings=hero_embeddings, mappool_size=len(map2ix),
117 |                          map_embedding_dim=10, hidden_dim=40)
118 | 
119 |     # define loss function
120 |     loss_function = nn.CrossEntropyLoss()
121 | 
122 |     # run train
123 |     losses = train(model=model, dataloader=loader_train, loss_function=loss_function,
124 |                    init_lr=0.1, epochs=20, lr_decay_epoch=8, print_epoch=2, gpu=False)
125 | 
126 |     # check test accuracy
127 |     print('accuracy: ', accuracy(model, dataloader=loader_test,
128 |                                  batch_size=batch_size, gpu=False))
129 | 
130 |     # save embeddings as numpy arrays
131 |     output_dir = './output/map/map_embeddings.npy'
132 |     save_embeddings_map(model, filename=output_dir)
133 | 
134 |     # plot loss vs epoch
135 |     plot_loss(losses, './output/map/loss_hitory.png')
136 | 
137 |     # project embeddings to 2d plane
138 |     plot_embeddings_map(model, map2ix)
139 | 
140 | 
141 | if __name__ == '__main__':
142 |     main()
143 | 
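
The `train` function above re-creates the SGD optimizer every `lr_decay_epoch` epochs with a learning rate of `init_lr*(0.1)**div`, which amounts to a step decay schedule. A minimal sketch of that schedule in plain Python follows; `step_lr` and its `gamma` parameter are hypothetical names introduced for illustration and are not part of the repo.

```python
def step_lr(epoch, init_lr=0.1, lr_decay_epoch=30, gamma=0.1):
    """Learning rate used at `epoch` under a step decay schedule.

    Hypothetical helper: it mirrors the divmod-based rule in train(),
    with the decay factor 0.1 exposed as `gamma`.
    """
    # div counts how many full decay periods have elapsed so far
    div, _ = divmod(epoch, lr_decay_epoch)
    return init_lr * gamma ** div


# Settings passed to train() in main(): init_lr=0.1, epochs=20, lr_decay_epoch=8
schedule = [step_lr(epoch, init_lr=0.1, lr_decay_epoch=8) for epoch in range(20)]
```

With these settings, epochs 0 to 7 use a rate of 0.1, epochs 8 to 15 roughly 0.01, and epochs 16 to 19 roughly 0.001 (up to floating-point rounding).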


--------------------------------------------------------------------------------
/utils/__init__.py:
--------------------------------------------------------------------------------
1 | from . import dataset
2 | from . import evaluation
3 | from . import prediction
4 | 


--------------------------------------------------------------------------------
/utils/dataset.py:
--------------------------------------------------------------------------------
 1 | import pandas as pd
 2 | import numpy as np
 3 | import torch
 4 | from torch.utils.data import Dataset
 5 | 
 6 | class DataFrameIterator(Dataset):
 7 | 
 8 |     def __init__(self, df, hero2ix):
 9 |         """
10 |         inputs:
11 |             df: pandas Dataframe
12 |             hero2ix: dictionary
13 |         """
14 |         self.df = df
15 |         self.hero2ix = hero2ix
16 | 
17 |     def __len__(self):
18 |         """
19 |         Each team composition yields 6 (context heroes, center hero) pairs
20 |         """
21 |         return int(len(self.df)*6)
22 | 
23 |     def __getitem__(self, idx):
24 |         """
25 |         inputs:
26 |             idx: int
27 |         returns:
28 |             inputs: torch.LongTensor, size = (5, )
29 |             targets: int
30 |         """
31 |         # Each team composition yields 6 (context heroes, center hero) pairs,
32 |         # so a specific pair is determined by the team composition and the
33 |         # position of the center hero
34 |         team, center_hero = divmod(idx, 6)
35 | 
36 |         # locate the team
37 |         heroes = list(self.df.iloc[team])
38 | 
39 |         # split into context heroes and center hero
40 |         context_heroes = heroes[:center_hero] + heroes[center_hero + 1:]
41 | 
42 |         team_idxs = list(map(lambda x: int(self.hero2ix[x]), context_heroes))
43 |         center_hero_idx = int(self.hero2ix[heroes[center_hero]])
44 |         inputs = torch.LongTensor(team_idxs)
45 |         targets = center_hero_idx
46 |         return inputs, targets
47 | 
48 | class MapDataFrameIterator(Dataset):
49 | 
50 |     def __init__(self, df, hero2ix, map2ix):
51 |         """
52 |         inputs:
53 |             df: pandas Dataframe
54 |             hero2ix: dictionary
55 |             map2ix: dictionary
56 |         """
57 |         self.df = df
58 |         self.hero2ix = hero2ix
59 |         self.map2ix = map2ix
60 | 
61 |     def __len__(self):
62 |         """
63 |         returns:
64 |             length of DataFrame
65 |         """
66 |         return len(self.df)
67 | 
68 |     def __getitem__(self, idx):
69 |         """
70 |         inputs:
71 |             idx: int
72 |         returns:
73 |             inputs: torch.LongTensor, size = (6, )
74 |             targets: int
75 |         """
76 |         # locate the team and map (first column is the map, the rest are heroes)
77 |         row = self.df.iloc[idx]
78 |         team, map_name = list(row.iloc[1:]), row.iloc[0]
79 | 
80 |         team_idxs = list(map(lambda x: int(self.hero2ix[x]), team))
81 |         map_idx = int(self.map2ix[map_name])
82 |         inputs = torch.LongTensor(team_idxs)
83 |         targets = map_idx
84 |         return inputs, targets
85 | 
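
`DataFrameIterator` flattens every team of six heroes into six samples, one per choice of center hero, and recovers the team row and center position from the flat index with `divmod(idx, 6)`. The sketch below reproduces that index arithmetic on plain lists, without pandas or torch. `split_sample` is a hypothetical helper, and the team data is toy data made up for illustration.

```python
def split_sample(teams, idx, team_size=6):
    """Return (context_heroes, center_hero) for a flat sample index.

    Hypothetical helper mirroring DataFrameIterator.__getitem__ on lists.
    """
    # quotient selects the team, remainder selects the center position
    team, center = divmod(idx, team_size)
    heroes = teams[team]
    # every hero except the one at the center position is context
    context = heroes[:center] + heroes[center + 1:]
    return context, heroes[center]


# Toy data; the compositions are illustrative only
teams = [
    ['ana', 'dva', 'genji', 'lucio', 'tracer', 'winston'],
    ['mercy', 'orisa', 'roadhog', 'soldier76', 'widowmaker', 'zenyatta'],
]

# Flat index 8 = team 1, center position 2
context, center = split_sample(teams, 8)
```

Here `center` is `'roadhog'` and `context` holds the other five heroes of the second team, so two teams yield twelve samples in total.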


--------------------------------------------------------------------------------
/utils/evaluation.py:
--------------------------------------------------------------------------------
  1 | import pandas as pd
  2 | import numpy as np
  3 | import matplotlib.pyplot as plt
  4 | from sklearn.decomposition import PCA
  5 | 
  6 | import torch
  7 | import torch.autograd as autograd
  8 | import torch.nn as nn
  9 | import torch.optim as optim
 10 | 
 11 | def accuracy(model, dataloader, batch_size, gpu=False):
 12 |     if gpu and torch.cuda.is_available():
 13 |         model.cuda()
 14 |     model.eval()
 15 | 
 16 |     # total number of samples (the last batch may be smaller than batch_size)
 17 |     length = len(dataloader.dataset)
 18 |     count = 0
 19 |     for teams, targets in dataloader:
 20 |         if gpu and torch.cuda.is_available():
 21 |             teams = teams.cuda()
 22 |             targets = targets.cuda()
 23 |         inputs = autograd.Variable(teams)
 24 |         targets = autograd.Variable(targets.view(-1))
 25 |         out = model(inputs)
 26 | 
 27 |         # idx is the index of the maximum value
 28 |         val, idx = torch.max(out, dim=1)
 29 | 
 30 |         # count how many predictions are right and convert to python int
 31 |         count += int(idx.eq(targets).sum())
 32 |     return count/length
 33 | 
 34 | def make_plot_color(x, y, hero2ix):
 35 | 
 36 |     # divide heroes to their own categories/roles
 37 |     tank = set(['dva', 'orisa', 'reinhardt', 'roadhog', 'winston', 'zarya'])
 38 |     supporter = set(['ana', 'lucio', 'mercy', 'moira', 'symmetra', 'zenyatta'])
 39 |     tanks, supporters, dps = [], [], []
 40 |     for name, idx in hero2ix.items():
 41 |         if name in tank:
 42 |             tanks.append(idx)
 43 |         elif name in supporter:
 44 |             supporters.append(idx)
 45 |         else:
 46 |             dps.append(idx)
 47 | 
 48 |     # plot tanks, supporters, and dps respectively
 49 |     tank_x, tank_y = x[tanks], y[tanks]
 50 |     sup_x, sup_y = x[supporters], y[supporters]
 51 |     dps_x, dps_y = x[dps], y[dps]
 52 |     fig = plt.figure(figsize=(16, 12), dpi=100)
 53 |     ax = plt.subplot(111)
 54 |     marker_size = 200
 55 |     ax.scatter(tank_x, tank_y, c='tomato', s=marker_size)
 56 |     ax.scatter(sup_x, sup_y, c='darkcyan', s=marker_size)
 57 |     ax.scatter(dps_x, dps_y, c='royalblue', s=marker_size)
 58 | 
 59 |     # annotate each hero's name
 60 |     for name, i in hero2ix.items():
 61 |         ax.annotate(name, (x[i], y[i]), fontsize=18)
 62 |     plt.show()
 63 |     fig.savefig('./output/hero/hero_embddings_2d.png')
 64 | 
 65 | def make_plot_color_map(x, y, map2ix):
 66 | 
 67 |     # divide maps to their own categories
 68 |     attacks, defenses, controls = [], [], []
 69 |     for name, idx in map2ix.items():
 70 |         name = name.split('_')
 71 |         if name[1] == 'Attack':
 72 |             attacks.append(idx)
 73 |         elif name[1] == 'Defense':
 74 |             defenses.append(idx)
 75 |         else:
 76 |             controls.append(idx)
 77 | 
 78 |     # plot attack, defense, controls map respectively
 79 |     att_x, att_y = x[attacks], y[attacks]
 80 |     den_x, den_y = x[defenses], y[defenses]
 81 |     con_x, con_y = x[controls], y[controls]
 82 | 
 83 |     fig = plt.figure(figsize=(16, 12), dpi = 100)
 84 |     ax = plt.subplot(111)
 85 |     marker_size = 200
 86 |     ax.scatter(att_x, att_y, c= 'tomato', s=marker_size)
 87 |     ax.scatter(den_x, den_y, c = 'darkcyan', s=marker_size)
 88 |     ax.scatter(con_x, con_y, c = 'royalblue', s=marker_size)
 89 | 
 90 |     # annotate each map's name
 91 |     for name, i in map2ix.items():
 92 |         ax.annotate(name, (x[i], y[i]))
 93 | 
 94 |     plt.show()
 95 |     fig.savefig('./output/map/map_embddings_2d.png')
 96 | 
 97 | def plot_embeddings(model, names):
 98 |     embeddings = model.embeddings.weight.cpu().data.numpy()
 99 | 
100 |     # center at zero mean on a copy (the array shares memory with the model weights)
101 |     embeddings = embeddings - np.mean(embeddings, axis=0)
102 | 
103 |     # run pca to reduce to 2 dimensions
104 |     pca = PCA(n_components=2)
105 |     embeddings_2d = pca.fit_transform(embeddings)
106 |     x, y = embeddings_2d[:, 0], embeddings_2d[:, 1]
107 |     make_plot_color(x, y, names)
108 | 
109 | def plot_embeddings_map(model, names):
110 |     embeddings = model.map_embeddings.weight.cpu().data.numpy()
111 | 
112 |     # center at zero mean on a copy (the array shares memory with the model weights)
113 |     embeddings = embeddings - np.mean(embeddings, axis=0)
114 | 
115 |     # run pca to reduce to 2 dimensions
116 |     pca = PCA(n_components=2)
117 |     embeddings_2d = pca.fit_transform(embeddings)
118 |     x, y = embeddings_2d[:, 0], embeddings_2d[:, 1]
119 |     make_plot_color_map(x, y, names)
120 | 
121 | def plot_loss(losses, directory):
122 |     fig = plt.figure(figsize=(8, 6), dpi=100)
123 |     ax = plt.subplot(111)
124 |     ax.plot(losses)
125 |     ax.set_xlabel('Epochs', fontsize=24)
126 |     ax.set_ylabel('Train_loss', fontsize=24)
127 |     fig.savefig(directory)
128 |     plt.close()
129 | 
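
An accuracy accumulated over batches should be divided by the number of samples actually seen. With a `DataLoader` that keeps the final partial batch, the last batch can be smaller than `batch_size`, so `len(dataloader) * batch_size` may exceed the dataset size and understate the accuracy. The sketch below illustrates the gap in plain Python; `batch_sizes` is a hypothetical helper and the numbers are made up.

```python
import math


def batch_sizes(n_samples, batch_size):
    """Sizes of the batches when the last, possibly partial, batch is kept.

    Hypothetical helper; assumes n_samples > 0.
    """
    n_batches = math.ceil(n_samples / batch_size)
    sizes = [batch_size] * (n_batches - 1)
    # the final batch holds whatever is left over
    sizes.append(n_samples - batch_size * (n_batches - 1))
    return sizes


# Illustrative numbers: 30 test samples in batches of 16
n_samples, batch_size = 30, 16
sizes = batch_sizes(n_samples, batch_size)

n_correct = 15  # made-up count of correct predictions
padded_accuracy = n_correct / (len(sizes) * batch_size)  # divides by 32
exact_accuracy = n_correct / sum(sizes)                  # divides by 30
```

In this toy case the padded denominator is 32 while only 30 samples exist, so the padded figure comes out lower than the exact one.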


--------------------------------------------------------------------------------
/utils/prediction.py:
--------------------------------------------------------------------------------
 1 | import torch
 2 | from torch.autograd import Variable
 3 | 
 4 | class Predictor():
 5 | 
 6 |     def __init__(self, model, hero2ix_df):
 7 |         """
 8 |         input:
 9 |             model: pytorch model
10 |             hero2ix_df: pandas DataFrame
11 |         """
12 |         self.model = model
13 |         self.model.eval()
14 |         self.hero2ix_df = hero2ix_df
15 | 
16 |     def predict(self, heroes):
17 |         """
18 |         input:
19 |             heroes: list of str
20 |         return:
21 |             center_hero: str
22 |         """
23 |         assert len(heroes) == 5, 'Input has to be five heroes'
24 | 
25 |         for hero in heroes:
26 |             if hero not in self.hero2ix_df.hero.values:
27 |                 raise KeyError('wrong hero name:' + hero)
28 | 
29 |         # find idxs for heroes, keeping input order and repeated heroes
30 |         hero2ix = dict(zip(self.hero2ix_df.hero, self.hero2ix_df.ID))
31 |         team_idxs = [int(hero2ix[hero]) for hero in heroes]
32 |         inputs = Variable(torch.LongTensor(team_idxs)).view(-1, 5)
33 |         out = self.model(inputs)
34 |         val, idx = torch.max(out, dim=1)
35 | 
36 |         # map hero id to hero name
37 |         center_hero = self.hero2ix_df.hero.loc[int(idx)]
38 |         return center_hero
39 | 
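
Turning hero names into indices is a per-name lookup. A set-membership filter returns each matching row once and in table order, so a team that lists a hero twice would produce fewer indices than heroes. A minimal sketch of an order-preserving lookup on a plain dictionary follows; `heroes_to_indices` and the toy mapping are assumptions for illustration, not the repo's API.

```python
def heroes_to_indices(heroes, hero2ix):
    """Map hero names to indices, keeping input order and duplicates.

    Hypothetical helper; hero2ix is a plain dict of name -> index.
    """
    unknown = [hero for hero in heroes if hero not in hero2ix]
    if unknown:
        raise KeyError('wrong hero name: ' + ', '.join(unknown))
    return [int(hero2ix[hero]) for hero in heroes]


# Toy mapping; the real one is read from hero2ix.csv
hero2ix = {'ana': 0, 'dva': 1, 'genji': 2, 'lucio': 3, 'mercy': 4}

# 'ana' appears twice and still yields five indices
idxs = heroes_to_indices(['mercy', 'ana', 'ana', 'dva', 'genji'], hero2ix)
```

The result always has one index per input name, so reshaping it to a fixed team size does not fail on repeated heroes.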


--------------------------------------------------------------------------------