├── README.md ├── inference.py ├── input ├── hero2ix.csv ├── map2ix.csv ├── map_teams.csv └── teams.csv ├── model ├── __init__.py ├── hero2vec.py └── map2vec.py ├── output ├── hero │ ├── hero_embddings_2d.png │ ├── hero_embeddings.npy │ ├── loss_hitory.png │ └── model.p └── map │ ├── loss_hitory.png │ ├── map_embddings_2d.png │ └── map_embeddings.npy ├── setup ├── install _without_torch.sh └── install.sh ├── train_hero.py ├── train_map.py └── utils ├── __init__.py ├── dataset.py ├── evaluation.py └── prediction.py /README.md: -------------------------------------------------------------------------------- 1 | # Hero2Vec 2 | A machine learning model to understand the game design and player experience of the video game Overwatch. 3 | 4 | # Table of Contents 5 | 1. [Introduction](README.md#introduction) 6 | 2. [Challenges](README.md#challenges) 7 | 3. [Motivations](README.md#motivations) 8 | 4. [Model Selection](README.md#model-selection) 9 | 5. [Model Architecture](README.md#model-architecture) 10 | 6. [Input](README.md#input) 11 | 7. [Output](README.md#output) 12 | 8. [Repo Structure](README.md#repo-structure) 13 | 9. [Setup](README.md#setup) 14 | 10. [Usage](README.md#usage) 15 | 16 | # Introduction 17 | 18 | The goal of this project is to predict the outcome (winning rate) of a team in a video game, particularly multiplayer online games like Overwatch/Dota2/LoL, given the state of the game at a certain time, such as kills, deaths, and team compositions. 19 | 20 | This repo focuses on one part of the project, namely, modeling the team compositions (or the heroes) and the maps in the game. 21 | 22 | # Challenges 23 | 24 | 1. The dataset is not large enough. We only have the results from fewer than 300 games. 25 | 26 | 2. The dataset consists of a lot of categorical features, like team compositions and maps. A simple one-hot encoding results in a high-dimensional sparse input, and unfortunately we don't have enough data to overcome the **curse of dimensionality**. 
Moreover, the team composition and the map play an extremely important role in the game, so we can't simply drop them. 27 | 28 | # Motivations 29 | 30 | 1. Just like humans (or words), heroes have their own characteristics and also share some **similarities**. So rather than one-hot orthogonal vectors, they can be represented by **distributed representations**. What's more, just like words in a sentence, heroes in a team also have strong **co-occurrence**. So heroes can be modeled in much the same way as **word2vec**, i.e., **hero2vec**. 31 | 32 | 2. The team compositions are widely available online. This data is independent of my own dataset and can serve the training of hero2vec just as a Wiki corpus is used for word2vec. 33 | 34 | 3. By modeling heroes in the game by distributed representations, I can not only address the curse of dimensionality, but also gain valuable information on the game designs of the heroes as well as how the players appreciate these designs. 35 | 36 | 4. All the above motivations apply to the maps similarly, i.e., **map2vec**. 37 | 38 | # Model Selection 39 | 40 | 1. As mentioned above, heroes in a team have strong co-occurrence, i.e., the conditional probability P(h1 | h2, ..., h6) (6 heroes in a team) is high. h1 doesn't have to be a specific hero; any hero in the team can be this center hero. This is very suitable for the **Continuous Bag of Words (CBOW)** model, since the attributes of a team (or of the 5 context heroes) are really a sum of the attributes of all the individuals, whereas the sum of context words in a sentence is not always intuitive. 41 | 42 | 2. The map in the game can be modeled in a similar way. The conditional probability P(map | team) is high. So the weights of the last affine layer of the classifier are the embeddings for the maps. 43 | 44 | # Model Architecture 45 | 46 | 1. hero2vec. 
The model pipeline is as follows: 47 | `input context heroes (5 heroes)` -> `embeddings` -> `sum` -> `fully connected layers` -> `softmax (center hero)` 48 | 49 | 2. map2vec. The model pipeline is as follows: 50 | `input team (6 heroes)` -> `hero2vec embeddings` -> `sum` -> `fully connected layers` -> `map embeddings` -> `softmax (map)` 51 | 52 | # Input 53 | 54 | 1. `teams.csv` under `input` folder. This is a csv table that contains the team compositions. It can be easily changed to other team-based games like Dota2/LoL. 55 | 56 | 2. `map_teams.csv` under `input` folder. This is a csv table that contains both the map and the team composition. 57 | 58 | 3. `hero2ix.csv` under `input` folder. This is a csv table that maps the input hero names to their int IDs and further to the embeddings. It can be easily customized in case a different name is used for the same hero, e.g., 'dva' (used in this one) is written as 'D.Va'. 59 | 60 | 4. `map2ix.csv` under `input` folder. This is a csv table that maps the input map names to their int IDs and further to the embeddings. 61 | 62 | # Output 63 | 64 | 1. `hero` folder 65 | Output contains a graph showing the embeddings (after PCA to 2D) of the heroes `hero_embddings_2d.png`, a numpy array containing the embeddings `hero_embeddings.npy`, a graph of the training loss `loss_hitory.png` and the pickled model `model.p`. For example, the `hero_embddings_2d.png` looks like: 66 | 67 | 68 | 69 | 2. `map` folder 70 | Output contains a graph showing the embeddings (after PCA to 2D) of the maps `map_embddings_2d.png`, a numpy array containing the embeddings `map_embeddings.npy` and a graph of the training loss `loss_hitory.png`. 
For example, the `map_embddings_2d.png` looks like: 71 | 72 | 73 | 74 | # Repo Structure 75 | 76 | The directory structure for the repo looks like this: 77 | 78 | ├── README.md 79 | ├── train_hero.py 80 | ├── train_map.py 81 | ├── inference.py 82 | ├── setup 83 | │ └── install.sh 84 | │ └── install _without_torch.sh 85 | ├── model 86 | │ └── __init__.py 87 | │ └── hero2vec.py 88 | │ └── map2vec.py 89 | ├── utils 90 | │ └── __init__.py 91 | │ └── dataset.py 92 | │ └── evaluation.py 93 | │ └── prediction.py 94 | ├── input 95 | │ └── hero2ix.csv 96 | │ └── map2ix.csv 97 | │ └── teams.csv 98 | │ └── map_teams.csv 99 | └── output 100 | ├── hero 101 | │ └── hero_embddings_2d.png 102 | │ └── hero_embeddings.npy 103 | │ └── loss_hitory.png 104 | │ └── model.p 105 | └── map 106 | └── map_embddings_2d.png 107 | └── map_embeddings.npy 108 | └── loss_hitory.png 109 | # Setup 110 | 111 | Under `setup` folder, run: 112 | 113 | `bash install.sh` 114 | 115 | If issues occur while installing PyTorch, please refer to http://pytorch.org/ for installation instructions. Then run: 116 | 117 | `bash "install _without_torch.sh"` 118 | 119 | # Usage 120 | 121 | 1. Train hero2vec. Run: `python train_hero.py ./input/teams.csv ./input/hero2ix.csv` 122 | 123 | 2. Train map2vec. Run: `python train_map.py ./input/map_teams.csv ./input/hero2ix.csv ./input/map2ix.csv` 124 | 125 | 3. Predict the center hero given five other heroes. Run `python inference.py` followed by the hero names of the five known team members. For example: `python inference.py dva genji tracer lucio winston`. Note: hero names must be in the `hero` column in `hero2ix.csv` in `input` folder. 
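The hero2vec pipeline described under Model Selection and Model Architecture can be illustrated with a small, dependency-free sketch. It is not the code in `model/hero2vec.py` (which uses PyTorch); the helper names `make_pairs`, `softmax` and `cbow_forward`, the 7-hero pool, the embedding size of 4 and the random, untrained weights are all assumptions made for illustration, so the printed probabilities carry no meaning beyond showing the shapes involved.

```python
import math
import random


def make_pairs(team):
    """Return every (context heroes, center hero) pair for one team.

    A team of 6 yields 6 pairs, because any member can be the center hero.
    """
    return [(team[:i] + team[i + 1:], team[i]) for i in range(len(team))]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cbow_forward(context_ids, embeddings, weight, bias):
    """embeddings -> sum -> one affine layer -> softmax over the hero pool."""
    dim = len(embeddings[0])
    # Sum the embedding vectors of the context heroes, one dimension at a time.
    summed = [sum(embeddings[i][d] for i in context_ids) for d in range(dim)]
    # One score per hero in the pool.
    scores = [sum(w * x for w, x in zip(row, summed)) + b
              for row, b in zip(weight, bias)]
    return softmax(scores)


random.seed(0)
# Hypothetical reduced hero pool; the repo's hero2ix.csv lists 26 heroes.
heroes = ['ana', 'dva', 'genji', 'lucio', 'tracer', 'winston', 'zarya']
hero2ix = {name: ix for ix, name in enumerate(heroes)}

dim = 4  # assumed embedding size for this sketch
embeddings = [[random.uniform(-0.5 / dim, 0.5 / dim) for _ in range(dim)]
              for _ in heroes]
# Random (untrained) affine weights, chosen only so the output is not uniform.
weight = [[random.uniform(-0.1, 0.1) for _ in range(dim)] for _ in heroes]
bias = [0.0] * len(heroes)

team = ['dva', 'genji', 'tracer', 'lucio', 'winston', 'zarya']
pairs = make_pairs(team)
context, center = pairs[0]
probs = cbow_forward([hero2ix[h] for h in context], embeddings, weight, bias)

print('context:', context, 'center:', center)
print('P(center hero | context):', [round(p, 3) for p in probs])
```

Training would then adjust `embeddings`, `weight` and `bias` so that the probability assigned to the true center hero increases; the rows of `embeddings` are what the repo saves as `hero_embeddings.npy`.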
126 | -------------------------------------------------------------------------------- /inference.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import pickle 3 | 4 | import numpy as np 5 | import pandas as pd 6 | 7 | from utils.prediction import * 8 | 9 | def main(): 10 | 11 | model_dir = './output/hero/model.p' 12 | model = pickle.load(open(model_dir, 'rb')) 13 | 14 | hero2ix_dir = './input/hero2ix.csv' 15 | hero2ix_df = pd.read_csv(hero2ix_dir, index_col=0) 16 | 17 | indicator = Predictor(model, hero2ix_df) 18 | 19 | heroes = sys.argv[1:] 20 | center_hero = indicator.predict(heroes) 21 | print('suggested hero: ', center_hero) 22 | 23 | if __name__ == '__main__': 24 | main() 25 | -------------------------------------------------------------------------------- /input/hero2ix.csv: -------------------------------------------------------------------------------- 1 | ,hero,ID 2 | 0,ana,0 3 | 1,bastion,1 4 | 2,doomfist,2 5 | 3,dva,3 6 | 4,genji,4 7 | 5,hanzo,5 8 | 6,junkrat,6 9 | 7,lucio,7 10 | 8,mccree,8 11 | 9,mei,9 12 | 10,mercy,10 13 | 11,moira,11 14 | 12,orisa,12 15 | 13,pharah,13 16 | 14,reaper,14 17 | 15,reinhardt,15 18 | 16,roadhog,16 19 | 17,soldier76,17 20 | 18,sombra,18 21 | 19,symmetra,19 22 | 20,torbjorn,20 23 | 21,tracer,21 24 | 22,widowmaker,22 25 | 23,winston,23 26 | 24,zarya,24 27 | 25,zenyatta,25 28 | -------------------------------------------------------------------------------- /input/map2ix.csv: -------------------------------------------------------------------------------- 1 | ,map,ID 2 | 0,Dorado_Attack,0 3 | 1,Dorado_Defense,1 4 | 2,Eichenwalde_Attack,2 5 | 3,Eichenwalde_Defense,3 6 | 4,Hanamura_Attack,4 7 | 5,Hanamura_Defense,5 8 | 6,Hollywood_Attack,6 9 | 7,Hollywood_Defense,7 10 | 8,Horizon Lunar Colony_Attack,8 11 | 9,Horizon Lunar Colony_Defense,9 12 | 10,Ilios_Lighthouse,10 13 | 11,Ilios_Ruins,11 14 | 12,Ilios_Well,12 15 | 13,Junkertown_Attack,13 16 | 14,Junkertown_Defense,14 17 | 
15,King's Row_Attack,15 18 | 16,King's Row_Defense,16 19 | 17,Lijiang Tower_Control Center,17 20 | 18,Lijiang Tower_Garden,18 21 | 19,Lijiang Tower_Night Market,19 22 | 20,Nepal_Sanctum,20 23 | 21,Nepal_Shrine,21 24 | 22,Nepal_Village,22 25 | 23,Numbani_Attack,23 26 | 24,Numbani_Defense,24 27 | 25,Oasis_City Center,25 28 | 26,Oasis_Gardens,26 29 | 27,Oasis_University,27 30 | 28,Route 66_Attack,28 31 | 29,Route 66_Defense,29 32 | 30,Temple of Anubis_Attack,30 33 | 31,Temple of Anubis_Defense,31 34 | 32,Volskaya Industries_Attack,32 35 | 33,Volskaya Industries_Defense,33 36 | 34,Watchpoint: Gibraltar_Attack,34 37 | 35,Watchpoint: Gibraltar_Defense,35 38 | -------------------------------------------------------------------------------- /model/__init__.py: -------------------------------------------------------------------------------- 1 | from . import hero2vec 2 | from . import map2vec 3 | -------------------------------------------------------------------------------- /model/hero2vec.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.autograd as autograd 3 | import torch.nn as nn 4 | import torch.nn.functional as F 5 | import torch.optim as optim 6 | from torch.nn import init 7 | 8 | class CBOH(nn.Module): 9 | 10 | def __init__(self, heropool_size, embedding_dim): 11 | """ 12 | Initialize an NN with one hidden layer. Weight of the hidden layer is 13 | the embedding. 
14 | inputs: 15 | heropool_size: int 16 | embedding_dim: int 17 | """ 18 | super().__init__() 19 | self.embedding_dim = embedding_dim 20 | self.embeddings = nn.Embedding(heropool_size, embedding_dim) 21 | self.affine = nn.Linear(embedding_dim, heropool_size) 22 | self.init_emb() 23 | 24 | def init_emb(self): 25 | """ 26 | init embeddings and affine layer 27 | """ 28 | initrange = 0.5 / self.embedding_dim 29 | self.embeddings.weight.data.uniform_(-initrange, initrange) 30 | self.affine.weight.data.uniform_(-0, 0) 31 | self.affine.bias.data.zero_() 32 | 33 | def forward(self, inputs): 34 | """ 35 | inputs: 36 | inputs: torch.autograd.Variable, size = (N, 5) 37 | returns: 38 | out: torch.autograd.Variable, size = (N, heropool_size) 39 | """ 40 | embeds = self.embeddings(inputs).sum(dim=1) #contiuous 41 | out = self.affine(embeds) 42 | return out 43 | 44 | class CBOHBilayer(nn.Module): 45 | 46 | def __init__(self, heropool_size, embedding_dim, hidden_dim=10): 47 | """ 48 | Initialize an NN with two hidden layers. Weight of the first hidden 49 | layer is the embedding. 50 | inputs: 51 | heropool_size: int 52 | embedding_dim: int 53 | hidden_dim: int 54 | """ 55 | super().__init__() 56 | self.embedding_dim = embedding_dim 57 | self.hidden_dim = hidden_dim 58 | self.embeddings = nn.Embedding(heropool_size, embedding_dim) 59 | #Initialize 2nd hidden layer with dimension = hidden_dim 60 | self.linear1 = nn.Linear(embedding_dim, hidden_dim) 61 | self.relu1 = nn.ReLU() 62 | self.affine = nn.Linear(hidden_dim, heropool_size) 63 | self.init_emb() 64 | 65 | def init_emb(self): 66 | """ 67 | init embeddings and affine layer. The weight of the 2nd hidden layer is 68 | initialized by Kaiming_norm. 
69 | """ 70 | initrange = 0.5 / self.embedding_dim 71 | self.embeddings.weight.data.uniform_(-initrange, initrange) 72 | init.kaiming_normal(self.linear1.weight.data) 73 | self.linear1.bias.data.zero_() 74 | self.affine.weight.data.uniform_(-0, 0) 75 | self.affine.bias.data.zero_() 76 | 77 | def forward(self, inputs): 78 | """ 79 | inputs: 80 | inputs: torch.autograd.Variable, size = (N, 5) 81 | returns: 82 | out: torch.autograd.Variable, size = (N, heropool_size) 83 | """ 84 | embeds = self.embeddings(inputs).sum(dim=1) #contiuous 85 | pipe = nn.Sequential(self.linear1, self.relu1, self.affine) 86 | out = pipe(embeds) 87 | return out 88 | 89 | class CBOHTrilayer(nn.Module): 90 | 91 | def __init__(self, heropool_size, embedding_dim, hidden_dim=10, 92 | affine_dim=10): 93 | """ 94 | Initialize an NN with three hidden layers. Weight of the first hidden 95 | layer is the embedding. 96 | inputs: 97 | heropool_size: int 98 | embedding_dim: int 99 | hidden_dim: int 100 | affine_dim: int 101 | """ 102 | super().__init__() 103 | self.embedding_dim = embedding_dim 104 | self.affine_dim = affine_dim 105 | self.embeddings = nn.Embedding(heropool_size, embedding_dim) 106 | #Initialize 2nd hidden layer with dimension = hidden_dim 107 | self.linear1 = nn.Linear(embedding_dim, hidden_dim) 108 | self.relu1 = nn.ReLU() 109 | #Initialize 3rd hidden layer with dimension = affine_dim 110 | self.linear2 = nn.Linear(hidden_dim, affine_dim) 111 | self.relu2 = nn.ReLU() 112 | self.affine = nn.Linear(affine_dim, heropool_size) 113 | self.init_emb() 114 | 115 | def init_emb(self): 116 | """ 117 | init embeddings and affine layer. The weights of the 2nd and 3rd hidden 118 | layers are initialized by Kaiming_norm. 
119 | """ 120 | initrange = 0.5 / self.embedding_dim 121 | self.embeddings.weight.data.uniform_(-initrange, initrange) 122 | init.kaiming_normal(self.linear1.weight.data) 123 | self.linear1.bias.data.zero_() 124 | init.kaiming_normal(self.linear2.weight.data) 125 | self.linear2.bias.data.zero_() 126 | self.affine.weight.data.uniform_(-0, 0) 127 | self.affine.bias.data.zero_() 128 | 129 | def forward(self, inputs): 130 | """ 131 | inputs: 132 | inputs: torch.autograd.Variable, size = (N, 5) 133 | returns: 134 | out: torch.autograd.Variable, size = (N, heropool_size) 135 | """ 136 | embeds = self.embeddings(inputs).sum(dim=1) 137 | pipe = nn.Sequential(self.linear1, self.relu1, self.linear2, self.relu2) 138 | # skip connection to assist gradient flow 139 | if self.embedding_dim == self.affine_dim: 140 | out = self.affine(pipe(embeds) + embeds) 141 | else: 142 | out = self.affine(pipe(embeds)) 143 | return out 144 | -------------------------------------------------------------------------------- /model/map2vec.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.autograd as autograd 3 | import torch.nn as nn 4 | import torch.nn.functional as F 5 | import torch.optim as optim 6 | from torch.nn import init 7 | 8 | class CBOM(nn.Module): 9 | 10 | def __init__(self, hero_embeddings, mappool_size): 11 | """ 12 | Initialize an NN with one hidden layer. 
13 | inputs: 14 | hero_embeddings: numpy array 15 | mappool_size: int 16 | """ 17 | super().__init__() 18 | self.mappool_size = mappool_size 19 | self.hero_embeddings_data = hero_embeddings 20 | #initialize hero_embeddings from the numpy array 21 | self.heropool_size, self.hero_embedding_dim = hero_embeddings.shape 22 | self.hero_embeddings = nn.Embedding(self.heropool_size, self.hero_embedding_dim) 23 | 24 | # for one hidden layer the embedding_dim of map has to the same as heroes 25 | self.map_embedding_dim = self.hero_embedding_dim 26 | self.map_embeddings = nn.Embedding(self.mappool_size, self.map_embedding_dim) 27 | self.init_emb() 28 | 29 | def init_emb(self): 30 | """ 31 | initialize with kaiming_normal 32 | """ 33 | self.hero_embeddings.weight.data = torch.Tensor(self.hero_embeddings_data) 34 | init.kaiming_normal(self.map_embeddings.weight.data) 35 | 36 | def forward(self, inputs): 37 | """ 38 | inputs: 39 | inputs: torch.autograd.Variable, size = (N, 6) 40 | returns: 41 | out: torch.autograd.Variable, size = (N, mappool_size) 42 | """ 43 | # read all the embeddings out from map_embeddings 44 | indexes = autograd.Variable(torch.arange(0, self.mappool_size).long()) 45 | hero_embeds = self.hero_embeddings(inputs).sum(dim=1) 46 | map_embeds = self.map_embeddings(indexes) 47 | out = torch.matmul(hero_embeds, map_embeds.t()) 48 | return out 49 | 50 | class CBOMTrilayer(nn.Module): 51 | 52 | def __init__(self, hero_embeddings, mappool_size, map_embedding_dim=10, hidden_dim=20): 53 | """ 54 | Initialize an NN with three hidden layers. 
55 | inputs: 56 | hero_embeddings: numpy array 57 | mappool_size: int 58 | map_embedding_dim: int 59 | hidden_dim: int 60 | """ 61 | super().__init__() 62 | self.hero_embeddings_data = hero_embeddings 63 | self.mappool_size = mappool_size 64 | self.map_embedding_dim = map_embedding_dim 65 | self.hidden_dim = hidden_dim 66 | 67 | #initialize hero_embeddings from the numpy array 68 | self.heropool_size, self.hero_embedding_dim = hero_embeddings.shape 69 | self.hero_embeddings = nn.Embedding(self.heropool_size, self.hero_embedding_dim) 70 | 71 | self.linear1 = nn.Linear(self.hero_embedding_dim, self.hidden_dim) 72 | self.relu1 = nn.ReLU() 73 | self.linear2 = nn.Linear(self.hidden_dim, self.map_embedding_dim) 74 | self.relu2 = nn.ReLU() 75 | self.map_embeddings = nn.Embedding(self.mappool_size, self.map_embedding_dim) 76 | self.init_emb() 77 | 78 | def init_emb(self): 79 | """ 80 | initialize with kaiming_normal 81 | """ 82 | self.hero_embeddings.weight.data = torch.Tensor(self.hero_embeddings_data) 83 | init.kaiming_normal(self.map_embeddings.weight.data) 84 | init.kaiming_normal(self.linear1.weight.data) 85 | self.linear1.bias.data.zero_() 86 | init.kaiming_normal(self.linear2.weight.data) 87 | self.linear2.bias.data.zero_() 88 | 89 | def forward(self, inputs): 90 | """ 91 | inputs: 92 | inputs: torch.autograd.Variable, size = (N, 6) 93 | returns: 94 | out: torch.autograd.Variable, size = (N, mappool_size) 95 | """ 96 | # read all the embeddings out from map_embeddings 97 | indexes = autograd.Variable(torch.arange(0, self.mappool_size).long()) 98 | hero_embeds = self.hero_embeddings(inputs).sum(dim=1) 99 | pipe = nn.Sequential(self.linear1, self.relu1, self.linear2, self.relu2) 100 | last = pipe(hero_embeds) 101 | # skip connection like resnet 102 | if self.hero_embedding_dim == self.map_embedding_dim: 103 | last = last + hero_embeds 104 | map_embeds = self.map_embeddings(indexes) 105 | out = torch.matmul(last, map_embeds.t()) 106 | return out 107 | 
-------------------------------------------------------------------------------- /output/hero/hero_embddings_2d.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bowenyang008/hero2vec/f31fa9346d0db00890b8fb36398cee49e9e4e180/output/hero/hero_embddings_2d.png -------------------------------------------------------------------------------- /output/hero/hero_embeddings.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bowenyang008/hero2vec/f31fa9346d0db00890b8fb36398cee49e9e4e180/output/hero/hero_embeddings.npy -------------------------------------------------------------------------------- /output/hero/loss_hitory.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bowenyang008/hero2vec/f31fa9346d0db00890b8fb36398cee49e9e4e180/output/hero/loss_hitory.png -------------------------------------------------------------------------------- /output/hero/model.p: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bowenyang008/hero2vec/f31fa9346d0db00890b8fb36398cee49e9e4e180/output/hero/model.p -------------------------------------------------------------------------------- /output/map/loss_hitory.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bowenyang008/hero2vec/f31fa9346d0db00890b8fb36398cee49e9e4e180/output/map/loss_hitory.png -------------------------------------------------------------------------------- /output/map/map_embddings_2d.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bowenyang008/hero2vec/f31fa9346d0db00890b8fb36398cee49e9e4e180/output/map/map_embddings_2d.png -------------------------------------------------------------------------------- 
/output/map/map_embeddings.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bowenyang008/hero2vec/f31fa9346d0db00890b8fb36398cee49e9e4e180/output/map/map_embeddings.npy -------------------------------------------------------------------------------- /setup/install _without_torch.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | pip install numpy 3 | pip install pandas 4 | pip install scipy 5 | pip install scikit-learn 6 | pip install matplotlib 7 | -------------------------------------------------------------------------------- /setup/install.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | pip3 install http://download.pytorch.org/whl/cu80/torch-0.3.0.post4-cp35-cp35m-linux_x86_64.whl 3 | pip3 install torchvision 4 | pip install numpy 5 | pip install pandas 6 | pip install scipy 7 | pip install scikit-learn 8 | pip install matplotlib 9 | -------------------------------------------------------------------------------- /train_hero.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import pickle 3 | import pandas as pd 4 | import numpy as np 5 | import matplotlib.pyplot as plt 6 | 7 | from model.hero2vec import * 8 | from utils.evaluation import * 9 | from utils.dataset import DataFrameIterator 10 | 11 | import torch 12 | import torch.autograd as autograd 13 | import torch.nn as nn 14 | import torch.optim as optim 15 | from torch.utils.data import DataLoader 16 | from torch.utils.data import sampler 17 | from torch.utils.data import Dataset 18 | 19 | def train(model, dataloader, loss_function=nn.CrossEntropyLoss(), 20 | init_lr=0.1, epochs=100, lr_decay_epoch = 30, 21 | print_epoch = 10, gpu=False): 22 | 23 | # Cuda is not critical for this task with low dimensionol inputs 24 | if gpu and torch.cuda.is_available(): 25 | model.cuda() 26 | 
27 | losses = [] 28 | for epoch in range(epochs): 29 | 30 | # learning rate decay 31 | div, mod = divmod(epoch, lr_decay_epoch) 32 | if mod == 0: 33 | optimizer = optim.SGD(model.parameters(), lr=init_lr*(0.1)**div) 34 | 35 | total_loss = torch.Tensor([0]) 36 | 37 | # iterate the dataset to load context heroes(team) and center hero(target) 38 | for teams, targets in dataloader: 39 | 40 | if gpu and torch.cuda.is_available(): 41 | teams = teams.cuda() 42 | targets = targets.cuda() 43 | 44 | # wrap the embeddings of the team and target center hero to Variable 45 | inputs = autograd.Variable(teams) 46 | targets = autograd.Variable(targets.view(-1)) 47 | 48 | # zero out the accumulated gradients 49 | model.zero_grad() 50 | 51 | # Run the forward pass 52 | out = model(inputs) 53 | 54 | # Compute your loss function. 55 | loss = loss_function(out, targets) 56 | 57 | # backpropagate and update the embeddings 58 | loss.backward() 59 | optimizer.step() 60 | 61 | # record total loss in this epoch 62 | total_loss += loss.cpu().data 63 | 64 | if epoch % print_epoch == 0: 65 | print('epoch: %d, loss: %.3f' % (epoch, total_loss/len(dataloader))) 66 | 67 | losses.append(total_loss/len(dataloader)) 68 | # return losses for plot 69 | return np.array(losses) 70 | 71 | def save_embeddings(model, filename = 'embeddings.npy'): 72 | embeddings = model.embeddings.weight.cpu().data.numpy() 73 | np.save(file = filename, arr=embeddings) 74 | 75 | def main(): 76 | 77 | data_dir = sys.argv[1] 78 | hero2ix_dir = sys.argv[2] 79 | 80 | # import DataFrame and hero2ix dictionary 81 | heroes_df = pd.read_csv(data_dir, index_col=0) 82 | hero2ix_df = pd.read_csv(hero2ix_dir, index_col=0) 83 | heroes_df = heroes_df.dropna().reset_index(drop=True) 84 | hero2ix = dict(zip(hero2ix_df.hero, hero2ix_df.ID)) 85 | # heroes = hero2ix_df['hero'].values 86 | 87 | # train test split 88 | split = int(len(heroes_df)*0.9) 89 | heroes_train = heroes_df.iloc[:split] 90 | heroes_test = heroes_df.iloc[split:] 91 | 92 | 
# build dataset generator 93 | train_gen = DataFrameIterator(heroes_train, hero2ix) 94 | test_gen = DataFrameIterator(heroes_test, hero2ix) 95 | 96 | # Use Dataloader class in pytorch to generate batched data 97 | batch_size = 16 98 | loader_train = DataLoader(train_gen, batch_size=batch_size, 99 | sampler=sampler.RandomSampler(train_gen), 100 | num_workers=4) 101 | loader_test = DataLoader(test_gen, batch_size=batch_size, 102 | sampler=sampler.SequentialSampler(test_gen), 103 | num_workers=4) 104 | 105 | # define model, totally three models in hetor2vec.py 106 | model = CBOH(embedding_dim=10, heropool_size=len(hero2ix)) 107 | 108 | # define loss function 109 | loss_function = nn.CrossEntropyLoss() 110 | 111 | # run train 112 | losses = train(model=model, dataloader=loader_train, loss_function=loss_function, 113 | init_lr=0.1, epochs=20, lr_decay_epoch=8, print_epoch=2, gpu=False) 114 | 115 | # check test accuracy 116 | print('accuracy: ', accuracy(model, dataloader=loader_test, 117 | batch_size=batch_size, gpu=False)) 118 | 119 | # save embeddings as numpy arrays 120 | output_dir = './output/hero/hero_embeddings.npy' 121 | save_embeddings(model, filename=output_dir) 122 | 123 | # pickle model 124 | pickle_dir = './output/hero/model.p' 125 | pickle.dump(obj=model, file=open(pickle_dir, 'wb')) 126 | 127 | # plot loss vs epoch 128 | plot_loss(losses, './output/hero/loss_hitory.png') 129 | 130 | # project embeddings to 2d plane 131 | plot_embeddings(model, hero2ix) 132 | 133 | if __name__ == '__main__': 134 | main() 135 | -------------------------------------------------------------------------------- /train_map.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import pandas as pd 3 | import numpy as np 4 | import matplotlib.pyplot as plt 5 | import os 6 | 7 | from model.map2vec import * 8 | from utils.evaluation import * 9 | from utils.dataset import MapDataFrameIterator 10 | 11 | import torch 12 | import 
torch.autograd as autograd 13 | import torch.nn as nn 14 | import torch.optim as optim 15 | from torch.utils.data import DataLoader 16 | from torch.utils.data import sampler 17 | from torch.utils.data import Dataset 18 | 19 | def train(model, dataloader, loss_function=nn.CrossEntropyLoss(), 20 | init_lr=0.1, epochs=100, lr_decay_epoch = 30, 21 | print_epoch = 10, gpu=False): 22 | 23 | # Cuda is not critical for this task with low dimensionol inputs 24 | if gpu and torch.cuda.is_available(): 25 | model.cuda() 26 | 27 | losses = [] 28 | for epoch in range(epochs): 29 | 30 | # learning rate decay 31 | div, mod = divmod(epoch, lr_decay_epoch) 32 | if mod == 0: 33 | optimizer = optim.SGD(model.parameters(), lr=init_lr*(0.1)**div) 34 | 35 | total_loss = torch.Tensor([0]) 36 | 37 | # iterate the dataset to load context heroes(team) and center hero(target) 38 | for teams, targets in dataloader: 39 | 40 | if gpu and torch.cuda.is_available(): 41 | teams = teams.cuda() 42 | targets = targets.cuda() 43 | 44 | # wrap the embeddings of the team and target center hero to Variable 45 | inputs = autograd.Variable(teams) 46 | targets = autograd.Variable(targets.view(-1)) 47 | 48 | # zero out the accumulated gradients 49 | model.zero_grad() 50 | 51 | # Run the forward pass 52 | out = model(inputs) 53 | 54 | # Compute your loss function. 
55 | loss = loss_function(out, targets) 56 | 57 | # backpropagate and update the embeddings 58 | loss.backward() 59 | optimizer.step() 60 | 61 | # record total loss in this epoch 62 | total_loss += loss.cpu().data 63 | 64 | if epoch % print_epoch == 0: 65 | print('epoch: %d, loss: %.3f' % (epoch, total_loss/len(dataloader))) 66 | 67 | losses.append(total_loss/len(dataloader)) 68 | # return losses for plot 69 | return np.array(losses) 70 | 71 | def save_embeddings_map(model, filename = 'map_embeddings.npy'): 72 | embeddings = model.map_embeddings.weight.cpu().data.numpy() 73 | np.save(file = filename, arr=embeddings) 74 | 75 | def main(): 76 | 77 | data_dir = sys.argv[1] 78 | hero2ix_dir = sys.argv[2] 79 | map2ix_dir = sys.argv[3] 80 | 81 | # import DataFrame and 82 | df = pd.read_csv(data_dir, index_col=0) 83 | df = df.dropna().reset_index(drop=True) 84 | 85 | # hero2ix dictionary 86 | hero2ix_df = pd.read_csv(hero2ix_dir, index_col=0) 87 | hero2ix = dict(zip(hero2ix_df.hero, hero2ix_df.ID)) 88 | 89 | # map2ix dictionary 90 | map2ix_df = pd.read_csv(map2ix_dir, index_col=0) 91 | map2ix = map2ix = dict(zip(map2ix_df.map, map2ix_df.ID)) 92 | 93 | # train test split 94 | split = int(len(df)*0.9) 95 | map_train = df.iloc[:split] 96 | map_test = df.iloc[split:] 97 | 98 | # build dataset generator 99 | train_gen = MapDataFrameIterator(map_train, hero2ix, map2ix) 100 | test_gen = MapDataFrameIterator(map_test, hero2ix, map2ix) 101 | 102 | # Use Dataloader class in pytorch to generate batched data 103 | batch_size = 16 104 | loader_train = DataLoader(train_gen, batch_size=batch_size, 105 | sampler=sampler.RandomSampler(train_gen), 106 | num_workers=4) 107 | loader_test = DataLoader(test_gen, batch_size=batch_size, 108 | sampler=sampler.SequentialSampler(test_gen), 109 | num_workers=4) 110 | 111 | hero_emb_dir = './output/hero/hero_embeddings.npy' 112 | # define model, totally two models in map2vec.py 113 | assert os.path.isfile(hero_emb_dir), "hero_embeddings.npy doesn't 
exist" 114 | 115 | hero_embeddings = np.load(hero_emb_dir) 116 | model = CBOMTrilayer(hero_embeddings=hero_embeddings, mappool_size=len(map2ix), 117 | map_embedding_dim=10, hidden_dim=40) 118 | 119 | # define loss function 120 | loss_function = nn.CrossEntropyLoss() 121 | 122 | # run train 123 | losses = train(model=model, dataloader=loader_train, loss_function=loss_function, 124 | init_lr=0.1, epochs=20, lr_decay_epoch=8, print_epoch=2, gpu=False) 125 | 126 | # check test accuracy 127 | print('accuracy: ', accuracy(model, dataloader=loader_test, 128 | batch_size=batch_size, gpu=False)) 129 | 130 | # save embeddings as numpy arrays 131 | output_dir = './output/map/map_embeddings.npy' 132 | save_embeddings_map(model, filename=output_dir) 133 | 134 | # plot loss vs epoch 135 | plot_loss(losses, './output/map/loss_hitory.png') 136 | 137 | # project embeddings to 2d plane 138 | plot_embeddings_map(model, map2ix) 139 | 140 | 141 | if __name__ == '__main__': 142 | main() 143 | -------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- 1 | from . import dataset 2 | from . import evaluation 3 | from . 
import prediction 4 | -------------------------------------------------------------------------------- /utils/dataset.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import numpy as np 3 | import torch 4 | from torch.utils.data import Dataset 5 | 6 | class DataFrameIterator(Dataset): 7 | 8 | def __init__(self, df, hero2ix): 9 | """ 10 | inputs: 11 | df: pandas Dataframe 12 | hero2ix: dictionary 13 | """ 14 | self.df = df 15 | self.hero2ix = hero2ix 16 | 17 | def __len__(self): 18 | """ 19 | Each team compostions can result in 6 center hero (context heroes) 20 | """ 21 | return int(len(self.df)*6) 22 | 23 | def __getitem__(self, idx): 24 | """ 25 | inputs: 26 | idx: int 27 | returns: 28 | inputs: torch.LongTensor, size = (5, ) 29 | targets: int 30 | """ 31 | # Each team composition can give 6 center hero (context heroes) 32 | # So a specific (context heroes, center_hero) is determined by the team 33 | # composition and the position of the center hero 34 | team, center_hero = divmod(idx, 6) 35 | 36 | #locate the team 37 | heroes = list(self.df.iloc[team]) 38 | 39 | #divide context and center hero 40 | context_heroes = heroes[:center_hero] + heroes[center_hero + 1:] 41 | 42 | team_idxs = list(map(lambda x: int(self.hero2ix[x]), context_heroes)) 43 | center_hero_idx = int(self.hero2ix[heroes[center_hero]]) 44 | inputs = torch.LongTensor(team_idxs) 45 | targets = center_hero_idx 46 | return inputs, targets 47 | 48 | class MapDataFrameIterator(Dataset): 49 | 50 | def __init__(self, df, hero2ix, map2ix): 51 | """ 52 | inputs: 53 | df: pandas Dataframe 54 | hero2ix: dictionary 55 | map2ix: dictionary 56 | """ 57 | self.df = df 58 | self.hero2ix = hero2ix 59 | self.map2ix = map2ix 60 | 61 | def __len__(self): 62 | """ 63 | returns: 64 | length of DataFrame 65 | """ 66 | return len(self.df) 67 | 68 | def __getitem__(self, idx): 69 | """ 70 | inputs: 71 | idx: int 72 | returns: 73 | inputs: torch.LongTensor, 
size = (6, ) 74 | targets: int 75 | """ 76 | # locate the team and map (first column is the map, the rest are heroes) 77 | row = self.df.iloc[idx] 78 | team, map_name = list(row.iloc[1:]), row.iloc[0] 79 | 80 | team_idxs = list(map(lambda x: int(self.hero2ix[x]), team)) 81 | map_idx = int(self.map2ix[map_name]) 82 | inputs = torch.LongTensor(team_idxs) 83 | targets = map_idx 84 | return inputs, targets 85 | -------------------------------------------------------------------------------- /utils/evaluation.py: -------------------------------------------------------------------------------- 1 | import pandas as pd 2 | import numpy as np 3 | import matplotlib.pyplot as plt 4 | from sklearn.decomposition import PCA 5 | 6 | import torch 7 | import torch.autograd as autograd 8 | import torch.nn as nn 9 | import torch.optim as optim 10 | 11 | def accuracy(model, dataloader, batch_size, gpu=False): 12 | if gpu and torch.cuda.is_available(): 13 | model.cuda() 14 | model.eval() 15 | 16 | # number of samples seen, counted per batch because the last batch may be smaller than batch_size 17 | length = 0 18 | count = 0 19 | for teams, targets in dataloader: 20 | if gpu and torch.cuda.is_available(): 21 | teams = teams.cuda() 22 | targets = targets.cuda() 23 | inputs = autograd.Variable(teams) 24 | targets = autograd.Variable(targets.view(-1)) 25 | out = model(inputs) 26 | 27 | # idx is the index of the maximum value 28 | val, idx = torch.max(out, dim=1) 29 | length += targets.size(0) 30 | # count how many predictions are right and convert to python int (.item() needs PyTorch >= 0.4) 31 | count += idx.eq(targets).sum().item() 32 | return count/length 33 | 34 | def make_plot_color(x, y, hero2ix): 35 | 36 | # divide heroes to their own categories/roles 37 | tank = set(['dva', 'orisa', 'reinhardt', 'roadhog', 'winston', 'zarya']) 38 | supporter = set(['ana', 'lucio', 'mercy', 'moira', 'symmetra', 'zenyatta']) 39 | tanks, supporters, dps = [], [], [] 40 | for name, idx in hero2ix.items(): 41 | if name in tank: 42 | tanks.append(idx) 43 | elif name in supporter: 44 | supporters.append(idx) 45 | else: 46 |
dps.append(idx) 47 | 48 | # plot tanks, supporters, and dps respectively 49 | tank_x, tank_y = x[tanks], y[tanks] 50 | sup_x, sup_y = x[supporters], y[supporters] 51 | dps_x, dps_y = x[dps], y[dps] 52 | fig = plt.figure(figsize=(16, 12), dpi=100) 53 | ax = plt.subplot(111) 54 | marker_size = 200 55 | ax.scatter(tank_x, tank_y, c='tomato', s=marker_size) 56 | ax.scatter(sup_x, sup_y, c='darkcyan', s=marker_size) 57 | ax.scatter(dps_x, dps_y, c='royalblue', s=marker_size) 58 | 59 | # annotate each hero's name 60 | for name, i in hero2ix.items(): 61 | ax.annotate(name, (x[i], y[i]), fontsize=18) 62 | plt.show() 63 | fig.savefig('./output/hero/hero_embddings_2d.png') 64 | 65 | def make_plot_color_map(x, y, map2ix): 66 | 67 | # divide maps to their own categories 68 | attacks, defenses, controls = [], [], [] 69 | for name, idx in map2ix.items(): 70 | name = name.split('_') 71 | if name[1] == 'Attack': 72 | attacks.append(idx) 73 | elif name[1] == 'Defense': 74 | defenses.append(idx) 75 | else: 76 | controls.append(idx) 77 | 78 | # plot attack, defense, and control maps respectively 79 | att_x, att_y = x[attacks], y[attacks] 80 | den_x, den_y = x[defenses], y[defenses] 81 | con_x, con_y = x[controls], y[controls] 82 | 83 | fig = plt.figure(figsize=(16, 12), dpi=100) 84 | ax = plt.subplot(111) 85 | marker_size = 200 86 | ax.scatter(att_x, att_y, c='tomato', s=marker_size) 87 | ax.scatter(den_x, den_y, c='darkcyan', s=marker_size) 88 | ax.scatter(con_x, con_y, c='royalblue', s=marker_size) 89 | 90 | # annotate each map's name 91 | for name, i in map2ix.items(): 92 | ax.annotate(name, (x[i], y[i])) 93 | 94 | plt.show() 95 | fig.savefig('./output/map/map_embddings_2d.png') 96 | 97 | def plot_embeddings(model, names): 98 | embeddings = model.embeddings.weight.cpu().data.numpy() 99 | 100 | # center at 0 on a copy; an in-place -= would also modify the model weights, which share memory on CPU 101 | embeddings = embeddings - np.mean(embeddings, axis=0) 102 | 103 | # run pca to reduce to 2 dimensions 104 | pca = PCA(n_components=2) 105 | embeddings_2d =
pca.fit_transform(embeddings) 106 | x, y = embeddings_2d[:, 0], embeddings_2d[:, 1] 107 | make_plot_color(x, y, names) 108 | 109 | def plot_embeddings_map(model, names): 110 | embeddings = model.map_embeddings.weight.cpu().data.numpy() 111 | 112 | # center at 0 on a copy; an in-place -= would also modify the model weights, which share memory on CPU 113 | embeddings = embeddings - np.mean(embeddings, axis=0) 114 | 115 | # run pca to reduce to 2 dimensions 116 | pca = PCA(n_components=2) 117 | embeddings_2d = pca.fit_transform(embeddings) 118 | x, y = embeddings_2d[:, 0], embeddings_2d[:, 1] 119 | make_plot_color_map(x, y, names) 120 | 121 | def plot_loss(losses, directory): 122 | fig = plt.figure(figsize=(8, 6), dpi=100) 123 | ax = plt.subplot(111) 124 | ax.plot(losses) 125 | ax.set_xlabel('Epochs', fontsize=24) 126 | ax.set_ylabel('Train_loss', fontsize=24) 127 | fig.savefig(directory) 128 | plt.close() 129 | -------------------------------------------------------------------------------- /utils/prediction.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch.autograd import Variable 3 | 4 | class Predictor(): 5 | 6 | def __init__(self, model, hero2ix_df): 7 | """ 8 | input: 9 | model: pytorch model 10 | hero2ix_df: pandas DataFrame 11 | """ 12 | self.model = model 13 | self.model.eval() 14 | self.hero2ix_df = hero2ix_df 15 | 16 | def predict(self, heroes): 17 | """ 18 | input: 19 | heroes: list of str 20 | return: 21 | center_hero: str 22 | """ 23 | assert len(heroes) == 5, 'Input has to be five heroes' 24 | 25 | for hero in heroes: 26 | if hero not in self.hero2ix_df.hero.values: 27 | raise KeyError('wrong hero name: ' + hero) 28 | 29 | # find idxs for heroes (isin assumes the five names are distinct) 30 | team_idxs = list(self.hero2ix_df[self.hero2ix_df.hero.isin(heroes)].ID) 31 | 32 | inputs = Variable(torch.LongTensor(team_idxs)).view(-1, 5) 33 | out = self.model(inputs) 34 | val, idx = torch.max(out, dim=1) 35 | 36 | # map hero id to hero name 37 | center_hero = self.hero2ix_df.hero.loc[int(idx)] 38 | return center_hero 39 |
--------------------------------------------------------------------------------
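The index arithmetic in `DataFrameIterator.__getitem__` (utils/dataset.py) can be checked in isolation. What follows is a minimal sketch in plain Python, assuming each team is a list of six hero names. `split_team`, `TEAM_SIZE`, and the sample roster are hypothetical names used only for illustration and are not part of the repo.

```python
TEAM_SIZE = 6  # heroes per team, matching the constant 6 in utils/dataset.py


def split_team(teams, idx):
    """Return (context_heroes, center_hero) for a flat sample index.

    Hypothetical helper that mirrors the divmod logic of
    DataFrameIterator.__getitem__ without pandas or torch.
    """
    if not 0 <= idx < len(teams) * TEAM_SIZE:
        raise IndexError(idx)
    # The quotient selects the team, the remainder selects the center hero.
    team, center = divmod(idx, TEAM_SIZE)
    heroes = list(teams[team])
    context = heroes[:center] + heroes[center + 1:]
    return context, heroes[center]


# Illustrative roster only, not taken from input/teams.csv.
teams = [
    ["ana", "dva", "genji", "lucio", "tracer", "winston"],
    ["mercy", "orisa", "roadhog", "junkrat", "widowmaker", "zenyatta"],
]

context, center = split_team(teams, 8)  # team 1, position 2
print(center, context)
```

Every team contributes six samples, so a dataset of N teams exposes 6N indices, and each hero in a team serves as the prediction target exactly once.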