├── README.md
├── inference.py
├── input
│   ├── hero2ix.csv
│   ├── map2ix.csv
│   ├── map_teams.csv
│   └── teams.csv
├── model
│   ├── __init__.py
│   ├── hero2vec.py
│   └── map2vec.py
├── output
│   ├── hero
│   │   ├── hero_embddings_2d.png
│   │   ├── hero_embeddings.npy
│   │   ├── loss_hitory.png
│   │   └── model.p
│   └── map
│       ├── loss_hitory.png
│       ├── map_embddings_2d.png
│       └── map_embeddings.npy
├── setup
│   ├── install _without_torch.sh
│   └── install.sh
├── train_hero.py
├── train_map.py
└── utils
    ├── __init__.py
    ├── dataset.py
    ├── evaluation.py
    └── prediction.py
/README.md:
--------------------------------------------------------------------------------
1 | # Hero2Vec
2 | A machine learning model for understanding the game design and player experience of the video game Overwatch.
3 |
4 | # Table of Contents
5 | 1. [Introduction](README.md#introduction)
6 | 2. [Challenges](README.md#challenges)
7 | 3. [Motivations](README.md#motivations)
8 | 4. [Model Selection](README.md#model-selection)
9 | 5. [Model Architecture](README.md#model-architecture)
10 | 6. [Input](README.md#input)
11 | 7. [Output](README.md#output)
12 | 8. [Repo Structure](README.md#repo-structure)
13 | 9. [Setup](README.md#setup)
14 | 10. [Usage](README.md#usage)
15 |
16 | # Introduction
17 |
18 | The goal of this project is to predict the outcome (win rate) of a team in a multiplayer online game such as Overwatch, Dota 2, or LoL, given the state of the game at a certain time: kills, deaths, team compositions, and so on.
19 |
20 | This repo focuses on part of the project, namely, modeling the team compositions (or the heroes) and maps in the game.
21 |
22 | # Challenges
23 |
24 | 1. The dataset is not large enough. We only have the results from fewer than 300 games.
25 |
26 | 2. The dataset consists of many categorical features, like team compositions and maps. A simple one-hot encoding results in high-dimensional sparse input, and unfortunately we don't have enough data to overcome the **curse of dimensionality**. Moreover, the team composition and the map play an extremely important role in the game, so we can't simply drop them.
27 |
28 | # Motivations
29 |
30 | 1. Just like humans (or words), heroes have their own characteristics and also share some **similarities**. So rather than orthogonal one-hot vectors, they can be represented by **distributed representations**. What's more, just like words in a sentence, heroes in a team have strong **co-occurrence**. So heroes can be modeled in much the same way as **word2vec**, i.e., **hero2vec**.
31 |
32 | 2. Team compositions are widely available online. This data is independent of my own dataset and can be used to train hero2vec, just as the Wikipedia corpus is used for word2vec.
33 |
34 | 3. By modeling the heroes in the game with distributed representations, I can not only address the curse of dimensionality, but also gain valuable information on the game design of the heroes as well as on how the players appreciate these designs.
35 |
36 | 4. All the above motivations apply to the maps similarly, i.e., **map2vec**.
37 |
38 | # Model Selection
39 |
40 | 1. As mentioned above, heroes in a team have strong co-occurrence, i.e., the conditional probability P(h1 | h2, ..., h6) (6 heroes in a team) is high. h1 doesn't have to be a specific hero; any hero in the team can be the center hero. This is well suited to the **Continuous Bag of Words (CBOW)** model, because the attributes of a team (or of the 5 context heroes) really are a sum of the attributes of its individual members, whereas the sum of context words in a sentence is not always intuitive.
41 |
42 | 2. The map in the game can be modeled in a similar way: the conditional probability P(map | team) is high. The weights of the last affine layer of the classifier then serve as the embeddings for the maps.
43 |
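The "any hero can be the center hero" idea in item 1 can be sketched in a few lines of plain Python. This is a minimal illustration rather than the repo's implementation; the helper name `context_center_pairs` and the example team are assumptions made for the sketch.

```python
def context_center_pairs(team):
    """Yield one (context_heroes, center_hero) pair per position in the team."""
    for i, center in enumerate(team):
        # every hero except the one at position i forms the context
        yield team[:i] + team[i + 1:], center


# hypothetical six-hero team; names follow the spelling used in hero2ix.csv
team = ["dva", "genji", "tracer", "lucio", "winston", "zenyatta"]
pairs = list(context_center_pairs(team))

for context, center in pairs:
    print(context, "->", center)
```

Each six-hero team therefore contributes six training samples, which is the same bookkeeping that `DataFrameIterator` in `utils/dataset.py` performs with `divmod(idx, 6)`.
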
44 | # Model Architecture
45 |
46 | 1. hero2vec. The model pipeline is as follows:
47 | `input context heroes (5 heroes)` -> `embeddings` -> `sum` -> `fully connected layers` -> `softmax (center hero)`
48 |
49 | 2. map2vec. The model pipeline is as follows:
50 | `input team (6 heroes)` -> `hero2vec embeddings` -> `sum` -> `fully connected layers` -> `map embeddings` -> `softmax (map)`
51 |
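The hero2vec pipeline can be sketched without PyTorch as follows. This is a toy forward pass under stated assumptions: a single affine layer stands in for the fully connected layers, the weights are random rather than trained, and `softmax` and `cbow_forward` are names chosen for this sketch. The hero IDs are taken from `hero2ix.csv`.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cbow_forward(context_ids, embeddings, weight, bias):
    """embeddings -> sum -> affine layer -> softmax over the hero pool."""
    dim = len(embeddings[0])
    # sum the embedding vectors of the context heroes, dimension by dimension
    summed = [sum(embeddings[i][d] for i in context_ids) for d in range(dim)]
    # affine layer: one score per hero in the pool
    scores = [
        sum(w * x for w, x in zip(row, summed)) + b
        for row, b in zip(weight, bias)
    ]
    return softmax(scores)


random.seed(0)
pool_size, dim = 26, 10  # 26 heroes in hero2ix.csv, 10-d embeddings as in train_hero.py
embeddings = [[random.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(pool_size)]
weight = [[random.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(pool_size)]  # random, untrained
bias = [0.0] * pool_size

# context: dva, genji, tracer, lucio, winston
probs = cbow_forward([3, 4, 21, 7, 23], embeddings, weight, bias)
print(max(range(pool_size), key=lambda i: probs[i]))  # ID of the most likely center hero
```

In the repo the models return the raw scores and leave the softmax to `nn.CrossEntropyLoss`, which applies it internally during training.
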
52 | # Input
53 |
54 | 1. `teams.csv` under the `input` folder. This CSV table contains the team compositions. It can easily be swapped for data from other team-based games such as Dota 2 or LoL.
55 |
56 | 2. `map_teams.csv` under the `input` folder. This CSV table contains each team composition together with the map it was played on.
57 |
58 | 3. `hero2ix.csv` under the `input` folder. This CSV table maps the input hero names to their integer IDs and, through those, to the embeddings. It can easily be customized if a different name is used for the same hero, e.g., 'dva' (used here) written as 'D.Va'.
59 |
60 | 4. `map2ix.csv` under the `input` folder. This CSV table maps the input map names to their integer IDs and, through those, to the embeddings.
61 |
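The lookup tables described above can be read with the standard library alone. The sketch below is illustrative: it parses an inline excerpt of `hero2ix.csv` instead of the real file, and `load_name2ix` and `normalize` are hypothetical helpers that are not part of the repo. `normalize` shows one possible way to map spellings such as 'D.Va' onto the names used in the table.

```python
import csv
import io

# excerpt of input/hero2ix.csv, inlined so the sketch is self-contained
SAMPLE = """,hero,ID
0,ana,0
1,bastion,1
2,doomfist,2
3,dva,3
"""


def load_name2ix(handle, name_column):
    """Build a {name: integer ID} dictionary from a *2ix CSV table."""
    reader = csv.DictReader(handle)
    return {row[name_column]: int(row["ID"]) for row in reader}


def normalize(name):
    """Lower-case a name and drop punctuation and spaces (assumed convention)."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


hero2ix = load_name2ix(io.StringIO(SAMPLE), "hero")
print(hero2ix[normalize("D.Va")])
```

To read the real table, pass an open file handle for `./input/hero2ix.csv` instead of the `io.StringIO` wrapper; `map2ix.csv` can be read the same way with `name_column="map"`.
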
62 | # Output
63 |
64 | 1. `hero` folder
65 | The output contains a plot of the hero embeddings after PCA to 2D (`hero_embeddings_2d.png`), a numpy array containing the embeddings (`hero_embeddings.npy`), a plot of the training loss (`loss_history.png`), and the pickled model (`model.p`). For example, `hero_embeddings_2d.png` looks like:
66 |
67 |
68 |
69 | 2. `map` folder
70 | The output contains a plot of the map embeddings after PCA to 2D (`map_embeddings_2d.png`), a numpy array containing the embeddings (`map_embeddings.npy`), and a plot of the training loss (`loss_history.png`). For example, `map_embeddings_2d.png` looks like:
71 |
72 |
73 |
74 | # Repo Structure
75 |
76 | The directory structure for the repo looks like this:
77 |
78 | ├── README.md
79 | ├── train_hero.py
80 | ├── train_map.py
81 | ├── inference.py
82 | ├── setup
83 | │   ├── install.sh
84 | │   └── install_without_torch.sh
85 | ├── model
86 | │   ├── __init__.py
87 | │   ├── hero2vec.py
88 | │   └── map2vec.py
89 | ├── utils
90 | │   ├── __init__.py
91 | │   ├── dataset.py
92 | │   ├── evaluation.py
93 | │   └── prediction.py
94 | ├── input
95 | │   ├── hero2ix.csv
96 | │   ├── map2ix.csv
97 | │   ├── teams.csv
98 | │   └── map_teams.csv
99 | └── output
100 |     ├── hero
101 |     │   ├── hero_embeddings_2d.png
102 |     │   ├── hero_embeddings.npy
103 |     │   ├── loss_history.png
104 |     │   └── model.p
105 |     └── map
106 |         ├── map_embeddings_2d.png
107 |         ├── map_embeddings.npy
108 |         └── loss_history.png
109 | # Setup
110 |
111 | From the `setup` folder, run:
112 |
113 | `bash install.sh`
114 |
115 | If issues occur while installing PyTorch, please refer to http://pytorch.org/ for installation instructions. Then run:
116 |
117 | `bash install_without_torch.sh`
118 |
119 | # Usage
120 |
121 | 1. Train hero2vec. Run: `python train_hero.py ./input/teams.csv ./input/hero2ix.csv`
122 |
123 | 2. Train map2vec. Run: `python train_map.py ./input/map_teams.csv ./input/hero2ix.csv ./input/map2ix.csv`
124 |
125 | 3. Predict the center hero given the five other heroes. Run `python inference.py` followed by the names of the five known team members, for example: `python inference.py dva genji tracer lucio winston`. Note: hero names must appear in the `hero` column of `hero2ix.csv` in the `input` folder.
126 |
--------------------------------------------------------------------------------
/inference.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import pickle
3 |
4 | import numpy as np
5 | import pandas as pd
6 |
7 | from utils.prediction import Predictor
8 |
9 | def main():
10 |
11 |     # load the pickled hero2vec model written by train_hero.py
12 |     model_dir = './output/hero/model.p'
13 |     with open(model_dir, 'rb') as f:
14 |         model = pickle.load(f)
15 |
16 |     # table that maps hero names to integer IDs
17 |     hero2ix_dir = './input/hero2ix.csv'
18 |     hero2ix_df = pd.read_csv(hero2ix_dir, index_col=0)
19 |
20 |     indicator = Predictor(model, hero2ix_df)
21 |
22 |     # the five known team members come from the command line
23 |     heroes = sys.argv[1:]
24 |     center_hero = indicator.predict(heroes)
25 |     print('suggested hero: ', center_hero)
26 |
27 | if __name__ == '__main__':
28 |     main()
29 |
--------------------------------------------------------------------------------
/input/hero2ix.csv:
--------------------------------------------------------------------------------
1 | ,hero,ID
2 | 0,ana,0
3 | 1,bastion,1
4 | 2,doomfist,2
5 | 3,dva,3
6 | 4,genji,4
7 | 5,hanzo,5
8 | 6,junkrat,6
9 | 7,lucio,7
10 | 8,mccree,8
11 | 9,mei,9
12 | 10,mercy,10
13 | 11,moira,11
14 | 12,orisa,12
15 | 13,pharah,13
16 | 14,reaper,14
17 | 15,reinhardt,15
18 | 16,roadhog,16
19 | 17,soldier76,17
20 | 18,sombra,18
21 | 19,symmetra,19
22 | 20,torbjorn,20
23 | 21,tracer,21
24 | 22,widowmaker,22
25 | 23,winston,23
26 | 24,zarya,24
27 | 25,zenyatta,25
28 |
--------------------------------------------------------------------------------
/input/map2ix.csv:
--------------------------------------------------------------------------------
1 | ,map,ID
2 | 0,Dorado_Attack,0
3 | 1,Dorado_Defense,1
4 | 2,Eichenwalde_Attack,2
5 | 3,Eichenwalde_Defense,3
6 | 4,Hanamura_Attack,4
7 | 5,Hanamura_Defense,5
8 | 6,Hollywood_Attack,6
9 | 7,Hollywood_Defense,7
10 | 8,Horizon Lunar Colony_Attack,8
11 | 9,Horizon Lunar Colony_Defense,9
12 | 10,Ilios_Lighthouse,10
13 | 11,Ilios_Ruins,11
14 | 12,Ilios_Well,12
15 | 13,Junkertown_Attack,13
16 | 14,Junkertown_Defense,14
17 | 15,King's Row_Attack,15
18 | 16,King's Row_Defense,16
19 | 17,Lijiang Tower_Control Center,17
20 | 18,Lijiang Tower_Garden,18
21 | 19,Lijiang Tower_Night Market,19
22 | 20,Nepal_Sanctum,20
23 | 21,Nepal_Shrine,21
24 | 22,Nepal_Village,22
25 | 23,Numbani_Attack,23
26 | 24,Numbani_Defense,24
27 | 25,Oasis_City Center,25
28 | 26,Oasis_Gardens,26
29 | 27,Oasis_University,27
30 | 28,Route 66_Attack,28
31 | 29,Route 66_Defense,29
32 | 30,Temple of Anubis_Attack,30
33 | 31,Temple of Anubis_Defense,31
34 | 32,Volskaya Industries_Attack,32
35 | 33,Volskaya Industries_Defense,33
36 | 34,Watchpoint: Gibraltar_Attack,34
37 | 35,Watchpoint: Gibraltar_Defense,35
38 |
--------------------------------------------------------------------------------
/model/__init__.py:
--------------------------------------------------------------------------------
1 | from . import hero2vec
2 | from . import map2vec
3 |
--------------------------------------------------------------------------------
/model/hero2vec.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.autograd as autograd
3 | import torch.nn as nn
4 | import torch.nn.functional as F
5 | import torch.optim as optim
6 | from torch.nn import init
7 |
8 | class CBOH(nn.Module):
9 |
10 | def __init__(self, heropool_size, embedding_dim):
11 | """
12 | Initialize an NN with one hidden layer. Weight of the hidden layer is
13 | the embedding.
14 | inputs:
15 | heropool_size: int
16 | embedding_dim: int
17 | """
18 | super().__init__()
19 | self.embedding_dim = embedding_dim
20 | self.embeddings = nn.Embedding(heropool_size, embedding_dim)
21 | self.affine = nn.Linear(embedding_dim, heropool_size)
22 | self.init_emb()
23 |
24 | def init_emb(self):
25 | """
26 | init embeddings and affine layer
27 | """
28 | initrange = 0.5 / self.embedding_dim
29 | self.embeddings.weight.data.uniform_(-initrange, initrange)
30 | self.affine.weight.data.uniform_(-0, 0)
31 | self.affine.bias.data.zero_()
32 |
33 | def forward(self, inputs):
34 | """
35 | inputs:
36 | inputs: torch.autograd.Variable, size = (N, 5)
37 | returns:
38 | out: torch.autograd.Variable, size = (N, heropool_size)
39 | """
40 | embeds = self.embeddings(inputs).sum(dim=1) # continuous bag: sum the context embeddings
41 | out = self.affine(embeds)
42 | return out
43 |
44 | class CBOHBilayer(nn.Module):
45 |
46 | def __init__(self, heropool_size, embedding_dim, hidden_dim=10):
47 | """
48 | Initialize an NN with two hidden layers. Weight of the first hidden
49 | layer is the embedding.
50 | inputs:
51 | heropool_size: int
52 | embedding_dim: int
53 | hidden_dim: int
54 | """
55 | super().__init__()
56 | self.embedding_dim = embedding_dim
57 | self.hidden_dim = hidden_dim
58 | self.embeddings = nn.Embedding(heropool_size, embedding_dim)
59 | #Initialize 2nd hidden layer with dimension = hidden_dim
60 | self.linear1 = nn.Linear(embedding_dim, hidden_dim)
61 | self.relu1 = nn.ReLU()
62 | self.affine = nn.Linear(hidden_dim, heropool_size)
63 | self.init_emb()
64 |
65 | def init_emb(self):
66 | """
67 | init embeddings and affine layer. The weight of the 2nd hidden layer is
68 | initialized by Kaiming_norm.
69 | """
70 | initrange = 0.5 / self.embedding_dim
71 | self.embeddings.weight.data.uniform_(-initrange, initrange)
72 | init.kaiming_normal(self.linear1.weight.data)
73 | self.linear1.bias.data.zero_()
74 | self.affine.weight.data.uniform_(-0, 0)
75 | self.affine.bias.data.zero_()
76 |
77 | def forward(self, inputs):
78 | """
79 | inputs:
80 | inputs: torch.autograd.Variable, size = (N, 5)
81 | returns:
82 | out: torch.autograd.Variable, size = (N, heropool_size)
83 | """
84 | embeds = self.embeddings(inputs).sum(dim=1) # continuous bag: sum the context embeddings
85 | pipe = nn.Sequential(self.linear1, self.relu1, self.affine)
86 | out = pipe(embeds)
87 | return out
88 |
89 | class CBOHTrilayer(nn.Module):
90 |
91 | def __init__(self, heropool_size, embedding_dim, hidden_dim=10,
92 | affine_dim=10):
93 | """
94 | Initialize an NN with three hidden layers. Weight of the first hidden
95 | layer is the embedding.
96 | inputs:
97 | heropool_size: int
98 | embedding_dim: int
99 | hidden_dim: int
100 | affine_dim: int
101 | """
102 | super().__init__()
103 | self.embedding_dim = embedding_dim
104 | self.affine_dim = affine_dim
105 | self.embeddings = nn.Embedding(heropool_size, embedding_dim)
106 | #Initialize 2nd hidden layer with dimension = hidden_dim
107 | self.linear1 = nn.Linear(embedding_dim, hidden_dim)
108 | self.relu1 = nn.ReLU()
109 | #Initialize 3rd hidden layer with dimension = affine_dim
110 | self.linear2 = nn.Linear(hidden_dim, affine_dim)
111 | self.relu2 = nn.ReLU()
112 | self.affine = nn.Linear(affine_dim, heropool_size)
113 | self.init_emb()
114 |
115 | def init_emb(self):
116 | """
117 | init embeddings and affine layer. The weights of the 2nd and 3rd hidden
118 | layers are initialized by Kaiming_norm.
119 | """
120 | initrange = 0.5 / self.embedding_dim
121 | self.embeddings.weight.data.uniform_(-initrange, initrange)
122 | init.kaiming_normal(self.linear1.weight.data)
123 | self.linear1.bias.data.zero_()
124 | init.kaiming_normal(self.linear2.weight.data)
125 | self.linear2.bias.data.zero_()
126 | self.affine.weight.data.uniform_(-0, 0)
127 | self.affine.bias.data.zero_()
128 |
129 | def forward(self, inputs):
130 | """
131 | inputs:
132 | inputs: torch.autograd.Variable, size = (N, 5)
133 | returns:
134 | out: torch.autograd.Variable, size = (N, heropool_size)
135 | """
136 | embeds = self.embeddings(inputs).sum(dim=1)
137 | pipe = nn.Sequential(self.linear1, self.relu1, self.linear2, self.relu2)
138 | # skip connection to assist gradient flow
139 | if self.embedding_dim == self.affine_dim:
140 | out = self.affine(pipe(embeds) + embeds)
141 | else:
142 | out = self.affine(pipe(embeds))
143 | return out
144 |
--------------------------------------------------------------------------------
/model/map2vec.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.autograd as autograd
3 | import torch.nn as nn
4 | import torch.nn.functional as F
5 | import torch.optim as optim
6 | from torch.nn import init
7 |
8 | class CBOM(nn.Module):
9 |
10 | def __init__(self, hero_embeddings, mappool_size):
11 | """
12 | Initialize an NN with one hidden layer.
13 | inputs:
14 | hero_embeddings: numpy array
15 | mappool_size: int
16 | """
17 | super().__init__()
18 | self.mappool_size = mappool_size
19 | self.hero_embeddings_data = hero_embeddings
20 | #initialize hero_embeddings from the numpy array
21 | self.heropool_size, self.hero_embedding_dim = hero_embeddings.shape
22 | self.hero_embeddings = nn.Embedding(self.heropool_size, self.hero_embedding_dim)
23 |
24 | # with one hidden layer, the map embedding_dim has to be the same as the hero embedding_dim
25 | self.map_embedding_dim = self.hero_embedding_dim
26 | self.map_embeddings = nn.Embedding(self.mappool_size, self.map_embedding_dim)
27 | self.init_emb()
28 |
29 | def init_emb(self):
30 | """
31 | initialize with kaiming_normal
32 | """
33 | self.hero_embeddings.weight.data = torch.Tensor(self.hero_embeddings_data)
34 | init.kaiming_normal(self.map_embeddings.weight.data)
35 |
36 | def forward(self, inputs):
37 | """
38 | inputs:
39 | inputs: torch.autograd.Variable, size = (N, 6)
40 | returns:
41 | out: torch.autograd.Variable, size = (N, mappool_size)
42 | """
43 | # read all the embeddings out from map_embeddings
44 | indexes = autograd.Variable(torch.arange(0, self.mappool_size).long())
45 | hero_embeds = self.hero_embeddings(inputs).sum(dim=1)
46 | map_embeds = self.map_embeddings(indexes)
47 | out = torch.matmul(hero_embeds, map_embeds.t())
48 | return out
49 |
50 | class CBOMTrilayer(nn.Module):
51 |
52 | def __init__(self, hero_embeddings, mappool_size, map_embedding_dim=10, hidden_dim=20):
53 | """
54 | Initialize an NN with three hidden layers.
55 | inputs:
56 | hero_embeddings: numpy array
57 | mappool_size: int
58 | map_embedding_dim: int
59 | hidden_dim: int
60 | """
61 | super().__init__()
62 | self.hero_embeddings_data = hero_embeddings
63 | self.mappool_size = mappool_size
64 | self.map_embedding_dim = map_embedding_dim
65 | self.hidden_dim = hidden_dim
66 |
67 | #initialize hero_embeddings from the numpy array
68 | self.heropool_size, self.hero_embedding_dim = hero_embeddings.shape
69 | self.hero_embeddings = nn.Embedding(self.heropool_size, self.hero_embedding_dim)
70 |
71 | self.linear1 = nn.Linear(self.hero_embedding_dim, self.hidden_dim)
72 | self.relu1 = nn.ReLU()
73 | self.linear2 = nn.Linear(self.hidden_dim, self.map_embedding_dim)
74 | self.relu2 = nn.ReLU()
75 | self.map_embeddings = nn.Embedding(self.mappool_size, self.map_embedding_dim)
76 | self.init_emb()
77 |
78 | def init_emb(self):
79 | """
80 | initialize with kaiming_normal
81 | """
82 | self.hero_embeddings.weight.data = torch.Tensor(self.hero_embeddings_data)
83 | init.kaiming_normal(self.map_embeddings.weight.data)
84 | init.kaiming_normal(self.linear1.weight.data)
85 | self.linear1.bias.data.zero_()
86 | init.kaiming_normal(self.linear2.weight.data)
87 | self.linear2.bias.data.zero_()
88 |
89 | def forward(self, inputs):
90 | """
91 | inputs:
92 | inputs: torch.autograd.Variable, size = (N, 6)
93 | returns:
94 | out: torch.autograd.Variable, size = (N, mappool_size)
95 | """
96 | # read all the embeddings out from map_embeddings
97 | indexes = autograd.Variable(torch.arange(0, self.mappool_size).long())
98 | hero_embeds = self.hero_embeddings(inputs).sum(dim=1)
99 | pipe = nn.Sequential(self.linear1, self.relu1, self.linear2, self.relu2)
100 | last = pipe(hero_embeds)
101 | # skip connection like resnet
102 | if self.hero_embedding_dim == self.map_embedding_dim:
103 | last = last + hero_embeds
104 | map_embeds = self.map_embeddings(indexes)
105 | out = torch.matmul(last, map_embeds.t())
106 | return out
107 |
--------------------------------------------------------------------------------
/output/hero/hero_embddings_2d.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bowenyang008/hero2vec/f31fa9346d0db00890b8fb36398cee49e9e4e180/output/hero/hero_embddings_2d.png
--------------------------------------------------------------------------------
/output/hero/hero_embeddings.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bowenyang008/hero2vec/f31fa9346d0db00890b8fb36398cee49e9e4e180/output/hero/hero_embeddings.npy
--------------------------------------------------------------------------------
/output/hero/loss_hitory.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bowenyang008/hero2vec/f31fa9346d0db00890b8fb36398cee49e9e4e180/output/hero/loss_hitory.png
--------------------------------------------------------------------------------
/output/hero/model.p:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bowenyang008/hero2vec/f31fa9346d0db00890b8fb36398cee49e9e4e180/output/hero/model.p
--------------------------------------------------------------------------------
/output/map/loss_hitory.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bowenyang008/hero2vec/f31fa9346d0db00890b8fb36398cee49e9e4e180/output/map/loss_hitory.png
--------------------------------------------------------------------------------
/output/map/map_embddings_2d.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bowenyang008/hero2vec/f31fa9346d0db00890b8fb36398cee49e9e4e180/output/map/map_embddings_2d.png
--------------------------------------------------------------------------------
/output/map/map_embeddings.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bowenyang008/hero2vec/f31fa9346d0db00890b8fb36398cee49e9e4e180/output/map/map_embeddings.npy
--------------------------------------------------------------------------------
/setup/install _without_torch.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | pip install numpy
3 | pip install pandas
4 | pip install scipy
5 | pip install scikit-learn
6 | pip install matplotlib
7 |
--------------------------------------------------------------------------------
/setup/install.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 | pip3 install http://download.pytorch.org/whl/cu80/torch-0.3.0.post4-cp35-cp35m-linux_x86_64.whl
3 | pip3 install torchvision
4 | pip install numpy
5 | pip install pandas
6 | pip install scipy
7 | pip install scikit-learn
8 | pip install matplotlib
9 |
--------------------------------------------------------------------------------
/train_hero.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import pickle
3 | import pandas as pd
4 | import numpy as np
5 | import matplotlib.pyplot as plt
6 |
7 | from model.hero2vec import *
8 | from utils.evaluation import *
9 | from utils.dataset import DataFrameIterator
10 |
11 | import torch
12 | import torch.autograd as autograd
13 | import torch.nn as nn
14 | import torch.optim as optim
15 | from torch.utils.data import DataLoader
16 | from torch.utils.data import sampler
17 | from torch.utils.data import Dataset
18 |
19 | def train(model, dataloader, loss_function=nn.CrossEntropyLoss(),
20 | init_lr=0.1, epochs=100, lr_decay_epoch = 30,
21 | print_epoch = 10, gpu=False):
22 |
23 | # CUDA is not critical for this task with low-dimensional inputs
24 | if gpu and torch.cuda.is_available():
25 | model.cuda()
26 |
27 | losses = []
28 | for epoch in range(epochs):
29 |
30 | # learning rate decay
31 | div, mod = divmod(epoch, lr_decay_epoch)
32 | if mod == 0:
33 | optimizer = optim.SGD(model.parameters(), lr=init_lr*(0.1)**div)
34 |
35 | total_loss = torch.Tensor([0])
36 |
37 | # iterate the dataset to load context heroes(team) and center hero(target)
38 | for teams, targets in dataloader:
39 |
40 | if gpu and torch.cuda.is_available():
41 | teams = teams.cuda()
42 | targets = targets.cuda()
43 |
44 | # wrap the context-hero indices and the target center-hero index in Variables
45 | inputs = autograd.Variable(teams)
46 | targets = autograd.Variable(targets.view(-1))
47 |
48 | # zero out the accumulated gradients
49 | model.zero_grad()
50 |
51 | # Run the forward pass
52 | out = model(inputs)
53 |
54 | # Compute your loss function.
55 | loss = loss_function(out, targets)
56 |
57 | # backpropagate and update the embeddings
58 | loss.backward()
59 | optimizer.step()
60 |
61 | # record total loss in this epoch
62 | total_loss += loss.cpu().data
63 |
64 | if epoch % print_epoch == 0:
65 | print('epoch: %d, loss: %.3f' % (epoch, total_loss/len(dataloader)))
66 |
67 | losses.append(total_loss/len(dataloader))
68 | # return losses for plot
69 | return np.array(losses)
70 |
71 | def save_embeddings(model, filename = 'embeddings.npy'):
72 | embeddings = model.embeddings.weight.cpu().data.numpy()
73 | np.save(file = filename, arr=embeddings)
74 |
75 | def main():
76 |
77 | data_dir = sys.argv[1]
78 | hero2ix_dir = sys.argv[2]
79 |
80 | # import DataFrame and hero2ix dictionary
81 | heroes_df = pd.read_csv(data_dir, index_col=0)
82 | hero2ix_df = pd.read_csv(hero2ix_dir, index_col=0)
83 | heroes_df = heroes_df.dropna().reset_index(drop=True)
84 | hero2ix = dict(zip(hero2ix_df.hero, hero2ix_df.ID))
85 | # heroes = hero2ix_df['hero'].values
86 |
87 | # train test split
88 | split = int(len(heroes_df)*0.9)
89 | heroes_train = heroes_df.iloc[:split]
90 | heroes_test = heroes_df.iloc[split:]
91 |
92 | # build dataset generator
93 | train_gen = DataFrameIterator(heroes_train, hero2ix)
94 | test_gen = DataFrameIterator(heroes_test, hero2ix)
95 |
96 | # Use Dataloader class in pytorch to generate batched data
97 | batch_size = 16
98 | loader_train = DataLoader(train_gen, batch_size=batch_size,
99 | sampler=sampler.RandomSampler(train_gen),
100 | num_workers=4)
101 | loader_test = DataLoader(test_gen, batch_size=batch_size,
102 | sampler=sampler.SequentialSampler(test_gen),
103 | num_workers=4)
104 |
105 | # define model; three models are available in hero2vec.py
106 | model = CBOH(embedding_dim=10, heropool_size=len(hero2ix))
107 |
108 | # define loss function
109 | loss_function = nn.CrossEntropyLoss()
110 |
111 | # run train
112 | losses = train(model=model, dataloader=loader_train, loss_function=loss_function,
113 | init_lr=0.1, epochs=20, lr_decay_epoch=8, print_epoch=2, gpu=False)
114 |
115 | # check test accuracy
116 | print('accuracy: ', accuracy(model, dataloader=loader_test,
117 | batch_size=batch_size, gpu=False))
118 |
119 | # save embeddings as numpy arrays
120 | output_dir = './output/hero/hero_embeddings.npy'
121 | save_embeddings(model, filename=output_dir)
122 |
123 | # pickle model
124 | pickle_dir = './output/hero/model.p'
125 | pickle.dump(obj=model, file=open(pickle_dir, 'wb'))
126 |
127 | # plot loss vs epoch
128 | plot_loss(losses, './output/hero/loss_hitory.png')
129 |
130 | # project embeddings to 2d plane
131 | plot_embeddings(model, hero2ix)
132 |
133 | if __name__ == '__main__':
134 | main()
135 |
--------------------------------------------------------------------------------
/train_map.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import pandas as pd
3 | import numpy as np
4 | import matplotlib.pyplot as plt
5 | import os
6 |
7 | from model.map2vec import *
8 | from utils.evaluation import *
9 | from utils.dataset import MapDataFrameIterator
10 |
11 | import torch
12 | import torch.autograd as autograd
13 | import torch.nn as nn
14 | import torch.optim as optim
15 | from torch.utils.data import DataLoader
16 | from torch.utils.data import sampler
17 | from torch.utils.data import Dataset
18 |
19 | def train(model, dataloader, loss_function=nn.CrossEntropyLoss(),
20 | init_lr=0.1, epochs=100, lr_decay_epoch = 30,
21 | print_epoch = 10, gpu=False):
22 |
23 | # CUDA is not critical for this task with low-dimensional inputs
24 | if gpu and torch.cuda.is_available():
25 | model.cuda()
26 |
27 | losses = []
28 | for epoch in range(epochs):
29 |
30 | # learning rate decay
31 | div, mod = divmod(epoch, lr_decay_epoch)
32 | if mod == 0:
33 | optimizer = optim.SGD(model.parameters(), lr=init_lr*(0.1)**div)
34 |
35 | total_loss = torch.Tensor([0])
36 |
37 | # iterate the dataset to load the team (input) and the map (target)
38 | for teams, targets in dataloader:
39 |
40 | if gpu and torch.cuda.is_available():
41 | teams = teams.cuda()
42 | targets = targets.cuda()
43 |
44 | # wrap the team indices and the target map index in Variables
45 | inputs = autograd.Variable(teams)
46 | targets = autograd.Variable(targets.view(-1))
47 |
48 | # zero out the accumulated gradients
49 | model.zero_grad()
50 |
51 | # Run the forward pass
52 | out = model(inputs)
53 |
54 | # Compute your loss function.
55 | loss = loss_function(out, targets)
56 |
57 | # backpropagate and update the embeddings
58 | loss.backward()
59 | optimizer.step()
60 |
61 | # record total loss in this epoch
62 | total_loss += loss.cpu().data
63 |
64 | if epoch % print_epoch == 0:
65 | print('epoch: %d, loss: %.3f' % (epoch, total_loss/len(dataloader)))
66 |
67 | losses.append(total_loss/len(dataloader))
68 | # return losses for plot
69 | return np.array(losses)
70 |
71 | def save_embeddings_map(model, filename = 'map_embeddings.npy'):
72 | embeddings = model.map_embeddings.weight.cpu().data.numpy()
73 | np.save(file = filename, arr=embeddings)
74 |
75 | def main():
76 |
77 | data_dir = sys.argv[1]
78 | hero2ix_dir = sys.argv[2]
79 | map2ix_dir = sys.argv[3]
80 |
81 | # import DataFrame and drop incomplete rows
82 | df = pd.read_csv(data_dir, index_col=0)
83 | df = df.dropna().reset_index(drop=True)
84 |
85 | # hero2ix dictionary
86 | hero2ix_df = pd.read_csv(hero2ix_dir, index_col=0)
87 | hero2ix = dict(zip(hero2ix_df.hero, hero2ix_df.ID))
88 |
89 | # map2ix dictionary
90 | map2ix_df = pd.read_csv(map2ix_dir, index_col=0)
91 | map2ix = dict(zip(map2ix_df['map'], map2ix_df['ID']))
92 |
93 | # train test split
94 | split = int(len(df)*0.9)
95 | map_train = df.iloc[:split]
96 | map_test = df.iloc[split:]
97 |
98 | # build dataset generator
99 | train_gen = MapDataFrameIterator(map_train, hero2ix, map2ix)
100 | test_gen = MapDataFrameIterator(map_test, hero2ix, map2ix)
101 |
102 | # Use Dataloader class in pytorch to generate batched data
103 | batch_size = 16
104 | loader_train = DataLoader(train_gen, batch_size=batch_size,
105 | sampler=sampler.RandomSampler(train_gen),
106 | num_workers=4)
107 | loader_test = DataLoader(test_gen, batch_size=batch_size,
108 | sampler=sampler.SequentialSampler(test_gen),
109 | num_workers=4)
110 |
111 | hero_emb_dir = './output/hero/hero_embeddings.npy'
112 | # define model; two models are available in map2vec.py
113 | assert os.path.isfile(hero_emb_dir), "hero_embeddings.npy doesn't exist"
114 |
115 | hero_embeddings = np.load(hero_emb_dir)
116 | model = CBOMTrilayer(hero_embeddings=hero_embeddings, mappool_size=len(map2ix),
117 | map_embedding_dim=10, hidden_dim=40)
118 |
119 | # define loss function
120 | loss_function = nn.CrossEntropyLoss()
121 |
122 | # run train
123 | losses = train(model=model, dataloader=loader_train, loss_function=loss_function,
124 | init_lr=0.1, epochs=20, lr_decay_epoch=8, print_epoch=2, gpu=False)
125 |
126 | # check test accuracy
127 | print('accuracy: ', accuracy(model, dataloader=loader_test,
128 | batch_size=batch_size, gpu=False))
129 |
130 | # save embeddings as numpy arrays
131 | output_dir = './output/map/map_embeddings.npy'
132 | save_embeddings_map(model, filename=output_dir)
133 |
134 | # plot loss vs epoch
135 | plot_loss(losses, './output/map/loss_hitory.png')
136 |
137 | # project embeddings to 2d plane
138 | plot_embeddings_map(model, map2ix)
139 |
140 |
141 | if __name__ == '__main__':
142 | main()
143 |
--------------------------------------------------------------------------------
/utils/__init__.py:
--------------------------------------------------------------------------------
1 | from . import dataset
2 | from . import evaluation
3 | from . import prediction
4 |
--------------------------------------------------------------------------------
/utils/dataset.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 | import numpy as np
3 | import torch
4 | from torch.utils.data import Dataset
5 |
6 | class DataFrameIterator(Dataset):
7 |
8 |     def __init__(self, df, hero2ix):
9 |         """
10 |         inputs:
11 |             df: pandas DataFrame
12 |             hero2ix: dictionary
13 |         """
14 |         self.df = df
15 |         self.hero2ix = hero2ix
16 |
17 |     def __len__(self):
18 |         """
19 |         Each team composition yields 6 (context heroes, center hero) pairs
20 |         """
21 |         return int(len(self.df)*6)
22 |
23 |     def __getitem__(self, idx):
24 |         """
25 |         inputs:
26 |             idx: int
27 |         returns:
28 |             inputs: torch.LongTensor, size = (5, )
29 |             targets: int
30 |         """
31 |         # Each team composition yields 6 (context heroes, center hero) pairs,
32 |         # so a specific pair is determined by the team
33 |         # composition and the position of the center hero
34 |         team, center_hero = divmod(idx, 6)
35 |
36 |         # locate the team
37 |         heroes = list(self.df.iloc[team])
38 |
39 |         # divide context and center hero
40 |         context_heroes = heroes[:center_hero] + heroes[center_hero + 1:]
41 |
42 |         team_idxs = list(map(lambda x: int(self.hero2ix[x]), context_heroes))
43 |         center_hero_idx = int(self.hero2ix[heroes[center_hero]])
44 |         inputs = torch.LongTensor(team_idxs)
45 |         targets = center_hero_idx
46 |         return inputs, targets
47 |
48 | class MapDataFrameIterator(Dataset):
49 |
50 |     def __init__(self, df, hero2ix, map2ix):
51 |         """
52 |         inputs:
53 |             df: pandas DataFrame
54 |             hero2ix: dictionary
55 |             map2ix: dictionary
56 |         """
57 |         self.df = df
58 |         self.hero2ix = hero2ix
59 |         self.map2ix = map2ix
60 |
61 |     def __len__(self):
62 |         """
63 |         returns:
64 |             length of DataFrame
65 |         """
66 |         return len(self.df)
67 |
68 |     def __getitem__(self, idx):
69 |         """
70 |         inputs:
71 |             idx: int
72 |         returns:
73 |             inputs: torch.LongTensor, size = (6, )
74 |             targets: int
75 |         """
76 |         # locate the team and map (first column is the map, the rest are heroes)
77 |         row = self.df.iloc[idx]
78 |         team, map_name = list(row.iloc[1:]), row.iloc[0]
79 |
80 |         team_idxs = list(map(lambda x: int(self.hero2ix[x]), team))
81 |         map_idx = int(self.map2ix[map_name])
82 |         inputs = torch.LongTensor(team_idxs)
83 |         targets = map_idx
84 |         return inputs, targets
85 |
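The flat-index scheme used by `DataFrameIterator.__getitem__` can be illustrated without pandas or torch. The following is a minimal sketch, not the repo's code: `split_sample` and the two example teams are hypothetical, and it assumes every team has exactly 6 heroes.

```python
def split_sample(idx, teams, team_size=6):
    """Map a flat sample index to (context heroes, center hero).

    Hypothetical helper: idx // team_size selects the team and
    idx % team_size selects which hero is held out as the target.
    """
    team_idx, center_pos = divmod(idx, team_size)
    heroes = list(teams[team_idx])
    context = heroes[:center_pos] + heroes[center_pos + 1:]
    return context, heroes[center_pos]


# Illustrative data only, not taken from input/teams.csv
teams = [
    ["ana", "dva", "genji", "lucio", "tracer", "winston"],
    ["mercy", "orisa", "roadhog", "soldier76", "widowmaker", "zenyatta"],
]

# Index 8 -> divmod(8, 6) == (1, 2): second team, third hero held out
context, center = split_sample(8, teams)
print(context, center)
```

Each team therefore contributes 6 training samples, which is why `__len__` returns six times the number of rows.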
--------------------------------------------------------------------------------
/utils/evaluation.py:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 | import numpy as np
3 | import matplotlib.pyplot as plt
4 | from sklearn.decomposition import PCA
5 |
6 | import torch
7 | import torch.autograd as autograd
8 | import torch.nn as nn
9 | import torch.optim as optim
10 |
11 | def accuracy(model, dataloader, batch_size, gpu=False):
12 |     if gpu and torch.cuda.is_available():
13 |         model.cuda()
14 |     model.eval()
15 |
16 |     # count samples while iterating: the last batch may hold fewer than batch_size
17 |     length = 0
18 |     count = 0
19 |     for teams, targets in dataloader:
20 |         if gpu and torch.cuda.is_available():
21 |             teams = teams.cuda()
22 |             targets = targets.cuda()
23 |         inputs = autograd.Variable(teams)
24 |         targets = autograd.Variable(targets.view(-1))
25 |         out = model(inputs)
26 |         length += targets.size(0)
27 |         # idx is the index of the maximum value
28 |         val, idx = torch.max(out, dim=1)
29 |
30 |         # count how many predictions are right and convert to python int
31 |         count += int(idx.eq(targets).cpu().data.numpy().sum())
32 |     return count/length
33 |
34 | def make_plot_color(x, y, hero2ix):
35 |
36 |     # divide heroes to their own categories/roles
37 |     tank = set(['dva', 'orisa', 'reinhardt', 'roadhog', 'winston', 'zarya'])
38 |     supporter = set(['ana', 'lucio', 'mercy', 'moira', 'symmetra', 'zenyatta'])
39 |     tanks, supporters, dps = [], [], []
40 |     for name, idx in hero2ix.items():
41 |         if name in tank:
42 |             tanks.append(idx)
43 |         elif name in supporter:
44 |             supporters.append(idx)
45 |         else:
46 |             dps.append(idx)
47 |
48 |     # plot tanks, supporters, and dps respectively
49 |     tank_x, tank_y = x[tanks], y[tanks]
50 |     sup_x, sup_y = x[supporters], y[supporters]
51 |     dps_x, dps_y = x[dps], y[dps]
52 |     fig = plt.figure(figsize=(16, 12), dpi=100)
53 |     ax = plt.subplot(111)
54 |     marker_size = 200
55 |     ax.scatter(tank_x, tank_y, c='tomato', s=marker_size)
56 |     ax.scatter(sup_x, sup_y, c='darkcyan', s=marker_size)
57 |     ax.scatter(dps_x, dps_y, c='royalblue', s=marker_size)
58 |
59 |     # annotate each hero's name
60 |     for name, i in hero2ix.items():
61 |         ax.annotate(name, (x[i], y[i]), fontsize=18)
62 |     fig.savefig('./output/hero/hero_embddings_2d.png')  # save before the blocking show()
63 |     plt.show()
64 |
65 | def make_plot_color_map(x, y, map2ix):
66 |
67 |     # divide maps to their own categories
68 |     attacks, defenses, controls = [], [], []
69 |     for name, idx in map2ix.items():
70 |         name = name.split('_')
71 |         if name[1] == 'Attack':
72 |             attacks.append(idx)
73 |         elif name[1] == 'Defense':
74 |             defenses.append(idx)
75 |         else:
76 |             controls.append(idx)
77 |
78 |     # plot attack, defense, and control maps respectively
79 |     att_x, att_y = x[attacks], y[attacks]
80 |     den_x, den_y = x[defenses], y[defenses]
81 |     con_x, con_y = x[controls], y[controls]
82 |
83 |     fig = plt.figure(figsize=(16, 12), dpi=100)
84 |     ax = plt.subplot(111)
85 |     marker_size = 200
86 |     ax.scatter(att_x, att_y, c='tomato', s=marker_size)
87 |     ax.scatter(den_x, den_y, c='darkcyan', s=marker_size)
88 |     ax.scatter(con_x, con_y, c='royalblue', s=marker_size)
89 |
90 |     # annotate each map's name
91 |     for name, i in map2ix.items():
92 |         ax.annotate(name, (x[i], y[i]))
93 |
94 |     fig.savefig('./output/map/map_embddings_2d.png')  # save before the blocking show()
95 |     plt.show()
96 |
97 | def plot_embeddings(model, names):
98 |     embeddings = model.embeddings.weight.cpu().data.numpy()
99 |
100 |     # center at 0 on a copy; the array above can share memory with the model weights
101 |     embeddings = embeddings - np.mean(embeddings, axis=0)
102 |
103 |     # run pca to reduce to 2 dimensions
104 |     pca = PCA(n_components=2)
105 |     embeddings_2d = pca.fit_transform(embeddings)
106 |     x, y = embeddings_2d[:, 0], embeddings_2d[:, 1]
107 |     make_plot_color(x, y, names)
108 |
109 | def plot_embeddings_map(model, names):
110 |     embeddings = model.map_embeddings.weight.cpu().data.numpy()
111 |
112 |     # center at 0 on a copy; the array above can share memory with the model weights
113 |     embeddings = embeddings - np.mean(embeddings, axis=0)
114 |
115 |     # run pca to reduce to 2 dimensions
116 |     pca = PCA(n_components=2)
117 |     embeddings_2d = pca.fit_transform(embeddings)
118 |     x, y = embeddings_2d[:, 0], embeddings_2d[:, 1]
119 |     make_plot_color_map(x, y, names)
120 |
121 | def plot_loss(losses, directory):
122 |     fig = plt.figure(figsize=(8, 6), dpi=100)
123 |     ax = plt.subplot(111)
124 |     ax.plot(losses)
125 |     ax.set_xlabel('Epochs', fontsize=24)
126 |     ax.set_ylabel('Train_loss', fontsize=24)
127 |     fig.savefig(directory)
128 |     plt.close()
129 |
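When accuracy is computed over batches, the denominator has to be the number of samples actually seen. The product of the batch count and the nominal batch size overstates it whenever the last batch is only partly filled. The sketch below shows the arithmetic with plain Python; `batch_sizes` is a hypothetical helper that mimics a loader that keeps the incomplete final batch, and the numbers are illustrative.

```python
import math


def batch_sizes(n_samples, batch_size):
    """Sizes of consecutive batches when the incomplete last batch is kept."""
    n_batches = math.ceil(n_samples / batch_size)
    return [min(batch_size, n_samples - i * batch_size) for i in range(n_batches)]


batch_size = 4
sizes = batch_sizes(10, batch_size)   # three batches; the last holds 2 samples

nominal = len(sizes) * batch_size     # what "batches * batch_size" assumes
actual = sum(sizes)                   # what was really evaluated

correct = 7                           # illustrative number of right predictions
print(correct / nominal, correct / actual)
```

Dividing by the nominal count biases the reported accuracy downward, and the bias grows as the test set gets smaller relative to the batch size.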
--------------------------------------------------------------------------------
/utils/prediction.py:
--------------------------------------------------------------------------------
1 | import torch
2 | from torch.autograd import Variable
3 |
4 | class Predictor():
5 |
6 |     def __init__(self, model, hero2ix_df):
7 |         """
8 |         input:
9 |             model: pytorch model
10 |             hero2ix_df: pandas DataFrame
11 |         """
12 |         self.model = model
13 |         self.model.eval()
14 |         self.hero2ix_df = hero2ix_df
15 |
16 |     def predict(self, heroes):
17 |         """
18 |         input:
19 |             heroes: list of str
20 |         return:
21 |             center_hero: str
22 |         """
23 |         assert len(heroes) == 5, 'Input has to be 5 heroes'
24 |
25 |         for hero in heroes:
26 |             if hero not in self.hero2ix_df.hero.values:
27 |                 raise KeyError('wrong hero name: ' + hero)
28 |
29 |         # find idxs for heroes, keeping the input order and any repeated hero
30 |         team_idxs = [int(self.hero2ix_df.ID[self.hero2ix_df.hero == h].iloc[0]) for h in heroes]
31 |
32 |         inputs = Variable(torch.LongTensor(team_idxs)).view(-1, 5)
33 |         out = self.model(inputs)
34 |         val, idx = torch.max(out, dim=1)
35 |
36 |         # map hero id to hero name
37 |         center_hero = self.hero2ix_df.hero.loc[int(idx)]
38 |         return center_hero
39 |
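The name-to-index step in `Predictor.predict` can also be done with a plain dictionary, which keeps the input order and repeated names. A membership filter such as pandas `isin` instead returns rows in table order and collapses repeats, so it can yield fewer than 5 indices. The sketch below is an assumption-laden illustration: `heroes_to_indices` and the small `hero2ix` mapping are hypothetical and not part of the repo.

```python
def heroes_to_indices(heroes, hero2ix, team_size=5):
    """Translate hero names to indices, preserving order and duplicates.

    Hypothetical helper; hero2ix is assumed to be a dict of name -> int.
    """
    if len(heroes) != team_size:
        raise ValueError("expected %d heroes, got %d" % (team_size, len(heroes)))
    unknown = [h for h in heroes if h not in hero2ix]
    if unknown:
        raise KeyError("wrong hero name(s): " + ", ".join(unknown))
    return [hero2ix[h] for h in heroes]


# Illustrative mapping only, not the contents of input/hero2ix.csv
hero2ix = {"ana": 0, "dva": 1, "genji": 2, "lucio": 3, "mercy": 4, "tracer": 5}

idxs = heroes_to_indices(["tracer", "ana", "dva", "dva", "mercy"], hero2ix)
print(idxs)
```

The returned list always has exactly `team_size` entries, so it can be reshaped to `(-1, 5)` safely before being passed to the model.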
--------------------------------------------------------------------------------