├── .gitignore
├── img
│   ├── food2.png
│   ├── background.png
│   ├── notraining.gif
│   ├── snakeBody.png
│   ├── snake_new.gif
│   └── training.gif
├── weights
│   └── weights.h5
├── requirements.txt
├── logs
│   ├── Scores_20200319190057.txt
│   └── Scores_20200319190425.txt
├── README.md
├── bayesOpt.py
├── DQN.py
└── snakeClass.py
/.gitignore:
--------------------------------------------------------------------------------
1 | *.idea
2 | __pycache__
--------------------------------------------------------------------------------
/img/food2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/maurock/snake-ga/HEAD/img/food2.png
--------------------------------------------------------------------------------
/img/background.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/maurock/snake-ga/HEAD/img/background.png
--------------------------------------------------------------------------------
/img/notraining.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/maurock/snake-ga/HEAD/img/notraining.gif
--------------------------------------------------------------------------------
/img/snakeBody.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/maurock/snake-ga/HEAD/img/snakeBody.png
--------------------------------------------------------------------------------
/img/snake_new.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/maurock/snake-ga/HEAD/img/snake_new.gif
--------------------------------------------------------------------------------
/img/training.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/maurock/snake-ga/HEAD/img/training.gif
--------------------------------------------------------------------------------
/weights/weights.h5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/maurock/snake-ga/HEAD/weights/weights.h5
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | absl-py==0.8.0
2 | certifi==2020.12.5
3 | cmake-example==0.0.1
4 | cycler==0.10.0
5 | dataclasses==0.8
6 | decorator==4.4.2
7 | google-pasta==0.1.7
8 | GPy==1.9.9
9 | GPyOpt==1.2.6
10 | kiwisolver==1.2.0
11 | matplotlib==3.2.0
12 | mkl-fft==1.2.0
13 | mkl-random==1.1.1
14 | mkl-service==2.3.0
15 | msgpack-numpy==0.4.4.3
16 | numpy==1.18.1
17 | olefile==0.46
18 | pandas==0.25.1
19 | paramz==0.9.5
20 | Pillow @ file:///C:/ci/pillow_1603822370986/work
21 | pygame==1.9.6
22 | pyparsing==2.4.7
23 | python-dateutil==2.8.1
24 | pytz==2020.1
25 | scipy @ file:///C:/ci/scipy_1597675683670/work
26 | seaborn==0.10.1
27 | six @ file:///C:/ci/six_1605187303045/work
28 | tabulate==0.8.3
29 | tensorflow-estimator==1.14.0
30 | tensorpack==0.9.4
31 | tgan==0.1.0
32 | torch==1.7.1
33 | typing-extensions @ file:///tmp/build/80754af9/typing_extensions_1598376058250/work
34 | wincertstore==0.2
35 | wrapt==1.11.2
36 |
--------------------------------------------------------------------------------
/logs/Scores_20200319190057.txt:
--------------------------------------------------------------------------------
1 | snake_lr00019371_struct200_300_300_eps0
2 | Params: {'episodes': 2, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190057.txt', 'learning_rate': 0.0001937141601579515, 'first_layer_size': 200, 'second_layer_size': 300, 'third_layer_size': 300, 'epsilon_decay_linear': 0, 'name_scenario': 'snake_lr00019371_struct200_300_300_eps0', 'weights_path': 'weights/weights_snake_lr00019371_struct200_300_300_eps0.h5', 'train': False}
3 | Total score: 2 Mean: 1 Std dev: 0.0
4 |
5 | snake_lr00005829_struct200_50_200_eps2
6 | Params: {'episodes': 2, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190057.txt', 'learning_rate': 5.8287680069085044e-05, 'first_layer_size': 200, 'second_layer_size': 50, 'third_layer_size': 200, 'epsilon_decay_linear': 2, 'name_scenario': 'snake_lr00005829_struct200_50_200_eps2', 'weights_path': 'weights/weights_snake_lr00005829_struct200_50_200_eps2.h5', 'train': False}
7 | Total score: 0 Mean: 0 Std dev: 0.0
8 |
9 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Deep Reinforcement Learning
2 | ## Project: Train AI to play Snake
3 | *UPDATE:*
4 |
5 | This project has been recently updated and improved:
6 | - It is now possible to optimize the Deep RL approach using Bayesian Optimization.
7 | - The Deep Reinforcement Learning code was ported from Keras/TF to PyTorch. To see the old Keras/TF version of the code, please refer to this repository: [snake-ga-tf](https://github.com/maurock/snake-ga-tf).
8 |
9 | ## Introduction
10 | The goal of this project is to develop an AI bot that learns to play the popular game Snake from scratch. To do this, I implemented a Deep Reinforcement Learning algorithm. The approach consists of giving the system parameters related to its state, and a positive or negative reward based on its actions. No rules about the game are given, and initially the bot has no information about what it needs to do. The goal for the system is to figure this out and devise a strategy that maximizes the score, or the reward. \
11 | We are going to see how a Deep Q-Learning algorithm learns to play Snake, scoring up to 50 points and showing a solid strategy after only 5 minutes of training. \
12 | Additionally, it is possible to run the Bayesian Optimization method to find the optimal parameters of the deep neural network, as well as some parameters of the Deep RL approach.
13 |
14 | ## Install
15 | This project requires Python 3.6 with the pygame library installed, as well as PyTorch. If you encounter any error with `torch==1.7.1`, you might need to install Visual C++ 2015-2019 (or simply downgrade your PyTorch version; that should be fine). \
16 | The full list of requirements is in `requirements.txt`.
17 | ```bash
18 | git clone git@github.com:maurock/snake-ga.git
19 | ```
20 |
21 | ## Run
22 | To run and show the game, execute the following in the snake-ga folder:
23 |
24 | ```bash
25 | python snakeClass.py
26 | ```
27 | Arguments description:
28 |
29 | - `--display`: type bool, default `True`; whether to display the game view
30 | - `--speed`: type integer, default `50`; game speed
31 |
32 | The default configuration loads the file *weights/weights.h5* and runs a test.
33 |
34 | To train the agent, set `params['train'] = True` in *snakeClass.py*.
35 |
36 | The parameters of the deep neural network can be changed in *snakeClass.py* by modifying the dictionary `params` in the function `define_parameters()`.
37 |
38 | If you run snakeClass.py from the command line, you can set the arguments `--display=False` and `--speed=0`. This way, the game display is not shown and the training phase is faster.
39 |
40 | ## Optimize Deep RL with Bayesian Optimization
41 | To optimize the Deep neural network and additional parameters, run:
42 |
43 | ```bash
44 | python snakeClass.py --bayesianopt=True
45 | ```
46 |
47 | This method uses Bayesian optimization to optimize some parameters of Deep RL. The parameters and the features' search space can be modified in *bayesOpt.py* by editing the `optim_params` dictionary in `optimize_RL`.
48 |
49 | ## For Mac users
50 | There seems to be an OSX-specific problem: many users cannot see the game running.
51 | To fix it, add the following line to `update_screen()`:
52 |
53 | ```
54 | def update_screen():
55 |     pygame.display.update()
56 |     pygame.event.get()  # <--- Add this line ###
57 | ```
58 |
--------------------------------------------------------------------------------
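The `--display`, `--speed` and `--bayesianopt` arguments described in the README could be parsed roughly as follows. This is a minimal sketch, not the repository's actual parser: `str_to_bool` and `build_parser` are hypothetical names, and the `False` default for `--bayesianopt` is an assumption. Only the defaults for `--display` (`True`) and `--speed` (`50`) come from the README.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Hypothetical helper: turn 'True'/'False'-style strings into a bool."""
    lowered = value.strip().lower()
    if lowered in {"true", "t", "yes", "y", "1"}:
        return True
    if lowered in {"false", "f", "no", "n", "0"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags listed in the README."""
    parser = argparse.ArgumentParser(description="Snake DQN options (sketch)")
    parser.add_argument("--display", type=str_to_bool, default=True)
    parser.add_argument("--speed", type=int, default=50)
    # Assumed default: Bayesian optimization is off unless requested.
    parser.add_argument("--bayesianopt", type=str_to_bool, default=False)
    return parser


# Equivalent to: python snakeClass.py --display=False --speed=0
args = build_parser().parse_args(["--display=False", "--speed=0"])
print(args.display, args.speed, args.bayesianopt)
```

A plain `type=bool` would not work here, because `bool("False")` is `True` in Python; that is why the sketch converts the string explicitly.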
/bayesOpt.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | """
3 | Created on Sun Mar 15 21:10:29 2020
4 |
5 | @author: mauro
6 | """
7 | from snakeClass import run
8 | from GPyOpt.methods import BayesianOptimization
9 | import datetime
10 |
11 | ################################################
12 | # Set parameters for Bayesian Optimization #
13 | ################################################
14 |
15 | class BayesianOptimizer():
16 |     def __init__(self, params):
17 |         self.params = params
18 |
19 |     def optimize_RL(self):
20 |         def optimize(inputs):
21 |             print("INPUT", inputs)
22 |             inputs = inputs[0]
23 |
24 |             # Variables to optimize
25 |             self.params["learning_rate"] = inputs[0]
26 |             lr_string = '{:.8f}'.format(self.params["learning_rate"])[2:]
27 |             self.params["first_layer_size"] = int(inputs[1])
28 |             self.params["second_layer_size"] = int(inputs[2])
29 |             self.params["third_layer_size"] = int(inputs[3])
30 |             self.params["epsilon_decay_linear"] = int(inputs[4])
31 |
32 |             self.params['name_scenario'] = 'snake_lr{}_struct{}_{}_{}_eps{}'.format(lr_string,
33 |                                                                                self.params['first_layer_size'],
34 |                                                                                self.params['second_layer_size'],
35 |                                                                                self.params['third_layer_size'],
36 |                                                                                self.params['epsilon_decay_linear'])
37 |
38 |             self.params['weights_path'] = 'weights/weights_' + self.params['name_scenario'] + '.h5'
39 |             self.params['load_weights'] = False
40 |             self.params['train'] = True
41 |             print(self.params)
42 |             score, mean, stdev = run(self.params)
43 |             print('Total score: {}   Mean: {}   Std dev:   {}'.format(score, mean, stdev))
44 |             with open(self.params['log_path'], 'a') as f:
45 |                 f.write(str(self.params['name_scenario']) + '\n')
46 |                 f.write('Params: ' + str(self.params) + '\n')
47 |             return score
48 |
49 |         optim_params = [
50 |             {"name": "learning_rate", "type": "continuous", "domain": (0.00005, 0.001)},
51 |             {"name": "first_layer_size", "type": "discrete", "domain": (20,50,100,200)},
52 |             {"name": "second_layer_size", "type": "discrete", "domain": (20,50,100,200)},
53 |             {"name": "third_layer_size", "type": "discrete", "domain": (20,50,100,200)},
54 |             {"name":'epsilon_decay_linear', "type": "discrete", "domain": (self.params['episodes']*0.2,
55 |                                                                             self.params['episodes']*0.4,
56 |                                                                             self.params['episodes']*0.6,
57 |                                                                             self.params['episodes']*0.8,
58 |                                                                             self.params['episodes']*1)}
59 |         ]
60 |
61 |         bayes_optimizer = BayesianOptimization(f=optimize,
62 |                                                domain=optim_params,
63 |                                                initial_design_numdata=6,
64 |                                                acquisition_type="EI",
65 |                                                exact_feval=True,
66 |                                                maximize=True)
67 |
68 |         bayes_optimizer.run_optimization(max_iter=20)
69 |         print('Optimized learning rate: ', bayes_optimizer.x_opt[0])
70 |         print('Optimized first layer: ', bayes_optimizer.x_opt[1])
71 |         print('Optimized second layer: ', bayes_optimizer.x_opt[2])
72 |         print('Optimized third layer: ', bayes_optimizer.x_opt[3])
73 |         print('Optimized epsilon linear decay: ', bayes_optimizer.x_opt[4])
74 |         return self.params
75 |
76 |
77 | ##################
78 | # Main #
79 | ##################
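80 | if __name__ == '__main__':
81 |     # `params` was never defined here; build the defaults from snakeClass first
82 |     from snakeClass import define_parameters
83 |     params = define_parameters()
84 |     # Define optimizer
85 |     bayesOpt = BayesianOptimizer(params)
86 |     bayesOpt.optimize_RL()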
--------------------------------------------------------------------------------
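The naming scheme used inside `optimize()` can be checked in isolation. The sketch below reproduces the string formatting from `bayesOpt.py` in a standalone function; `scenario_name` and `weights_path_for` are hypothetical names introduced for illustration, and the input values are taken from the first entry of `logs/Scores_20200319190057.txt`.

```python
def scenario_name(learning_rate, first, second, third, epsilon_decay):
    """Hypothetical helper mirroring how optimize() builds 'name_scenario'."""
    # '{:.8f}' gives e.g. '0.00019371'; [2:] drops the leading '0.'.
    # This only makes sense for learning rates below 1, which holds for
    # the search domain (0.00005, 0.001) used in bayesOpt.py.
    lr_string = "{:.8f}".format(learning_rate)[2:]
    return "snake_lr{}_struct{}_{}_{}_eps{}".format(
        lr_string, first, second, third, epsilon_decay
    )


def weights_path_for(name):
    """Hypothetical helper mirroring how optimize() builds 'weights_path'."""
    return "weights/weights_" + name + ".h5"


name = scenario_name(0.0001937141601579515, 200, 300, 300, 0)
print(name)
print(weights_path_for(name))
```

Because the learning rate is truncated to eight decimal places, two runs whose learning rates differ only beyond that precision (with identical layer sizes and epsilon decay) would map to the same scenario name and weights file.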
/DQN.py:
--------------------------------------------------------------------------------
1 | import random
2 | import numpy as np
3 | import pandas as pd
4 | from operator import add
5 | import collections
6 | import torch
7 | import torch.nn as nn
8 | import torch.nn.functional as F
9 | import torch.optim as optim
10 | import copy
11 | DEVICE = 'cpu' # 'cuda' if torch.cuda.is_available() else 'cpu'
12 |
13 | class DQNAgent(torch.nn.Module):
14 |     def __init__(self, params):
15 |         super().__init__()
16 |         self.reward = 0
17 |         self.gamma = 0.9
18 |         self.dataframe = pd.DataFrame()
19 |         self.short_memory = np.array([])
20 |         self.agent_target = 1
21 |         self.agent_predict = 0
22 |         self.learning_rate = params['learning_rate']
23 |         self.epsilon = 1
24 |         self.actual = []
25 |         self.first_layer = params['first_layer_size']
26 |         self.second_layer = params['second_layer_size']
27 |         self.third_layer = params['third_layer_size']
28 |         self.memory = collections.deque(maxlen=params['memory_size'])
29 |         self.weights = params['weights_path']
30 |         self.load_weights = params['load_weights']
31 |         self.optimizer = None
32 |         self.network()
33 |
34 |     def network(self):
35 |         # Layers
36 |         self.f1 = nn.Linear(11, self.first_layer)
37 |         self.f2 = nn.Linear(self.first_layer, self.second_layer)
38 |         self.f3 = nn.Linear(self.second_layer, self.third_layer)
39 |         self.f4 = nn.Linear(self.third_layer, 3)
40 |         # weights
41 |         if self.load_weights:
42 |             self.model = self.load_state_dict(torch.load(self.weights))
43 |             print("weights loaded")
44 |
45 |     def forward(self, x):
46 |         x = F.relu(self.f1(x))
47 |         x = F.relu(self.f2(x))
48 |         x = F.relu(self.f3(x))
49 |         x = F.softmax(self.f4(x), dim=-1)
50 |         return x
51 |
52 |     def get_state(self, game, player, food):
53 |         """
54 |         Return the state.
55 |         The state is a numpy array of 11 values, representing:
56 |             - Danger 1 OR 2 steps ahead
57 |             - Danger 1 OR 2 steps on the right
58 |             - Danger 1 OR 2 steps on the left
59 |             - Snake is moving left
60 |             - Snake is moving right
61 |             - Snake is moving up
62 |             - Snake is moving down
63 |             - The food is on the left
64 |             - The food is on the right
65 |             - The food is on the upper side
66 |             - The food is on the lower side
67 |         """
68 |         state = [
69 | (player.x_change == 20 and player.y_change == 0 and ((list(map(add, player.position[-1], [20, 0])) in player.position) or
70 | player.position[-1][0] + 20 >= (game.game_width - 20))) or (player.x_change == -20 and player.y_change == 0 and ((list(map(add, player.position[-1], [-20, 0])) in player.position) or
71 | player.position[-1][0] - 20 < 20)) or (player.x_change == 0 and player.y_change == -20 and ((list(map(add, player.position[-1], [0, -20])) in player.position) or
72 | player.position[-1][-1] - 20 < 20)) or (player.x_change == 0 and player.y_change == 20 and ((list(map(add, player.position[-1], [0, 20])) in player.position) or
73 | player.position[-1][-1] + 20 >= (game.game_height-20))), # danger straight
74 |
75 | (player.x_change == 0 and player.y_change == -20 and ((list(map(add,player.position[-1],[20, 0])) in player.position) or
76 | player.position[ -1][0] + 20 > (game.game_width-20))) or (player.x_change == 0 and player.y_change == 20 and ((list(map(add,player.position[-1],
77 | [-20,0])) in player.position) or player.position[-1][0] - 20 < 20)) or (player.x_change == -20 and player.y_change == 0 and ((list(map(
78 | add,player.position[-1],[0,-20])) in player.position) or player.position[-1][-1] - 20 < 20)) or (player.x_change == 20 and player.y_change == 0 and (
79 | (list(map(add,player.position[-1],[0,20])) in player.position) or player.position[-1][
80 | -1] + 20 >= (game.game_height-20))), # danger right
81 |
82 | (player.x_change == 0 and player.y_change == 20 and ((list(map(add,player.position[-1],[20,0])) in player.position) or
83 | player.position[-1][0] + 20 > (game.game_width-20))) or (player.x_change == 0 and player.y_change == -20 and ((list(map(
84 | add, player.position[-1],[-20,0])) in player.position) or player.position[-1][0] - 20 < 20)) or (player.x_change == 20 and player.y_change == 0 and (
85 | (list(map(add,player.position[-1],[0,-20])) in player.position) or player.position[-1][-1] - 20 < 20)) or (
86 | player.x_change == -20 and player.y_change == 0 and ((list(map(add,player.position[-1],[0,20])) in player.position) or
87 | player.position[-1][-1] + 20 >= (game.game_height-20))), #danger left
88 |
89 |
90 | player.x_change == -20, # move left
91 | player.x_change == 20, # move right
92 | player.y_change == -20, # move up
93 | player.y_change == 20, # move down
94 | food.x_food < player.x, # food left
95 | food.x_food > player.x, # food right
96 | food.y_food < player.y, # food up
97 | food.y_food > player.y # food down
98 | ]
99 |
100 |         for i in range(len(state)):
101 |             if state[i]:
102 |                 state[i]=1
103 |             else:
104 |                 state[i]=0
105 |
106 |         return np.asarray(state)
107 |
108 |     def set_reward(self, player, crash):
109 |         """
110 |         Return the reward.
111 |         The reward is:
112 |             -10 when Snake crashes.
113 |             +10 when Snake eats food
114 |             0 otherwise
115 |         """
116 |         self.reward = 0
117 |         if crash:
118 |             self.reward = -10
119 |             return self.reward
120 |         if player.eaten:
121 |             self.reward = 10
122 |         return self.reward
123 |
124 |     def remember(self, state, action, reward, next_state, done):
125 |         """
126 |         Store the <state, action, reward, next_state, is_done> tuple in a
127 |         memory buffer for replay memory.
128 |         """
129 |         self.memory.append((state, action, reward, next_state, done))
130 |
131 |     def replay_new(self, memory, batch_size):
132 |         """
133 |         Replay memory.
134 |         """
135 |         if len(memory) > batch_size:
136 |             minibatch = random.sample(memory, batch_size)
137 |         else:
138 |             minibatch = memory
139 |         for state, action, reward, next_state, done in minibatch:
140 |             self.train()
141 |             torch.set_grad_enabled(True)
142 |             target = reward
143 |             next_state_tensor = torch.tensor(np.expand_dims(next_state, 0), dtype=torch.float32).to(DEVICE)
144 |             state_tensor = torch.tensor(np.expand_dims(state, 0), dtype=torch.float32, requires_grad=True).to(DEVICE)
145 |             if not done:
146 |                 target = reward + self.gamma * torch.max(self.forward(next_state_tensor)[0])
147 |             output = self.forward(state_tensor)
148 |             target_f = output.clone()
149 |             target_f[0][np.argmax(action)] = target
150 |             target_f = target_f.detach()  # detach() is not in-place, so keep its result
151 |             self.optimizer.zero_grad()
152 |             loss = F.mse_loss(output, target_f)
153 |             loss.backward()
154 |             self.optimizer.step()
155 |
156 |     def train_short_memory(self, state, action, reward, next_state, done):
157 |         """
158 |         Train the DQN agent on the <state, action, reward, next_state, is_done>
159 |         tuple at the current timestep.
160 |         """
161 |         self.train()
162 |         torch.set_grad_enabled(True)
163 |         target = reward
164 |         next_state_tensor = torch.tensor(next_state.reshape((1, 11)), dtype=torch.float32).to(DEVICE)
165 |         state_tensor = torch.tensor(state.reshape((1, 11)), dtype=torch.float32, requires_grad=True).to(DEVICE)
166 |         if not done:
167 |             target = reward + self.gamma * torch.max(self.forward(next_state_tensor[0]))
168 |         output = self.forward(state_tensor)
169 |         target_f = output.clone()
170 |         target_f[0][np.argmax(action)] = target
171 |         target_f = target_f.detach()  # detach() is not in-place, so keep its result
172 |         self.optimizer.zero_grad()
173 |         loss = F.mse_loss(output, target_f)
174 |         loss.backward()
175 |         self.optimizer.step()
--------------------------------------------------------------------------------
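The target that `replay_new` and `train_short_memory` build for each transition can be illustrated without PyTorch. The following is a minimal sketch in plain Python, assuming the same discount factor as `DQNAgent.gamma` (0.9); `q_target` and `training_target` are hypothetical helper names, and the Q-values in the example are made up for illustration.

```python
GAMMA = 0.9  # same value as DQNAgent.gamma in DQN.py


def q_target(reward, next_q_values, done, gamma=GAMMA):
    """Hypothetical helper: reward, plus the discounted best next Q-value
    when the episode has not ended."""
    if done:
        return float(reward)
    return reward + gamma * max(next_q_values)


def training_target(current_q_values, action_one_hot, reward, next_q_values, done):
    """Hypothetical helper: copy the current predictions and overwrite only
    the entry of the action that was taken, as target_f does in DQN.py."""
    target = list(current_q_values)
    taken = action_one_hot.index(max(action_one_hot))  # plays the role of np.argmax
    target[taken] = q_target(reward, next_q_values, done)
    return target


# Illustrative values: the snake ate food (+10) after taking action index 1.
current_q = [0.2, 0.5, 0.3]
next_q = [0.1, 0.6, 0.3]
print(training_target(current_q, [0, 1, 0], 10, next_q, done=False))

# A crash ends the episode, so the target is just the reward.
print(training_target(current_q, [1, 0, 0], -10, next_q, done=True))
```

Only the entry for the chosen action differs from the current prediction, so the mean-squared-error loss pushes that single Q-value towards the target and leaves the other two untouched.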
/logs/Scores_20200319190425.txt:
--------------------------------------------------------------------------------
1 | snake_lr00066711_struct100_200_300_eps120
2 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.0006671088271327207, 'first_layer_size': 100, 'second_layer_size': 200, 'third_layer_size': 300, 'epsilon_decay_linear': 120, 'name_scenario': 'snake_lr00066711_struct100_200_300_eps120', 'weights_path': 'weights/weights_snake_lr00066711_struct100_200_300_eps120.h5', 'train': False}
3 | Total score: 1590 Mean: 10.6 Std dev: 6.260561546098179
4 |
5 | snake_lr00088417_struct100_200_300_eps30
6 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.0008841734649773743, 'first_layer_size': 100, 'second_layer_size': 200, 'third_layer_size': 300, 'epsilon_decay_linear': 30, 'name_scenario': 'snake_lr00088417_struct100_200_300_eps30', 'weights_path': 'weights/weights_snake_lr00088417_struct100_200_300_eps30.h5', 'train': False}
7 | Total score: 2467 Mean: 16.446666666666665 Std dev: 10.621958177554978
8 |
9 | snake_lr00053319_struct100_50_100_eps120
10 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.000533193546882061, 'first_layer_size': 100, 'second_layer_size': 50, 'third_layer_size': 100, 'epsilon_decay_linear': 120, 'name_scenario': 'snake_lr00053319_struct100_50_100_eps120', 'weights_path': 'weights/weights_snake_lr00053319_struct100_50_100_eps120.h5', 'train': False}
11 | Total score: 600 Mean: 4 Std dev: 1.900194267708846
12 |
13 | snake_lr00051081_struct200_100_300_eps90
14 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.0005108060711725915, 'first_layer_size': 200, 'second_layer_size': 100, 'third_layer_size': 300, 'epsilon_decay_linear': 90, 'name_scenario': 'snake_lr00051081_struct200_100_300_eps90', 'weights_path': 'weights/weights_snake_lr00051081_struct200_100_300_eps90.h5', 'train': False}
15 | Total score: 3017 Mean: 20.113333333333333 Std dev: 9.613006196980043
16 |
17 | snake_lr00007919_struct300_200_100_eps90
18 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 7.918794353236998e-05, 'first_layer_size': 300, 'second_layer_size': 200, 'third_layer_size': 100, 'epsilon_decay_linear': 90, 'name_scenario': 'snake_lr00007919_struct300_200_100_eps90', 'weights_path': 'weights/weights_snake_lr00007919_struct300_200_100_eps90.h5', 'train': False}
19 | Total score: 2183 Mean: 14.553333333333333 Std dev: 8.807818877456329
20 |
21 | snake_lr00039636_struct200_100_200_eps60
22 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.00039636221156584545, 'first_layer_size': 200, 'second_layer_size': 100, 'third_layer_size': 200, 'epsilon_decay_linear': 60, 'name_scenario': 'snake_lr00039636_struct200_100_200_eps60', 'weights_path': 'weights/weights_snake_lr00039636_struct200_100_200_eps60.h5', 'train': False}
23 | Total score: 591 Mean: 3.94 Std dev: 1.675844973689072
24 |
25 | snake_lr00053872_struct300_50_50_eps30
26 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.0005387156658845444, 'first_layer_size': 300, 'second_layer_size': 50, 'third_layer_size': 50, 'epsilon_decay_linear': 30, 'name_scenario': 'snake_lr00053872_struct300_50_50_eps30', 'weights_path': 'weights/weights_snake_lr00053872_struct300_50_50_eps30.h5', 'train': False}
27 | Total score: 2583 Mean: 17.22 Std dev: 9.316125876257892
28 |
29 | snake_lr00057193_struct200_200_200_eps120
30 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.0005719335813607053, 'first_layer_size': 200, 'second_layer_size': 200, 'third_layer_size': 200, 'epsilon_decay_linear': 120, 'name_scenario': 'snake_lr00057193_struct200_200_200_eps120', 'weights_path': 'weights/weights_snake_lr00057193_struct200_200_200_eps120.h5', 'train': False}
31 | Total score: 2728 Mean: 18.186666666666667 Std dev: 10.081804106865844
32 |
33 | snake_lr00023502_struct200_300_50_eps60
34 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.00023501845629467593, 'first_layer_size': 200, 'second_layer_size': 300, 'third_layer_size': 50, 'epsilon_decay_linear': 60, 'name_scenario': 'snake_lr00023502_struct200_300_50_eps60', 'weights_path': 'weights/weights_snake_lr00023502_struct200_300_50_eps60.h5', 'train': False}
35 | Total score: 2539 Mean: 16.926666666666666 Std dev: 9.048042293846246
36 |
37 | snake_lr00066209_struct300_300_50_eps60
38 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.0006620925545328411, 'first_layer_size': 300, 'second_layer_size': 300, 'third_layer_size': 50, 'epsilon_decay_linear': 60, 'name_scenario': 'snake_lr00066209_struct300_300_50_eps60', 'weights_path': 'weights/weights_snake_lr00066209_struct300_300_50_eps60.h5', 'train': False}
39 | Total score: 1723 Mean: 11.486666666666666 Std dev: 7.992224969009398
40 |
41 | snake_lr00031494_struct200_200_50_eps150
42 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.00031494471298126344, 'first_layer_size': 200, 'second_layer_size': 200, 'third_layer_size': 50, 'epsilon_decay_linear': 150, 'name_scenario': 'snake_lr00031494_struct200_200_50_eps150', 'weights_path': 'weights/weights_snake_lr00031494_struct200_200_50_eps150.h5', 'train': False}
43 | Total score: 2982 Mean: 19.88 Std dev: 10.088148405680478
44 |
45 | snake_lr00028596_struct100_100_50_eps150
46 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.000285964733475369, 'first_layer_size': 100, 'second_layer_size': 100, 'third_layer_size': 50, 'epsilon_decay_linear': 150, 'name_scenario': 'snake_lr00028596_struct100_100_50_eps150', 'weights_path': 'weights/weights_snake_lr00028596_struct100_100_50_eps150.h5', 'train': False}
47 | Total score: 3015 Mean: 20.1 Std dev: 9.30739043870641
48 |
49 | snake_lr00010969_struct200_300_300_eps120
50 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.00010968974982318428, 'first_layer_size': 200, 'second_layer_size': 300, 'third_layer_size': 300, 'epsilon_decay_linear': 120, 'name_scenario': 'snake_lr00010969_struct200_300_300_eps120', 'weights_path': 'weights/weights_snake_lr00010969_struct200_300_300_eps120.h5', 'train': False}
51 | Total score: 3409 Mean: 22.726666666666667 Std dev: 10.088798138268023
52 |
53 | snake_lr00025869_struct200_100_100_eps150
54 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.00025868953302909644, 'first_layer_size': 200, 'second_layer_size': 100, 'third_layer_size': 100, 'epsilon_decay_linear': 150, 'name_scenario': 'snake_lr00025869_struct200_100_100_eps150', 'weights_path': 'weights/weights_snake_lr00025869_struct200_100_100_eps150.h5', 'train': False}
55 | Total score: 3052 Mean: 20.346666666666668 Std dev: 9.230811162642096
56 |
57 | snake_lr00052242_struct100_300_300_eps60
58 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.0005224151053210025, 'first_layer_size': 100, 'second_layer_size': 300, 'third_layer_size': 300, 'epsilon_decay_linear': 60, 'name_scenario': 'snake_lr00052242_struct100_300_300_eps60', 'weights_path': 'weights/weights_snake_lr00052242_struct100_300_300_eps60.h5', 'train': False}
59 | Total score: 900 Mean: 6 Std dev: 4.341156249652
60 |
61 | snake_lr00070273_struct300_200_200_eps150
62 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.0007027336011268658, 'first_layer_size': 300, 'second_layer_size': 200, 'third_layer_size': 200, 'epsilon_decay_linear': 150, 'name_scenario': 'snake_lr00070273_struct300_200_200_eps150', 'weights_path': 'weights/weights_snake_lr00070273_struct300_200_200_eps150.h5', 'train': False}
63 | Total score: 1462 Mean: 9.746666666666666 Std dev: 6.084623227532587
64 |
65 | snake_lr00017618_struct200_100_50_eps90
66 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.0001761762380559867, 'first_layer_size': 200, 'second_layer_size': 100, 'third_layer_size': 50, 'epsilon_decay_linear': 90, 'name_scenario': 'snake_lr00017618_struct200_100_50_eps90', 'weights_path': 'weights/weights_snake_lr00017618_struct200_100_50_eps90.h5', 'train': False}
67 | Total score: 3283 Mean: 21.886666666666667 Std dev: 9.504175462943909
68 |
69 | snake_lr00042738_struct50_300_50_eps60
70 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.00042737711977818937, 'first_layer_size': 50, 'second_layer_size': 300, 'third_layer_size': 50, 'epsilon_decay_linear': 60, 'name_scenario': 'snake_lr00042738_struct50_300_50_eps60', 'weights_path': 'weights/weights_snake_lr00042738_struct50_300_50_eps60.h5', 'train': False}
71 | Total score: 4398 Mean: 29.32 Std dev: 11.656907391451899
72 |
73 | snake_lr00008813_struct50_200_300_eps150
74 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 8.812995821921475e-05, 'first_layer_size': 50, 'second_layer_size': 200, 'third_layer_size': 300, 'epsilon_decay_linear': 150, 'name_scenario': 'snake_lr00008813_struct50_200_300_eps150', 'weights_path': 'weights/weights_snake_lr00008813_struct50_200_300_eps150.h5', 'train': False}
75 | Total score: 2743 Mean: 18.286666666666665 Std dev: 9.824866866708467
76 |
77 | snake_lr00074486_struct50_300_300_eps60
78 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.0007448567189619143, 'first_layer_size': 50, 'second_layer_size': 300, 'third_layer_size': 300, 'epsilon_decay_linear': 60, 'name_scenario': 'snake_lr00074486_struct50_300_300_eps60', 'weights_path': 'weights/weights_snake_lr00074486_struct50_300_300_eps60.h5', 'train': False}
79 | Total score: 1493 Mean: 9.953333333333333 Std dev: 6.151702625302738
80 |
81 |
--------------------------------------------------------------------------------
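The score logs above follow a regular three-line pattern (scenario name, `Params:` line, `Total score:` line), so the best run can be picked out with a few lines of code. This is a sketch under that assumption: `parse_score_log` is a hypothetical helper that is not part of the repository, and the `Params:` lines in the sample are abbreviated to a single key for brevity.

```python
import re

# Matches e.g. "Total score: 4398 Mean: 29.32 Std dev: 11.65"
SCORE_LINE = re.compile(r"Total score:\s*(\S+)\s+Mean:\s*(\S+)\s+Std dev:\s*(\S+)")


def parse_score_log(text):
    """Hypothetical helper: return (scenario, total, mean, std_dev) tuples."""
    results = []
    scenario = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("Params:"):
            continue
        match = SCORE_LINE.match(line)
        if match is None:
            # Any other non-empty line is treated as a scenario name.
            scenario = line
            continue
        total, mean, std_dev = match.groups()
        results.append((scenario, int(total), float(mean), float(std_dev)))
    return results


# Two entries from logs/Scores_20200319190425.txt, with Params abbreviated.
SAMPLE_LOG = """\
snake_lr00053319_struct100_50_100_eps120
Params: {'episodes': 150}
Total score: 600 Mean: 4 Std dev: 1.900194267708846

snake_lr00042738_struct50_300_50_eps60
Params: {'episodes': 150}
Total score: 4398 Mean: 29.32 Std dev: 11.656907391451899
"""

runs = parse_score_log(SAMPLE_LOG)
best = max(runs, key=lambda run: run[2])  # rank by mean score
print(best[0], best[2])
```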
/snakeClass.py:
--------------------------------------------------------------------------------
1 | import os
2 | import pygame
3 | import argparse
4 | import numpy as np
5 | import seaborn as sns
6 | import matplotlib.pyplot as plt
7 | from DQN import DQNAgent
8 | from random import randint
9 | import random
10 | import statistics
11 | import torch.optim as optim
12 | import torch
13 | from GPyOpt.methods import BayesianOptimization
14 | from bayesOpt import *
15 | import datetime
16 | import distutils.util
17 | DEVICE = 'cpu' # 'cuda' if torch.cuda.is_available() else 'cpu'
18 |
19 | #################################
20 | # Define parameters manually #
21 | #################################
22 | def define_parameters():
23 |     params = dict()
24 |     # Neural Network
25 |     params['epsilon_decay_linear'] = 1/100
26 |     params['learning_rate'] = 0.00013629
27 |     params['first_layer_size'] = 200  # neurons in the first layer
28 |     params['second_layer_size'] = 20  # neurons in the second layer
29 |     params['third_layer_size'] = 50  # neurons in the third layer
30 |     params['episodes'] = 250
31 |     params['memory_size'] = 2500
32 |     params['batch_size'] = 1000
33 |     # Settings
34 |     params['weights_path'] = 'weights/weights.h5'
35 |     params['train'] = False
36 |     params["test"] = True
37 |     params['plot_score'] = True
38 |     params['log_path'] = 'logs/scores_' + datetime.datetime.now().strftime("%Y%m%d%H%M%S") + '.txt'
39 |     return params
40 |
41 |
42 | class Game:
43 |     """ Initialize PyGAME """
44 |
45 |     def __init__(self, game_width, game_height):
46 |         pygame.display.set_caption('SnakeGen')
47 |         self.game_width = game_width
48 |         self.game_height = game_height
49 |         self.gameDisplay = pygame.display.set_mode((game_width, game_height + 60))
50 |         self.bg = pygame.image.load("img/background.png")
51 |         self.crash = False
52 |         self.player = Player(self)
53 |         self.food = Food()
54 |         self.score = 0
55 |
56 |
57 | class Player(object):
58 |     def __init__(self, game):
59 |         x = 0.45 * game.game_width
60 |         y = 0.5 * game.game_height
61 |         self.x = x - x % 20
62 |         self.y = y - y % 20
63 |         self.position = []
64 |         self.position.append([self.x, self.y])
65 |         self.food = 1
66 |         self.eaten = False
67 |         self.image = pygame.image.load('img/snakeBody.png')
68 |         self.x_change = 20
69 |         self.y_change = 0
70 |
71 |     def update_position(self, x, y):
72 |         if self.position[-1][0] != x or self.position[-1][1] != y:
73 |             if self.food > 1:
74 |                 for i in range(0, self.food - 1):
75 |                     self.position[i][0], self.position[i][1] = self.position[i + 1]
76 |             self.position[-1][0] = x
77 |             self.position[-1][1] = y
78 |
79 |     def do_move(self, move, x, y, game, food, agent):
80 |         move_array = [self.x_change, self.y_change]
81 |
82 |         if self.eaten:
83 |             self.position.append([self.x, self.y])
84 |             self.eaten = False
85 |             self.food = self.food + 1
86 |         if np.array_equal(move, [1, 0, 0]):
87 |             move_array = self.x_change, self.y_change
88 |         elif np.array_equal(move, [0, 1, 0]) and self.y_change == 0:  # right - going horizontal
89 |             move_array = [0, self.x_change]
90 |         elif np.array_equal(move, [0, 1, 0]) and self.x_change == 0:  # right - going vertical
91 |             move_array = [-self.y_change, 0]
92 |         elif np.array_equal(move, [0, 0, 1]) and self.y_change == 0:  # left - going horizontal
93 |             move_array = [0, -self.x_change]
94 |         elif np.array_equal(move, [0, 0, 1]) and self.x_change == 0:  # left - going vertical
95 |             move_array = [self.y_change, 0]
96 |         self.x_change, self.y_change = move_array
97 |         self.x = x + self.x_change
98 |         self.y = y + self.y_change
99 |
100 |         if self.x < 20 or self.x > game.game_width - 40 \
101 |                 or self.y < 20 \
102 |                 or self.y > game.game_height - 40 \
103 |                 or [self.x, self.y] in self.position:
104 |             game.crash = True
105 |         eat(self, food, game)
106 |
107 |         self.update_position(self.x, self.y)
108 |
109 |     def display_player(self, x, y, food, game):
110 |         self.position[-1][0] = x
111 |         self.position[-1][1] = y
112 |
113 |         if not game.crash:
114 |             for i in range(food):
115 |                 x_temp, y_temp = self.position[len(self.position) - 1 - i]
116 |                 game.gameDisplay.blit(self.image, (x_temp, y_temp))
117 |
118 |             update_screen()
119 |         else:
120 |             pygame.time.wait(300)
121 |
122 |
123 | class Food(object):
124 |     def __init__(self):
125 |         self.x_food = 240
126 |         self.y_food = 200
127 |         self.image = pygame.image.load('img/food2.png')
128 |
129 |     def food_coord(self, game, player):
130 |         x_rand = randint(20, game.game_width - 40)
131 |         self.x_food = x_rand - x_rand % 20
132 |         y_rand = randint(20, game.game_height - 40)
133 |         self.y_food = y_rand - y_rand % 20
134 |         if [self.x_food, self.y_food] not in player.position:
135 |             return self.x_food, self.y_food
136 |         else:
137 |             return self.food_coord(game, player)  # retry until the food is off the snake
138 |
139 |     def display_food(self, x, y, game):
140 |         game.gameDisplay.blit(self.image, (x, y))
141 |         update_screen()
142 |
143 |
144 | def eat(player, food, game):
145 |     if player.x == food.x_food and player.y == food.y_food:
146 |         food.food_coord(game, player)
147 |         player.eaten = True
148 |         game.score = game.score + 1
149 |
150 |
151 | def get_record(score, record):
152 |     if score >= record:
153 |         return score
154 |     else:
155 |         return record
156 |
157 |
158 | def display_ui(game, score, record):
159 |     myfont = pygame.font.SysFont('Segoe UI', 20)
160 |     myfont_bold = pygame.font.SysFont('Segoe UI', 20, True)
161 |     text_score = myfont.render('SCORE: ', True, (0, 0, 0))
162 |     text_score_number = myfont.render(str(score), True, (0, 0, 0))
163 |     text_highest = myfont.render('HIGHEST SCORE: ', True, (0, 0, 0))
164 |     text_highest_number = myfont_bold.render(str(record), True, (0, 0, 0))
165 |     game.gameDisplay.blit(text_score, (45, 440))
166 |     game.gameDisplay.blit(text_score_number, (120, 440))
167 |     game.gameDisplay.blit(text_highest, (190, 440))
168 |     game.gameDisplay.blit(text_highest_number, (350, 440))
169 |     game.gameDisplay.blit(game.bg, (10, 10))
170 |
171 |
172 | def display(player, food, game, record):
173 |     game.gameDisplay.fill((255, 255, 255))
174 |     display_ui(game, game.score, record)
175 |     player.display_player(player.position[-1][0], player.position[-1][1], player.food, game)
176 |     food.display_food(food.x_food, food.y_food, game)
177 |
178 |
179 | def update_screen():
180 |     pygame.display.update()
181 |
182 |
183 | def initialize_game(player, game, food, agent, batch_size):
184 |     state_init1 = agent.get_state(game, player, food)  # 11-element state vector
185 |     action = [1, 0, 0]
186 |     player.do_move(action, player.x, player.y, game, food, agent)
187 |     state_init2 = agent.get_state(game, player, food)
188 |     reward1 = agent.set_reward(player, game.crash)
189 |     agent.remember(state_init1, action, reward1, state_init2, game.crash)
190 |     agent.replay_new(agent.memory, batch_size)
191 |
192 |
193 | def plot_seaborn(array_counter, array_score, train):
194 |     sns.set(color_codes=True, font_scale=1.5)
195 |     sns.set_style("white")
196 |     plt.figure(figsize=(13, 8))
197 |     fit_reg = bool(train)  # draw the regression line only for training runs
198 |     ax = sns.regplot(
199 |         x=np.array(array_counter),  # keyword arguments: recent seaborn releases reject positional x/y
200 |         y=np.array(array_score),
201 |         #color="#36688D",
202 |         x_jitter=.1,
203 |         scatter_kws={"color": "#36688D"},
204 |         label='Data',
205 |         fit_reg=fit_reg,
206 |         line_kws={"color": "#F49F05"}
207 |     )
208 |     # Plot the average line
209 |     y_mean = [np.mean(array_score)]*len(array_counter)
210 |     ax.plot(array_counter, y_mean, label='Mean', linestyle='--')
211 |     ax.legend(loc='upper right')
212 |     ax.set(xlabel='# games', ylabel='score')
213 |     plt.show()
214 |
215 |
216 | def get_mean_stdev(array):
217 |     return statistics.mean(array), statistics.stdev(array)
218 |
219 |
220 | def test(params):
221 |     params['load_weights'] = True
222 |     params['train'] = False
223 |     params["test"] = False
224 |     score, mean, stdev = run(params)
225 |     return score, mean, stdev
226 |
227 |
228 | def run(params):
229 |     """
230 |     Run the DQN algorithm, based on the parameters previously set.
231 |     """
232 |     pygame.init()
233 |     agent = DQNAgent(params)
234 |     agent = agent.to(DEVICE)
235 |     agent.optimizer = optim.Adam(agent.parameters(), weight_decay=0, lr=params['learning_rate'])
236 |     counter_games = 0
237 |     score_plot = []
238 |     counter_plot = []
239 |     record = 0
240 |     total_score = 0
241 |     while counter_games < params['episodes']:
242 |         for event in pygame.event.get():
243 |             if event.type == pygame.QUIT:
244 |                 pygame.quit()
245 |                 quit()
246 |         # Initialize classes
247 |         game = Game(440, 440)
248 |         player1 = game.player
249 |         food1 = game.food
250 |
251 |         # Perform first move
252 |         initialize_game(player1, game, food1, agent, params['batch_size'])
253 |         if params['display']:
254 |             display(player1, food1, game, record)
255 |
256 |         steps = 0  # steps since the last positive reward
257 |         while (not game.crash) and (steps < 100):
258 |             if not params['train']:
259 |                 agent.epsilon = 0.01
260 |             else:
261 |                 # agent.epsilon is set to give randomness to actions
262 |                 agent.epsilon = 1 - (counter_games * params['epsilon_decay_linear'])
263 |
264 |             # get old state
265 |             state_old = agent.get_state(game, player1, food1)
266 |
267 |             # perform a random action with probability agent.epsilon, otherwise choose the predicted action
268 |             if random.uniform(0, 1) < agent.epsilon:
269 |                 final_move = np.eye(3)[randint(0, 2)]
270 |             else:
271 |                 # predict action based on the old state
272 |                 with torch.no_grad():
273 |                     state_old_tensor = torch.tensor(state_old.reshape((1, 11)), dtype=torch.float32).to(DEVICE)
274 |                     prediction = agent(state_old_tensor)
275 |                     final_move = np.eye(3)[np.argmax(prediction.detach().cpu().numpy()[0])]
276 |
277 |             # perform new move and get new state
278 |             player1.do_move(final_move, player1.x, player1.y, game, food1, agent)
279 |             state_new = agent.get_state(game, player1, food1)
280 |
281 |             # set reward for the new state
282 |             reward = agent.set_reward(player1, game.crash)
283 |
284 |             # if food is eaten, steps is set to 0
285 |             if reward > 0:
286 |                 steps = 0
287 |
288 |             if params['train']:
289 |                 # train short memory based on the new action and state
290 |                 agent.train_short_memory(state_old, final_move, reward, state_new, game.crash)
291 |                 # store the new data in long-term memory
292 |                 agent.remember(state_old, final_move, reward, state_new, game.crash)
293 |
294 |             record = get_record(game.score, record)
295 |             if params['display']:
296 |                 display(player1, food1, game, record)
297 |                 pygame.time.wait(params['speed'])
298 |             steps += 1
299 |         if params['train']:
300 |             agent.replay_new(agent.memory, params['batch_size'])
301 |         counter_games += 1
302 |         total_score += game.score
303 |         print(f'Game {counter_games} Score: {game.score}')
304 |         score_plot.append(game.score)
305 |         counter_plot.append(counter_games)
306 |     mean, stdev = get_mean_stdev(score_plot)
307 |     if params['train']:
308 |         model_weights = agent.state_dict()
309 |         torch.save(model_weights, params["weights_path"])
310 |     if params['plot_score']:
311 |         plot_seaborn(counter_plot, score_plot, params['train'])
312 |     return total_score, mean, stdev
313 |
314 | if __name__ == '__main__':
315 |     # Set options to activate or deactivate the game view, and its speed
316 |     pygame.font.init()
317 |     parser = argparse.ArgumentParser()
318 |     params = define_parameters()
319 |     parser.add_argument("--display", nargs='?', type=distutils.util.strtobool, default=True)
320 |     parser.add_argument("--speed", nargs='?', type=int, default=50)
321 |     parser.add_argument("--bayesianopt", nargs='?', type=distutils.util.strtobool, default=False)
322 |     args = parser.parse_args()
323 |     print("Args", args)
324 |     params['display'] = args.display
325 |     params['speed'] = args.speed
326 |     if args.bayesianopt:
327 |         bayesOpt = BayesianOptimizer(params)
328 |         bayesOpt.optimize_RL()
329 |     if params['train']:
330 |         print("Training...")
331 |         params['load_weights'] = False  # when training, the network is not pre-trained
332 |         run(params)
333 |     if params['test']:
334 |         print("Testing...")
335 |         params['train'] = False
336 |         params['load_weights'] = True
337 |         run(params)
--------------------------------------------------------------------------------