├── .gitignore ├── img ├── food2.png ├── background.png ├── notraining.gif ├── snakeBody.png ├── snake_new.gif └── training.gif ├── weights └── weights.h5 ├── requirements.txt ├── logs ├── Scores_20200319190057.txt └── Scores_20200319190425.txt ├── README.md ├── bayesOpt.py ├── DQN.py └── snakeClass.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.idea 2 | __pycache__ -------------------------------------------------------------------------------- /img/food2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/maurock/snake-ga/HEAD/img/food2.png -------------------------------------------------------------------------------- /img/background.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/maurock/snake-ga/HEAD/img/background.png -------------------------------------------------------------------------------- /img/notraining.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/maurock/snake-ga/HEAD/img/notraining.gif -------------------------------------------------------------------------------- /img/snakeBody.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/maurock/snake-ga/HEAD/img/snakeBody.png -------------------------------------------------------------------------------- /img/snake_new.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/maurock/snake-ga/HEAD/img/snake_new.gif -------------------------------------------------------------------------------- /img/training.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/maurock/snake-ga/HEAD/img/training.gif -------------------------------------------------------------------------------- /weights/weights.h5: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/maurock/snake-ga/HEAD/weights/weights.h5 -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | absl-py==0.8.0 2 | certifi==2020.12.5 3 | cmake-example==0.0.1 4 | cycler==0.10.0 5 | dataclasses==0.8 6 | decorator==4.4.2 7 | google-pasta==0.1.7 8 | GPy==1.9.9 9 | GPyOpt==1.2.6 10 | kiwisolver==1.2.0 11 | matplotlib==3.2.0 12 | mkl-fft==1.2.0 13 | mkl-random==1.1.1 14 | mkl-service==2.3.0 15 | msgpack-numpy==0.4.4.3 16 | numpy==1.18.1 17 | olefile==0.46 18 | pandas==0.25.1 19 | paramz==0.9.5 20 | Pillow @ file:///C:/ci/pillow_1603822370986/work 21 | pygame==1.9.6 22 | pyparsing==2.4.7 23 | python-dateutil==2.8.1 24 | pytz==2020.1 25 | scipy @ file:///C:/ci/scipy_1597675683670/work 26 | seaborn==0.10.1 27 | six @ file:///C:/ci/six_1605187303045/work 28 | tabulate==0.8.3 29 | tensorflow-estimator==1.14.0 30 | tensorpack==0.9.4 31 | tgan==0.1.0 32 | torch==1.7.1 33 | typing-extensions @ file:///tmp/build/80754af9/typing_extensions_1598376058250/work 34 | wincertstore==0.2 35 | wrapt==1.11.2 36 | -------------------------------------------------------------------------------- /logs/Scores_20200319190057.txt: -------------------------------------------------------------------------------- 1 | snake_lr00019371_struct200_300_300_eps0 2 | Params: {'episodes': 2, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190057.txt', 'learning_rate': 0.0001937141601579515, 'first_layer_size': 200, 'second_layer_size': 300, 'third_layer_size': 300, 'epsilon_decay_linear': 0, 'name_scenario': 'snake_lr00019371_struct200_300_300_eps0', 'weights_path': 'weights/weights_snake_lr00019371_struct200_300_300_eps0.h5', 'train': False} 3 | Total score: 2 Mean: 1 Std dev: 0.0 4 | 5 | snake_lr00005829_struct200_50_200_eps2 6 | Params: {'episodes': 2, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190057.txt', 'learning_rate': 5.8287680069085044e-05, 'first_layer_size': 200, 'second_layer_size': 50, 'third_layer_size': 200, 'epsilon_decay_linear': 2, 'name_scenario': 'snake_lr00005829_struct200_50_200_eps2', 'weights_path': 'weights/weights_snake_lr00005829_struct200_50_200_eps2.h5', 'train': False} 7 | Total score: 0 Mean: 0 Std dev: 0.0 8 | 9 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Deep Reinforcement Learning 2 | ## Project: Train AI to play Snake 3 | *UPDATE:* 4 | 5 | This project has been recently updated and improved: 6 | - It is now possible to optimize the Deep RL approach using Bayesian Optimization. 7 | - The code of Deep Reinforcement Learning was ported from Keras/TF to Pytorch. To see the old version of the code in Keras/TF, please refer to this repository: [snake-ga-tf](https://github.com/maurock/snake-ga-tf). 8 | 9 | ## Introduction 10 | The goal of this project is to develop an AI Bot able to learn how to play the popular game Snake from scratch. In order to do it, I implemented a Deep Reinforcement Learning algorithm. This approach consists in giving the system parameters related to its state, and a positive or negative reward based on its actions. No rules about the game are given, and initially the Bot has no information on what it needs to do. The goal for the system is to figure it out and elaborate a strategy to maximize the score - or the reward. \ 11 | We are going to see how a Deep Q-Learning algorithm learns how to play Snake, scoring up to 50 points and showing a solid strategy after only 5 minutes of training. \ 12 | Additionally, it is possible to run the Bayesian Optimization method to find the optimal parameters of the Deep neural network, as well as some parameters of the Deep RL approach. 13 | 14 | ## Install 15 | This project requires Python 3.6 with the pygame library installed, as well as Pytorch. If you encounter any error with `torch=1.7.1`, you might need to install Visual C++ 2015-2019 (or simply downgrade your pytorch version, it should be fine). \ 16 | The full list of requirements is in `requirements.txt`. 17 | ```bash 18 | git clone git@github.com:maurock/snake-ga.git 19 | ``` 20 | 21 | ## Run 22 | To run and show the game, executes in the snake-ga folder: 23 | 24 | ```python 25 | python snakeClass.py 26 | ``` 27 | Arguments description: 28 | 29 | - --display - Type bool, default True, display or not game view 30 | - --speed - Type integer, default 50, game speed 31 | 32 | The default configuration loads the file *weights/weights.h5* and runs a test. 33 | 34 | To train the agent, set in the file snakeClass.py: 35 | - `params['train'] = True` 36 | The parameters of the Deep neural network can be changed in *snakeClass.py* by modifying the dictionary `params` in the function `define_parameters()` 37 | 38 | If you run snakeClass.py from the command line, you can set the arguments `--display=False` and `--speed=0`. This way, the game display is not shown and the training phase is faster. 39 | 40 | ## Optimize Deep RL with Bayesian Optimization 41 | To optimize the Deep neural network and additional parameters, run: 42 | 43 | ```python 44 | python snakeClass.py --bayesianopt=True 45 | ``` 46 | 47 | This method uses Bayesian optimization to optimize some parameters of Deep RL. The parameters and the features' search space can be modified in *bayesOpt.py* by editing the `optim_params` dictionary in `optimize_RL`. 48 | 49 | ## For Mac users 50 | It seems there is a OSX specific problem, since many users cannot see the game running. 51 | To fix this problem, in update_screen(), add this line. 52 | 53 | ``` 54 | def update_screen(): 55 | pygame.display.update()
56 | pygame.event.get() # <--- Add this line ### 57 | ``` 58 | -------------------------------------------------------------------------------- /bayesOpt.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | Created on Sun Mar 15 21:10:29 2020 4 | 5 | @author: mauro 6 | """ 7 | from snakeClass import run 8 | from GPyOpt.methods import BayesianOptimization 9 | import datetime 10 | 11 | ################################################ 12 | # Set parameters for Bayesian Optimization # 13 | ################################################ 14 | 15 | class BayesianOptimizer(): 16 | def __init__(self, params): 17 | self.params = params 18 | 19 | def optimize_RL(self): 20 | def optimize(inputs): 21 | print("INPUT", inputs) 22 | inputs = inputs[0] 23 | 24 | # Variables to optimize 25 | self.params["learning_rate"] = inputs[0] 26 | lr_string = '{:.8f}'.format(self.params["learning_rate"])[2:] 27 | self.params["first_layer_size"] = int(inputs[1]) 28 | self.params["second_layer_size"] = int(inputs[2]) 29 | self.params["third_layer_size"] = int(inputs[3]) 30 | self.params["epsilon_decay_linear"] = int(inputs[4]) 31 | 32 | self.params['name_scenario'] = 'snake_lr{}_struct{}_{}_{}_eps{}'.format(lr_string, 33 | self.params['first_layer_size'], 34 | self.params['second_layer_size'], 35 | self.params['third_layer_size'], 36 | self.params['epsilon_decay_linear']) 37 | 38 | self.params['weights_path'] = 'weights/weights_' + self.params['name_scenario'] + '.h5' 39 | self.params['load_weights'] = False 40 | self.params['train'] = True 41 | print(self.params) 42 | score, mean, stdev = run(self.params) 43 | print('Total score: {} Mean: {} Std dev: {}'.format(score, mean, stdev)) 44 | with open(self.params['log_path'], 'a') as f: 45 | f.write(str(self.params['name_scenario']) + '\n') 46 | f.write('Params: ' + str(self.params) + '\n') 47 | return score 48 | 49 | optim_params = [ 50 | {"name": "learning_rate", "type": "continuous", "domain": (0.00005, 0.001)}, 51 | {"name": "first_layer_size", "type": "discrete", "domain": (20,50,100,200)}, 52 | {"name": "second_layer_size", "type": "discrete", "domain": (20,50,100,200)}, 53 | {"name": "third_layer_size", "type": "discrete", "domain": (20,50,100,200)}, 54 | {"name":'epsilon_decay_linear', "type": "discrete", "domain": (self.params['episodes']*0.2, 55 | self.params['episodes']*0.4, 56 | self.params['episodes']*0.6, 57 | self.params['episodes']*0.8, 58 | self.params['episodes']*1)} 59 | ] 60 | 61 | bayes_optimizer = BayesianOptimization(f=optimize, 62 | domain=optim_params, 63 | initial_design_numdata=6, 64 | acquisition_type="EI", 65 | exact_feval=True, 66 | maximize=True) 67 | 68 | bayes_optimizer.run_optimization(max_iter=20) 69 | print('Optimized learning rate: ', bayes_optimizer.x_opt[0]) 70 | print('Optimized first layer: ', bayes_optimizer.x_opt[1]) 71 | print('Optimized second layer: ', bayes_optimizer.x_opt[2]) 72 | print('Optimized third layer: ', bayes_optimizer.x_opt[3]) 73 | print('Optimized epsilon linear decay: ', bayes_optimizer.x_opt[4]) 74 | return self.params 75 | 76 | 77 | ################## 78 | # Main # 79 | ################## 80 | if __name__ == '__main__': 81 | # Define optimizer 82 | bayesOpt = BayesianOptimizer(params) 83 | bayesOpt.optimize_RL() -------------------------------------------------------------------------------- /DQN.py: -------------------------------------------------------------------------------- 1 | import random 2 | import numpy as np 3 | import pandas as pd 4 | from operator import add 5 | import collections 6 | import torch 7 | import torch.nn as nn 8 | import torch.nn.functional as F 9 | import torch.optim as optim 10 | import copy 11 | DEVICE = 'cpu' # 'cuda' if torch.cuda.is_available() else 'cpu' 12 | 13 | class DQNAgent(torch.nn.Module): 14 | def __init__(self, params): 15 | super().__init__() 16 | self.reward = 0 17 | self.gamma = 0.9 18 | self.dataframe = pd.DataFrame() 19 | self.short_memory = np.array([]) 20 | self.agent_target = 1 21 | self.agent_predict = 0 22 | self.learning_rate = params['learning_rate'] 23 | self.epsilon = 1 24 | self.actual = [] 25 | self.first_layer = params['first_layer_size'] 26 | self.second_layer = params['second_layer_size'] 27 | self.third_layer = params['third_layer_size'] 28 | self.memory = collections.deque(maxlen=params['memory_size']) 29 | self.weights = params['weights_path'] 30 | self.load_weights = params['load_weights'] 31 | self.optimizer = None 32 | self.network() 33 | 34 | def network(self): 35 | # Layers 36 | self.f1 = nn.Linear(11, self.first_layer) 37 | self.f2 = nn.Linear(self.first_layer, self.second_layer) 38 | self.f3 = nn.Linear(self.second_layer, self.third_layer) 39 | self.f4 = nn.Linear(self.third_layer, 3) 40 | # weights 41 | if self.load_weights: 42 | self.model = self.load_state_dict(torch.load(self.weights)) 43 | print("weights loaded") 44 | 45 | def forward(self, x): 46 | x = F.relu(self.f1(x)) 47 | x = F.relu(self.f2(x)) 48 | x = F.relu(self.f3(x)) 49 | x = F.softmax(self.f4(x), dim=-1) 50 | return x 51 | 52 | def get_state(self, game, player, food): 53 | """ 54 | Return the state. 55 | The state is a numpy array of 11 values, representing: 56 | - Danger 1 OR 2 steps ahead 57 | - Danger 1 OR 2 steps on the right 58 | - Danger 1 OR 2 steps on the left 59 | - Snake is moving left 60 | - Snake is moving right 61 | - Snake is moving up 62 | - Snake is moving down 63 | - The food is on the left 64 | - The food is on the right 65 | - The food is on the upper side 66 | - The food is on the lower side 67 | """ 68 | state = [ 69 | (player.x_change == 20 and player.y_change == 0 and ((list(map(add, player.position[-1], [20, 0])) in player.position) or 70 | player.position[-1][0] + 20 >= (game.game_width - 20))) or (player.x_change == -20 and player.y_change == 0 and ((list(map(add, player.position[-1], [-20, 0])) in player.position) or 71 | player.position[-1][0] - 20 < 20)) or (player.x_change == 0 and player.y_change == -20 and ((list(map(add, player.position[-1], [0, -20])) in player.position) or 72 | player.position[-1][-1] - 20 < 20)) or (player.x_change == 0 and player.y_change == 20 and ((list(map(add, player.position[-1], [0, 20])) in player.position) or 73 | player.position[-1][-1] + 20 >= (game.game_height-20))), # danger straight 74 | 75 | (player.x_change == 0 and player.y_change == -20 and ((list(map(add,player.position[-1],[20, 0])) in player.position) or 76 | player.position[ -1][0] + 20 > (game.game_width-20))) or (player.x_change == 0 and player.y_change == 20 and ((list(map(add,player.position[-1], 77 | [-20,0])) in player.position) or player.position[-1][0] - 20 < 20)) or (player.x_change == -20 and player.y_change == 0 and ((list(map( 78 | add,player.position[-1],[0,-20])) in player.position) or player.position[-1][-1] - 20 < 20)) or (player.x_change == 20 and player.y_change == 0 and ( 79 | (list(map(add,player.position[-1],[0,20])) in player.position) or player.position[-1][ 80 | -1] + 20 >= (game.game_height-20))), # danger right 81 | 82 | (player.x_change == 0 and player.y_change == 20 and ((list(map(add,player.position[-1],[20,0])) in player.position) or 83 | player.position[-1][0] + 20 > (game.game_width-20))) or (player.x_change == 0 and player.y_change == -20 and ((list(map( 84 | add, player.position[-1],[-20,0])) in player.position) or player.position[-1][0] - 20 < 20)) or (player.x_change == 20 and player.y_change == 0 and ( 85 | (list(map(add,player.position[-1],[0,-20])) in player.position) or player.position[-1][-1] - 20 < 20)) or ( 86 | player.x_change == -20 and player.y_change == 0 and ((list(map(add,player.position[-1],[0,20])) in player.position) or 87 | player.position[-1][-1] + 20 >= (game.game_height-20))), #danger left 88 | 89 | 90 | player.x_change == -20, # move left 91 | player.x_change == 20, # move right 92 | player.y_change == -20, # move up 93 | player.y_change == 20, # move down 94 | food.x_food < player.x, # food left 95 | food.x_food > player.x, # food right 96 | food.y_food < player.y, # food up 97 | food.y_food > player.y # food down 98 | ] 99 | 100 | for i in range(len(state)): 101 | if state[i]: 102 | state[i]=1 103 | else: 104 | state[i]=0 105 | 106 | return np.asarray(state) 107 | 108 | def set_reward(self, player, crash): 109 | """ 110 | Return the reward. 111 | The reward is: 112 | -10 when Snake crashes. 113 | +10 when Snake eats food 114 | 0 otherwise 115 | """ 116 | self.reward = 0 117 | if crash: 118 | self.reward = -10 119 | return self.reward 120 | if player.eaten: 121 | self.reward = 10 122 | return self.reward 123 | 124 | def remember(self, state, action, reward, next_state, done): 125 | """ 126 | Store the tuple in a 127 | memory buffer for replay memory. 128 | """ 129 | self.memory.append((state, action, reward, next_state, done)) 130 | 131 | def replay_new(self, memory, batch_size): 132 | """ 133 | Replay memory. 134 | """ 135 | if len(memory) > batch_size: 136 | minibatch = random.sample(memory, batch_size) 137 | else: 138 | minibatch = memory 139 | for state, action, reward, next_state, done in minibatch: 140 | self.train() 141 | torch.set_grad_enabled(True) 142 | target = reward 143 | next_state_tensor = torch.tensor(np.expand_dims(next_state, 0), dtype=torch.float32).to(DEVICE) 144 | state_tensor = torch.tensor(np.expand_dims(state, 0), dtype=torch.float32, requires_grad=True).to(DEVICE) 145 | if not done: 146 | target = reward + self.gamma * torch.max(self.forward(next_state_tensor)[0]) 147 | output = self.forward(state_tensor) 148 | target_f = output.clone() 149 | target_f[0][np.argmax(action)] = target 150 | target_f.detach() 151 | self.optimizer.zero_grad() 152 | loss = F.mse_loss(output, target_f) 153 | loss.backward() 154 | self.optimizer.step() 155 | 156 | def train_short_memory(self, state, action, reward, next_state, done): 157 | """ 158 | Train the DQN agent on the 159 | tuple at the current timestep. 160 | """ 161 | self.train() 162 | torch.set_grad_enabled(True) 163 | target = reward 164 | next_state_tensor = torch.tensor(next_state.reshape((1, 11)), dtype=torch.float32).to(DEVICE) 165 | state_tensor = torch.tensor(state.reshape((1, 11)), dtype=torch.float32, requires_grad=True).to(DEVICE) 166 | if not done: 167 | target = reward + self.gamma * torch.max(self.forward(next_state_tensor[0])) 168 | output = self.forward(state_tensor) 169 | target_f = output.clone() 170 | target_f[0][np.argmax(action)] = target 171 | target_f.detach() 172 | self.optimizer.zero_grad() 173 | loss = F.mse_loss(output, target_f) 174 | loss.backward() 175 | self.optimizer.step() -------------------------------------------------------------------------------- /logs/Scores_20200319190425.txt: -------------------------------------------------------------------------------- 1 | snake_lr00066711_struct100_200_300_eps120 2 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.0006671088271327207, 'first_layer_size': 100, 'second_layer_size': 200, 'third_layer_size': 300, 'epsilon_decay_linear': 120, 'name_scenario': 'snake_lr00066711_struct100_200_300_eps120', 'weights_path': 'weights/weights_snake_lr00066711_struct100_200_300_eps120.h5', 'train': False} 3 | Total score: 1590 Mean: 10.6 Std dev: 6.260561546098179 4 | 5 | snake_lr00088417_struct100_200_300_eps30 6 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.0008841734649773743, 'first_layer_size': 100, 'second_layer_size': 200, 'third_layer_size': 300, 'epsilon_decay_linear': 30, 'name_scenario': 'snake_lr00088417_struct100_200_300_eps30', 'weights_path': 'weights/weights_snake_lr00088417_struct100_200_300_eps30.h5', 'train': False} 7 | Total score: 2467 Mean: 16.446666666666665 Std dev: 10.621958177554978 8 | 9 | snake_lr00053319_struct100_50_100_eps120 10 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.000533193546882061, 'first_layer_size': 100, 'second_layer_size': 50, 'third_layer_size': 100, 'epsilon_decay_linear': 120, 'name_scenario': 'snake_lr00053319_struct100_50_100_eps120', 'weights_path': 'weights/weights_snake_lr00053319_struct100_50_100_eps120.h5', 'train': False} 11 | Total score: 600 Mean: 4 Std dev: 1.900194267708846 12 | 13 | snake_lr00051081_struct200_100_300_eps90 14 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.0005108060711725915, 'first_layer_size': 200, 'second_layer_size': 100, 'third_layer_size': 300, 'epsilon_decay_linear': 90, 'name_scenario': 'snake_lr00051081_struct200_100_300_eps90', 'weights_path': 'weights/weights_snake_lr00051081_struct200_100_300_eps90.h5', 'train': False} 15 | Total score: 3017 Mean: 20.113333333333333 Std dev: 9.613006196980043 16 | 17 | snake_lr00007919_struct300_200_100_eps90 18 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 7.918794353236998e-05, 'first_layer_size': 300, 'second_layer_size': 200, 'third_layer_size': 100, 'epsilon_decay_linear': 90, 'name_scenario': 'snake_lr00007919_struct300_200_100_eps90', 'weights_path': 'weights/weights_snake_lr00007919_struct300_200_100_eps90.h5', 'train': False} 19 | Total score: 2183 Mean: 14.553333333333333 Std dev: 8.807818877456329 20 | 21 | snake_lr00039636_struct200_100_200_eps60 22 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.00039636221156584545, 'first_layer_size': 200, 'second_layer_size': 100, 'third_layer_size': 200, 'epsilon_decay_linear': 60, 'name_scenario': 'snake_lr00039636_struct200_100_200_eps60', 'weights_path': 'weights/weights_snake_lr00039636_struct200_100_200_eps60.h5', 'train': False} 23 | Total score: 591 Mean: 3.94 Std dev: 1.675844973689072 24 | 25 | snake_lr00053872_struct300_50_50_eps30 26 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.0005387156658845444, 'first_layer_size': 300, 'second_layer_size': 50, 'third_layer_size': 50, 'epsilon_decay_linear': 30, 'name_scenario': 'snake_lr00053872_struct300_50_50_eps30', 'weights_path': 'weights/weights_snake_lr00053872_struct300_50_50_eps30.h5', 'train': False} 27 | Total score: 2583 Mean: 17.22 Std dev: 9.316125876257892 28 | 29 | snake_lr00057193_struct200_200_200_eps120 30 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.0005719335813607053, 'first_layer_size': 200, 'second_layer_size': 200, 'third_layer_size': 200, 'epsilon_decay_linear': 120, 'name_scenario': 'snake_lr00057193_struct200_200_200_eps120', 'weights_path': 'weights/weights_snake_lr00057193_struct200_200_200_eps120.h5', 'train': False} 31 | Total score: 2728 Mean: 18.186666666666667 Std dev: 10.081804106865844 32 | 33 | snake_lr00023502_struct200_300_50_eps60 34 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.00023501845629467593, 'first_layer_size': 200, 'second_layer_size': 300, 'third_layer_size': 50, 'epsilon_decay_linear': 60, 'name_scenario': 'snake_lr00023502_struct200_300_50_eps60', 'weights_path': 'weights/weights_snake_lr00023502_struct200_300_50_eps60.h5', 'train': False} 35 | Total score: 2539 Mean: 16.926666666666666 Std dev: 9.048042293846246 36 | 37 | snake_lr00066209_struct300_300_50_eps60 38 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.0006620925545328411, 'first_layer_size': 300, 'second_layer_size': 300, 'third_layer_size': 50, 'epsilon_decay_linear': 60, 'name_scenario': 'snake_lr00066209_struct300_300_50_eps60', 'weights_path': 'weights/weights_snake_lr00066209_struct300_300_50_eps60.h5', 'train': False} 39 | Total score: 1723 Mean: 11.486666666666666 Std dev: 7.992224969009398 40 | 41 | snake_lr00031494_struct200_200_50_eps150 42 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.00031494471298126344, 'first_layer_size': 200, 'second_layer_size': 200, 'third_layer_size': 50, 'epsilon_decay_linear': 150, 'name_scenario': 'snake_lr00031494_struct200_200_50_eps150', 'weights_path': 'weights/weights_snake_lr00031494_struct200_200_50_eps150.h5', 'train': False} 43 | Total score: 2982 Mean: 19.88 Std dev: 10.088148405680478 44 | 45 | snake_lr00028596_struct100_100_50_eps150 46 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.000285964733475369, 'first_layer_size': 100, 'second_layer_size': 100, 'third_layer_size': 50, 'epsilon_decay_linear': 150, 'name_scenario': 'snake_lr00028596_struct100_100_50_eps150', 'weights_path': 'weights/weights_snake_lr00028596_struct100_100_50_eps150.h5', 'train': False} 47 | Total score: 3015 Mean: 20.1 Std dev: 9.30739043870641 48 | 49 | snake_lr00010969_struct200_300_300_eps120 50 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.00010968974982318428, 'first_layer_size': 200, 'second_layer_size': 300, 'third_layer_size': 300, 'epsilon_decay_linear': 120, 'name_scenario': 'snake_lr00010969_struct200_300_300_eps120', 'weights_path': 'weights/weights_snake_lr00010969_struct200_300_300_eps120.h5', 'train': False} 51 | Total score: 3409 Mean: 22.726666666666667 Std dev: 10.088798138268023 52 | 53 | snake_lr00025869_struct200_100_100_eps150 54 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.00025868953302909644, 'first_layer_size': 200, 'second_layer_size': 100, 'third_layer_size': 100, 'epsilon_decay_linear': 150, 'name_scenario': 'snake_lr00025869_struct200_100_100_eps150', 'weights_path': 'weights/weights_snake_lr00025869_struct200_100_100_eps150.h5', 'train': False} 55 | Total score: 3052 Mean: 20.346666666666668 Std dev: 9.230811162642096 56 | 57 | snake_lr00052242_struct100_300_300_eps60 58 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.0005224151053210025, 'first_layer_size': 100, 'second_layer_size': 300, 'third_layer_size': 300, 'epsilon_decay_linear': 60, 'name_scenario': 'snake_lr00052242_struct100_300_300_eps60', 'weights_path': 'weights/weights_snake_lr00052242_struct100_300_300_eps60.h5', 'train': False} 59 | Total score: 900 Mean: 6 Std dev: 4.341156249652 60 | 61 | snake_lr00070273_struct300_200_200_eps150 62 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.0007027336011268658, 'first_layer_size': 300, 'second_layer_size': 200, 'third_layer_size': 200, 'epsilon_decay_linear': 150, 'name_scenario': 'snake_lr00070273_struct300_200_200_eps150', 'weights_path': 'weights/weights_snake_lr00070273_struct300_200_200_eps150.h5', 'train': False} 63 | Total score: 1462 Mean: 9.746666666666666 Std dev: 6.084623227532587 64 | 65 | snake_lr00017618_struct200_100_50_eps90 66 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.0001761762380559867, 'first_layer_size': 200, 'second_layer_size': 100, 'third_layer_size': 50, 'epsilon_decay_linear': 90, 'name_scenario': 'snake_lr00017618_struct200_100_50_eps90', 'weights_path': 'weights/weights_snake_lr00017618_struct200_100_50_eps90.h5', 'train': False} 67 | Total score: 3283 Mean: 21.886666666666667 Std dev: 9.504175462943909 68 | 69 | snake_lr00042738_struct50_300_50_eps60 70 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.00042737711977818937, 'first_layer_size': 50, 'second_layer_size': 300, 'third_layer_size': 50, 'epsilon_decay_linear': 60, 'name_scenario': 'snake_lr00042738_struct50_300_50_eps60', 'weights_path': 'weights/weights_snake_lr00042738_struct50_300_50_eps60.h5', 'train': False} 71 | Total score: 4398 Mean: 29.32 Std dev: 11.656907391451899 72 | 73 | snake_lr00008813_struct50_200_300_eps150 74 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 8.812995821921475e-05, 'first_layer_size': 50, 'second_layer_size': 200, 'third_layer_size': 300, 'epsilon_decay_linear': 150, 'name_scenario': 'snake_lr00008813_struct50_200_300_eps150', 'weights_path': 'weights/weights_snake_lr00008813_struct50_200_300_eps150.h5', 'train': False} 75 | Total score: 2743 Mean: 18.286666666666665 Std dev: 9.824866866708467 76 | 77 | snake_lr00074486_struct50_300_300_eps60 78 | Params: {'episodes': 150, 'memory_size': 2500, 'batch_size': 500, 'load_weights': True, 'bayesian_optimization': True, 'plot_score': False, 'display': False, 'log_path': 'logs/Scores_20200319190425.txt', 'learning_rate': 0.0007448567189619143, 'first_layer_size': 50, 'second_layer_size': 300, 'third_layer_size': 300, 'epsilon_decay_linear': 60, 'name_scenario': 'snake_lr00074486_struct50_300_300_eps60', 'weights_path': 'weights/weights_snake_lr00074486_struct50_300_300_eps60.h5', 'train': False} 79 | Total score: 1493 Mean: 9.953333333333333 Std dev: 6.151702625302738 80 | 81 | -------------------------------------------------------------------------------- /snakeClass.py: -------------------------------------------------------------------------------- 1 | import os 2 | import pygame 3 | import argparse 4 | import numpy as np 5 | import seaborn as sns 6 | import matplotlib.pyplot as plt 7 | from DQN import DQNAgent 8 | from random import randint 9 | import random 10 | import statistics 11 | import torch.optim as optim 12 | import torch 13 | from GPyOpt.methods import BayesianOptimization 14 | from bayesOpt import * 15 | import datetime 16 | import distutils.util 17 | DEVICE = 'cpu' # 'cuda' if torch.cuda.is_available() else 'cpu' 18 | 19 | ################################# 20 | # Define parameters manually # 21 | ################################# 22 | def define_parameters(): 23 | params = dict() 24 | # Neural Network 25 | params['epsilon_decay_linear'] = 1/100 26 | params['learning_rate'] = 0.00013629 27 | params['first_layer_size'] = 200 # neurons in the first layer 28 | params['second_layer_size'] = 20 # neurons in the second layer 29 | params['third_layer_size'] = 50 # neurons in the third layer 30 | params['episodes'] = 250 31 | params['memory_size'] = 2500 32 | params['batch_size'] = 1000 33 | # Settings 34 | params['weights_path'] = 'weights/weights.h5' 35 | params['train'] = False 36 | params["test"] = True 37 | params['plot_score'] = True 38 | params['log_path'] = 'logs/scores_' + str(datetime.datetime.now().strftime("%Y%m%d%H%M%S")) +'.txt' 39 | return params 40 | 41 | 42 | class Game: 43 | """ Initialize PyGAME """ 44 | 45 | def __init__(self, game_width, game_height): 46 | pygame.display.set_caption('SnakeGen') 47 | self.game_width = game_width 48 | self.game_height = game_height 49 | self.gameDisplay = pygame.display.set_mode((game_width, game_height + 60)) 50 | self.bg = pygame.image.load("img/background.png") 51 | self.crash = False 52 | self.player = Player(self) 53 | self.food = Food() 54 | self.score = 0 55 | 56 | 57 | class Player(object): 58 | def __init__(self, game): 59 | x = 0.45 * game.game_width 60 | y = 0.5 * game.game_height 61 | self.x = x - x % 20 62 | self.y = y - y % 20 63 | self.position = [] 64 | self.position.append([self.x, self.y]) 65 | self.food = 1 66 | self.eaten = False 67 | self.image = pygame.image.load('img/snakeBody.png') 68 | self.x_change = 20 69 | self.y_change = 0 70 | 71 | def update_position(self, x, y): 72 | if self.position[-1][0] != x or self.position[-1][1] != y: 73 | if self.food > 1: 74 | for i in range(0, self.food - 1): 75 | self.position[i][0], self.position[i][1] = self.position[i + 1] 76 | self.position[-1][0] = x 77 | self.position[-1][1] = y 78 | 79 | def do_move(self, move, x, y, game, food, agent): 80 | move_array = [self.x_change, self.y_change] 81 | 82 | if self.eaten: 83 | self.position.append([self.x, self.y]) 84 | self.eaten = False 85 | self.food = self.food + 1 86 | if np.array_equal(move, [1, 0, 0]): 87 | move_array = self.x_change, self.y_change 88 | elif np.array_equal(move, [0, 1, 0]) and self.y_change == 0: # right - going horizontal 89 | move_array = [0, self.x_change] 90 | elif np.array_equal(move, [0, 1, 0]) and self.x_change == 0: # right - going vertical 91 | move_array = [-self.y_change, 0] 92 | elif np.array_equal(move, [0, 0, 1]) and self.y_change == 0: # left - going horizontal 93 | move_array = [0, -self.x_change] 94 | elif np.array_equal(move, [0, 0, 1]) and self.x_change == 0: # left - going vertical 95 | move_array = [self.y_change, 0] 96 | self.x_change, self.y_change = move_array 97 | self.x = x + self.x_change 98 | self.y = y + self.y_change 99 | 100 | if self.x < 20 or self.x > game.game_width - 40 \ 101 | or self.y < 20 \ 102 | or self.y > game.game_height - 40 \ 103 | or [self.x, self.y] in self.position: 104 | game.crash = True 105 | eat(self, food, game) 106 | 107 | self.update_position(self.x, self.y) 108 | 109 | def display_player(self, x, y, food, game): 110 | self.position[-1][0] = x 111 | self.position[-1][1] = y 112 | 113 | if game.crash == False: 114 | for i in range(food): 115 | x_temp, y_temp = self.position[len(self.position) - 1 - i] 116 | game.gameDisplay.blit(self.image, (x_temp, y_temp)) 117 | 118 | update_screen() 119 | else: 120 | pygame.time.wait(300) 121 | 122 | 123 | class Food(object): 124 | def __init__(self): 125 | self.x_food = 240 126 | self.y_food = 200 127 | self.image = pygame.image.load('img/food2.png') 128 | 129 | def food_coord(self, game, player): 130 | x_rand = randint(20, game.game_width - 40) 131 | self.x_food = x_rand - x_rand % 20 132 | y_rand = randint(20, game.game_height - 40) 133 | self.y_food = y_rand - y_rand % 20 134 | if [self.x_food, self.y_food] not in player.position: 135 | return self.x_food, self.y_food 136 | else: 137 | self.food_coord(game, player) 138 | 139 | def display_food(self, x, y, game): 140 | game.gameDisplay.blit(self.image, (x, y)) 141 | update_screen() 142 | 143 | 144 | def eat(player, food, game): 145 | if player.x == food.x_food and player.y == food.y_food: 146 | food.food_coord(game, player) 147 | player.eaten = True 148 | game.score = game.score + 1 149 | 150 | 151 | def get_record(score, record): 152 | if score >= record: 153 | return score 154 | else: 155 | return record 156 | 157 | 158 | def display_ui(game, score, record): 159 | myfont = pygame.font.SysFont('Segoe UI', 20) 160 | myfont_bold = pygame.font.SysFont('Segoe UI', 20, True) 161 | text_score = myfont.render('SCORE: ', True, (0, 0, 0)) 162 | text_score_number = myfont.render(str(score), True, (0, 0, 0)) 163 | text_highest = myfont.render('HIGHEST SCORE: ', True, (0, 0, 0)) 164 | text_highest_number = myfont_bold.render(str(record), True, (0, 0, 0)) 165 | game.gameDisplay.blit(text_score, (45, 440)) 166 | game.gameDisplay.blit(text_score_number, (120, 440)) 167 | game.gameDisplay.blit(text_highest, (190, 440)) 168 | game.gameDisplay.blit(text_highest_number, (350, 440)) 169 | game.gameDisplay.blit(game.bg, (10, 10)) 170 | 171 | 172 | def display(player, food, game, record): 173 | game.gameDisplay.fill((255, 255, 255)) 174 | display_ui(game, game.score, record) 175 | player.display_player(player.position[-1][0], player.position[-1][1], player.food, game) 176 | food.display_food(food.x_food, food.y_food, game) 177 | 178 | 179 | def update_screen(): 180 | pygame.display.update() 181 | 182 | 183 | def initialize_game(player, game, food, agent, batch_size): 184 | state_init1 = agent.get_state(game, player, food) # [0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0] 185 | action = [1, 0, 0] 186 | player.do_move(action, player.x, player.y, game, food, agent) 187 | state_init2 = agent.get_state(game, player, food) 188 | reward1 = agent.set_reward(player, game.crash) 189 | agent.remember(state_init1, action, reward1, state_init2, game.crash) 190 | agent.replay_new(agent.memory, batch_size) 191 | 192 | 193 | def plot_seaborn(array_counter, array_score, train): 194 | sns.set(color_codes=True, font_scale=1.5) 195 | sns.set_style("white") 196 | plt.figure(figsize=(13,8)) 197 | fit_reg = False if train== False else True 198 | ax = sns.regplot( 199 | np.array([array_counter])[0], 200 | np.array([array_score])[0], 201 | #color="#36688D", 202 | x_jitter=.1, 203 | scatter_kws={"color": "#36688D"}, 204 | label='Data', 205 | fit_reg = fit_reg, 206 | line_kws={"color": "#F49F05"} 207 | ) 208 | # Plot the average line 209 | y_mean = [np.mean(array_score)]*len(array_counter) 210 | ax.plot(array_counter,y_mean, label='Mean', linestyle='--') 211 | ax.legend(loc='upper right') 212 | ax.set(xlabel='# games', ylabel='score') 213 | plt.show() 214 | 215 | 216 | def get_mean_stdev(array): 217 | return statistics.mean(array), statistics.stdev(array) 218 | 219 | 220 | def test(params): 221 | params['load_weights'] = True 222 | params['train'] = False 223 | params["test"] = False 224 | score, mean, stdev = run(params) 225 | return score, mean, stdev 226 | 227 | 228 | def run(params): 229 | """ 230 | Run the DQN algorithm, based on the parameters previously set. 231 | """ 232 | pygame.init() 233 | agent = DQNAgent(params) 234 | agent = agent.to(DEVICE) 235 | agent.optimizer = optim.Adam(agent.parameters(), weight_decay=0, lr=params['learning_rate']) 236 | counter_games = 0 237 | score_plot = [] 238 | counter_plot = [] 239 | record = 0 240 | total_score = 0 241 | while counter_games < params['episodes']: 242 | for event in pygame.event.get(): 243 | if event.type == pygame.QUIT: 244 | pygame.quit() 245 | quit() 246 | # Initialize classes 247 | game = Game(440, 440) 248 | player1 = game.player 249 | food1 = game.food 250 | 251 | # Perform first move 252 | initialize_game(player1, game, food1, agent, params['batch_size']) 253 | if params['display']: 254 | display(player1, food1, game, record) 255 | 256 | steps = 0 # steps since the last positive reward 257 | while (not game.crash) and (steps < 100): 258 | if not params['train']: 259 | agent.epsilon = 0.01 260 | else: 261 | # agent.epsilon is set to give randomness to actions 262 | agent.epsilon = 1 - (counter_games * params['epsilon_decay_linear']) 263 | 264 | # get old state 265 | state_old = agent.get_state(game, player1, food1) 266 | 267 | # perform random actions based on agent.epsilon, or choose the action 268 | if random.uniform(0, 1) < agent.epsilon: 269 | final_move = np.eye(3)[randint(0,2)] 270 | else: 271 | # predict action based on the old state 272 | with torch.no_grad(): 273 | state_old_tensor = torch.tensor(state_old.reshape((1, 11)), dtype=torch.float32).to(DEVICE) 274 | prediction = agent(state_old_tensor) 275 | final_move = np.eye(3)[np.argmax(prediction.detach().cpu().numpy()[0])] 276 | 277 | # perform new move and get new state 278 | player1.do_move(final_move, player1.x, player1.y, game, food1, agent) 279 | state_new = agent.get_state(game, player1, food1) 280 | 281 | # set reward for the new state 282 | reward = agent.set_reward(player1, game.crash) 283 | 284 | # if food is eaten, steps is set to 0 285 | if reward > 0: 286 | steps = 0 287 | 288 | if params['train']: 289 | # train short memory base on the new action and state 290 | agent.train_short_memory(state_old, final_move, reward, state_new, game.crash) 291 | # store the new data into a long term memory 292 | agent.remember(state_old, final_move, reward, state_new, game.crash) 293 | 294 | record = get_record(game.score, record) 295 | if params['display']: 296 | display(player1, food1, game, record) 297 | pygame.time.wait(params['speed']) 298 | steps+=1 299 | if params['train']: 300 | agent.replay_new(agent.memory, params['batch_size']) 301 | counter_games += 1 302 | total_score += game.score 303 | print(f'Game {counter_games} Score: {game.score}') 304 | score_plot.append(game.score) 305 | counter_plot.append(counter_games) 306 | mean, stdev = get_mean_stdev(score_plot) 307 | if params['train']: 308 | model_weights = agent.state_dict() 309 | torch.save(model_weights, params["weights_path"]) 310 | if params['plot_score']: 311 | plot_seaborn(counter_plot, score_plot, params['train']) 312 | return total_score, mean, stdev 313 | 314 | if __name__ == '__main__': 315 | # Set options to activate or deactivate the game view, and its speed 316 | pygame.font.init() 317 | parser = argparse.ArgumentParser() 318 | params = define_parameters() 319 | parser.add_argument("--display", nargs='?', type=distutils.util.strtobool, default=True) 320 | parser.add_argument("--speed", nargs='?', type=int, default=50) 321 | parser.add_argument("--bayesianopt", nargs='?', type=distutils.util.strtobool, default=False) 322 | args = parser.parse_args() 323 | print("Args", args) 324 | params['display'] = args.display 325 | params['speed'] = args.speed 326 | if args.bayesianopt: 327 | bayesOpt = BayesianOptimizer(params) 328 | bayesOpt.optimize_RL() 329 | if params['train']: 330 | print("Training...") 331 | params['load_weights'] = False # when training, the network is not pre-trained 332 | run(params) 333 | if params['test']: 334 | print("Testing...") 335 | params['train'] = False 336 | params['load_weights'] = True 337 | run(params) --------------------------------------------------------------------------------