├── requirements.txt ├── .ipynb_checkpoints └── Untitled-checkpoint.ipynb ├── __pycache__ ├── BitFlipEnv.cpython-38.pyc └── DeepQNetwork.cpython-38.pyc ├── results and plots ├── ddpg_with_her_plots │ ├── actor3 │ ├── critic3 │ ├── target_actor3 │ ├── target_critic3 │ ├── episode_plot.png │ └── plot_with_avg.png ├── dqn_with_her_plots │ ├── q_eval_with_her │ ├── q_next_with_her │ └── dqn_plot_with_her.png └── dqn_without_her_plots │ ├── q_eval_without_her │ ├── q_next_without_her │ └── dqn_plot_without_her.png ├── ddpg_with_her ├── __pycache__ │ ├── OUNoise.cpython-38.pyc │ ├── ActorCritic.cpython-38.pyc │ ├── DDPGAgent.cpython-38.pyc │ └── ContinuousEnv.cpython-38.pyc ├── OUNoise.py ├── ContinuousEnv.py ├── ActorCritic.py ├── DDPG_HER_main.py └── DDPGAgent.py ├── dqn_with_her ├── __pycache__ │ ├── HERMemory.cpython-38.pyc │ └── DQNAgentWithHER.cpython-38.pyc ├── HERMemory.py ├── HERmain.py └── DQNAgentWithHER.py ├── .gitignore ├── BitFlipEnv.py ├── DeepQNetwork.py ├── dqn_without_her ├── ExperienceReplayMemory.py ├── main.py └── DQNAgent.py └── README.md /requirements.txt: -------------------------------------------------------------------------------- 1 | numpy==1.18.5 2 | opencv-python==4.4.0.42 3 | matplotlib==3.2.2 4 | gym==0.17.2 5 | torch==1.6.0 6 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/Untitled-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [], 3 | "metadata": {}, 4 | "nbformat": 4, 5 | "nbformat_minor": 4 6 | } 7 | -------------------------------------------------------------------------------- /__pycache__/BitFlipEnv.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hemilpanchiwala/Hindsight-Experience-Replay/HEAD/__pycache__/BitFlipEnv.cpython-38.pyc -------------------------------------------------------------------------------- /__pycache__/DeepQNetwork.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hemilpanchiwala/Hindsight-Experience-Replay/HEAD/__pycache__/DeepQNetwork.cpython-38.pyc -------------------------------------------------------------------------------- /results and plots/ddpg_with_her_plots/actor3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hemilpanchiwala/Hindsight-Experience-Replay/HEAD/results and plots/ddpg_with_her_plots/actor3 -------------------------------------------------------------------------------- /results and plots/ddpg_with_her_plots/critic3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hemilpanchiwala/Hindsight-Experience-Replay/HEAD/results and plots/ddpg_with_her_plots/critic3 -------------------------------------------------------------------------------- /ddpg_with_her/__pycache__/OUNoise.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hemilpanchiwala/Hindsight-Experience-Replay/HEAD/ddpg_with_her/__pycache__/OUNoise.cpython-38.pyc -------------------------------------------------------------------------------- /dqn_with_her/__pycache__/HERMemory.cpython-38.pyc: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/hemilpanchiwala/Hindsight-Experience-Replay/HEAD/dqn_with_her/__pycache__/HERMemory.cpython-38.pyc -------------------------------------------------------------------------------- /ddpg_with_her/__pycache__/ActorCritic.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hemilpanchiwala/Hindsight-Experience-Replay/HEAD/ddpg_with_her/__pycache__/ActorCritic.cpython-38.pyc -------------------------------------------------------------------------------- /ddpg_with_her/__pycache__/DDPGAgent.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hemilpanchiwala/Hindsight-Experience-Replay/HEAD/ddpg_with_her/__pycache__/DDPGAgent.cpython-38.pyc -------------------------------------------------------------------------------- /results and plots/ddpg_with_her_plots/target_actor3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hemilpanchiwala/Hindsight-Experience-Replay/HEAD/results and plots/ddpg_with_her_plots/target_actor3 -------------------------------------------------------------------------------- /results and plots/ddpg_with_her_plots/target_critic3: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hemilpanchiwala/Hindsight-Experience-Replay/HEAD/results and plots/ddpg_with_her_plots/target_critic3 -------------------------------------------------------------------------------- /results and plots/dqn_with_her_plots/q_eval_with_her: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hemilpanchiwala/Hindsight-Experience-Replay/HEAD/results and plots/dqn_with_her_plots/q_eval_with_her -------------------------------------------------------------------------------- /results and plots/dqn_with_her_plots/q_next_with_her: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hemilpanchiwala/Hindsight-Experience-Replay/HEAD/results and plots/dqn_with_her_plots/q_next_with_her -------------------------------------------------------------------------------- /ddpg_with_her/__pycache__/ContinuousEnv.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hemilpanchiwala/Hindsight-Experience-Replay/HEAD/ddpg_with_her/__pycache__/ContinuousEnv.cpython-38.pyc -------------------------------------------------------------------------------- /results and plots/ddpg_with_her_plots/episode_plot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hemilpanchiwala/Hindsight-Experience-Replay/HEAD/results and plots/ddpg_with_her_plots/episode_plot.png -------------------------------------------------------------------------------- /dqn_with_her/__pycache__/DQNAgentWithHER.cpython-38.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hemilpanchiwala/Hindsight-Experience-Replay/HEAD/dqn_with_her/__pycache__/DQNAgentWithHER.cpython-38.pyc -------------------------------------------------------------------------------- /results and plots/ddpg_with_her_plots/plot_with_avg.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/hemilpanchiwala/Hindsight-Experience-Replay/HEAD/results and plots/ddpg_with_her_plots/plot_with_avg.png -------------------------------------------------------------------------------- /results and plots/dqn_with_her_plots/dqn_plot_with_her.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hemilpanchiwala/Hindsight-Experience-Replay/HEAD/results and plots/dqn_with_her_plots/dqn_plot_with_her.png -------------------------------------------------------------------------------- /results and plots/dqn_without_her_plots/q_eval_without_her: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hemilpanchiwala/Hindsight-Experience-Replay/HEAD/results and plots/dqn_without_her_plots/q_eval_without_her -------------------------------------------------------------------------------- /results and plots/dqn_without_her_plots/q_next_without_her: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hemilpanchiwala/Hindsight-Experience-Replay/HEAD/results and plots/dqn_without_her_plots/q_next_without_her -------------------------------------------------------------------------------- /results and plots/dqn_without_her_plots/dqn_plot_without_her.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hemilpanchiwala/Hindsight-Experience-Replay/HEAD/results and plots/dqn_without_her_plots/dqn_plot_without_her.png -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # IntelliJ 2 | .idea/ 3 | 4 | # pyenv 5 | .python-version 6 | 7 | # PEP 582; used by e.g. 
github.com/David-OConnor/pyflow 8 | __pypackages__/ 9 | 10 | # Environments 11 | .env 12 | .venv 13 | env/ 14 | venv/ 15 | ENV/ 16 | -------------------------------------------------------------------------------- /ddpg_with_her/OUNoise.py: -------------------------------------------------------------------------------- 1 | # This implementation is taken from https://github.com/openai/baselines/blob/master/baselines/ddpg/noise.py 2 | import numpy as np 3 | 4 | 5 | class OrnsteinUhlenbeckActionNoise: 6 | def __init__(self, mu, sigma=0.2, theta=.15, dt=1e-2, x0=None): 7 | self.theta = theta 8 | self.mu = mu 9 | self.sigma = sigma 10 | self.dt = dt 11 | self.x0 = x0 12 | self.reset() 13 | 14 | def __call__(self): 15 | x = self.x_prev + self.theta * (self.mu - self.x_prev) * self.dt + self.sigma * np.sqrt( 16 | self.dt) * np.random.normal(size=self.mu.shape) 17 | self.x_prev = x 18 | return x 19 | 20 | def reset(self): 21 | self.x_prev = self.x0 if self.x0 is not None else np.zeros_like(self.mu) 22 | 23 | def __repr__(self): 24 | return 'OrnsteinUhlenbeckActionNoise(mu={}, sigma={})'.format(self.mu, self.sigma) 25 | -------------------------------------------------------------------------------- /BitFlipEnv.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | class BitFlipEnv: 5 | """ 6 | A simple bit flip environment 7 | Bit of the current state flips as an action 8 | Reward of -1 for each step 9 | """ 10 | def __init__(self, n_bits): 11 | self.n_bits = n_bits 12 | self.state = np.random.randint(2, size=self.n_bits) 13 | self.goal = np.random.randint(2, size=self.n_bits) 14 | 15 | def reset_env(self): 16 | """ 17 | Resets the environment with new state and goal 18 | """ 19 | self.state = np.random.randint(2, size=self.n_bits) 20 | self.goal = np.random.randint(2, size=self.n_bits) 21 | 22 | def take_step(self, action): 23 | """ 24 | Returns updated_state, reward, and done for the step taken 25 | """ 26 | self.state[action] = self.state[action] ^ 1 27 | done = False 28 | if np.array_equal(self.state, self.goal): 29 | done = True 30 | reward = 0 31 | else: 32 | reward = -1 33 | return np.copy(self.state), reward, done 34 | 35 | def print_state(self): 36 | """ 37 | Prints the current state 38 | """ 39 | print('Current State:', self.state) 40 | -------------------------------------------------------------------------------- /ddpg_with_her/ContinuousEnv.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | class ContinuousEnv: 5 | """ 6 | A continuous environment 7 | A state is defined as a numpy array of size 2 with random values 8 | """ 9 | def __init__(self, size): 10 | self.size = size 11 | self.state = size * (2 * np.random.random(2) - 1) 12 | self.goal = size * (2 * np.random.random(2) - 1) 13 | self.threshold = 0.5 14 | 15 | def reset_env(self): 16 | """ 17 | Resets the environment with new state and goal 18 | """ 19 | self.state = self.size * (2 * np.random.random(2) - 1) 20 | self.goal = self.size * (2 * np.random.random(2) - 1) 21 | 22 | def take_step(self, action): 23 | """ 24 | Returns updated_state, reward, and done for the step taken 25 | """ 26 | self.state += (action / 4) 27 | good_done = np.linalg.norm(self.goal) <= self.threshold 28 | bad_done = np.max(np.abs(self.state)) > self.size 29 | if good_done: 30 | reward = 0 31 | else: 32 | reward = -1 33 | return np.copy(self.state / self.size), reward, good_done or bad_done 34 | 35 | def 
print_state(self): 36 | """ 37 | Prints the current state 38 | """ 39 | print('Current State:', self.state) 40 | -------------------------------------------------------------------------------- /DeepQNetwork.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | import torch 4 | import torch.nn as nn 5 | import torch.nn.functional as F 6 | import torch.optim as optim 7 | 8 | 9 | class DeepQNetwork(nn.Module): 10 | """ 11 | Defines a deep Q network with a single hidden layer 12 | """ 13 | def __init__(self, learning_rate, n_actions, input_dims, checkpoint_dir, name): 14 | super(DeepQNetwork, self).__init__() 15 | 16 | self.fc1 = nn.Linear(input_dims, 512) 17 | self.fc2 = nn.Linear(512, 256) 18 | self.fc3 = nn.Linear(256, n_actions) 19 | 20 | self.optimizer = optim.RMSprop(self.parameters(), lr=learning_rate) 21 | self.loss = nn.MSELoss() 22 | 23 | self.device_type = 'cuda:0' if torch.cuda.is_available() else 'cpu' 24 | self.device = torch.device(self.device_type) 25 | self.to(self.device) 26 | 27 | self.checkpoint_dir = checkpoint_dir 28 | self.checkpoint_name = os.path.join(checkpoint_dir, name) 29 | 30 | def forward(self, data): 31 | fc_layer1 = F.relu(self.fc1(data)) 32 | fc_layer2 = F.relu(self.fc2(fc_layer1)) 33 | actions = self.fc3(fc_layer2) 34 | 35 | return actions 36 | 37 | def save_checkpoint(self): 38 | print('Saving checkpoint ...') 39 | torch.save(self.state_dict(), self.checkpoint_name) 40 | 41 | def load_checkpoint(self): 42 | print('Loading checkpoint ...') 43 | self.load_state_dict(torch.load(self.checkpoint_name)) 44 | 45 | 46 | -------------------------------------------------------------------------------- /dqn_without_her/ExperienceReplayMemory.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | class ExperienceReplayMemory(object): 5 | def __init__(self, memory_size, input_dims): 6 | super(ExperienceReplayMemory, self).__init__() 7 | self.max_mem_size = memory_size 8 | self.counter = 0 9 | 10 | # initializes the state, next_state, action, reward, and terminal experience memory 11 | print(type(input_dims)) 12 | self.state_memory = np.zeros((memory_size, input_dims), dtype=np.float32) 13 | self.next_state_memory = np.zeros((memory_size, input_dims), dtype=np.float32) 14 | self.reward_memory = np.zeros(memory_size, dtype=np.float32) 15 | self.action_memory = np.zeros(memory_size, dtype=np.int64) 16 | self.terminal_memory = np.zeros(memory_size, dtype=bool) 17 | 18 | def add_experience(self, state, action, reward, next_state, done): 19 | """ 20 | Adds new experience to the memory. 21 | """ 22 | curr_index = self.counter % self.max_mem_size 23 | 24 | self.state_memory[curr_index] = state 25 | self.action_memory[curr_index] = action 26 | self.reward_memory[curr_index] = reward 27 | self.next_state_memory[curr_index] = next_state 28 | self.terminal_memory[curr_index] = done 29 | 30 | self.counter += 1 31 | 32 | def get_random_experience(self, batch_size): 33 | """ 34 | Returns any random memory from the experience replay memory. 
35 | """ 36 | rand_index = np.random.choice(min(self.counter, self.max_mem_size), batch_size) 37 | 38 | rand_state = self.state_memory[rand_index] 39 | rand_action = self.action_memory[rand_index] 40 | rand_reward = self.reward_memory[rand_index] 41 | rand_next_state = self.next_state_memory[rand_index] 42 | rand_done = self.terminal_memory[rand_index] 43 | 44 | return rand_state, rand_action, rand_reward, rand_next_state, rand_done 45 | -------------------------------------------------------------------------------- /dqn_with_her/HERMemory.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | class HindsightExperienceReplayMemory(object): 5 | """ 6 | Hindsight Experience replay - Takes size, input dimensions and number of actions as parameters 7 | """ 8 | def __init__(self, memory_size, input_dims, n_actions): 9 | super(HindsightExperienceReplayMemory, self).__init__() 10 | self.max_mem_size = memory_size 11 | self.counter = 0 12 | 13 | # initializes the state, next_state, action, reward, and terminal experience memory 14 | self.state_memory = np.zeros((memory_size, input_dims), dtype=np.float32) 15 | self.next_state_memory = np.zeros((memory_size, input_dims), dtype=np.float32) 16 | self.reward_memory = np.zeros(memory_size, dtype=np.float32) 17 | self.action_memory = np.zeros(memory_size, dtype=np.int64) 18 | self.terminal_memory = np.zeros(memory_size, dtype=bool) 19 | self.goal_memory = np.zeros((memory_size, input_dims), dtype=np.float32) 20 | 21 | def add_experience(self, state, action, reward, next_state, done, goal): 22 | """ 23 | Adds new experience to the memory 24 | """ 25 | curr_index = self.counter % self.max_mem_size 26 | 27 | self.state_memory[curr_index] = state 28 | self.action_memory[curr_index] = action 29 | self.reward_memory[curr_index] = reward 30 | self.next_state_memory[curr_index] = next_state 31 | self.terminal_memory[curr_index] = done 32 | self.goal_memory[curr_index] = goal 33 | 34 | self.counter += 1 35 | 36 | def get_random_experience(self, batch_size): 37 | """ 38 | Returns any random memory from the experience replay memory 39 | """ 40 | rand_index = np.random.choice(min(self.counter, self.max_mem_size), batch_size, replace=False) 41 | 42 | rand_state = self.state_memory[rand_index] 43 | rand_action = self.action_memory[rand_index] 44 | rand_reward = self.reward_memory[rand_index] 45 | rand_next_state = self.next_state_memory[rand_index] 46 | rand_done = self.terminal_memory[rand_index] 47 | rand_goal = self.goal_memory[rand_index] 48 | 49 | return rand_state, rand_action, rand_reward, rand_next_state, rand_done, rand_goal 50 | -------------------------------------------------------------------------------- /dqn_without_her/main.py: -------------------------------------------------------------------------------- 1 | import os 2 | import matplotlib.pyplot as plt 3 | 4 | import BitFlipEnv as bflip 5 | from dqn_without_her import DQNAgent as dqn 6 | 7 | if __name__ == '__main__': 8 | 9 | n_bits = 8 10 | env = bflip.BitFlipEnv(n_bits) 11 | 12 | n_episodes = 30000 13 | epsilon_history = [] 14 | episodes = [] 15 | win_percent = [] 16 | success = 0 17 | 18 | load_checkpoint = False 19 | 20 | checkpoint_dir = os.path.join(os.getcwd(), '/checkpoint/') 21 | 22 | # Initializes the DQN agent with simple experience replay 23 | agent = dqn.DQNAgent(learning_rate=0.0001, n_actions=n_bits, 24 | input_dims=n_bits, gamma=0.99, 25 | epsilon=0.9, batch_size=64, memory_size=10000, 26 | replace_network_count=50, 
27 | checkpoint_dir=checkpoint_dir) 28 | 29 | if load_checkpoint: 30 | agent.load_model() 31 | 32 | # Iterate through the episodes 33 | for episode in range(n_episodes): 34 | env.reset_env() 35 | state = env.state 36 | goal = env.goal 37 | done = False 38 | 39 | for p in range(n_bits): 40 | if not done: 41 | action = agent.choose_action(state) 42 | next_state, reward, done = env.take_step(action) 43 | if not load_checkpoint: 44 | agent.store_experience(state, action, reward, next_state, done) 45 | agent.learn() 46 | state = next_state 47 | 48 | if done: 49 | success += 1 50 | 51 | # Average over last 500 episodes to avoid spikes 52 | if episode % 500 == 0: 53 | print('success rate for last 500 episodes after', episode, ':', success/5) 54 | if len(win_percent) > 0 and (success / 500) > win_percent[len(win_percent) - 1]: 55 | agent.save_model() 56 | epsilon_history.append(agent.epsilon) 57 | episodes.append(episode) 58 | win_percent.append(success/500.0) 59 | success = 0 60 | 61 | print('Epsilon History:', epsilon_history) 62 | print('Episodes:', episodes) 63 | print('Win percentage:', win_percent) 64 | 65 | figure = plt.figure() 66 | plt.plot(episodes, win_percent) 67 | 68 | plt.title('DQN without HER') 69 | plt.ylabel('Win Percentage') 70 | plt.xlabel('Number of Episodes') 71 | plt.ylim([0, 1]) 72 | 73 | plt.savefig(plt.savefig(os.path.join(os.getcwd(), '/plots/'))) 74 | -------------------------------------------------------------------------------- /ddpg_with_her/ActorCritic.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | import torch 4 | import torch.nn as nn 5 | import torch.nn.functional as F 6 | import torch.optim as optim 7 | 8 | 9 | class Actor(nn.Module): 10 | """ 11 | Defines a neural network for the actor (that derives the actions) 12 | """ 13 | def __init__(self, input_dims, n_actions, learning_rate, checkpoint_dir, name): 14 | super(Actor, self).__init__() 15 | 16 | self.input = input_dims 17 | self.fc1 = nn.Linear(2*input_dims, 512) 18 | self.fc2 = nn.Linear(512, 256) 19 | self.fc3 = nn.Linear(256, n_actions) 20 | 21 | self.optimizer = optim.RMSprop(self.parameters(), lr=learning_rate) 22 | self.loss = nn.MSELoss() 23 | 24 | self.device_type = 'cuda:0' if torch.cuda.is_available() else 'cpu' 25 | self.device = torch.device(self.device_type) 26 | self.to(self.device) 27 | 28 | self.checkpoint_dir = checkpoint_dir 29 | self.checkpoint_name = os.path.join(checkpoint_dir, name) 30 | 31 | def forward(self, data): 32 | fc_layer1 = F.relu(self.fc1(data)) 33 | fc_layer2 = F.relu(self.fc2(fc_layer1)) 34 | actions = self.fc3(fc_layer2) 35 | 36 | return actions 37 | 38 | def save_checkpoint(self): 39 | print('Saving checkpoint ...') 40 | torch.save(self.state_dict(), self.checkpoint_name) 41 | 42 | def load_checkpoint(self): 43 | print('Loading checkpoint ...') 44 | self.load_state_dict(torch.load(self.checkpoint_name)) 45 | 46 | 47 | class Critic(nn.Module): 48 | """ 49 | Defines a neural network for the critic (that derives the value) 50 | """ 51 | def __init__(self, input_dims, n_actions, learning_rate, checkpoint_dir, name): 52 | super(Critic, self).__init__() 53 | 54 | self.fc1 = nn.Linear(2*input_dims + n_actions, 512) 55 | self.fc2 = nn.Linear(512, 256) 56 | self.fc3 = nn.Linear(256, 1) 57 | 58 | self.optimizer = optim.RMSprop(self.parameters(), lr=learning_rate) 59 | self.loss = nn.MSELoss() 60 | 61 | self.device_type = 'cuda:0' if torch.cuda.is_available() else 'cpu' 62 | self.device = torch.device(self.device_type) 63 
| self.to(self.device) 64 | 65 | self.checkpoint_dir = checkpoint_dir 66 | self.checkpoint_name = os.path.join(checkpoint_dir, name) 67 | 68 | def forward(self, data1, data2): 69 | fc_layer1 = F.relu(self.fc1(torch.cat((data1, data2), 1))) 70 | fc_layer2 = F.relu(self.fc2(fc_layer1)) 71 | value = self.fc3(fc_layer2) 72 | 73 | return value 74 | 75 | def save_checkpoint(self): 76 | print('Saving checkpoint ...') 77 | torch.save(self.state_dict(), self.checkpoint_name) 78 | 79 | def load_checkpoint(self): 80 | print('Loading checkpoint ...') 81 | self.load_state_dict(torch.load(self.checkpoint_name)) 82 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Hindsight-Experience-Replay 2 | 3 | This repository provides the Pytorch implementation of Hindsight Experience Replay on Deep Q Network and Deep Deterministic Policy Gradient algorithms. 4 | 5 | Link to the paper: https://arxiv.org/pdf/1707.01495.pdf 6 | 7 | Authors: Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, Wojciech Zaremba 8 | 9 | ## Training 10 | 11 | -
You can train each model simply by running the corresponding main file:<br />
12 |DQN With HER -> HERmain.py
13 |DDPG With HER -> DDPG_HER_main.py
14 |DQN Without HER -> main.py
15 |
16 | - You can set hyper-parameters such as `learning_rate`, the discount factor (`gamma`), `epsilon`, `batch_size`, and `memory_size` when initializing the agent in the above-mentioned files (see the snippet above).
17 |
18 | ## Running the pre-trained model
19 |
20 |
21 | - Run the same files mentioned in the Training section with the `load_checkpoint` variable set to `True`; this loads the saved model parameters and reports the results. Update the checkpoint paths to point at your saved model files (e.g. under `results and plots/`).
22 |
23 | ## Results
24 |
25 | DQN with HER vs. DQN without HER on the bit-flip environment (win percentage over training episodes):
26 |
27 | - With HER: `results and plots/dqn_with_her_plots/dqn_plot_with_her.png`
28 | - Without HER: `results and plots/dqn_without_her_plots/dqn_plot_without_her.png`
29 |
30 | DDPG with HER on the continuous environment:
31 |
32 | - With average: `results and plots/ddpg_with_her_plots/plot_with_avg.png`
33 | - Without average (contains spikes): `results and plots/ddpg_with_her_plots/episode_plot.png`
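## How hindsight relabeling works (sketch)

The key trick implemented in this repository is to re-store each episode's transitions with an achieved state substituted for the original goal, so that even failed episodes yield transitions with a useful reward signal. The snippet below is a minimal sketch of that relabeling step using the `HindsightExperienceReplayMemory` class from `dqn_with_her/HERMemory.py` and the "final" goal-selection strategy; the actual loop in `HERmain.py` may differ in its details, and the `store_episode_with_hindsight` helper is purely illustrative.

```python
import numpy as np

from dqn_with_her.HERMemory import HindsightExperienceReplayMemory


def store_episode_with_hindsight(memory, episode):
    """Re-store an episode using its final achieved state as the goal ('final' strategy).

    `episode` is a list of (state, action, next_state) tuples collected while
    acting towards the originally sampled goal.
    """
    hindsight_goal = episode[-1][2]  # last achieved state becomes the substitute goal
    for state, action, next_state in episode:
        done = np.array_equal(next_state, hindsight_goal)
        reward = 0 if done else -1  # same sparse reward scheme as BitFlipEnv
        memory.add_experience(state, action, reward, next_state, done, hindsight_goal)


# The memory itself is constructed with the environment's dimensions, e.g. for 8 bits:
memory = HindsightExperienceReplayMemory(memory_size=10000, input_dims=8, n_actions=8)
```

Because transitions are stored both with the original goal and with these hindsight goals, the agent keeps receiving a learning signal even when it never reaches the goal it was originally given.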