├── Gameplay_1568.gif
├── Gameplay_3973-242-37-0-0.gif
├── Gameplay_FR_1641-305-37-0-0.gif
├── LICENSE
├── README.md
├── asyncGoExplore.ipynb
├── markov.py
├── support.py
└── syncGoExplore.ipynb

/Gameplay_1568.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/R-McHenry/ParallelizedGoExplore/3943e13fdd4ccf771c1a1d71ae622a133838913f/Gameplay_1568.gif
--------------------------------------------------------------------------------
/Gameplay_3973-242-37-0-0.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/R-McHenry/ParallelizedGoExplore/3943e13fdd4ccf771c1a1d71ae622a133838913f/Gameplay_3973-242-37-0-0.gif
--------------------------------------------------------------------------------
/Gameplay_FR_1641-305-37-0-0.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/R-McHenry/ParallelizedGoExplore/3943e13fdd4ccf771c1a1d71ae622a133838913f/Gameplay_FR_1641-305-37-0-0.gif
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2019 R-McHenry
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # SynchronousGoExplore
2 | A first bare-bones parallelized implementation of Go Explore, as described in the Uber Engineering blog post.
3 | 
4 | Currently no deep learning is incorporated into the project. The available exploration policies are random and Markov chain.
5 | 
6 | The notebook syncGoExplore.ipynb demonstrates the use of Go Explore to create a speedrun of a level in a Gym Retro environment using multiple parallel workers.
7 | 
8 | Dependencies:
9 | 
10 | ray (Linux and macOS only)
11 | gym-retro
12 | imageio (also needs the FreeImage plugin)
13 | a ROM file for the game environment
14 | 
15 | Original Reddit discussion with some more information:
16 | https://www.reddit.com/r/MachineLearning/comments/agf43s/d_go_explore_vs_sonic_the_hedgehog/
17 | 
18 | Original blog post by Uber:
19 | https://eng.uber.com/go-explore/
20 | 
21 | To do:
22 | 
23 | Add smarter exploration policies (fast, simple models and deep learning)
24 | 
25 | Asynchronous Go Explore, i.e. allow workers to be constantly playing and updating only when ready/necessary
26 | 
27 | Add iterative deepening
28 | 
29 | Add procedures for experiments to search for good hyperparameters
30 | 
31 | Add the comb operation - sequentially go to each state encountered in a run that reaches the end of the level
32 | 
33 | Some early gameplay:
34 | 
35 | ![](Gameplay_3973-242-37-0-0.gif)
36 | 
37 | A very polished run:
38 | 
39 | ![](Gameplay_1568.gif)
--------------------------------------------------------------------------------
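For orientation before the notebook dumps below: both notebooks implement the same Go Explore bookkeeping. Each frame is mapped to a discrete "cell" by frameToCell(observation, info) from support.py, and the master tables (cells, fitnesses, cell_prob, trajectories, states) remember, for every cell seen so far, the shortest action trajectory that reached it, a saved emulator state so a worker can jump straight back to it, and a selection weight used to decide where to explore next. The following is a minimal, self-contained sketch of that archive logic, assuming nothing beyond NumPy; the class and method names are hypothetical, and this is not the repository's code (the real logic lives in the notebooks and runs through Ray actors driving gym-retro).

```python
import numpy as np

class CellArchive:
    """Illustrative Go-Explore archive: one entry per discretized game state ('cell')."""

    def __init__(self):
        self.fitness = {}      # cell -> length of shortest known trajectory (lower is better)
        self.trajectory = {}   # cell -> action sequence that reaches the cell
        self.state = {}        # cell -> saved emulator state, used to teleport back
        self.weight = {}       # cell -> selection weight, bumped whenever the cell improves

    def update(self, cell, trajectory, state):
        """Insert a new cell, or keep the shorter trajectory for an already-known cell."""
        if cell not in self.fitness or len(trajectory) < self.fitness[cell]:
            self.fitness[cell] = len(trajectory)
            self.trajectory[cell] = list(trajectory)
            self.state[cell] = state
            self.weight[cell] = self.weight.get(cell, 0) + 1

    def sample_cells(self, n):
        """Pick n cells to 'go' back to, with probability proportional to their weights."""
        cells = list(self.weight)
        p = np.array([self.weight[c] for c in cells], dtype=float)
        p /= p.sum()
        picks = np.random.choice(len(cells), size=n, p=p)
        return [cells[i] for i in picks]

# Toy usage: register two cells, then a shorter route to the second one.
archive = CellArchive()
archive.update("start", [], state=b"savestate-0")
archive.update("ledge", [3, 3, 7, 7, 1], state=b"savestate-1")
archive.update("ledge", [3, 7, 1], state=b"savestate-2")  # shorter trajectory wins
print(archive.sample_cells(2))
```

Selection is weighted rather than uniform: a cell's weight is bumped each time a run improves it, so promising frontier cells are revisited more often. This is the behaviour the cell_prob table and the np.random.choice(..., p=normalized_cell_prob) call implement in the notebooks.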
/asyncGoExplore.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 1,
6 | "metadata": {},
7 | "outputs": [
8 | {
9 | "name": "stderr",
10 | "output_type": "stream",
11 | "text": [
12 | "WARNING: Not monitoring node memory since `psutil` is not installed. Install this with `pip install psutil` (or ray[debug]) to enable debugging of memory-related crashes.\n"
13 | ]
14 | }
15 | ],
16 | "source": [
17 | "from ast import literal_eval\n",
18 | "import ray\n",
19 | "import gym\n",
20 | "import retro\n",
21 | "import os\n",
22 | "import numpy as np\n",
23 | "import matplotlib.pyplot as plt\n",
24 | "from markov import sampleMarkov, createMarkov, randMarkov\n",
25 | "from support import getInitial, verifyTrajectory, install_games_from_rom_dir, frameToCell, action_set, trajectoryToGif\n",
26 | "import time"
27 | ]
28 | },
29 | {
30 | "cell_type": "code",
31 | "execution_count": 2,
32 | "metadata": {},
33 | "outputs": [],
34 | "source": [
35 | "import imageio\n",
36 | "imageio.plugins.freeimage.download()"
37 | ]
38 | },
39 | {
40 | "cell_type": "code",
41 | "execution_count": 3,
42 | "metadata": {},
43 | "outputs": [
44 | {
45 | "name": "stderr",
46 | "output_type": "stream",
47 | "text": [
48 | "WARNING: Not updating worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.\n",
49 | "Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-01-23_00-10-20_2192/logs.\n",
50 | "Waiting for redis server at 127.0.0.1:46177 to respond...\n",
51 | "Waiting for redis server at 127.0.0.1:10050 to respond...\n",
52 | "Starting Redis shard with 10.0 GB max memory.\n",
53 | "WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 2147483648 bytes available. This may slow down performance! You may be able to free up space by deleting files in /dev/shm or terminating any running plasma_store_server processes. 
If you are inside a Docker container, you may need to pass an argument with the flag '--shm-size' to 'docker run'.\n", 54 | "Starting the Plasma object store with 12.785916313 GB memory using /tmp.\n", 55 | "\n", 56 | "======================================================================\n", 57 | "View the web UI at http://localhost:8888/notebooks/ray_ui.ipynb?token=c43331ca232bb127a9007cdf2aa763cc09a8e9dc596ccedd\n", 58 | "======================================================================\n", 59 | "\n" 60 | ] 61 | }, 62 | { 63 | "data": { 64 | "text/plain": [ 65 | "{'node_ip_address': None,\n", 66 | " 'redis_address': '172.17.0.6:46177',\n", 67 | " 'object_store_address': '/tmp/ray/session_2019-01-23_00-10-20_2192/sockets/plasma_store',\n", 68 | " 'webui_url': 'http://localhost:8888/notebooks/ray_ui.ipynb?token=c43331ca232bb127a9007cdf2aa763cc09a8e9dc596ccedd',\n", 69 | " 'raylet_socket_name': '/tmp/ray/session_2019-01-23_00-10-20_2192/sockets/raylet'}" 70 | ] 71 | }, 72 | "execution_count": 3, 73 | "metadata": {}, 74 | "output_type": "execute_result" 75 | } 76 | ], 77 | "source": [ 78 | "ray.init()" 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": null, 84 | "metadata": {}, 85 | "outputs": [], 86 | "source": [ 87 | "def winCondition(cell, info):\n", 88 | " return info['level_end_bonus'] != 0\n", 89 | "\n", 90 | "def stopCondition(cell, info, step):\n", 91 | " return step > 500 or winCondition(cell, info) or info['lives'] != 3\n" 92 | ] 93 | }, 94 | { 95 | "cell_type": "code", 96 | "execution_count": 4, 97 | "metadata": {}, 98 | "outputs": [], 99 | "source": [ 100 | "@ray.remote\n", 101 | "class MasterActor(object):\n", 102 | " def __init__(self, \n", 103 | " initialPolicy,\n", 104 | " initialCell,\n", 105 | " initialFitness,\n", 106 | " initialTrajectory,\n", 107 | " initialState):\n", 108 | " self.best_trajectory = None\n", 109 | " self.best_fitness = None\n", 110 | " self.policy = initialPolicy\n", 111 | " self.cells = [initialCell]\n", 112 | " self.fitnesses = {initialCell:initialFitness}\n", 113 | " self.cell_prob = {initialCell:1}\n", 114 | " self.trajectories = {initialCell:initialTrajectory}\n", 115 | " self.states = {initialCell:initialState}\n", 116 | "\n", 117 | " def pushResult(self, cell, trajectory, state, info, step):\n", 118 | " fitness = len(trajectory)\n", 119 | " if cell in self.cells:\n", 120 | " if fitness < self.fitnesses[cell]:\n", 121 | " #Improvement to existing cell\n", 122 | " self.fitnesses[cell] = fitness\n", 123 | " self.trajectories[cell] = trajectory\n", 124 | " self.states[cell] = state\n", 125 | " self.cell_prob[cell] += 1\n", 126 | " else:\n", 127 | " if winCondition(cell, info):\n", 128 | " if self.best_trajectory is None:\n", 129 | " #First time win\n", 130 | " self.best_trajectory = trajectory\n", 131 | " self.best_fitness = fitness\n", 132 | " elif fitness\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mast\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mliteral_eval\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0;32mimport\u001b[0m \u001b[0mray\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mgym\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mretro\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 16 | 
"\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'ray'" 17 | ] 18 | } 19 | ], 20 | "source": [ 21 | "from ast import literal_eval\n", 22 | "import ray\n", 23 | "import gym\n", 24 | "import retro\n", 25 | "import os\n", 26 | "import numpy as np\n", 27 | "import matplotlib.pyplot as plt\n", 28 | "from markov import sampleMarkov, createMarkov, randMarkov\n", 29 | "from support import getInitial, verifyTrajectory, install_games_from_rom_dir, frameToCell, action_set, trajectoryToGif" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": null, 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "import imageio\n", 39 | "imageio.plugins.freeimage.download()" 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": null, 45 | "metadata": {}, 46 | "outputs": [], 47 | "source": [ 48 | "ray.init()" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": null, 54 | "metadata": {}, 55 | "outputs": [], 56 | "source": [ 57 | "@ray.remote\n", 58 | "class GoExploreActor(object):\n", 59 | " def __init__(self, name):\n", 60 | " self.env = retro.make(name)\n", 61 | " self.env.reset()\n", 62 | " \n", 63 | " def setPolicy(self, policy):\n", 64 | " self.policy_type = policy['type']\n", 65 | " if self.policy_type=='markov':\n", 66 | " self.policy_weights = policy['weights']\n", 67 | " \n", 68 | " def updateCache(self, cells, fitnesses):\n", 69 | " self.cells = cells\n", 70 | " self.fitnesses = fitnesses\n", 71 | " \n", 72 | " def GoExplore(self, state, trajectory, steps):\n", 73 | " results = []\n", 74 | " visits = []\n", 75 | " self.env.em.set_state(state) \n", 76 | " \n", 77 | " recurrent_state = None\n", 78 | " \n", 79 | " if self.policy_type=='markov':\n", 80 | " recurrent_state = np.random.randint(12)\n", 81 | " \n", 82 | " for i in range(steps):\n", 83 | " action = None\n", 84 | " if self.policy_type == 'random':\n", 85 | " action = np.random.randint(12)\n", 86 | " if self.policy_type == 'markov':\n", 87 | " action = sampleMarkov(recurrent_state, self.policy_weights)\n", 88 | " recurrent_state = action\n", 89 | " \n", 90 | " observation, reward, done, info = self.env.step(action_set[action])\n", 91 | " trajectory.append(action)\n", 92 | " \n", 93 | " cell = frameToCell(observation, info)\n", 94 | " \n", 95 | " if cell not in visits:\n", 96 | " visits.append(cell)\n", 97 | " \n", 98 | " if cell in self.cells:\n", 99 | " if len(trajectory) < self.fitnesses[cell]:\n", 100 | " results.append((cell, trajectory.copy(), self.env.em.get_state(), info, i))\n", 101 | " else:\n", 102 | " results.append((cell, trajectory.copy(), self.env.em.get_state(), info, i))\n", 103 | " self.cells.append(cell)\n", 104 | " self.fitnesses[cell] = len(trajectory)\n", 105 | " return (results, visits)\n", 106 | "\n", 107 | " def reset(self):\n", 108 | " self.env.reset()" 109 | ] 110 | }, 111 | { 112 | "cell_type": "code", 113 | "execution_count": null, 114 | "metadata": {}, 115 | "outputs": [], 116 | "source": [ 117 | "install_games_from_rom_dir('roms/')\n", 118 | "\n", 119 | "game = 'SonicTheHedgehog-Genesis'\n", 120 | "stateStr = 'GreenHillZone.Act1.state'\n", 121 | "\n", 122 | "policy = {'type':'markov', 'weights':randMarkov(10,12)}\n", 123 | "\n", 124 | "initialCell, initialState, initialTrajectory, initialFitness = getInitial(game, stateStr)\n", 125 | "\n", 126 | "cells = [initialCell]\n", 127 | "fitnesses = {initialCell:initialFitness}\n", 128 | "cell_prob = {initialCell:1}\n", 129 | "trajectories = {initialCell:initialTrajectory}\n", 130 | "states = 
{initialCell:initialState}\n", 131 | "\n", 132 | "NWorkers = 2\n", 133 | "\n", 134 | "workers = [ GoExploreActor.remote(game) for _ in range(NWorkers)]\n", 135 | "\n", 136 | "for worker in workers:\n", 137 | " worker.setPolicy.remote(policy)\n", 138 | " " 139 | ] 140 | }, 141 | { 142 | "cell_type": "code", 143 | "execution_count": null, 144 | "metadata": {}, 145 | "outputs": [], 146 | "source": [ 147 | "def winCondition(cell, info):\n", 148 | " return info['level_end_bonus'] != 0\n", 149 | "\n", 150 | "best_trajectory = None\n", 151 | "verbose = True" 152 | ] 153 | }, 154 | { 155 | "cell_type": "code", 156 | "execution_count": null, 157 | "metadata": {}, 158 | "outputs": [], 159 | "source": [ 160 | "stat_cells = np.zeros(500)\n", 161 | "stat_avglen = np.zeros(500)\n", 162 | "stat_runlen = np.zeros(500)\n", 163 | "#Optionally add other numbers of steps to this list to alternate shorter episodes\n", 164 | "nsteps = [500]\n", 165 | "\n", 166 | "for i in range(500):\n", 167 | " print('Iteration ' + str(i))\n", 168 | " \n", 169 | " #Pass the full cell table and fitnesses table to each worker\n", 170 | " for worker in workers:\n", 171 | " worker.updateCache.remote(ray.put(cells), ray.put(fitnesses))\n", 172 | " normalized_cell_prob = np.array([cell_prob[c] for c in cells])\n", 173 | " normalized_cell_prob = normalized_cell_prob/normalized_cell_prob.sum()\n", 174 | " goCells = [cells[r] for r in np.random.choice(np.arange(len(cells)),size=(NWorkers) , p = normalized_cell_prob )]\n", 175 | " #Select a cell randomly\n", 176 | " #goCells = [cells[np.random.randint(len(cells))] for i in range(NWorkers)]\n", 177 | " R = ray.get( [ workers[i].GoExplore.remote(ray.put(states[goCells[i]]), ray.put(trajectories[goCells[i]]), nsteps[i%len(nsteps)]) for i in range(NWorkers)] )\n", 178 | " \n", 179 | " #Complie results and update master tables\n", 180 | " results = []\n", 181 | " visits = []\n", 182 | " for r, v in R:\n", 183 | " results+=r\n", 184 | " visits+=v\n", 185 | " \n", 186 | " for cell, trajectory, state, info, iteration in results:\n", 187 | " if cell in cells:\n", 188 | " if len(trajectory) < fitnesses[cell]:\n", 189 | " if verbose:\n", 190 | " print('Shortened trajectory to ' + cell + ' from ' \n", 191 | " + str(fitnesses[cell]) + ' to ' \n", 192 | " + str(len(trajectory)) + ', saving ' \n", 193 | " + str(fitnesses[cell]-len(trajectory)) + ' frames at step '\n", 194 | " + str(iteration))\n", 195 | " fitnesses[cell] = len(trajectory)\n", 196 | " trajectories[cell] = trajectory\n", 197 | " states[cell] = state\n", 198 | " cell_prob[cell] += 1\n", 199 | " else:\n", 200 | " if winCondition(cell, info):\n", 201 | " if best_trajectory is None:\n", 202 | " best_trajectory = trajectory\n", 203 | " print('Improved trajectory: ' + str(len(best_trajectory)) + ' Frames')\n", 204 | " elif len(trajectory)
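markov.py and support.py appear in the file tree, but their contents are not part of this excerpt, so the Markov-chain exploration policy is only visible through its call sites above (policy = {'type':'markov', 'weights':randMarkov(10,12)} and action = sampleMarkov(recurrent_state, self.policy_weights), where the previous action index in 0..11 is carried along as the recurrent state). Below is a hedged sketch of what such a policy amounts to; the helper names and the row-stochastic matrix layout are assumptions made for illustration, not the repository's actual markov.py.

```python
import numpy as np

N_ACTIONS = 12  # the notebooks sample action indices in range(12) and map them through action_set

def random_transition_matrix(n_actions=N_ACTIONS):
    """Hypothetical stand-in for a random policy initialiser: row i gives
    P(next action | previous action == i), so each row sums to 1."""
    m = np.random.random_sample((n_actions, n_actions))
    return m / m.sum(axis=1, keepdims=True)

def sample_next_action(prev_action, transition_matrix):
    """Hypothetical stand-in for sampling the next action given the previous one."""
    return int(np.random.choice(len(transition_matrix), p=transition_matrix[prev_action]))

# Usage mirroring the workers' exploration loop: seed with a random action,
# then let the chain generate a correlated button sequence.
weights = random_transition_matrix()
action = np.random.randint(N_ACTIONS)
rollout = []
for _ in range(10):
    action = sample_next_action(action, weights)
    rollout.append(action)
print(rollout)
```

Compared with independent uniform-random presses, a first-order Markov chain tends to hold or smoothly vary inputs, which in a platformer produces longer runs of consistent movement; this is presumably why it is offered alongside the plain random policy.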