├── Gameplay_1568.gif
├── Gameplay_3973-242-37-0-0.gif
├── Gameplay_FR_1641-305-37-0-0.gif
├── LICENSE
├── README.md
├── asyncGoExplore.ipynb
├── markov.py
├── support.py
└── syncGoExplore.ipynb

/Gameplay_1568.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/R-McHenry/ParallelizedGoExplore/3943e13fdd4ccf771c1a1d71ae622a133838913f/Gameplay_1568.gif
--------------------------------------------------------------------------------
/Gameplay_3973-242-37-0-0.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/R-McHenry/ParallelizedGoExplore/3943e13fdd4ccf771c1a1d71ae622a133838913f/Gameplay_3973-242-37-0-0.gif
--------------------------------------------------------------------------------
/Gameplay_FR_1641-305-37-0-0.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/R-McHenry/ParallelizedGoExplore/3943e13fdd4ccf771c1a1d71ae622a133838913f/Gameplay_FR_1641-305-37-0-0.gif
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2019 R-McHenry
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # SynchronousGoExplore
2 | A first bare-bones parallelized implementation of Go Explore, as described in the Uber Engineering blog post.
3 | 
4 | Currently no deep learning is incorporated into the project. The available exploration policies are random and Markov chain.
5 | 
6 | The notebook syncGoExplore.ipynb demonstrates the use of Go Explore to create a speedrun of a level in a Gym Retro environment using multiple parallel workers.
7 | 
8 | Dependencies:
9 | 
10 | ray (Linux and macOS only)
11 | gym-retro
12 | imageio (also needs the FreeImage plugin)
13 | a ROM file for the game environment
14 | 
15 | Original Reddit discussion with some more information:
16 | https://www.reddit.com/r/MachineLearning/comments/agf43s/d_go_explore_vs_sonic_the_hedgehog/
17 | 
18 | Original blog post by Uber:
19 | https://eng.uber.com/go-explore/
20 | 
21 | To do:
22 | 
23 | Add smarter exploration policies (fast, simple models and deep learning)
24 | 
25 | Asynchronous Go Explore, i.e. allow workers to be constantly playing and updating only when ready/necessary
26 | 
27 | Add iterative deepening
28 | 
29 | Add procedures for experiments to search for good hyperparameters
30 | 
31 | Add the comb operation - sequentially go to each state encountered in a run that reaches the end of the level
32 | 
33 | Some early gameplay:
34 | 
35 | ![](Gameplay_3973-242-37-0-0.gif)
36 | 
37 | A very polished run:
38 | 
39 | ![](Gameplay_1568.gif)
--------------------------------------------------------------------------------
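For orientation before the notebook dumps below: both notebooks implement the same Go Explore bookkeeping. Each frame is mapped to a discrete "cell" by frameToCell(observation, info) from support.py, and the master tables (cells, fitnesses, cell_prob, trajectories, states) remember, for every cell seen so far, the shortest action trajectory that reached it, a saved emulator state so a worker can jump straight back to it, and a selection weight used to decide where to explore next. The following is a minimal, self-contained sketch of that archive logic, assuming nothing beyond NumPy; the class and method names are hypothetical, and this is not the repository's code (the real logic lives in the notebooks and runs through Ray actors driving gym-retro).

```python
import numpy as np

class CellArchive:
    """Illustrative Go-Explore archive: one entry per discretized game state ('cell')."""

    def __init__(self):
        self.fitness = {}      # cell -> length of shortest known trajectory (lower is better)
        self.trajectory = {}   # cell -> action sequence that reaches the cell
        self.state = {}        # cell -> saved emulator state, used to teleport back
        self.weight = {}       # cell -> selection weight, bumped whenever the cell improves

    def update(self, cell, trajectory, state):
        """Insert a new cell, or keep the shorter trajectory for an already-known cell."""
        if cell not in self.fitness or len(trajectory) < self.fitness[cell]:
            self.fitness[cell] = len(trajectory)
            self.trajectory[cell] = list(trajectory)
            self.state[cell] = state
            self.weight[cell] = self.weight.get(cell, 0) + 1

    def sample_cells(self, n):
        """Pick n cells to 'go' back to, with probability proportional to their weights."""
        cells = list(self.weight)
        p = np.array([self.weight[c] for c in cells], dtype=float)
        p /= p.sum()
        picks = np.random.choice(len(cells), size=n, p=p)
        return [cells[i] for i in picks]

# Toy usage: register two cells, then a shorter route to the second one.
archive = CellArchive()
archive.update("start", [], state=b"savestate-0")
archive.update("ledge", [3, 3, 7, 7, 1], state=b"savestate-1")
archive.update("ledge", [3, 7, 1], state=b"savestate-2")  # shorter trajectory wins
print(archive.sample_cells(2))
```

Selection is weighted rather than uniform: a cell's weight is bumped each time a run improves it, so promising frontier cells are revisited more often. This is the behaviour the cell_prob table and the np.random.choice(..., p=normalized_cell_prob) call implement in the notebooks.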
/asyncGoExplore.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 1,
6 | "metadata": {},
7 | "outputs": [
8 | {
9 | "name": "stderr",
10 | "output_type": "stream",
11 | "text": [
12 | "WARNING: Not monitoring node memory since `psutil` is not installed. Install this with `pip install psutil` (or ray[debug]) to enable debugging of memory-related crashes.\n"
13 | ]
14 | }
15 | ],
16 | "source": [
17 | "from ast import literal_eval\n",
18 | "import ray\n",
19 | "import gym\n",
20 | "import retro\n",
21 | "import os\n",
22 | "import numpy as np\n",
23 | "import matplotlib.pyplot as plt\n",
24 | "from markov import sampleMarkov, createMarkov, randMarkov\n",
25 | "from support import getInitial, verifyTrajectory, install_games_from_rom_dir, frameToCell, action_set, trajectoryToGif\n",
26 | "import time"
27 | ]
28 | },
29 | {
30 | "cell_type": "code",
31 | "execution_count": 2,
32 | "metadata": {},
33 | "outputs": [],
34 | "source": [
35 | "import imageio\n",
36 | "imageio.plugins.freeimage.download()"
37 | ]
38 | },
39 | {
40 | "cell_type": "code",
41 | "execution_count": 3,
42 | "metadata": {},
43 | "outputs": [
44 | {
45 | "name": "stderr",
46 | "output_type": "stream",
47 | "text": [
48 | "WARNING: Not updating worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.\n",
49 | "Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-01-23_00-10-20_2192/logs.\n",
50 | "Waiting for redis server at 127.0.0.1:46177 to respond...\n",
51 | "Waiting for redis server at 127.0.0.1:10050 to respond...\n",
52 | "Starting Redis shard with 10.0 GB max memory.\n",
53 | "WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 2147483648 bytes available. This may slow down performance! You may be able to free up space by deleting files in /dev/shm or terminating any running plasma_store_server processes. 
If you are inside a Docker container, you may need to pass an argument with the flag '--shm-size' to 'docker run'.\n", 54 | "Starting the Plasma object store with 12.785916313 GB memory using /tmp.\n", 55 | "\n", 56 | "======================================================================\n", 57 | "View the web UI at http://localhost:8888/notebooks/ray_ui.ipynb?token=c43331ca232bb127a9007cdf2aa763cc09a8e9dc596ccedd\n", 58 | "======================================================================\n", 59 | "\n" 60 | ] 61 | }, 62 | { 63 | "data": { 64 | "text/plain": [ 65 | "{'node_ip_address': None,\n", 66 | " 'redis_address': '172.17.0.6:46177',\n", 67 | " 'object_store_address': '/tmp/ray/session_2019-01-23_00-10-20_2192/sockets/plasma_store',\n", 68 | " 'webui_url': 'http://localhost:8888/notebooks/ray_ui.ipynb?token=c43331ca232bb127a9007cdf2aa763cc09a8e9dc596ccedd',\n", 69 | " 'raylet_socket_name': '/tmp/ray/session_2019-01-23_00-10-20_2192/sockets/raylet'}" 70 | ] 71 | }, 72 | "execution_count": 3, 73 | "metadata": {}, 74 | "output_type": "execute_result" 75 | } 76 | ], 77 | "source": [ 78 | "ray.init()" 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": null, 84 | "metadata": {}, 85 | "outputs": [], 86 | "source": [ 87 | "def winCondition(cell, info):\n", 88 | " return info['level_end_bonus'] != 0\n", 89 | "\n", 90 | "def stopCondition(cell, info, step):\n", 91 | " return step > 500 or winCondition(cell, info) or info['lives'] != 3\n" 92 | ] 93 | }, 94 | { 95 | "cell_type": "code", 96 | "execution_count": 4, 97 | "metadata": {}, 98 | "outputs": [], 99 | "source": [ 100 | "@ray.remote\n", 101 | "class MasterActor(object):\n", 102 | " def __init__(self, \n", 103 | " initialPolicy,\n", 104 | " initialCell,\n", 105 | " initialFitness,\n", 106 | " initialTrajectory,\n", 107 | " initialState):\n", 108 | " self.best_trajectory = None\n", 109 | " self.best_fitness = None\n", 110 | " self.policy = initialPolicy\n", 111 | " self.cells = [initialCell]\n", 112 | " self.fitnesses = {initialCell:initialFitness}\n", 113 | " self.cell_prob = {initialCell:1}\n", 114 | " self.trajectories = {initialCell:initialTrajectory}\n", 115 | " self.states = {initialCell:initialState}\n", 116 | "\n", 117 | " def pushResult(self, cell, trajectory, state, info, step):\n", 118 | " fitness = len(trajectory)\n", 119 | " if cell in self.cells:\n", 120 | " if fitness < self.fitnesses[cell]:\n", 121 | " #Improvement to existing cell\n", 122 | " self.fitnesses[cell] = fitness\n", 123 | " self.trajectories[cell] = trajectory\n", 124 | " self.states[cell] = state\n", 125 | " self.cell_prob[cell] += 1\n", 126 | " else:\n", 127 | " if winCondition(cell, info):\n", 128 | " if self.best_trajectory is None:\n", 129 | " #First time win\n", 130 | " self.best_trajectory = trajectory\n", 131 | " self.best_fitness = fitness\n", 132 | " elif fitness\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mast\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mliteral_eval\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0;32mimport\u001b[0m \u001b[0mray\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mgym\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mretro\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 16 | 
"\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'ray'" 17 | ] 18 | } 19 | ], 20 | "source": [ 21 | "from ast import literal_eval\n", 22 | "import ray\n", 23 | "import gym\n", 24 | "import retro\n", 25 | "import os\n", 26 | "import numpy as np\n", 27 | "import matplotlib.pyplot as plt\n", 28 | "from markov import sampleMarkov, createMarkov, randMarkov\n", 29 | "from support import getInitial, verifyTrajectory, install_games_from_rom_dir, frameToCell, action_set, trajectoryToGif" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": null, 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "import imageio\n", 39 | "imageio.plugins.freeimage.download()" 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": null, 45 | "metadata": {}, 46 | "outputs": [], 47 | "source": [ 48 | "ray.init()" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": null, 54 | "metadata": {}, 55 | "outputs": [], 56 | "source": [ 57 | "@ray.remote\n", 58 | "class GoExploreActor(object):\n", 59 | " def __init__(self, name):\n", 60 | " self.env = retro.make(name)\n", 61 | " self.env.reset()\n", 62 | " \n", 63 | " def setPolicy(self, policy):\n", 64 | " self.policy_type = policy['type']\n", 65 | " if self.policy_type=='markov':\n", 66 | " self.policy_weights = policy['weights']\n", 67 | " \n", 68 | " def updateCache(self, cells, fitnesses):\n", 69 | " self.cells = cells\n", 70 | " self.fitnesses = fitnesses\n", 71 | " \n", 72 | " def GoExplore(self, state, trajectory, steps):\n", 73 | " results = []\n", 74 | " visits = []\n", 75 | " self.env.em.set_state(state) \n", 76 | " \n", 77 | " recurrent_state = None\n", 78 | " \n", 79 | " if self.policy_type=='markov':\n", 80 | " recurrent_state = np.random.randint(12)\n", 81 | " \n", 82 | " for i in range(steps):\n", 83 | " action = None\n", 84 | " if self.policy_type == 'random':\n", 85 | " action = np.random.randint(12)\n", 86 | " if self.policy_type == 'markov':\n", 87 | " action = sampleMarkov(recurrent_state, self.policy_weights)\n", 88 | " recurrent_state = action\n", 89 | " \n", 90 | " observation, reward, done, info = self.env.step(action_set[action])\n", 91 | " trajectory.append(action)\n", 92 | " \n", 93 | " cell = frameToCell(observation, info)\n", 94 | " \n", 95 | " if cell not in visits:\n", 96 | " visits.append(cell)\n", 97 | " \n", 98 | " if cell in self.cells:\n", 99 | " if len(trajectory) < self.fitnesses[cell]:\n", 100 | " results.append((cell, trajectory.copy(), self.env.em.get_state(), info, i))\n", 101 | " else:\n", 102 | " results.append((cell, trajectory.copy(), self.env.em.get_state(), info, i))\n", 103 | " self.cells.append(cell)\n", 104 | " self.fitnesses[cell] = len(trajectory)\n", 105 | " return (results, visits)\n", 106 | "\n", 107 | " def reset(self):\n", 108 | " self.env.reset()" 109 | ] 110 | }, 111 | { 112 | "cell_type": "code", 113 | "execution_count": null, 114 | "metadata": {}, 115 | "outputs": [], 116 | "source": [ 117 | "install_games_from_rom_dir('roms/')\n", 118 | "\n", 119 | "game = 'SonicTheHedgehog-Genesis'\n", 120 | "stateStr = 'GreenHillZone.Act1.state'\n", 121 | "\n", 122 | "policy = {'type':'markov', 'weights':randMarkov(10,12)}\n", 123 | "\n", 124 | "initialCell, initialState, initialTrajectory, initialFitness = getInitial(game, stateStr)\n", 125 | "\n", 126 | "cells = [initialCell]\n", 127 | "fitnesses = {initialCell:initialFitness}\n", 128 | "cell_prob = {initialCell:1}\n", 129 | "trajectories = {initialCell:initialTrajectory}\n", 130 | "states = 
{initialCell:initialState}\n", 131 | "\n", 132 | "NWorkers = 2\n", 133 | "\n", 134 | "workers = [ GoExploreActor.remote(game) for _ in range(NWorkers)]\n", 135 | "\n", 136 | "for worker in workers:\n", 137 | " worker.setPolicy.remote(policy)\n", 138 | " " 139 | ] 140 | }, 141 | { 142 | "cell_type": "code", 143 | "execution_count": null, 144 | "metadata": {}, 145 | "outputs": [], 146 | "source": [ 147 | "def winCondition(cell, info):\n", 148 | " return info['level_end_bonus'] != 0\n", 149 | "\n", 150 | "best_trajectory = None\n", 151 | "verbose = True" 152 | ] 153 | }, 154 | { 155 | "cell_type": "code", 156 | "execution_count": null, 157 | "metadata": {}, 158 | "outputs": [], 159 | "source": [ 160 | "stat_cells = np.zeros(500)\n", 161 | "stat_avglen = np.zeros(500)\n", 162 | "stat_runlen = np.zeros(500)\n", 163 | "#Optionally add other numbers of steps to this list to alternate shorter episodes\n", 164 | "nsteps = [500]\n", 165 | "\n", 166 | "for i in range(500):\n", 167 | " print('Iteration ' + str(i))\n", 168 | " \n", 169 | " #Pass the full cell table and fitnesses table to each worker\n", 170 | " for worker in workers:\n", 171 | " worker.updateCache.remote(ray.put(cells), ray.put(fitnesses))\n", 172 | " normalized_cell_prob = np.array([cell_prob[c] for c in cells])\n", 173 | " normalized_cell_prob = normalized_cell_prob/normalized_cell_prob.sum()\n", 174 | " goCells = [cells[r] for r in np.random.choice(np.arange(len(cells)),size=(NWorkers) , p = normalized_cell_prob )]\n", 175 | " #Select a cell randomly\n", 176 | " #goCells = [cells[np.random.randint(len(cells))] for i in range(NWorkers)]\n", 177 | " R = ray.get( [ workers[i].GoExplore.remote(ray.put(states[goCells[i]]), ray.put(trajectories[goCells[i]]), nsteps[i%len(nsteps)]) for i in range(NWorkers)] )\n", 178 | " \n", 179 | " #Complie results and update master tables\n", 180 | " results = []\n", 181 | " visits = []\n", 182 | " for r, v in R:\n", 183 | " results+=r\n", 184 | " visits+=v\n", 185 | " \n", 186 | " for cell, trajectory, state, info, iteration in results:\n", 187 | " if cell in cells:\n", 188 | " if len(trajectory) < fitnesses[cell]:\n", 189 | " if verbose:\n", 190 | " print('Shortened trajectory to ' + cell + ' from ' \n", 191 | " + str(fitnesses[cell]) + ' to ' \n", 192 | " + str(len(trajectory)) + ', saving ' \n", 193 | " + str(fitnesses[cell]-len(trajectory)) + ' frames at step '\n", 194 | " + str(iteration))\n", 195 | " fitnesses[cell] = len(trajectory)\n", 196 | " trajectories[cell] = trajectory\n", 197 | " states[cell] = state\n", 198 | " cell_prob[cell] += 1\n", 199 | " else:\n", 200 | " if winCondition(cell, info):\n", 201 | " if best_trajectory is None:\n", 202 | " best_trajectory = trajectory\n", 203 | " print('Improved trajectory: ' + str(len(best_trajectory)) + ' Frames')\n", 204 | " elif len(trajectory)
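markov.py and support.py appear in the file tree, but their contents are not part of this excerpt, so the Markov-chain exploration policy is only visible through its call sites above (policy = {'type':'markov', 'weights':randMarkov(10,12)} and action = sampleMarkov(recurrent_state, self.policy_weights), where the previous action index in 0..11 is carried along as the recurrent state). Below is a hedged sketch of what such a policy amounts to; the helper names and the row-stochastic matrix layout are assumptions made for illustration, not the repository's actual markov.py.

```python
import numpy as np

N_ACTIONS = 12  # the notebooks sample action indices in range(12) and map them through action_set

def random_transition_matrix(n_actions=N_ACTIONS):
    """Hypothetical stand-in for a random policy initialiser: row i gives
    P(next action | previous action == i), so each row sums to 1."""
    m = np.random.random_sample((n_actions, n_actions))
    return m / m.sum(axis=1, keepdims=True)

def sample_next_action(prev_action, transition_matrix):
    """Hypothetical stand-in for sampling the next action given the previous one."""
    return int(np.random.choice(len(transition_matrix), p=transition_matrix[prev_action]))

# Usage mirroring the workers' exploration loop: seed with a random action,
# then let the chain generate a correlated button sequence.
weights = random_transition_matrix()
action = np.random.randint(N_ACTIONS)
rollout = []
for _ in range(10):
    action = sample_next_action(action, weights)
    rollout.append(action)
print(rollout)
```

Compared with independent uniform-random presses, a first-order Markov chain tends to hold or smoothly vary inputs, which in a platformer produces longer runs of consistent movement; this is presumably why it is offered alongside the plain random policy.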