├── Sample-based Learning Methods
│   ├── README.md
│   ├── Week 4
│   │   └── 04_Q-Learning and Expected Sarsa.ipynb
│   ├── Week 2
│   │   └── 03a_Blackjack.ipynb
│   └── Week 3
│       └── 03_Policy Evaluation with Temporal Difference Learning.ipynb
├── README.md
└── Prediction and Control with Function Approximation
    └── Week 3
        └── 08_Function Approximation and Control.ipynb

/Sample-based Learning Methods/README.md:
--------------------------------------------------------------------------------
1 | # Sample-based Learning Methods
2 | 
3 | This directory contains the code for the following algorithms:
4 | 1. [Monte Carlo Methods - Blackjack](https://github.com/adithyaprem/Reinforcement-Learning-Specialization/tree/master/Sample-based%20Learning%20Methods/Week%202)
5 | 2. [Temporal Difference Learning](https://github.com/adithyaprem/Reinforcement-Learning-Specialization/tree/master/Sample-based%20Learning%20Methods/Week%203)
6 | 3. [Q-Learning, SARSA and Expected SARSA](https://github.com/adithyaprem/Reinforcement-Learning-Specialization/tree/master/Sample-based%20Learning%20Methods/Week%204)
7 | 4. [Planning using the Dyna Algorithm](https://github.com/adithyaprem/Reinforcement-Learning-Specialization/tree/master/Sample-based%20Learning%20Methods/Week%205)
8 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Reinforcement Learning Specialization
2 | This repository contains the assignments completed as part of the Reinforcement Learning Specialization offered by the University of Alberta on Coursera.
3 | 
4 | You can view the course by clicking on [this link](https://www.coursera.org/specializations/reinforcement-learning).
5 | 
6 | This repo consists of the following implementations:
7 | 1. [K-Armed Bandits](Fundamentals%20of%20Reinforcement%20Learning/Week%201/Bandits-Assignment1.ipynb)
8 | 2. [Dynamic Programming for Policy Iteration and Improvement](Fundamentals%20of%20Reinforcement%20Learning/Week%204)
9 | 3. [Temporal Difference Learning](Sample-based%20Learning%20Methods/Week%203)
10 | 4. [Q-Learning and SARSA](Sample-based%20Learning%20Methods/Week%204)
11 | 5. [Dyna-Q and Dyna-Q+](Sample-based%20Learning%20Methods/Week%205)
12 | 6. [TD with State Aggregation](Prediction%20and%20Control%20with%20Function%20Approximation/Week%201)
13 | 7. [Semi-Gradient TD](Prediction%20and%20Control%20with%20Function%20Approximation/Week%202)
14 | 8. [Function Approximation and Control using Tile Coding-SARSA](Prediction%20and%20Control%20with%20Function%20Approximation/Week%203)
15 | 9. [Actor-Critic using TD(0)](Prediction%20and%20Control%20with%20Function%20Approximation/Week%204)
16 | 
17 | 
18 | References:
19 | 1. [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/RLbook2018.pdf)
20 | 2. [Lectures by David Silver](https://www.youtube.com/playlist?list=PLqYmG7hTraZDM-OYHWgPebj2MfCFzFObQ)
21 | 3. [Advanced Deep Learning and Reinforcement Learning](https://www.youtube.com/playlist?list=PLqYmG7hTraZDNJre23vqCGIVpfZ_K2RZs)
22 | 4. [List of RL algorithms and their implementations](https://github.com/dennybritz/reinforcement-learning)
23 | 5. 
[Stanford CS234: Reinforcement Learning](http://web.stanford.edu/class/cs234/index.html) 24 | -------------------------------------------------------------------------------- /Prediction and Control with Function Approximation/Week 3/08_Function Approximation and Control.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "deletable": false, 7 | "editable": false, 8 | "nbgrader": { 9 | "checksum": "bfeb29c19452c8f782e787e746422cae", 10 | "grade": false, 11 | "grade_id": "cell-6e2d08d53786c408", 12 | "locked": true, 13 | "schema_version": 1, 14 | "solution": false 15 | } 16 | }, 17 | "source": [ 18 | "# Assignment 3: Function Approximation and Control" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "deletable": false, 25 | "editable": false, 26 | "nbgrader": { 27 | "checksum": "bc8f8a3eff0506f5f2adbb15d865a6f8", 28 | "grade": false, 29 | "grade_id": "cell-c0aa3738f85c68a6", 30 | "locked": true, 31 | "schema_version": 1, 32 | "solution": false 33 | } 34 | }, 35 | "source": [ 36 | "Welcome to Assignment 3. In this notebook you will learn how to:\n", 37 | "- Use function approximation in the control setting\n", 38 | "- Implement the Sarsa algorithm using tile coding\n", 39 | "- Compare three settings for tile coding to see their effect on our agent\n", 40 | "\n", 41 | "As with the rest of the notebooks do not import additional libraries or adjust grading cells as this will break the grader.\n", 42 | "\n", 43 | "MAKE SURE TO RUN ALL OF THE CELLS SO THE GRADER GETS THE OUTPUT IT NEEDS\n" 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": null, 49 | "metadata": { 50 | "deletable": false, 51 | "editable": false, 52 | "nbgrader": { 53 | "checksum": "9d89a1172d798eea73c3f94e84b19a80", 54 | "grade": false, 55 | "grade_id": "cell-3e99c32936b1780b", 56 | "locked": true, 57 | "schema_version": 1, 58 | "solution": false 59 | } 60 | }, 61 | "outputs": [], 62 | "source": [ 63 | "# Import Necessary Libraries\n", 64 | "import numpy as np\n", 65 | "import matplotlib.pyplot as plt\n", 66 | "import tiles3 as tc\n", 67 | "from rl_glue import RLGlue\n", 68 | "from agent import BaseAgent\n", 69 | "from utils import argmax\n", 70 | "import mountaincar_env\n", 71 | "import time" 72 | ] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "metadata": { 77 | "deletable": false, 78 | "editable": false, 79 | "nbgrader": { 80 | "checksum": "b4b89842d59f53a0397404188fea8a68", 81 | "grade": false, 82 | "grade_id": "cell-0c184d76dab0d615", 83 | "locked": true, 84 | "schema_version": 1, 85 | "solution": false 86 | } 87 | }, 88 | "source": [ 89 | "In the above cell, we import the libraries we need for this assignment. You may have noticed that we import mountaincar_env. This is the __Mountain Car Task__ introduced in [Section 10.1 of the textbook](http://www.incompleteideas.net/book/RLbook2018.pdf#page=267). The task is for an under powered car to make it to the top of a hill:\n", 90 | "![Mountain Car](mountaincar.png \"Mountain Car\")\n", 91 | "The car is under-powered so the agent needs to learn to rock back and forth to get enough momentum to reach the goal. At each time step the agent receives from the environment its current velocity (a float between -0.07 and 0.07), and it's current position (a float between -1.2 and 0.5). Because our state is continuous there are a potentially infinite number of states that our agent could be in. 
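A purely tabular agent would have to chop these two continuous inputs into a finite grid on its own. As a rough, standalone illustration (this is not part of the assignment code, and the 8x8 grid is an arbitrary choice), a naive single-grid discretization would look like this:

```python
POSITION_MIN, POSITION_MAX = -1.2, 0.5
VELOCITY_MIN, VELOCITY_MAX = -0.07, 0.07

def naive_state_index(position, velocity, bins=8):
    """Map a continuous (position, velocity) pair to one cell of a bins x bins grid.

    Every state that lands in the same cell is treated as identical, and nothing
    learned in one cell carries over to its neighbours -- the coarse aggregation
    that tile coding improves on by overlaying several offset grids.
    """
    p = int((position - POSITION_MIN) / (POSITION_MAX - POSITION_MIN) * bins)
    v = int((velocity - VELOCITY_MIN) / (VELOCITY_MAX - VELOCITY_MIN) * bins)
    p, v = min(p, bins - 1), min(v, bins - 1)  # keep the upper boundary inside the grid
    return p * bins + v

print(naive_state_index(-0.5, 0.01))  # -> 28, one of only 64 distinguishable states
```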
We need a function approximation method to help the agent deal with this. In this notebook we will use tile coding. We provide a tile coding implementation for you to use, imported above with tiles3." 92 | ] 93 | }, 94 | { 95 | "cell_type": "markdown", 96 | "metadata": { 97 | "deletable": false, 98 | "editable": false, 99 | "nbgrader": { 100 | "checksum": "ec2c6de4858e1844bbba0fb965fb1697", 101 | "grade": false, 102 | "grade_id": "cell-abeb1cf4c7319766", 103 | "locked": true, 104 | "schema_version": 1, 105 | "solution": false 106 | } 107 | }, 108 | "source": [ 109 | "## Section 0: Tile Coding Helper Function" 110 | ] 111 | }, 112 | { 113 | "cell_type": "markdown", 114 | "metadata": { 115 | "deletable": false, 116 | "editable": false, 117 | "nbgrader": { 118 | "checksum": "9948acdae6f3a64b7c81cf049ebb80a2", 119 | "grade": false, 120 | "grade_id": "cell-6b462acd2d7d2b79", 121 | "locked": true, 122 | "schema_version": 1, 123 | "solution": false 124 | } 125 | }, 126 | "source": [ 127 | "To begin we are going to build a tile coding class for our Sarsa agent that will make it easier to make calls to our tile coder." 128 | ] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": { 133 | "deletable": false, 134 | "editable": false, 135 | "nbgrader": { 136 | "checksum": "3ead2ccf72a2be17e6781d5b76e84219", 137 | "grade": false, 138 | "grade_id": "cell-85d6085e6342eb1c", 139 | "locked": true, 140 | "schema_version": 1, 141 | "solution": false 142 | } 143 | }, 144 | "source": [ 145 | "### Tile Coding Function" 146 | ] 147 | }, 148 | { 149 | "cell_type": "markdown", 150 | "metadata": { 151 | "deletable": false, 152 | "editable": false, 153 | "nbgrader": { 154 | "checksum": "50b0abf997f87549fd27944b04e5b57c", 155 | "grade": false, 156 | "grade_id": "cell-2067725dbeb55d8b", 157 | "locked": true, 158 | "schema_version": 1, 159 | "solution": false 160 | } 161 | }, 162 | "source": [ 163 | "Tile coding is introduced in [Section 9.5.4 of the textbook](http://www.incompleteideas.net/book/RLbook2018.pdf#page=239) of the textbook as a way to create features that can both provide good generalization and discrimination. It consists of multiple overlapping tilings, where each tiling is a partitioning of the space into tiles.\n", 164 | "![Tile Coding](tilecoding.png \"Tile Coding\")" 165 | ] 166 | }, 167 | { 168 | "cell_type": "markdown", 169 | "metadata": { 170 | "deletable": false, 171 | "editable": false, 172 | "nbgrader": { 173 | "checksum": "810ec33572588560507256351c37be9d", 174 | "grade": false, 175 | "grade_id": "cell-7fc517d0f2a4618e", 176 | "locked": true, 177 | "schema_version": 1, 178 | "solution": false 179 | } 180 | }, 181 | "source": [ 182 | "To help keep our agent code clean we are going to make a function specific for tile coding for our Mountain Car environment. To help we are going to use the Tiles3 library. This is a Python 3 implementation of the tile coder. To start take a look at the documentation: [Tiles3 documentation](http://incompleteideas.net/tiles/tiles3.html)\n", 183 | "To get the tile coder working we need to implement a few pieces:\n", 184 | "- First: create an index hash table - this is done for you in the init function using tc.IHT.\n", 185 | "- Second is to scale the inputs for the tile coder based on the number of tiles and the range of values each input could take. The tile coder needs to take in a number in range [0, 1], or scaled to be [0, 1] * num_tiles. 
For more on this refer to the [Tiles3 documentation](http://incompleteideas.net/tiles/tiles3.html).\n", 186 | "- Finally we call tc.tiles to get the active tiles back." 187 | ] 188 | }, 189 | { 190 | "cell_type": "code", 191 | "execution_count": null, 192 | "metadata": { 193 | "deletable": false, 194 | "nbgrader": { 195 | "checksum": "960c0a3aaa33badde927b45c7a4ebb96", 196 | "grade": false, 197 | "grade_id": "cell-12821f87b5c813b2", 198 | "locked": false, 199 | "schema_version": 1, 200 | "solution": true 201 | } 202 | }, 203 | "outputs": [], 204 | "source": [ 205 | "# Tile Coding Function [Graded]\n", 206 | "class MountainCarTileCoder:\n", 207 | " def __init__(self, iht_size=4096, num_tilings=8, num_tiles=8):\n", 208 | " \"\"\"\n", 209 | " Initializes the MountainCar Tile Coder\n", 210 | " Initializers:\n", 211 | " iht_size -- int, the size of the index hash table, typically a power of 2\n", 212 | " num_tilings -- int, the number of tilings\n", 213 | " num_tiles -- int, the number of tiles. Here both the width and height of the\n", 214 | " tile coder are the same\n", 215 | " Class Variables:\n", 216 | " self.iht -- tc.IHT, the index hash table that the tile coder will use\n", 217 | " self.num_tilings -- int, the number of tilings the tile coder will use\n", 218 | " self.num_tiles -- int, the number of tiles the tile coder will use\n", 219 | " \"\"\"\n", 220 | " self.iht = tc.IHT(iht_size)\n", 221 | " self.num_tilings = num_tilings\n", 222 | " self.num_tiles = num_tiles\n", 223 | " \n", 224 | " def get_tiles(self, position, velocity):\n", 225 | " \"\"\"\n", 226 | " Takes in a position and velocity from the mountaincar environment\n", 227 | " and returns a numpy array of active tiles.\n", 228 | " \n", 229 | " Arguments:\n", 230 | " position -- float, the position of the agent between -1.2 and 0.5\n", 231 | " velocity -- float, the velocity of the agent between -0.07 and 0.07\n", 232 | " returns:\n", 233 | " tiles - np.array, active tiles\n", 234 | " \"\"\"\n", 235 | " # Set the max and min of position and velocity to scale the input\n", 236 | " # POSITION_MIN\n", 237 | " # POSITION_MAX\n", 238 | " # VELOCITY_MIN\n", 239 | " # VELOCITY_MAX\n", 240 | " ### START CODE HERE ###\n", 241 | " \n", 242 | " ### END CODE HERE ###\n", 243 | " \n", 244 | " # Use the ranges above and self.num_tiles to set position_scale and velocity_scale\n", 245 | " # position_scale = number of tiles / position range\n", 246 | " # velocity_scale = number of tiles / velocity range\n", 247 | " \n", 248 | " # Scale position and velocity by multiplying the inputs of each by their scale\n", 249 | " \n", 250 | " ### START CODE HERE ###\n", 251 | " \n", 252 | " ### END CODE HERE ###\n", 253 | " \n", 254 | " # get the tiles using tc.tiles, with self.iht, self.num_tilings and [scaled position, scaled velocity]\n", 255 | " # nothing to implment here\n", 256 | " tiles = tc.tiles(self.iht, self.num_tilings, [position * position_scale, \n", 257 | " velocity * velocity_scale])\n", 258 | " \n", 259 | " return np.array(tiles)" 260 | ] 261 | }, 262 | { 263 | "cell_type": "code", 264 | "execution_count": null, 265 | "metadata": { 266 | "deletable": false, 267 | "editable": false, 268 | "nbgrader": { 269 | "checksum": "2f9114c9e8fe820a8d131b87e4c589a6", 270 | "grade": false, 271 | "grade_id": "cell-2c8b58b5dc63cc52", 272 | "locked": true, 273 | "schema_version": 1, 274 | "solution": false 275 | } 276 | }, 277 | "outputs": [], 278 | "source": [ 279 | "# [DO NOT CHANGE]\n", 280 | "tests = [[-1.0, 0.01], [0.1, -0.01], [0.2, -0.05], [-1.0, 
0.011], [0.2, -0.05]]\n", 281 | "\n", 282 | "mctc = MountainCarTileCoder(iht_size=1024, num_tilings=8, num_tiles=8)\n", 283 | "\n", 284 | "t = []\n", 285 | "for test in tests:\n", 286 | " position, velocity = test\n", 287 | " tiles = mctc.get_tiles(position=position, velocity=velocity)\n", 288 | " t.append(tiles)\n", 289 | "\n", 290 | "print(\"Your results:\")\n", 291 | "for tiles in t:\n", 292 | " print(tiles)\n", 293 | "\n", 294 | "print()\n", 295 | "print(\"Expected results:\")\n", 296 | "expected = \"\"\"[0 1 2 3 4 5 6 7]\n", 297 | "[ 8 9 10 11 12 13 14 15]\n", 298 | "[16 17 18 19 20 21 22 23]\n", 299 | "[ 0 24 2 3 4 5 6 7]\n", 300 | "[16 17 18 19 20 21 22 23]\n", 301 | "\"\"\"\n", 302 | "print(expected)\n", 303 | "\n", 304 | "np.random.seed(1)\n", 305 | "mctc_test = MountainCarTileCoder(iht_size=1024, num_tilings=8, num_tiles=8)\n", 306 | "test = [mctc_test.get_tiles(np.random.uniform(-1.2, 0.5), np.random.uniform(-0.07, 0.07)) for _ in range(10)]\n", 307 | "np.save(\"tiles_test\", test)" 308 | ] 309 | }, 310 | { 311 | "cell_type": "markdown", 312 | "metadata": { 313 | "deletable": false, 314 | "editable": false, 315 | "nbgrader": { 316 | "checksum": "fdaa393fe01b7bebe2ff44aeabc1a233", 317 | "grade": false, 318 | "grade_id": "cell-b647e597fb7ee2e4", 319 | "locked": true, 320 | "schema_version": 1, 321 | "solution": false 322 | } 323 | }, 324 | "source": [ 325 | "## Section 1: Sarsa Agent" 326 | ] 327 | }, 328 | { 329 | "cell_type": "markdown", 330 | "metadata": { 331 | "deletable": false, 332 | "editable": false, 333 | "nbgrader": { 334 | "checksum": "d8b0c1c1cab5bd89cc9479e3faad7539", 335 | "grade": false, 336 | "grade_id": "cell-bc36b07292c6751a", 337 | "locked": true, 338 | "schema_version": 1, 339 | "solution": false 340 | } 341 | }, 342 | "source": [ 343 | "We are now going to use the functions that we just created to implement the Sarsa algorithm. Recall from class that Sarsa stands for State, Action, Reward, State, Action.\n", 344 | "\n", 345 | "For this case we have given you an argmax function similar to what you wrote back in Course 1 Assignment 1. Recall, this is different than the argmax function that is used by numpy, which returns the first index of a maximum value. We want our argmax function to arbitrarily break ties, which is what the imported argmax function does. The given argmax function takes in an array of values and returns an int of the chosen action: \n", 346 | "argmax(action values)\n", 347 | "\n", 348 | "There are multiple ways that we can deal with actions for the tile coder. Here we are going to use one simple method - make the size of the weight vector equal to (iht_size, num_actions). This will give us one weight vector for each action and one weight for each tile.\n", 349 | "\n", 350 | "Use the above function to help fill in select_action, agent_start, agent_step, and agent_end.\n", 351 | "\n", 352 | "Hints:\n", 353 | "\n", 354 | "1) The tile coder returns a list of active indexes (e.g. [1, 12, 22]). You can index a numpy array using an array of values - this will return an array of the values at each of those indices. 
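A quick standalone illustration of that kind of indexing (plain numpy, nothing assignment-specific):

```python
import numpy as np

w_row = np.array([0.0, 0.5, 1.0, 1.5, 2.0])  # weights for one action
active_tiles = np.array([1, 3, 4])           # indices like those the tile coder returns

print(w_row[active_tiles])         # [0.5 1.5 2. ]
print(w_row[active_tiles].sum())   # 4.0 -- summing the active weights gives the value
```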
So in order to get the value of a state we can index our weight vector using the action and the array of tiles that the tile coder returns:\n", 355 | "\n", 356 | "```self.w[action][active_tiles]```\n", 357 | "\n", 358 | "This will give us an array of values, one for each active tile, and we sum the result to get the value of that state-action pair.\n", 359 | "\n", 360 | "2) In the case of a binary feature vector (such as the tile coder), the derivative is 1 at each of the active tiles, and zero otherwise." 361 | ] 362 | }, 363 | { 364 | "cell_type": "code", 365 | "execution_count": null, 366 | "metadata": { 367 | "deletable": false, 368 | "nbgrader": { 369 | "checksum": "c899864c5d5422ec74315ab7ec2ae092", 370 | "grade": false, 371 | "grade_id": "cell-f01df3aacf5a9d4b", 372 | "locked": false, 373 | "schema_version": 1, 374 | "solution": true 375 | } 376 | }, 377 | "outputs": [], 378 | "source": [ 379 | "# SARSA\n", 380 | "class SarsaAgent(BaseAgent):\n", 381 | " \"\"\"\n", 382 | " Initialization of Sarsa Agent. All values are set to None so they can\n", 383 | " be initialized in the agent_init method.\n", 384 | " \"\"\"\n", 385 | " def __init__(self):\n", 386 | " self.last_action = None\n", 387 | " self.last_state = None\n", 388 | " self.epsilon = None\n", 389 | " self.gamma = None\n", 390 | " self.iht_size = None\n", 391 | " self.w = None\n", 392 | " self.alpha = None\n", 393 | " self.num_tilings = None\n", 394 | " self.num_tiles = None\n", 395 | " self.mctc = None\n", 396 | " self.initial_weights = None\n", 397 | " self.num_actions = None\n", 398 | " self.previous_tiles = None\n", 399 | "\n", 400 | " def agent_init(self, agent_info={}):\n", 401 | " \"\"\"Setup for the agent called when the experiment first starts.\"\"\"\n", 402 | " self.num_tilings = agent_info.get(\"num_tilings\", 8)\n", 403 | " self.num_tiles = agent_info.get(\"num_tiles\", 8)\n", 404 | " self.iht_size = agent_info.get(\"iht_size\", 4096)\n", 405 | " self.epsilon = agent_info.get(\"epsilon\", 0.0)\n", 406 | " self.gamma = agent_info.get(\"gamma\", 1.0)\n", 407 | " self.alpha = agent_info.get(\"alpha\", 0.5) / self.num_tilings\n", 408 | " self.initial_weights = agent_info.get(\"initial_weights\", 0.0)\n", 409 | " self.num_actions = agent_info.get(\"num_actions\", 3)\n", 410 | " \n", 411 | " # We initialize self.w to three times the iht_size. 
Recall this is because\n", 412 | " # we need to have one set of weights for each action.\n", 413 | " self.w = np.ones((self.num_actions, self.iht_size)) * self.initial_weights\n", 414 | " \n", 415 | " # We initialize self.mctc to the mountaincar verions of the \n", 416 | " # tile coder that we created\n", 417 | " self.tc = MountainCarTileCoder(iht_size=self.iht_size, \n", 418 | " num_tilings=self.num_tilings, \n", 419 | " num_tiles=self.num_tiles)\n", 420 | "\n", 421 | " def select_action(self, tiles):\n", 422 | " \"\"\"\n", 423 | " Selects an action using epsilon greedy\n", 424 | " Args:\n", 425 | " tiles - np.array, an array of active tiles\n", 426 | " Returns:\n", 427 | " (chosen_action, action_value) - (int, float), tuple of the chosen action\n", 428 | " and it's value\n", 429 | " \"\"\"\n", 430 | " action_values = []\n", 431 | " chosen_action = None\n", 432 | " \n", 433 | " # First loop through the weights of each action and populate action_values\n", 434 | " # with the action value for each action and tiles instance\n", 435 | " \n", 436 | " # Use np.random.random to decide if an exploritory action should be taken\n", 437 | " # and set chosen_action to a random action if it is\n", 438 | " # Otherwise choose the greedy action using the given argmax \n", 439 | " # function and the action values (don't use numpy's armax)\n", 440 | " \n", 441 | " ### START CODE HERE ###\n", 442 | " \n", 443 | " ### END CODE HERE ###\n", 444 | " \n", 445 | " return chosen_action, action_values[chosen_action]\n", 446 | " \n", 447 | " def agent_start(self, state):\n", 448 | " \"\"\"The first method called when the experiment starts, called after\n", 449 | " the environment starts.\n", 450 | " Args:\n", 451 | " state (Numpy array): the state observation from the\n", 452 | " environment's evn_start function.\n", 453 | " Returns:\n", 454 | " The first action the agent takes.\n", 455 | " \"\"\"\n", 456 | " position, velocity = state\n", 457 | " \n", 458 | " # Use self.tc to set active_tiles using position and velocity\n", 459 | " # set current_action to the epsilon greedy chosen action using\n", 460 | " # the select_action function above with the active tiles\n", 461 | " \n", 462 | " ### START CODE HERE ###\n", 463 | " \n", 464 | " ### END CODE HERE ###\n", 465 | " \n", 466 | " self.last_action = current_action\n", 467 | " self.previous_tiles = np.copy(active_tiles)\n", 468 | " return self.last_action\n", 469 | "\n", 470 | " def agent_step(self, reward, state):\n", 471 | " \"\"\"A step taken by the agent.\n", 472 | " Args:\n", 473 | " reward (float): the reward received for taking the last action taken\n", 474 | " state (Numpy array): the state observation from the\n", 475 | " environment's step based, where the agent ended up after the\n", 476 | " last step\n", 477 | " Returns:\n", 478 | " The action the agent is taking.\n", 479 | " \"\"\"\n", 480 | " # choose the action here\n", 481 | " position, velocity = state\n", 482 | " \n", 483 | " # Use self.tc to set active_tiles using position and velocity\n", 484 | " # set current_action and action_value to the epsilon greedy chosen action using\n", 485 | " # the select_action function above with the active tiles\n", 486 | " \n", 487 | " # Update self.w at self.previous_tiles and self.previous action\n", 488 | " # using the reward, action_value, self.gamma, self.w,\n", 489 | " # self.alpha, and the Sarsa update from the textbook\n", 490 | " \n", 491 | " ### START CODE HERE ###\n", 492 | " \n", 493 | " ### END CODE HERE ###\n", 494 | " \n", 495 | " self.last_action = 
current_action\n", 496 | " self.previous_tiles = np.copy(active_tiles)\n", 497 | " return self.last_action\n", 498 | "\n", 499 | " def agent_end(self, reward):\n", 500 | " \"\"\"Run when the agent terminates.\n", 501 | " Args:\n", 502 | " reward (float): the reward the agent received for entering the\n", 503 | " terminal state.\n", 504 | " \"\"\"\n", 505 | " # Update self.w at self.previous_tiles and self.previous action\n", 506 | " # using the reward, self.gamma, self.w,\n", 507 | " # self.alpha, and the Sarsa update from the textbook\n", 508 | " # Hint - there is no action_value used here because this is the end\n", 509 | " # of the episode.\n", 510 | " \n", 511 | " ### START CODE HERE ###\n", 512 | " \n", 513 | " ### END CODE HERE ###\n", 514 | " \n", 515 | " def agent_cleanup(self):\n", 516 | " \"\"\"Cleanup done after the agent ends.\"\"\"\n", 517 | " pass\n", 518 | "\n", 519 | " def agent_message(self, message):\n", 520 | " \"\"\"A function used to pass information from the agent to the experiment.\n", 521 | " Args:\n", 522 | " message: The message passed to the agent.\n", 523 | " Returns:\n", 524 | " The response (or answer) to the message.\n", 525 | " \"\"\"\n", 526 | " pass" 527 | ] 528 | }, 529 | { 530 | "cell_type": "code", 531 | "execution_count": null, 532 | "metadata": { 533 | "deletable": false, 534 | "editable": false, 535 | "nbgrader": { 536 | "checksum": "cd2e581d58b48f8be57ecd190b2b6e6a", 537 | "grade": false, 538 | "grade_id": "cell-4363d332fa1b0f96", 539 | "locked": true, 540 | "schema_version": 1, 541 | "solution": false 542 | } 543 | }, 544 | "outputs": [], 545 | "source": [ 546 | "# Test Epsilon Greedy Function [DO NOT CHANGE]\n", 547 | "agent = SarsaAgent()\n", 548 | "agent.agent_init({\"epsilon\": 0.1})\n", 549 | "agent.w = np.array([np.array([1, 2, 3]), np.array([4, 5, 6]), np.array([7, 8, 9])])\n", 550 | "\n", 551 | "total = 0\n", 552 | "for i in range(1000):\n", 553 | " chosen_action, action_value = agent.select_action(np.array([0,1]))\n", 554 | " total += action_value\n", 555 | "print(total)\n", 556 | "assert total < 15000, \"Check that you are not always choosing the best action\"\n", 557 | "\n", 558 | "np.save(\"epsilon_test\", total)\n", 559 | "\n", 560 | "agent = SarsaAgent()\n", 561 | "agent.agent_init({\"epsilon\": 0.0})\n", 562 | "agent.w = np.array([np.array([1, 2, 3]), np.array([4, 5, 6]), np.array([7, 8, 9])])\n", 563 | "\n", 564 | "chosen_action, action_value = agent.select_action(np.array([0,1]))\n", 565 | "print(\"Expected value\")\n", 566 | "print(\"(2, 15)\")\n", 567 | "\n", 568 | "print(\"Your value\")\n", 569 | "print((chosen_action, action_value))\n", 570 | "\n", 571 | "np.save(\"egreedy_test\", (chosen_action, action_value))" 572 | ] 573 | }, 574 | { 575 | "cell_type": "code", 576 | "execution_count": null, 577 | "metadata": { 578 | "deletable": false, 579 | "editable": false, 580 | "nbgrader": { 581 | "checksum": "8082fe078300cb738d2dd10f590a3535", 582 | "grade": false, 583 | "grade_id": "cell-aa40398d933320db", 584 | "locked": true, 585 | "schema_version": 1, 586 | "solution": false 587 | } 588 | }, 589 | "outputs": [], 590 | "source": [ 591 | "# Test Sarsa Agent [DO NOT CHANGE]\n", 592 | "num_runs = 10\n", 593 | "num_episodes = 50\n", 594 | "env_info = {\"num_tiles\": 8, \"num_tilings\": 8}\n", 595 | "agent_info = {}\n", 596 | "all_steps = []\n", 597 | "\n", 598 | "agent = SarsaAgent\n", 599 | "env = mountaincar_env.Environment\n", 600 | "start = time.time()\n", 601 | "\n", 602 | "for run in range(num_runs):\n", 603 | " if run % 5 == 
0:\n", 604 | "        print(\"RUN: {}\".format(run))\n", 605 | "\n", 606 | "    rl_glue = RLGlue(env, agent)\n", 607 | "    rl_glue.rl_init(agent_info, env_info)\n", 608 | "    steps_per_episode = []\n", 609 | "\n", 610 | "    for episode in range(num_episodes):\n", 611 | "        rl_glue.rl_episode(15000)\n", 612 | "        steps_per_episode.append(rl_glue.num_steps)\n", 613 | "\n", 614 | "    all_steps.append(np.array(steps_per_episode))\n", 615 | "\n", 616 | "print(\"Run time: {}\".format(time.time() - start))\n", 617 | "\n", 618 | "plt.plot(np.mean(np.array(all_steps), axis=0))\n", 619 | "np.save(\"sarsa_test\", np.array(all_steps))" 620 | ] 621 | }, 622 | { 623 | "cell_type": "markdown", 624 | "metadata": { 625 | "deletable": false, 626 | "editable": false, 627 | "nbgrader": { 628 | "checksum": "76a454920dcb6dd3d10d811a2f4dfa21", 629 | "grade": false, 630 | "grade_id": "cell-25c387a04d2d325f", 631 | "locked": true, 632 | "schema_version": 1, 633 | "solution": false 634 | } 635 | }, 636 | "source": [ 637 | "The learning curve of your agent should look similar to ours, though it will not look exactly the same. If there are some spiky points, that is okay. Due to stochasticity, a few episodes may have taken much longer, causing some spikes in the plot. The trend of the line should be similar, though, generally decreasing to about 200 steps per episode.\n", 638 | "![alt text](sarsa_agent_initial.png \"Logo Title Text 1\")" 639 | ] 640 | }, 641 | { 642 | "cell_type": "markdown", 643 | "metadata": { 644 | "deletable": false, 645 | "editable": false, 646 | "nbgrader": { 647 | "checksum": "59a8c32add9b94d547093defbc0958cf", 648 | "grade": false, 649 | "grade_id": "cell-b8423d7852d12941", 650 | "locked": true, 651 | "schema_version": 1, 652 | "solution": false 653 | } 654 | }, 655 | "source": [ 656 | "This result used 8 tilings with 8x8 tiles on each. Let's see if we can do better, and what different tilings look like. We will also test 2 tilings of 16x16 tiles and 32 tilings of 4x4 tiles. These three choices produce the same number of features (512), but distribute them quite differently. 
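The 512 comes from num_tilings * num_tiles * num_tiles, and the three configurations trade off how widely each update generalizes against how finely the coder can discriminate. A short, purely illustrative check (not part of the assignment code; it uses the position range given earlier and the rule of thumb that, with uniformly offset tilings, the finest distinctions are roughly the tile width divided by the number of tilings):

```python
position_range = 0.5 - (-1.2)  # 1.7, from the Mountain Car description above
for num_tilings, num_tiles in [(2, 16), (32, 4), (8, 8)]:
    features = num_tilings * num_tiles * num_tiles  # 512 for every configuration
    tile_width = position_range / num_tiles         # how far a single update generalizes
    resolution = tile_width / num_tilings           # roughly the finest distinction possible
    print(f"{num_tilings:>2} tilings of {num_tiles:>2}x{num_tiles:<2}: {features} features, "
          f"tile width {tile_width:.3f}, resolution ~{resolution:.4f}")
```

Read this way, 32 tilings of 4x4 tiles generalizes the most broadly per update while still resolving the finest position differences, which is one way to interpret the comparison that follows. 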
" 657 | ] 658 | }, 659 | { 660 | "cell_type": "code", 661 | "execution_count": null, 662 | "metadata": { 663 | "deletable": false, 664 | "editable": false, 665 | "nbgrader": { 666 | "checksum": "da19e771d808db89cb738f0c84fcecca", 667 | "grade": false, 668 | "grade_id": "cell-779d7aea26f74686", 669 | "locked": true, 670 | "schema_version": 1, 671 | "solution": false 672 | } 673 | }, 674 | "outputs": [], 675 | "source": [ 676 | "# Compare the three\n", 677 | "num_runs = 20\n", 678 | "num_episodes = 100\n", 679 | "env_info = {}\n", 680 | "\n", 681 | "agent_runs = []\n", 682 | "# alphas = [0.2, 0.4, 0.5, 1.0]\n", 683 | "alphas = [0.5]\n", 684 | "agent_info_options = [{\"num_tiles\": 16, \"num_tilings\": 2, \"alpha\": 0.5},\n", 685 | " {\"num_tiles\": 4, \"num_tilings\": 32, \"alpha\": 0.5},\n", 686 | " {\"num_tiles\": 8, \"num_tilings\": 8, \"alpha\": 0.5}]\n", 687 | "agent_info_options = [{\"num_tiles\" : agent[\"num_tiles\"], \n", 688 | " \"num_tilings\": agent[\"num_tilings\"],\n", 689 | " \"alpha\" : alpha} for agent in agent_info_options for alpha in alphas]\n", 690 | "\n", 691 | "agent = SarsaAgent\n", 692 | "env = mountaincar_env.Environment\n", 693 | "for agent_info in agent_info_options:\n", 694 | " all_steps = []\n", 695 | " start = time.time()\n", 696 | " for run in range(num_runs):\n", 697 | " if run % 5 == 0:\n", 698 | " print(\"RUN: {}\".format(run))\n", 699 | " env = mountaincar_env.Environment\n", 700 | " \n", 701 | " rl_glue = RLGlue(env, agent)\n", 702 | " rl_glue.rl_init(agent_info, env_info)\n", 703 | " steps_per_episode = []\n", 704 | "\n", 705 | " for episode in range(num_episodes):\n", 706 | " rl_glue.rl_episode(15000)\n", 707 | " steps_per_episode.append(rl_glue.num_steps)\n", 708 | " all_steps.append(np.array(steps_per_episode))\n", 709 | " \n", 710 | " agent_runs.append(np.mean(np.array(all_steps), axis=0))\n", 711 | " print(rl_glue.agent.alpha)\n", 712 | " print(\"Run Time: {}\".format(time.time() - start))\n", 713 | "\n", 714 | "plt.figure(figsize=(15, 10), dpi= 80, facecolor='w', edgecolor='k')\n", 715 | "plt.plot(np.array(agent_runs).T)\n", 716 | "plt.xlabel(\"Episode\")\n", 717 | "plt.ylabel(\"Steps Per Episode\")\n", 718 | "plt.yscale(\"linear\")\n", 719 | "plt.ylim(0, 1000)\n", 720 | "plt.legend([\"num_tiles: {}, num_tilings: {}, alpha: {}\".format(agent_info[\"num_tiles\"], \n", 721 | " agent_info[\"num_tilings\"],\n", 722 | " agent_info[\"alpha\"])\n", 723 | " for agent_info in agent_info_options])" 724 | ] 725 | }, 726 | { 727 | "cell_type": "markdown", 728 | "metadata": { 729 | "deletable": false, 730 | "editable": false, 731 | "nbgrader": { 732 | "checksum": "6038f4c142bec2a4e78e32ddf9c249de", 733 | "grade": false, 734 | "grade_id": "cell-b7e2f943ac40b395", 735 | "locked": true, 736 | "schema_version": 1, 737 | "solution": false 738 | } 739 | }, 740 | "source": [ 741 | "Here we can see that using 32 tilings and 4 x 4 tiles does a little better than 8 tilings with 8x8 tiles. Both seem to do much better than using 2 tilings, with 16 x 16 tiles." 
742 | ] 743 | }, 744 | { 745 | "cell_type": "markdown", 746 | "metadata": { 747 | "deletable": false, 748 | "editable": false, 749 | "nbgrader": { 750 | "checksum": "63a8d309f64550844d8d8669796da9e6", 751 | "grade": false, 752 | "grade_id": "cell-279952a048a11f2a", 753 | "locked": true, 754 | "schema_version": 1, 755 | "solution": false 756 | } 757 | }, 758 | "source": [ 759 | "## Section 3: Conclusion" 760 | ] 761 | }, 762 | { 763 | "cell_type": "markdown", 764 | "metadata": { 765 | "deletable": false, 766 | "editable": false, 767 | "nbgrader": { 768 | "checksum": "d2f0a01598313c64677f7ea426cad9cc", 769 | "grade": false, 770 | "grade_id": "cell-a2e2660173b7334f", 771 | "locked": true, 772 | "schema_version": 1, 773 | "solution": false 774 | } 775 | }, 776 | "source": [ 777 | "Congratulations! You have learned how to implement a control agent using function approximation. In this notebook you learned how to:\n", 778 | "\n", 779 | "- Use function approximation in the control setting\n", 780 | "- Implement the Sarsa algorithm using tile coding\n", 781 | "- Compare three settings for tile coding to see their effect on our agent" 782 | ] 783 | } 784 | ], 785 | "metadata": { 786 | "@webio": { 787 | "lastCommId": null, 788 | "lastKernelId": null 789 | }, 790 | "coursera": { 791 | "course_slug": "prediction-control-function-approximation", 792 | "graded_item_id": "N1YGC", 793 | "launcher_item_id": "UJo5o" 794 | }, 795 | "kernelspec": { 796 | "display_name": "Python 3", 797 | "language": "python", 798 | "name": "python3" 799 | }, 800 | "language_info": { 801 | "codemirror_mode": { 802 | "name": "ipython", 803 | "version": 3 804 | }, 805 | "file_extension": ".py", 806 | "mimetype": "text/x-python", 807 | "name": "python", 808 | "nbconvert_exporter": "python", 809 | "pygments_lexer": "ipython3", 810 | "version": "3.7.1" 811 | } 812 | }, 813 | "nbformat": 4, 814 | "nbformat_minor": 2 815 | } 816 | -------------------------------------------------------------------------------- /Sample-based Learning Methods/Week 4/04_Q-Learning and Expected Sarsa.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "deletable": false, 7 | "editable": false, 8 | "nbgrader": { 9 | "cell_type": "markdown", 10 | "checksum": "e539f7fea2ba568c7a9a8292b0602a0a", 11 | "grade": false, 12 | "grade_id": "cell-4292e6ff11f3c291", 13 | "locked": true, 14 | "schema_version": 3, 15 | "solution": false 16 | } 17 | }, 18 | "source": [ 19 | "# Assignment 2 - Q-Learning and Expected Sarsa" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "metadata": { 25 | "deletable": false, 26 | "editable": false, 27 | "nbgrader": { 28 | "cell_type": "markdown", 29 | "checksum": "084c7b68a27987da29071541fb20358b", 30 | "grade": false, 31 | "grade_id": "cell-f4e1bfc6ad38ce3d", 32 | "locked": true, 33 | "schema_version": 3, 34 | "solution": false 35 | } 36 | }, 37 | "source": [ 38 | "Welcome to Course 2 Programming Assignment 2. In this notebook, you will:\n", 39 | "\n", 40 | "- Implement Q-Learning with $\\epsilon$-greedy action selection\n", 41 | "- Implement Expected Sarsa with $\\epsilon$-greedy action selection\n", 42 | "- Investigate how these two algorithms behave on Cliff World (described on page 132 of the textbook)\n", 43 | "\n", 44 | "We will provide you with the environment and infrastructure to run an experiment (called the experiment program in RL-Glue). 
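For orientation, the experiment program follows the same RL-Glue pattern as the Mountain Car notebook above; a minimal sketch is shown below. This is not the graded experiment cell: it assumes the QLearningAgent defined later in this notebook, the cliffworld_env module imported in the Packages section, placeholder hyperparameter values, and 48 states (assuming the 4x12 grid).

```python
# Sketch only: the RL-Glue experiment loop, mirroring the Mountain Car notebook above.
# QLearningAgent is defined later in this notebook; all values here are placeholders.
rl_glue = RLGlue(cliffworld_env.Environment, QLearningAgent)
rl_glue.rl_init({"num_actions": 4, "num_states": 48, "epsilon": 0.1,
                 "step_size": 0.5, "discount": 1.0, "seed": 0}, {})
for episode in range(100):
    rl_glue.rl_episode(10000)  # cap on steps per episode, as in the Mountain Car notebook
```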
This notebook will provide all the code you need to run your experiment and visualise learning performance.\n", 45 | "\n", 46 | "This assignment will be graded automatically by comparing the behavior of your agent to our implementations of Expected Sarsa and Q-learning. The random seed will be set to avoid different behavior due to randomness. **You should not call any random functions in this notebook.** It will affect the agent's random state and change the results." 47 | ] 48 | }, 49 | { 50 | "cell_type": "markdown", 51 | "metadata": { 52 | "deletable": false, 53 | "editable": false, 54 | "nbgrader": { 55 | "cell_type": "markdown", 56 | "checksum": "3a6df636f47ebdf7f0707d7b2651a2c6", 57 | "grade": false, 58 | "grade_id": "cell-2a8ddbbf0ef25d07", 59 | "locked": true, 60 | "schema_version": 3, 61 | "solution": false 62 | } 63 | }, 64 | "source": [ 65 | "## Packages" 66 | ] 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "metadata": { 71 | "deletable": false, 72 | "editable": false, 73 | "nbgrader": { 74 | "cell_type": "markdown", 75 | "checksum": "d74b7bc264a49057450f81177d1afbdb", 76 | "grade": false, 77 | "grade_id": "cell-69f08c6441da699c", 78 | "locked": true, 79 | "schema_version": 3, 80 | "solution": false 81 | } 82 | }, 83 | "source": [ 84 | "You will need the following libraries for this assignment. We are using:\n", 85 | "1. numpy: the fundamental package for scientific computing with Python.\n", 86 | "2. scipy: a Python library for scientific and technical computing.\n", 87 | "3. matplotlib: library for plotting graphs in Python.\n", 88 | "4. RL-Glue: library for reinforcement learning experiments.\n", 89 | "\n", 90 | "**Please do not import other libraries** — this will break the autograder." 91 | ] 92 | }, 93 | { 94 | "cell_type": "code", 95 | "execution_count": null, 96 | "metadata": {}, 97 | "outputs": [], 98 | "source": [ 99 | "%matplotlib inline\n", 100 | "import numpy as np\n", 101 | "from scipy.stats import sem\n", 102 | "import matplotlib.pyplot as plt\n", 103 | "from rl_glue import RLGlue\n", 104 | "import agent\n", 105 | "import cliffworld_env\n", 106 | "from tqdm import tqdm\n", 107 | "import pickle" 108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "execution_count": null, 113 | "metadata": { 114 | "deletable": false, 115 | "editable": false, 116 | "nbgrader": { 117 | "cell_type": "code", 118 | "checksum": "781be58c941d2ddc62052efda26ebd05", 119 | "grade": false, 120 | "grade_id": "cell-92144e79fff2c0ea", 121 | "locked": true, 122 | "schema_version": 3, 123 | "solution": false 124 | } 125 | }, 126 | "outputs": [], 127 | "source": [ 128 | "plt.rcParams.update({'font.size': 15})\n", 129 | "plt.rcParams.update({'figure.figsize': [10,5]})" 130 | ] 131 | }, 132 | { 133 | "cell_type": "markdown", 134 | "metadata": { 135 | "deletable": false, 136 | "editable": false, 137 | "nbgrader": { 138 | "cell_type": "markdown", 139 | "checksum": "f6c9d5996579dbe1b3ac25058a574409", 140 | "grade": false, 141 | "grade_id": "cell-148cfbbe73465cef", 142 | "locked": true, 143 | "schema_version": 3, 144 | "solution": false 145 | } 146 | }, 147 | "source": [ 148 | "## Section 1: Q-Learning" 149 | ] 150 | }, 151 | { 152 | "cell_type": "markdown", 153 | "metadata": { 154 | "deletable": false, 155 | "editable": false, 156 | "nbgrader": { 157 | "cell_type": "markdown", 158 | "checksum": "4869e937cb5c63d7046a204ebe15914c", 159 | "grade": false, 160 | "grade_id": "cell-0c942413e94d98db", 161 | "locked": true, 162 | "schema_version": 3, 163 | "solution": false 164 | } 165 | }, 166 | 
"source": [ 167 | "In this section you will implement and test a Q-Learning agent with $\\epsilon$-greedy action selection (Section 6.5 in the textbook). " 168 | ] 169 | }, 170 | { 171 | "cell_type": "markdown", 172 | "metadata": { 173 | "deletable": false, 174 | "editable": false, 175 | "nbgrader": { 176 | "cell_type": "markdown", 177 | "checksum": "d6eff9064c79025d80bff9970686a5d3", 178 | "grade": false, 179 | "grade_id": "cell-11cf7ceec7f5b9fe", 180 | "locked": true, 181 | "schema_version": 3, 182 | "solution": false 183 | } 184 | }, 185 | "source": [ 186 | "### Implementation" 187 | ] 188 | }, 189 | { 190 | "cell_type": "markdown", 191 | "metadata": { 192 | "deletable": false, 193 | "editable": false, 194 | "nbgrader": { 195 | "cell_type": "markdown", 196 | "checksum": "d8a38e971b034abfdfc90ca66f3936b0", 197 | "grade": false, 198 | "grade_id": "cell-3417aeb44526bda3", 199 | "locked": true, 200 | "schema_version": 3, 201 | "solution": false 202 | } 203 | }, 204 | "source": [ 205 | "Your job is to implement the updates in the methods agent_step and agent_end. We provide detailed comments in each method describing what your code should do." 206 | ] 207 | }, 208 | { 209 | "cell_type": "code", 210 | "execution_count": null, 211 | "metadata": { 212 | "deletable": false, 213 | "nbgrader": { 214 | "cell_type": "code", 215 | "checksum": "b523008e6f0bde39944117023b591333", 216 | "grade": false, 217 | "grade_id": "cell-e77107160ebd3c72", 218 | "locked": false, 219 | "schema_version": 3, 220 | "solution": true 221 | } 222 | }, 223 | "outputs": [], 224 | "source": [ 225 | "# [Graded]\n", 226 | "# Q-Learning agent here\n", 227 | "class QLearningAgent(agent.BaseAgent):\n", 228 | " def agent_init(self, agent_init_info):\n", 229 | " \"\"\"Setup for the agent called when the experiment first starts.\n", 230 | " \n", 231 | " Args:\n", 232 | " agent_init_info (dict), the parameters used to initialize the agent. 
The dictionary contains:\n", 233 | " {\n", 234 | " num_states (int): The number of states,\n", 235 | " num_actions (int): The number of actions,\n", 236 | " epsilon (float): The epsilon parameter for exploration,\n", 237 | " step_size (float): The step-size,\n", 238 | " discount (float): The discount factor,\n", 239 | " }\n", 240 | " \n", 241 | " \"\"\"\n", 242 | " # Store the parameters provided in agent_init_info.\n", 243 | " self.num_actions = agent_init_info[\"num_actions\"]\n", 244 | " self.num_states = agent_init_info[\"num_states\"]\n", 245 | " self.epsilon = agent_init_info[\"epsilon\"]\n", 246 | " self.step_size = agent_init_info[\"step_size\"]\n", 247 | " self.discount = agent_init_info[\"discount\"]\n", 248 | " self.rand_generator = np.random.RandomState(agent_info[\"seed\"])\n", 249 | " \n", 250 | " # Create an array for action-value estimates and initialize it to zero.\n", 251 | " self.q = np.zeros((self.num_states, self.num_actions)) # The array of action-value estimates.\n", 252 | "\n", 253 | " \n", 254 | " def agent_start(self, state):\n", 255 | " \"\"\"The first method called when the episode starts, called after\n", 256 | " the environment starts.\n", 257 | " Args:\n", 258 | " state (int): the state from the\n", 259 | " environment's evn_start function.\n", 260 | " Returns:\n", 261 | " action (int): the first action the agent takes.\n", 262 | " \"\"\"\n", 263 | " \n", 264 | " # Choose action using epsilon greedy.\n", 265 | " current_q = self.q[state,:]\n", 266 | " if self.rand_generator.rand() < self.epsilon:\n", 267 | " action = self.rand_generator.randint(self.num_actions)\n", 268 | " else:\n", 269 | " action = self.argmax(current_q)\n", 270 | " self.prev_state = state\n", 271 | " self.prev_action = action\n", 272 | " return action\n", 273 | " \n", 274 | " def agent_step(self, reward, state):\n", 275 | " \"\"\"A step taken by the agent.\n", 276 | " Args:\n", 277 | " reward (float): the reward received for taking the last action taken\n", 278 | " state (int): the state from the\n", 279 | " environment's step based on where the agent ended up after the\n", 280 | " last step.\n", 281 | " Returns:\n", 282 | " action (int): the action the agent is taking.\n", 283 | " \"\"\"\n", 284 | " \n", 285 | " # Choose action using epsilon greedy.\n", 286 | " current_q = self.q[state, :]\n", 287 | " if self.rand_generator.rand() < self.epsilon:\n", 288 | " action = self.rand_generator.randint(self.num_actions)\n", 289 | " else:\n", 290 | " action = self.argmax(current_q)\n", 291 | " \n", 292 | " # Perform an update (1 line)\n", 293 | " ### START CODE HERE ###\n", 294 | " \n", 295 | " ### END CODE HERE ###\n", 296 | " \n", 297 | " self.prev_state = state\n", 298 | " self.prev_action = action\n", 299 | " return action\n", 300 | " \n", 301 | " def agent_end(self, reward):\n", 302 | " \"\"\"Run when the agent terminates.\n", 303 | " Args:\n", 304 | " reward (float): the reward the agent received for entering the\n", 305 | " terminal state.\n", 306 | " \"\"\"\n", 307 | " # Perform the last update in the episode (1 line)\n", 308 | " ### START CODE HERE ###\n", 309 | " \n", 310 | " ### END CODE HERE ###\n", 311 | " \n", 312 | " def argmax(self, q_values):\n", 313 | " \"\"\"argmax with random tie-breaking\n", 314 | " Args:\n", 315 | " q_values (Numpy array): the array of action-values\n", 316 | " Returns:\n", 317 | " action (int): an action with the highest value\n", 318 | " \"\"\"\n", 319 | " top = float(\"-inf\")\n", 320 | " ties = []\n", 321 | "\n", 322 | " for i in range(len(q_values)):\n", 
323 | " if q_values[i] > top:\n", 324 | " top = q_values[i]\n", 325 | " ties = []\n", 326 | "\n", 327 | " if q_values[i] == top:\n", 328 | " ties.append(i)\n", 329 | "\n", 330 | " return self.rand_generator.choice(ties)" 331 | ] 332 | }, 333 | { 334 | "cell_type": "markdown", 335 | "metadata": { 336 | "deletable": false, 337 | "editable": false, 338 | "nbgrader": { 339 | "cell_type": "markdown", 340 | "checksum": "9361d06fd03ef5169c039e916de4ec26", 341 | "grade": false, 342 | "grade_id": "cell-5bb232d570f6ba80", 343 | "locked": true, 344 | "schema_version": 3, 345 | "solution": false 346 | } 347 | }, 348 | "source": [ 349 | "### Test" 350 | ] 351 | }, 352 | { 353 | "cell_type": "markdown", 354 | "metadata": { 355 | "deletable": false, 356 | "editable": false, 357 | "nbgrader": { 358 | "cell_type": "markdown", 359 | "checksum": "301cb73e95ae17680f0d24e10c7513d6", 360 | "grade": false, 361 | "grade_id": "cell-d2621de8f8b5e4ba", 362 | "locked": true, 363 | "schema_version": 3, 364 | "solution": false 365 | } 366 | }, 367 | "source": [ 368 | "Run the cells below to test the implemented methods. The output of each cell should match the expected output.\n", 369 | "\n", 370 | "Note that passing this test does not guarantee correct behavior on the Cliff World." 371 | ] 372 | }, 373 | { 374 | "cell_type": "code", 375 | "execution_count": null, 376 | "metadata": { 377 | "deletable": false, 378 | "editable": false, 379 | "nbgrader": { 380 | "cell_type": "code", 381 | "checksum": "e31522059faa25ed475e25a6fbbc420c", 382 | "grade": false, 383 | "grade_id": "cell-1c160d79c07cac0b", 384 | "locked": true, 385 | "schema_version": 3, 386 | "solution": false 387 | } 388 | }, 389 | "outputs": [], 390 | "source": [ 391 | "# Do not modify this cell!\n", 392 | "\n", 393 | "## Test Code for agent_start() ##\n", 394 | "\n", 395 | "agent_info = {\"num_actions\": 4, \"num_states\": 3, \"epsilon\": 0.1, \"step_size\": 0.1, \"discount\": 1.0, \"seed\": 0}\n", 396 | "current_agent = QLearningAgent()\n", 397 | "current_agent.agent_init(agent_info)\n", 398 | "action = current_agent.agent_start(0)\n", 399 | "print(\"Action Value Estimates: \\n\", current_agent.q)\n", 400 | "print(\"Action:\", action)" 401 | ] 402 | }, 403 | { 404 | "cell_type": "markdown", 405 | "metadata": { 406 | "deletable": false, 407 | "editable": false, 408 | "nbgrader": { 409 | "cell_type": "markdown", 410 | "checksum": "fbe3f4201266f67423b1ece02dbc0333", 411 | "grade": false, 412 | "grade_id": "cell-f1a6a8b66b6598e6", 413 | "locked": true, 414 | "schema_version": 3, 415 | "solution": false 416 | } 417 | }, 418 | "source": [ 419 | "**Expected Output:**\n", 420 | "\n", 421 | "```\n", 422 | "Action Value Estimates: \n", 423 | " [[0. 0. 0. 0.]\n", 424 | " [0. 0. 0. 0.]\n", 425 | " [0. 0. 0. 
0.]]\n", 426 | "Action: 1\n", 427 | "```" 428 | ] 429 | }, 430 | { 431 | "cell_type": "code", 432 | "execution_count": null, 433 | "metadata": { 434 | "deletable": false, 435 | "editable": false, 436 | "nbgrader": { 437 | "cell_type": "code", 438 | "checksum": "b5d0abaed2b270d5a21f9503d8470e68", 439 | "grade": false, 440 | "grade_id": "cell-b63b908156924031", 441 | "locked": true, 442 | "schema_version": 3, 443 | "solution": false 444 | } 445 | }, 446 | "outputs": [], 447 | "source": [ 448 | "# Do not modify this cell!\n", 449 | "\n", 450 | "## Test Code for agent_step() ##\n", 451 | "\n", 452 | "actions = []\n", 453 | "agent_info = {\"num_actions\": 4, \"num_states\": 3, \"epsilon\": 0.1, \"step_size\": 0.1, \"discount\": 1.0, \"seed\": 0}\n", 454 | "current_agent = QLearningAgent()\n", 455 | "current_agent.agent_init(agent_info)\n", 456 | "actions.append(current_agent.agent_start(0))\n", 457 | "actions.append(current_agent.agent_step(2, 1))\n", 458 | "actions.append(current_agent.agent_step(0, 0))\n", 459 | "print(\"Action Value Estimates: \\n\", current_agent.q)\n", 460 | "print(\"Actions:\", actions)" 461 | ] 462 | }, 463 | { 464 | "cell_type": "markdown", 465 | "metadata": { 466 | "deletable": false, 467 | "editable": false, 468 | "nbgrader": { 469 | "cell_type": "markdown", 470 | "checksum": "ed1a688d14e6eb3961b32a8dbdbbb858", 471 | "grade": false, 472 | "grade_id": "cell-3b916a9081886d4d", 473 | "locked": true, 474 | "schema_version": 3, 475 | "solution": false 476 | } 477 | }, 478 | "source": [ 479 | "**Expected Output:**\n", 480 | "\n", 481 | "```\n", 482 | "Action Value Estimates: \n", 483 | " [[ 0. 0.2 0. 0. ]\n", 484 | " [ 0. 0. 0. 0.02]\n", 485 | " [ 0. 0. 0. 0. ]]\n", 486 | "Actions: [1, 3, 1]\n", 487 | "```" 488 | ] 489 | }, 490 | { 491 | "cell_type": "code", 492 | "execution_count": null, 493 | "metadata": { 494 | "deletable": false, 495 | "editable": false, 496 | "nbgrader": { 497 | "cell_type": "code", 498 | "checksum": "49dd68d058ac35cf96e3682e71080b1f", 499 | "grade": false, 500 | "grade_id": "cell-8fe80d6a4a6555a5", 501 | "locked": true, 502 | "schema_version": 3, 503 | "solution": false 504 | } 505 | }, 506 | "outputs": [], 507 | "source": [ 508 | "# Do not modify this cell!\n", 509 | "\n", 510 | "## Test Code for agent_end() ##\n", 511 | "\n", 512 | "actions = []\n", 513 | "agent_info = {\"num_actions\": 4, \"num_states\": 3, \"epsilon\": 0.1, \"step_size\": 0.1, \"discount\": 1.0, \"seed\": 0}\n", 514 | "current_agent = QLearningAgent()\n", 515 | "current_agent.agent_init(agent_info)\n", 516 | "actions.append(current_agent.agent_start(0))\n", 517 | "actions.append(current_agent.agent_step(2, 1))\n", 518 | "current_agent.agent_end(1)\n", 519 | "print(\"Action Value Estimates: \\n\", current_agent.q)\n", 520 | "print(\"Actions:\", actions)" 521 | ] 522 | }, 523 | { 524 | "cell_type": "markdown", 525 | "metadata": { 526 | "deletable": false, 527 | "editable": false, 528 | "nbgrader": { 529 | "cell_type": "markdown", 530 | "checksum": "d34093b01b729874834af87668416b5f", 531 | "grade": false, 532 | "grade_id": "cell-8eddb10c5e7c1791", 533 | "locked": true, 534 | "schema_version": 3, 535 | "solution": false 536 | } 537 | }, 538 | "source": [ 539 | "**Expected Output:**\n", 540 | "\n", 541 | "```\n", 542 | "Action Value Estimates: \n", 543 | " [[0. 0.2 0. 0. ]\n", 544 | " [0. 0. 0. 0.1]\n", 545 | " [0. 0. 0. 0. 
]]\n", 546 | "Actions: [1, 3]\n", 547 | "```" 548 | ] 549 | }, 550 | { 551 | "cell_type": "markdown", 552 | "metadata": { 553 | "deletable": false, 554 | "editable": false, 555 | "nbgrader": { 556 | "cell_type": "markdown", 557 | "checksum": "9a549cc5d3d6a35b2578be87a3ea288a", 558 | "grade": false, 559 | "grade_id": "cell-3ab82a89ea44f09e", 560 | "locked": true, 561 | "schema_version": 3, 562 | "solution": false 563 | } 564 | }, 565 | "source": [ 566 | "## Section 2: Expected Sarsa" 567 | ] 568 | }, 569 | { 570 | "cell_type": "markdown", 571 | "metadata": { 572 | "deletable": false, 573 | "editable": false, 574 | "nbgrader": { 575 | "cell_type": "markdown", 576 | "checksum": "e16e2e0918866de0908360b07d53b814", 577 | "grade": false, 578 | "grade_id": "cell-12980d9f811d7bb6", 579 | "locked": true, 580 | "schema_version": 3, 581 | "solution": false 582 | } 583 | }, 584 | "source": [ 585 | "In this section you will implement an Expected Sarsa agent with $\\epsilon$-greedy action selection (Section 6.6 in the textbook). " 586 | ] 587 | }, 588 | { 589 | "cell_type": "markdown", 590 | "metadata": { 591 | "deletable": false, 592 | "editable": false, 593 | "nbgrader": { 594 | "cell_type": "markdown", 595 | "checksum": "f635cf2541375086474f964e9ebe31d8", 596 | "grade": false, 597 | "grade_id": "cell-09c8eef6bd8e9472", 598 | "locked": true, 599 | "schema_version": 3, 600 | "solution": false 601 | } 602 | }, 603 | "source": [ 604 | "### Implementation" 605 | ] 606 | }, 607 | { 608 | "cell_type": "markdown", 609 | "metadata": { 610 | "deletable": false, 611 | "editable": false, 612 | "nbgrader": { 613 | "cell_type": "markdown", 614 | "checksum": "401762021600e7176bb065754532c57b", 615 | "grade": false, 616 | "grade_id": "cell-27a67597b07f3d03", 617 | "locked": true, 618 | "schema_version": 3, 619 | "solution": false 620 | } 621 | }, 622 | "source": [ 623 | "Your job is to implement the updates in the methods agent_step and agent_end. We provide detailed comments in each method describing what your code should do." 624 | ] 625 | }, 626 | { 627 | "cell_type": "code", 628 | "execution_count": null, 629 | "metadata": { 630 | "deletable": false, 631 | "nbgrader": { 632 | "cell_type": "code", 633 | "checksum": "db1e0c043dcc4292dc81eb19e3e0debd", 634 | "grade": false, 635 | "grade_id": "cell-8d20990dcf9eeb6c", 636 | "locked": false, 637 | "schema_version": 3, 638 | "solution": true 639 | } 640 | }, 641 | "outputs": [], 642 | "source": [ 643 | "# [Graded]\n", 644 | "# Expected Sarsa agent here\n", 645 | "class ExpectedSarsaAgent(agent.BaseAgent):\n", 646 | " def agent_init(self, agent_init_info):\n", 647 | " \"\"\"Setup for the agent called when the experiment first starts.\n", 648 | " \n", 649 | " Args:\n", 650 | " agent_init_info (dict), the parameters used to initialize the agent. 
The dictionary contains:\n", 651 | " {\n", 652 | " num_states (int): The number of states,\n", 653 | " num_actions (int): The number of actions,\n", 654 | " epsilon (float): The epsilon parameter for exploration,\n", 655 | " step_size (float): The step-size,\n", 656 | " discount (float): The discount factor,\n", 657 | " }\n", 658 | " \n", 659 | " \"\"\"\n", 660 | " # Store the parameters provided in agent_init_info.\n", 661 | " self.num_actions = agent_init_info[\"num_actions\"]\n", 662 | " self.num_states = agent_init_info[\"num_states\"]\n", 663 | " self.epsilon = agent_init_info[\"epsilon\"]\n", 664 | " self.step_size = agent_init_info[\"step_size\"]\n", 665 | " self.discount = agent_init_info[\"discount\"]\n", 666 | " self.rand_generator = np.random.RandomState(agent_info[\"seed\"])\n", 667 | " \n", 668 | " # Create an array for action-value estimates and initialize it to zero.\n", 669 | " self.q = np.zeros((self.num_states, self.num_actions)) # The array of action-value estimates.\n", 670 | "\n", 671 | " \n", 672 | " def agent_start(self, state):\n", 673 | " \"\"\"The first method called when the episode starts, called after\n", 674 | " the environment starts.\n", 675 | " Args:\n", 676 | " state (int): the state from the\n", 677 | " environment's evn_start function.\n", 678 | " Returns:\n", 679 | " action (int): the first action the agent takes.\n", 680 | " \"\"\"\n", 681 | " \n", 682 | " # Choose action using epsilon greedy.\n", 683 | " current_q = self.q[state, :]\n", 684 | " if self.rand_generator.rand() < self.epsilon:\n", 685 | " action = self.rand_generator.randint(self.num_actions)\n", 686 | " else:\n", 687 | " action = self.argmax(current_q)\n", 688 | " self.prev_state = state\n", 689 | " self.prev_action = action\n", 690 | " return action\n", 691 | " \n", 692 | " def agent_step(self, reward, state):\n", 693 | " \"\"\"A step taken by the agent.\n", 694 | " Args:\n", 695 | " reward (float): the reward received for taking the last action taken\n", 696 | " state (int): the state from the\n", 697 | " environment's step based on where the agent ended up after the\n", 698 | " last step.\n", 699 | " Returns:\n", 700 | " action (int): the action the agent is taking.\n", 701 | " \"\"\"\n", 702 | " \n", 703 | " # Choose action using epsilon greedy.\n", 704 | " current_q = self.q[state,:]\n", 705 | " if self.rand_generator.rand() < self.epsilon:\n", 706 | " action = self.rand_generator.randint(self.num_actions)\n", 707 | " else:\n", 708 | " action = self.argmax(current_q)\n", 709 | " \n", 710 | " # Perform an update (~5 lines)\n", 711 | " ### START CODE HERE ###\n", 712 | " \n", 713 | " ### END CODE HERE ###\n", 714 | " \n", 715 | " self.prev_state = state\n", 716 | " self.prev_action = action\n", 717 | " return action\n", 718 | " \n", 719 | " def agent_end(self, reward):\n", 720 | " \"\"\"Run when the agent terminates.\n", 721 | " Args:\n", 722 | " reward (float): the reward the agent received for entering the\n", 723 | " terminal state.\n", 724 | " \"\"\"\n", 725 | " # Perform the last update in the episode (1 line)\n", 726 | " ### START CODE HERE ###\n", 727 | " \n", 728 | " ### END CODE HERE ###\n", 729 | " \n", 730 | " def argmax(self, q_values):\n", 731 | " \"\"\"argmax with random tie-breaking\n", 732 | " Args:\n", 733 | " q_values (Numpy array): the array of action-values\n", 734 | " Returns:\n", 735 | " action (int): an action with the highest value\n", 736 | " \"\"\"\n", 737 | " top = float(\"-inf\")\n", 738 | " ties = []\n", 739 | "\n", 740 | " for i in 
range(len(q_values)):\n", 741 | " if q_values[i] > top:\n", 742 | " top = q_values[i]\n", 743 | " ties = []\n", 744 | "\n", 745 | " if q_values[i] == top:\n", 746 | " ties.append(i)\n", 747 | "\n", 748 | " return self.rand_generator.choice(ties)" 749 | ] 750 | }, 751 | { 752 | "cell_type": "markdown", 753 | "metadata": { 754 | "deletable": false, 755 | "editable": false, 756 | "nbgrader": { 757 | "cell_type": "markdown", 758 | "checksum": "f358f7e2676a77b8dd13a09fad9261a2", 759 | "grade": false, 760 | "grade_id": "cell-bd6580041d80533a", 761 | "locked": true, 762 | "schema_version": 3, 763 | "solution": false 764 | } 765 | }, 766 | "source": [ 767 | "### Test" 768 | ] 769 | }, 770 | { 771 | "cell_type": "markdown", 772 | "metadata": { 773 | "deletable": false, 774 | "editable": false, 775 | "nbgrader": { 776 | "cell_type": "markdown", 777 | "checksum": "562af8b2c4449bec9534666c9747e461", 778 | "grade": false, 779 | "grade_id": "cell-7574736a2553024d", 780 | "locked": true, 781 | "schema_version": 3, 782 | "solution": false 783 | } 784 | }, 785 | "source": [ 786 | "Run the cells below to test the implemented methods. The output of each cell should match the expected output.\n", 787 | "\n", 788 | "Note that passing this test does not guarantee correct behavior on the Cliff World." 789 | ] 790 | }, 791 | { 792 | "cell_type": "code", 793 | "execution_count": null, 794 | "metadata": { 795 | "deletable": false, 796 | "editable": false, 797 | "nbgrader": { 798 | "cell_type": "code", 799 | "checksum": "62db384f5fa66caae6a68a840cb56797", 800 | "grade": false, 801 | "grade_id": "cell-7d4f037d4106e8e2", 802 | "locked": true, 803 | "schema_version": 3, 804 | "solution": false 805 | } 806 | }, 807 | "outputs": [], 808 | "source": [ 809 | "# Do not modify this cell!\n", 810 | "\n", 811 | "## Test Code for agent_start() ##\n", 812 | "\n", 813 | "agent_info = {\"num_actions\": 4, \"num_states\": 3, \"epsilon\": 0.1, \"step_size\": 0.1, \"discount\": 1.0, \"seed\": 0}\n", 814 | "current_agent = ExpectedSarsaAgent()\n", 815 | "current_agent.agent_init(agent_info)\n", 816 | "action = current_agent.agent_start(0)\n", 817 | "print(\"Action Value Estimates: \\n\", current_agent.q)\n", 818 | "print(\"Action:\", action)" 819 | ] 820 | }, 821 | { 822 | "cell_type": "markdown", 823 | "metadata": { 824 | "deletable": false, 825 | "editable": false, 826 | "nbgrader": { 827 | "cell_type": "markdown", 828 | "checksum": "2f5cc33e33a94e5123e0311be2208c2a", 829 | "grade": false, 830 | "grade_id": "cell-4d1ae44ff39f2ef6", 831 | "locked": true, 832 | "schema_version": 3, 833 | "solution": false 834 | } 835 | }, 836 | "source": [ 837 | "**Expected Output:**\n", 838 | "\n", 839 | "```\n", 840 | "Action Value Estimates: \n", 841 | " [[0. 0. 0. 0.]\n", 842 | " [0. 0. 0. 0.]\n", 843 | " [0. 0. 0. 
0.]]\n", 844 | "Action: 1\n", 845 | "```" 846 | ] 847 | }, 848 | { 849 | "cell_type": "code", 850 | "execution_count": null, 851 | "metadata": { 852 | "deletable": false, 853 | "editable": false, 854 | "nbgrader": { 855 | "cell_type": "code", 856 | "checksum": "5b40fa207655b4dd1028786e8d553a70", 857 | "grade": false, 858 | "grade_id": "cell-e77508d1e061c326", 859 | "locked": true, 860 | "schema_version": 3, 861 | "solution": false 862 | } 863 | }, 864 | "outputs": [], 865 | "source": [ 866 | "# Do not modify this cell!\n", 867 | "\n", 868 | "## Test Code for agent_step() ##\n", 869 | "\n", 870 | "actions = []\n", 871 | "agent_info = {\"num_actions\": 4, \"num_states\": 3, \"epsilon\": 0.1, \"step_size\": 0.1, \"discount\": 1.0, \"seed\": 0}\n", 872 | "current_agent = ExpectedSarsaAgent()\n", 873 | "current_agent.agent_init(agent_info)\n", 874 | "actions.append(current_agent.agent_start(0))\n", 875 | "actions.append(current_agent.agent_step(2, 1))\n", 876 | "actions.append(current_agent.agent_step(0, 0))\n", 877 | "print(\"Action Value Estimates: \\n\", current_agent.q)\n", 878 | "print(\"Actions:\", actions)" 879 | ] 880 | }, 881 | { 882 | "cell_type": "markdown", 883 | "metadata": { 884 | "deletable": false, 885 | "editable": false, 886 | "nbgrader": { 887 | "cell_type": "markdown", 888 | "checksum": "e92c78b348a88e1db2e988fd442a1ae5", 889 | "grade": false, 890 | "grade_id": "cell-11bdb20cca21c6d6", 891 | "locked": true, 892 | "schema_version": 3, 893 | "solution": false 894 | } 895 | }, 896 | "source": [ 897 | "**Expected Output:**\n", 898 | "\n", 899 | "```\n", 900 | "Action Value Estimates: \n", 901 | " [[0. 0.2 0. 0. ]\n", 902 | " [0. 0. 0. 0.0185]\n", 903 | " [0. 0. 0. 0. ]]\n", 904 | "Actions: [1, 3, 1]\n", 905 | "```" 906 | ] 907 | }, 908 | { 909 | "cell_type": "code", 910 | "execution_count": null, 911 | "metadata": { 912 | "deletable": false, 913 | "editable": false, 914 | "nbgrader": { 915 | "cell_type": "code", 916 | "checksum": "4f52f3065b81d15c96f297117c7b6d81", 917 | "grade": false, 918 | "grade_id": "cell-1866144548cd9c28", 919 | "locked": true, 920 | "schema_version": 3, 921 | "solution": false 922 | } 923 | }, 924 | "outputs": [], 925 | "source": [ 926 | "# Do not modify this cell!\n", 927 | "\n", 928 | "## Test Code for agent_end() ##\n", 929 | "\n", 930 | "actions = []\n", 931 | "agent_info = {\"num_actions\": 4, \"num_states\": 3, \"epsilon\": 0.1, \"step_size\": 0.1, \"discount\": 1.0, \"seed\": 0}\n", 932 | "current_agent = ExpectedSarsaAgent()\n", 933 | "current_agent.agent_init(agent_info)\n", 934 | "actions.append(current_agent.agent_start(0))\n", 935 | "actions.append(current_agent.agent_step(2, 1))\n", 936 | "current_agent.agent_end(1)\n", 937 | "print(\"Action Value Estimates: \\n\", current_agent.q)\n", 938 | "print(\"Actions:\", actions)" 939 | ] 940 | }, 941 | { 942 | "cell_type": "markdown", 943 | "metadata": { 944 | "deletable": false, 945 | "editable": false, 946 | "nbgrader": { 947 | "cell_type": "markdown", 948 | "checksum": "e9a2554acf9aa8d280d1175c3f23554b", 949 | "grade": false, 950 | "grade_id": "cell-9edd1b6d5a51c18a", 951 | "locked": true, 952 | "schema_version": 3, 953 | "solution": false 954 | } 955 | }, 956 | "source": [ 957 | "**Expected Output:**\n", 958 | "\n", 959 | "```\n", 960 | "Action Value Estimates: \n", 961 | " [[0. 0.2 0. 0. ]\n", 962 | " [0. 0. 0. 0.1]\n", 963 | " [0. 0. 0. 0. 
]]\n", 964 | "Actions: [1, 3]\n", 965 | "```" 966 | ] 967 | }, 968 | { 969 | "cell_type": "markdown", 970 | "metadata": { 971 | "deletable": false, 972 | "editable": false, 973 | "nbgrader": { 974 | "cell_type": "markdown", 975 | "checksum": "95e2ea24f0de8c0a847e3f9b1719e8f1", 976 | "grade": false, 977 | "grade_id": "cell-2692792f654c792f", 978 | "locked": true, 979 | "schema_version": 3, 980 | "solution": false 981 | } 982 | }, 983 | "source": [ 984 | "## Section 3: Solving the Cliff World" 985 | ] 986 | }, 987 | { 988 | "cell_type": "markdown", 989 | "metadata": { 990 | "deletable": false, 991 | "editable": false, 992 | "nbgrader": { 993 | "cell_type": "markdown", 994 | "checksum": "5f6c1e54b358fabad02c9002f23a1087", 995 | "grade": false, 996 | "grade_id": "cell-6e7fbbaa12d4bf31", 997 | "locked": true, 998 | "schema_version": 3, 999 | "solution": false 1000 | } 1001 | }, 1002 | "source": [ 1003 | "We described the Cliff World environment in the video \"Expected Sarsa in the Cliff World\" in Lesson 3. This is an undiscounted episodic task and thus we set $\\gamma$=1. The agent starts in the bottom left corner of the gridworld below and takes actions that move it in the four directions. Actions that would move the agent off of the cliff incur a reward of -100 and send the agent back to the start state. The reward for all other transitions is -1. An episode terminates when the agent reaches the bottom right corner. " 1004 | ] 1005 | }, 1006 | { 1007 | "cell_type": "markdown", 1008 | "metadata": { 1009 | "deletable": false, 1010 | "editable": false, 1011 | "nbgrader": { 1012 | "cell_type": "markdown", 1013 | "checksum": "bac5c2eaf9d52fa5d29242db0de448f4", 1014 | "grade": false, 1015 | "grade_id": "cell-6aaddf82523ef2a5", 1016 | "locked": true, 1017 | "schema_version": 3, 1018 | "solution": false 1019 | } 1020 | }, 1021 | "source": [ 1022 | "\"Drawing\"\n" 1023 | ] 1024 | }, 1025 | { 1026 | "cell_type": "markdown", 1027 | "metadata": { 1028 | "deletable": false, 1029 | "editable": false, 1030 | "nbgrader": { 1031 | "cell_type": "markdown", 1032 | "checksum": "4215fbaa30c33d57f4351e501f0a6422", 1033 | "grade": false, 1034 | "grade_id": "cell-e55d077b9f8b6133", 1035 | "locked": true, 1036 | "schema_version": 3, 1037 | "solution": false 1038 | } 1039 | }, 1040 | "source": [ 1041 | "Using the experiment program in the cell below we now compare the agents on the Cliff World environment and plot the sum of rewards during each episode for the two agents.\n", 1042 | "\n", 1043 | "The result of this cell will be graded. If you make any changes to your algorithms, you have to run this cell again before submitting the assignment." 
1044 | ] 1045 | }, 1046 | { 1047 | "cell_type": "code", 1048 | "execution_count": null, 1049 | "metadata": { 1050 | "deletable": false, 1051 | "editable": false, 1052 | "nbgrader": { 1053 | "cell_type": "code", 1054 | "checksum": "343a62fbee9e83abdb3d4bd9a25c6283", 1055 | "grade": false, 1056 | "grade_id": "cell-6d11bb590ebfb0b2", 1057 | "locked": true, 1058 | "schema_version": 3, 1059 | "solution": false 1060 | } 1061 | }, 1062 | "outputs": [], 1063 | "source": [ 1064 | "# Do not modify this cell!\n", 1065 | "\n", 1066 | "agents = {\n", 1067 | " \"Q-learning\": QLearningAgent,\n", 1068 | " \"Expected Sarsa\": ExpectedSarsaAgent\n", 1069 | "}\n", 1070 | "env = cliffworld_env.Environment\n", 1071 | "all_reward_sums = {} # Contains sum of rewards during episode\n", 1072 | "all_state_visits = {} # Contains state visit counts during the last 10 episodes\n", 1073 | "agent_info = {\"num_actions\": 4, \"num_states\": 48, \"epsilon\": 0.1, \"step_size\": 0.5, \"discount\": 1.0}\n", 1074 | "env_info = {}\n", 1075 | "num_runs = 100 # The number of runs\n", 1076 | "num_episodes = 500 # The number of episodes in each run\n", 1077 | "\n", 1078 | "for algorithm in [\"Q-learning\", \"Expected Sarsa\"]:\n", 1079 | " all_reward_sums[algorithm] = []\n", 1080 | " all_state_visits[algorithm] = []\n", 1081 | " for run in tqdm(range(num_runs)):\n", 1082 | " agent_info[\"seed\"] = run\n", 1083 | " rl_glue = RLGlue(env, agents[algorithm])\n", 1084 | " rl_glue.rl_init(agent_info, env_info)\n", 1085 | "\n", 1086 | " reward_sums = []\n", 1087 | " state_visits = np.zeros(48)\n", 1088 | "# last_episode_total_reward = 0\n", 1089 | " for episode in range(num_episodes):\n", 1090 | " if episode < num_episodes - 10:\n", 1091 | " # Runs an episode\n", 1092 | " rl_glue.rl_episode(0) \n", 1093 | " else: \n", 1094 | " # Runs an episode while keeping track of visited states\n", 1095 | " state, action = rl_glue.rl_start()\n", 1096 | " state_visits[state] += 1\n", 1097 | " is_terminal = False\n", 1098 | " while not is_terminal:\n", 1099 | " reward, state, action, is_terminal = rl_glue.rl_step()\n", 1100 | " state_visits[state] += 1\n", 1101 | " \n", 1102 | " reward_sums.append(rl_glue.rl_return())\n", 1103 | "# last_episode_total_reward = rl_glue.rl_return()\n", 1104 | " \n", 1105 | " all_reward_sums[algorithm].append(reward_sums)\n", 1106 | " all_state_visits[algorithm].append(state_visits)\n", 1107 | "\n", 1108 | "# save results\n", 1109 | "import os\n", 1110 | "import shutil\n", 1111 | "os.makedirs('results', exist_ok=True)\n", 1112 | "np.save('results/q_learning.npy', all_reward_sums['Q-learning'])\n", 1113 | "np.save('results/expected_sarsa.npy', all_reward_sums['Expected Sarsa'])\n", 1114 | "shutil.make_archive('results', 'zip', '.', 'results')\n", 1115 | "\n", 1116 | " \n", 1117 | "for algorithm in [\"Q-learning\", \"Expected Sarsa\"]:\n", 1118 | " plt.plot(np.mean(all_reward_sums[algorithm], axis=0), label=algorithm)\n", 1119 | "plt.xlabel(\"Episodes\")\n", 1120 | "plt.ylabel(\"Sum of\\n rewards\\n during\\n episode\",rotation=0, labelpad=40)\n", 1121 | "plt.xlim(0,500)\n", 1122 | "plt.ylim(-100,0)\n", 1123 | "plt.legend()\n", 1124 | "plt.show()" 1125 | ] 1126 | }, 1127 | { 1128 | "cell_type": "markdown", 1129 | "metadata": { 1130 | "deletable": false, 1131 | "editable": false, 1132 | "nbgrader": { 1133 | "cell_type": "markdown", 1134 | "checksum": "1cbb34897b56a32ea1e378b95caa0842", 1135 | "grade": false, 1136 | "grade_id": "cell-c3967df7d24c7d02", 1137 | "locked": true, 1138 | "schema_version": 3, 1139 | "solution": 
false 1140 | } 1141 | }, 1142 | "source": [ 1143 | "To see why these two agents behave differently, let's inspect the states they visit most. Run the cell below to generate plots showing the number of timesteps that the agents spent in each state over the last 10 episodes." 1144 | ] 1145 | }, 1146 | { 1147 | "cell_type": "code", 1148 | "execution_count": null, 1149 | "metadata": { 1150 | "deletable": false, 1151 | "editable": false, 1152 | "nbgrader": { 1153 | "cell_type": "code", 1154 | "checksum": "a5d9243d4e90f82665bc9ca467e065ef", 1155 | "grade": false, 1156 | "grade_id": "cell-37a2b6675676da6f", 1157 | "locked": true, 1158 | "schema_version": 3, 1159 | "solution": false 1160 | } 1161 | }, 1162 | "outputs": [], 1163 | "source": [ 1164 | "# Do not modify this cell!\n", 1165 | "\n", 1166 | "for algorithm, position in [(\"Q-learning\", 211), (\"Expected Sarsa\", 212)]:\n", 1167 | " plt.subplot(position)\n", 1168 | " average_state_visits = np.array(all_state_visits[algorithm]).mean(axis=0)\n", 1169 | " grid_state_visits = average_state_visits.reshape((4,12))\n", 1170 | " grid_state_visits[0,1:-1] = np.nan\n", 1171 | " plt.pcolormesh(grid_state_visits, edgecolors='gray', linewidth=2)\n", 1172 | " plt.title(algorithm)\n", 1173 | " plt.axis('off')\n", 1174 | " cm = plt.get_cmap()\n", 1175 | " cm.set_bad('gray')\n", 1176 | "\n", 1177 | " plt.subplots_adjust(bottom=0.0, right=0.7, top=1.0)\n", 1178 | " cax = plt.axes([0.85, 0.0, 0.075, 1.])\n", 1179 | "cbar = plt.colorbar(cax=cax)\n", 1180 | "cbar.ax.set_ylabel(\"Visits during\\n the last 10\\n episodes\", rotation=0, labelpad=70)\n", 1181 | "plt.show()" 1182 | ] 1183 | }, 1184 | { 1185 | "cell_type": "markdown", 1186 | "metadata": { 1187 | "deletable": false, 1188 | "editable": false, 1189 | "nbgrader": { 1190 | "cell_type": "markdown", 1191 | "checksum": "e20aaec2eb1806cda6de9f75002264d5", 1192 | "grade": false, 1193 | "grade_id": "cell-c7575e40e56f751c", 1194 | "locked": true, 1195 | "schema_version": 3, 1196 | "solution": false 1197 | } 1198 | }, 1199 | "source": [ 1200 | "The Q-learning agent learns the optimal policy, one that moves along the cliff and reaches the goal in as few steps as possible. However, since the agent does not follow the optimal policy and uses $\\epsilon$-greedy exploration, it occasionally falls off the cliff. The Expected Sarsa agent takes exploration into account and follows a safer path. Note this is different from the book. The book shows Sarsa learns the even safer path\n", 1201 | "\n", 1202 | "\n", 1203 | "Previously we used a fixed step-size of 0.5 for the agents. What happens with other step-sizes? Does this difference in performance persist?\n", 1204 | "\n", 1205 | "In the next experiment we will try 10 different step-sizes from 0.1 to 1.0 and compare the sum of rewards per episode averaged over the first 100 episodes (similar to the interim performance curves in Figure 6.3 of the textbook). Shaded regions show standard errors.\n", 1206 | "\n", 1207 | "This cell takes around 10 minutes to run. The result of this cell will be graded. If you make any changes to your algorithms, you have to run this cell again before submitting the assignment." 
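For reference, the shaded band in the next figure is one standard error of the mean on each side: for a given algorithm and step-size, the per-run average returns (100 runs here) are summarized by their mean and by their sample standard deviation divided by the square root of the number of runs. The snippet below is a minimal illustration of that quantity; the experiment cell itself relies on a `sem` helper, presumably `scipy.stats.sem` imported earlier in the notebook, which computes the same thing by default:

```python
import numpy as np

def standard_error(per_run_returns):
    # One value per run; std / sqrt(n) gives the half-width of the shaded band.
    x = np.asarray(per_run_returns, dtype=float)
    return x.std(ddof=1) / np.sqrt(len(x))

# Hypothetical run averages, for illustration only:
print(standard_error([-40.0, -35.0, -50.0, -45.0]))
```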
1208 | ] 1209 | }, 1210 | { 1211 | "cell_type": "code", 1212 | "execution_count": null, 1213 | "metadata": { 1214 | "deletable": false, 1215 | "editable": false, 1216 | "nbgrader": { 1217 | "cell_type": "code", 1218 | "checksum": "96725dad62b0596792b4d5694f64637e", 1219 | "grade": false, 1220 | "grade_id": "cell-f079ef9418195c22", 1221 | "locked": true, 1222 | "schema_version": 3, 1223 | "solution": false 1224 | } 1225 | }, 1226 | "outputs": [], 1227 | "source": [ 1228 | "# Do not modify this cell!\n", 1229 | "\n", 1230 | "agents = {\n", 1231 | " \"Q-learning\": QLearningAgent,\n", 1232 | " \"Expected Sarsa\": ExpectedSarsaAgent\n", 1233 | "}\n", 1234 | "env = cliffworld_env.Environment\n", 1235 | "all_reward_sums = {}\n", 1236 | "step_sizes = np.linspace(0.1,1.0,10)\n", 1237 | "agent_info = {\"num_actions\": 4, \"num_states\": 48, \"epsilon\": 0.1, \"discount\": 1.0}\n", 1238 | "env_info = {}\n", 1239 | "num_runs = 100\n", 1240 | "num_episodes = 100\n", 1241 | "all_reward_sums = {}\n", 1242 | "\n", 1243 | "for algorithm in [\"Q-learning\", \"Expected Sarsa\"]:\n", 1244 | " for step_size in step_sizes:\n", 1245 | " all_reward_sums[(algorithm, step_size)] = []\n", 1246 | " agent_info[\"step_size\"] = step_size\n", 1247 | " for run in tqdm(range(num_runs)):\n", 1248 | " agent_info[\"seed\"] = run\n", 1249 | " rl_glue = RLGlue(env, agents[algorithm])\n", 1250 | " rl_glue.rl_init(agent_info, env_info)\n", 1251 | "\n", 1252 | " return_sum = 0\n", 1253 | " for episode in range(num_episodes):\n", 1254 | " rl_glue.rl_episode(0)\n", 1255 | " return_sum += rl_glue.rl_return()\n", 1256 | " all_reward_sums[(algorithm, step_size)].append(return_sum/num_episodes)\n", 1257 | " \n", 1258 | "\n", 1259 | "for algorithm in [\"Q-learning\", \"Expected Sarsa\"]:\n", 1260 | " algorithm_means = np.array([np.mean(all_reward_sums[(algorithm, step_size)]) for step_size in step_sizes])\n", 1261 | " algorithm_stds = np.array([sem(all_reward_sums[(algorithm, step_size)]) for step_size in step_sizes])\n", 1262 | " plt.plot(step_sizes, algorithm_means, marker='o', linestyle='solid', label=algorithm)\n", 1263 | " plt.fill_between(step_sizes, algorithm_means + algorithm_stds, algorithm_means - algorithm_stds, alpha=0.2)\n", 1264 | "\n", 1265 | "plt.legend()\n", 1266 | "plt.xlabel(\"Step-size\")\n", 1267 | "plt.ylabel(\"Sum of\\n rewards\\n per episode\",rotation=0, labelpad=50)\n", 1268 | "plt.xticks(step_sizes)\n", 1269 | "plt.show()" 1270 | ] 1271 | }, 1272 | { 1273 | "cell_type": "markdown", 1274 | "metadata": { 1275 | "deletable": false, 1276 | "editable": false, 1277 | "nbgrader": { 1278 | "cell_type": "markdown", 1279 | "checksum": "6113751690c166257cd1ace47ef977b1", 1280 | "grade": false, 1281 | "grade_id": "cell-e2c9c37b494e40f1", 1282 | "locked": true, 1283 | "schema_version": 3, 1284 | "solution": false 1285 | } 1286 | }, 1287 | "source": [ 1288 | "## Wrapping up" 1289 | ] 1290 | }, 1291 | { 1292 | "cell_type": "markdown", 1293 | "metadata": { 1294 | "deletable": false, 1295 | "editable": false, 1296 | "nbgrader": { 1297 | "cell_type": "markdown", 1298 | "checksum": "893577356341c384f4e2457631037f81", 1299 | "grade": false, 1300 | "grade_id": "cell-10150ffd5c7c91f8", 1301 | "locked": true, 1302 | "schema_version": 3, 1303 | "solution": false 1304 | } 1305 | }, 1306 | "source": [ 1307 | "Expected Sarsa shows an advantage over Q-learning in this problem across a wide range of step-sizes.\n", 1308 | "\n", 1309 | "Congratulations! 
Now you have:\n", 1310 | "\n", 1311 | "- implemented Q-Learning with $\\epsilon$-greedy action selection\n", 1312 | "- implemented Expected Sarsa with $\\epsilon$-greedy action selection\n", 1313 | "- investigated the behavior of these two algorithms on Cliff World\n", 1314 | "\n", 1315 | "To submit your solution, you will need to submit the `results.zip` file generated by the experiments. Here are the steps:\n", 1316 | "\n", 1317 | "- Go to the `file` menu at the top of the screen\n", 1318 | "- Select `open`\n", 1319 | "- Click the selection square next to `results.zip`\n", 1320 | "- Select `Download` from the top menu\n", 1321 | "- Upload that file to the grader in the next part of this module\n" 1322 | ] 1323 | } 1324 | ], 1325 | "metadata": { 1326 | "coursera": { 1327 | "course_slug": "sample-based-learning-methods", 1328 | "launcher_item_id": "biN1L" 1329 | }, 1330 | "kernelspec": { 1331 | "display_name": "Python 3", 1332 | "language": "python", 1333 | "name": "python3" 1334 | }, 1335 | "language_info": { 1336 | "codemirror_mode": { 1337 | "name": "ipython", 1338 | "version": 3 1339 | }, 1340 | "file_extension": ".py", 1341 | "mimetype": "text/x-python", 1342 | "name": "python", 1343 | "nbconvert_exporter": "python", 1344 | "pygments_lexer": "ipython3", 1345 | "version": "3.7.1" 1346 | } 1347 | }, 1348 | "nbformat": 4, 1349 | "nbformat_minor": 2 1350 | } 1351 | -------------------------------------------------------------------------------- /Sample-based Learning Methods/Week 2/03a_Blackjack.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Practice Assignment 1 - Blackjack" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Welcome to Course 2 Practice Assignment 1. In this notebook you will see the optimal policy for Blackjack with different deal policies.\n", 15 | "\n", 16 | "In a previous video (Solving the Blackjack Example) we described the optimal policy for Blackjack when the dealer sticks at 17. How would the optimal policy change if the dealer's policy were different? Would the optimal agent play more conservatively?\n", 17 | "\n", 18 | "We ran experiments similar to Example 5.3 in the textbook but with different dealer policies. You can change dealer_sticks in the cell below to another number between 12 and 20 and see the optimal policy against that dealer.\n", 19 | "\n", 20 | "This notebook is not graded. You do not need to write down any answers." 
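For concreteness, a dealer parameterized by `dealer_sticks` follows a simple threshold rule: keep hitting until the hand total reaches the threshold, then stick. The notebook's actual environment code is not reproduced in this excerpt, so the function below is only an illustrative sketch of that rule:

```python
def dealer_action(dealer_sum, dealer_sticks=17):
    """Return 'hit' while the dealer's total is below dealer_sticks, else 'stick'."""
    return 'hit' if dealer_sum < dealer_sticks else 'stick'
```

With `dealer_sticks=17` this matches the standard rule from the textbook example; lower values give a more cautious dealer that stops on smaller totals, while higher values make the dealer hit longer and bust more often.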
21 | ] 22 | }, 23 | { 24 | "cell_type": "code", 25 | "execution_count": 23, 26 | "metadata": {}, 27 | "outputs": [ 28 | { 29 | "data": { 30 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAApkAAAMWCAYAAAC6Pp3rAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAAPYQAAD2EBqD+naQAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAIABJREFUeJzs3Xt8zvXj//HnZYdrw3axMTZsRFmRQ0gi5pDZRw4RHdCUPlIoKdXqU03S6Pz5RHRYVko+HRwTPigbOY4WCplWxAhxzcauzfb+/dHP9W1tTvPe9b7U4367vW+3rvfxeW3Fs9f7ZDMMwxAAAABgokpWBwAAAMBfDyUTAAAApqNkAgAAwHSUTAAAAJiOkgkAAADTUTIBAABgOkomAAAATEfJBAAAgOkomQAAADAdJROAR/3000+y2Wyy2Wz66aefPL49AMAzKJnAJap+/fqy2WwaOnToOddNSUmhmAEAPIqSCQAAANNRMgEAAGA6SiYAAABMR8kEoPXr12vQoEFq0KCBAgICVKVKFUVFRalTp06aMGGCfvnllxLrFxcXa8WKFXrggQd03XXXqW7duvL391doaKg6deqk6dOnq7Cw8LyOvWvXLg0dOlR169aV3W5XZGSkRowYof3795f7+xQXF+vDDz/UP/7xD9WqVUv+/v6qWbOmunfvro8++kiGYZRrvwcOHNDrr7+uPn366Morr5TD4VBgYKAaNWqke+65R999990593HkyBE9++yzatu2rUJCQhQQEKD69eure/fumjZtmpxOZ5nbbdu2TcOHD9fll1+uypUrq2rVqmrWrJmefPJJHT58uFzfBwAqlAHgkhQVFWVIMuLj48+57owZMwxJhiQjKyurxLKUlBTDZrO5l9vtdiM4ONj9WZIxY8aMEttkZWWVWF61alXD4XCUmHfDDTcYJ06cKJXlj9vOnj3bCAoKcu8jMDDQvSwkJMTYtGnTWbf/83cxDMM4cuSI0bFjxxJZ/pytd+/ehsvlOufP7c/i4+Pd+/D19TVCQkIMX1/fEj+7Tz/99IzbL1261KhevXqJfYSGhhp+fn7ueXPnzi213eTJk41KlSq516lcubLh7+/v/hweHm5s3rz5gr8PAFQkRjKBv7ETJ05o9OjRMgxDgwcPVmZmpvLz8+V0OpWbm6v09HSNGzdOYWFhJbbz9fXVoEGDtGDBAh05ckTHjx/XsWPHdPz4cc2YMUMRERFatWqVnnzyybMe/95771WDBg20fv16HT9+XHl5eVq6dKkiIyP122+/6eabb9bx48fP+/sUFRWpX79+SktLU4sWLbRw4ULl5eXp2LFjys3N1XvvvaewsDAtWLBAjz322AX/vBo1aqQXX3xRW7du1cmTJ3XkyBG5XC5t27ZNgwYNksvlUnx8fJmjsN9884369Omjo0ePqkmTJvriiy904sQJHT58WCdPnlR6eroefvhhBQUFldguOTlZjz32mCpXrqyJEycqOztbeXl5OnHihNLT09WlSxdlZ2erd+/eys3NveDvBAAVxuqWC6B8zBjJXL9+vSHJqFKlilFYWGhato0bN7r3e/LkyRLL/jgSGRoaahw8eLDU9t9//717pO6FF1444/Z/Hsl8//33DUlGdHS0cezYsTKzpaenGzabzfD39y/z2BejZ8+ehiRjwoQJpZZ16NDBkGRcfvnlZ8z2Zzk5OUa1atUMScaSJUvKXKewsNBo1aqVIcl49dVXLyo/AJiJkUzgb6xatWqSpIKCAh05csS0/bZu3VphYWHKy8tTRkbGGdcbMWJEqVFSSbryyit1yy23SJJmz5593sdNTk6WJN13331yOBxlrtOqVSs1adJEBQUF+uqrr8573+ejZ8+ekqTVq1eXmL9r1y73vOeff/6M2f7ss88+07Fjx9SyZUvFxsaWuY6vr69uv/12SdLSpUvLGx0ATOdrdQAA1mnYsKGio6O1Y8cOtW3bVvfdd59iY2N19dVXy8fH56zbFhQU6N1339WcOXO0bds2HTlyRAUFBaXW+/NNQ3/UpUuXsy6bNWuWtmzZosLCQvn5+Z01T1FRkdatWydJSkxM1PPPP3/GdX/77TdJ0s8//3zWfZbl22+/1ZtvvqnVq1frp59+Um5ubqkbif78ndesWSNJ8vHxUVxc3Hkf6+uvv5Ykbd++XbVr1z7jeidPnpRUvu8DABWFkgn8jfn4+Gj27Nm6+eablZWVpccff1yPP/64KleurOuvv179+vVTfHy8KleuXGK7X3/9Vd26ddPWrVvd8wICAlSjRg13OT106JCKi4uVl5d3xuPXqVPnnMtOnTql3377TbVq1Trrd/ntt9/kcrkkSUePHj37F///Tpw4cV7rnTZlyhQ9+OCDKi4uliTZbDY5HA7Z7XZJv5e9nJycUt/5wIEDkqQaNWqoSpUq532809d25ufnKz8//5zrX+j3AYCKxOly4BIVGBgo6f9Gsc7mj+Xj9HanNW/eXDt27NBnn32m4cOHq2nTpjp58qSWL1+u+++/X9HR0SXKpCQ99NBD2rp1q0JDQ/Xuu+8qOztbJ0+e1KFDh3TgwAEdOHBAERERklTuxwVdqKKiIvc/L168WIZhnHNKTEw87/1v375dY8aMUXFxsQYMGKANGzYoPz9fR48edX/nV155RVLp72yz2S7qO916663n9X14ZSgAb0LJBC5RNWrUkCTt27fvnOueXqdSpUoKCQkptdzf31/9+vXTm2++qa1bt+rQoUOaPn26QkJCtHfvXsXHx7vXLSws1Jw5cyT9PrJ31113lTqVW1RUdF7Pbjxb9tPLfH19y8z8Z6GhofL1/f3kTEWcNv70009VVFSkK6+8UrNnz1abNm3k7+9fYp3TI5Z/dvrnc/jw4bOO7J5pO06DA7gUUTKBS1SrVq0k/f5onHMVl9M3nTRr1uyc1zZKvxe2e++9V5MnT3Yf4/SNQYcOHXKfum3ZsuUZj3c+p3fPduPN6WXnm9nPz0/XXnutJGnhwoXnXP9C7d27V9LvI7+VKpX9R+fy5cvLnH/99ddL+r18L168+LyP2b59e0nSpk2blJ2dfSFxAcBylEzgEnXrrbdK+v1U+H/+858zrvf1119r1apVkqSBAweWWHb6GsYz+eOp9dPFKjg42H3699tvvy21zalTp875fMzTpk+fXuaI586dO/Xpp59K+r/veT6GDx8uSfriiy/0xRdfnHXd0zf/nK/Td4Rv3bq1zEsAFi9erJUrV5a5baNGjdSxY0dJ0hNPPKGcnJzzOuaAAQNUrVo1FRYWauzYsWe99KC4uFjHjh07r/0CgEd48HFJAEzWv39/Q5JRqVIlIyEhwdizZ4972bFjx4x33nnH/YaZyy67zHA6nSW2T0lJMa6//npj+vT
pxu7du93zT506ZSxZssSoW7euIclo165die1OP/OxTp06xooVK4yioiLDMAxj69atxo033mjY7XajSpUq53xbkMPhMFq0aGFs2LDBMAzDKC4uNpYtW+Z+Bmi9evVKZT7bczJPnTpldOvWzZBk+Pv7GxMmTDD27dvnXp6bm2t8+eWXxv333284HI4L+lkvX77cfdz77rvPOHLkiHuf06dPNypXrmyEhoYakoyoqKhS23/zzTdGQECAIclo2rSpsXjxYqOgoMCde8OGDca9995rLFu2rMR2KSkp7uPGxcUZ69atc/+8i4qKjO+//9546aWXjOjoaGPmzJkX9J0AoCJRMoFLWE5OjvsB4KenqlWrGtWrVy/xqsgrrrjC2LFjR6nt//iQdv3/1yKGhoaWeIVhRESEsX379hLbpaenu0vk6e1Ovx7S19fXeP/9991F8Wwl88+vlaxcubJ7WbVq1YyNGzeWynyu10o6nU7jpptuKvG9goODjWrVqpX4mfj6+l7wz/u2224rsd9q1aoZPj4+hiSjVatWxuuvv37GkmkYv79W8o+vuPTz8zuv10pOmzatxGskT/+e/ridJOODDz644O8EABWF0+XAJSwoKEgLFy7UokWLdNttt6lBgwbuxwbVqlVLsbGxmj59ur799ls1bty41Pa9e/fW+++/r7vuukvNmzeXw+GQ0+lUUFCQrr32Wk2YMEHfffedoqOjS2zXqlUrbdiwQQMHDlSNGjVUXFysoKAgDRw4UGvWrNGQIUPOK3/btm2Vnp6uO++8Uw6HQ6dOnVKdOnX0z3/+U1u3blXr1q0v+GcSHByshQsX6osvvtCtt96qyMhIuVwunThxQnXq1FH37t2VlJSknTt3XvC+P/zwQ7322mtq1qyZ7Ha7ioqKdPXVVyspKUlff/21qlatetbtu3fvrl27dunJJ59Uy5YtFRgYqLy8PNWpU0exsbF68803y3x26IgRI7Rz50498sgjat68uex2u44dO6aqVauqdevWGj16tJYtW+Z+KDsAeAObYXjo+SIAAAD422AkEwAAAKajZAIAAMB0lEwAAACYjpIJAAAA01EyAQAAYDpKJgAAAEzna3UAnFlxcbH279+voKAg92v8AAC4EIZh6Pjx44qIiHC/HhbwBEqmF9u/f7/q1atndQwAwF/A3r17VbduXatj4G+EkunFgoKCJEl7b+2kYH9+VQCAC5dTcEr1/pvq/jsF8BSaixc7fYo82N+XkgkAuChcdgVP4+IMAAAAmI6SCQAAANNRMgEAAGA6SiYAAABMR8kEAACA6SiZAAAAMB0lEwAAAKajZAIAAMB0lEwTJSUlqU2bNgoKClJYWJj69u2rnTt3lljnrbfeUkxMjIKDg2Wz2XTs2DGL0gIAAFQcSqaJUlNTNXLkSK1bt07Lli1TYWGhunfvrry8PPc6J06cUI8ePfTEE09YmBQAAKBi8a5CEy1ZsqTE55SUFIWFhWnTpk3q2LGjJGnMmDGSpJUrV3o6HgAAgMcwklmBnE6nJCkkJMTiJAAAAJ5FyawgxcXFGjNmjNq3b6+mTZtaHQcAAMCjOF1eQUaOHKlt27Zp9erVVkcBAADwOEpmBRg1apQ+//xzpaWlqW7dulbHAQAA8DhKpokMw9Do0aM1d+5crVy5Ug0aNLA6EgAAgCUomSYaOXKkZs2apfnz5ysoKEgHDhyQJDkcDgUGBkqSDhw4oAMHDigzM1OStHXrVgUFBSkyMpIbhAAAwF8GN/6YaNq0aXI6nYqJiVF4eLh7+u9//+teZ/r06WrZsqX++c9/SpI6duyoli1basGCBVbFBgAAMJ3NMAzD6hAoW05OjhwOh5xDuirYn0FnAMCFyyk4JcfMFXI6nQoODrY6Dv5GGMkEAACA6SiZAAAAMB0lEwAAAKajZAIAAMB0lEwAAACYjpIJAAAA01EyAQAAYDoevuhFXC6XXC6X+3NOTo6FaQAAAMqPkUwvkpSUJIfD4Z7q1atndSQAAIByoWR6kYSEBDmdTve0d+9eqyMBAACUC6fLvYjdbpfdbrc6BgAAwEVjJBMAAACmo2QCAADAdJRMAAAAmI6SCQAAANNRMgEAAGA6SiYAAABMR8kEAACA6SiZAAAAMB0l00RJSUlq06aNgoKCFBYWpr59+2rnzp0l1snPz9fIkSMVGhqqqlWrqn///jp48KBFiQEAACoGJdNEqampGjlypNatW6dly5apsLBQ3bt3V15ennudhx56SAsXLtQnn3yi1NRU7d+/X/369bMwNQAAgPlshmEYVof4qzp06JDCwsKUmpqqjh07yul0qmbNmpo1a5ZuueUWSdKOHTt05ZVXau3atbruuutKbJ+TkyOHwyHnkK4K9ucNoACAC5dTcEqOmSvkdDoVHBxsdRz8jTCSWYGcTqckKSQkRJK0adMmFRYWqlu3bu51oqOjFRkZqbVr11qSEQAAoCJQMitIcXGxxowZo/bt26tp06aSpAMHDsjf31/VqlUrsW6tWrV04MABK2ICAABUCM7BVpCRI0dq27ZtWr16tdVRAAAAPI6RzAowatQoff755/rqq69Ut25d9/zatWuroKBAx44dK7H+wYMHVbt2bU/HBAAAqDCUTBMZhqFRo0Zp7ty5+vLLL9WgQYMSy1u1aiU/Pz+tWLHCPW/nzp3as2eP2rVr5+m4AAAAFYbT5SYaOXKkZs2apfnz5ysoKMh9naXD4VBgYKAcDoeGDRumsWPHKiQkRMHBwRo9erTatWtX6s5yAACASxkl00TTpk2TJMXExJSYP2PGDA0dOlSS9Oqrr6pSpUrq37+/XC6XYmNj9cYbb3g4KQAAQMXiOZlejOdkAgAuFs/JhFW4JhMAAACmo2QCAADAdJRMAAAAmI6SCQAAANNRMgEAAGA6SiYAAABMR8kEAACA6SiZAAAAMB1P+PYiLpdLLpfL/TknJ8fCNAAAAOXHSKYXSUpKksPhcE/16tWzOhIAAEC5UDK9SEJCgpxOp3vau3ev1ZEAAADKhdPlXsRut8tut1sdAwAA4KIxkgkAAADTUTIBAABgOkomAAAATEfJBAAAgOkomQAAADAdJRMAAACmo2QCAADAdJRMAAAAmI6SabK0tDT16tVLERERstlsmjdvXonlBw8e1NChQxUREaHKlSurR48e2rVrl0VpAQAAKgYl02R5eXlq3ry5pk6dWmqZYRjq27evfvzxR82fP1/ffPONoqKi1K1bN+Xl5VmQFgAAoGLwWkmTxcXFKS4ursxlu3bt0rp167Rt2zY1adJEkjRt2jTVrl1bH330ke655x5PRgUAAKgwjGR6kMvlkiQFBAS451WqVEl2u12rV6+2KhYAAIDpKJkeFB0drcjISCUkJOjo0aMqKCjQ5MmT9csvvyg7O9vqeAAAAKahZHqQn5+f5syZox9++EEhISGqXLmyvvrqK8XFxalSJX4VAADgr4NrMj2sVatWysjIkNPpVEFBgWrWrKm2bduqdevWVkcDAAAwDcNnFnE4HKpZs6Z27dql9PR09enTx+pIAAAApmEk02S5ubnKzMx0f8
7KylJGRoZCQkIUGRmpTz75RDVr1lRkZKS2bt2qBx98UH379lX37t0tTA0AAGAuSqbJ0tPT1blzZ/fnsWPHSpLi4+OVkpKi7OxsjR07VgcPHlR4eLjuvPNOPfXUU1bFBQAAqBA2wzAMq0OgbDk5OXI4HHIO6apgf/5/AABw4XIKTskxc4WcTqeCg4OtjoO/Ea7JBAAAgOkomQAAADAdJRMAAACmo2QCAADAdJRMAAAAmI6SCQAAANNRMgEAAGA6Hr7oRVwul1wul/tzTk6OhWkAAADKj5FML5KUlCSHw+Ge6tWrZ3UkAACAcuGNP16krJHMevXq6Zm0HxVQNcjCZGf2+NTBVkcA8Bc3aeQHVke4pOXnHtf4jpfxxh94HKfLvYjdbpfdbrc6BgAAwEXjdDkAAABMR8kEAACA6SiZAAAAMB0lEwAAAKajZAIAAMB0lEwAAACYjpIJAAAA01EyAQAAYDpKpsnS0tLUq1cvRUREyGazad68eSWW5+bmatSoUapbt64CAwN11VVXafr06RalBQAAqBiUTJPl5eWpefPmmjp1apnLx44dqyVLluiDDz7Q9u3bNWbMGI0aNUoLFizwcFIAAICKw2slTRYXF6e4uLgzLl+zZo3i4+MVExMjSRo+fLjefPNNbdiwQb179/ZQSgAAgIrFSKaHXX/99VqwYIH27dsnwzD01Vdf6YcfflD37t2tjgYAAGAaRjI97PXXX9fw4cNVt25d+fr6qlKlSnr77bfVsWNHq6MBAACYhpLpYa+//rrWrVunBQsWKCoqSmlpaRo5cqQiIiLUrVs3q+MBAACYgpLpQSdPntQTTzyhuXPnqmfPnpKkZs2aKSMjQy+99BIlEwAA/GVwTaYHFRYWqrCwUJUqlfyx+/j4qLi42KJUAAAA5mMk02S5ubnKzMx0f87KylJGRoZCQkIUGRmpTp06ady4cQoMDFRUVJRSU1P1/vvv65VXXrEwNQAAgLkomSZLT09X586d3Z/Hjh0rSYqPj1dKSopmz56thIQEDRo0SL/99puioqI0ceJEjRgxwqrIAAAApqvQkpmYmKjx48frq6++cj8X8q8uJiZGhmGccXnt2rU1Y8YMDyYCAADwvAu+JvOnn36SzWYrMfn7+6tevXq64447tGXLlorICQAAgEtIuUcyGzZsqMGDB0v6/TrEdevW6aOPPtKcOXO0YsUKtW/f3rSQAAAAuLSUu2Q2atRIiYmJJeb961//0sSJE/Xkk09q5cqVFxkNAAAAlypTH2E0evRoSdLGjRvPut67776rPn36qH79+goICFBISIhiY2P11VdflVhv+fLlstlsuv/++8vcz+7du1WpUiXFxsaWmH/8+HE988wzatKkiQIDA1WtWjXFxsZq9erVpfYRExMjm82m/Px8/etf/1LDhg3l5+dXqkADAADg/FXIjT82m+2sy0eOHKnmzZurW7duqlmzpvbt26d58+apW7dumjNnjvr06SNJ6tq1qxo2bKhZs2bppZdeUuXKlUvs55133pFhGPrnP//pnvfbb7+pY8eO+u6779S+fXuNGDFCOTk5mj9/vjp37qxPPvlEffv2LZWpf//++vbbb9WjRw9Vq1ZNDRo0MOEnAQAA8Pdkasl84403JEnXXnvtWdf7/vvvS5W47OxstW7dWuPGjXOXTJvNpuHDh+uxxx7TJ598ovj4ePf6p06d0nvvvaewsDD3+tLvo6nfffed3n77bd1zzz3u+UlJSWrdurWGDx+uHj16KCAgoMTx9+/fry1btigkJKR8Xx4AAABu5T5dnpmZqcTERCUmJmrcuHHq2LGjnn32WQUEBGjixIln3basUcLw8HD1799fu3bt0s8//+yef9ddd8nf31/vvPNOifUXLVqk7OxsxcfHy8/PT5J0+PBh/fe//1WXLl1KFExJCgsL07hx43To0CEtX7681PHHjx9PwQQAADBJuUcyd+/erfHjx0uS/Pz8VKtWLd1xxx16/PHHdfXVV5912x9//FFJSUn68ssvtW/fPrlcrhLL9+/fr6ioKElSzZo11a9fP82ePVs7duxQdHS0JLlL5x/L5MaNG1VUVCSXy1XmNZW7du2SJO3YsUM33XRTiWXnGn31BJfLVeJnkZOTY2EaAACA8it3yYyNjdWSJUsueLvMzExde+21ysnJUefOndWrVy8FBwerUqVKWrlypVJTU0uVznvvvVezZ8/WO++8o5deekn79+/X4sWL1alTJ11xxRXu9X777TdJ0tdff62vv/76jBny8vJKzatVq9YFfxezJSUluYs7AADApczjr5V89dVXdfToUc2cOdP9nM3TRowYodTU1FLbxMTEKDo6Wu+//76ef/55zZgxQ0VFRSVu+JGk4OBgSdLDDz+sl1566YJynetmJU9ISEhwv4ZS+n0ks169ehYmOrdJIz+wOgIAAPBCpj7C6Hzs3r1bkkrcrCNJhmGcdfRx+PDhOnTokObNm6d3331X1atXV//+/Uus06ZNG9lsNq1du9b84B5gt9sVHBxcYgIAALgUebxknr7W8s/PrJw0aZK2bdt2xu3i4+MVEBCghx56SD/++KOGDBlS6g7x2rVra+DAgVqzZo1efPHFMt8hvn79ep04ccKEbwIAAIAz8fjp8hEjRmjGjBnq37+/Bg4cqNDQUK1bt06bN29Wz549tWjRojK3CwkJ0YABAzRz5kxJKnWq/LQ33nhDO3fu1KOPPqqZM2eqXbt2qlatmvbu3av09HTt2rVL2dnZpZ65CQAAAPN4fCSzZcuW+t///qdrrrlGc+bM0bvvvqtq1arp66+/VuvWrc+67ennZF533XVq2rRpmeuEhIRozZo1euGFF+Tv768PP/xQr7/+utatW6cmTZro/fffV40aNUz/XgAAAPg/NqOsc8pe6qWXXtK4ceOUnJysu+++2+o4FS4nJ0cOh0PPpP2ogKpBVscBAFyC8nOPa3zHy+R0OrnWHx7l8ZHM8srPz9eUKVNUvXp13XbbbVbHAQAAwFl4/JrMC7V69WqlpqZq6dKl+vnnn5WUlMT1lAAAAF7O60vm8uXLNX78eNWoUUMPPfSQHnnkEasjAQAA4BwuqWsy/264JhMAcLG4JhNWuWSuyQQAAMClg5JpsrS0NPXq1UsRERGy2WyaN29eieU2m63M6cUXX7QoMQAAgPkomSbLy8tT8+bNNXXq1DKXZ2dnl5jeffdd2Wy2Uq/IBAAAuJR5/Y0/l5q4uDjFxcWdcXnt2rVLfJ4/f746d+6syy67rKKjAQAAeAwl00IHDx7UokWL9N5771kdBQAAwFScLrfQe++9p6CgIPXr18/qKAAAAKaqsJL52WefqVOnTgoLC1NAQIAiIiLUrVs3ffbZZ5KklStXymazKTExsdS2P/30k2w2m4YOHVpifv369VW/fn05nU7dd999Cg8PV5UqVdSxY0dt3rxZkrR//34NHjxYYWFhCgwMVPfu3bVr166K+poX5d1339WgQYMUEBBgdRQAAABTVcjp8mnTpun+++9XeHi4br75ZoWGhurAgQPasGGD5s6de1E3u
RQUFOjGG29Ufn6+br31Vh08eFAff/yxunXrpjVr1ig2Nlbh4eEaPHiwMjMztXDhQvXs2VPbt2+Xj4+Pid/y4qxatUo7d+7Uf//7X6ujAAAAmK5CSuY777wjf39/ZWRkKCwsrMSyI0eOXNS+s7Oz1aFDB82aNUu+vr/Hb9mypR577DG1a9dOd911l15++WXZbDZJ0v33369p06Zp/vz5XnVaOjk5Wa1atVLz5s2tjgIAAGC6Cjtd7ufnJz8/v1LzQ0NDL3rfL730krtgStLtt98uSTp16pSee+45d8H847Jvv/32oo97PnJzc5WRkaGMjAxJUlZWljIyMrRnzx73Ojk5Ofrkk090zz33eCQTAACAp1VIybztttuUl5enpk2baty4cfriiy+Uk5Njyr6rV6+uyMjIEvPCw8MlSZdffrkqV65c5rL9+/ebcvxzSU9PV8uWLdWyZUtJ0tixY9WyZUs9/fTT7nVmz54twzDcBRgAAOCvpkJK5iOPPKLk5GRFRETo5ZdfVs+ePRUaGqq+ffsqKyvrovZd1ntXT49qnm1ZYWHhRR33fMXExMgwjFJTSkqKe53hw4frxIkTcjgcHskEAADgaRVSMm2UUEYoAAAgAElEQVQ2m+6++25t3LhRhw4d0ty5c9WvXz/Nnz9fN910k4qKilSp0u+HPnXqVKntnU5nRcQCAACAh1T4w9hPj2D27dtXhw8f1pdffqnMzExVr15dkrRv375S23zzzTcVHQsAAAAVqEJGMleuXCnDMErMKyws1G+//SZJCggIUOPGjRUUFKQFCxa450u/vwXnueeeq4hYAAAA8JAKGcns27evgoODdd111ykqKkqFhYVatmyZvv/+e91yyy2KioqSJI0ePVrPP/+8rrnmGvXp00fHjx/XwoUL1alTJ+3evbsiogEAAMADKqRkJiUlacmSJdqwYYMWLlyoKlWqqGHDhpo2bZqGDRvmXm/ChAny9/dXcnKypk+frvr16+upp55Sr1693G8GAgAAwKXHZvz5vDa8Rk5OjhwOh55J+1EBVYOsjgMAuATl5x7X+I6Xyel0lvkUFqCiVNjD2AEAAPD3VeF3l+P8uVwuuVwu92ezHmAPAADgaYxkepGkpCQ5HA73VK9ePasjAQAAlAsl04skJCTI6XS6p71791odCQAAoFw4Xe5F7Ha77Ha71TEAAAAuGiOZAAAAMB0lEwAAAKajZAIAAMB0lEwAAACYjpIJAAAA01EyAQAAYDpKJgAAAExHyQQAAIDpKJkmS0tLU69evRQRESGbzaZ58+aVWmf79u3q3bu3HA6HqlSpojZt2mjPnj0WpAUAAKgYlEyT5eXlqXnz5po6dWqZy3fv3q0OHTooOjpaK1eu1JYtW/TUU08pICDAw0kBAAAqDq+VNFlcXJzi4uLOuPzJJ5/UP/7xD73wwgvueQ0bNvRENAAAAI9hJNODiouLtWjRIl1xxRWKjY1VWFiY2rZtW+YpdQAAgEsZJdODfv31V+Xm5mrSpEnq0aOH/ve//+nmm29Wv379lJqaanU8AAAA03C63IOKi4slSX369NFDDz0kSWrRooXWrFmj6dOnq1OnTlbGAwAAMA0jmR5Uo0YN+fr66qqrriox/8orr+TucgAA8JdCyfQgf39/tWnTRjt37iwx/4cfflBUVJRFqQAAAMzH6XKT5ebmKjMz0/05KytLGRkZCgkJUWRkpMaNG6dbb71VHTt2VOfOnbVkyRItXLhQK1eutC40AACAySiZJktPT1fnzp3dn8eOHStJio+PV0pKim6++WZNnz5dSUlJeuCBB9S4cWN99tln6tChg1WRAQAATGczDMOwOgTKlpOTI4fDoWfSflRA1SCr4wAALkH5ucc1vuNlcjqdCg4OtjoO/ka4JhMAAACmo2QCAADAdJRMAAAAmI6SCQAAANNRMgEAAGA6SiYAAABMR8kEAACA6XgYuxdxuVxyuVzuzzk5ORamAQAAKD9GMr1IUlKSHA6He6pXr57VkQAAAMqFkulFEhIS5HQ63dPevXutjgQAAFAunC73Ina7XXa73eoYAAAAF42RTAAAAJiOkgkAAADTUTIBAABgOkomAAAATEfJBAAAgOkomQAAADAdJRMAAACmo2QCAADAdJRMAAAAmI6SabK0tDT16tVLERERstlsmjdvXonlQ4cOlc1mKzH16NHDorQAAAAVg5Jpsry8PDVv3lxTp0494zo9evRQdna2e/roo488mBAAAKDi8e5yk8XFxSkuLu6s69jtdtWuXdtDiQAAADyPkUwLrFy5UmFhYWrcuLHuu+8+HTlyxOpIAAAApmIk08N69Oihfv36qUGDBtq9e7eeeOIJxcXFae3atfLx8bE6HgAAgCkomR522223uf/56quvVrNmzdSwYUOtXLlSXbt2tTAZAACAeThdbrHLLrtMNWrUUGZmptVRAAAATEPJtNgvv/yiI0eOKDw83OooAAAApuF0uclyc3NLjEpmZWUpIyNDISEhCgkJ0fjx49W/f3/Vrl1bu3fv1qOPPqpGjRopNjbWwtQAAADmomSaLD09XZ07d3Z/Hjt2rCQpPj5e06ZN05YtW/Tee+/p2LFjioiIUPfu3TVhwgTZ7XarIgMAAJiOkmmymJgYGYZxxuVLly71YBoAAABrcE0mAAAATEfJBAAAgOkomQAAADAdJRMAAACm48YfAADgMfn5+SooKLA6hmn8/f0VEBBgdQyvRMkEAAAekZ+frwYhDh04+dcpmbVr11ZWVhZFswyUTAAA4BEFBQU6cLJAe2/rpGC/S7+C5BSeUr3ZqSooKKBkluHS/w3/hbhcLrlcLvfnnJwcC9MAAFAxgv18FexPBfmr48YfL5KUlCSHw+Ge6tWrZ3UkAACAcqFkepGEhAQ5nU73tHfvXqsjAQAAlAtj1V7EbrfzDnMAAPCXwEgmAAAATEfJBAAAgOkomQAAADAdJRMAAACmo2QCAADAdJRMAAAAmI6SCQAAANNRMgEAAGA6SqbJ0tLS1KtXL0VERMhms2nevHlnXHfEiBGy2Wx67bXXPJgQAACg4lEyTZaXl6fmzZtr6tSpZ11v7ty5WrdunSIiIjyUDAAAwHN4raTJ4uLiFBcXd9Z19u3bp9GjR2vp0qXq2bOnh5IBAAB4DiOZHlZcXKwhQ4Zo3LhxatKkidVxAAAAKgQl08MmT54sX19fPfDAA1ZHAQAAqDCcLvegTZs26d///rc2b94sm81mdRwAAIAKw0imB61atUq//vqrIiMj5evrK19fX/388896+OGHVb9+favjAQAAmIaRTA8aMmSIunXrVmJebGyshgwZorvuusuiVAAAAOajZJosNzdXmZmZ7s9ZWVnKyMhQSEiIIiMjFRoaWmJ9Pz8/1a5dW40bN/Z0VAAAgApDyTRZenq6Onfu7P48duxYSVJ8fLxSUlIsSgUAAOBZlEyTxcTEyDCM817/p59+qrgwAAAAFuHGHwAAAJiOkgkAAADTUTIBAABgOkomAAAATEfJBAAAgOkomQAAADAdJRMAAACm4zmZXsTlcsnlcrk/5+Tk
WJgGAACg/BjJ9CJJSUlyOBzuqV69elZHAgAAKBdKphdJSEiQ0+l0T3v37rU6EgAAQLlwutyL2O122e12q2MAAABcNEYyAQAAYDpKJgAAAExHyQQAAIDpKJkAAAAwHSUTAAAApqNkAgAAwHSUTAAAAJiOkgkAAADTUTIBAABgOkqmydLS0tSrVy9FRETIZrNp3rx5JZYnJiYqOjpaVapUUfXq1dWtWzetX7/eorQAAAAVg5Jpsry8PDVv3lxTp04tc/kVV1yhKVOmaOvWrVq9erXq16+v7t2769ChQx5OCgAAUHF4d7nJ4uLiFBcXd8bld9xxR4nPr7zyipKTk7VlyxZ17dq1ouMBAAB4BCOZFiooKNBbb70lh8Oh5s2bWx0HAADANIxkWuDzzz/XbbfdphMnTig8PFzLli1TjRo1rI4FAABgGkYyLdC5c2dlZGRozZo16tGjhwYOHKhff/3V6lgAAACmoWRaoEqVKmrUqJGuu+46JScny9fXV8nJyVbHAgAAMA0l0wsUFxfL5XJZHQMAAMA0XJNpstzcXGVmZro/Z2VlKSMjQyEhIQoNDdXEiRPVu3dvhYeH6/Dhw5o6dar27dunAQMGWJgaAADAXJRMk6Wnp6tz587uz2PHjpUkxcfHa/r06dqxY4fee+89HT58WKGhoWrTpo1WrVqlJk2aWBUZAADAdJRMk8XExMgwjDMunzNnjgfTAAAAWINrMgEAAGA6SiYAAABMR8kEAACA6SiZAAAAMB0lEwAAAKajZAIAAMB0lEwAAACYjudkehGXy1Xi9ZI5OTkWpgEAACg/RjK9SFJSkhwOh3uqV6+e1ZEAAADKhZLpRRISEuR0Ot3T3r17rY4EAABQLpwu9yJ2u112u93qGAAAABeNkUwAAACYjpIJAAAA01EyAQAAYDpKJgAAAExHyQQAAIDpKJkAAAAwHSUTAAAApqNkAgAAwHSUTJOlpaWpV69eioiIkM1m07x589zLCgsL9dhjj+nqq69WlSpVFBERoTvvvFP79++3MDEAAID5KJkmy8vLU/PmzTV16tRSy06cOKHNmzfrqaee0ubNmzVnzhzt3LlTvXv3tiApAABAxeG1kiaLi4tTXFxcmcscDoeWLVtWYt6UKVN07bXXas+ePYqMjPRERAAAgArHSKbFnE6nbDabqlWrZnUUAAAA01AyLZSfn6/HHntMt99+u4KDg62OAwAAYBpKpkUKCws1cOBAGYahadOmWR0HAADAVFyTaYHTBfPnn3/Wl19+ySgmAAD4y2Ek08NOF8xdu3Zp+fLlCg0NtToSAACoIHl5eXr++ed1zTXXqGrVqrLb7apbt65uuOEGJSQkaPfu3Ro6dKhsNtt5TykpKZKkmJgY2Ww2HThwoMxjb9q0ScOGDdPll1+uKlWqKDAwUA0bNtSQIUNK3YicmJgom82m2bNnl9pPTk6O+1iDBg3SqVOnzuu7M5JpstzcXGVmZro/Z2VlKSMjQyEhIQoPD9ctt9yizZs36/PPP1dRUZH7X4yQkBD5+/tbFRsAAJjs+PHj6tChg7Zs2aJGjRpp8ODBCg0N1eHDh7VhwwZNmjRJDRs2VN++fVW/fv0S265cuVKpqanq06ePWrRoUWLZnz//WXFxsR555BG9+uqr8vX1VZcuXdS7d2/5+fnpxx9/1KJFi/TBBx/o2Wef1VNPPXXWfR06dEg9evTQ5s2bNXr0aP373/+WzWY7r+9PyTRZenq6Onfu7P48duxYSVJ8fLwSExO1YMECSaX/Bfnqq68UExPjsZwAAKBivfbaa9qyZYvuuecevfXWW6XKWVZWllwul6Kjo9W3b98SyxITE5Wamqq+fftq6NChF3Tcf/3rX3r11VfVokULffrpp2rYsGGJ5SdPntSUKVN05MiRs+5n7969uvHGG7Vz504988wzSkxMvKAclEyTxcTEyDCMMy4/2zIAAPDXsXbtWknSyJEjyxz9a9CggenHzMzM1AsvvKDQ0FAtWbJEtWrVKrVOYGCgxo0bJ5fLdcb97NixQ927d9cvv/yi119/XaNGjbrgLFyTCQAAUAFO33fxww8/eOyYKSkpKioq0r333ltmwfwju91e5vz09HTdcMMNys7O1syZM8tVMCVKJgAAQIUYMGCAJOmee+7RI488ov/973/nPEV9sb7++mtJUpcuXcq1/cqVK9WlSxfl5eVp/vz5GjRoULmzUDIBAAAqQO/evfXyyy/LMAy9/PLLio2NVY0aNdSoUSONGjVKu3btMv2Yp28orlu3brm2f/PNN3X8+HG9+uqr+sc//nFRWSiZAAAAFWTs2LHav3+/Pv74Y40ZM0YdOnTQnj17NHXqVDVr1sx9Q7C36NatmyTpiSee0DfffHNR+6JkAgAAVKCgoCANGDBAr776qlatWqVDhw7p/vvvV35+voYNG6aCggLTjlW7dm1J0r59+8q1/bBhw/TGG2/o6NGj6tq1qzZt2lTuLJRMAAAAD3I4HJoyZYqioqJ0+PBhbd261bR9t2/fXpK0YsWKcu/jvvvu0/Tp03Xs2DF169ZNGzduLNd+KJkAAAAeZrPZVKVKFdP3O3ToUPn4+Oitt97SoUOHzrru2R5hNHz4cL399tvKycnRjTfeqPXr119wFp6T6UVcLleJX7jT6fx9ft5xqyIBAC5xp/8O4TnNnvfmm2/qmmuuUZs2bUotmzdvnrZv365q1aqpadOmph2zUaNGevTRR5WUlKS4uDh98sknpZ7HmZ+frzfeeEOHDh1SUlLSGfc1bNgw+fj4aNiwYerevbuWLFmidu3anXcWSqYXSUpK0vjx40vNnxTX3II0AIC/kuPHj8vhcFgd429l8eLFGjFihBo1aqT27dsrIiJCeXl5+uabb7Rq1SpVqlRJb7zxxhmfV1lezz33nPLz8/Xqq6+qcePG6tKli5o2bSo/Pz9lZWVp+fLlOnLkiJ577rlz7uv0yOhdd92l2NhYLV682H1K/lwomV4kISHB/RpKSTp27JiioqK0Z88er/2DIScnR/Xq1dPevXsVHBxsdZxSyHdxvD2f5P0ZyXdxvD2f5P0ZDcPQ8ePHFRERYXWUv53Jkyerffv2WrZsmdLS0pSdnS1JqlOnjuLj4zV69Gi1atXK9ONWqlRJr7zyiu644w5NmzZNaWlpSktLU3FxscLDwxUbG6u77rrLfSf5uQwZMkS+vr4aMmSIevTooUWLFqljx47n3M5mMH7utXJycuRwOOR0Or3yDy7J+zOS7+J4ez7J+zOS7+J4ez7p0sjoLdw/qyFdFex/6Y9z5RSckmPmCn73Z8CNPwAAADAdJRMAAACm80lMTEy0OgTOzMfHRzExMfL19d7TCt6ekXwXx9vzSd6fkXwXx9vzSZdGRm/gcrk0adIkJTS/THafS3+cy1VUrElbspSQkGD6zTt/BZf+b/gvzG63KzEx0bR/cdeuXSsfHx/17NnTlP1J5mRMSkpSmzZtFBQUpLCwMPXt21c7d+70mnySNG3aNDVr1kzBwcEKDg5Wu3bttHjxYq/J92eTJk2SzWbTmDFjLmo/ZuVLTEy
UzWYrMUVHR1/UPs3OuG/fPg0ePFihoaEKDAzU1VdfrfT0dK/JV79+/VI/Q5vNppEjR3pFvqKiIj311FNq0KCBAgMD1bBhQ02YMOGiH5tj5n8jx48f15gxYxQVFaXAwEBdf/315X7ItBkZ09LS1KtXL0VERMhms2nevHkllhuGoaefflrh4eEKDAxUt27dKuRd10BFoWT+jSQnJ2v06NFKS0vT/v37rY7jlpqaqpEjR2rdunVatmyZCgsL1b17d+Xl5Vkdza1u3bqaNGmSNm3apPT0dHXp0kV9+vTRd999Z3W0UjZu3Kg333xTzZo1szpKCU2aNFF2drZ7Wr16tdWR3I4ePar27dvLz89Pixcv1vfff6+XX35Z1atXtzqa28aNG0v8/JYtWyZJGjBggMXJfjd58mRNmzZNU6ZM0fbt2zV58mS98MILev31162O5nbPPfdo2bJlmjlzprZu3aru3burW7du5X793sXKy8tT8+bNNXXq1DKXv/DCC/rPf/6j6dOna/369apSpYpiY2OVn5/v4aRA+XB3+d9Ebm6uwsPDlZ6ermeeeUbNmjXTE088YXWsMh06dEhhYWFKTU09r0ckWCUkJEQvvviihg0bZnUUt9zcXF1zzTV644039Nxzz6lFixZ67bXXrI6lxMREzZs3TxkZGVZHKdPjjz+ur7/+WqtWrbI6ynkbM2aMPv/8c+3atUs2m83qOLrppptUq1YtJScnu+f1799fgYGB+uCDDyxM9ruTJ08qKChI8+fPL3E2p1WrVoqLizuv5wVWJJvNprlz56pv376Sfh/FjIiI0MMPP6xHHnlE0u8v6KhVq5ZSUlJ02223WRm33Li7/O+Fkcy/iY8//ljR0dFq3LixBg8erHfffddr3/5w+k1HISEhFicpW1FRkWbPnq28vLwLevOBJ4wcOVI9e/Y872efedKuXbsUERGhyy67TIMGDdKePXusjuS2YMECtW7dWgMGDFBYWJhatmypt99+2+pYZ1RQUKAPPvhAd999t1cUTEm6/vrrtWLFCv3www+SpG+//VarV69WXFycxcl+d+rUKRUVFSkgIKDE/MDAQK8aVT8tKytLBw4cKPHfssPhUNu2bbV27VoLkwHn79L/3wicl+TkZA0ePFiS1KNHDzmdTqWmpiomJsbaYH9SXFysMWPGqH379qa+ZssMW7duVbt27ZSfn6+qVatq7ty5uuqqq6yO5TZ79mxt3rzZlGvMzNa2bVulpKSocePGys7O1vjx43XDDTdo27ZtCgoKsjqefvzxR02bNk1jx47VE088oY0bN+qBBx6Qv7+/4uPjrY5Xyrx583Ts2DENHTrU6ihujz/+uHJychQdHS0fHx8VFRVp4sSJGjRokNXRJElBQUFq166dJkyYoCuvvFK1atXSRx99pLVr16pRo0ZWxyvlwIEDkqRatWqVmF+rVi33MsDbUTL/Bnbu3KkNGzZo7ty5kiRfX1/deuutSk5O9rqSOXLkSG3bts0rRxYaN26sjIwMOZ1Offrpp4qPj1dqaqpXFM29e/fqwQcf1LJly0qN1HiDP45mNWvWTG3btlVUVJQ+/vhjr7jcoLi4WK1bt9bzzz8vSWrZsqW2bdum6dOne2XJTE5OVlxcnFe9weXjjz/Whx9+qFmzZqlJkybKyMjQmDFjFBER4TU/w5kzZ+ruu+9WnTp15OPjo2uuuUa33367Nm3aZHU04C+Jkvk3kJycrFOnTpX4C8kwDNntdk2ZMsVrXlk5atQoff7550pLS1PdunWtjlOKv7+/e8SjVatW2rhxo/7973/rzTfftDiZtGnTJv3666+65ppr3POKioqUlpamKVOmyOVyycfHx8KEJVWrVk1XXHGFMjMzrY4iSQoPDy/1PwtXXnmlPvvsM4sSndnPP/+s5cuXa86cOVZHKWHcuHF6/PHH3dcKXn311fr555+VlJTkNSWzYcOGSk1NVV5ennJychQeHq5bb71Vl112mdXRSqldu7Yk6eDBgwoPD3fPP3jwoFq0aGFVLOCCcE3mX9ypU6f0/vvv6+WXX1ZGRoZ7+vbbbxUREaGPPvrI6ogyDEOjRo3S3Llz9eWXX6pBgwZWRzovxcXFcrlcVseQJHXt2lVbt24t8Ttu3bq1Bg0apIyMDK8qmNLvNyjt3r27xF+eVmrfvn2px2b98MMPioqKsijRmc2YMUNhYWGmPorMDCdOnFClSiX/SvHx8VFxcbFFic6sSpUqCg8P19GjR7V06VL16dPH6kilNGjQQLVr19aKFSvc83JycrR+/XqvuxYcOBNGMv/iPv/8cx09elTDhg0rNWLZv39/JScna8SIERal+93IkSM1a9YszZ8/X0FBQe7rjRwOhwIDAy3NdlpCQoLi4uIUGRmp48ePa9asWVq5cqWWLl1qdTRJv19v9udrWKtUqaLQ0FCvuLb1kUceUa9evRQVFaX9+/frmWeekY+Pj26//Xaro0mSHnroIV1//fV6/vnnNXDgQG3YsEFvvfWW3nrrLaujlVBcXKwZM2YoPj7e6x763atXL02cOFGRkZFq0qSJvvnmG73yyiu6++67rY7mtnTpUhmGocaNGyszM1Pjxo1TdHS07rrrLkvy5ObmlhjNz8rKUkZGhkJCQhQZGakxY8boueee0+WXX64GDRroqaeeUkREhPsOdMDbedefUjBdcnKyunXrVuYp8f79++uFF17Qli1bLH2m4rRp0ySp1PWhM2bM8JobG3799Vfdeeedys7OlsPhULNmzbR06VLdeOONVke7JPzyyy+6/fbbdeTIEdWsWVMdOnTQunXrVLNmTaujSZLatGmjuXPnKiEhQc8++6waNGig1157zWtuWjlt+fLl2rNnj1cVt9Nef/11PfXUU7r//vv166+/KiIiQvfee6+efvppq6O5OZ1OJSQk6JdfflFISIj69++viRMnys/Pz5I86enp6ty5s/vz2LFjJUnx8fFKSUnRo48+qry8PA0fPlzHjh1Thw4dtGTJEq+87hooC8/JBAAAHsFzMv9euCYTAAAApqNkAgAAwHSUTAAAAJiOkgkAAADTUTIBAABgOkomAAAATEfJBAAAgOkomQAAADAdJRPABUtJSZHNZlNKSorVUc4qMTFRNptNK1eutDpKmX766SfZbDavebMVAJiJkgl4qdMF5I9T5cqVFRERoa5du+rpp5/W7t27rY4JAECZLv13OgF/cQ0bNtTgwYMlSS6XS7/++qs2bNigCRMm6Pnnn9ejjz6qiRMnymazWZwUF6pOnTravn27HA6H1VEAwHSUTMDLNWrUSImJiaXmr169WkOGDFFSUpJ8fHw0YcIEz4fDRfHz81N0dLTVMQCgQnC6HLhEdejQQUuWLJHdbtcLL7ygvXv3llpn/vz56tq1q6pXr66AgAA1bdpUL730koqKikqs53Q6NXnyZHXq1EkRERHy9/dXRESE7rzzzgs+JZ+VlaV77rlHkZGRstvtCg8P19ChQ/Xzzz+XWtdmsykmJkb79u3TnXfeqdq1a6tSpUrnvIbS6XTq6aef1lVXXaWqVasqOD
hYjRo1Unx8fJnHkaRZs2apRYsWCgwMVHh4uB588EGdPHmyzHVnzJihtm3bqmrVqqpataratm1b6vrTo0ePysfHRzfddFOJ+RkZGe7LGzIzM0ssi4mJUWBgoFwul6QzX5MZExMjm82mwsJCJSYmqn79+rLb7briiiv0xhtvlJn58OHDGj58uMLCwlS5cmW1adNGc+fOvWSunwXw18NIJnAJa9y4sQYOHKiZM2dq3rx5Gj16tHtZQkKCJk2apDp16qhfv35yOBxatWqVxo0bp/Xr1+uTTz5xr7t9+3Y9/fTT6ty5s26++WZVqVJFO3bs0KxZs7Ro0SJt3rxZUVFR58yzfv16xcbGKi8vTzfddJMuv/xy/fTTT/rwww+1ePFirV27VpdddlmJbY4c+X/s3XlcVPXi//H3gGwqDIsSooKa5lrIdUmzEIwrYpmoWXbL3MpbuebNlMrCylCzW4mmZZhWLmnuZZYmSGbuUpkr7guaKwgqoMzvD3/MNy6oqIc5VK/n4zGPR3O2eZ9pe/v5nHPmlFq2bClfX19169ZNFy9elJeX11U/w2azKSoqSuvWrVOrVq3Url07OTk56cCBA1q8eLG6d+9eJOuECRO0bNkydezYUW3atNGyZcs0fvx4nTx5UjNmzCi07cCBA5WQkKCqVauqT58+kqR58+apV69e2rJli95//31Jko+Pj0JCQvTDDz/o8uXLcnZ2liQlJSXZj5WUlKTatWtLki5evKi1a9fqnnvukZub23W/S0l67LHHtH79ekVHR8vZ2Vlz5sxRv3795OLioqefftq+XVZWllq3bq1t27bpnnvuUVhYmA4fPqxu3bopKiqqRJ8FAEajZAJ/cuHh4frss8+0YcMG+7Lly5dr9DW2fZQAACAASURBVOjRioqK0rx581ShQgVJVwrac889p8mTJ2vevHnq0qWLJKl+/fpKT0+Xr69voWMnJSUpMjJSb775pqZMmXLNHHl5eerWrZvy8/O1fv16hYaG2tetXr1a4eHhGjRokJYsWVJov61bt6pXr16aMmWKvahdy9atW7Vu3TrFxMRowYIFhdbl5OQoLy+vyD4rVqzQpk2bVLduXUnSqFGj1LhxY82ePVtvv/22AgMDJUkpKSlKSEhQ/fr19dNPP9mvlYyLi1OLFi00fvx4Pfzww7rvvvskSREREdqyZYs2bdqk5s2b27+zO+64QxcuXFBSUpK9DK5Zs0Y5OTmKiIi47jkWOHz4sLZu3Wov3YMGDVKjRo30zjvvFCqZY8aM0bZt29S3b199+OGH9uU9e/ZUZGRkiT8PAIzEdDnwJ1dQkE6ePGlfNmHCBEnSRx99ZC+Y0pXp6dGjR8tisWjWrFn25VartUjBlK6UqIYNG2rFihXXzfHVV19p//79Gjp0aKGCKV2Z2u/YsaOWLl2qzMzMQutcXV01duzYEhXMP/Lw8CiyzM3NTRUrViyyfNCgQfaCWbDvY489pvz8fG3atMm+fPr06ZKulMo/3ozj4+Oj1157TZIKTTsXFMaVK1dKki5fvqyUlBRFREQoIiKiyKimdOUPBSUVHx9faFS3bt26atWqlXbu3Klz587Zl3/++edydXXV66+/Xmj/+++/X23bti3x5wGAkRjJBP6C1q5dqwoVKmjq1KnFrvfw8NCOHTsKLUtOTtZ7772ndevW6eTJk7p06ZJ9naura4k+U5J27txZ7I1Kx44dU35+vnbt2qWmTZval9esWVOVKlUqyWlJujLqetddd2nWrFk6fPiwYmJiFB4ersaNG8vJqfg/Nzdp0qTIsmrVqkmSzp49a1+2ZcsWScUXwYJCmZqaal8WFhYmZ2dnJSUlafjw4dqyZYsyMjLUpk0bnT9/Xp9++qm2b9+u+vXrKykpSR4eHrr77rtLfK7Xy+3p6anMzEzt379fDRo00G233VZk+1atWum7774r8WcCgFEomcCf3NGjRyVJlStXti87ffq0Ll26pJEjR151v+zsbPtfz507V48++qgqVqyoqKgo1ahRQ+XLl7ffMHK1m2n+6PTp05JU5BrHa32upGKL0bWUK1dOK1euVFxcnObNm6f//Oc/kq6cf//+/fXyyy8XGRUt7hrPcuWu/OfvjzdBZWZmysnJqdB3+cecFoul0Eisl5eX/vGPf+jHH39UXl6ekpKSZLFYFBERofPnz0u6MoIZHBys9evXq3Xr1iUq7DeSuyCPv79/sce40e8XAIxCyQT+5AruxG7WrJl9mZeXlywWS6Ep9GuJi4uTu7u7Nm3apDp16hRaN3v27BIdo6AQLVmypMgd19dyM8/39PPzU0JCgsaPH68dO3Zo5cqVSkhI0GuvvSYXFxfFxsbe8DGlK+eQn5+vEydOFCltv//+u2w2W5HiFxERoQ0bNmj9+vVKTk5Ww4YN7SW1Zs2aSkpKUp06dZSXl3dD12PeSOaCfMU5fvy44Z8JACXBNZnAn9iuXbs0Z84cubm5qVOnTvbld999t06dOqXdu3eX6Dh79uxR/fr1ixTM9PR07d27t0THKJgG/umnn0qY/tZZLBbVr19f/fr10/LlyyVJixcvvunjFVxLWtwjlAqWNW7cuNDyguL43Xff6YcfflCbNm3s69q0aaPk5GT7NZs3cj1mSXl5ealGjRpKS0srtmiuWbPG8M8EgJKgZAJ/Uj/++KOioqKUk5Oj4cOHq2rVqvZ1AwcOlCT17t1bp06dKrLvsWPHtH37dvv74OBgpaWlFRr1unjxop599tli79YuTseOHRUUFKT//ve/SklJKbI+Ly9Pq1evLvH5Xc3+/fu1f//+IssLsru7u9/0sXv06CFJGjlyZKFp8YyMDPulBwXbFLj33ntVrlw5TZo0SefOnStUMiMiInTy5EklJiaqQoUKhUabjfT4448rNzfXfnNSgeTkZH377bel8pkAcD1MlwNlXFpamv1GmtzcXPvPSv76669ydnbWK6+8UqRctGvXTiNGjNAbb7yh2rVrq127dgoODtapU6eUlpamH374QW+++abq168vSRowYIAGDBig0NBQPfzww7p06ZKWL18um82mkJAQ/fzzz9fN6ebmpi+//FLR0dFq3bq12rRpozvvvFMWi0UHDhzQDz/8ID8/vyI3HN2o1NRUde7cWc2bN1eDBg0UEBCgI0eOaOHChXJyctLzzz9/08cOCwvTgAEDlJCQoEaNGqlLly6y2WyaN2+eDh8+rIEDByosLKzQPhUrVlSzZs30008/ycnJSa1bt7avKxjlPHHihKKiouTi4nLT2a5l2LBhmjdvniZPnqytW7fqvvvu0+HDhzVnzhx16NBBS5YsuepNUQBQWiiZQBm3Z88e+yiah4eHvL29Va9ePY0YMUI9evTQ7bffXux+r7/+usLCwjR+/Hh9//33Onv2rPz8/FSzZk3FxcXp8ccft29b8IDvhIQETZkyRd7e3nrggQcUHx+vrl27ljhrs2bN9PPPP+vtt9/W0qVL9eOPP8rNzU1Vq1ZVTEyMHnvssVv7MiQ1bdpUw4YNU3Jysr7++mudPXtWAQEBioyM1NChQ9WiRYtbO
v748eMVGhqqSZMm6aOPPpIkNWzYUK+//rp69epV7D4RERH66aefFBoaKm9vb/vywMBA3XHHHdq1a1epTJUX8PT0VEpKimJjY7Vo0SJt3LhRDRs21KxZs7R3714tWbLkmg+4B4DSYLHZbDazQwAASscTTzyhGTNmaNu2bfaRa8AsmZmZslqtyuh+v7xc//zjXJm5l2T97HtlZGTwB7liMH8CAH8B6enpRZatWrVKs2fPVt26dSmYABzuz//HCACA2rdvLw8PDzVu3FgVKlTQtm3btGzZMjk7OyshIcHseAD+hiiZAPAX0KNHD82YMUOzZ8/WuXPn5O3trQ4dOig2NvaGfmUIAIzCNZkAAMAhuCbz74VrMgEAAGA4SiYAAAAMR8kEAACA4SiZAAAAMBwlEwAAAIajZAIAAMBwlEwAAAAYjpIJAAAAw1EyAQAAYDhKJgAAAAxHyQQAAIDhKJkAAAAwHCUTAAAAhqNkAgAAwHCUTAAAABiOkgkAAADDUTIBAABgOEomAAAADEfJBAAAgOEomQAAADAcJRPAn1pcXJwsFovCw8NN2R8AUDxKJuAgBWXGYrGofPnyOnr06FW33b9/v33b5ORkx4UEAMAglEzABBcuXNDIkSPNjgEAQKmhZAImmTp1qnbt2mV2DAAASgUlE3Cw6tWr66677tKlS5f00ksvmR0HAIBSQckEHMzJyUnx8fGSpHnz5mn9+vU3dZzLly9r6tSpatOmjSpVqiQ3NzdVrVpVXbt2vaXrOGvUqCGLxaJp06ZddZuePXvKYrGoZ8+eRdZdunRJH330kcLDw1WpUiW5uLjIz89PdevW1aOPPqrExMQi+5w5c0aJiYl65JFHdOedd8rX11fu7u4KDg7Wv/71L61du7bE+efMmaPWrVvL19dXFSpUUJMmTTRhwgRdvny5xMf4XydOnNArr7yi0NBQWa1Wubu7q1atWurTp49+++23mz7u2rVrNWzYMN13330KDg6Wu7u7vL291aJFC40ZM0ZZWVnXPca6devUq1cv1a5dW+XLl5eXl5caNGig3r1769tvvy12n/z8fM2YMUPt27fXbbfdJldXV1WuXFlt27bVrFmzZLPZbvqcAKBAObMDAH9H7du3V+vWrbVq1SoNHz5cK1euvKH9MzIyFBMTYy+Tzs7O8vT0VHp6ur788kt9+eWXeuGFF/T222+XQvqru3z5stq3b6/ly5fbl1mtVmVnZ+v06dPatWuX5syZoz59+hTa7/3337dfo+rs7CwvLy9J0sGDB3Xw4EHNnj1b7733ngYOHHjNzx82bJjGjh0ri8Uib29vXbx4UZs3b9bmzZv11VdfadGiRXJzc7uhc1qxYoW6du2qs2fPSpJcXFzk6uqqffv2ad++ffr88881ZcoUPfnkkzd0XElq2bKl/a/Lly+v8uXL68yZM1q3bp3WrVunTz/9VElJSfL39y+y7+XLlzVkyBCNHz/evqxChQoqV66cduzYoe3bt2v+/Pn23AVOnz6tTp06KSUlxb7MarXq5MmTWr58uZYvX67Zs2dr7ty5cnV1veFzAoACjGQCJhk9erQkKSkpScuWLbuhffv06aPk5GS5urpq/PjxyszM1JkzZ3T06FH17t1bkjRu3DhNnjzZ8NzXMmvWLC1fvlzu7u76+OOPde7cOZ09e1YXLlzQ8ePHNX/+fD388MNF9gsMDNRrr72mjRs36vz58zp9+rQuXLigvXv3atCgQZKkIUOGaMuWLVf97NTUVI0dO1b9+/fX8ePHdfr0aZ05c0ZvvPGGLBaLvv32W8XGxt7Q+fz666966KGHdPbsWT399NPatm2bLly4oKysLB04cEDPPfeccnNz1adPH23cuPHGvixJHTp00BdffKH09HR7ET9//rzmz5+vunXratu2bXrmmWeK3fell16yF8zevXtr586dysrKsp/3woUL1a5du0L7XL58WZ07d1ZKSooaN26sJUuWKDs7W2fPnlVWVpamT58uf39/LV68WMOGDbvh8wGAP6JkAiZp0aKFOnXqJEmKjY0t8RTlunXrNG/ePElSQkKCBgwYoPLly0uSAgIClJiYqC5dukiSRowYoYsXL5ZC+uKtWbNGkvTkk0+qT58+qlixoiTJYrHI399fnTp10ty5c4vs17dvX8XFxalJkyb20TOLxaKaNWvqvffe03PPPafLly9r4sSJV/3sjIwMde/eXQkJCapcubIkycvLS6+88opefvllSVe+r2s9Oup/DR48WBcuXFBsbKw++ugj1a9fX87OzpKkoKAgTZw4UQMHDtSlS5f05ptvlvi4BRYvXqxHHnlEAQEB9mUeHh7q1KmTvv/+e7m5uWnhwoU6ePBgof127dqlcePGSZJefPFFJSYm6o477rCvt1qt6tixo2bPnl1ov5kzZ2rVqlWqV6+ekpOT9eCDD9r/2alQoYKefPJJLV26VBaLRR988IF+//33Gz4nAChAyQRM9NZbb8nZ2VmpqamaNWtWifb54osvJEnVqlXTU089Vew2b7zxhiTZp0AdxdvbW5J07NgxQ4/7wAMPSJJWr159ze1effXVYpcPHTpUHh4eunTpkr2gX8/+/fu1cuVKlStXTi+88MJVtyuYJl+xYsUtXff5v6pWraqQkBDZbDZ7eS8wffp05efny8/P74YehVVwPeyzzz4rq9Va7DZNmjRRw4YNlZubq6SkpJs/AQB/e5RMwET16tVTr169JF0ZdczLy7vuPgXTshEREXJyKv5f4fr166tq1aqFtneE9u3by2KxaPHixYqOjtasWbNKPHK4d+9evfDCC2rSpIm8vb3l7OxsfyB9+/btJUmHDx++6v7Vq1dX7dq1i13n5eWlJk2aSCr59/Hjjz9KunKTTIMGDRQQEFDsq2BKOjs7W6dOnSrRsQvk5+dr5syZeuihhxQUFCQPDw/7OVssFvtNYf973gWl85///Kfc3d1L9FmXL1+230AVFxd31fMJCAjQzp07JUkHDhy4ofMBgD/ixh/AZHFxcZoxY4b27t2ryZMna8CAAdfcvmAKs6BEXk21atV05MgRh0553nvvvRozZoxeeeUVLVu2zH6tabVq1RQZGaknn3xSERERRfZbsGCBHnvsMeXk5NiXeXl5yd3dXRaLRbm5uTpz5oyys7Ov+tnX+z4K1pf0+ygox/n5+Tp+/HiJ9jl//nyJtivY9sEHHyw0Wujq6ipfX1+5uLhIunKTTl5eXpHzLhgpDg4OLvHnnT592v79njlzpsQZAeBmMZIJmKxq1ar2Yvnmm2+W6LE1ZdnQoUO1b98+vfvuu4qJiZG/v78OHz6sadOmqU2bNuratWuhEdtTp06pZ8+eysnJUZs2bZScnKzz588rIyNDx48f17Fjx4q9jrO0FUx933bbbbLZbCV61ahRo8THHzVqlJKSkuTh4aF3331XBw4c0MWLF3Xq1CkdO3ZMx44d09133y1JRa7XtVgsN30+kvTNN9+U6Hzi4uJu+HMAoAAlEygDhg8fLh8fH/3+++965513rrltweNsrjV1/Mf1xT3+5lrKlbsywXGt
G4YyMjKueYzAwEANHjxYCxYs0PHjx/XLL7/Yrx/98ssvNWnSJPu2S5cuVWZmpnx8fLRkyRK1bt1aHh4ehY5Xkms8jxw5UqL1Jf0+Cm7GOXny5DVHUG9WwU05r776qgYPHqygoKAi5fFq512Q7Uams/38/Ox/b5kGB+AIlEygDPDx8dHw4cMlSe+8845OnDhx1W2bNm0q6cqjj/Lz84vdZseOHfZS1axZsxvOIkmHDh0qdn1+fv4NX+d55513asqUKWrVqpUkFboZqeBz6tata7/T+X+tWLHiup9x6NAh7dmzp9h1586d06ZNmyT93/d3PQVZL1++rG+++aZE+9yIgvMODQ0tdv3+/fuVlpZW7Lp77rlH0pXvsaRPD3BxcVHz5s0lSUuWLLnRuABwwyiZQBkxYMAAVatWTefOnbPfHV6cbt26SboyMvfxxx8Xu03BXdaVKlVSZGTkDeUICQmRdOU6yeIeqzR9+vSrjqL+8ZrK4hSMUP7xhqWCu5x37dpVbGFKTU3VzJkzS5T9at/bO++8owsXLqhcuXL2xztdT506dRQeHi5Jevnll687env69OkSHbdAwXn//PPPxa4v+ENHcXr27ClnZ2edOnVKr732Wok/s2/fvpKujB4vXbr0mtve6PkAwP/ixh+gjPDw8FBcXJyeeuqpa440NW/eXF26dNG8efM0YMAA5ebmqnfv3ipfvryOHTumESNG2K9hfOONN0p893GBxx57TImJidq+fbv69u2r0aNHy8/PT5mZmUpMTNTw4cPl6+tbbAmJiYlRlSpV1LVrV7Vs2dL+SKPTp0/rgw8+0Pfffy/p/x5JJElt27aVk5OTTp8+rccff1zjx49X1apVlZubq4ULF6p///7y9PS87p3bVqtV06dPl9Vq1YgRI1SpUiWdO3dOCQkJev311yVJ/fr1U2BgYIm/i4SEBLVs2VK7du1SixYtNHr0aEVFRdm/0yNHjmjlypWaNm2aatWqpSlTppT42O3atdPnn3+uN998U7Vr19ZDDz2kcuXKad++fXr55Zc1Z84c+fj4FHuTTu3atTV06FCNHj1aY8eO1alTpzRs2DDVqVNHkpSZmalvvvlGs2fP1oIFC+z7PfHEE/r888+1YsUKderUSSNGjFDv3r3t30l2drbWr1+vL7/8UjNmzCjya0GAUf779BS5V/Q0O8Ytu5h1TvqsltkxyixGMoEypGfPnqpXr951t0tMTFTr1q2Vm5urAQMGyGq1ytfXV4GBgfbRzRdeeOGqvxZzLffff7+6d+8uSfr4449VqVIl+fj4yMfHR0OGDNG///1vdejQodh9L1y4oE8++UTt27eXj4+PrFarrFar/Pz8NGLECNlsNj388MOFnu9Zp04dDR06VJI0f/58VatWTd7e3qpYsaIeffRRVaxYsdBPJ15N48aN9eKLL2r8+PHy9/eXr6+vfHx89PLLL8tmsykyMtL+K0sl1ahRIy1btkwBAQHasWOHYmJiVLFiRVWqVEnly5dXtWrV9OSTT97wz4JKV27yuu2223Tu3Dl16dJFHh4e8vb2Vq1atTRr1iyNGjVKd9111zX379evnyTZH8bu6ekpX19feXt7q1u3bkWec+ns7Kx58+bpwQcfVG5urkaMGKGqVavKarXKx8dHnp6eatOmjT744INSuQ4VwN8LJRMoQ5ydnfXWW29ddzur1arvv/9eiYmJCg8Pl6enp7KyshQQEKAuXbooKSnpln63fNq0aXr//ffVuHFjeXh4KD8/X61atdKcOXOuWfgSEhI0ZswYtW/fXnXq1JHNZtOFCxcUGBiohx56SPPmzdPcuXOLPN9z9OjR+vTTT9W8eXN5eHgoLy9PtWvX1ksvvaQtW7aUePRxzJgxmj17tu69917ZbDa5urqqcePGev/997Vs2bIbHtWVrlybWfALO2FhYfL29tbZs2fl7Oys+vXr64knntCMGTP03nvv3dBxg4ODtXHjRvXp08d+fu7u7nrwwQdL9BOYzs7OmjBhglavXq3HH39cQUFBysvLk81mU4MGDdSnT59iHzzv5eWlJUuWaOnSpXr00UcVFBSknJwcnT9/XlWrVlXbtm0VHx9vf1YmANwsi62kv2UHAABwCzIzM2W1WvVayt6/zHT5yLBaysjIkJeXl9lxyhxGMgEAAGA4SiYAAAAMR8kEAACA4SiZAAAAMBwlEwAAAIbjYexlWH5+vo4ePSpPT88iv2kMAEBJ2Gw2nTt3ToGBgUUeHwaUJkpmGXb06FFVr17d7BgAgL+AQ4cOqVq1ambHwN8IJbMM8/S88gyxQ4+2lpcrf6sAADcuM/eSqn+xyv7/FMBRaC5lWMEUuZdrOUomAOCWcNkVHI2LMwAAAGA4SiYAAAAMR8kEAACA4SiZAAAAMBwlEwAAAIajZAIAAMBwlEwAAAAYjpIJAAAAw1EyAQAAYDhKpoHi4+PVrFkzeXp6yt/fXzExMdq5c2ehbT766COFh4fLy8tLFotFZ8+eNSktAABA6aFkGmjVqlXq16+f1q5dq+XLlysvL09t27ZVdna2fZvz58+rXbt2eumll0xMCgAAULr4QWwDLVu2rND7adOmyd/fX5s2bVJYWJgkafDgwZKk5ORkR8cDAABwGEYyS1FGRoYkydfX1+QkAAAAjkXJLCX5+fkaPHiwWrVqpUaNGpkdBwAAwKGYLi8l/fr109atW7V69WqzowAAADgcJbMU9O/fX1999ZVSUlJUrVo1s+MAAAA4HCXTQDabTQMGDNCCBQuUnJysmjVrmh0JAADAFJRMA/Xr108zZ87UokWL5OnpqWPHjkmSrFarPDw8JEnHjh3TsWPHlJaWJkn69ddf5enpqaCgIG4QAgAAfxnc+GOgSZMmKSMjQ+Hh4apSpYr99cUXX9i3mTx5skJDQ/X0009LksLCwhQaGqrFixebFRsAAMBwjGQayGazXXebuLg4xcXFlX4YAAAAEzGSCQAAAMNRMgEAAGA4SiYAAAAMR8kEAACA4SiZAAAAMBwlEwAAAIajZAIAAMBwPCezDMnJyVFOTo79fWZmpolpAAAAbh4jmWVIfHy8rFar/VW9enWzIwEAANwUSmYZEhsbq4yMDPvr0KFDZkcCAAC4KUyXlyFubm5yc3MzOwYAAMAtYyQTAAAAhqNkAgAAwHCUTAAAABiOkgkAAADDUTIBAABgOEomAAAADEfJBAAAgOEomQAAADAcJdNA8fHxatasmTw9PeXv76+YmBjt3Lmz0DYXL15Uv3795Ofnp4oVK6pLly46fvy4SYkBAABKByXTQKtWrVK/fv20du1aLV++XHl5eWrbtq2ys7Pt2zz//PNasmSJ5s6dq1WrVuno0aPq3LmziakBAACMx89KGmjZsmWF3k+bNk3+/v7atGmTwsLClJGRocTERM2cOVNt2rSRJH3yySeqX7++1q5dqxYtWpgRGwAAwHCMZJaijIwMSZKvr68kadOmTcrLy1NkZKR9m3r16ikoKEg//fSTKRkBAABKAyWzlOTn52vw4MFq1aqVGjVqJEk6duyYXF1d5e3tXWjb226
7TceOHTMjJgAAQKlguryU9OvXT1u3btXq1avNjgIAAOBwjGSWgv79++urr75SUlKSqlWrZl8eEBCg3NxcnT17ttD2x48fV0BAgKNjAgAAlBpKpoFsNpv69++vBQsWaOXKlapZs2ah9U2aNJGLi4u+//57+7KdO3fq4MGDatmypaPjAgAAlBqmyw3Ur18/zZw5U4sWLZKnp6f9Okur1SoPDw9ZrVb16dNHQ4YMka+vr7y8vDRgwAC1bNmSO8sBAMBfCiXTQJMmTZIkhYeHF1r+ySefqGfPnpKkd999V05OTurSpYtycnIUFRWlDz74wMFJAQAAShcl00A2m+2627i7u2vixImaOHGiAxIBAACYg2syAQAAYDhKJgAAAAxHyQQAAIDhKJkAAAAwHCUTAAAAhqNkAgAAwHCUTAAAABiO52SWITk5OcrJybG/z8zMNDENAADAzWMkswyJj4+X1Wq1v6pXr252JAAAgJtCySxDYmNjlZGRYX8dOnTI7EgAAAA3henyMsTNzU1ubm5mxwAAALhljGQCAADAcJRMAAAAGI6SCQAAAMNRMgEAAGA4SiYAAAAMR8kEAACA4SiZAAAAMBwlEwAAoBTs379fFotF7dq1u+o2ycnJslgseuaZZ+zLevbsKYvForVr10qSatSoIYvFUuLX/v37S/vUSoSHsQMAAJRhgwcP1tmzZwstmzZtmg4cOKBBgwbJ29u70Lr/fW8WSqbBUlJS9Pbbb2vTpk1KT0/XggULFBMTY19//PhxDRs2TN99953Onj2rsLAwJSQkqE6dOiamBgAAZdXgwYOLLEtOTtaBAwc0ePBg1ahRw/GhSoDpcoNlZ2crJCREEydOLLLOZrMpJiZGe/fu1aJFi7RlyxYFBwcrMjJS2dnZJqQFAAAoHYxkGiw6OlrR0dHFrtu9e7fWrl2rrVu3qmHDhpKkSZMmKSAgQLNmzdJTTz3lyKgAAAClhpLpQDk5OZIkd3d3+zInJye5ublp9erVlEwAAP6C0tLSFBcXV+y6snKTTmmgZDpQvXr1FBQUpNjYWH344YeqUKGC3n33XR0+fFjp6elmxwMAAKVgz549GjlypNkxHI5rMh3IxcVF8+fP165du+Tr66vy5csrKSlJ0dHRcnLibwUAAH9FUVFRstlscfw+1QAAIABJREFUxb6SkpLMjldqGMl0sCZNmig1NVUZGRnKzc1V5cqVdffdd6tp06ZmRwMAADAMw2cmsVqtqly5snbv3q2NGzeqY8eOZkcCAAAwDCOZBsvKylJaWpr9/b59+5SamipfX18FBQVp7ty5qly5soKCgvTrr79q0KBBiomJUdu2bU1MDQAAYCxKpsE2btyoiIgI+/shQ4ZIknr06KFp06YpPT1dQ4YM0fHjx1WlShU9+eSTGjFihFlxAQAASgUl02Dh4eGy2WxXXT9w4EANHDjQgYkAAAAcj2syAQAAYDhGMgEAAEpBjRo1rjm7KRU/Azpt2jRNmzbtmvslJyffYrrSx0gmAAAADEfJBAAAgOEomQAAADAcJRMAAACGo2QCAADAcNxdXobk5OQoJyfH/j4zM9PENAAAADePkcwyJD4+Xlar1f6qXr262ZEAAABuCiWzDImNjVVGRob9dejQIbMjAQAA3BSmy8sQNzc3ubm5mR0DAADgljGSCQAAAMNRMgEAAGA4SiYAAAAMR8kEAACA4SiZAAAAMBwlEwAAAIajZAIAAMBwlEwAAAAYjpJpsJSUFHXo0EGBgYGyWCxauHBhofVZWVnq37+/qlWrJg8PDzVo0ECTJ082KS0AAEDpoGQaLDs7WyEhIZo4cWKx64cMGaJly5bp888/1/bt2zV48GD1799fixcvdnBSAACA0sPPShosOjpa0dHRV12/Zs0a9ejRQ+Hh4ZKkvn376sMPP9T69ev10EMPOSglAABA6WIk08HuueceLV68WEeOHJHNZlNSUpJ27dqltm3bmh0NAADAMIxkOlhCQoL69u2ratWqqVy5cnJyctKUKVMUFhZmdjQAAADDUDIdLCEhQWvXrtXixYsVHByslJQU9evXT4GBgYqMjDQ7HgAAgCEomQ504cIFvfTSS1qwYIEeeOABSdJdd92l1NRUjRs3jpIJAAD+Mrgm04Hy8vKUl5cnJ6fCX7uzs7Py8/NNSgUAAGA8RjINlpWVpbS0NPv7ffv2KTU1Vb6+vgoKClLr1q01dOhQeXh4KDg4WKtWrdKnn36q//73vyamBgAAMFapjmTGxcXJYrEoOTm5ND+mTNm4caNCQ0MVGhoq6cpzMUNDQ/Xqq69KkmbPnq1mzZrp8ccfV4MGDTR69GiNGjVKzzzzjJmxAQAADHXDI5n79+9XzZo1Cy1zcXHRbbfdpvvuu0/Dhw/XXXfdZVjAP5vw8HDZbLarrg8ICNAnn3ziwEQAAACOd9PT5bfffrueeOIJSVemiNeuXatZs2Zp/vz5+v7779WqVSvDQgIAAODP5aZLZu3atRUXF1do2SuvvKJRo0bp5Zdf/ltNkQMAAKAwQ6/JHDBggCRpw4YN19xu6tSp6tixo2rUqCF3d3f5+voqKipKSUlJhbZbsWKFLBaLnnvuuWKPs2fPHjk5OSkqKqrQ8nPnzum1115Tw4YN5eHhIW9vb0VFRWn16tVFjhEeHi6LxaKLFy/qlVde0e233y4XF5ciBRoAAAAlVyp3l1sslmuu79evn0JCQhQZGanKlSvryJEjWrhwoSIjIzV//nx17NhRknT//ffr9ttv18yZMzVu3DiVL1++0HE+/vhj2Ww2Pf300/Zlp0+fVlhYmH777Te1atVKzzzzjDIzM7Vo0SJFRERo7ty5iomJKZKpS5cu+vnnn9WuXTt5e3sXue4UAAAAJWdoyfzggw8kSc2bN7/mdtu2bStS4tLT09W0aVMNHTrUXjItFov69u2rYcOGae7cuerRo4d9+0uXLmn69Ony9/e3by9dGU397bffNGXKFD311FP25fHx8WratKn69u2rdu3ayd3dvdDnHz16VL/88ot8fX1v7uQBAABgd9PT5WlpaYqLi1NcXJyGDh2qsLAwvf7663J3d9eoUaOuuW9xo4RVqlRRly5dtHv3bh04cMC+vFevXnJ1ddXHH39caPuvv/5a6enp6tGjh1xcXCRJJ0+e1BdffKE2bdoUKpiS5O/vr6FDh+rEiRNasWJFkc8fOXIkBRMAAMAgNz2SuWfPHo0cOVLS/z3C6F//+peGDx+uO++885r77t27V/Hx8Vq5cqWOHDminJycQuuPHj2q4OBgSVLlypXVuXNnzZ49Wzt27FC9evUkyV46/1gmN2zYoMuXLysnJ6fYayp3794tSdqxY4cefPDBQuuuN/oKAACAkrvpkhkVFaVly5bd8H5paWlq3ry5MjMzFRERoQ4dOsjLy0tOTk5KTk7WqlWripTOf//735o9e7Y+/vhjjRs3TkePHtU333yj1q1b64477rBvd/r0aUnSjz/+qB9//PGqGbKzs4ssu+222274XIyWk5NT6NwzMzNNTAMAAHDzHP6zku+++67OnDmjzz77zP
[... base64-encoded PNG image data omitted ...]\n", 31 | 
"text/plain": [ 32 | "" 33 | ] 34 | }, 35 | "execution_count": 23, 36 | "metadata": {}, 37 | "output_type": "execute_result" 38 | } 39 | ], 40 | "source": [ 41 | "dealer_sticks = 17\n", 42 | "\n", 43 | "from IPython.display import Image \n", 44 | "assert 11" 50 | ] 51 | }, 52 | { 53 | "cell_type": "markdown", 54 | "metadata": { 55 | "deletable": false, 56 | "editable": false, 57 | "nbgrader": { 58 | "checksum": "e95b77b39709873d39c9fa46afbf4700", 59 | "grade": false, 60 | "grade_id": "cell-82b38787563a7073", 61 | "locked": true, 62 | "schema_version": 1, 63 | "solution": false 64 | } 65 | }, 66 | "source": [ 67 | "## Packages.\n", 68 | "\n", 69 | "We import the following libraries that are required for this assignment. We shall be using the following libraries:\n", 70 | "1. jdc: Jupyter magic that allows defining classes over multiple jupyter notebook cells.\n", 71 | "2. numpy: the fundamental package for scientific computing with Python.\n", 72 | "3. matplotlib: the library for plotting graphs in Python.\n", 73 | "4. RL-Glue: the library for reinforcement learning experiments.\n", 74 | "5. BaseEnvironment, BaseAgent: the base classes from which we will inherit when creating the environment and agent classes in order for them to support the RL-Glue framework.\n", 75 | "6. Manager: the file allowing for visualization and testing.\n", 76 | "7. itertools.product: the function that can be used easily to compute permutations.\n", 77 | "8. tqdm.tqdm: Provides progress bars for visualizing the status of loops.\n", 78 | "\n", 79 | "**Please do not import other libraries** — this will break the autograder.\n", 80 | "\n", 81 | "**NOTE: For this notebook, there is no need to make any calls to methods of random number generators. Spurious or missing calls to random number generators may affect your results.**" 82 | ] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "execution_count": null, 87 | "metadata": { 88 | "deletable": false, 89 | "editable": false, 90 | "nbgrader": { 91 | "checksum": "386dc36257995b1270913ddfcdc1a838", 92 | "grade": false, 93 | "grade_id": "cell-917f710997077ab6", 94 | "locked": true, 95 | "schema_version": 1, 96 | "solution": false 97 | } 98 | }, 99 | "outputs": [], 100 | "source": [ 101 | "# Do not modify this cell!\n", 102 | "\n", 103 | "import jdc\n", 104 | "# --\n", 105 | "import numpy as np\n", 106 | "# --\n", 107 | "from rl_glue import RLGlue\n", 108 | "# --\n", 109 | "from Agent import BaseAgent \n", 110 | "from Environment import BaseEnvironment \n", 111 | "# --\n", 112 | "from manager import Manager\n", 113 | "# --\n", 114 | "from itertools import product\n", 115 | "# --\n", 116 | "from tqdm import tqdm" 117 | ] 118 | }, 119 | { 120 | "cell_type": "markdown", 121 | "metadata": { 122 | "deletable": false, 123 | "editable": false, 124 | "nbgrader": { 125 | "checksum": "83300358f840fb91d5c9ef9a3e77e09c", 126 | "grade": false, 127 | "grade_id": "cell-bb3b3151b3f4f759", 128 | "locked": true, 129 | "schema_version": 1, 130 | "solution": false 131 | } 132 | }, 133 | "source": [ 134 | "## Section 1. Environment\n", 135 | "\n", 136 | "In the first part of this assignment, you will get to see how the Cliff Walking environment is implemented. You will also get to implement parts of it to aid your understanding of the environment and more generally how MDPs are specified. In particular, you will implement the logic for:\n", 137 | " 1. Converting 2-dimensional coordinates to a single index for the state,\n", 138 | " 2. One of the actions (action up), and,\n", 139 | " 3. 
Reward and termination.\n", 140 | " \n", 141 | "Given below is an annotated diagram of the environment with more details that may help in completing the tasks of this part of the assignment. Note that we will be creating a more general environment where the height and width positions can be variable but the start, goal and cliff grid cells have the same relative positions (bottom left, bottom right and the cells between the start and goal grid cells respectively).\n", 142 | "\n", 143 | "\n", 144 | "\n", 145 | "Once you have gone through the code and begun implementing solutions, it may be a good idea to come back here and see if you can convince yourself that the diagram above is an accurate representation of the code given and the code you have written." 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": null, 151 | "metadata": { 152 | "deletable": false, 153 | "editable": false, 154 | "nbgrader": { 155 | "checksum": "54342eceb70d3ded536ef90b397df6a9", 156 | "grade": false, 157 | "grade_id": "cell-3b0342944fae98dd", 158 | "locked": true, 159 | "schema_version": 1, 160 | "solution": false 161 | } 162 | }, 163 | "outputs": [], 164 | "source": [ 165 | "# Do not modify this cell!\n", 166 | "\n", 167 | "# Create empty CliffWalkEnvironment class.\n", 168 | "# These methods will be filled in later cells.\n", 169 | "class CliffWalkEnvironment(BaseEnvironment):\n", 170 | " def env_init(self, agent_info={}):\n", 171 | " raise NotImplementedError\n", 172 | "\n", 173 | " def env_start(self, state):\n", 174 | " raise NotImplementedError\n", 175 | "\n", 176 | " def env_step(self, reward, state):\n", 177 | " raise NotImplementedError\n", 178 | "\n", 179 | " def env_end(self, reward):\n", 180 | " raise NotImplementedError\n", 181 | " \n", 182 | " def env_cleanup(self, reward):\n", 183 | " raise NotImplementedError\n", 184 | " \n", 185 | " # helper method\n", 186 | " def state(self, loc):\n", 187 | " raise NotImplementedError" 188 | ] 189 | }, 190 | { 191 | "cell_type": "markdown", 192 | "metadata": { 193 | "deletable": false, 194 | "editable": false, 195 | "nbgrader": { 196 | "checksum": "4583069bbb85ef2faa1824bf57693198", 197 | "grade": false, 198 | "grade_id": "cell-ace4da9bae087ba3", 199 | "locked": true, 200 | "schema_version": 1, 201 | "solution": false 202 | } 203 | }, 204 | "source": [ 205 | "## env_init()\n", 206 | "\n", 207 | "The first function we add to the environment is the initialization function which is called once when an environment object is created. In this function, the grid dimensions and special locations (start and goal locations and the cliff locations) are stored for easy use later." 
208 | ] 209 | }, 210 | { 211 | "cell_type": "code", 212 | "execution_count": null, 213 | "metadata": { 214 | "deletable": false, 215 | "editable": false, 216 | "nbgrader": { 217 | "checksum": "386dd420b57b9661e21454eb72f9783e", 218 | "grade": false, 219 | "grade_id": "cell-fa1aefc241323c3d", 220 | "locked": true, 221 | "schema_version": 1, 222 | "solution": false 223 | } 224 | }, 225 | "outputs": [], 226 | "source": [ 227 | "%%add_to CliffWalkEnvironment\n", 228 | "\n", 229 | "# Do not modify this cell!\n", 230 | "\n", 231 | "# Work Required: No.\n", 232 | "def env_init(self, env_info={}):\n", 233 | " \"\"\"Setup for the environment called when the experiment first starts.\n", 234 | " Note:\n", 235 | " Initialize a tuple with the reward, first state, boolean\n", 236 | " indicating if it's terminal.\n", 237 | " \"\"\"\n", 238 | " \n", 239 | " # Note, we can set up the following variables later, in env_start() as it is equivalent. \n", 240 | " # Code is left here to adhere to the note above, but these variables are initialized once more\n", 241 | " # in env_start() [See the env_start() function below.]\n", 242 | " \n", 243 | " reward = None\n", 244 | " state = None # See Aside\n", 245 | " termination = None\n", 246 | " self.reward_state_term = (reward, state, termination)\n", 247 | " \n", 248 | " # AN ASIDE: Observation is a general term used in the RL-Glue files that can be interchangeably \n", 249 | " # used with the term \"state\" for our purposes and for this assignment in particular. \n", 250 | " # A difference arises in the use of the terms when we have what is called Partial Observability where \n", 251 | " # the environment may return states that may not fully represent all the information needed to \n", 252 | " # predict values or make decisions (i.e., the environment is non-Markovian.)\n", 253 | " \n", 254 | " # Set the default height to 4 and width to 12 (as in the diagram given above)\n", 255 | " self.grid_h = env_info.get(\"grid_height\", 4) \n", 256 | " self.grid_w = env_info.get(\"grid_width\", 12)\n", 257 | " \n", 258 | " # Now, we can define a frame of reference. Let positive x be towards the direction down and \n", 259 | " # positive y be towards the direction right (following the row-major NumPy convention.)\n", 260 | " # Then, keeping with the usual convention that arrays are 0-indexed, max x is then grid_h - 1 \n", 261 | " # and max y is then grid_w - 1. So, we have:\n", 262 | " # Starting location of agent is the bottom-left corner, (max x, min y). \n", 263 | " self.start_loc = (self.grid_h - 1, 0)\n", 264 | " # Goal location is the bottom-right corner. (max x, max y).\n", 265 | " self.goal_loc = (self.grid_h - 1, self.grid_w - 1)\n", 266 | " \n", 267 | " # The cliff will contain all the cells between the start_loc and goal_loc.\n", 268 | " self.cliff = [(self.grid_h - 1, i) for i in range(1, (self.grid_w - 1))]\n", 269 | " \n", 270 | " # Take a look at the annotated environment diagram given in the above Jupyter Notebook cell to \n", 271 | " # verify that your understanding of the above code is correct for the default case, i.e., where \n", 272 | " # height = 4 and width = 12."
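As a quick, standalone sanity check (not part of the graded cells), the bookkeeping above can be reproduced for the default 4 x 12 grid. The snippet below is purely illustrative; it also prints the row-major single-index layout that the state() helper implemented next relies on.

```python
import numpy as np

grid_h, grid_w = 4, 12

# Row-major layout of single-index states, matching the annotated diagram.
index_grid = np.arange(grid_h * grid_w).reshape(grid_h, grid_w)

start_loc = (grid_h - 1, 0)                              # bottom-left corner
goal_loc = (grid_h - 1, grid_w - 1)                      # bottom-right corner
cliff = [(grid_h - 1, i) for i in range(1, grid_w - 1)]  # cells between start and goal

print(index_grid)
print("start", start_loc, "-> state", int(index_grid[start_loc]))  # (3, 0)  -> 36
print("goal ", goal_loc, "-> state", int(index_grid[goal_loc]))    # (3, 11) -> 47
print("cliff states:", [int(index_grid[c]) for c in cliff])        # 37, ..., 46
```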
273 | ] 274 | }, 275 | { 276 | "cell_type": "markdown", 277 | "metadata": { 278 | "deletable": false, 279 | "editable": false, 280 | "nbgrader": { 281 | "checksum": "367f3f8ad6acdd57c117ec393126b835", 282 | "grade": false, 283 | "grade_id": "cell-3acbe101e0eef006", 284 | "locked": true, 285 | "schema_version": 1, 286 | "solution": false 287 | } 288 | }, 289 | "source": [ 290 | "## *Implement* state()\n", 291 | " \n", 292 | "The agent location can be described as a two-tuple or coordinate (x, y) describing the agent’s position. \n", 293 | "However, we can convert the (x, y) tuple into a single index and provide agents with just this integer.\n", 294 | "One reason for this choice is that the spatial aspect of the problem is secondary and there is no need \n", 295 | "for the agent to know about the exact dimensions of the environment. \n", 296 | "From the agent’s viewpoint, it is just perceiving some states, accessing their corresponding values \n", 297 | "in a table, and updating them. Both the coordinate (x, y) state representation and the converted coordinate representation are thus equivalent in this sense.\n", 298 | "\n", 299 | "Given a grid cell location, the state() function should return the state; a single index corresponding to the location in the grid.\n", 300 | "\n", 301 | "\n", 302 | "```\n", 303 | "Example: Suppose grid_h is 2 and grid_w is 2. Then, we can write the grid cell two-tuple or coordinate\n", 304 | "states as follows (following the usual 0-index convention):\n", 305 | "|(0, 0) (0, 1)| |0 1|\n", 306 | "|(1, 0) (1, 1)| |2 3|\n", 307 | "Assuming row-major order as NumPy does, we can flatten the latter to get a vector [0 1 2 3].\n", 308 | "So, if loc = (0, 0) we return 0. While, for loc = (1, 1) we return 3.\n", 309 | "```" 310 | ] 311 | }, 312 | { 313 | "cell_type": "code", 314 | "execution_count": null, 315 | "metadata": { 316 | "deletable": false, 317 | "nbgrader": { 318 | "checksum": "bd4aefb2add249390e2b41a037d99598", 319 | "grade": false, 320 | "grade_id": "cell-f11f3394d921a585", 321 | "locked": false, 322 | "schema_version": 1, 323 | "solution": true 324 | } 325 | }, 326 | "outputs": [], 327 | "source": [ 328 | "%%add_to CliffWalkEnvironment\n", 329 | "#GRADED FUNCTION: [state]\n", 330 | "\n", 331 | "# Work Required: Yes. Modify the return statement of this function to return a correct single index as \n", 332 | "# the state (see the logic for this in the previous cell.)\n", 333 | "# Lines: 1\n", 334 | "def state(self, loc):\n", 335 | " ### START CODE HERE ###\n", 336 | " return None\n", 337 | " ### END CODE HERE ###" 338 | ] 339 | }, 340 | { 341 | "cell_type": "code", 342 | "execution_count": null, 343 | "metadata": { 344 | "deletable": false, 345 | "editable": false, 346 | "nbgrader": { 347 | "checksum": "5c1f5e70a8745afcdffb67c18be40958", 348 | "grade": true, 349 | "grade_id": "state", 350 | "locked": true, 351 | "points": 5, 352 | "schema_version": 1, 353 | "solution": false 354 | } 355 | }, 356 | "outputs": [], 357 | "source": [ 358 | "### AUTOGRADER TESTS FOR STATE (5 POINTS)\n", 359 | "# NOTE: The test below corresponds to the annotated diagram for the environment\n", 360 | "# given previously and is limited in scope. 
Hidden tests are used in the autograder.\n", 361 | "# You may wish to run other tests to check your implementation.\n", 362 | "def test_state():\n", 363 | " env = CliffWalkEnvironment()\n", 364 | " env.env_init({\"grid_height\": 4, \"grid_width\": 12})\n", 365 | " coords_to_test = [(0, 0), (0, 11), (1, 5), (3, 0), (3, 9), (3, 11)]\n", 366 | " true_states = [0, 11, 17, 36, 45, 47]\n", 367 | " output_states = [env.state(coords) for coords in coords_to_test]\n", 368 | " assert(output_states == true_states)\n", 369 | "test_state()" 370 | ] 371 | }, 372 | { 373 | "cell_type": "markdown", 374 | "metadata": { 375 | "deletable": false, 376 | "editable": false, 377 | "nbgrader": { 378 | "checksum": "b388d1d1cccc77907f4137339943df39", 379 | "grade": false, 380 | "grade_id": "cell-dd93e8a1b24bc4cf", 381 | "locked": true, 382 | "schema_version": 1, 383 | "solution": false 384 | } 385 | }, 386 | "source": [ 387 | "## env_start()\n", 388 | "\n", 389 | "In env_start(), we initialize the agent location to be the start location and return the state corresponding to it as the first state for the agent to act upon. Additionally, we also set the reward and termination terms to be 0 and False respectively as they are consistent with the notion that there is no reward nor termination before the first action is even taken." 390 | ] 391 | }, 392 | { 393 | "cell_type": "code", 394 | "execution_count": null, 395 | "metadata": { 396 | "deletable": false, 397 | "editable": false, 398 | "nbgrader": { 399 | "checksum": "58fed26db20dabd6fa4570b1f399a148", 400 | "grade": false, 401 | "grade_id": "cell-c71fa14686edfa0b", 402 | "locked": true, 403 | "schema_version": 1, 404 | "solution": false 405 | } 406 | }, 407 | "outputs": [], 408 | "source": [ 409 | "%%add_to CliffWalkEnvironment\n", 410 | "\n", 411 | "# Do not modify this cell!\n", 412 | "\n", 413 | "# Work Required: No.\n", 414 | "def env_start(self):\n", 415 | " \"\"\"The first method called when the episode starts, called before the\n", 416 | " agent starts.\n", 417 | "\n", 418 | " Returns:\n", 419 | " The first state from the environment.\n", 420 | " \"\"\"\n", 421 | " reward = 0\n", 422 | " # agent_loc will hold the current location of the agent\n", 423 | " self.agent_loc = self.start_loc\n", 424 | " # state is the one dimensional state representation of the agent location.\n", 425 | " state = self.state(self.agent_loc)\n", 426 | " termination = False\n", 427 | " self.reward_state_term = (reward, state, termination)\n", 428 | "\n", 429 | " return self.reward_state_term[1]" 430 | ] 431 | }, 432 | { 433 | "cell_type": "markdown", 434 | "metadata": { 435 | "deletable": false, 436 | "editable": false, 437 | "nbgrader": { 438 | "checksum": "1c21e0e861d5f3220fff345e3d5e07b0", 439 | "grade": false, 440 | "grade_id": "cell-bf5f7e78e0019780", 441 | "locked": true, 442 | "schema_version": 1, 443 | "solution": false 444 | } 445 | }, 446 | "source": [ 447 | "## *Implement* env_step()\n", 448 | "\n", 449 | "Once an action is taken by the agent, the environment must provide a new state, reward and termination signal. \n", 450 | "\n", 451 | "In the Cliff Walking environment, agents move around using a 4-cell neighborhood called the Von Neumann neighborhood (https://en.wikipedia.org/wiki/Von_Neumann_neighborhood). Thus, the agent has 4 available actions at each state. 
Three of the actions have been implemented for you and your first task is to implement the logic for the fourth action (Action UP).\n", 452 | "\n", 453 | "Your second task for this function is to implement the reward logic. Look over the environment description given earlier in this notebook if you need a refresher for how the reward signal is defined." 454 | ] 455 | }, 456 | { 457 | "cell_type": "code", 458 | "execution_count": null, 459 | "metadata": { 460 | "deletable": false, 461 | "nbgrader": { 462 | "checksum": "e35e399a59401a069393c4fc781bad8c", 463 | "grade": false, 464 | "grade_id": "cell-45dfc79809d59695", 465 | "locked": false, 466 | "schema_version": 1, 467 | "solution": true 468 | } 469 | }, 470 | "outputs": [], 471 | "source": [ 472 | "%%add_to CliffWalkEnvironment\n", 473 | "#GRADED FUNCTION: [env_step]\n", 474 | "\n", 475 | "# Work Required: Yes. Fill in the code for action UP and implement the logic for reward and termination.\n", 476 | "# Lines: ~7.\n", 477 | "def env_step(self, action):\n", 478 | " \"\"\"A step taken by the environment.\n", 479 | "\n", 480 | " Args:\n", 481 | " action: The action taken by the agent\n", 482 | "\n", 483 | " Returns:\n", 484 | " (float, state, Boolean): a tuple of the reward, state,\n", 485 | " and boolean indicating if it's terminal.\n", 486 | " \"\"\"\n", 487 | "\n", 488 | " if action == 0: # UP (Task 1)\n", 489 | " ### START CODE HERE ###\n", 490 | " # Hint: Look at the code given for the other actions and think about the logic in them.\n", 491 | " pass \n", 492 | " ### END CODE HERE ###\n", 493 | " elif action == 1: # LEFT\n", 494 | " possible_next_loc = (self.agent_loc[0], self.agent_loc[1] - 1)\n", 495 | " if possible_next_loc[1] >= 0: # Within Bounds?\n", 496 | " self.agent_loc = possible_next_loc\n", 497 | " else:\n", 498 | " pass # Stay.\n", 499 | " elif action == 2: # DOWN\n", 500 | " possible_next_loc = (self.agent_loc[0] + 1, self.agent_loc[1])\n", 501 | " if possible_next_loc[0] < self.grid_h: # Within Bounds?\n", 502 | " self.agent_loc = possible_next_loc\n", 503 | " else:\n", 504 | " pass # Stay.\n", 505 | " elif action == 3: # RIGHT\n", 506 | " possible_next_loc = (self.agent_loc[0], self.agent_loc[1] + 1)\n", 507 | " if possible_next_loc[1] < self.grid_w: # Within Bounds?\n", 508 | " self.agent_loc = possible_next_loc\n", 509 | " else:\n", 510 | " pass # Stay.\n", 511 | " else: \n", 512 | " raise Exception(str(action) + \" not in recognized actions [0: Up, 1: Left, 2: Down, 3: Right]!\")\n", 513 | "\n", 514 | " reward = -1\n", 515 | " terminal = False\n", 516 | "\n", 517 | " ### START CODE HERE ###\n", 518 | " # Hint: Consider the initialization of reward and terminal variables above. 
Then, note the \n", 519 | " # conditional statements and comments given below and carefully ensure to set the variables reward \n", 520 | " # and terminal correctly for each case.\n", 521 | " if self.agent_loc == self.goal_loc: # Reached Goal!\n", 522 | " pass\n", 523 | " elif self.agent_loc in self.cliff: # Fell into the cliff!\n", 524 | " pass\n", 525 | " else: \n", 526 | " pass\n", 527 | " ### END CODE HERE ###\n", 528 | "\n", 529 | " self.reward_state_term = (reward, self.state(self.agent_loc), terminal)\n", 530 | " return self.reward_state_term" 531 | ] 532 | }, 533 | { 534 | "cell_type": "code", 535 | "execution_count": null, 536 | "metadata": { 537 | "deletable": false, 538 | "editable": false, 539 | "nbgrader": { 540 | "checksum": "265bd639fb4f3a322d51ff96597f5648", 541 | "grade": true, 542 | "grade_id": "action_up", 543 | "locked": true, 544 | "points": 5, 545 | "schema_version": 1, 546 | "solution": false 547 | } 548 | }, 549 | "outputs": [], 550 | "source": [ 551 | "### AUTOGRADER TESTS FOR ACTION UP (5 POINTS)\n", 552 | "# NOTE: The test below is again limited in scope. Hidden tests are used in the autograder.\n", 553 | "# You may wish to run other tests to check your implementation.\n", 554 | "def test_action_up():\n", 555 | " env = CliffWalkEnvironment()\n", 556 | " env.env_init({\"grid_height\": 4, \"grid_width\": 12})\n", 557 | " env.agent_loc = (0, 0)\n", 558 | " env.env_step(0)\n", 559 | " assert(env.agent_loc == (0, 0))\n", 560 | " \n", 561 | " env.agent_loc = (1, 0)\n", 562 | " env.env_step(0)\n", 563 | " assert(env.agent_loc == (0, 0))\n", 564 | "test_action_up()" 565 | ] 566 | }, 567 | { 568 | "cell_type": "code", 569 | "execution_count": null, 570 | "metadata": { 571 | "deletable": false, 572 | "editable": false, 573 | "nbgrader": { 574 | "checksum": "73b383e7e0a1c203a26a29b850684a52", 575 | "grade": true, 576 | "grade_id": "reward_termination", 577 | "locked": true, 578 | "points": 10, 579 | "schema_version": 1, 580 | "solution": false 581 | } 582 | }, 583 | "outputs": [], 584 | "source": [ 585 | "### AUTOGRADER TESTS FOR REWARD & TERMINATION (10 POINTS)\n", 586 | "# NOTE: The test below is limited in scope. 
Hidden tests are used in the autograder.\n", 587 | "# You may wish to run other tests to check your implementation.\n", 588 | "def test_reward():\n", 589 | " env = CliffWalkEnvironment()\n", 590 | " env.env_init({\"grid_height\": 4, \"grid_width\": 12})\n", 591 | " env.agent_loc = (0, 0)\n", 592 | " reward_state_term = env.env_step(0)\n", 593 | " assert(reward_state_term[0] == -1 and reward_state_term[1] == env.state((0, 0)) and\n", 594 | " reward_state_term[2] == False)\n", 595 | " \n", 596 | " env.agent_loc = (3, 1)\n", 597 | " reward_state_term = env.env_step(2)\n", 598 | " assert(reward_state_term[0] == -100 and reward_state_term[1] == env.state((3, 0)) and\n", 599 | " reward_state_term[2] == False)\n", 600 | " \n", 601 | " env.agent_loc = (2, 11)\n", 602 | " reward_state_term = env.env_step(2)\n", 603 | " assert(reward_state_term[0] == -1 and reward_state_term[1] == env.state((3, 11)) and\n", 604 | " reward_state_term[2] == True)\n", 605 | "test_reward()" 606 | ] 607 | }, 608 | { 609 | "cell_type": "markdown", 610 | "metadata": { 611 | "deletable": false, 612 | "editable": false, 613 | "nbgrader": { 614 | "checksum": "4a499bb0871653446d6b303c2625df09", 615 | "grade": false, 616 | "grade_id": "cell-88082a3dab94c4a8", 617 | "locked": true, 618 | "schema_version": 1, 619 | "solution": false 620 | } 621 | }, 622 | "source": [ 623 | "## env_cleanup()\n", 624 | "\n", 625 | "There is not much cleanup to do for the Cliff Walking environment. Here, we simply reset the agent location to be the start location in this function." 626 | ] 627 | }, 628 | { 629 | "cell_type": "code", 630 | "execution_count": null, 631 | "metadata": { 632 | "deletable": false, 633 | "editable": false, 634 | "nbgrader": { 635 | "checksum": "ef29fa83791e5f064a53af742f2859a2", 636 | "grade": false, 637 | "grade_id": "cell-4c235eb2667f9f0d", 638 | "locked": true, 639 | "schema_version": 1, 640 | "solution": false 641 | } 642 | }, 643 | "outputs": [], 644 | "source": [ 645 | "%%add_to CliffWalkEnvironment\n", 646 | "\n", 647 | "# Do not modify this cell!\n", 648 | "\n", 649 | "# Work Required: No.\n", 650 | "def env_cleanup(self):\n", 651 | " \"\"\"Cleanup done after the environment ends\"\"\"\n", 652 | " self.agent_loc = self.start_loc" 653 | ] 654 | }, 655 | { 656 | "cell_type": "markdown", 657 | "metadata": { 658 | "deletable": false, 659 | "editable": false, 660 | "nbgrader": { 661 | "checksum": "180b31b525fb94442eb79f07d7097700", 662 | "grade": false, 663 | "grade_id": "cell-ff4e65eb47d735c9", 664 | "locked": true, 665 | "schema_version": 1, 666 | "solution": false 667 | } 668 | }, 669 | "source": [ 670 | "## Section 2. Agent\n", 671 | "\n", 672 | "In this second part of the assignment, you will be implementing the key updates for Temporal Difference Learning. There are two cases to consider depending on whether an action leads to a terminal state or not." 
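For reference, both cases are instances of the standard tabular TD(0) prediction update. Writing $\alpha$ for the step size and $\gamma$ for the discount, and using the convention that a terminal state has value zero, a sketch of the two forms is:

$$V(S_t) \leftarrow V(S_t) + \alpha \big[ R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \big] \qquad \text{if } S_{t+1} \text{ is not terminal}$$

$$V(S_t) \leftarrow V(S_t) + \alpha \big[ R_{t+1} - V(S_t) \big] \qquad \text{if } S_{t+1} \text{ is terminal}$$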
673 | ] 674 | }, 675 | { 676 | "cell_type": "code", 677 | "execution_count": null, 678 | "metadata": { 679 | "deletable": false, 680 | "editable": false, 681 | "nbgrader": { 682 | "checksum": "d4eaa5e48e54f9c5e55149325b215d47", 683 | "grade": false, 684 | "grade_id": "cell-3abcfd06cde56935", 685 | "locked": true, 686 | "schema_version": 1, 687 | "solution": false 688 | } 689 | }, 690 | "outputs": [], 691 | "source": [ 692 | "# Do not modify this cell!\n", 693 | "\n", 694 | "# Create empty TDAgent class.\n", 695 | "# These methods will be filled in later cells.\n", 696 | "\n", 697 | "class TDAgent(BaseAgent):\n", 698 | " def agent_init(self, agent_info={}):\n", 699 | " raise NotImplementedError\n", 700 | " \n", 701 | " def agent_start(self, state):\n", 702 | " raise NotImplementedError\n", 703 | "\n", 704 | " def agent_step(self, reward, state):\n", 705 | " raise NotImplementedError\n", 706 | "\n", 707 | " def agent_end(self, reward):\n", 708 | " raise NotImplementedError\n", 709 | "\n", 710 | " def agent_cleanup(self): \n", 711 | " raise NotImplementedError\n", 712 | " \n", 713 | " def agent_message(self, message):\n", 714 | " raise NotImplementedError" 715 | ] 716 | }, 717 | { 718 | "cell_type": "markdown", 719 | "metadata": { 720 | "deletable": false, 721 | "editable": false, 722 | "nbgrader": { 723 | "checksum": "83d69efb64983138188a2bad513622b1", 724 | "grade": false, 725 | "grade_id": "cell-edd826505e77e70a", 726 | "locked": true, 727 | "schema_version": 1, 728 | "solution": false 729 | } 730 | }, 731 | "source": [ 732 | "## agent_init()\n", 733 | "\n", 734 | "As we did with the environment, we first initialize the agent once when a TDAgent object is created. In this function, we create a random number generator, seeded with the seed provided in the agent_info dictionary to get reproducible results. We also set the policy, discount and step size based on the agent_info dictionary. Finally, with a convention that the policy is always specified as a mapping from states to actions and so is an array of size (# States, # Actions), we initialize a values array of shape (# States,) to zeros." 735 | ] 736 | }, 737 | { 738 | "cell_type": "code", 739 | "execution_count": null, 740 | "metadata": { 741 | "deletable": false, 742 | "editable": false, 743 | "nbgrader": { 744 | "checksum": "24d451072e0b43f3ad7980887c6f5861", 745 | "grade": false, 746 | "grade_id": "cell-077135deef2f8fd9", 747 | "locked": true, 748 | "schema_version": 1, 749 | "solution": false 750 | } 751 | }, 752 | "outputs": [], 753 | "source": [ 754 | "%%add_to TDAgent\n", 755 | "\n", 756 | "# Do not modify this cell!\n", 757 | "\n", 758 | "# Work Required: No.\n", 759 | "def agent_init(self, agent_info={}):\n", 760 | " \"\"\"Setup for the agent called when the experiment first starts.\"\"\"\n", 761 | "\n", 762 | " # Create a random number generator with the provided seed to seed the agent for reproducibility.\n", 763 | " self.rand_generator = np.random.RandomState(agent_info.get(\"seed\"))\n", 764 | "\n", 765 | " # Policy will be given, recall that the goal is to accurately estimate its corresponding value function. 
\n", 766 | " self.policy = agent_info.get(\"policy\")\n", 767 | " # Discount factor (gamma) to use in the updates.\n", 768 | " self.discount = agent_info.get(\"discount\")\n", 769 | " # The learning rate or step size parameter (alpha) to use in updates.\n", 770 | " self.step_size = agent_info.get(\"step_size\")\n", 771 | "\n", 772 | " # Initialize an array of zeros that will hold the values.\n", 773 | " # Recall that the policy can be represented as a (# States, # Actions) array. With the \n", 774 | " # assumption that this is the case, we can use the first dimension of the policy to\n", 775 | " # initialize the array for values.\n", 776 | " self.values = np.zeros((self.policy.shape[0],))" 777 | ] 778 | }, 779 | { 780 | "cell_type": "markdown", 781 | "metadata": { 782 | "deletable": false, 783 | "editable": false, 784 | "nbgrader": { 785 | "checksum": "a4f9d05505f09198739674b1b330b9cd", 786 | "grade": false, 787 | "grade_id": "cell-99e8e59cd1f7a5ef", 788 | "locked": true, 789 | "schema_version": 1, 790 | "solution": false 791 | } 792 | }, 793 | "source": [ 794 | "# agent_start()\n", 795 | "\n", 796 | "In agent_start(), we choose an action based on the initial state and policy we are evaluating. We also cache the state so that we can later update its value when we perform a Temporal Difference update. Finally, we return the action chosen so that the RL loop can continue and the environment can execute this action." 797 | ] 798 | }, 799 | { 800 | "cell_type": "code", 801 | "execution_count": null, 802 | "metadata": { 803 | "deletable": false, 804 | "editable": false, 805 | "nbgrader": { 806 | "checksum": "8c5490c3d5fa88339b02cdf933f7f106", 807 | "grade": false, 808 | "grade_id": "cell-1b6dd05f7f49c1fc", 809 | "locked": true, 810 | "schema_version": 1, 811 | "solution": false 812 | } 813 | }, 814 | "outputs": [], 815 | "source": [ 816 | "%%add_to TDAgent\n", 817 | "\n", 818 | "# Do not modify this cell!\n", 819 | "\n", 820 | "# Work Required: No.\n", 821 | "def agent_start(self, state):\n", 822 | " \"\"\"The first method called when the episode starts, called after\n", 823 | " the environment starts.\n", 824 | " Args:\n", 825 | " state (Numpy array): the state from the environment's env_start function.\n", 826 | " Returns:\n", 827 | " The first action the agent takes.\n", 828 | " \"\"\"\n", 829 | " # The policy can be represented as a (# States, # Actions) array. So, we can use \n", 830 | " # the second dimension here when choosing an action.\n", 831 | " action = self.rand_generator.choice(range(self.policy.shape[1]), p=self.policy[state])\n", 832 | " self.last_state = state\n", 833 | " return action" 834 | ] 835 | }, 836 | { 837 | "cell_type": "markdown", 838 | "metadata": { 839 | "deletable": false, 840 | "editable": false, 841 | "nbgrader": { 842 | "checksum": "fd9be2a7be0a8fbd3aee6b7c14f4ec37", 843 | "grade": false, 844 | "grade_id": "cell-a472a21b4a57b885", 845 | "locked": true, 846 | "schema_version": 1, 847 | "solution": false 848 | } 849 | }, 850 | "source": [ 851 | "## *Implement* agent_step()\n", 852 | "\n", 853 | "In agent_step(), the agent must:\n", 854 | "\n", 855 | "- Perform an update to improve the value estimate of the previously visited state, and\n", 856 | "- Act based on the state provided by the environment.\n", 857 | "\n", 858 | "The latter of the two steps above has been implemented for you. Implement the former. Note that, unlike later in agent_end(), the episode has not yet ended in agent_step(). in other words, the previously observed state was not a terminal state." 
859 | ] 860 | }, 861 | { 862 | "cell_type": "code", 863 | "execution_count": null, 864 | "metadata": { 865 | "deletable": false, 866 | "nbgrader": { 867 | "checksum": "4c68b1daf60c5fbf3c75d4329048af15", 868 | "grade": false, 869 | "grade_id": "cell-2bec3235783127e8", 870 | "locked": false, 871 | "schema_version": 1, 872 | "solution": true 873 | } 874 | }, 875 | "outputs": [], 876 | "source": [ 877 | "%%add_to TDAgent\n", 878 | "\n", 879 | "#[GRADED] FUNCTION: [agent_step]\n", 880 | "\n", 881 | "# Work Required: Yes. Fill in the TD-target and update.\n", 882 | "# Lines: ~2.\n", 883 | "def agent_step(self, reward, state):\n", 884 | " \"\"\"A step taken by the agent.\n", 885 | " Args:\n", 886 | " reward (float): the reward received for taking the last action taken\n", 887 | " state (Numpy array): the state from the\n", 888 | " environment's step after the last action, i.e., where the agent ended up after the\n", 889 | " last action\n", 890 | " Returns:\n", 891 | " The action the agent is taking.\n", 892 | " \"\"\"\n", 893 | " ### START CODE HERE ###\n", 894 | " # Hint: We should perform an update with the last state given that we now have the reward and\n", 895 | " # next state. We break this into two steps. Recall for example that the Monte-Carlo update \n", 896 | " # had the form: V[S_t] = V[S_t] + alpha * (target - V[S_t]), where the target was the return, G_t.\n", 897 | " target = None\n", 898 | " self.values[self.last_state] = None\n", 899 | " ### END CODE HERE ###\n", 900 | "\n", 901 | " # Having updated the value for the last state, we now act based on the current \n", 902 | " # state, and set the last state to be current one as we will next be making an \n", 903 | " # update with it when agent_step is called next once the action we return from this function \n", 904 | " # is executed in the environment.\n", 905 | "\n", 906 | " action = self.rand_generator.choice(range(self.policy.shape[1]), p=self.policy[state])\n", 907 | " self.last_state = state\n", 908 | "\n", 909 | " return action" 910 | ] 911 | }, 912 | { 913 | "cell_type": "markdown", 914 | "metadata": { 915 | "deletable": false, 916 | "editable": false, 917 | "nbgrader": { 918 | "checksum": "8f516e1bdb6ca3f1a1ce385cf2c56765", 919 | "grade": false, 920 | "grade_id": "cell-f5ed31e224e22cda", 921 | "locked": true, 922 | "schema_version": 1, 923 | "solution": false 924 | } 925 | }, 926 | "source": [ 927 | "## *Implement* agent_end() \n", 928 | "\n", 929 | "Implement the TD update for the case where an action leads to a terminal state." 930 | ] 931 | }, 932 | { 933 | "cell_type": "code", 934 | "execution_count": null, 935 | "metadata": { 936 | "deletable": false, 937 | "nbgrader": { 938 | "checksum": "65b123f954dd8176f089d389bc2e9735", 939 | "grade": false, 940 | "grade_id": "cell-08c5ac56c1a0a841", 941 | "locked": false, 942 | "schema_version": 1, 943 | "solution": true 944 | } 945 | }, 946 | "outputs": [], 947 | "source": [ 948 | "%%add_to TDAgent\n", 949 | "\n", 950 | "#[GRADED] FUNCTION: [agent_end]\n", 951 | "\n", 952 | "# Work Required: Yes. Fill in the TD-target and update.\n", 953 | "# Lines: ~2.\n", 954 | "def agent_end(self, reward):\n", 955 | " \"\"\"Run when the agent terminates.\n", 956 | " Args:\n", 957 | " reward (float): the reward the agent received for entering the terminal state.\n", 958 | " \"\"\"\n", 959 | " ### START CODE HERE ###\n", 960 | " # Hint: Here too, we should perform an update with the last state given that we now have the \n", 961 | " # reward. Note that in this case, the action led to termination. 
Once more, we break this into \n", 962 | " # two steps, computing the target and the update itself that uses the target and the \n", 963 | " # current value estimate for the state whose value we are updating.\n", 964 | " target = None\n", 965 | " self.values[self.last_state] = None\n", 966 | " ### END CODE HERE ###" 967 | ] 968 | }, 969 | { 970 | "cell_type": "markdown", 971 | "metadata": { 972 | "deletable": false, 973 | "editable": false, 974 | "nbgrader": { 975 | "checksum": "8e12df5724ecfa704bc67e2577c2ea34", 976 | "grade": false, 977 | "grade_id": "cell-06cbc6053aa156bc", 978 | "locked": true, 979 | "schema_version": 1, 980 | "solution": false 981 | } 982 | }, 983 | "source": [ 984 | "## agent_cleanup()\n", 985 | "\n", 986 | "In cleanup, we simply reset the last state to be None to ensure that we are not storing any states past an episode." 987 | ] 988 | }, 989 | { 990 | "cell_type": "code", 991 | "execution_count": null, 992 | "metadata": { 993 | "deletable": false, 994 | "editable": false, 995 | "nbgrader": { 996 | "checksum": "540f808d320ce14b29ee7e8ddcda6ae7", 997 | "grade": false, 998 | "grade_id": "cell-c39f860970dfe3a7", 999 | "locked": true, 1000 | "schema_version": 1, 1001 | "solution": false 1002 | } 1003 | }, 1004 | "outputs": [], 1005 | "source": [ 1006 | "%%add_to TDAgent\n", 1007 | "\n", 1008 | "# Do not modify this cell!\n", 1009 | "\n", 1010 | "# Work Required: No.\n", 1011 | "def agent_cleanup(self):\n", 1012 | " \"\"\"Cleanup done after the agent ends.\"\"\"\n", 1013 | " self.last_state = None" 1014 | ] 1015 | }, 1016 | { 1017 | "cell_type": "markdown", 1018 | "metadata": { 1019 | "deletable": false, 1020 | "editable": false, 1021 | "nbgrader": { 1022 | "checksum": "f62a4cb327c230937f3bc459f305e6f6", 1023 | "grade": false, 1024 | "grade_id": "cell-aadbaa36598224e6", 1025 | "locked": true, 1026 | "schema_version": 1, 1027 | "solution": false 1028 | } 1029 | }, 1030 | "source": [ 1031 | "## agent_message()\n", 1032 | "\n", 1033 | "agent_message() can generally be used to get different kinds of information about an RLGlue agent in the interaction loop of RLGlue. Here, we conditionally check for a message matching \"get_values\" and use it to retrieve the values table the agent has been updating over time."
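As a small illustration (assuming the TDAgent pieces defined above in this notebook), the learned values can be queried like this:

```python
# Illustrative usage only; the policy here is the uniform-random policy over a 4 x 12 grid.
agent = TDAgent()
agent.agent_init({"policy": np.ones((48, 4)) * 0.25, "discount": 1.0, "step_size": 0.01, "seed": 0})
print(agent.agent_message("get_values").shape)  # (48,), one value estimate per state
```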
1034 | ] 1035 | }, 1036 | { 1037 | "cell_type": "code", 1038 | "execution_count": null, 1039 | "metadata": { 1040 | "deletable": false, 1041 | "editable": false, 1042 | "nbgrader": { 1043 | "checksum": "9e222e9cc5396065dafe6e39a2052097", 1044 | "grade": false, 1045 | "grade_id": "cell-a440254b4f0573e2", 1046 | "locked": true, 1047 | "schema_version": 1, 1048 | "solution": false 1049 | } 1050 | }, 1051 | "outputs": [], 1052 | "source": [ 1053 | "%%add_to TDAgent\n", 1054 | "\n", 1055 | "# Do not modify this cell!\n", 1056 | "\n", 1057 | "# Work Required: No.\n", 1058 | "def agent_message(self, message):\n", 1059 | " \"\"\"A function used to pass information from the agent to the experiment.\n", 1060 | " Args:\n", 1061 | " message: The message passed to the agent.\n", 1062 | " Returns:\n", 1063 | " The response (or answer) to the message.\n", 1064 | " \"\"\"\n", 1065 | " if message == \"get_values\":\n", 1066 | " return self.values\n", 1067 | " else:\n", 1068 | " raise Exception(\"TDAgent.agent_message(): Message not understood!\")" 1069 | ] 1070 | }, 1071 | { 1072 | "cell_type": "code", 1073 | "execution_count": null, 1074 | "metadata": { 1075 | "deletable": false, 1076 | "editable": false, 1077 | "nbgrader": { 1078 | "checksum": "6830d90d89beea893034ebe0d7b2373c", 1079 | "grade": true, 1080 | "grade_id": "td_updates", 1081 | "locked": true, 1082 | "points": 20, 1083 | "schema_version": 1, 1084 | "solution": false 1085 | } 1086 | }, 1087 | "outputs": [], 1088 | "source": [ 1089 | "### AUTOGRADER TESTS FOR TD-UPDATES (20 POINTS)\n", 1090 | "# NOTE: The tests below serve as a good check in debugging your code for the TD updates. However, \n", 1091 | "# as with the other tests, they are limited in scope. Hidden tests are used in the autograder.\n", 1092 | "# You may wish to run other tests to check your implementation.\n", 1093 | "def test_td_updates():\n", 1094 | " # The following test checks that the TD update works for a case where the transition \n", 1095 | " # garners reward -1 and does not lead to a terminal state. This is in a simple two state setting \n", 1096 | " # where there is only one action. The first state's current value estimate is 0 while the second is 1.\n", 1097 | " # Note the discount and step size if you are debugging this test.\n", 1098 | " agent = TDAgent()\n", 1099 | " policy_list = np.array([[1.], [1.]])\n", 1100 | " agent.agent_init({\"policy\": np.array(policy_list), \"discount\": 0.99, \"step_size\": 0.1})\n", 1101 | " agent.values = np.array([0., 1.])\n", 1102 | " agent.agent_start(0)\n", 1103 | " reward = -1\n", 1104 | " next_state = 1\n", 1105 | " agent.agent_step(reward, next_state)\n", 1106 | " assert(np.isclose(agent.values[0], -0.001) and np.isclose(agent.values[1], 1.))\n", 1107 | " \n", 1108 | " # The following test checks that the TD update works for a case where the transition \n", 1109 | " # garners reward -100 and leads to a terminal state. This is in a simple one state setting \n", 1110 | " # where there is only one action. 
The state's current value estimate is 0.\n", 1111 | " # Note the discount and step size if you are debugging this test.\n", 1112 | " agent = TDAgent()\n", 1113 | " policy_list = np.array([[1.]])\n", 1114 | " agent.agent_init({\"policy\": np.array(policy_list), \"discount\": 0.99, \"step_size\": 0.1})\n", 1115 | " agent.values = np.array([0.])\n", 1116 | " agent.agent_start(0)\n", 1117 | " reward = -100\n", 1118 | " next_state = 0\n", 1119 | " agent.agent_end(reward)\n", 1120 | " assert(np.isclose(agent.values[0], -10))\n", 1121 | " \n", 1122 | "test_td_updates()" 1123 | ] 1124 | }, 1125 | { 1126 | "cell_type": "markdown", 1127 | "metadata": { 1128 | "deletable": false, 1129 | "editable": false, 1130 | "nbgrader": { 1131 | "checksum": "ba2a884124c4e5c073f529f94a2bd580", 1132 | "grade": false, 1133 | "grade_id": "cell-3f4e7f85a3f548e9", 1134 | "locked": true, 1135 | "schema_version": 1, 1136 | "solution": false 1137 | } 1138 | }, 1139 | "source": [ 1140 | "## Section 3. Policy Evaluation Experiments\n", 1141 | "\n", 1142 | "Finally, in this last part of the assignment, you will get to see the TD policy evaluation algorithm in action by looking at the estimated values, the per state value error and after the experiment is complete, the Mean Squared Value Error curve vs. episode number, summarizing how the value error changed over time.\n", 1143 | "\n", 1144 | "The code below runs one run of an experiment given env_info and agent_info dictionaries. A \"manager\" object is created for visualizations and is used in part for the autograder. By default, the run will be for 5000 episodes. The true_values_file is specified to compare the learned value function with the values stored in the true_values_file. Plotting of the learned value function occurs by default after every 100 episodes. In addition, when true_values_file is specified, the value error per state and the root mean square value error will also be plotted." 1145 | ] 1146 | }, 1147 | { 1148 | "cell_type": "code", 1149 | "execution_count": null, 1150 | "metadata": { 1151 | "deletable": false, 1152 | "editable": false, 1153 | "nbgrader": { 1154 | "checksum": "d929e2760dd07cbdd0bea18f05639d81", 1155 | "grade": false, 1156 | "grade_id": "cell-7b2ee7906789af2d", 1157 | "locked": true, 1158 | "schema_version": 1, 1159 | "solution": false 1160 | } 1161 | }, 1162 | "outputs": [], 1163 | "source": [ 1164 | "%matplotlib notebook\n", 1165 | "\n", 1166 | "# Work Required: No. 
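Concretely, the root mean squared value error reported by the manager can be thought of along the lines of the sketch below (assuming uniform weighting across states; the Manager class's exact weighting and plotting details may differ):

```python
import numpy as np

def rmsve(estimated_values, true_values):
    """Root mean squared value error, uniformly weighted over states."""
    diff = np.asarray(estimated_values) - np.asarray(true_values)
    return np.sqrt(np.mean(diff ** 2))

# e.g. rmsve(values, np.load("optimal_policy_value_fn.npy")) after an experiment has run
```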
\n", 1167 | "def run_experiment(env_info, agent_info, \n", 1168 | " num_episodes=5000,\n", 1169 | " experiment_name=None,\n", 1170 | " plot_freq=100,\n", 1171 | " true_values_file=None,\n", 1172 | " value_error_threshold=1e-8):\n", 1173 | " env = CliffWalkEnvironment\n", 1174 | " agent = TDAgent\n", 1175 | " rl_glue = RLGlue(env, agent)\n", 1176 | "\n", 1177 | " rl_glue.rl_init(agent_info, env_info)\n", 1178 | "\n", 1179 | " manager = Manager(env_info, agent_info, true_values_file=true_values_file, experiment_name=experiment_name)\n", 1180 | " for episode in range(1, num_episodes + 1):\n", 1181 | " rl_glue.rl_episode(0) # no step limit\n", 1182 | " if episode % plot_freq == 0:\n", 1183 | " values = rl_glue.agent.agent_message(\"get_values\")\n", 1184 | " manager.visualize(values, episode)\n", 1185 | "\n", 1186 | " values = rl_glue.agent.agent_message(\"get_values\")\n", 1187 | " if true_values_file is not None:\n", 1188 | " # Grading: The Manager will check that the values computed using your TD agent match \n", 1189 | " # the true values (within some small allowance) across the states. In addition, it also\n", 1190 | " # checks whether the root mean squared value error is close to 0.\n", 1191 | " manager.run_tests(values, value_error_threshold)\n", 1192 | " \n", 1193 | " return values" 1194 | ] 1195 | }, 1196 | { 1197 | "cell_type": "markdown", 1198 | "metadata": { 1199 | "deletable": false, 1200 | "editable": false, 1201 | "nbgrader": { 1202 | "checksum": "4fad676a6e39768fdfe6897b2fb9032a", 1203 | "grade": false, 1204 | "grade_id": "cell-17e3e97164635860", 1205 | "locked": true, 1206 | "schema_version": 1, 1207 | "solution": false 1208 | } 1209 | }, 1210 | "source": [ 1211 | "The cell below just runs a policy evaluation experiment with the determinstic optimal policy that strides just above the cliff. You should observe that the per state value error and RMSVE curve asymptotically go towards 0. The arrows in the four directions denote the probabilities of taking each action. This experiment is ungraded but should serve as a good test for the later experiments. The true values file provided for this experiment may help with debugging as well." 
1212 | ] 1213 | }, 1214 | { 1215 | "cell_type": "code", 1216 | "execution_count": null, 1217 | "metadata": { 1218 | "deletable": false, 1219 | "editable": false, 1220 | "nbgrader": { 1221 | "checksum": "5f6345cb14ab92ac88ba1653695a1cec", 1222 | "grade": false, 1223 | "grade_id": "cell-70b3816dea097209", 1224 | "locked": true, 1225 | "schema_version": 1, 1226 | "solution": false 1227 | } 1228 | }, 1229 | "outputs": [], 1230 | "source": [ 1231 | "# Do not modify this cell!\n", 1232 | "\n", 1233 | "env_info = {\"grid_height\": 4, \"grid_width\": 12, \"seed\": 0}\n", 1234 | "agent_info = {\"discount\": 1, \"step_size\": 0.01, \"seed\": 0}\n", 1235 | "\n", 1236 | "# The Optimal Policy that strides just along the cliff\n", 1237 | "policy = np.ones(shape=(env_info['grid_width'] * env_info['grid_height'], 4)) * 0.25\n", 1238 | "policy[36] = [1, 0, 0, 0]\n", 1239 | "for i in range(24, 35):\n", 1240 | " policy[i] = [0, 0, 0, 1]\n", 1241 | "policy[35] = [0, 0, 1, 0]\n", 1242 | "\n", 1243 | "agent_info.update({\"policy\": policy})\n", 1244 | "\n", 1245 | "true_values_file = \"optimal_policy_value_fn.npy\"\n", 1246 | "_ = run_experiment(env_info, agent_info, num_episodes=5000, experiment_name=\"Policy Evaluation on Optimal Policy\",\n", 1247 | " plot_freq=500, true_values_file=true_values_file)" 1248 | ] 1249 | }, 1250 | { 1251 | "cell_type": "code", 1252 | "execution_count": null, 1253 | "metadata": { 1254 | "deletable": false, 1255 | "nbgrader": { 1256 | "checksum": "9b8fd1dd026cb36082aa0d6245763627", 1257 | "grade": false, 1258 | "grade_id": "cell-11c76882b7939d66", 1259 | "locked": false, 1260 | "schema_version": 1, 1261 | "solution": true 1262 | } 1263 | }, 1264 | "outputs": [], 1265 | "source": [ 1266 | "# The Safe Policy\n", 1267 | "# Hint: Fill in the array below (as done in the previous cell) based on the safe policy illustration \n", 1268 | "# in the environment diagram. This is the policy that strides as far as possible away from the cliff. \n", 1269 | "# We call it a \"safe\" policy because if the environment has any stochasticity, this policy would do a good job in \n", 1270 | "# keeping the agent from falling into the cliff (in contrast to the optimal policy shown before). 
\n", 1271 | "# BOILERPLATE:\n", 1272 | "policy = np.ones(shape=(env_info['grid_width'] * env_info['grid_height'], 4)) * 0.25\n", 1273 | "### START CODE HERE ###\n", 1274 | "pass\n", 1275 | "### END CODE HERE ###" 1276 | ] 1277 | }, 1278 | { 1279 | "cell_type": "code", 1280 | "execution_count": null, 1281 | "metadata": { 1282 | "deletable": false, 1283 | "editable": false, 1284 | "nbgrader": { 1285 | "checksum": "b67dc6aaaa345faed7768ab6c07a926c", 1286 | "grade": true, 1287 | "grade_id": "safe_policy", 1288 | "locked": true, 1289 | "points": 20, 1290 | "schema_version": 1, 1291 | "solution": false 1292 | } 1293 | }, 1294 | "outputs": [], 1295 | "source": [ 1296 | "### AUTO-GRADER TESTS FOR POLICY EVALUATION WITH SAFE POLICY\n", 1297 | "agent_info.update({\"policy\": policy})\n", 1298 | "v = run_experiment(env_info, agent_info,\n", 1299 | " experiment_name=\"Policy Evaluation On Safe Policy\",\n", 1300 | " num_episodes=5000, plot_freq=500)" 1301 | ] 1302 | }, 1303 | { 1304 | "cell_type": "code", 1305 | "execution_count": null, 1306 | "metadata": { 1307 | "deletable": false, 1308 | "editable": false, 1309 | "nbgrader": { 1310 | "checksum": "64c386534d040888bfa2fe3ba58f1370", 1311 | "grade": false, 1312 | "grade_id": "cell-3a8720c5685c8716", 1313 | "locked": true, 1314 | "schema_version": 1, 1315 | "solution": false 1316 | } 1317 | }, 1318 | "outputs": [], 1319 | "source": [ 1320 | "# Do not modify this cell!\n", 1321 | "\n", 1322 | "# A Near Optimal Stochastic Policy\n", 1323 | "# Now, we try a stochastic policy that deviates a little from the optimal policy seen above. \n", 1324 | "# This means we can get different results due to randomness.\n", 1325 | "# We will thus average the value function estimates we get over multiple runs. \n", 1326 | "# This can take some time, upto about 5 minutes from previous testing. \n", 1327 | "# NOTE: The autograder will compare . 
Re-run this cell upon making any changes.\n", 1328 | "\n", 1329 | "env_info = {\"grid_height\": 4, \"grid_width\": 12}\n", 1330 | "agent_info = {\"discount\": 1, \"step_size\": 0.01}\n", 1331 | "\n", 1332 | "policy = np.ones(shape=(env_info['grid_width'] * env_info['grid_height'], 4)) * 0.25\n", 1333 | "policy[36] = [0.9, 0.1/3., 0.1/3., 0.1/3.]\n", 1334 | "for i in range(24, 35):\n", 1335 | " policy[i] = [0.1/3., 0.1/3., 0.1/3., 0.9]\n", 1336 | "policy[35] = [0.1/3., 0.1/3., 0.9, 0.1/3.]\n", 1337 | "agent_info.update({\"policy\": policy})\n", 1338 | "agent_info.update({\"step_size\": 0.01})" 1339 | ] 1340 | }, 1341 | { 1342 | "cell_type": "code", 1343 | "execution_count": null, 1344 | "metadata": { 1345 | "deletable": false, 1346 | "editable": false, 1347 | "nbgrader": { 1348 | "checksum": "c900a0e55cf506cdae6e3607890bcdbd", 1349 | "grade": true, 1350 | "grade_id": "near_optimal_stochastic_policy", 1351 | "locked": true, 1352 | "points": 40, 1353 | "schema_version": 1, 1354 | "solution": false 1355 | } 1356 | }, 1357 | "outputs": [], 1358 | "source": [ 1359 | "### AUTO-GRADER TESTS FOR POLICY EVALUATION WITH NEAR OPTIMAL STOCHASTIC POLICY (40 POINTS)\n", 1360 | "arr = []\n", 1361 | "from tqdm import tqdm\n", 1362 | "for i in tqdm(range(30)):\n", 1363 | " env_info['seed'] = i\n", 1364 | " agent_info['seed'] = i\n", 1365 | " v = run_experiment(env_info, agent_info,\n", 1366 | " experiment_name=\"Policy Evaluation On Optimal Policy\",\n", 1367 | " num_episodes=5000, plot_freq=10000)\n", 1368 | " arr.append(v)\n", 1369 | "average_v = np.array(arr).mean(axis=0)" 1370 | ] 1371 | }, 1372 | { 1373 | "cell_type": "markdown", 1374 | "metadata": { 1375 | "deletable": false, 1376 | "editable": false, 1377 | "nbgrader": { 1378 | "checksum": "74d80436e59bbf99333411fd5ac1675b", 1379 | "grade": false, 1380 | "grade_id": "cell-0276c863cb27066d", 1381 | "locked": true, 1382 | "schema_version": 1, 1383 | "solution": false 1384 | } 1385 | }, 1386 | "source": [ 1387 | "## Wrapping Up\n", 1388 | "Congratulations, you have completed assignment 2! In this assignment, we investigated a very useful concept for sample-based online learning: temporal difference. We particularly looked at the prediction problem where the goal is to find the value function corresponding to a given policy. In the next assignment, by learning the action-value function instead of the state-value function, you will get to see how temporal difference learning can be used in control as well." 1389 | ] 1390 | } 1391 | ], 1392 | "metadata": { 1393 | "coursera": { 1394 | "course_slug": "sample-based-learning-methods", 1395 | "graded_item_id": "P4k5f", 1396 | "launcher_item_id": "OwIbv" 1397 | }, 1398 | "kernelspec": { 1399 | "display_name": "Python 3", 1400 | "language": "python", 1401 | "name": "python3" 1402 | }, 1403 | "language_info": { 1404 | "codemirror_mode": { 1405 | "name": "ipython", 1406 | "version": 3 1407 | }, 1408 | "file_extension": ".py", 1409 | "mimetype": "text/x-python", 1410 | "name": "python", 1411 | "nbconvert_exporter": "python", 1412 | "pygments_lexer": "ipython3", 1413 | "version": "3.7.1" 1414 | } 1415 | }, 1416 | "nbformat": 4, 1417 | "nbformat_minor": 2 1418 | } 1419 | --------------------------------------------------------------------------------