├── .gitignore
├── LICENSE.txt
├── README.md
└── notebooks
    ├── chapter2.ipynb
    ├── chapter3.ipynb
    ├── chapter4.ipynb
    ├── chapter5.ipynb
    ├── chapter5_racetrack
    │   ├── __init__.py
    │   ├── racetrack.ipynb
    │   └── tracks.py
    ├── chapter6.ipynb
    ├── chapter7.ipynb
    ├── chapter8.ipynb
    └── chapter9.ipynb
/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
2 |
3 | .ipynb_checkpoints
4 | *.mp4
5 | *.pyc
6 |
--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
1 | The MIT License
2 |
3 | Copyright (c) 2016 Baruch Tabanpour
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in
13 | all copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21 | THE SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # intro-to-rl
2 |
3 | Solutions to problem sets and code for the older version of [Reinforcement Learning: An Introduction](http://incompleteideas.net/book/bookdraft2017nov5.pdf), which I can't find online anymore.
4 |
5 | I do not guarantee correctness of the solutions presented here, although I always try to get the coding examples to match the graphs exactly as they are in the textbook. I note otherwise when I am unsure of a solution. Please contact me if you see any errors or would like to contribute to the solutions presented here as a learning resource for others.
6 |
--------------------------------------------------------------------------------
/notebooks/chapter3.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "\"For our purposes, it is convenient to place the boundary of the learning agent not at the limit of its physical body, but at the limit of its control.\" - 3.2"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {
13 | "collapsed": true
14 | },
15 | "source": [
16 | "## 3.1\n",
17 | "\n",
18 | "##### Exercise 3.2\n",
19 | "\n",
20 | "Is the reinforcement learning framework adequate to usefully represent all goal-directed learning tasks? Can you think of any clear exceptions?\n",
21 | "\n",
22 | "The goal-directed task doesn't work for tasks where failure is not an option (i.e. the problem doesn't reset). One example would be learning how to walk near a cliff, where failure means death with no ability to reset the episode.\n",
23 | "\n",
24 | "Another exception would be a task in which the rewards cannot be easily reduced to a single number. For example, in teaching a car how to drive we would want to minimize amount of travel time while maximizing safetiness. With the current framework, we would have to reduce all these desirable outcomes into a single number returned by the environment. \n",
25 | "\n",
26 | "##### Exercise 3.4\n",
27 | "\n",
28 | "Suppose you treated pole-balancing as an episodic task but also used discounting, with all rewards zero except for upon failure. What then would the return be at each time? How does this return differ from that in the discounted, continuing formulation of this task?\n",
29 | "\n",
30 | "For an episodic task with the above rewards, the discounted future reward would be exactly -1 * gamma^k, where k is the time step of failure, since that is when the episode ends. In the continuing formulation task, the reward is -1 * gamma^k1 + -1 * gamma^k2, etc. for each failure.\n"
31 | ]
32 | },
33 | {
34 | "cell_type": "markdown",
35 | "metadata": {
36 | "collapsed": true
37 | },
38 | "source": [
39 | "## 3.6 - MDPs\n",
40 | "\n",
41 | "##### Exercise 3.7\n",
42 | "Assuming a finite MDP with a finite number of reward values, write an equation for the transition probabilities and the expected rewards in terms of the joint conditional distribution in (3.5) $Pr(s_{t+1} = s', r_{t+1} = r| s_t, a_t)$\n",
43 | "\n",
44 | "$P^a_{s,s'} = Pr(s_{t+1} = s' | s_t = s, a_t = a) = \\sum_r Pr(s_{t+1} = s', r_{t+1} = r| s_t, a_t) $\n",
45 | "\n",
46 | "$R^a_{s, s'} = E[r_{t+1} | s_t = s, a_t = a, s_{t+1} = s'] = \\sum_r r \\cdot Pr(s_{t+1} = s', r_{t+1} = r| s_t, a_t)$\n",
47 | "\n"
48 | ]
49 | },
50 | {
51 | "cell_type": "markdown",
52 | "metadata": {
53 | "collapsed": true
54 | },
55 | "source": [
56 | "## 3.7 - Value Functions\n",
57 | "\n",
58 | "##### Exercise 3.8\n",
59 | "What is the Bellman equation for action values?\n",
60 | "\n",
61 | "$\n",
62 | "\\begin{equation}\n",
63 | "\\begin{split}\n",
64 | "Q^{\\pi}(s, a) &= E_{\\pi} [R_t | s_t = s, a_t = a]\\\\\n",
65 | "=& E_{\\pi} [\\sum_{k=0}^{\\infty} \\gamma^k r_{t + k + 1} | s_t = s, a_t = a ]\\\\\n",
66 | "=& E_{\\pi} [r_{t + 1} + \\gamma \\sum_{k=0}^{\\infty} \\gamma^k r_{t + k + 2} | s_t = s, a_t = a]\\\\\n",
67 | "=& \\sum_{s'} P^a_{ss'} R^a_{ss'} + \\sum_{s'} P^a_{ss'} \\gamma \\sum_{k=0}^{\\infty} \\gamma^k r_{t + k + 2}\\\\\n",
68 | "=& \\sum_{s'} P^a_{ss'} [R^a_{ss'} + \\gamma V^{\\pi}(s')]\\\\\n",
69 | "=& \\sum_{s'} P^a_{ss'} [R^a_{ss'} + \\gamma \\sum_{a'} \\pi(s', a') Q^{\\pi}(s', a')]\\\\\n",
70 | "\\end{split}\n",
71 | "\\end{equation}\n",
72 | "$\n",
73 | "\n",
74 | "\n",
75 | "##### Exercise 3.9\n",
76 | "The Bellman equation (3.10) must hold for each state for the value function shown in Figure 3.5b. As an example, show numerically that this equation holds for the center state, valued at 0.7 ,\n",
77 | "\n",
78 | "$V^{\\pi}(s_{mid}) = \\frac{1}{4} [(0 + .9 * .7) + (0 + 0.9 * 2.3) + (0 + .9 * .4) + (0 - .9 * .4)] = .675 \\approx .7$\n",
79 | "\n",
80 | "\n",
81 | "\n",
82 | "\n",
83 | "\n",
84 | "\n"
85 | ]
86 | },
87 | {
88 | "cell_type": "markdown",
89 | "metadata": {},
90 | "source": [
91 | "##### Exercise 3.10\n",
92 | "In the gridworld example, rewards are positive for goals, negative for running into the edge of the world, and zero the rest of the time. Are the signs of these rewards important, or only the intervals between them? Prove, using (3.2), that adding a constant $C$ to all the rewards adds a constant, $K$, to the values of all states, and thus does not affect the relative values of any states under any policies. What is $K$ in terms of $C$ and $\\gamma$?\n",
93 | "\n",
94 | "We have from previously that $V^{\\pi}(s) = E_{\\pi} [\\sum_{k=0}^{\\infty} \\gamma^k r_{t+k+1} | s_t = s]$.\n",
95 | "\n",
96 | "Let's add a constant $C$ to all rewards so that $r'_{t+1} = r_{t+1} + C$.\n",
97 | "\n",
98 | "Then, $$V'^{\\pi}(s) = E_{\\pi} [\\sum_{k=0}^{\\infty} \\gamma^k (r_{t+k+1} + C) | s_t = s]$$\n",
99 | "$$ = V^{\\pi}(s) + E_{\\pi}[ \\sum_{k=0}^{\\infty} \\gamma^k C | s_t = s ]$$\n",
100 | "$$ = V^{\\pi}(s) + K $$\n",
101 | "\n",
102 | "We have, \n",
103 | "\n",
104 | "$\n",
105 | "\\begin{equation}\n",
106 | "\\begin{split}\n",
107 | "K =& E_{\\pi}[\\sum_{k=0}^{\\infty} \\gamma^k C | s_t = s]\\\\\n",
108 | "=& \\sum_a \\pi(s, a) \\sum_s' P^a_{ss'} (C \\frac{1}{1 - \\gamma})\\\\\n",
109 | "=& \\frac{C}{1 - \\gamma}\n",
110 | "\\end{split}\n",
111 | "\\end{equation}\n",
112 | "$"
113 | ]
114 | },
115 | {
116 | "cell_type": "markdown",
117 | "metadata": {},
118 | "source": [
119 | "##### Exercise 3.11 \n",
120 | "\n",
121 | "Now consider adding a constant C to all the rewards in an episodic task, such as maze running. Would this have any effect, or would it leave the task unchanged as in the continuing task above? Why or why not? Give an example.\n",
122 | " \n",
123 | "For episodic tasks we have,\n",
124 | "\n",
125 | "$$K = \\sum_a \\pi(s, a) \\sum_s' P^a_{ss'} \\sum_k^{N}(C \\gamma^k)$$\n",
126 | "\n",
127 | "where $N$ is the number of time steps until the episode is over. It seems like the reward is maximized as N goes to infiinity, so adding a constant reward to all actions in an episodic task makes the agent never want to end interacting with the environment to avoid finishing the episode.\n",
128 | "\n",
129 | "\n",
130 | " "
131 | ]
132 | },
133 | {
134 | "cell_type": "markdown",
135 | "metadata": {},
136 | "source": [
137 | "##### Exercise 3.12\n",
138 | "\n",
139 | "The value of a state depends on the the values of the actions possible in that state and on how likely each action is to be taken under the current policy. We can think of this in terms of a small backup diagram rooted at the state and considering each possible action:\n",
140 | "\n",
141 | "$\n",
142 | "\\begin{equation}\n",
143 | "\\begin{split}\n",
144 | "V^{\\pi} (s) =& E_{\\pi} [Q^{\\pi} (s, a) | s_t = s]\\\\\n",
145 | "=& \\sum_a \\pi(s, a) Q^{\\pi} (s, a)\n",
146 | "\\end{split}\n",
147 | "\\end{equation}\n",
148 | "$\n",
149 | "\n"
150 | ]
151 | },
152 | {
153 | "cell_type": "markdown",
154 | "metadata": {},
155 | "source": [
156 | "##### Exercise 3.13\n",
157 | "\n",
158 | "The value of an action, , can be divided into two parts, the expected next reward, which does not depend on the policy , and the expected sum of the remaining rewards, which depends on the next state and the policy. Again we can think of this in terms of a small backup diagram, this one rooted at an action (state-action pair) and branching to the possible next states:\n",
159 | "\n",
160 | "$\n",
161 | "\\begin{equation}\n",
162 | "\\begin{split}\n",
163 | "Q^{\\pi} (s, a) =& E[ r_{t+1} | s_t = s, a_t = a] + \\gamma E_{\\pi}[ V^{\\pi}(s') | s_t = s, a_t = a] \\\\\n",
164 | "Q^{\\pi} (s, a) =& \\sum_{s'} P^a_{ss'} [R_{ss'} + \\gamma V^{\\pi}(s')]\n",
165 | "\\end{split}\n",
166 | "\\end{equation}\n",
167 | "$"
168 | ]
169 | },
170 | {
171 | "cell_type": "markdown",
172 | "metadata": {},
173 | "source": [
174 | "## 3.8 - Optimal Value Functions\n",
175 | "\n",
176 | "##### 3.16\n",
177 | "Give the Bellman equation for for the recycling robot.\n",
178 | "s = search, w = wait, re = recharge\n",
179 | "h = high, l = low\n",
180 | "\n",
181 | "$Q^*(s, a) = \\sum_{s'} P^a_{s, s'} [R^a_{s,s'} + \\gamma max_{a'} Q^*(s', a')]$\n",
182 | "\n",
183 | "$Q^*(h, s) = \\alpha (R^s + \\gamma max_{a'} Q^*(h, a')) + (1 - \\alpha) (R^s + \\gamma max_{a'} Q^*(l, a'))$\n",
184 | "\n",
185 | "$Q^*(l, s) = \\beta (R^s + \\gamma max_{a'} Q^*(l, a')) + (1 - \\beta) (-3 + \\gamma max_{a'} Q^*(h, a'))$\n",
186 | "\n",
187 | "$Q^*(l, w) = (R^w + \\gamma max_{a'} Q^*(l, a'))$\n",
188 | "\n",
189 | "$Q^*(h, w) = (R^w + \\gamma max_{a'} Q^*(h, a'))$\n",
190 | "\n",
191 | "$Q^*(l, r) = (0 + \\gamma max_{a'} Q^*(h, a'))$\n",
192 | "\n",
193 | "$Q^*(h, r) = 0$\n",
194 | "\n",
195 | "\n",
196 | "##### 3.17 \n",
197 | "Figure 3.8 gives the optimal value of the best state of the gridworld as 24.4, to one decimal place. Use your knowledge of the optimal policy and (3.2) to express this value symbolically, and then to compute it to three decimal places.\n",
198 | "\n",
199 | "\n",
200 | "$V^*(s) = max_{a \\in A(s)} \\sum_{s'} P^a_{s,s'}[R^a_{s, s'} + \\gamma V^*(s')]$\n",
201 | "\n",
202 | "So we have,\n",
203 | "\n",
204 | "$V^*(s) = max_a E[r_{t+1} + \\gamma \\sum_{k=0}^{\\infty} \\gamma^k r_{t + k+2} | s_t=s, s_t = a]$\n",
205 | "\n",
206 | "$ = max_a [10 + \\gamma \\cdot 0 + \\gamma^2 \\cdot 0 + \\gamma^3 \\cdot 0 + \\gamma^4 + \\gamma^5 \\cdot 10 + ...]$\n",
207 | "\n",
208 | "$ = 10(\\sum_{k=0}^\\infty \\gamma^{5k})$\n",
209 | "\n",
210 | "$ = 10 \\frac{1}{1 - \\gamma^5} \\approx 24.419$\n",
211 | "\n",
212 | "\n",
213 | "\n"
214 | ]
215 | },
216 | {
217 | "cell_type": "markdown",
218 | "metadata": {},
219 | "source": [
220 | "# Cart-pole balancing"
221 | ]
222 | },
223 | {
224 | "cell_type": "markdown",
225 | "metadata": {},
226 | "source": [
227 | "### Neuron-like adaptive elements\n",
228 | "\n",
229 | "https://webdocs.cs.ualberta.ca/~sutton/papers/barto-sutton-anderson-83.pdf\n",
230 | "\n",
231 | "https://webdocs.cs.ualberta.ca/~sutton/book/code/pole.c"
232 | ]
233 | },
234 | {
235 | "cell_type": "code",
236 | "execution_count": 1,
237 | "metadata": {
238 | "collapsed": false
239 | },
240 | "outputs": [],
241 | "source": [
242 | "# https://raw.githubusercontent.com/stober/cartpole/master/src/__init__.py\n",
243 | "\n",
244 | "import numpy as np\n",
245 | "import math\n",
246 | "import random\n",
247 | "\n",
248 | "class CartPole(object):\n",
249 | "\n",
250 | " def __init__(self):\n",
251 | "\n",
252 | " # Constants\n",
253 | " self.gravity = 9.8\n",
254 | " self.masscart = 1.0\n",
255 | " self.masspole = 0.1\n",
256 | " self.total_mass = (self.masspole + self.masscart)\n",
257 | " self.length = 0.5 # actually half the pole's length\n",
258 | " self.polemass_length = (self.masspole * self.length)\n",
259 | " self.force_mag = 10.0\n",
260 | " self.tau = 0.02 # seconds between state updates\n",
261 | " self.fourthirds = 1.3333333333333\n",
262 | "\n",
263 | " def cart_pole(self, action, x = 0.0, xdot = 0.0, theta = 0.0, thetadot = 0.0):\n",
264 | " # action must be binary\n",
265 | " \n",
266 | " force = self.force_mag if action > 0 else -self.force_mag\n",
267 | "\n",
268 | " costheta = math.cos(theta)\n",
269 | " sintheta = math.sin(theta)\n",
270 | "\n",
271 | " tmp = (force + self.polemass_length * (thetadot ** 2) * sintheta) / self.total_mass\n",
272 | " thetaacc = (self.gravity * sintheta - costheta * tmp) / (self.length * (self.fourthirds - self.masspole * costheta ** 2 / self.total_mass))\n",
273 | " xacc = tmp - self.polemass_length * thetaacc * costheta / self.total_mass\n",
274 | "\n",
275 | " \n",
276 | " # Update the four state variables, using Euler's method\n",
277 | " x += self.tau * xdot\n",
278 | " xdot += self.tau * xacc\n",
279 | " theta += self.tau * thetadot\n",
280 | " thetadot += self.tau * thetaacc\n",
281 | " \n",
282 | " return [x, xdot, theta, thetadot]\n",
283 | "\n",
284 | " def get_box(self, x = 0.0, xdot = 0.0, theta = 0.0, thetadot = 0.0):\n",
285 | " # get_box: Given the current state, returns a number from 1 to 162\n",
286 | " # designating the region of the state space encompassing the current state.\n",
287 | " # Returns a value of -1 if a failure state is encountered.\n",
288 | " one_degree = 0.0174532 # 2pi/360\n",
289 | " six_degrees = 0.1047192\n",
290 | " twelve_degrees = 0.2094384\n",
291 | " fifty_degrees = 0.87266\n",
292 | "\n",
293 | " if (x < -2.4 or x > 2.4) or (theta < -twelve_degrees or theta > twelve_degrees):\n",
294 | " return -1\n",
295 | "\n",
296 | " if x < -0.8:\n",
297 | " box = 0\n",
298 | " elif x < 0.8:\n",
299 | " box = 1\n",
300 | " else:\n",
301 | " box = 2\n",
302 | "\n",
303 | " if xdot < -0.5:\n",
304 | " pass\n",
305 | " elif xdot < 0.5:\n",
306 | " box += 3\n",
307 | " else:\n",
308 | " box += 6\n",
309 | "\n",
310 | " if theta < -six_degrees:\n",
311 | " pass\n",
312 | " elif theta < -one_degree:\n",
313 | " box += 9\n",
314 | " elif theta < 0:\n",
315 | " box += 18\n",
316 | " elif theta < one_degree:\n",
317 | " box += 27\n",
318 | " elif theta < six_degrees:\n",
319 | " box += 36\n",
320 | " else:\n",
321 | " box += 45\n",
322 | "\n",
323 | " if thetadot < -fifty_degrees:\n",
324 | " pass\n",
325 | " elif thetadot < fifty_degrees:\n",
326 | " box += 54\n",
327 | " else:\n",
328 | " box += 108\n",
329 | "\n",
330 | " return box\n",
331 | " \n",
332 | " def prob_push_right(self, s):\n",
333 | " return (1.0 / (1.0 + np.exp(-max(-50.0, min(s, 50.0)))))\n",
334 | "\n"
335 | ]
336 | },
337 | {
338 | "cell_type": "code",
339 | "execution_count": 17,
340 | "metadata": {
341 | "collapsed": true
342 | },
343 | "outputs": [
344 | {
345 | "name": "stdout",
346 | "output_type": "stream",
347 | "text": [
348 | "Trial 1 was 17 steps.\n",
349 | "\n",
350 | "Trial 2 was 14 steps.\n",
351 | "\n",
352 | "Trial 3 was 10 steps.\n",
353 | "\n",
354 | "Trial 4 was 10 steps.\n",
355 | "\n",
356 | "Trial 5 was 17 steps.\n",
357 | "\n",
358 | "Trial 6 was 71 steps.\n",
359 | "\n",
360 | "Trial 7 was 23 steps.\n",
361 | "\n",
362 | "Trial 8 was 26 steps.\n",
363 | "\n",
364 | "Trial 9 was 88 steps.\n",
365 | "\n",
366 | "Trial 10 was 39 steps.\n",
367 | "\n",
368 | "Trial 11 was 15 steps.\n",
369 | "\n",
370 | "Trial 12 was 43 steps.\n",
371 | "\n",
372 | "Trial 13 was 121 steps.\n",
373 | "\n",
374 | "Trial 14 was 30 steps.\n",
375 | "\n",
376 | "Trial 15 was 28 steps.\n",
377 | "\n",
378 | "Trial 16 was 171 steps.\n",
379 | "\n",
380 | "Trial 17 was 193 steps.\n",
381 | "\n",
382 | "Trial 18 was 210 steps.\n",
383 | "\n",
384 | "Trial 19 was 222 steps.\n",
385 | "\n",
386 | "Trial 20 was 185 steps.\n",
387 | "\n",
388 | "Trial 21 was 207 steps.\n",
389 | "\n",
390 | "Trial 22 was 99 steps.\n",
391 | "\n",
392 | "Trial 23 was 85 steps.\n",
393 | "\n",
394 | "Trial 24 was 185 steps.\n",
395 | "\n",
396 | "Trial 25 was 213 steps.\n",
397 | "\n",
398 | "Trial 26 was 230 steps.\n",
399 | "\n",
400 | "Trial 27 was 283 steps.\n",
401 | "\n",
402 | "Trial 28 was 281 steps.\n",
403 | "\n",
404 | "Trial 29 was 234 steps.\n",
405 | "\n",
406 | "Trial 30 was 292 steps.\n",
407 | "\n",
408 | "Trial 31 was 732 steps.\n",
409 | "\n",
410 | "Trial 32 was 455 steps.\n",
411 | "\n",
412 | "Trial 33 was 321 steps.\n",
413 | "\n",
414 | "Trial 34 was 264 steps.\n",
415 | "\n",
416 | "Trial 35 was 380 steps.\n",
417 | "\n",
418 | "Trial 36 was 294 steps.\n",
419 | "\n",
420 | "Trial 37 was 318 steps.\n",
421 | "\n",
422 | "Trial 38 was 241 steps.\n",
423 | "\n",
424 | "Trial 39 was 1554 steps.\n",
425 | "\n",
426 | "Trial 40 was 1792 steps.\n",
427 | "\n",
428 | "Trial 41 was 2523 steps.\n",
429 | "\n",
430 | "Trial 42 was 1555 steps.\n",
431 | "\n",
432 | "Trial 43 was 56268 steps.\n",
433 | "\n",
434 | "Trial 44 was 12440 steps.\n",
435 | "\n",
436 | "Trial 45 was 84286 steps.\n",
437 | "\n",
438 | "Pole balanced successfully for at least 100000 steps\n",
439 | "\n"
440 | ]
441 | }
442 | ],
443 | "source": [
444 | "cp = CartPole()\n",
445 | "\n",
446 | "N_BOXES = 162\n",
447 | "MAX_FAILURES = 100\n",
448 | "MAX_STEPS = 100000\n",
449 | "LAMBDAw = 0.9 #/* Decay rate for w eligibility trace. */\n",
450 | "LAMBDAv = 0.8 # /* Decay rate for v eligibility trace. */\n",
451 | "GAMMA = 0.95 # /* Discount factor for critic. */\n",
452 | "ALPHA = 1000 # /* Learning rate for action weights, w. */\n",
453 | "BETA = 0.5 # /* Learning rate for critic weights, v. */\n",
454 | "\n",
455 | "failures = steps = 0\n",
456 | "w = []; v = []; xbar = []; e = []\n",
457 | "\n",
458 | "# Initialize action and heuristic critic weights and traces.\n",
459 | "for i in range(N_BOXES):\n",
460 | " w.append(0.0); v.append(0.0); xbar.append(0.0); e.append(0.0)\n",
461 | "\n",
462 | "# Starting state is (0 0 0 0)\n",
463 | "x = x_dot = theta = theta_dot = 0.0\n",
464 | "\n",
465 | "# Find box in state space containing start state\n",
466 | "box = cp.get_box(x, x_dot, theta, theta_dot)\n",
467 | "\n",
468 | "# Iterate through the action-learn loop. ---*/\n",
469 | "while steps < MAX_STEPS and failures < MAX_FAILURES:\n",
470 | " # Choose action randomly, biased by current weight\n",
471 | " y = (random.uniform(0, 1) < cp.prob_push_right(w[box]))\n",
472 | " \n",
473 | " # Update traces.\n",
474 | " e[box] += (1.0 - LAMBDAw) * (y - 0.5)\n",
475 | " xbar[box] += (1.0 - LAMBDAv)\n",
476 | " \n",
477 | " # Remember prediction of failure for current stat\n",
478 | " oldp = v[box]\n",
479 | " \n",
480 | " # /*--- Apply action to the simulated cart-pole ---*/\n",
481 | " x, x_dot, theta, theta_dot = cp.cart_pole(y, x, x_dot, theta, theta_dot)\n",
482 | " \n",
483 | " # /*--- Get box of state space containing the resulting state. ---*/\n",
484 | " box = cp.get_box(x, x_dot, theta, theta_dot)\n",
485 | " \n",
486 | " if box < 0:\n",
487 | " # /*--- Failure occurred. ---*/\n",
488 | " failed = 1\n",
489 | " failures += 1\n",
490 | " print(\"Trial %d was %d steps.\\n\" %(failures, steps))\n",
491 | " steps = 0\n",
492 | "\n",
493 | " # /*--- Reset state to (0 0 0 0). Find the box. ---*/\n",
494 | " x = x_dot = theta = theta_dot = 0.0\n",
495 | " box = cp.get_box(x, x_dot, theta, theta_dot)\n",
496 | "\n",
497 | " # /*--- Reinforcement upon failure is -1. Prediction of failure is 0. ---*/\n",
498 | " r = -1.0\n",
499 | " p = 0.0\n",
500 | " else:\n",
501 | " # /*--- Not a failure. ---*/\n",
502 | " failed = 0\n",
503 | " \n",
504 | " # /*--- Reinforcement is 0. Prediction of failure given by v weight. ---*/\n",
505 | " r = 0\n",
506 | " p = v[box]\n",
507 | " \n",
508 | " steps += 1\n",
509 | " \n",
510 | "\n",
511 | " # /*--- Heuristic reinforcement is: current reinforcement\n",
512 | " # + gamma * new failure prediction - previous failure prediction ---*/\n",
513 | " rhat = r + GAMMA * p - oldp\n",
514 | "\n",
515 | " for i in range(N_BOXES):\n",
516 | " # /*--- Update all weights. ---*/\n",
517 | " \n",
518 | " w[i] += ALPHA * rhat * e[i]\n",
519 | " v[i] += BETA * rhat * xbar[i]\n",
520 | " \n",
521 | " if v[i] < -1.0:\n",
522 | " v[i] = v[i]\n",
523 | " \n",
524 | " if failed == 1:\n",
525 | " #/*--- If failure, zero all traces. ---*/\n",
526 | " e[i] = 0.0\n",
527 | " xbar[i] = 0.0\n",
528 | " else:\n",
529 | " # /*--- Otherwise, update (decay) the traces. ---*/\n",
530 | " e[i] *= LAMBDAw\n",
531 | " xbar[i] *= LAMBDAv\n",
532 | "\n",
533 | "if (failures == MAX_FAILURES):\n",
534 | " print(\"Pole not balanced. Stopping after %d failures.\" %(failures))\n",
535 | "else:\n",
536 | " print(\"Pole balanced successfully for at least %d steps\\n\" %(steps))\n",
537 | " \n",
538 | " \n"
539 | ]
540 | },
541 | {
542 | "cell_type": "code",
543 | "execution_count": 10,
544 | "metadata": {
545 | "collapsed": false
546 | },
547 | "outputs": [
548 | {
549 | "name": "stderr",
550 | "output_type": "stream",
551 | "text": [
552 | "[2016-07-19 01:55:07,269] Making new env: CartPole-v0\n"
553 | ]
554 | },
555 | {
556 | "name": "stdout",
557 | "output_type": "stream",
558 | "text": [
559 | "Trial 1 was 20 steps.\n",
560 | "\n",
561 | "Trial 2 was 12 steps.\n",
562 | "\n",
563 | "Trial 3 was 9 steps.\n",
564 | "\n",
565 | "Trial 4 was 155 steps.\n",
566 | "\n",
567 | "Trial 5 was 16 steps.\n",
568 | "\n",
569 | "Trial 6 was 103 steps.\n",
570 | "\n",
571 | "Trial 7 was 34 steps.\n",
572 | "\n",
573 | "Trial 8 was 38 steps.\n",
574 | "\n",
575 | "Trial 9 was 12 steps.\n",
576 | "\n",
577 | "Trial 10 was 27 steps.\n",
578 | "\n",
579 | "Trial 11 was 10 steps.\n",
580 | "\n",
581 | "Trial 12 was 58 steps.\n",
582 | "\n",
583 | "Trial 13 was 188 steps.\n",
584 | "\n",
585 | "Trial 14 was 144 steps.\n",
586 | "\n",
587 | "Trial 15 was 186 steps.\n",
588 | "\n",
589 | "Trial 16 was 101 steps.\n",
590 | "\n",
591 | "Trial 17 was 122 steps.\n",
592 | "\n",
593 | "Trial 18 was 40 steps.\n",
594 | "\n",
595 | "Trial 19 was 26 steps.\n",
596 | "\n",
597 | "Trial 20 was 48 steps.\n",
598 | "\n",
599 | "Trial 21 was 16 steps.\n",
600 | "\n",
601 | "Trial 22 was 11 steps.\n",
602 | "\n",
603 | "Trial 23 was 714 steps.\n",
604 | "\n",
605 | "Trial 24 was 412 steps.\n",
606 | "\n",
607 | "Trial 25 was 620 steps.\n",
608 | "\n",
609 | "Trial 26 was 470 steps.\n",
610 | "\n",
611 | "Trial 27 was 580 steps.\n",
612 | "\n",
613 | "Trial 28 was 498 steps.\n",
614 | "\n",
615 | "Trial 29 was 523 steps.\n",
616 | "\n",
617 | "Trial 30 was 736 steps.\n",
618 | "\n",
619 | "Trial 31 was 597 steps.\n",
620 | "\n",
621 | "Trial 32 was 825 steps.\n",
622 | "\n",
623 | "Trial 33 was 759 steps.\n",
624 | "\n",
625 | "Trial 34 was 342 steps.\n",
626 | "\n",
627 | "Trial 35 was 482 steps.\n",
628 | "\n",
629 | "Trial 36 was 598 steps.\n",
630 | "\n",
631 | "Trial 37 was 601 steps.\n",
632 | "\n",
633 | "Trial 38 was 614 steps.\n",
634 | "\n",
635 | "Trial 39 was 868 steps.\n",
636 | "\n",
637 | "Trial 40 was 1189 steps.\n",
638 | "\n",
639 | "Trial 41 was 1358 steps.\n",
640 | "\n",
641 | "Trial 42 was 427 steps.\n",
642 | "\n",
643 | "Trial 43 was 785 steps.\n",
644 | "\n",
645 | "Trial 44 was 387 steps.\n",
646 | "\n",
647 | "Trial 45 was 287 steps.\n",
648 | "\n",
649 | "Trial 46 was 3246 steps.\n",
650 | "\n",
651 | "Trial 47 was 10808 steps.\n",
652 | "\n",
653 | "Trial 48 was 15820 steps.\n",
654 | "\n",
655 | "Trial 49 was 22176 steps.\n",
656 | "\n",
657 | "Trial 50 was 4548 steps.\n",
658 | "\n",
659 | "Trial 51 was 11423 steps.\n",
660 | "\n",
661 | "Trial 52 was 4957 steps.\n",
662 | "\n",
663 | "Trial 53 was 37452 steps.\n",
664 | "\n",
665 | "Trial 54 was 6652 steps.\n",
666 | "\n",
667 | "Pole balanced successfully for at least 100000 steps\n",
668 | "\n"
669 | ]
670 | }
671 | ],
672 | "source": [
673 | "# Us AI Gym\n",
674 | "\n",
675 | "import gym\n",
676 | "import random\n",
677 | "env = gym.make('CartPole-v0')\n",
678 | "\n",
679 | "cp = CartPole()\n",
680 | "N_BOXES = 162\n",
681 | "MAX_FAILURES = 100\n",
682 | "MAX_STEPS = 100000\n",
683 | "LAMBDAw = 0.9 #/* Decay rate for w eligibility trace. */\n",
684 | "LAMBDAv = 0.8 # /* Decay rate for v eligibility trace. */\n",
685 | "GAMMA = 0.95 # /* Discount factor for critic. */\n",
686 | "ALPHA = 1000 # /* Learning rate for action weights, w. */\n",
687 | "BETA = 0.5 # /* Learning rate for critic weights, v. */\n",
688 | "\n",
689 | "failures = steps = 0\n",
690 | "w = []; v = []; xbar = []; e = []\n",
691 | "\n",
692 | "# Initialize action and heuristic critic weights and traces.\n",
693 | "for i in range(N_BOXES):\n",
694 | " w.append(0.0); v.append(0.0); xbar.append(0.0); e.append(0.0)\n",
695 | "\n",
696 | "# Starting state is (0 0 0 0)\n",
697 | "observation = env.reset()\n",
698 | "x, x_dot, theta, theta_dot = observation\n",
699 | "\n",
700 | "# Find box in state space containing start state\n",
701 | "box = cp.get_box(x, x_dot, theta, theta_dot)\n",
702 | "\n",
703 | "# Iterate through the action-learn loop. ---*/\n",
704 | "while steps < MAX_STEPS and failures < MAX_FAILURES:\n",
705 | " env.render()\n",
706 | " \n",
707 | " # Choose action randomly, biased by current weight\n",
708 | " y = (random.uniform(0, 1) < cp.prob_push_right(w[box]))\n",
709 | " \n",
710 | " # Update traces.\n",
711 | " e[box] += (1.0 - LAMBDAw) * (y - 0.5)\n",
712 | " xbar[box] += (1.0 - LAMBDAv)\n",
713 | " \n",
714 | " # Remember prediction of failure for current stat\n",
715 | " oldp = v[box]\n",
716 | " \n",
717 | " # /*--- Apply action to the simulated cart-pole ---*/\n",
718 | " observation, reward, done, info = env.step(y)\n",
719 | " x, x_dot, theta, theta_dot = observation\n",
720 | " \n",
721 | " # /*--- Get box of state space containing the resulting state. ---*/\n",
722 | " box = cp.get_box(x, x_dot, theta, theta_dot)\n",
723 | " \n",
724 | " if done:\n",
725 | " # /*--- Failure occurred. ---*/\n",
726 | " failed = 1\n",
727 | " failures += 1\n",
728 | " print(\"Trial %d was %d steps.\\n\" %(failures, steps))\n",
729 | " steps = 0\n",
730 | "\n",
731 | " # /*--- Reset state to (0 0 0 0). Find the box. ---*/\n",
732 | " observation = env.reset()\n",
733 | " x, x_dot, theta, theta_dot = observation\n",
734 | " box = cp.get_box(x, x_dot, theta, theta_dot)\n",
735 | "\n",
736 | " # /*--- Reinforcement upon failure is -1. Prediction of failure is 0. ---*/\n",
737 | " r = -1\n",
738 | " p = 0.0\n",
739 | " else:\n",
740 | " # /*--- Not a failure. ---*/\n",
741 | " failed = 0\n",
742 | " \n",
743 | " # /*--- Reinforcement is 0. Prediction of failure given by v weight. ---*/\n",
744 | " r = 0\n",
745 | " p = v[box]\n",
746 | " \n",
747 | " steps += 1\n",
748 | " \n",
749 | "\n",
750 | " # /*--- Heuristic reinforcement is: current reinforcement\n",
751 | " # + gamma * new failure prediction - previous failure prediction ---*/\n",
752 | " rhat = r + GAMMA * p - oldp\n",
753 | "\n",
754 | " for i in range(N_BOXES):\n",
755 | " # /*--- Update all weights. ---*/\n",
756 | " \n",
757 | " w[i] += ALPHA * rhat * e[i]\n",
758 | " v[i] += BETA * rhat * xbar[i]\n",
759 | " \n",
760 | " if v[i] < -1.0:\n",
761 | " v[i] = v[i]\n",
762 | " \n",
763 | " if failed == 1:\n",
764 | " #/*--- If failure, zero all traces. ---*/\n",
765 | " e[i] = 0.0\n",
766 | " xbar[i] = 0.0\n",
767 | " else:\n",
768 | " # /*--- Otherwise, update (decay) the traces. ---*/\n",
769 | " e[i] *= LAMBDAw\n",
770 | " xbar[i] *= LAMBDAv\n",
771 | "\n",
772 | "if (failures == MAX_FAILURES):\n",
773 | " print(\"Pole not balanced. Stopping after %d failures.\" %(failures))\n",
774 | "else:\n",
775 | " print(\"Pole balanced successfully for at least %d steps\\n\" %(steps))\n",
776 | " "
777 | ]
778 | },
779 | {
780 | "cell_type": "code",
781 | "execution_count": null,
782 | "metadata": {
783 | "collapsed": true
784 | },
785 | "outputs": [],
786 | "source": []
787 | },
788 | {
789 | "cell_type": "markdown",
790 | "metadata": {},
791 | "source": [
792 | "### Simulated Annealing\n",
793 | "\n",
794 | "https://gym.openai.com/evaluations/eval_7PjfKVdYRMuKyahjgTbrA#reproducibility\n"
795 | ]
796 | },
797 | {
798 | "cell_type": "code",
799 | "execution_count": 1,
800 | "metadata": {
801 | "collapsed": false
802 | },
803 | "outputs": [
804 | {
805 | "name": "stderr",
806 | "output_type": "stream",
807 | "text": [
808 | "[2016-07-19 01:43:02,117] Making new env: CartPole-v0\n"
809 | ]
810 | },
811 | {
812 | "name": "stdout",
813 | "output_type": "stream",
814 | "text": [
815 | "test: [-0.47 +0.06 +0.29 -0.32] 85 best: [-0.47 +0.06 +0.29 -0.32] 85 0.9\n",
816 | "test: [-0.81 -0.11 +0.70 -0.55] 88 best: [-0.81 -0.11 +0.70 -0.55] 88 0.81\n",
817 | "test: [-0.49 -0.13 +0.60 -0.53] 85 best: [-0.81 -0.11 +0.70 -0.55] 88 0.81\n",
818 | "test: [-0.98 -0.30 +1.09 -0.47] 88 best: [-0.81 -0.11 +0.70 -0.55] 88 0.81\n",
819 | "test: [-0.50 +0.21 +0.46 -0.53] 86 best: [-0.81 -0.11 +0.70 -0.55] 88 0.81\n",
820 | "test: [-0.59 -0.15 +0.57 -0.91] 84 best: [-0.81 -0.11 +0.70 -0.55] 88 0.81\n",
821 | "test: [-1.01 -0.45 +0.43 -0.57] 83 best: [-0.81 -0.11 +0.70 -0.55] 88 0.81\n",
822 | "test: [-0.68 +0.05 +0.43 -0.18] 82 best: [-0.81 -0.11 +0.70 -0.55] 88 0.81\n",
823 | "test: [-1.05 +0.07 +0.80 -0.15] 88 best: [-0.81 -0.11 +0.70 -0.55] 88 0.81\n",
824 | "test: [-0.97 -0.07 +1.00 -0.36] 83 best: [-0.81 -0.11 +0.70 -0.55] 88 0.81\n",
825 | "test: [-0.82 +0.15 +0.85 -0.47] 88 best: [-0.81 -0.11 +0.70 -0.55] 88 0.81\n",
826 | "test: [-0.91 +0.29 +0.65 -0.24] 83 best: [-0.81 -0.11 +0.70 -0.55] 88 0.81\n",
827 | "test: [-0.96 -0.18 +0.34 -0.44] 86 best: [-0.81 -0.11 +0.70 -0.55] 88 0.81\n",
828 | "test: [-0.66 -0.18 +0.40 -0.34] 86 best: [-0.81 -0.11 +0.70 -0.55] 88 0.81\n",
829 | "test: [-1.11 +0.17 +0.99 -0.32] 89 best: [-1.11 +0.17 +0.99 -0.32] 89 0.729\n",
830 | "test: [-1.16 +0.43 +0.99 -0.02] 87 best: [-1.11 +0.17 +0.99 -0.32] 89 0.729\n",
831 | "test: [-0.80 -0.11 +0.79 -0.03] 543 best: [-0.80 -0.11 +0.79 -0.03] 543 0.6561\n",
832 | "test: [-0.91 -0.26 +0.93 -0.21] 317 best: [-0.80 -0.11 +0.79 -0.03] 543 0.6561\n",
833 | "test: [-0.99 -0.32 +0.67 +0.28] 618 best: [-0.99 -0.32 +0.67 +0.28] 618 0.59049\n",
834 | "test: [-0.72 -0.45 +0.39 +0.03] 560 best: [-0.99 -0.32 +0.67 +0.28] 618 0.59049\n",
835 | "test: [-1.24 -0.19 +0.53 +0.31] 856 best: [-1.24 -0.19 +0.53 +0.31] 856 0.531441\n",
836 | "test: [-1.36 -0.41 +0.77 +0.15] 727 best: [-1.24 -0.19 +0.53 +0.31] 856 0.531441\n",
837 | "test: [-1.04 -0.13 +0.51 +0.25] 610 best: [-1.24 -0.19 +0.53 +0.31] 856 0.531441\n",
838 | "test: [-1.47 -0.18 +0.78 +0.40] 728 best: [-1.24 -0.19 +0.53 +0.31] 856 0.531441\n",
839 | "test: [-1.24 +0.00 +0.27 +0.41] 962 best: [-1.24 +0.00 +0.27 +0.41] 962 0.4782969\n",
840 | "test: [-1.32 -0.12 +0.30 +0.19] 618 best: [-1.24 +0.00 +0.27 +0.41] 962 0.4782969\n",
841 | "test: [-1.27 -0.06 +0.21 +0.27] 607 best: [-1.24 +0.00 +0.27 +0.41] 962 0.4782969\n",
842 | "test: [-1.25 -0.20 +0.39 +0.62] 740 best: [-1.24 +0.00 +0.27 +0.41] 962 0.4782969\n",
843 | "test: [-1.07 +0.02 +0.10 +0.43] 721 best: [-1.24 +0.00 +0.27 +0.41] 962 0.4782969\n",
844 | "test: [-1.11 -0.19 +0.31 +0.59] 794 best: [-1.24 +0.00 +0.27 +0.41] 962 0.4782969\n",
845 | "test: [-1.28 -0.12 +0.27 +0.61] 1068 best: [-1.28 -0.12 +0.27 +0.61] 1068 0.43046721\n",
846 | "test: [-1.33 +0.02 +0.46 +0.72] 939 best: [-1.28 -0.12 +0.27 +0.61] 1068 0.43046721\n",
847 | "test: [-1.38 -0.20 +0.10 +0.76] 1018 best: [-1.28 -0.12 +0.27 +0.61] 1068 0.43046721\n",
848 | "test: [-1.15 -0.24 +0.22 +0.67] 827 best: [-1.28 -0.12 +0.27 +0.61] 1068 0.43046721\n",
849 | "test: [-1.39 -0.12 +0.35 +0.70] 907 best: [-1.28 -0.12 +0.27 +0.61] 1068 0.43046721\n",
850 | "test: [-1.07 -0.20 +0.19 +0.80] 1048 best: [-1.28 -0.12 +0.27 +0.61] 1068 0.43046721\n",
851 | "test: [-1.24 -0.14 +0.43 +0.53] 941 best: [-1.28 -0.12 +0.27 +0.61] 1068 0.43046721\n",
852 | "test: [-1.10 -0.28 +0.38 +0.64] 899 best: [-1.28 -0.12 +0.27 +0.61] 1068 0.43046721\n",
853 | "test: [-1.47 -0.06 +0.15 +0.66] 920 best: [-1.28 -0.12 +0.27 +0.61] 1068 0.43046721\n",
854 | "test: [-1.43 +0.03 +0.30 +0.40] 640 best: [-1.28 -0.12 +0.27 +0.61] 1068 0.43046721\n",
855 | "test: [-1.11 +0.10 +0.14 +0.53] 970 best: [-1.28 -0.12 +0.27 +0.61] 1068 0.43046721\n",
856 | "test: [-1.23 +0.03 +0.12 +0.60] 1033 best: [-1.28 -0.12 +0.27 +0.61] 1068 0.43046721\n",
857 | "test: [-1.13 -0.17 +0.41 +0.71] 822 best: [-1.28 -0.12 +0.27 +0.61] 1068 0.43046721\n",
858 | "test: [-1.39 +0.04 +0.31 +0.78] 1099 best: [-1.39 +0.04 +0.31 +0.78] 1099 0.387420489\n",
859 | "test: [-1.23 +0.14 +0.37 +0.88] 1555 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
860 | "test: [-1.15 +0.28 +0.40 +1.06] 1539 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
861 | "test: [-1.07 +0.12 +0.50 +0.83] 1214 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
862 | "test: [-1.30 +0.21 +0.50 +0.80] 1064 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
863 | "test: [-1.30 +0.12 +0.24 +0.76] 1089 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
864 | "test: [-1.39 -0.02 +0.35 +0.96] 848 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
865 | "test: [-1.38 +0.10 +0.51 +0.96] 1334 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
866 | "test: [-1.21 +0.14 +0.45 +0.72] 1173 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
867 | "test: [-1.24 +0.27 +0.54 +0.76] 1172 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
868 | "test: [-1.21 +0.27 +0.54 +0.95] 1433 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
869 | "test: [-1.11 +0.16 +0.39 +0.93] 1553 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
870 | "test: [-1.20 +0.17 +0.33 +1.02] 1346 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
871 | "test: [-1.07 +0.26 +0.51 +0.85] 1344 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
872 | "test: [-1.12 +0.20 +0.21 +0.97] 963 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
873 | "test: [-1.40 +0.10 +0.39 +1.00] 934 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
874 | "test: [-1.27 +0.15 +0.21 +0.81] 1026 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
875 | "test: [-1.35 +0.19 +0.42 +0.96] 1062 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
876 | "test: [-1.20 +0.31 +0.30 +0.73] 1162 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
877 | "test: [-1.24 +0.23 +0.24 +0.99] 1357 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
878 | "test: [-1.24 +0.29 +0.27 +0.90] 1245 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
879 | "test: [-1.28 +0.07 +0.28 +0.99] 1056 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
880 | "test: [-1.22 -0.02 +0.50 +0.89] 1012 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
881 | "test: [-1.26 +0.16 +0.40 +0.97] 996 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
882 | "test: [-1.07 +0.26 +0.38 +0.76] 1166 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
883 | "test: [-1.11 +0.17 +0.43 +0.93] 1201 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
884 | "test: [-1.37 +0.08 +0.20 +0.78] 1068 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
885 | "test: [-1.07 +0.02 +0.24 +0.90] 1518 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
886 | "test: [-1.24 +0.28 +0.40 +0.92] 1266 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
887 | "test: [-1.28 +0.23 +0.28 +1.00] 1214 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
888 | "test: [-1.16 +0.29 +0.23 +0.77] 1232 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
889 | "test: [-1.23 +0.26 +0.21 +0.78] 1328 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
890 | "test: [-1.25 +0.20 +0.23 +0.98] 917 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
891 | "test: [-1.24 +0.06 +0.30 +0.91] 1234 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
892 | "test: [-1.10 +0.07 +0.29 +0.83] 1089 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
893 | "test: [-1.20 +0.06 +0.29 +1.03] 1375 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
894 | "test: [-1.19 +0.27 +0.30 +0.88] 1378 best: [-1.23 +0.14 +0.37 +0.88] 1555 0.3486784401\n",
895 | "best [-1.2322893214490043, 0.14150894065716918, 0.36790056804535143, 0.881835892179857] 1555\n"
896 | ]
897 | }
898 | ],
899 | "source": [
900 | "import gym\n",
901 | "import random\n",
902 | "env = gym.make('CartPole-v0')\n",
903 | "\n",
904 | "# env.monitor.start('/tmp/cartpole-experiment-3', force=True)\n",
905 | "\n",
906 | "bestSteps = 0\n",
907 | "best = [0, 0, 0, 0]\n",
908 | "alpha = 1\n",
909 | "\n",
910 | "for i_episode in xrange(80):\n",
911 | " \n",
912 | " test = [best[i] + (random.random() - 0.5) * alpha for i in range(4)]\n",
913 | "\n",
914 | " score = 0\n",
915 | " for ep in range(10): # <-- key thing was to figure out that you need to do 10 tests per point\n",
916 | " observation = env.reset()\n",
917 | " for t in xrange(200): # <-- because you can't go over 200 you need to gain score hight else where\n",
918 | " env.render()\n",
919 | " if sum(observation[i] * test[i] for i in range(4)) > 0:\n",
920 | " action = 1\n",
921 | " else:\n",
922 | " action = 0\n",
923 | " observation, reward, done, info = env.step(action)\n",
924 | " if done:\n",
925 | " break\n",
926 | "\n",
927 | " score += t\n",
928 | "\n",
929 | " if bestSteps < score:\n",
930 | " bestSteps = score\n",
931 | " best = test\n",
932 | " alpha *= .9\n",
933 | "\n",
934 | " print \"test:\", \"[%+1.2f %+1.2f %+1.2f %+1.2f]\" % tuple(test), score, \n",
935 | " print \"best:\", \"[%+1.2f %+1.2f %+1.2f %+1.2f]\" % tuple(best), bestSteps, alpha\n",
936 | "\n",
937 | "\n",
938 | "print \"best\", best, bestSteps\n",
939 | "\n",
940 | "# env.monitor.close()"
941 | ]
942 | }
943 | ],
944 | "metadata": {
945 | "kernelspec": {
946 | "display_name": "Python 2",
947 | "language": "python",
948 | "name": "python2"
949 | },
950 | "language_info": {
951 | "codemirror_mode": {
952 | "name": "ipython",
953 | "version": 2
954 | },
955 | "file_extension": ".py",
956 | "mimetype": "text/x-python",
957 | "name": "python",
958 | "nbconvert_exporter": "python",
959 | "pygments_lexer": "ipython2",
960 | "version": "2.7.11"
961 | }
962 | },
963 | "nbformat": 4,
964 | "nbformat_minor": 0
965 | }
966 |
--------------------------------------------------------------------------------
/notebooks/chapter5.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "##### Exercise 5.1\n",
8 | "\n",
9 | "Consider the diagrams on the right in Figure 5.2. Why does the value function jump up for the last two rows in the rear? Why does it drop off for the whole last row on the left? Why are the frontmost values higher in the upper diagrams than in the lower?\n",
10 | "\n",
11 | "Jumps up for last two rows in the rear since sticking for 20 and 21 in blackjack usually result in a win. It drops off for the last row on the left since if the dealer has an Ace, it's bad news for the player. Finally, the frontmost values are higher in the upper diagrams than in the lower since having a usable Ace makes it less likely that an Ace will make the player bust."
12 | ]
13 | },
14 | {
15 | "cell_type": "markdown",
16 | "metadata": {},
17 | "source": [
18 | "##### Exercise 5.2\n",
19 | "\n",
20 | "The backup diagram for the Monte Carlo estimation of $Q^\\pi$ is similar to the backup diagram for $V^\\pi$, but the root is a state-action pair, not just a state. The diagram ends in a terminal state."
21 | ]
22 | },
23 | {
24 | "cell_type": "markdown",
25 | "metadata": {},
26 | "source": [
27 | "##### Some notes \n",
28 | "\"Without a model (as we had in DP chapter 4)... state values alone are not sufficient. One must explicitly estimate the value of each action in order for the values to be useful in suggesting a policy. \"\n",
29 | "\n",
30 | "By model, the author means, having the transition probabilities and transition rewards available at hand.\n",
31 | "\n",
32 | "\" For policy evaluation to work for action values, we must assure continual exploration. One way to do this is by specifying that the first step of each episode starts at a state-action pair, and that every such pair has a nonzero probability of being selected as the start. This guarantees that all state-action pairs will be visited an infinite number of times in the limit of an infinite number of episodes. We call this the assumption of exploring starts.\"\n",
33 | "\n",
34 | "\n",
35 | "Has the following been proved on Monte Carlo ES?\n",
36 | "\"Convergence to this optimal fixed point seems inevitable as the changes to the action-value function decrease over time, but has not yet been formally proved. In our opinion, this is one of the most fundamental open questions in reinforcement learning.\""
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "metadata": {
42 | "collapsed": true
43 | },
44 | "source": [
45 | "##### Questions\n",
46 | "\n",
47 | "**Question**: In 5.6, in the Figure 5.7, (c), the update to w, why is there a 1 in the numerator? It's because $\\pi(s,a)$ is a deterministic policy!\n",
48 | "\n",
49 | "Why are we taking $\\tau$ to be the latest time at which the actions are not greedy? Is it because our estimate for $Q^\\pi$ only improves for nongreedy actions as stated in the section?\n",
50 | "\n",
51 | "**Question**: In 5.4, the conditions for the policy improvement theorem require optimal substructure right? So even though Monte Carlo is more robust to violations of the Markov Property since it doesn't bootstrap $V$ or $Q$, we are still assuming that greedy updates in GPI (Generalized Policy Improvement) will allow us to arrive at the optimal action-value functions due to the Markov Property?"
52 | ]
53 | },
54 | {
55 | "cell_type": "markdown",
56 | "metadata": {},
57 | "source": [
58 | "##### Exercise 5.3 \n",
59 | "\n",
60 | "What is the Monte Carlo estimate analogous to (5.3) for action values, given returns generated using $\\pi'$?\n",
61 | "\n",
62 | "I'm not sure, but I think it would be something like:\n",
63 | "\n",
64 | "$$Q(s,a) = \\frac{\\sum_i^{n_s} \\frac{p_i(s,a)}{p_i'(s,a)} R_i(s,a) }{ \\sum_i^{n_s} \\frac{p_i(s,a)}{p_i'(s,a)} }$$\n",
65 | "\n",
66 | "Where $R_i(s,a)$ is the reward following state $s$ and action $a$, and $p_i(s_t, a_t) = P^{a_t}_{s_t, s_{t+1}} \\prod_{k=t+1}^{T_i(s_t, a_t) - 1} \\pi(s_k, a_k)P^{a_k}_{s_k, s_{k+1}} $. \n",
67 | "\n"
68 | ]
69 | },
70 | {
71 | "cell_type": "markdown",
72 | "metadata": {},
73 | "source": [
74 | "##### Exercise 5.4"
75 | ]
76 | },
77 | {
78 | "cell_type": "markdown",
79 | "metadata": {
80 | "collapsed": false
81 | },
82 | "source": [
83 | "See the code in `chapter5_racetrack` for the solution!"
84 | ]
85 | },
86 | {
87 | "cell_type": "markdown",
88 | "metadata": {},
89 | "source": [
90 | "##### Exercise 5.5\n",
91 | "\n",
92 | "Modify the algorithm for first-visit MC policy evaluation (Figure 5.1) to use the incremental implementation for stationary averages described in Section 2.5.\n",
93 | "\n",
94 | "We just need to update the algorithm so that $Returns(s)$ is a 1x1 array for each $s \\in S$. Before part(b) of the algorithm, we need to initialize $k = 0$, and in part (b) in the loop, we do:\n",
95 | "\n",
96 | "- $Returns(s) \\leftarrow Returns(s) + \\frac{1}{k + 1}[R - Returns(s)]$ \n",
97 | "- $k \\leftarrow k + 1$\n",
98 | "\n",
99 | "\n",
100 | "##### Exercise 5.6\n",
101 | "\n",
102 | "\n",
103 | "Derive the weighted-average update rule (5.5) from (5.4). Follow the pattern of the derivation of the unweighted rule (2.4) from (2.1).\n",
104 | "\n",
105 | "We have, \n",
106 | "\n",
107 | "$\n",
108 | "\\begin{equation}\n",
109 | "\\begin{split}\n",
110 | "V_{n+1} =& \\frac{\\sum_{k=1}^{n+1} w_k R_k }{\\sum{k+1}{n+1} w_k}\\\\\n",
111 | "=& \\frac{w_{n+1} R_{n+1} + \\sum_{k=1}^n w_k R_k }{W_{n+1}}\n",
112 | "\\end{split}\n",
113 | "\\end{equation}\n",
114 | "$\n",
115 | "\n",
116 | "where $W_{n+1} = W_n + w_{n+1}$ and $W_0 = 0$. Then we have:\n",
117 | "\n",
118 | "$\n",
119 | "\\begin{equation}\n",
120 | "\\begin{split}\n",
121 | "V_{n+1} =& \\frac{1}{W_{n+1}} [w_{n+1}R_{n+1} + V_n W_n]\\\\\n",
122 | "=& \\frac{1}{W_{n+1}} [w_{n+1}R_{n+1} + V_n (W_{n+1} - w_{n+1})]\\\\\n",
123 | "=& V_n + \\frac{w_{n+1}}{W_{n+1}} [R_{n+1} - V_n ]\n",
124 | "\\end{split}\n",
125 | "\\end{equation}\n",
126 | "$\n",
127 | "\n",
128 | "\n",
129 | "##### Exercise 5.7\n",
130 | "\n",
131 | "Modify the algorithm for the off-policy Monte Carlo control algorithm (Figure 5.7) to use the method described above for incrementally computing weighted averages.\n",
132 | "\n",
133 | "Before the repeat loop, we need to initialize $W = 0$. Then in the repeat loop we do:\n",
134 | "\n",
135 | "- get $w$ and $t$ as usual\n",
136 | "- delete lines for $N(s,a)$ and $D(s,a)$\n",
137 | "- $W \\leftarrow w + W$\n",
138 | "- $Q(s,a) \\leftarrow Q(s,a) + \\frac{w}{W} [R_t - Q(s,a)] $"
139 | ]
140 | },
141 | {
142 | "cell_type": "code",
143 | "execution_count": null,
144 | "metadata": {
145 | "collapsed": true
146 | },
147 | "outputs": [],
148 | "source": []
149 | }
150 | ],
151 | "metadata": {
152 | "kernelspec": {
153 | "display_name": "Python 2",
154 | "language": "python",
155 | "name": "python2"
156 | },
157 | "language_info": {
158 | "codemirror_mode": {
159 | "name": "ipython",
160 | "version": 2
161 | },
162 | "file_extension": ".py",
163 | "mimetype": "text/x-python",
164 | "name": "python",
165 | "nbconvert_exporter": "python",
166 | "pygments_lexer": "ipython2",
167 | "version": "2.7.11"
168 | }
169 | },
170 | "nbformat": 4,
171 | "nbformat_minor": 0
172 | }
173 |
--------------------------------------------------------------------------------
/notebooks/chapter5_racetrack/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/btaba/intro-to-rl/b65860cd81ce43ac344d4f618a6364c000ea971b/notebooks/chapter5_racetrack/__init__.py
--------------------------------------------------------------------------------
/notebooks/chapter5_racetrack/racetrack.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 7,
6 | "metadata": {
7 | "collapsed": false
8 | },
9 | "outputs": [],
10 | "source": [
11 | "import numpy as np\n",
12 | "\n",
13 | "from itertools import product\n",
14 | "\n",
15 | "import matplotlib.pyplot as plt\n",
16 | "import matplotlib.animation as animation\n",
17 | "%matplotlib inline\n",
18 | "\n",
19 | "from tracks import fancytrack, fancytrack2"
20 | ]
21 | },
22 | {
23 | "cell_type": "code",
24 | "execution_count": 8,
25 | "metadata": {
26 | "collapsed": true
27 | },
28 | "outputs": [],
29 | "source": [
30 | "# from tempfile import NamedTemporaryFile\n",
31 | "# from IPython.display import HTML\n",
32 | "\n",
33 | "# VIDEO_TAG = \"\"\"\"\"\"\n",
37 | "\n",
38 | "# def anim_to_html(anim):\n",
39 | "# if not hasattr(anim, '_encoded_video'):\n",
40 | "# with NamedTemporaryFile(suffix='.mp4') as f:\n",
41 | "# anim.save(f.name, fps=10, extra_args=['-vcodec', 'libx264'])\n",
42 | "# video = open(f.name, \"rb\").read()\n",
43 | "# anim._encoded_video = video.encode(\"base64\")\n",
44 | " \n",
45 | "# return VIDEO_TAG.format(anim._encoded_video)\n",
46 | "\n",
47 | "# def display_animation(anim):\n",
48 | "# plt.close(anim._fig)\n",
49 | "# return HTML(anim_to_html(anim))\n",
50 | "\n",
51 | "# # automatically display animations\n",
52 | "# animation.Animation._repr_html_ = anim_to_html"
53 | ]
54 | },
55 | {
56 | "cell_type": "code",
57 | "execution_count": 116,
58 | "metadata": {
59 | "collapsed": true
60 | },
61 | "outputs": [],
62 | "source": [
63 | "class Track(object):\n",
64 | " def __init__(self, track):\n",
65 | " \"\"\"\n",
66 | " 0 = off track\n",
67 | " 1 = road - on track\n",
68 | " 2 = start line\n",
69 | " 3 = finish line\n",
70 | " \"\"\"\n",
71 | " self.track = track\n",
72 | "\n",
73 | " def get_next_position(self, racecar):\n",
74 | " \"\"\"\n",
75 | " RaceCar racecar: RaceCar object\n",
76 | " \"\"\"\n",
77 | " \n",
78 | " reward = -1\n",
79 | " crash = False\n",
80 | " \n",
81 | " new_x = racecar.x + racecar.velocity_x\n",
82 | " new_y = racecar.y + racecar.velocity_y\n",
83 | " \n",
84 | " final_x = new_x\n",
85 | " final_y = new_y\n",
86 | " \n",
87 | " # Compute all the unique boxes we hit on a line between the start and end points\n",
88 | " x_positions = np.linspace(racecar.x, new_x, num=20)\n",
89 | " y_positions = np.linspace(racecar.y, new_y, num=20)\n",
90 | " positions = zip(x_positions, y_positions)\n",
91 | " positions = [(np.floor(x), np.floor(y)) for x, y in positions]\n",
92 | " \n",
93 | " # Get unique discrete positions visited during this time step\n",
94 | " ordered_positions = []\n",
95 | " for pos in positions:\n",
96 | " if len(ordered_positions) == 0 or pos != ordered_positions[-1]:\n",
97 | " ordered_positions.append(pos)\n",
98 | " \n",
99 | " # Check if the car crashes into the track at any of those time points\n",
100 | " # or if it reached the finish line\n",
101 | " for pos_idx, pos in enumerate(ordered_positions):\n",
102 | "\n",
103 | " # ability to speed past the finish without penalty\n",
104 | " if self.is_terminal_state_from_coordinates(pos[0], pos[1]):\n",
105 | " reward = -1\n",
106 | " final_x, final_y = ordered_positions[pos_idx]\n",
107 | " break\n",
108 | " \n",
109 | " # check if the car crashes\n",
110 | " if self.is_out_of_bounds(pos):\n",
111 | " reward -= 5\n",
112 | " crash_x, crash_y = pos\n",
113 | " final_x, final_y = ordered_positions[pos_idx - 1]\n",
114 | " racecar.velocity_x = 0\n",
115 | " racecar.velocity_y = 0\n",
116 | " crash = True\n",
117 | " break\n",
118 | " \n",
119 | "\n",
120 | " # If the car is not moving, the car must move at least 1 step\n",
121 | " # Here instead, we just give a negative reward for not moving, since we have left turns as well\n",
122 | " if final_x == racecar.x and final_y == racecar.y:\n",
123 | " reward -= 5\n",
124 | "# if self.is_out_of_bounds((final_x + 1, final_y)):\n",
125 | "# final_y += 1\n",
126 | "# racecar.velocity_y = 1\n",
127 | "# elif self.is_out_of_bounds((final_x, final_y + 1)):\n",
128 | "# final_x += 1\n",
129 | "# racecar.velocity_x = 1\n",
130 | "# else:\n",
131 | "# random_choice = np.random.choice([0, 1])\n",
132 | "# final_x += random_choice\n",
133 | "# final_y += (1 - random_choice)\n",
134 | "# racecar.velocity_x += random_choice\n",
135 | "# racecar.velocity_y += (1 - random_choice) \n",
136 | " \n",
137 | " racecar.x = final_x\n",
138 | " racecar.y = final_y\n",
139 | " \n",
140 | " return reward, crash\n",
141 | "\n",
142 | " def convert_cartesian_to_indexes(self, x, y):\n",
143 | " y_prime, x_prime = x, y\n",
144 | " x_prime = self.track.shape[0] - x_prime - 1\n",
145 | " return int(x_prime), int(y_prime)\n",
146 | " \n",
147 | " def convert_indexes_to_cartesian(self, x, y):\n",
148 | " y_prime, x_prime = x, y\n",
149 | " y_prime = self.track.shape[0] - y_prime - 1\n",
150 | " return int(x_prime), int(y_prime)\n",
151 | " \n",
152 | " def is_terminal_state(self, racecar):\n",
153 | " x, y = self.convert_cartesian_to_indexes(racecar.x, racecar.y)\n",
154 | " if self.track[x, y] == 3:\n",
155 | " return True\n",
156 | " return False\n",
157 | " \n",
158 | " def is_terminal_state_from_coordinates(self, x, y):\n",
159 | " if self.is_out_of_bounds((x, y)):\n",
160 | " return False\n",
161 | "\n",
162 | " x, y = self.convert_cartesian_to_indexes(x, y)\n",
163 | " if self.track[x, y] == 3:\n",
164 | " return True\n",
165 | " return False\n",
166 | " \n",
167 | " def is_out_of_bounds(self, position):\n",
168 | " x, y = position\n",
169 | " \n",
170 | " if x < 0 or x >= self.track.shape[1]:\n",
171 | " return True\n",
172 | " \n",
173 | " if y < 0 or y >= self.track.shape[0]:\n",
174 | " return True\n",
175 | "\n",
176 | " # y is reversed in our frame of reference\n",
177 | " x, y = self.convert_cartesian_to_indexes(x, y)\n",
178 | "\n",
179 | " if self.track[x, y] == 0:\n",
180 | " return True\n",
181 | " \n",
182 | " return False\n",
183 | " \n",
184 | " def get_random_start(self):\n",
185 | " # returns x and y coordinates of random start\n",
186 | " starts = np.argwhere(self.track == 2)\n",
187 | " random_start = np.random.randint(len(starts))\n",
188 | " start = starts[random_start]\n",
189 | " return self.convert_indexes_to_cartesian(*start)\n",
190 | " \n",
191 | " def get_states(self):\n",
192 | " return [self.convert_indexes_to_cartesian(x, y) for x, y in np.argwhere(self.track != 0)]\n",
193 | " \n",
194 | " def print_track(self, x, y):\n",
195 | " x, y = self.convert_cartesian_to_indexes(x, y)\n",
196 | " pt = np.copy(self.track)\n",
197 | " pt[x, y] = -1\n",
198 | " print(pt)\n",
199 | " \n",
200 | " "
201 | ]
202 | },
203 | {
204 | "cell_type": "code",
205 | "execution_count": 165,
206 | "metadata": {
207 | "collapsed": true
208 | },
209 | "outputs": [],
210 | "source": [
211 | "class RaceCar(object):\n",
212 | " def __init__(self):\n",
213 | " self.velocity_x = 0\n",
214 | " self.velocity_y = 0\n",
215 | " self.x = 0\n",
216 | " self.y = 0\n",
217 | " \n",
218 | " self.MAX_VELOCITY = 5\n",
219 | " self.MIN_VELOCITY = -5\n",
220 | "\n",
221 | " def get_episode(self, pi, track, actions, states, greedy=False, verbose=False, max_episodes=200):\n",
222 | " \"\"\"\n",
223 | " actions: an index to action dictionary\n",
224 | " pi: numpy array of probabilities to take an action given the state\n",
225 | " \n",
226 | " \"\"\"\n",
227 | "\n",
228 | " self.velocity_x = 0; self.velocity_y = 0\n",
229 | " self.x, self.y = track.get_random_start()\n",
230 | "\n",
231 | " saved_actions = []\n",
232 | " crash_events = [False]\n",
233 | " rewards = [0]\n",
234 | " visited_states = [((self.x, self.y), (self.velocity_x, self.velocity_y))]\n",
235 | " visited_positions = set([(self.x, self.y)])\n",
236 | " \n",
237 | " terminated = False\n",
238 | " while not terminated:\n",
239 | " state_idx = states[((self.x, self.y), (self.velocity_x, self.velocity_y))]\n",
240 | "\n",
241 | " # choose greedy action with probability pi\n",
242 | " if greedy:\n",
243 | " action_idx = np.where(pi[state_idx, :] == np.amax(pi[state_idx, :]))[0]\n",
244 | " action_idx = np.random.choice(action_idx) \n",
245 | " else:\n",
246 | " action_idx = np.random.choice(len(actions), size=1, p=pi[state_idx, :])[0] \n",
247 | " \n",
248 | " action = actions[action_idx]\n",
249 | " saved_actions.append(action)\n",
250 | " \n",
251 | " # Take the action\n",
252 | " self.velocity_x += action[0]\n",
253 | " self.velocity_y += action[1]\n",
254 | " self.velocity_x = min(max(self.velocity_x, self.MIN_VELOCITY), self.MAX_VELOCITY)\n",
255 | " self.velocity_y = min(max(self.velocity_y, self.MIN_VELOCITY), self.MAX_VELOCITY)\n",
256 | "\n",
257 | " # check if the car crashed, didn't move, or if we should end the episode, and penalize the rewards\n",
258 | " reward, crash = track.get_next_position(self)\n",
259 | " if (self.x, self.y) in visited_positions:\n",
260 | " reward -= 6\n",
261 | " # pass\n",
262 | " if len(visited_states) > max_episodes:\n",
263 | " terminated = True\n",
264 | " else:\n",
265 | " terminated = track.is_terminal_state(self)\n",
266 | " \n",
267 | " # save the rewards, states, and actions\n",
268 | " crash_events.append(crash)\n",
269 | " rewards.append(reward)\n",
270 | " visited_states.append(((self.x, self.y), (self.velocity_x, self.velocity_y)))\n",
271 | " visited_positions.update([(self.x, self.y)])\n",
272 | " if terminated: saved_actions.append((0,0))\n",
273 | " \n",
274 | " if verbose:\n",
275 | " track.print_track(self.x, self.y)\n",
276 | " print('Velocity is now: ', (self.velocity_x, self.velocity_y))\n",
277 | " \n",
278 | " return visited_states, saved_actions, rewards, crash_events\n",
279 | " \n",
280 | " def get_states(self):\n",
281 | " return list(product(\n",
282 | " range(self.MIN_VELOCITY, self.MAX_VELOCITY + 1),\n",
283 | " range(self.MIN_VELOCITY, self.MAX_VELOCITY + 1)\n",
284 | " )\n",
285 | " )"
286 | ]
287 | },
288 | {
289 | "cell_type": "code",
290 | "execution_count": 166,
291 | "metadata": {
292 | "collapsed": false
293 | },
294 | "outputs": [],
295 | "source": [
296 | "class MonteCarlo(object):\n",
297 | " def __init__(self, actions, agent, environment):\n",
298 | " self.actions_list = actions\n",
299 | " self.agent = agent\n",
300 | " self.environment = environment\n",
301 | " \n",
302 | " self.actions_to_idx = {action: idx for idx, action in enumerate(self.actions_list)}\n",
303 | " self.idx_to_actions = {idx: action for idx, action in enumerate(self.actions_list)}\n",
304 | "\n",
305 | " self.states_list = list(product(environment.get_states(), agent.get_states()))\n",
306 | " self.states_to_idx = {state: idx for idx, state in enumerate(self.states_list)}\n",
307 | "\n",
308 | " self.initialize_random_policy()\n",
309 | " \n",
310 | " def initialize_random_policy(self):\n",
311 | " self.Q = np.random.random((len(self.states_to_idx), len(self.actions_to_idx)))\n",
312 | " self.Returns = {(s, a): [] for s, a in product(self.states_to_idx, self.actions_to_idx)}\n",
313 | "\n",
314 | " self.pi = np.random.random((len(self.states_to_idx), len(self.actions_to_idx)))\n",
315 | " self.pi = self.pi / np.sum(self.pi, axis=1)[:, None]\n",
316 | " \n",
317 | " def apply_discount(self, r, gamma):\n",
318 | " for i, rr in enumerate(r):\n",
319 | " r[i] = (gamma ** i) * rr\n",
320 | " return r\n",
321 | " \n",
322 | " def on_policy_learning(self, num_iterations, epsilon=.1, gamma=1, verbose=False, \n",
323 | " sample_every=1000, get_greedy_episode_after=500):\n",
324 | " \"\"\"\n",
325 | " epsilon: sets minimum probability threshold for policy pi\n",
326 | " gamma: discount factor in rewards\n",
327 | " \n",
328 | " \"\"\"\n",
329 | "\n",
330 | " count = 0\n",
331 | " learning = True\n",
332 | " \n",
333 | " movie_array = []\n",
334 | "\n",
335 | " while learning:\n",
336 | " if count % sample_every == 0 and verbose:\n",
337 | " if count > get_greedy_episode_after:\n",
338 | " get_greedy = True\n",
339 | " else:\n",
340 | " get_greedy = False\n",
341 | "\n",
342 | " visited_states, actions_taken, rewards, crash_events = self.agent.get_episode(\n",
343 | " self.pi, \n",
344 | " self.environment, \n",
345 | " self.idx_to_actions, \n",
346 | " self.states_to_idx,\n",
347 | " greedy=get_greedy\n",
348 | " )\n",
349 | " movie_array.append((visited_states, actions_taken, rewards, crash_events, count))\n",
350 | " count += 1\n",
351 | " continue\n",
352 | " else:\n",
353 | " visited_states, actions_taken, rewards, crash_events = self.agent.get_episode(\n",
354 | " self.pi, \n",
355 | " self.environment, \n",
356 | " self.idx_to_actions, \n",
357 | " self.states_to_idx\n",
358 | " ) \n",
359 | "\n",
360 | " has_visited_first_occurence = {}\n",
361 | " for idx, sa in enumerate(zip(visited_states, actions_taken)):\n",
362 | " s, a = sa\n",
363 | " if (s, a) not in has_visited_first_occurence:\n",
364 | " r = self.apply_discount(rewards[idx:], gamma)\n",
365 | " self.Returns[(s, a)].append(sum(r))\n",
366 | " self.Q[self.states_to_idx[s], self.actions_to_idx[a]] = np.mean(self.Returns[(s, a)]) \n",
367 | " has_visited_first_occurence[(s, a)] = 0\n",
368 | "\n",
369 | " for s in visited_states:\n",
370 | " # We can take the greedy action, but it's probably better to break ties\n",
371 | " # a_star = np.argmax(Q[states_to_idx[s],:])\n",
372 | " action_idx = np.where(self.Q[self.states_to_idx[s],:] == np.amax(self.Q[self.states_to_idx[s],:]))[0]\n",
373 | " a_star = np.random.choice(action_idx)\n",
374 | " for action_idx, a in enumerate(self.actions_list):\n",
375 | " if a_star == action_idx:\n",
376 | " self.pi[self.states_to_idx[s], action_idx] = 1 - epsilon + epsilon / len(self.actions_list)\n",
377 | " else:\n",
378 | " self.pi[self.states_to_idx[s], action_idx] = epsilon / len(self.actions_list)\n",
379 | "\n",
380 | " count += 1\n",
381 | "\n",
382 | " if count >= num_iterations: learning = False\n",
383 | " \n",
384 | " if verbose:\n",
385 | " return movie_array\n",
386 | " return"
387 | ]
388 | },
389 | {
390 | "cell_type": "code",
391 | "execution_count": 171,
392 | "metadata": {
393 | "collapsed": true
394 | },
395 | "outputs": [],
396 | "source": [
397 | "# Set up the environment and agent\n",
398 | "car = RaceCar()\n",
399 | "track = Track(fancytrack)\n",
400 | "\n",
401 | "actions_list = list(product([-1, 0, 1], [-1, 0, 1]))\n",
402 | "mc = MonteCarlo(actions_list, car, track)"
403 | ]
404 | },
405 | {
406 | "cell_type": "code",
407 | "execution_count": 172,
408 | "metadata": {
409 | "collapsed": false
410 | },
411 | "outputs": [],
412 | "source": [
413 | "# Learn by On-Policy Monte Carlo\n",
414 | "movie_array = mc.on_policy_learning(num_iterations=5000, verbose=True, gamma=.8, epsilon=0.20, \n",
415 | " sample_every=499, get_greedy_episode_after=3500)"
416 | ]
417 | },
418 | {
419 | "cell_type": "code",
420 | "execution_count": 173,
421 | "metadata": {
422 | "collapsed": false
423 | },
424 | "outputs": [
425 | {
426 | "data": {
427 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAT4AAAD7CAYAAADzaviDAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XuUVNWZ9/Hv06igoFwkgIh4mAkCIjR0tMNFhAYVRFGJ\nkQGCghJCHMIYcRANMYQoDIri5Y2ZuBQN4gUUFcU7oK3giCC0woxKXMoBJIJGQERERPb7R12o7q7q\nquo+tXddns9atapOXU79dlX10/ucfS5ijEEppQpJkesASillmxY+pVTB0cKnlCo4WviUUgVHC59S\nquBo4VNKFZwjMv0GIqLbyyilnDHGSNX7Ml74QqbZeRvnyoF+jjPYUE5htBO0rbluetx7dVE3ULtd\nB7CkUNoJ2tb8ZKnHl8Qf/+g6QTCe9mHoHzP/Pon2tpke/7+bUoUl+do17fEFqdsY1wks6eY6gEXa\n1nykhS9I7fq5TmCJ5zqARZ7rABZ5rgNYo4UvSJvKXSewxHcdwCLfdQCLfNcBrNHCp5QqONkxuJEv\ndFE3D3nBzzJfBvOciz+IYZof7s/Jb+K/Unt8SqmCo4UvSLqOLw/5rgPYUzC/Xy18SqkCpIUvSLqO\nLw95rgPYUzC/3xQKn4jUF5G3RaRCRDaIyLTw/dNE5FMRWRe+DMp8XKWUqruko7rGmO9EpMwYs09E\n6gFvisiL4YfnGGPmZDZiDtlUbue/plQ72ETItAQHgwh8VzafwukJ+RRMW239frNASou6xph94Zv1\nCRXLyDhygr9ApZTKXikVPhEpEpEKYDuw1BizJvzQb0TkXRG5X0QaZyxlriiQ/5YF0wMCCqqtBfP7\nTXEDZmPMIaC7iBwHPC0ipwJ/Af5kjDEicjMwBxgbfw6LgSbh2w2AVhz+QfmVu9iRIXWdrt20H572\nYqd9Kn3eoSfotKtp/b1nbvqtOxmzC7zjqZGke0JxEbkR+CZ23Z6InAwsMcZ0jfN8k/RApPmyJbvr\ndSTWDlflUzg9IZ/A25qtv3fXv9+0pbbnRrwjMKcyqts8shgrIkcD5wAfikirmKf9DPjf9EIrpZQb\nqSzqngDME5EiQoVyoTHmBRF5SES6AYcI/VscX+sUiXoqiUYvg5BmTzclXt/azTeT7cwIz3UAizzX\nAeLLpt+vMwl6fCk0IZXNWTYAJXHuvzz57JVSKvvonhtBigws5D3fdQCLfNcB7CmY368WPqVUAdLC\nF6TIJiR5z3MdwCLPdQB7Cub3q4VPKVWAsuMIzIm2M4u372m6I6BWT8XoU6seQqJ9bLN2tNencHpC\nPk7bmgu/3yxTVGm7YT2huFJKAVr4Aua5DmCJ5zqARZ7rABZ5rgNYo4VPKVVwtPAFyncdwBLfdQCL\nfNcBLPJdB7BGC59SquBkx6huIvFGrrL1yBZArdeRpDOqDYlHe60dmdmr5etykec6gEWe6wDWaI9P\nKVVwtPAFyncdwBLfdQCLfNcBLPJdB7BGC59SquBo4QuU5zqAJZ7rABZ5rgNY5LkOYI0WPqVUwcnu\nUd14XBytOWU+gf7XDGq0N3A+hdM78NG25h/t8SmlCo4WvkB5rgNY4rkOYJHnOoBFnusA1mjhU0oV\nnFROL1lfRN4WkQoR2SAi08L3NxWRV0Rko4i8HDkFZWHzXQewxHcdwCLfdQCLfNcBrEla+Iwx3wFl\nxpjuQDfgPBEpBa4HlhljOgCvAjdkNKlSSgVETBrn0RSRY4A3gKuA+UBfY8yO8MnFy40xHeO8xkCC\nUcggpbsPr9Uj21qS7mhvPn4GuSCd36p+R3U0HWNMtT+AlNbxiUiRiFQA24Glxpg1QEtjzA4AY8x2\noEWQcZVSKlNSKnzGmEPhRd02QKmIdKb6acxz6RTsGeK7DmCJ7zqARb7rABb5rgNYk9YGzMaYPSJS\nDgwCdohIy5hF3c8Tv3Ix0CR8uwHQisND5374uq7TYZvKQ9ft+tU87fUNvzwy3S/gPI6mq7bHLwck\n8ecR9/l+kvfbnrn8WTe9Pfj5bypP/vus8fuJFUCevJpeReg7i9Sb+JKu4xOR5sD3xpivRORo4GVg\nFtAX2GmMuUVEpgBNjTHXx3m9ruOzRdfx5QZdx2dR/HV8qfT4TgDmiUgRoUXjhcaYF0RkFfC4iFwJ\nbAaGBZpXKaUyJGnhM8ZsAEri3L8TODsToWolK/bh9XG69XtQ+/bGe36lefsUzlb+PtrW/KN7biil\nCo4WvkB5rgNY4rkOYJHnOoBFnusA1mjhU0oVHC18gfJdB7DEdx3AIt91AIt81wGsyb0DkSaSaMV+\nVp+O0pIgBj1in+uXH96uTDerUDlIe3yB8lwHsCO6sXch8FwHsMhzHcAaLXxKqYKjhS9QvusAdlTb\nbSqf+a4DWOS7DmCNFj6lVMHRwhcoz3UAO3QdX57yXAewJn9GdVX60hntTWf3tprmrVQW0B5foHzX\nAeyIHDqpIPiuA1jkuw5gjRY+pVTB0cIXKM91ADsiB8ksCJ7rABZ5rgNYo4VPKVVwtPAFyncdwA5d\nx5enfNcBrMmfUV2rBxzNc/FGZCuN3prDB37V0V6Vg7THFyjPdQA7dDu+POW5DmCNFj6lVMHRwhco\n33UAO3Rf3Tzluw5gjRY+pVTBSVr4RKSNiLwqIv8nIhtEZGL4/mki8qmIrAtfBmU+brbzXAewQ9fx\n5SnPdQBrUhnVPQhMMsa8KyKNgLUisjT82BxjzJzMxYsj3VFEPSFzMNI9wnWi7ynR8/V7UhYl7fEZ\nY7YbY94N394LfACcGH5YtyGpxHcdwBLfdQCLfNcBLPJdB7AmrXV8IuIB3YC3w3f9RkTeFZH7RaRx\nwNmUUiojUt6AObyYuwi42hizV0T+AvzJGGNE5GZgDjA2/qsXA03CtxsArTi8PsEPX6c4bTaFA7Wr\nPB0R2asgsj9pZATSqzIdleb71zjtBTy/bJ4Oq/p513ba6xuefWS6X8B5azsduS/A+W8qT/3z0d9v\nmtOrgO0crjfxiUm0biX2SSJHAM8BLxpj7orz+MnAEmNM1ziPGUiw/q02dB1fdgnqLHaF9D2l85kV\n0ueSEdMxxlQrGqku6j4AvB9b9ESkVczjPwP+t24B84HvOoAlvusAFvmuA1jkuw5gTdJFXRHpDfwC\n2CAiFYABfgeMFJFuwCFCn9j4wNMlO79rLP3PqJRKUdLCZ4x5E6gX56GXgo+T6zzXASzxXAewyHMd\nwCLPdQBrdM8NpVTB0cIXKN91AEt81wEs8l0HsMh3HcAaLXxKqYKjhS9QnusAlniuA1jkuQ5gkec6\ngDXZcQTmdLbN09Hb7JLCdqCV6JGyVRbQHl+gfNcBLPFdB7DIdx3AIt91AGu08CmlCo4WvkB5rgNY\n4rkOYJHnOoBFnusA1mjhU0oVHC18gfJdB7DEdx3AIt91AIt81wGsyY5R3T/8If798UYMdfQ2u6T7\nfQR1NBel6kB7fIHyXAewxHMdw
CLPdQCLPNcBrNHCp5QqOFr4AuW7DmCJ7zqARb7rABb5rgNYo4VP\nKVVwtPAFynMdwBLPdQCLPNcBLPJcB7AmO0Z10zlna6IjMOtor1IqRdrjC5TvOoAlvusAFvmuA1jk\nuw5gjRY+pVTB0cIXKM91AEs81wEs8lwHsMhzHcCapIVPRNqIyKsi8n8iskFE/iN8f1MReUVENorI\nyyLSOPNxlVKq7lIZ3DgITDLGvCsijYC1IvIKcAWwzBhzq4hMAW4Ars9g1pB0TyhuddDDpzD+a/oU\nRjuhTm1N96CrznfR9CmU7zVpj88Ys90Y82749l7gA6ANcBEwL/y0ecDFmQqplFJBSmsdn4h4QDdg\nFdDSGLMDQsURaBF0uNzjuQ5giec6gEWe6wAWea4DWJPydnzhxdxFwNXGmL0iUrVfXsPJFxYDTcK3\nGwCtOPwh++HrBNObykPX7frVPO31Db88Mt0vtfnrtN3pQvv+zKbQtbQ7PL2pPPHvOd/ab316FbCd\nw/UmPjEpnCxGRI4AngNeNMbcFb7vA6CfMWaHiLQCXjPGdIrzWgMJ1r8lk84hjLLiJEQ+hfFf06fW\n7Uz0nWbF9xePT+Dr+BKtj9Z1fBkwHWNMtS8i1UXdB4D3I0Uv7FlgTPj2aOCZOuVTSilLki7qikhv\n4BfABhGpILRI+zvgFuBxEbkS2AwMy2TQpLJitNfLwDyzkec6gEVe7V+a6LeXtTzXAaxJWviMMW8C\n9RI8fHawcZRSKvN0z41A+a4DWOK7DmCR7zqARb7rANZo4VNKFRwtfIHyXAewxHMdwCLPdQCLPNcB\nrNHCp5QqONld+IypfslqvusAlviuA1jkuw5gke86gDXZXfiUUioDtPAFynMdwBLPdQCLPNcBLPJc\nB7BGC59SquBo4QuU7zqAJb7rABb5rgNY5LsOYI0WPqVUwcmO00smEm9/2nSO2GKd5zqAJZ7rABZ5\ntX9pEFsh6L7mGaE9PqVUwdHCFyjfdQBLfNcBLPJdB7DIdx3AGi18SqmCo4UvUJ7rAJZ4rgNY5LkO\nYJHnOoA1WviUUgUnu0d1c45PYfzX9EnaznTPKZu1fGr9nQYx8mr1KM4+hfH71R6fUqoAaeELlOc6\ngCWe6wAWea4DWOS5DmCNFj6lVMFJWvhEZK6I7BCR9TH3TRORT0VkXfgyKLMxc4XvOoAlvusAFvmu\nA1jkuw5gTSo9vgeBgXHun2OMKQlfXgo4l1JKZUwqp5dcKSInx3koX4btAuS5DpA5Wb2PdCZ59t7K\n+Ui45/j97anLOr7fiMi7InK/iDQOLJFSSmVYbQvfX4B/McZ0A7YDc4KLlMt81wHs2FTuOoFFvusA\nFvmuA1hTqw2YjTFfxEzeByyp+RWLgSbh2w2AVhzuVvvh6xSnI3907fqlNu2Hp70q01Fpvn/BTodt\nKoft76b++ac67fUNv11kul/A+Ws7vd3u+5lNoWtpF364PPxwv8rTURnOk3PTqwh9Z5F6E5+YFI4Z\nJiIesMQY0yU83coYsz18+xrgDGPMyASvNRDg1ufprmtK1L6MHM8sj2V6HZ9+T/HX8f3hD/GfW0if\nS51MxxhT7YNN2uMTkUeBfsDxIrKFUBUrE5FuwCFCJXd8oFmVUiqDUhnVjdeTezADWapzPsqVLp+8\nHRmL7ZH55TGLomnS7zTxZxBvv1yr55L2ydvfbxW654ZSquBo4QuU5zqAHbXt7eUkz3UAizzXAazR\nwqeUKjha+ALluw5gR7VNKvKZ7zqARb7rANZo4VNKFZzsPgJzOkefzYrtwDyL72VZtc/x9drNJ+f2\n+fVq/9JCamuO0R6fUqrgaOELlO86gCW+6wAW+a4DWOS7DmCNFj6lVMHRwhcoz3UASzzXASzyXAew\nyHMdwJrsHtxQKhcEtRtevAE6PRhBRmiPL1C+6wCW+K4DWOS7DmCR7zqANVr4lFIFRwtfoDzXASzx\nXAewyHMdwCLPdQBrtPAppQqOFr5A+a4DWOK7DmCR7zqARb7rANbk3qhuVuyaplSMdHatVFlBe3yB\n8lwHsMRzHcAiz3UAizzXAazRwqeUKjha+ALluw5gie86gEW+6wAW+a4DWKOFTylVcJIWPhGZKyI7\nRGR9zH1NReQVEdkoIi+LSOPMxswVnusAlniuA1jkuQ5gUTtCJcHmxY1URnUfBP4f8FDMfdcDy4wx\nt4rIFOCG8H21k3OnHFTKpZZAI+BooEGV6yOBenW4uHAQ+KEWl4PAd8B+4NvwJXL7C2BfwndM5by6\nK0Xk5Cp3XwT0Dd+eB5RTl8KXN3wKo4fgUxjthOxqa33gKqBJxt6hXj17/ZBDh0KXUBkKesu6Q8Ab\nQPzN3Gr7bi2MMTsAjDHbRaRFLeejlEpZRyJFb8AAaNYMmjYNXSK3GzWCo46q3eWII+wvfP3wA3z/\nPRw4kP5l/3746ivYtSt02bkzdL15M6xfXwSUJnzfoMpsktO9L+bwf6kGQCsO/xf1Q6+WduE5bQpd\nR6Y3lYeu2/ULPz087VWZjvLD156Dac/x+9ucJsnjCaarfp+RaS+8AFH1+3Xe3sh9NTx/U3n19tR2\nusb2h/6Gpk6Fm2+GevXqUVxcjDEGEWH48OGMG3cdidx77700bNiQUaNGJXxOKtq1a8fatWtp1qxZ\n2q89cOAAl19+OWvXrqV58+YsXLiQtm3b0qBBnSJVc/TRsH//MUDDuI/XtvDtEJGWxpgdItIK+Lzm\np19cw2Ne5X8zkYIXEflBRJ+eaDpy8huv+vx1Onumq36fkenIHjnVTlae4TxBTMe2KVH7Up2usf2h\nP+If/Sg81bAh69atI1Xjx49P+bk1kTp0C+fOnUuzZs346KOPWLhwIddddx0LFiwIJFesH/0Itm4F\n+FHcx1MdVpHwJeJZYEz49mjgmdrFyze+6wCW+K4DWOS7DhDjGOBw4TMJdt9s164dU6ZMoWvXrvTo\n0YNPPvkEgOnTpzNnzhwA7r77bjp37ky3bt0YOXIkALt27WLo0KEUFxfTq1cvNmzYAMDOnTsZOHAg\nXbp0Ydy4cZXe95FHHuGnP/0pJSUlXHXVVQkzRTzzzDOMHj0agJ///OcsX768lp9FzX4UrXfxC1/S\nHp+IPAr0A44XkS3ANGAW8ISIXAlsBobVKaXu66hygfOtDyr3+L799ltKSkqii7o33HADl156KQBN\nmzZl/fr1zJ8/n6uvvpolS5ZUmtMtt9yC7/sceeSR7NmzB4Bp06ZRUlLC008/zWuvvcbll19ORUUF\n06dPp0+fPvz+97/nhRde4IEHHgDgww8/ZOHChfzP//wP9erVY8KECTzyyCOMGjWKcePGcdVVV1FS\nUlLpfbdt28ZJJ50EhBbVmzRpws6dO2u12FyTOhc+Y8zIBA+dXbtI+cxzHcASz3UAizzXAWJU7vEd\nc8wxCRd1hw8fDsCIESOYNGlStceLi4sZOXIkF198MRdfHFoVtXLlSp566ikAysrK2LlzJ1
9//TVv\nvPEGTz/9NACDBw+madOmACxfvpx169ZxxhlnYIxh//79tGzZEoD77rsvpRYl6yHWVp0Ln1IqW9QH\n4Ljjkj8zdj1cvHVyzz//PG+88QbPPvssM2bMiC7W1jSfiEixMsYwevRoZsyYkUp4ANq0acPWrVtp\n3bo1P/zwA3v27Am8twexn9GxcR/XXdYC5bsOYInvOoBFvusAMUIbGEdGQGvqLS1cuBCABQsW0LNn\nz2qPb9myhb59+zJr1iz27NnDN998w1lnncXDDz8MQHl5Oc2bN6dRo0acddZZPPLIIwC8+OKL7N69\nG4ABAwawaNEivvjiCyC0jnDLli01tmDIkCHMmzcPgCeeeIL+/fun2vi0HB4lrh/3ce3xKZUzQn+u\n9cN/y/v376+0jm/QoEHMnDkTCBWh4uJiGjRowGOPPVZpLgcPHmTUqFHs2bMHYwxXX301xx13HNOm\nTePKK6+kuLiYhg0bRgvUtGnTGDFiBAsWLKBXr160bdsWgE6dOnHzzTdz7rnncujQIY466ijuuece\n2rZtm3Ad39ixY7nsssto3749xx9/fEZGdGM/o0SFTzK1jB19AxETGg+pwR//mPoM9UCkuS3Rd50L\n32uiwY2gBueSnl5yKnAke/dCw/ibpwF1284uX/zpT5Gv5SaMubHaF6c9PqWqclHgIIUiH1rUrR+/\nExNVl+3s8kWyHp+u4wuU7zqAJb7rAPZE9iRyLnQ0k3r1QruW1eSTTz4p6N4exK7ji79LiBY+pXJC\n5fV7qmba47PKcx3AEs91AHuq7kLpTKjwxduntUePHpSUlHDyySfTokULunfvTklJCZs3b+aCCy6I\nbqCcKevWraNr166ccsop/Pa3v83oe6UqWY9P1/EplRMSr99btWoVAPPmzWPt2rXcfffd0ceee+65\njCe76qqrmDt3LmeccQaDBw/m5ZdfZuDAgRl/35poj88q33UAS3zXAezJmnV8iXt8NWnXrh07d+5k\n8+bNdOrUiSuuuIIOHTowatQoli9fzplnnkmHDh145513ANi3bx9jx46lR48e/OQnP6m2q1tV27dv\n5+uvv+aMM84A4PLLL2fx4sXpNy9guh2fciMXRhZTGb2NPeRUIuluElarTXRqt44vdoT3448/5skn\nn+TUU0/l9NNP57HHHmPlypU8++yzzJw5k6eeeooZM2YwYMAA5s6dy1dffUVpaSlnn302u3fvZty4\ncdV6kNu2baNNmzbR6TZt2rBt27ZatC9Yhz8nXdS1wHMdwBLPdQB7khU9ayrvtZGq2O1027Vrx6mn\nngpA586dGTBgAABdunTB930AXnnlFZYsWcLs2bOB0PHztmzZQocOHawsNgdFe3xK5YW6j+rWj3lx\nUVFRdLqoqIiDBw8CoUL55JNP0r59+5TmeeKJJ7I1dOA7AD799FNOPPHE2ocMSLIen67jC5TvOoAl\nvusA9kSOkOxc6E812TZ8NUllL62BAwdWGhx59913a3x+q1ataNy4MatXr8YYw0MPPcRFF11U+5AB\nOfLI6K24j2vhUyqHpLvqNNFRWhLt3XHjjTfy/fff07VrV7p06cIf/vAHAD777DMuuOCCuK+55557\nGDt2LKeccgrt27dn0KBB6YXMgGSfk+6rqzIj3d2+XHyvQe2aZmVwoy1wJWeeCStW1OLlBWbNGigt\nBViDMWfovrrKkmw6qnamC5z+0805uqgbKN91AEt81wHsyZp1fCpIWviUygOe51FcXExxcTFlZWWV\nRlptO/bY+Ec9juX7Pj169OCUU05hxIgR0VFlW7TwBcpzHcASz3UAe7JmO76aFRUVUV5eznvvvUff\nvn256aabrLzvDz/8UO2+VA6LNWXKFK699lr+/ve/06RJE+bOnZuJeAnVqfCJiC8i74lIhYisDiqU\nUio9xpjo5io9e/bkH//4R/SxqqeAPHToEIsWLeLaa68F4K677uJf//VfAdi0aRNnnnkmADfddBM/\n/elP6dq1K7/+9a+j8ysrK+Oaa66htLSUu+++G9/36dWrF8XFxdx4440p5X311Ve55JJLABg9enT0\nZEa21LXHdwjoZ4zpbowpDSJQbvNdB7DEdx3Anhxcx/fSSy9Fz5wWewrIdevWUVRUxKOPPkqfPn1Y\nuXIlEDq7WvPmzfnss89YsWIFffv2BWDixIm8/fbbrF+/nn379vH8889H3+P7779n9erVXHPNNVx9\n9dVMmDCB9957jxNOOKFSlqqHngf48ssvadq0KUVFofLTpk2bSoXahrqO6gq2F5fTHaHTETdVp9Fb\nk/7mKo6UlZXx5Zdfcuyxx3LzzTcD8U8B2apVK1q2bMnevXvZu3cvW7duZeTIkbz++uusWLEi2hNb\nvnw5s2fPZt++fezatYvTTjuN888/H4B/+7d/i77vm2++GT0t5WWXXcb1118ffSzR6S9dq2vRMsBS\nEVkjIuOCCJTbPNcBLPFcB7DH6+c6QcrKy8vZsmUL3bp1i254HDkF5Lp166ioqOCDDz6ILo726tWL\nBx98kI4dO9KnTx9WrFjBqlWr6N27N9999x0TJkzgqaeeYv369fzyl79k//790fdqGHPSDxGJrtdL\nZbvg448/nt27d3Po0CHAzW5udS18vY0xJcBgYIKInBn/aYuB8vBlFZUXlfzKixObyus27ZeHLrHz\nr/p+Om1vOhu+v3R/T7HvXzVPvGmrn29ixhiKioq44447mD9/Prt3767xFJBnnnkmt912G3379qVb\nt2689tpr1K9fn2OPPZb9+/cjIhx//PHs3buXRYsWJXzf3r17R8/kFjkNZTJlZWU88cQTQOg4grZ3\nc6vToq4x5rPw9Rci8jRQCqys/syLa5iLV3nkrOooWqrTkf801f5Dexanfcvv52raT/352fD9pfv7\nqtpric1UNZ/XD3i9bvnSmo4vdiS1VatWjBgxgnvuuYepU6cmPAVknz59+PTTTznrrLMoKiqibdu2\ndOrUCYDGjRszbtw4OnfuzAknnEBpaWnc9wK48847GTlyJLfeemu1AlZSUhJ3cXfWrFkMHz6cG2+8\nke7duzN27NiU2hmUWu+yJiLHAEXGmL0i0hB4BZhujHmlyvOC3WUtkazYqt6nMBYDfZK2M93vNJPf\nX12y+OXJF3et/MZ0l7V0ZHKXtZbA06HCxhHAI1WLXuHxXAewxHMdILF09uJPudi+HvdpKnfVuvAZ\nYzYB3QLMopRSVuieG4HyXQewxHcdwCLfdQCVAVr4lFIFRwtfoDzXASzxXAewyHMdIKm9e/fy4x//\nmI8//hiAgwcP0rVrV9asWRP3+f3792fp0qWV7rvrrruYMGFCje+TysEH4pk2bRqvvvpq9H1itwdM\n165duzj33HPp0KEDAwcO5KuvvqrVfLTwKZXjGjVqxKxZs6KFa/bs2fTu3Tt6yseqRo4cGd3uLmLB\nggWMHDmyxvdJ5eAD8UyfPp3+/fsDoU1f9u3bV
6v5QGgzmLPPPpuNGzfSv39//uu//qtW88mOIzAn\n2qUoiFMUWt3MxScXegh15xP45iyZVKffgE92fKfJN2c577zz6N+/P3/5y1+oqKigSZMmcZ+3a9cu\nOnXqxKeffsoRRxzB5s2b6du3b/RMa7fddhuPP/44Bw4cYOjQoUwL/30ed9xx7NmzB4DJkyfz0ksv\nUVRUxNSpUxk2bBgAt9xyC4888gj16tXjvPPOY+bMmVxxxRUMGTKEbdu28Z//+Z907NiR5s2bM2rU\nKNavX88dd9wBwP33388HH3zA7bffnvBT6NixI6+//jotW7Zk+/bt9OvXjw8//LDa85JtzqI9PqXy\nxJ133smUKVO48cYbExY9gKZNm1JaWsqLL74IhHp7kcK1dOlSPvroI1avXk1FRQXvvPNO9GAGEU8+\n+STr169nw4YNLF26lMmTJ7Njxw5eeukllixZwpo1a6ioqOC6666r9LqJEyfSunVrysvLWb58OcOG\nDeO5556LHtrqwQcfjG7IfP7557N9+/Zq2T///HNatmwJhDbU/vzzz2v1WWnhC5TnOoAlnusAFnmu\nA1QS3r01rhdffJHWrVuzYcOGpPMZPnw4CxYsAEKFb8SIEUDovLpLly6lpKSEkpISNm7cyEcffVTp\ntW+++WaPZKa5AAALRElEQVT0+S1atKBfv36sXr2aZcuWccUVV0RPW5mo+EaWMhs2bEj//v157rnn\n2LhxIwcPHoye9/f555+nVatWSduRaPH78GEC439ges4NpXJC6AjF330X/9F//OMf/PnPf2b16tWU\nlZUxduxYTjvttIRzu+iii5g0aRIVFRV8++23dO/eHQgVpRtuuIFx41I/5ogxptbr/8aOHcvMmTPp\n2LEjV1xxRdLnt2zZkh07dkQXdVu0aBH3eYc/p/gDKdrjC5TvOoAlvusAFvmuA4SFCl+iAdFJkyYx\ndepUWrduze23386///u/1zi3hg0b0q9fP6688spo7w1C59V94IEH+Oabb4BQQf3nP/8JHO6p9enT\nh4ULF3Lo0CG++OILVqxYQWlpKeeccw4PPvgg3377LRBal1hV7HpCgNLSUrZu3cpjjz1WKUciF154\nIX/729+Amg9ucPhziv+fQgufUjkhtOwWr8e3bNkytm7dGu0xXXDBBTRr1oz58+fXOMcRI0awfv36\nSgXnnHPOYeTIkfTs2ZOuXbty6aWX8vXXXwOHFyuHDh1K165dKS4u5uyzz2b27Nm0aNGCgQMHcuGF\nF3L66adTUlISHaSI7Q2OGzeOQYMGMWDAgOh9w4YNo3fv3jRu3Dh6X6J1fFOmTGHp0qV06NCB5cuX\nVzr2X6zDn1P8wpcdo7qJZHJkMCsOapDHXI3qxvte8+I7bQxcQ5s24PA8QhkxZMgQJk2aRFlZWWDz\nfPxxCB0r9QmMuVRHdZXKTTWv48tFX331FR06dKBhw4aBFj1I3uPTwY1A+WTbKGBm+BRGOyF72lrz\nOr54du7cyYABAyodHVlEWL58OU2bNs1EyLQ0btyYjRs3ZmTehz+n+B+YFj6lckLidXyJNGvWjIqK\nigzlyW7Jeny6qBsoz3UASzzXASzyXAcIC/X4Dhyovi3fqFGj+Otf/xqdfvvtt+nWrVvcc95eeeWV\n3HfffZXue+aZZxg8eHCN796uXTt27tyZdup7772Xhx9+GAiNwsYbsEjVgQMHGD58OO3bt6dnz57R\nQ+jHk6zHp4VPqZxxuPjFmjNnDrNnz+bLL7/EGMPEiRP57//+b+rVq1dtDiNGjLC6n+748eMZNWoU\nAH/729/Ytm1breYDMHfuXJo1a8ZHH33Eb3/722p7hsTK8R6fqX4xCS7xnlurS134dXx9rvCTPyXR\n95Tpy/Tp1S+Zbqs1oR5c1fV8LVq0YPLkyUyePJm//vWvFBcX07Nnz7hzGDBgABs3bmTHjh0A7Nu3\nj2XLlkXPw1v15OORrT5it/6YM2cOXbp0oWvXrtx1113R+x966CGKi4vp3r07o0ePBkIHKLj99tt5\n8skneeeddxg1ahQlJSW88MILDB06NPraZcuW8bOf/azG1j/zzDPR+f785z9n+fLlCZ+bt9vxNTsa\npvWB+rqWUhWMxCO748eP5/333+e2227j1ltvTTiHoqIiLrnkEh5//HEAlixZQllZGY0aNYp78vGq\nZ01bt24d8+bNY82aNbz11lvcd999vPfee7z//vvMnDmT8vJyKioqKhVEEeGSSy7h9NNP59FHH2Xd\nunUMHjyYjRs38uWXXwKh/XR/+ctfAqFt/eKdoGjbtm2cdNJJANSrV48mTZokXPxOtudGzpaNOWfD\n6J8I3x403PqW6zQRnusAlniuA1jkuQ4QI/HIrogwfvx41q5dm3TEdvjw4UyePJmJEyeyYMECLr/8\nciDxycdjrVy5kqFDh9KgQQMALrnkEt544w1EhEsvvTT63sn204XQyccffvhhxowZw6pVq6IbXFdd\nB5lITdsgJ+vx5Wzh+yh8/MGte93mUMqeUOFLdDi7oqIiioqSL8T16tWLzz77jPXr1/PWW2+xcOFC\n4PDJx2fMmFHtNYnW8UU2kRGRlE4mHmvMmDEMGTKE+vXrc+mllybN3qZNG7Zu3Urr1q354Ycf2LNn\nD82aNYv73PBec8C3cR+v06KuiAwSkQ9F5O8iMqUu80rXx+Ee7tbaHYA1Q3zXASzxXQewyHcdIEao\n4oWXDutk2LBhjB49mvPOO4+jjjoKoMaTj8fup7t48WL279/PN998w9NPP02fPn0oKytj0aJF0UXP\nePvpHnvssZX20z3hhBNo3bo1M2bMSOkABUOGDGHevHkAPPHEE9GDm8YT3r0YiP9h1brwiUgR8Gdg\nINAZGCEiHWs7v/xQ+6H63FIo7YTsamuo8IXrUp1E9tONHc3t1KlT9OTjxcXFnHvuudHNTyI9vu7d\nuzNmzBjOOOMMevbsya9+9SuKi4s59dRTmTp1Kn379qV79+5ce+211d5zzJgx/PrXv6akpITvwivh\nfvGLX3DSSSfRoUOH6PMSreMbO3Ys//znP2nfvj133nkns2bNSti+w59R/A+rLicU7wFMM8acF56+\nHjDGmFuqPC+FfXXjZ/j9oQPV7isqCjX2tOHDufSxx5jbpw9bVq7kkLkhrfyJmj2j6Ki05lNZOdCv\nDq/PFeUURjshu9p6IVDCvffCr37lOkswJk6cSElJSUo9vnR06AB//zvAqRjzfqAnFD8RiN1d+lOg\ntA7zU0rVKLgeXzY4/fTTadSoEXPmzAl83sl6fDk7uHHo++8rXWeH3a4DWFIo7YTsamvoGHlPPAGn\nngpNm0KzZqHrpk2hYcPKp6np0aMHB8JbO0cGIebPn0/nzp1dhK/mnXfeqfM8Dh6E3bth167Dl82b\nQ9ehoy/H39ylrou6fzTGDApP17Coq5Squy7A+oSPHnlkqPgddVTNl/r1Ez92xBHBnOMrFYcOhfZC\nSeXy
3Xfx79tb41YdS4FzMcZUa1FdCl89YCMwAPgMWA2MMMZ8UKsZKqWSEmEIcAHQCmgavjQLXx/t\nMJorhlC3fFf4sjN8vQF42Jj4w/J1OhCpiAwC7iI0OjzXGJN4mEUplVEiNCBU/I4KX+rH3E71cqTF\nyAY4EL58F3M7ncvXxiQ4o1ANMn4EZqWUyjYZ21fX5cbNmSYic0Vkh4isj7mvqYi8IiIbReRlEWlc\n0zxyhYi0EZFXReT/RGSDiPxH+P68a6+I1BeRt0WkItzWaeH7866tENoWV0TWiciz4em8bGc8GSl8\nBbBx84OE2hbremCZMaYD8CqQ3oaF2esgMMkY0xnoCUwIf5d5115jzHdAmTGmO9ANOE9ESsnDtoZd\nDbwfM52v7awmUz2+UuAjY8xmY8z3wAIg/nngcpAxZiWhFaixLgLmhW/PAy62GipDjDHbjTHvhm/v\nBT4A2pC/7Y3sCVuf0OZehjxsq4i0AQYD98fcnXftTCRThS/exs0nZui9skULY8wOCBULIP6ZjnOY\niHiEekKrgJb52N7w4l8FoX3Vlhpj1pCfbb0DmEzl3abysZ1x5ezx+HJAXo0aiUgjYBFwdbjnV7V9\nedFeY8yh8KJuG6BURDqTZ20VkfOBHeGefE1b7eV0O2uSqcK3DWgbM90mfF8+2yEiLQFEpBXwueM8\ngRGRIwgVvfnGmGfCd+dtewGMMXsI7ag7iPxra2/gQhH5BHgM6C8i84HtedbOhDJV+NYAPxaRk0Xk\nKGA48GyG3ssVofJ/y2eBMeHbo4Fnqr4ghz0AvG+MuSvmvrxrr4g0j4xkisjRwDmE1mnmVVuNMb8z\nxrQ1xvwLob/NV40xlwFLyKN21iRj2/Hl88bNIvIooUN2HA/sIHT4mcXAE8BJwGZgmDEmm3b0rBUR\n6Q28QWhL+MiJSX5HaE+dx8mj9opIF0Ir9YvCl4XGmBki0ow8a2uEiPQFrjXGXJjP7axKN2BWShUc\nHdxQShUcLXxKqYKjhU8pVXC08CmlCo4WPqVUwdHCp5QqOFr4lFIFRwufUqrg/H/H/C6zJy9sPAAA\nAABJRU5ErkJggg==\n",
428 | "text/plain": [
429 | ""
430 | ]
431 | },
432 | "metadata": {},
433 | "output_type": "display_data"
434 | }
435 | ],
436 | "source": [
437 | "# Make a movie!!\n",
438 | "\n",
439 | "from matplotlib.patches import FancyArrow\n",
440 | "# Get the number of frames and initialize the track image\n",
441 | "\n",
442 | "track_matrix = np.copy(track.track)\n",
443 | "track_matrix[-1, -1] = 4\n",
444 | "\n",
445 | "# Create the figure of the track\n",
446 | "fig = plt.figure()\n",
447 | "ax = fig.add_subplot(111, autoscale_on=False, xlim=(0, track_matrix.shape[1] - 1), ylim=(0 ,track_matrix.shape[0] - 1))\n",
448 | "ax.grid()\n",
449 | "im = ax.imshow(np.flipud(track_matrix), origin='upper', interpolation='none')\n",
450 | "\n",
451 | "# Make an info box\n",
452 | "bbox_props = dict(boxstyle=\"round4,pad=0.3\", fc=\"white\", ec=\"b\", lw=2)\n",
453 | "info_template = 'Episode: %d \\n Time: %d \\n Reward: %d \\n X_Velocity: %d \\n Y_Velocity: %d'\n",
454 | "\n",
455 | "# annotation\n",
456 | "annotation = ax.annotate(\n",
457 | " info_template %(0, 0, 0, 0, 0),\n",
458 | " xy=(track_matrix.shape[0] - 2.5, .5),\n",
459 | " bbox=bbox_props\n",
460 | ")\n",
461 | "annotation.set_animated(True)\n",
462 | "\n",
463 | "\n",
464 | "def frame_generator():\n",
465 | " for m_idx, m in enumerate(movie_array):\n",
466 | " visited_states, _, _, _, _ = m\n",
467 | " for vs_idx, vs in enumerate(visited_states):\n",
468 | " yield (m_idx, vs_idx)\n",
469 | "\n",
470 | "\n",
471 | "def updatetrack(idx):\n",
472 | " m_idx, vs_idx = idx\n",
473 | " m = movie_array[m_idx]\n",
474 | " track_matrix = np.copy(track.track)\n",
475 | " visited_states, actions_taken, rewards_given, had_crash, episode_count = m\n",
476 | " state = visited_states[vs_idx]\n",
477 | " x, y = track.convert_cartesian_to_indexes(state[0][0], state[0][1])\n",
478 | " vx, vy = state[1][0], state[1][1]\n",
479 | " \n",
480 | " # place the car\n",
481 | " if had_crash[vs_idx]:\n",
482 | " track_matrix[x, y] = 3\n",
483 | " else:\n",
484 | " track_matrix[x, y] = 4\n",
485 | "\n",
486 | " # get rewards\n",
487 | " r = sum(rewards_given[:(vs_idx + 1)])\n",
488 | " action = actions_taken[vs_idx]\n",
489 | " \n",
490 | " # update the figure\n",
491 | " im.set_array(np.flipud(track_matrix))\n",
492 | " annotation.set_text(info_template %(episode_count, vs_idx, r, vx, vy))\n",
493 | " ax.patches = []\n",
494 | " ax.add_patch(FancyArrow(\n",
495 | " x=state[0][0], y=state[0][1],\n",
496 | " dx=action[0], dy=action[1], \n",
497 | " head_length=.2, head_width=.25, width=.1, edgecolor='white'\n",
498 | " ))\n",
499 | " \n",
500 | " return im, annotation\n",
501 | "\n",
502 | "anim = animation.FuncAnimation(fig, updatetrack, frames=frame_generator, blit=False, save_count=9000)\n",
503 | "# display_animation(anim)"
504 | ]
505 | },
506 | {
507 | "cell_type": "code",
508 | "execution_count": 174,
509 | "metadata": {
510 | "collapsed": false
511 | },
512 | "outputs": [],
513 | "source": [
514 | "from matplotlib.animation import FFMpegWriter\n",
515 | "anim.save('racecar.mp4', fps=60, writer=FFMpegWriter())"
516 | ]
517 | },
518 | {
519 | "cell_type": "code",
520 | "execution_count": null,
521 | "metadata": {
522 | "collapsed": true
523 | },
524 | "outputs": [],
525 | "source": []
526 | },
527 | {
528 | "cell_type": "code",
529 | "execution_count": null,
530 | "metadata": {
531 | "collapsed": true
532 | },
533 | "outputs": [],
534 | "source": []
535 | }
536 | ],
537 | "metadata": {
538 | "kernelspec": {
539 | "display_name": "Python 2",
540 | "language": "python",
541 | "name": "python2"
542 | },
543 | "language_info": {
544 | "codemirror_mode": {
545 | "name": "ipython",
546 | "version": 2
547 | },
548 | "file_extension": ".py",
549 | "mimetype": "text/x-python",
550 | "name": "python",
551 | "nbconvert_exporter": "python",
552 | "pygments_lexer": "ipython2",
553 | "version": "2.7.11"
554 | }
555 | },
556 | "nbformat": 4,
557 | "nbformat_minor": 0
558 | }
559 |
--------------------------------------------------------------------------------
/notebooks/chapter5_racetrack/tracks.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 |
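# cell values (as used by the Track class): 0 = off-track / wall, 1 = drivable track,
# 2 = start line, 3 = finish line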
3 | fancytrack = np.array([
4 | [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
5 | [0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
6 | [0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,3],
7 | [0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,3],
8 | [0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,3],
9 | [0,0,0,1,1,1,1,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0],
10 | [0,0,0,0,1,1,1,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0],
11 | [0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0],
12 | [0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0],
13 | [0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0],
14 | [0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0],
15 | [0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0],
16 | [0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0],
17 | [0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0],
18 | [0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0],
19 | [0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0],
20 | [0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0],
21 | [0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0],
22 | [0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
23 | [0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
24 | [0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
25 | [0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0],
26 | [0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0],
27 | [0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0],
28 | [0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0],
29 | [0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0],
30 | [0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
31 | [0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
32 | [0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
33 | [0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
34 | [0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
35 | [0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
36 | [0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
37 | [0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
38 | [0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
39 | [0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
40 | [0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
41 | [0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
42 | [0,0,0,2,2,2,2,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
43 | ])
44 |
45 | fancytrack2 = np.array([
46 | [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
47 | [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
48 | [0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
49 | [0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
50 | [0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
51 | [0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
52 | [0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
53 | [0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
54 | [0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
55 | [0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
56 | [0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
57 | [0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
58 | [0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
59 | [0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
60 | [0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
61 | [0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
62 | [0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
63 | [0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0],
64 | [0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,3],
65 | [0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,3],
66 | [0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,3],
67 | [0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
68 | [0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
69 | [0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
70 | [0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
71 | [0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
72 | [0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
73 | [0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
74 | [0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
75 | [0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
76 | [0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
77 | [0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
78 | [0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
79 | [0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
80 | [0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
81 | [0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
82 | [0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
83 | [0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
84 | [0,0,0,2,2,2,2,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
85 | ])
--------------------------------------------------------------------------------
/notebooks/chapter7.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 3,
6 | "metadata": {
7 | "collapsed": false
8 | },
9 | "outputs": [],
10 | "source": [
11 | "import numpy as np\n",
12 | "\n",
13 | "from matplotlib import pyplot as plt\n",
14 | "%matplotlib inline"
15 | ]
16 | },
17 | {
18 | "cell_type": "markdown",
19 | "metadata": {},
20 | "source": [
21 | "##### Exercise 7.1\n",
22 | "\n",
23 | "Why do you think a larger random walk task (19 states instead of 5) was used in the examples of this chapter? Would a smaller walk have shifted the advantage to a different value of n? How about the change in left-side outcome from 0 to -1? Would that have made any difference in the best value of n?\n",
24 | "\n",
25 | "A small random walk would truncate large n-step to their total returns since episodes will be shorter (i.e. large n would just result in alpha MC methods). Therefore we should expect the advantage at lower n for smaller random walks. \n",
26 | "\n",
27 | "With values initialized at 0, if the left-most value terminated in 0 reward, we would need longer episodes for an agent to assign the correct values to the states left of center, since episodes that terminate to the left will not cause any updates initially, only the episodes that terminate to the right end with non-zero reward. Thus I would expect the best value of n to increase."
28 | ]
29 | },
30 | {
31 | "cell_type": "markdown",
32 | "metadata": {},
33 | "source": [
34 | "---------\n",
35 | "\n",
36 | "##### Exercise 7.2\n",
37 | "\n",
38 | "Why do you think on-line methods worked better than off-line methods on the example task?\n",
39 | "\n",
40 | "Off-line methods generally take random actions with some small probability $\\epsilon$. We would expect at least 1-2 random actions in an environment with a minimum of 10 states to termination, depending on $\\epsilon$ (assuming $\\epsilon$ is between 10-20%). Therefore, even after finding the optimal action-values, these random actions will attribute erroneous rewards to certain actions, leading to higher RMSEs compared to on-line methods; we also see that larger n is more optimal for off-line methods compared to on-line, presumably because larger n reduces noise from the $\\epsilon$ greedy actions."
41 | ]
42 | },
43 | {
44 | "cell_type": "markdown",
45 | "metadata": {},
46 | "source": [
47 | "-----------\n",
48 | "\n",
49 | "##### Exercise 7.3\n",
50 | "\n",
51 | "In the lower part of Figure 7.2, notice that the plot for n=3 is different from the others, dropping to low performance at a much lower value of $\\alpha$ than similar methods. In fact, the same was observed for n=5, n=7, and n=9. Can you explain why this might have been so? In fact, we are not sure ourselves.\n",
52 | "\n",
53 | "My hypothesis is that odd values of n have higher RMSE because of the environment. It takes at a minimum, an odd number of steps to reach termination from the starting state. For off-line methods, even after finding the optimal action-values, an agent may still not terminate in an odd number of steps. Therefore my hypothesis is that odd n-step methods are more likely to cause erroneous updates to the $\\epsilon$ greedy actions compared to even n-step methods. A quick way to test this, would be to create a random-walk where an agent will terminate at a minimum in an even number of steps, and then to observe the same plots as in Figure 7.2. "
54 | ]
55 | },
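{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is a minimal sketch (not part of the original solutions) of the proposed test: a random walk whose start state is an even number of steps from either terminal, so every episode takes an even number of steps. The class name `EvenRandomWalk` and its exact layout are illustrative assumptions; the Figure 7.2 experiment could then be rerun on it to check the odd/even hypothesis."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import random\n",
"\n",
"class EvenRandomWalk(object):\n",
"    \"\"\"Random walk with non-terminal states 1..7 and terminals at 0 and 8.\n",
"    Starting in the centre (state 4), every episode takes an even number of\n",
"    steps to terminate (minimum 4), unlike the odd-minimum walks above.\"\"\"\n",
"    def __init__(self):\n",
"        self.left_terminal = 0\n",
"        self.right_terminal = 8\n",
"        self.state = 4\n",
"\n",
"    def reset(self):\n",
"        self.state = 4\n",
"\n",
"    def step(self):\n",
"        # move left or right with equal probability; return (reward, done)\n",
"        self.state += random.choice([-1, 1])\n",
"        if self.state == self.left_terminal:\n",
"            return 0, True\n",
"        if self.state == self.right_terminal:\n",
"            return 1, True\n",
"        return 0, False"
]
},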
56 | {
57 | "cell_type": "markdown",
58 | "metadata": {},
59 | "source": [
60 | "----------\n",
61 | "\n",
62 | "#### Exercise 7.4 \n",
63 | "\n",
64 | "The parameter $\\lambda $ characterizes how fast the exponential weighting in Figure 7.4 falls off, and thus how far into the future the $\\lambda $-return algorithm looks in determining its backup. But a rate factor such as $\\lambda $ is sometimes an awkward way of characterizing the speed of the decay. For some purposes it is better to specify a time constant, or half-life. What is the equation relating $\\lambda $ and the half-life, $\\tau$, the time by which the weighting sequence will have fallen to half of its initial value?\n",
65 | "\n",
66 | "The half life occurs when weighting drops in half:\n",
67 | "\n",
68 | "$ \\lambda^{n} = 0.5 $,\n",
69 | "\n",
70 | "which occurs at,\n",
71 | "$n = -ln(2) / ln(\\lambda) = \\tau$\n"
72 | ]
73 | },
74 | {
75 | "cell_type": "markdown",
76 | "metadata": {},
77 | "source": [
78 | "-----\n",
79 | "Getting (7.3) from the equation above it:\n",
80 | "\n",
81 | "$R_t^\\lambda = (1 - \\lambda) \\sum_{n=1}^\\infty \\lambda^{n-1} R^{(n)}_t$,\n",
82 | "\n",
83 | "after $T-t-1$, we sum to infinity but with $R^{T-t-1}_t$, which is just the total return $R_t$, so:\n",
84 | "\n",
85 | "$R_t^\\lambda = (1 - \\lambda) \\sum_{n=1}^{T-t-1} \\lambda^{n-1} R^{(n)}_t + (1 - \\lambda) R_t \\sum_{n=T-t-1}^{\\infty} \\lambda^{n} $\n",
86 | "\n",
87 | "We can remove $\\lambda^{T-t-1}$ from the last sum to get $ (1 - \\lambda) R_t \\lambda^{T-t-1} \\sum_{n=0}^\\infty \\lambda^n = (1 - \\lambda) R_t \\lambda^{T-t-1} \\frac{1}{1 - \\lambda}$, so that: \n",
88 | "\n",
89 | "$R_t^\\lambda = (1 - \\lambda) \\sum_{n=1}^{T-t-1} \\lambda^{n} R^{(n)}_t + \\lambda^{T-t-1} R_t $"
90 | ]
91 | },
92 | {
93 | "cell_type": "markdown",
94 | "metadata": {},
95 | "source": [
96 | "----------\n",
97 | "\n",
98 | "##### Exercise 7.5\n",
99 | "\n",
100 | "In order to get TD($\\lambda$) to be equivalent to the $\\lambda$-return algorithm in the online case, the proposal is that $\\delta_t = r_{t+1} + \\gamma V_t(s_{t+1}) - V_{t-1}(s_t) $ and the n-step return is $R_t^{(n)} = r_{t+1} + \\dots + \\gamma^{n-1} r_{t+n} + \\gamma^n V_{t+n-1}(s_{t+n}) $. To show that this new TD method is equivalent to the $\\lambda$ return, it suffices to show that $\\Delta V_t(s_t)$ for the $\\lambda$ return is equivalent to the new TD with modified $\\delta_t$ and $R_t^{(n)}$.\n",
101 | "\n",
102 | "As such, we expand the $\\lambda$ return:\n",
103 | "\n",
104 | "$\n",
105 | "\\begin{equation}\n",
106 | "\\begin{split}\n",
107 | "\\frac{1}{\\alpha} \\Delta V_t(s_t) =& -V_{t-1}(s_t) + R_t^\\lambda\\\\\n",
108 | "=& -V_{t-1}(s_t) + (1 - \\lambda) \\lambda^0 [r_{t+1} + \\gamma V_t(s_{t+1})] + (1-\\lambda) \\lambda^1 [r_{t+1} + \\gamma r_{t+2} + \\gamma^2 V_{t+1}(s_{t+2})] + \\dots\\\\\n",
109 | "=& -V_{t-1}(s_t) + (\\gamma \\lambda)^0 [r_{t+1} + \\gamma V_t(s_{t+1}) - \\gamma \\lambda V_t(s_{t+1})] + (\\gamma \\lambda)^1 [r_{t+2} + \\gamma V_{t+1}(s_{t+2}) - \\gamma \\lambda V_{t+1}(s_{t+2})] + \\dots\\\\\n",
110 | "=& (\\gamma \\lambda)^0 [r_{t+1} + \\gamma V_t(s_{t+1}) - V_{t-1}(s_t)] + (\\gamma \\lambda) [r_{t+2} + \\gamma V_{t+1}(s_{t+2}) - V_t(s_t+1)] + \\dots\\\\\n",
111 | "=& \\sum_{k=t}^\\infty (\\gamma \\lambda)^{k-t} \\delta_k\n",
112 | "\\end{split}\n",
113 | "\\end{equation}\n",
114 | "$\n",
115 | "\n",
116 | "where $\\delta_k = r_k + \\gamma V_k(s_{k+1}) - V_{k-1}(s_k)$ as defined in the problem. Therefore, for online TD as defined above, the $\\lambda$ return is exactly equivalent.\n"
117 | ]
118 | },
119 | {
120 | "cell_type": "markdown",
121 | "metadata": {},
122 | "source": [
123 | "-------------\n",
124 | "\n",
125 | "##### Exercise 7.6\n",
126 | "\n",
127 | "In Example 7.5, suppose from state s the wrong action is taken twice before the right action is taken. If accumulating traces are used, then how big must the trace parameter $\\lambda $ be in order for the wrong action to end up with a larger eligibility trace than the right action?\n",
128 | " \n",
129 | "The eligibility trace update is $e_t(s) \\leftarrow 1 + \\gamma \\lambda e_{t-1}(s)$ if $s = s_t$ and $e_t(s) \\leftarrow \\gamma \\lambda e_{t-1}(s)$ if $s \\neq s_t$. For two wrong actions, then one right action, $e_t(wrong) = (1 + \\gamma \\lambda) \\gamma \\lambda $, and $e_t(right) = 1$. If we want $e_t(wrong) \\gt e_t(right)$, we need $(1 + \\gamma \\lambda) \\gamma \\lambda \\gt 1$, or $\\gamma \\lambda \\gt \\frac{1}{2} (\\sqrt(5) - 1)$."
130 | ]
131 | },
132 | {
133 | "cell_type": "markdown",
134 | "metadata": {},
135 | "source": [
136 | "-----------\n",
137 | "\n",
138 | "##### Exercise 7.7\n"
139 | ]
140 | },
141 | {
142 | "cell_type": "code",
143 | "execution_count": 4,
144 | "metadata": {
145 | "collapsed": true
146 | },
147 | "outputs": [],
148 | "source": [
149 | "class LoopyEnvironment(object):\n",
150 | " def __init__(self):\n",
151 | " self._terminal_state = 5\n",
152 | " self._state = 0\n",
153 | " self._num_actions = 2\n",
154 | " \n",
155 | " @property\n",
156 | " def state(self):\n",
157 | " return self._state\n",
158 | " \n",
159 | " @state.setter\n",
160 | " def state(self, state):\n",
161 | " assert isinstance(state, int)\n",
162 | " assert state >= 0 and state <= self._terminal_state\n",
163 | " self._state = state\n",
164 | " \n",
165 | " @property\n",
166 | " def terminal_state(self):\n",
167 | " return self._terminal_state\n",
168 | "\n",
169 | " def reinit_state(self):\n",
170 | " self._state = 0\n",
171 | " \n",
172 | " def get_states_list(self):\n",
173 | " return range(self._terminal_state + 1)\n",
174 | " \n",
175 | " def get_actions_list(self):\n",
176 | " return range(self._num_actions)\n",
177 | " \n",
178 | " def is_terminal_state(self):\n",
179 | " return self._state == self._terminal_state\n",
180 | " \n",
181 | " def take_action(self, action):\n",
182 | " \"\"\"\n",
183 | " action int: 0 or 1\n",
184 | " if action is 0 = wrong, then don't change the state\n",
185 | " if action is 1 = right, then go to the next state\n",
186 | "\n",
187 | " returns int: reward\n",
188 | " \"\"\"\n",
189 | " assert action in [0, 1]\n",
190 | " assert self.is_terminal_state() == False\n",
191 | " if action == 1:\n",
192 | " self._state += 1\n",
193 | " if self._state == self._terminal_state:\n",
194 | " return 1\n",
195 | " return 0"
196 | ]
197 | },
198 | {
199 | "cell_type": "code",
200 | "execution_count": 150,
201 | "metadata": {
202 | "collapsed": false
203 | },
204 | "outputs": [],
205 | "source": [
206 | "import random\n",
207 | "from itertools import product\n",
208 | "\n",
209 | "class SARSA_lambda(object):\n",
210 | " def __init__(self, environment):\n",
211 | " states = environment.get_states_list()\n",
212 | " actions = environment.get_actions_list()\n",
213 | " \n",
214 | " self.environment = environment\n",
215 | " self.state_actions = list(product(states, actions))\n",
216 | " self.Q = np.random.random([len(states), len(actions)])\n",
217 | " self.e = np.zeros([len(states), len(actions)])\n",
218 | " \n",
219 | " def _get_epsilon_greedy_action(self, epsilon, p):\n",
220 | " if random.random() <= epsilon:\n",
221 | " action = random.randint(0, len(p) - 1)\n",
222 | " return action\n",
223 | " actions = np.where(p == np.amax(p))[0]\n",
224 | " action = np.random.choice(actions)\n",
225 | " return action\n",
226 | " \n",
227 | " def learn(self, num_episodes=100, Lambda=.9, gamma=.9, epsilon=.05, alpha=0.05,\n",
228 | " replace_trace=False):\n",
229 | " \"\"\"\n",
230 | " Args:\n",
231 | " num_episodes (int): Number of episodes to train\n",
232 | " Lambda (float): TD(lambda) parameter \n",
233 | " (if lambda = 1 we have MC or if lambda = 0 we have 1-step TD)\n",
234 | " gamma (float): decay parameter for Bellman equation\n",
235 | " epsilon (float): epsilon greedy decisions\n",
236 | " alpha (float): determines how big should TD update be\n",
237 | " \n",
238 | " Returns:\n",
239 | " list (int): the number of time steps it takes for each episode to terminate\n",
240 | " \"\"\"\n",
241 | " \n",
242 | " time_steps = []\n",
243 | " for n in xrange(num_episodes):\n",
244 | " time_idx = 0\n",
245 | " self.e = self.e * 0\n",
246 | " self.environment.reinit_state()\n",
247 | " s = self.environment.state\n",
248 | " a = random.randint(0, self.Q.shape[1] - 1)\n",
249 | " while not self.environment.is_terminal_state():\n",
250 | " r = self.environment.take_action(a)\n",
251 | " time_idx += 1\n",
252 | "\n",
253 | " s_prime = self.environment.state\n",
254 | " a_prime = self._get_epsilon_greedy_action(epsilon, self.Q[s_prime, :])\n",
255 | " delta = r + gamma * self.Q[s_prime, a_prime] - self.Q[s, a]\n",
256 | "\n",
257 | " if replace_trace:\n",
258 | " self.e[s, a] = 1\n",
259 | " else:\n",
260 | " self.e[s, a] = self.e[s, a] + 1\n",
261 | " \n",
262 | " for s, a in self.state_actions:\n",
263 | " self.Q[s, a] = self.Q[s, a] + alpha * delta * self.e[s, a]\n",
264 | " self.e[s, a] = gamma * Lambda * self.e[s, a]\n",
265 | " \n",
266 | " s = s_prime\n",
267 | " a = a_prime\n",
268 | " \n",
269 | " time_steps.append(time_idx)\n",
270 | " return time_steps\n"
271 | ]
272 | },
273 | {
274 | "cell_type": "code",
275 | "execution_count": 151,
276 | "metadata": {
277 | "collapsed": false
278 | },
279 | "outputs": [],
280 | "source": [
281 | "env = LoopyEnvironment()\n",
282 | "s = SARSA_lambda(env)"
283 | ]
284 | },
285 | {
286 | "cell_type": "markdown",
287 | "metadata": {},
288 | "source": [
289 | "Run both the replace-trace and the SARSA($\\lambda$) regular trace methods for X episodes, and repeat N times. Get the average time length over all X episodes for each iteration for each alpha. In the environment in Figure 7.18, it takes at a minimum, 5 time steps to terminate. This is our baseline."
290 | ]
291 | },
292 | {
293 | "cell_type": "code",
294 | "execution_count": 154,
295 | "metadata": {
296 | "collapsed": false
297 | },
298 | "outputs": [],
299 | "source": [
300 | "\n",
301 | "def get_results(replace_trace, num_trials, num_episodes):\n",
302 | " alphas = np.linspace(.2, 1, num=10)\n",
303 | " results = np.array([])\n",
304 | " for alpha in alphas:\n",
305 | " res = []\n",
306 | " for i in xrange(num_trials):\n",
307 | " sarsa_lambda = SARSA_lambda(env)\n",
308 | " t = sarsa_lambda.learn(num_episodes=num_episodes, alpha=alpha, \n",
309 | " replace_trace=replace_trace, gamma=0.9,\n",
310 | " epsilon=0.05, Lambda=0.9)\n",
311 | " res.append(np.mean(t))\n",
312 | "\n",
313 | " if results.shape[0] == 0:\n",
314 | " results = np.array([alpha, np.mean(res)])\n",
315 | " else:\n",
316 | " results = np.vstack([results, [alpha, np.mean(res)]])\n",
317 | " return results\n",
318 | "\n",
319 | "num_trials = 100\n",
320 | "num_episodes = 20\n",
321 | "replace_trace = get_results(True, num_trials, num_episodes)\n",
322 | "regular_trace = get_results(False, num_trials, num_episodes)\n",
323 | " "
324 | ]
325 | },
326 | {
327 | "cell_type": "code",
328 | "execution_count": 155,
329 | "metadata": {
330 | "collapsed": false
331 | },
332 | "outputs": [
333 | {
334 | "data": {
335 | "text/plain": [
336 | ""
337 | ]
338 | },
339 | "execution_count": 155,
340 | "metadata": {},
341 | "output_type": "execute_result"
342 | },
343 | {
344 | "data": {
345 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAY4AAAEZCAYAAACAZ8KHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xd4FOX2wPHvSeglEJDeqzQBUUARJNIERASRi6hguSIW\nrNfKVQEbqFdBFEX82UApVgRBEZGIWBCUrvQuTUroSEjO74+ZhE1Iwm6ym9kN5/M8+2R3Z+ads7Ob\nPfuWeUdUFWOMMcZfUV4HYIwxJrJY4jDGGBMQSxzGGGMCYonDGGNMQCxxGGOMCYglDmOMMQGxxOEh\nEblORL72Oo5QEJGZItLP6zhCSURai8ifISg3WURqBrvcvEZE2orIVq/jyIiIPCYi47yOI1TO2sQh\nIptE5KiIHBSRQ+7f0bkZg6pOVNXOwSzT/cCmvJ6D7ms8KSKlMli3Srp1D7lfWvf7ua8hInIi3TF8\n0H1tXVV1QjZfQ5ZfnCLSVUR+EJH9IrJdRMaJSFGf5QVE5B0ROeAu9+v1BEpV56tq/VAUHYIy86pM\nj5WIPCUiy0QkUUSezGD5de73wCER+UxESvos8/szlFECU9Xhqnpbdl9UuDtrEwfOB+4KVY1R1eLu\n33uCuQMRiQ5mef5wP7AprycGeB6IV9V9Gay7Nd265wFJwCcB7HJyumP4vzNt4MdxOdMXZwzwNFAB\nqA9UBl70WT4MqAVUAdoBD4tIpzPFFUbE6wCyw4vP+xmsBR4Cvky/QEQaAmOB64FywDHgDZ9VAvkM\nCWdbslfVs/IGbATaZbLsdeATn8fPA7N9HncDFgP7gfnAeenKfRhYivNhjML5YvsU2A38DYx2170R\n+MFn25HALuCAu30D9/kCwP+AzcAON76Cfr7O9cANfq47BJgTwDEcAozPZNlc4Baf1zkfeBnYAzyF\n808ZDyS4x2WSu+73QDJwGDgI9PYjjp7AUp/HfwHtfR4PAyZmsf2Z3s9HgZXAXuBtoIC7rC2w1Wfd\nR4Btbtx/Apf5vH+j3Li2ue9zfp/tHgK2u8tuxkneNc/03gOlgelu3HuB77N4jaOALe5nayHQ2n2+\nAnAUKOmz7vnu5zTafXwL8Ie7j6+Aqj7rJgN3AmuA9Vnty11WCHgf2Oce04fSHcMKOD9cduN8du9O\nt+177rYrgAeBLX58PiYAT6Z77lngA5/HNYF/gKKBfIaAIu7xOwkcct/78jj/GxPcdaq5x+km97js\nBQYCF+L8n+8DXk1XblbHPMPvidy85erOwulG1omjMLAK6A+0cT/EFdxl57tv2oU4vzT6uWXl9yn3\nd6AiUBAncSzB+ecvhPNF0Mpd90Zgnnu/k/tPVtx9fC5QzueDMhUoARQFvgCe9eM1Xup+kIv4eUzW\nAf18HldxP9SVM1k/kMSRiPMFE+Ueh4nAY+7y1GPiPk4GagTwXo5K+acGSrrbl/FZfjU+iSXdtv68\nn8vc97MkTmJ5yl3WFveLC6jrfimkvGdVU14DTqL8CeeLvjTwIzDMXdYZJyHUdz93H5I2cWT63gPP\n4SSSKCAauCSLY3SdG38UcL+7z5QE+C3wb591XwBed+9fhZMU6rrbDgZ+TPdezXLjK+jHvka4n40Y\n95gu9TmGAiwC/uu+nuo4n8mOPtt+7+6rErCc7CeOqcBD6Z476H4eAv0MtU0fBz7/G5xKHK/jfNY7\n4Pyo/Mz9PFTE+Qy2OdMxJ4vvidy8ef4F7tUN5wvhIM4X4373r+8/T3OcbL8R+JfP86/j/tP7PLfK\n503fCNzos+wi90MRlUEMvonjMrecloCkW+8wPl+kwMXABj9e4/8B7/h5PNoQQJJxtxmC8yvN9xiW\nd5elTxyb0m37Pk5TQaUMyk3G/eL0I4aO7vtUy31cGeeLt4DPOh0yO15+vp8DfJZ1Ada6930TRy1g\nJ9AeyJeuvHXA5T6PO6XEg1ODec5nWR3f15/Ve4/zK/jzlNce4Od/H27NCvg3PjVNnAR4iXt/JnCz\nz7Io4AhQxee9ahvAvtYDHXyW/dvnGLbM4HPyKPC2z7YdfZYNIPuJ41vgtnTPbcP5sRXoZ8ifxJGE\n+7/hPrcHn9o0Ti3rnjMdc7L4nsjN29ncxwFwlaqWUtVY9+/bKQtUdSGwAedX0Mc+21QD/iMi+9zb\nfpwPWkWfdbb53K8CbFbV5KwCUdW5wGvAGGCXiIwVkWIiUganOvxbyj5xqq6lsypPRAoDvXGq9v7o\nD3yqqkf9XD/FlHTHcGcm66Uf/fIQzj/EryKyXERuDnC/iMhFOL/Qe6nqevfpw+7fGJ9VS+A0I2Qk\n0Pdzc7plALj7vw8YivP+TRSR8u7iijhfxhmVUZG0x2azz+s703v/Is6X6Tcisk5EHsnkNSIiD4rI\nH+6Agv04x+ccd/GnwEUiUk5E2gJJqvqjz/F5xWf/e3Ha8ytlcnzOtK+K6db3fe1VgUrp3ovHgLKZ\nbLuZ7DtM2s8InPqcBPoZ8tdun/vHcH5Q+j4u5t7P9Jhn9j2Rw7gCdrYnjkw7IUXkLpxq5XactusU\nW3GaCkr5fGEWU9UpPutouvWrisgZj7WqvqaqFwINcKqgD+H8MjkKNPTZZ0lVLXGG4q4G9qrqvDPt\nV0QKEViSyQ5N80B1t6repqqVgNuB1wMZgioi5+M0N9ykqvE+5SbgNI008Vm9CU57ekb8eT+r+Nyv\nhvOZOI2qTlbVNu464PSN4a5fzWdV3zJ2ZFB+yrHK8r1X1cOq+qCq1gK6Aw+IyGXp4xKR1jifpWvc\n1xeLU7sUt5wE4BvgWqAvMNln8y3AwAyOzy++L93ffbmvt7LPtlV97m/F+VXvu68Sqnqlz3FMf6yy\nayU+nxERqQXkB9Zk4zOkmTyfXVvJ4phn8j2Rq872xJEhEamLM2rnepxf4g+LSGN38VvA7SLSwl23\nqDs8tGjGpfErzodwhIgUEZGCItIqg31eKCItRCQfzq+P40CyOnXVt4BR7i9QRKSSH6OE+gPj/XzJ\nVwP7VPV7P9fPMRG5RkRSfrUm4DR5pNTKduJ0Vma2bSOcX953q+rMDFaZADwuIiVFpD5Ok8a7mRTn\nz/t5l3vMS+G0N09OX4iI1BWRy0SkAHAC5z1MeT2T3HjOEZFzgCfcGAE+Am4SkfoiUgRIHTZ6pvde\nRK5wv/DA+TV80mefvorj9DHtdYeZPuk+52sSzmemF07/U4o3gcEi0sDdZwkRuSaDffi7r4+Ax9z3\nphJwl8+yX4FDIvKwiBQSkWgRaSgiF7rLP/bZtjIwKIs4EJF87o+iKCC/+7+X8p33IXCliFzivtdP\n4dS4j7jLA/kM7QJKi0j6GkyacLKKNZ2xZHLMM/ueCKDs4PCqjczrG07b9RGcX0Mpt09xOuUW4NNx\nhvOLeCmnOkw74XzI9+GMvpjCqdEYG0jX6Y7zC+tznF+Qu4FR7vO+fRzt3H0cdNeZgNvfgFPzeRan\nWSIB55fPoCxeW0WcL6/T+glwh
hy+nu65r4GhGaxbxY0nO53j35G2j2NeuuXPc2oE0lrS9i/dhvPr\nch/OL9f0Zb+D8yV5EOcL8xCw3Gd5AZy+gwM4SfveM3wWsno/N+LUOFe6y98BCrnLfPs4znM/Nwfc\n93kap/p7CuJ04G93yx9J2vbzh904t+GMvEk/qirD9x6naWyj+/q3AIMzeX1RPsfjL5zRSGk+pzgD\nFg4CyzLY/nqcAQIJOM1D/+ezLDVWf/aF0/Q2HqdPbCVOIl7rs315nMS1A6eJ5iefbQvj9I3txxlV\n9R+y6OPA+aJPdmNMufX3WX6t+3oO4XRU+44sC/Qz9H/u+76PU6Oq0vdxRPmsvwW41OfxeN/3L7Nj\nThbfE7l5EzeYkBCRt3GGOu5S1cbuc7E4/5jVgE04Hc8HMth2E86blgwkqmqLkAVqTCZEZCNOUvvO\n61jyIhG5Heijqqc1sZnwFeqmqneBy9M99yjwraqei/Or9LFMtk0G4lT1fEsaxuQNIlJeRFqJ41yc\nWsNnXsdlAhPSxKGq83Gqlb6uwqlu4v7tkcnmgvXBGO+Frkp+diqA029yEGdI7OekPWPbRIB8Huyz\nrKruAlDVnSJSNpP1FJgtIknAOFV9K9ciNMalqjbZYBCp6hac/iATwbxIHOll9ovuElXd4Y4mmS0i\nf7o1GGOMMR7yInHsEpFyqrrLPUFqd0YrqeoO9+/fIvI50AJnuofTiIg1JxhjTIBUNVsTauZGH4KQ\ndgzzNJwhh+AM0/zitA2c8x2KufeL4gyXXJHVTnJ7OFqgtyFDhngeg8VpcVqcFmfKLSdCmjhEZCLO\nOOy6IrJFnGklRgAdRWQ1zrw+I9x1K4hIyvTH5YD5IrIY+AWYrqrfhDJWY4wx/glpU5WqXpfJog4Z\nrLsD55wPVHUj0DSEoRljjMkmG+6aS+Li4rwOwS8WZ3BZnMFlcYaHkJ45nltERPPC6zDGmNwiImg2\nO8fDYTiuMeYsVr16dTZvzskM6SYr1apVY9OmTUEt02ocxhhPub98vQ4jz8rs+OakxmF9HMYYYwJi\nicMYY0xALHEYY4wJiCUOY4zJJTVq1OC77yL/0i6WOIwxxgTEEocxxvgpKSnJ6xDCgiUOY4zJQo0a\nNXjhhRdo0qQJxYoVY+vWrfTq1YuyZctSq1YtXn311dR1hw0bRu/evbn22muJiYnhwgsvZNmyZRmW\nu3DhQlq1akVsbCyVKlXi7rvv5uTJk6nLV65cSadOnShdujQVKlRgxIgRgDOh64gRI6hduzZlypTh\n2muvJSEhIbQHIR1LHMYYcwaTJ0/mq6++Yt++ffTs2ZNmzZqxY8cO5syZwyuvvMLs2bNT1502bRp9\n+vRh//799O3blx49emRYU4mOjmbUqFHs27ePn3/+me+++47XX38dgMOHD9OxY0e6du3Kjh07WLdu\nHe3btwdg9OjRTJs2jR9++IHt27cTGxvLnXfemTsHIoXXU/sGaXpgNcZEJn/+fyE4t+yoXr26vvfe\ne6qq+ssvv2i1atXSLB8+fLjecsstqqo6dOhQvfjii1OXJScna4UKFXT+/PmpZc2ZMyfD/YwaNUqv\nvvpqVVWdNGmSNmvWLMP16tevr999913q4+3bt2v+/Pk1KSkpw/UzO77u89n6zrUpR4wxYc/rE8sr\nV64MwJYtW/jrr78oVaoU4PzwTk5O5tJLL01dt0qVKqn3RYTKlSuzffv208pcu3YtDzzwAIsWLeLY\nsWOcPHmSCy64AICtW7dSq1atDGPZvHkzPXv2JCoqKjWG/Pnzs2vXLipUqBCcF3wG1lRljDFnIOLM\nzFGlShVq1qzJvn372LdvH/v37+fAgQNMnz49dd2tW7em3ldVtm3bRqVKlU4r84477qB+/fqsX7+e\nhIQEnn322dSpQapUqcL69eszjKVq1aqpzWYpMRw5ciTXkgZY4jDGGL+1aNGC4sWL88ILL3D8+HGS\nkpJYuXIlixYtSl3nt99+Y+rUqSQlJTFy5EgKFSpEy5YtTyvr0KFDxMTEUKRIEVatWsUbb7yRuqxb\nt27s3LmT0aNHc+LECQ4fPsyvv/4KwMCBAxk8eDBbtmwB4O+//2batGkhfuVp5ZnE8c/Jf7wOwRiT\nB6XUNgCioqL48ssvWbJkCTVq1KBs2bIMGDCAgwcPpq5z1VVXMWXKFGJjY/nwww/57LPPiI6OPq2s\n//3vf3z44YfExMQwcOBArr322tRlxYoVY/bs2UybNo3y5ctTt25d4uPjAbj33nu56qqr6NSpEyVK\nlKBVq1apSSW35JnZcV9d8CqDWgzyOhRjTIDy0uy4w4YNY/369YwfP97rUFLZ7LhZGD5/OEcTj3od\nhjHG5HkhTRwi8raI7BKRZT7PxYrINyKyWkRmiUiJTLbtLCKrRGSNiDxypn21rNSSNxa+cabVjDHG\n5FBIm6pEpDVwGBivqo3d554H9qrqC25CiFXVR9NtFwWsAdoD24GFwLWquiqT/eiyncvoMKED6+9Z\nT7ECxUL2mowxwZWXmqrCUcQ1VanqfGB/uqevAt53778P9Mhg0xbAWlXdrKqJwGR3u0ydV+48Lqt+\nGa8ueDWr1YwxxuSQF30cZVV1F4Cq7gTKZrBOJWCrz+Nt7nNZGho3lJd/eZkDxw8EJVBjjDGnC4fO\n8aDVUeudU4+udboy6pdRwSrSGGNMOl5MObJLRMqp6i4RKQ/szmCdv4CqPo8ru89laujQoQCUOFaC\nlxNe5u6Wd1OqcKkghWyMMZEtPj4+9VyQnAr5eRwiUh2YrqrnuY+fB/ap6vNZdI5HA6txOsd3AL8C\nfVX1z0z2ob6vY8C0AZQpWobn2j8XgldkjAkm6xwPrYjrHBeRicBPQF0R2SIiNwMjgI4ikpIYRrjr\nVhCRLwFUNQkYBHwDrAQmZ5Y0MvL4pY/z5m9v8veRv4P7gowxJkS+//77NBMkhrM8c+Z4+tdx14y7\nKJy/MP/r9D+PojLG+MNqHI7vv/+efv36pc5BFSwRV+Pw0uA2g3ln8TvsOLTD61CMMXlEuF46Nrfj\nyrOJo1JMJW5qehPD5w/3OhRjTAQL5NKxx48f58Ybb6RUqVI0bNiQF198MU3zU1RUFBs2bEh9fPPN\nN/Pkk09muN/nn3+e2rVrExMTQ6NGjZg6dWrqsvfff5/WrVvzwAMPcM455zBs2LAQvPLM5ekLOT3a\n+lHqj6nPQ60eokqJyGg7NMaEn5RLx8bGxtKmTRt69uzJRx99xNatW+nQoQP16tWjY8eODB06lC1b\ntrBp0yYOHz5Mly5d0syI63v/TGrXrs2PP/5IuXLl+Pjjj7nhhhtYv3495cqVA2DBggVcd9117N69\nm8TExKC/5qzk6cRRtmhZBjQbwDPznuHNK9/0OhxjTDbJsGw1xZ9Gh2SvL+Xee++lYsWKLFiwgD17\n9vDf//4XgOrVq3PrrbcyefJkOnbsyMcff8ybb75JTEwMMTEx3HPPPWlqA4H05fTq1Sv1fu
<remainder of base64-encoded PNG plot output omitted>\n",
346 | "text/plain": [
347 | ""
348 | ]
349 | },
350 | "metadata": {},
351 | "output_type": "display_data"
352 | }
353 | ],
354 | "source": [
355 | "plt.plot(replace_trace[:, 0], replace_trace[:, 1], label='replace')\n",
356 | "plt.plot(regular_trace[:, 0], regular_trace[:, 1], label='regular')\n",
357 | "\n",
358 | "plt.legend()\n",
359 | "plt.title('Exercise 7.7: First %d episodes averaged %d times' %(num_episodes, num_trials))\n",
360 | "plt.xlabel('alpha')\n",
361 | "plt.ylabel('Time-steps')"
362 | ]
363 | },
364 | {
365 | "cell_type": "markdown",
366 | "metadata": {},
367 | "source": [
368 | "We see that on average, the replace trace method for $\\gamma = 0.9$, $\\lambda=0.9$, $\\epsilon=0.05$ takes less time to terminate. With lower $\\gamma$, the advantage of replace-trace seems to disappear."
369 | ]
370 | },
371 | {
372 | "cell_type": "markdown",
373 | "metadata": {},
374 | "source": [
375 | "-----------\n",
376 | "\n",
377 | "##### Exercise 7.8\n",
378 | "\n",
379 | "sarsa($\\lambda$) with replacing traces, has a backup which is equivalent to sarsa($\\lambda$) until the first repeated state-action pair. If we use the replace-trace formula in Figure 7.17, the replace-trace backup diagram terminates at the first repeated state-action pair. For the replace-trace formula in Figure 7.16, the backup diagram after the first repeated-state action pair is some hybrid of sarsa($\\lambda$) with weights changed only for the repeated state-actions. I'm not sure how to draw that."
380 | ]
381 | },
382 | {
383 | "cell_type": "markdown",
384 | "metadata": {},
385 | "source": [
386 | "-------\n",
387 | "\n",
388 | "##### Exercise 7.9\n",
389 | "\n",
390 | "Write pseudocode for an implementation of TD($\\lambda $) that updates only value estimates for states whose traces are greater than some small positive constant.\n",
391 | " \n",
392 | "\n",
393 | "You can use a hash-map of traces to update, and if the update reduces the value of the trace below some constant, remove the trace from the hash-map. Traces get added to the hash-map as they get visited. If you want to write the pseudo code or real code, feel free to make a pull-request!"
394 | ]
395 | },
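    | {
    |  "cell_type": "markdown",
    |  "metadata": {},
    |  "source": [
    |   "A rough Python sketch of that idea follows (my own, not pseudocode from the book). It keeps only the traces above a cutoff in a hash-map and updates only those states; the environment interface (`env.reset`, `env.step` returning `(next_state, reward, done)`), the `policy` callable and all parameter values are placeholder assumptions chosen for illustration.\n",
    |   "\n",
    |   "```python\n",
    |   "from collections import defaultdict\n",
    |   "\n",
    |   "def td_lambda_sparse(env, policy, num_episodes, alpha=0.1, gamma=0.9, lam=0.9, cutoff=1e-3):\n",
    |   "    V = defaultdict(float)  # state -> value estimate\n",
    |   "    for _ in range(num_episodes):\n",
    |   "        traces = {}  # hash-map: state -> eligibility trace, only entries above the cutoff\n",
    |   "        s = env.reset()\n",
    |   "        done = False\n",
    |   "        while not done:\n",
    |   "            s_next, r, done = env.step(policy(s))\n",
    |   "            delta = r + gamma * V[s_next] * (not done) - V[s]\n",
    |   "            # decay every stored trace and drop any that fall below the cutoff\n",
    |   "            traces = {x: gamma * lam * e for x, e in traces.items() if gamma * lam * e >= cutoff}\n",
    |   "            traces[s] = traces.get(s, 0.0) + 1.0  # the visited state (re)enters the map\n",
    |   "            for x, e in traces.items():\n",
    |   "                V[x] += alpha * delta * e  # only states with live traces get updated\n",
    |   "            s = s_next\n",
    |   "    return V\n",
    |   "```"
    |  ]
    | },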
396 | {
397 | "cell_type": "markdown",
398 | "metadata": {},
399 | "source": [
400 | "-------\n",
401 | "\n",
402 | "##### Exercise 7.10\n",
403 | "\n",
404 | "Prove that the forward and backward views of off-line TD($\\lambda $) remain equivalent under their new definitions with variable $\\lambda $ given in this section. Follow the example of the proof in Section 7.4.\n",
405 | "\n",
406 | "\n",
407 | "As given in the book, the backward view is:\n",
408 | "\n",
409 | "$\n",
410 | " e_t(s)=\\left\\{\n",
411 | " \\begin{array}{ll}\n",
412 | " \\gamma \\lambda_t e_{t-1}(s), & \\mbox{ if } s \\neq s_t\\\\\n",
413 | " \\gamma \\lambda_t e_{t-1}(s) + 1, & \\mbox{ if } s = s_t\n",
414 | " \\end{array}\n",
415 | " \\right.\n",
416 | "$\n",
417 | "\n",
418 | "and the forward view is:\n",
419 | "\n",
420 | "$R_t^\\lambda = \\sum_{k=t+1}^{T-1} R_t^{(k-t)} (1 - \\lambda_k) \\prod_{i=t+1}^{k-1} \\lambda_i + R_t \\prod_{i=t+1}^{T-1} \\lambda_i$.\n",
421 | "\n",
422 | "The proof is almost identical to 7.4. For the backward view we need to express the eligibility trace nonrecursively:\n",
423 | "\n",
424 | "$e_t(s) = \\gamma \\lambda_t e_{t-1}(s) + I_{ss_t} = \\gamma \\lambda_t [\\gamma \\lambda_{t-1} e_{t-2}(s) + I_{ss_{t-1}}] + I_{ss_t} = \\sum_{k=0}^t I_{ss_k}\\gamma^{t-k} \\prod_{i=k+1}^t \\lambda_i$\n",
425 | "\n",
426 | "so that the sum of all updates to a given state is:\n",
427 | "\n",
428 | "$\\sum_{t=0}^{T-1}\\alpha I_{ss_t} \\sum_{k=t}^{T-1} \\gamma^{k-t} \\prod_{i=t+1}^k \\lambda_i \\delta_k$\n",
429 | "\n",
430 | "which was obtained by following the same algebra as in 7.9 to 7.12.\n",
431 | "\n",
432 | "\n",
433 | "The next step is to show that the sum of all updates of the forward view is equivalent to the previous equation above. We start with:\n",
434 | "\n",
435 | "\n",
436 | "$\n",
437 | "\\begin{equation}\n",
438 | "\\begin{split}\n",
439 | "\\frac{1}{\\alpha} \\Delta V_t(s_t) =& -V_{t}(s_t) + R_t^\\lambda\\\\\n",
440 | "=& -V_t(s_t) + (1 - \\lambda_{t+1}) [r_{t+1} + \\gamma V_t(s_{t+1})] + (1 - \\lambda_{t+2})\\lambda_{t+1} [r_{t+1} + \\gamma r_{t+2} + \\gamma^2 V_t(s_{t+2})] + \\dots\\\\\n",
441 | "=& -V_{t}(s_t) + [r_{t+1} + \\gamma V_t(s_{t+1}) - \\lambda_{t+1} \\gamma V_t(s_{t+1})] + \\gamma \\lambda_{t+1} [r_{t+2} + \\gamma V_t(s_{t+2}) - \\gamma \\lambda_{t+2} V_t(s_{t+2})] + \\dots\\\\\n",
442 | "=& [r_{t+1} + \\gamma V_t(s_{t+1}) - V_t(s_t)] + (\\gamma \\lambda_{t+1})[r_{t+2} + \\gamma V_t(s_{t+2}) - V_t(s_{t+1})] + (\\gamma^2 \\lambda_{t+1}\\lambda_{t+2}) \\delta_{t+3} + \\dots\\\\\n",
443 | "\\approx& \\sum_{k=t}^{T-1} \\gamma^{k-t} \\delta_k \\prod_{i=t+1}^{k} \\lambda_i\n",
444 | "\\end{split}\n",
445 | "\\end{equation}\n",
446 | "$\n",
447 | "\n",
448 | "which is equivalent to the backward case, and becomes an equality for offline updates.\n"
449 | ]
450 | },
451 | {
452 | "cell_type": "markdown",
453 | "metadata": {},
454 | "source": [
455 | "------\n",
456 | "\n",
457 | "** \"Eligibility traces are the first line of defense against both long-delayed rewards and non-Markov tasks.\"**\n",
458 | "\n",
459 | "\"In the future it may be possible to vary the trade-off between TD and Monte Carlo methods more finely by using variable $\\lambda $, but at present it is not clear how this can be done reliably and usefully.\""
460 | ]
461 | }
462 | ],
463 | "metadata": {
464 | "kernelspec": {
465 | "display_name": "Python 2",
466 | "language": "python",
467 | "name": "python2"
468 | },
469 | "language_info": {
470 | "codemirror_mode": {
471 | "name": "ipython",
472 | "version": 2
473 | },
474 | "file_extension": ".py",
475 | "mimetype": "text/x-python",
476 | "name": "python",
477 | "nbconvert_exporter": "python",
478 | "pygments_lexer": "ipython2",
479 | "version": "2.7.11"
480 | }
481 | },
482 | "nbformat": 4,
483 | "nbformat_minor": 0
484 | }
485 |
--------------------------------------------------------------------------------
/notebooks/chapter8.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "**This is Chapter 8 in the first edition, Chapter 9-10 in the second edition. There are additional problems in the first edition that I thought were worth solving.**\n",
8 | "\n",
9 | "I starred solutions to problems I was unsure about."
10 | ]
11 | },
12 | {
13 | "cell_type": "markdown",
14 | "metadata": {},
15 | "source": [
16 | "### First Edition"
17 | ]
18 | },
19 | {
20 | "cell_type": "markdown",
21 | "metadata": {
22 | "collapsed": true
23 | },
24 | "source": [
25 | "##### 8.1 \n",
26 | "\n",
27 | "Show that table-lookup TD($\\lambda $) is a special case of general TD($\\lambda $) as given by equations ((8.5)-(8.7)).\n",
28 | "\n",
29 | "The function approximation (FA) case is a general expression of the tabular TD($\\lambda$), where the TD error $\\delta_t = r_{t+1} + \\gamma V_t(s_{t+1}) - V_t(s_t)$ are identical in both cases, but we can express the tabular value function as $V_t(s) = \\theta_{t}(s)$ where $\\theta_{t}(s) \\in \\mathbb{R}$, to re-derive the tabular updates from 8.5-8.7. The update to the eligibility trace reduces as follows, $\\vec{e}_{t} = \\gamma \\lambda \\vec{e}_{t-1} + \\nabla_{\\vec{\\theta}_t} V_t(s_t) = \\gamma \\lambda \\vec{e}_{t-1} + \\nabla_{\\vec{\\theta}_t} \\vec{\\theta_t}(s_t) = \\gamma \\lambda \\vec{e}_{t-1} + \\delta(s - s_t) $, which is identical to the tabular case."
30 | ]
31 | },
32 | {
33 | "cell_type": "markdown",
34 | "metadata": {},
35 | "source": [
36 | "##### 8.2 \n",
37 | "\n",
38 | "State aggregation is a simple form of generalizing function approximation in which states are grouped together, with one table entry (value estimate) used for each group. Whenever a state in a group is encountered, the group's entry is used to determine the state's value, and when the state is updated, the group's entry is updated. Show that this kind of state aggregation is a special case of a gradient method such as (8.4).\n",
39 | "\n",
40 | "The state aggregation simply replaces $V_t(s_t)$ in 8.4 with a function that maps a set of states grouped into $N$ groups, to a value for each group, $S_i \\rightarrow v_i \\in \\mathbb{R}, \\forall i \\in \\{0, 1, \\dots, N\\} $."
41 | ]
42 | },
43 | {
44 | "cell_type": "markdown",
45 | "metadata": {},
46 | "source": [
47 | "##### 8.3*\n",
48 | "\n",
49 | "\n",
50 | "The equations given in this section are for the on-line version of gradient-descent TD($\\lambda $). What are the equations for the off-line version? Give a complete description specifying the new approximate value function at the end of an episode, $V'$, in terms of the approximate value function used during the episode, $V$. Start by modifying a forward-view equation for TD($\\lambda $), such as (8.4).\n",
51 | "\n",
52 | "I'm assuming the off-line version is just using the forward view of eligibility traces instead of the backward view. The algorithm is thus given in section 9.5 of the second edition. Given an $n$ step return, we loop through an episode for each time step $t$, let $\\tau = t- n-1$. If $\\tau \\geq 0$,we obtain the n-step return and update Equation 8.4 accordingly. \n",
53 | "\n",
54 | "If the question is referring to off-policy, the off-policy implementation would be making updates to a greedy policy (as in chapter 6 & 7 of edition 1), while following an epsilon-greedy policy. 8.4 would then be $\\vec{\\theta_{t+1}} = \\vec{\\theta_t} + \\alpha [R_t^\\lambda - \\max Q_t(s_{t+1}, a')] \\nabla_{\\vec{\\theta_t}} Q_t(s_t, a_t) $. Given $\\vec{\\theta_{t+1}}$, we then have $V' = V(\\vec{\\theta_{t+1}})$ where $V$ is the value function at the beginning of the episode and $V'$ is the update value function. We see later in the chapter that off-policy function approximation can diverge."
55 | ]
56 | },
57 | {
58 | "cell_type": "markdown",
59 | "metadata": {},
60 | "source": [
61 | "##### 8.4*\n",
62 | "\n",
63 | "For off-policy, we can show that the forward and backward views are identical along a similar line as in (7.4 1st edition). We know that for off-policy, we have the exact result that $R_t^\\lambda - V_t(s_t) = \\sum_{k=t}^{T-1} (\\gamma \\lambda)^{k-t} \\delta_{t + k} $ where $\\delta_{t + k} = r_{t+k} + \\gamma V(s_{t_k+1}) - V_t(s_k)$ (7.4).\n",
64 | "\n",
65 | "In addition, we know using Equation 8.7, that $e_t(s) = \\sum_{k=0}^t (\\gamma \\lambda)^{t-k} \\nabla_{\\vec{\\theta_t}} V_t(s_t)$ (similar to equation 7.8).\n",
66 | "\n",
67 | "It is easy now to show that the forward and backward off-policy equations 8.4-8.7 are equivalent:\n",
68 | "\n",
69 | "$\n",
70 | "\\begin{equation}\n",
71 | "\\begin{split}\n",
72 | "\\vec{\\theta_{t+1}} =& \\vec{\\theta_t} + \\alpha[R_t^\\lambda - V_t(s_t)] \\nabla_{\\vec{\\theta_t}} V_t(s_t) \\\\\n",
73 | "=& \\vec{\\theta_t} + \\alpha \\sum_{k=t}^{T-1} (\\gamma \\lambda)^{k-t} \\delta_k \\nabla_{\\vec{\\theta_t}} V_t(s_t)\\\\\n",
74 | "=& \\vec{\\theta_t} + \\alpha \\delta_t \\sum_{k=0}^{t} (\\gamma \\lambda)^{t-k} \\nabla_{\\vec{\\theta_t}} V_t(s_t)\\\\\n",
75 | "=& \\vec{\\theta_t} + \\alpha \\delta_t \\vec{e_t(s)}\n",
76 | "\\end{split}\n",
77 | "\\end{equation}\n",
78 | "$\n",
79 | "\n",
80 | "Step 2 to 3 above use the results found in Equations 7.9-7.12.\n",
81 | "\n"
82 | ]
83 | },
84 | {
85 | "cell_type": "markdown",
86 | "metadata": {},
87 | "source": [
88 | "##### 8.5\n",
89 | "\n",
90 | "How could we reproduce the tabular case within the linear framework?\n",
91 | "\n",
92 | "The tabular case can be reproduced if $\\vec{\\phi_s}$ is an orthogonal basis of unit vectors over the entire state space for each $s \\in S$. Each $\\theta$ would then correspond exactly to a $V(s)$ when performing an inner product with $\\vec{\\phi_s}$ and $\\vec{\\theta}$.\n",
93 | "\n",
94 | "##### 8.6 \n",
95 | "How could we reproduce the state aggregation case (see Exercise 8.4 **should be Exercise 8.2**) within the linear framework?\n",
96 | "\n",
97 | "One can reproduce the state aggregation case by making $\\vec{\\phi_s}$ an orthogonal basis on the group space instead of the entire state space.\n"
98 | ]
99 | },
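    | {
    |  "cell_type": "markdown",
    |  "metadata": {},
    |  "source": [
    |   "A tiny numeric check of 8.5 and 8.6 (my own sketch, not from the book): with one-hot state features the linear form $\\vec{\\phi_s} \\cdot \\vec{\\theta}$ is literally a table lookup, and with group-indicator features it is exactly state aggregation. The state and group sizes below are arbitrary.\n",
    |   "\n",
    |   "```python\n",
    |   "import numpy as np\n",
    |   "\n",
    |   "n_states = 6\n",
    |   "theta = np.array([1., 2., 3., 4., 5., 6.])  # one weight per state\n",
    |   "\n",
    |   "# Exercise 8.5: one-hot (unit basis) features -> V(s) = phi_s . theta = theta[s]\n",
    |   "phi_tabular = np.eye(n_states)\n",
    |   "assert np.allclose(phi_tabular.dot(theta), theta)\n",
    |   "\n",
    |   "# Exercise 8.6: group-indicator features -> every state in a group shares one value\n",
    |   "groups = np.array([0, 0, 0, 1, 1, 2])  # states 0-2, 3-4 and 5 aggregated\n",
    |   "theta_groups = np.array([10., 20., 30.])  # one weight per group\n",
    |   "phi_groups = np.zeros((n_states, theta_groups.size))\n",
    |   "phi_groups[np.arange(n_states), groups] = 1.\n",
    |   "print(phi_groups.dot(theta_groups))  # [10. 10. 10. 20. 20. 30.]\n",
    |   "```"
    |  ]
    | },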
100 | {
101 | "cell_type": "markdown",
102 | "metadata": {},
103 | "source": [
104 | "##### 8.7\n",
105 | "\n",
106 | "Suppose we believe that one of two state dimensions is more likely to have an effect on the value function than is the other, that generalization should be primarily across this dimension rather than along it. What kind of tilings could be used to take advantage of this prior knowledge?\n",
107 | "\n",
108 | "A tiling that is dense in the dimension that is important, but sparse in the other dimension, would be ideal in this case.\n",
109 | "\n"
110 | ]
111 | },
112 | {
113 | "cell_type": "markdown",
114 | "metadata": {},
115 | "source": [
116 | "##### 8.8**\n",
117 | "\n",
118 | "Describe how the actor-critic control method can be combined with gradient-descent function approximation.\n",
119 | "\n",
120 | "One example would be to make the critic $\\delta_t = r_{t+1} + \\gamma Q_t(s_{t+1}, a_{t+1}) - Q(s_t, a_t) $, and we update $\\theta$ as usual where the traces are updated as in semi-gradient SARSA (I'm not sure about a more complicated trace update as shown in 7.7 in the first edition). The actor could also use an update as in Equation 7.14, but since we are approximation $Q(s,a)$, it might make sense to also approximate $p(s,a)$. I think we can use $\\theta$ as a function approximation to $p(s,a)$, or equivalently using $Q(s,a)$ as our actor. I'm not too sure about this."
121 | ]
122 | },
123 | {
124 | "cell_type": "markdown",
125 | "metadata": {},
126 | "source": [
127 | "##### 8.9\n",
128 | "\n",
129 | "Look up the paper by Baird (1995) on the Internet and obtain his counterexample for Q-learning. Implement it and demonstrate the divergence.\n",
130 | "\n",
131 | "I only implemented the DP counter-example from the book."
132 | ]
133 | },
134 | {
135 | "cell_type": "code",
136 | "execution_count": 38,
137 | "metadata": {
138 | "collapsed": true
139 | },
140 | "outputs": [],
141 | "source": [
142 | "import numpy as np\n",
143 | "from matplotlib import pyplot as plt\n",
144 | "%matplotlib inline"
145 | ]
146 | },
147 | {
148 | "cell_type": "code",
149 | "execution_count": 57,
150 | "metadata": {
151 | "collapsed": true
152 | },
153 | "outputs": [],
154 | "source": [
155 | "\n",
156 | "class DP_Baird:\n",
157 | " def __init__(self):\n",
158 | " self.alpha = 0.01\n",
159 | " self.epsilon = 0.01\n",
160 | " self.gamma = 0.99\n",
161 | "\n",
162 | " # x is the feature set for each state\n",
163 | " self.x = np.array([\n",
164 | " [0, 0, 0, 0, 0, 0, 0],\n",
165 | " [1, 2, 0, 0, 0, 0, 0],\n",
166 | " [1, 0, 2, 0, 0, 0, 0],\n",
167 | " [1, 0, 0, 2, 0, 0, 0],\n",
168 | " [1, 0, 0, 0, 2, 0, 0],\n",
169 | " [1, 0, 0, 0, 0, 2, 0],\n",
170 | " [2, 0, 0, 0, 0, 0, 1]\n",
171 | " ])\n",
172 | "\n",
173 | " # v is the value function, v = w * x\n",
174 | " self.v = np.zeros((7))\n",
175 | " # w is the weight for each feature in x\n",
176 | " self.w = np.array([1., 1., 1., 1., 1., 10., 1.])\n",
177 | " # delta_w is w after the gradient updated\n",
178 | " self.delta_w = self.w.copy()\n",
179 | " \n",
180 | " def dp_update(self):\n",
181 | " \"\"\"\n",
182 | " Update V using the dynamic programming update for Baird's counterexample\n",
183 | " https://webdocs.cs.ualberta.ca/~sutton/book/ebook/node90.html\n",
184 | " \"\"\"\n",
185 | " self.delta_w *= 0\n",
186 | " # the approximation of the value function is linear\n",
187 | " self.v = self.x.dot(self.w)\n",
188 | "\n",
189 | " # obtain the DP updates for w for states 1-5\n",
190 | " for i in range(1, 6):\n",
191 | " # for states 1-5, you automatically wind up in state 6\n",
192 | " alpha_error = self.alpha * (self.gamma * self.v[6] - self.v[i])\n",
193 | " self.delta_w += alpha_error * self.x[i]\n",
194 | "\n",
195 | " # add the transitions from state 6 to the DP updates\n",
196 | " for i in range(7):\n",
197 | " # with epsilon probability you go to the terminal state\n",
198 | " alpha_error1 = self.alpha * self.epsilon * (0 - self.v[6])\n",
199 | " # with (1 - epsilon) probability you wind up in v[6]\n",
200 | " alpha_error2 = self.alpha * (1 - self.epsilon) * (self.gamma * self.v[6] - self.v[6])\n",
201 | " self.delta_w[i] += alpha_error1 * self.x[6, i]\n",
202 | " self.delta_w[i] += alpha_error2 * self.x[6, i]\n",
203 | "\n",
204 | " self.w += self.delta_w\n",
205 | " return self.w.copy()\n",
206 | "\n"
207 | ]
208 | },
209 | {
210 | "cell_type": "code",
211 | "execution_count": 118,
212 | "metadata": {
213 | "collapsed": false
214 | },
215 | "outputs": [],
216 | "source": [
217 | "dp_baird = DP_Baird()\n",
218 | "\n",
219 | "w_array = np.array([dp_baird.dp_update() for i in range(10000)])"
220 | ]
221 | },
222 | {
223 | "cell_type": "code",
224 | "execution_count": 129,
225 | "metadata": {
226 | "collapsed": false
227 | },
228 | "outputs": [],
229 | "source": [
230 | "for idx, w in enumerate(w_array):\n",
231 | " w_array[idx] = [np.log10(ww) if ww > 0 else -np.log10(-ww + 1) for ww in w]\n"
232 | ]
233 | },
234 | {
235 | "cell_type": "code",
236 | "execution_count": 132,
237 | "metadata": {
238 | "collapsed": false
239 | },
240 | "outputs": [
241 | {
242 | "data": {
243 | "text/plain": [
244 | "[]"
245 | ]
246 | },
247 | "execution_count": 132,
248 | "metadata": {},
249 | "output_type": "execute_result"
250 | },
251 | {
252 | "data": {
253 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYYAAAEACAYAAAC3adEgAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJztnXd4FNXawH9nU0mA0EMHAVEQkSIdJIAo4MV6EbAhKHYs\n99obcO3XD8WKgr1dFUWKAgpCFGmCVGlSBEILEEogjSR7vj9OAkl2NtmZnclOkvN7njzsTnnnzJCc\nd85bhZQSjUaj0Wjy8YR6ABqNRqNxF1oxaDQajaYQWjFoNBqNphBaMWg0Go2mEFoxaDQajaYQWjFo\nNBqNphBBKwYhREMhxAIhxAYhxHohxL1526sLIX4SQmwRQvwohIgLfrgajUajcRoRbB6DEKIuUFdK\nuUYIURn4A7gCGAmkSCn/K4R4BKgupXw06BFrNBqNxlGCXjFIKQ9IKdfkfT4JbAIaopTDx3mHfQxc\nGey1NBqNRuM8Qa8YCgkToimQCLQBkqSU1QvsOyKlrGHbxTQajUbjCLY5n/PMSN8A9+WtHIpqHF17\nQ6PRaMoA4XYIEUKEo5TCp1LKGXmbk4UQ8VLK5Dw/xEE/52qFodFoNBaQUgon5Nq1YvgA2CilfK3A\ntpnAzXmfRwAzip6Uj5RS/0jJ2LFjQz4Gt/zoZ6GfhX4Wxf84SdArBiFED+B6YL0QYjXKZPQ48BLw\ntRBiFLALuDbYa2k0Go3GeYJWDFLKxUCYn90XBytfo9FoNKWLznx2EQkJCaEegmvQz+IM+lmcQT+L\n0sHWcFVLAxBChnoMGo1GU9YQQiBd7nzWaDQaTTlBKwaNRqPRFEIrBo1Go9EUQisGjUaj0RRCKwaN\nRqPRFMKWkhgajUajCZx9J/Yxa8ssFu1exObDm0lOSybCE8FZ1c9icMvBjO4wmtjI2JCNT4erajQa\nTSmRuDORlxa/xPI9y7ms5WUkNEng/PjzqVu5Ltm52Ww6vIkP13zIxkMbmX/jfBpUbeBXlpPhqlox\naDQajcPsPr6b22bdxrYj23jyoicZet5QKkVU8nv8uMRxLNuzjLk3zPV7jM5j0Gg0mjLKj9t+pOPk\njvRu0ptNd2/i5nY3F6sUAJ7o9QQbDm3gz4N/Gu73ep0Y6Rm0j0Gj0Wgc4oe/fmDUzFFMHzqdHo17\nBHxeRFgEg1sO5sdtP9KmThuf/W+/becofdErBo1Go3GAHUd3MHLGSGYMm2FKKeTToV4H/jzku2Lw\nemHCBDtG6B+tGDQajcYBHvzpQe7vej9dG3a1dH6TuCbsOrbLZ/uSJVClSrCjKx6tGDQajcZm/j76\nN4t2L+Jf3f5lWUbt2NqkZKT4bF+0CC52uKGBVgwajUZjE2lpsGcPfLziG/7ZegjR4dGWZcVGxJKe\nne6zffVquPDCYEZZMtr5rNFoNBY5fhymToUff4TffoNjx6B6dTjUbwkxO4YS+RM8/TTUrGledkxE\njKFi2LcPGja0YfDFoFcMGo2mQuCVXrYc3sJP23/i6w1fM3fbXJJPJluStX8/3H03NG0Kc+bAP/4B\ny5apFcO+fXBW5w1MffMCvF7o3BmSLVwmOjyajOwMw2vXrWtp2AGjVwwajabckuPNYe62uXy67lPm\n75hPXFQczWs0Jy4qjuNZx1m5byXdGnZj0mWTaFKtSYnypIS33oKxY2HkSNi0yXiSPnDyAF1a1eeS\nN6BSJXjwQfj0U3NjD/OE4ZW+CQupqVCtmjlZZrFFMQgh3gf+ASRLKdvmbRsLjAYO5h32uJTSfxqf\nRqPR2IRXevls3WeM/2U88bHxjLhgBK9e+ir1q9QvdFxWThYTl02k54c9WTl6JfGV4/3KPHUKRoyA\nrVth6VJo2dL4uLRTaWR7s6kaVRWAJ56ARo2U2SkuLvB78AgPuTLXcJ9wJN/5DHatGD4E3gA+KbL9\nFSnlKzZdQ6PRaEok6XgSI6aPIC07jQ8u/4DeTXv7PTYqPIpHej7CgZMHeObXZ3hz0JuGx0kJ11+v\nlMNvv0F0MT7ltOw0KkdWRuTN3nFx0KWLOu+yywK/jzARRq7XWDE4jS0+Binlb8BRg10O6zWNRqM5\nw9oDa+n2fjcubnYxS0YtKVYpFORf3f7FF+u/8DsRv/EG7N4NX39dvFLIRxSZ+lq3hi1bAhrKGRlO\nLwuKwWnn8z1CiDVCiPeEECYWURqNRmOOPal7GPTFICZcMoHHez1OmCcs4HMbxTWiVkwtNh7a6LPv\n6FF45hn4+GOIirI2tsaNISnJ2rlFKY2ao046n98G/iOllEKIZ4FXgFuMDhw3btzpzwkJCSQkJDg4\nLI1GU96QUjJqxiju7nQ3Q9sMtSSjZc2W7Di6g/Pjzy+0/auvoG9fOPdc6+OLi1OO6mBITEwkMTGR\n9HR4+eXgZJWEY4pBSnmowNcpwCx/xxZUDBqNRmOWBX8vICk1iYe6P2RZRqOqjdiTusdn+/TpcPvt\ngcsxaiNQtaqKJgqG/JfmN9+Ehx+Gl18eH5zAYrDTlCQo4FMQQhQM4roaMK4fq9FoNEEyZdUUxnQe\nQ0RYhGUZVaOqcvLUyULbpISVK6FbN3OyivoHYmPh5Ek/BxeDJDS9auwKV/0CSABqCiF2A2OBPkKI\ndoAX2AmY0LkajUZjTE4OrFmjJuykJMg6JZlZeR5XV56A1wsei6+7lSIq+WQaH8wLtg82oSwiArKz\nzZ1T1IFdmtgVlXSdlLK+lDJKStlYSvmhlPImKWVbKWU7KeWVUkprKYYajUYDrFoFd9wBdeqofIIV\nK5QzOKzWdsK9sTz/aAPatzcf/ZNPTEQMGTmFM40PHoR4/6kNARMWZl9znbLufNZoNBUQr/Syav8q\n5u+Yz58H/2T38d1IJHUr16V3k97c2PZG4qIDD1LculVlDq9aBXfeCWvXqoSxfH7esYvlvzZj4WqY\nPBkuvVStKMxmB0eFRZGZk1lo2+HDULu2OTmlQVlJcNNoNBWcIxlHeHvF27y36j1iImK4uNnF9G/W\nnybVmuARHpKOJzHrr1m8tPglZg2fRbu67UqUOXmyyhx+5BEVHWSUQ7D/5H7qVamHEMpJvHgxTJoE\njz1mbvwe4fEpQXH8uHIcmyFUfgE70YpBo9EEhVd6eX356zy36Dkub3k504ZOo0O9DobHXt/2er78\n80uu+PIKNt+9udjex08+Cd98oyZ6f+UnAA6nH6Z2zJnX+ltvhX/9y7xiEEIYRhRZ8VnY5R8wGk9p\noBWDRqOxzJGMI1w79VqycrP49eZfaVW7VYnnDGszjC///JJP133KbR1vMzxmyhT49ltVRqJWreLl\neaWXcM+ZqaxLF1i/XpWviIwM/F48wuPo277ZOb48Zz5rNJpySkp6CgkfJdA2vi2JIxIDUgr5DG8z\nnO//+t5w39698OijKn+gJKVgRFQU1K8Pu3y7YhaLQBhWM7UDO+f40lhEaMWg0WhM45VebvjuBvqd\n1Y8Jl0wwVX4C8hrdHzRObXr5Z
Rg1Cs45x/r44uOV49gMRj6GEFlySkQ7nzUajev4esPXHEo7xKzh\nsyyZPJpWa0pSahI53pxCZqDcXPj8c9X0JlCM7PDVqqkaR2bwdx9mby9UfgE70SsGjUZjmhd+e4Hn\n+z1faFI3Q0RYBLERsT6Zxhs3qtaYzZubk1fU2RsTAxm+zc9KjVD6B+xAKwaNRhMwUsKmQ5s4knGE\ni5tdHJSsqlFVSc0qXEBo5Uro1CkosYByOpvNNHYaKwuJUIW+asWg0Wj88uef8OyzcMklqgF9eDic\nf9kSjq/pwwP3e9jjW3MuYKpEVfFRDHv3QpOSO2yWSESEikpyC1YWEP5CXrXzWaPRBMyp3FPsOLqD\n5XuWs+bAGk5knbAkR0qYNk29uQ8YAEeOwJgxKp8gIwNGP7WG2y5vR3Q0XHihyjK2QmRYJKdyC8/e\nycnmS1AYvVWHhSl/hVmK+gfc6i7QzmeNRuOXAycP8Nm6z/hh6w+s2LuCmjE1qRNbh8ycTHYc3UHP\nxj35v/7/59NjwB+bN6sm96dOwbhxMGiQmmQLcjBjH8POu4gh10Dbtqrl5bp1vseVRLgnnBxvTqFt\nhw6Zr2QKvjZ9O9/QTTufy0Hms14xaDRlkB1Hd3Dz9Jtp9VYrNh3axMPdH2bfv/ex6/5drBi9gvV3\nrufIw0e46tyr6PtJX1btX1WizOnToVcvuPFGVaBu8GDjyf5Q2iFqx6pM4+uuU2UqFiwwfw9GE3FO\njrmkNLcSysqodqBXDBpNGcIrvby69FVe+O0F7ul8Dzvv2+m3IF1UeBR3XHgHsRGx3P797fx+6+9+\no2WmT1cF6ubOhY4dix/DkYwj1KxUE1Bv01dcAfPmQf/+Qd1auceS89ngJO1j0Gg0p8nKyWL4t8P5\ndtO3/D76d8YljAuoSun1ba/nUNohvwllO3bA6NEwc2bJSgEgV+YWClNt2zb4tpXB4C9vwI4oILsm\nYUumrWJOctrHoBWDRlMGkFIycsZIsnKyWDBiAc2qNwv4XI/w0L9Zf37Z9Yvh/ocfhgcesB4m2rSp\n+fIT+djl7C1qurFzIq6ICW7alKTRlAE+WP0BW1K28NvI34gON6g9XQKta7dma8pWn+1JSco/8Mkn\n1sdWvTocO2b+PLsmYjeiE9w0Go2jZOZk8uTCJ5n8j8nFlqkujoZVG7LnhG/SwcyZykcQExO4rKJv\nxHFxqm+BpvygFYNG43Kmb55Omzpt6Fg/AAeAH6pXqs6xTN/X+mXLoGdP8/IKvhFHRUFWluWhBY2/\n8FC3WXTsynwuM609hRDvA/8AkqWUbfO2VQe+ApoAO4FrpZT6vUKjKYH0dBUu+vffykTz9ak59G7+\nT3JzzecK5FMlsophwtvGjSp5LRjCw1WYqRXscvbakcegru8i53MxIa9lxfn8IXBpkW2PAvOllOcA\nCwCT/ZQ0mopDWhp8+CH07at6DD/6KPz8s1IOG44v48uXu3H++eaqjhYkMiySbK9v8aDkZKhbN7ix\nh4erLGPTjWhsSiizC53gdgZbVgxSyt+EEEUrnFwB9M77/DGQiFIWGk2Z5kTWCZYkLWHZnmXsPr6b\n9Jx04qLiuLD+hVx73rVUjQq8SXBODrz1Frz4oiovcc898P33Z2z+Xull0rN/s3FRS+bMUklnv/wC\nrVubG3NEWIRP+Qkp4eDB4EtQCKFWMjk5qkaRRie4FUcdKWUygJTygBCijoPX0mgcJ3FnIu+sfIc5\n2+bQrm47ejTqQdeGXYmNjCUlPYW52+by+M+P8/GVHzPw7IElytu6VZWTiItTiWUXXOB7zNGMo8RE\nxFApIpqrr1ZRRI8/rhLSzBDhiSA7t/CKIStLTepRUeZkgT3hoXZhZx6Dk7htPMVRmuGqfh/LuHHj\nTn9OSEggISGhFIaj0QTGhoMbGDNnDEmpSTzQ9QHevuxtalSq4XPcmC5jWJK0hCu+vILfRv7GObX8\ntyBbsQIuvxyeeALuvtv/xHow7SDxlc+80t96qzrnxAmoUiXwe3BraKib8hjAXQlu6vpnBpCYmEhi\nYiKZmfDCC/aMyx9OKoZkIUS8lDJZCFEXOOjvwIKKQaNxE2/9/hbjfhnH+ITxjO4wmoiw4m0l3Rt1\n54GuD/Di4hf58IoPDY/ZtUsphXfeUaGixZGenU7lyMqnv8fGQrt2SrH07Wv6dmzBrjd0tykrt48n\n/6X5lVfUqvGll8Y7dm07FYPI+8lnJnAz8BIwAphh47U0Gsd5csGTfLvpW5bfutxUpvGNbW+kw+QO\neKUXjygc3yEl3HYb3HdfyUrBHy1awM6d5s+z823Yriig8ojOfM5DCPEFkADUFELsBsYCLwJThRCj\ngF3AtXZcS6MpDT5e8zFfb/iaJbcsoVZMLVPnNoprRHR4NEnHk2hSrXBMxq+/qkij77+3PrYGDTDd\nIMdtEUB24sY8hrKe+WxXVNJ1fnYF1/tPowkBh9IO8eC8B1lw0wLTSiGf5tWbs/3odh/F8OGHqopp\noNE7RpNelSrmG927FdflMTgYalqWFhI681mjKcLbK97m6nOvDri5jRG1Y2tzOP2wz/YFC+Af/zAn\ny6jRfXq65aGdxupEZdcbuttWMUbjCbnzOUSZz1oxaDRF+Gz9Z4zuODooGXFRcT4lKA4cUIlsLVoE\nJdqyYjCyfVudsNwUruo0blJUp/fp1p4ajTNkZakGM4mJ8McfqhF9ZvgB9l2VwpevdqDW3aqktBVi\nImLIyM4otC0pSckz80dtNJlHREC2bxJzsZQVm7ddzWysyrKD8pD5rFcMmjKDV3o5mHaQQ2mHfHoF\nm2HfPrj/fuXEffllqFYNHntMVRod+84q2sd3JDzMw4UXwqxZ1q4R4YnwGeOBA9bKTxSd1MPCwOu1\nNq6CuMHmbdcqxm0rGJ35rNE4hJSSxUmLmbphKr/u/pWNhzZSJVJldWXmZHJRk4t46qKn6NYosO7x\n2dnw/PPw2mswapRaJTQpUsjlt/R9nN+kES/eC1ddpfwB69ZBvXrmxh7uCfepTXTokKqDFCwej6pN\nZBajN1krE6jTeQyhxKkienbKKjPVVTUaO8nx5vD5us95btFzRIRFcF2b63h70Nu0r9f+dJOa45nH\nmbZpGld9dRVvDnqTf7b+Z7EyDx5USWXVq6uJvmFD4+OSTyYTH6syjbt0gWuvhcmTYexYc/cQEeZb\ngsLrVQXnzGA0mXs85lcMdr/Blsc8BicT3Oyq9hqsvEDRikHjKtYnr+em6TdRJbIKUwZP4aImFxn+\nwcZFxzGy/Uja1W1H/0/7c3Gzi6kWXc1Q5uHD0KsXDB0K48cX/0d1JOMI9aqcWR5ccw08+aR5xeAR\nHvtCMR0yk7jBlFQUu3oWlAdCuaLSPgaNa5i2aRp9P+nLmM5j+OXmX+jdtHeJfxzt67Xn4mYX87/1\n/zPcLyUMHw5XXgn/+U/Jk6pXegkTZ5oedOyoVhhunETtwM65x65J3a5+zSFzPpeDXxatGD
Su4Pu/\nvufu2Xfz4w0/Mqr9KFNvS4POHsTCnQsN9331FaSkKN9CIBSdqOLilLPXSk9jOxyrdkbc2DVh2TaZ\nO+igDXWCmxv9J2bQikETcvam7mXkjJF8N/Q7OtTrYPr8NnXasCVli+G+V1+FZ54x1/ms6B91nTrK\ncWwGO5Ol7LDn2z1RlfWoGyOcTHCzU5ZOcNNUCJ5a+BSjO4yma8Ouls5vHNeY3cd3+2zftUvVJRow\nILjxVaoEGRklHxcIbvIPuNHiEcqVkD9C6nz2s4LRzmdNueZoxlGmbZrGjvt2WJZRLboaxzOPI6Us\n9Ga8ZIlyOptZLRhNMpUqWcw0tsEsYZfZxk5ZTpu37DJLlfUEt1CuyrRi0JQ6hw+r7mXJybA0dR7t\na/aiWpRv45tACfeEExEWQVZu1ulwVoA//zTuilYSRrWJzK4Y7HSG2jFBuDFc1Uk7fKhN/GXd1KYV\ng6ZU2LIF3n8fZs9WJaPPOUdlAa+tt5Rje3vQegJMmACXXWZNfmxELOnZ6YUUw/790L27OTlGb3v5\nze7tQJuS3I2TCW5lCa0YND7sO7GPxbsXs/3odo5mHCU6PJrWtVsz8OyBphrdA6xfr8pNrFwJI0ao\nstMdOpwx7/zji62M7tCHmCS1f9Ikaw1sosKjyMzJLLTtwAHzGctg/DZspQSFHZOMnYXvjAh1uKod\nMpyOJHKTMrdTTnFo57MGULb+15a9Rrt32tHm7TZ8uu5TUtJTqF6pOhLJp+s+penEpny42rhdZVFy\nclRSWL9+0L+/cgS/9BJ06lTY5n/g5AHqV6lH//4qtHTMGDh1yvz4w0QYXll49k5PV60wg8XjCW1J\nabtMLm4LV7VTlpvyGIyw27Slnc8aR8nKyeKVpa8wYekELml+CRMHTKRX416EeXw9thsPbWTg5wOp\nGVOTy8+53K/MzEwYMkT9u3Zt8W/tB04eON3svlcvqF8ffvlFKRMzeISHXG9he4+db+h2FK2zEzvq\nEgXV2tOFfo9CskNo4rdLAYcyF0IrhgrM5sObGfbNMBrHNWbpLUs5u+bZxR7funZr3hv8HvfOvZfB\nLQf7nWxuuEE5bKdNK7lTWVZuFpXCK53+3reviiYyqxjCPGHkSl9HgB0RLlZqE4E93cnsfDs3ItRO\nWrehE9wU2pRUQVm5byUJHyVw54V3MmPYjBKVQj4XN7uYjOwMNh7aaLj//fdVo/pPPgmsfWXRt6tz\nz1WOarPYFa7obyK2q3JoKDOEna4pFDIfg11v6C4KeQ01jq8YhBA7geOAF8iWUnZ2+pqa4tl3Yh+X\n/+9yJl02iataXWXqXCEEPRr34I/9f3BenfMK7cvKUn6FGTMgKsqczHwaNFBhrFawLSberv7BLirh\nbOek50aHuBvbchalLDmfS8OU5AUSpJTlpH152eee2fcwusNo00ohn5Y1WvJXyl8+2+fMUW0rL7ww\ncFlF32KrVIHUVPNjctr5GCrnsxsnYXWeuxziRXGbJSeY8RRN3AxWXiCUhilJlNJ1NAGw5sAaVuxb\nwaM9H7Uso3ZsbVLSU3y2z54NV19tXl7BSbRqVTh+3Nq4jGz6bpuIrVBW7NXloVGPHZSHMuClMWFL\nYJ4QYoUQIrgO65qg+XTtp9x8wc1UiqhU8sF+qBpVldRTvq/1q1apcFQzFJ2IK1VS0Ux2EUqzhF39\nGAxl22CLt3M8oTS32DkR2/WMdOZzyfSQUu4XQtRGKYhNUsrfCh4wbty4058TEhJISEgohWGVf/bs\ngQULYNky5RA+eRLWdl3A1VGT2NZYmX2sUDWqKqlZvoph61blPDZLwTdHK43uwXnns+nxuKwbmJ3j\ncfqNOFR5DG5PcEtMTCQxMZHsbHj2WXvG5A/HFYOUcn/ev4eEEN8BnQG/ikETHF4vfPedyiBevVol\nmHXvDgMHQqXKp7hs0RbiTrSlWzfVzeyuu8xfI9wT7tPo/tQplVBWvbo5WUUnGSvJZKdlOeR8VrKt\njCh4GW41S7g9UsptVio7xpP/0vzii6qr4PPPjw9eqB8cVQxCiBjAI6U8KYSIBS4BnLubMsaxzGPM\n3DKT+Tvms/rAav4++jfZ3mxqVKpB14ZdGdN5DH3P6huwvOXL4c471eT68MOqx3H0mdJB7Dq2nzpr\najLx6Rjuvxt69oTWrcHsAs0oy/jwYahZ05pNv+gkY2UCddL5bGe/XjeZtuwmlJFSFY2ynvkcD3wn\nhJB51/pcSvmTw9d0PduObOO5Rc/x3abv6HNWHwY0H8C/uv2Ls6qdRXR4NMlpyfy842dGzRjF7R1v\n57FejxUrT0pVbmLiRPUzdKjxL87h9MPUjqkNQNOm8MIL6sesYvAIj63lJwpO6kFFb9hgBrKrpLTb\nsnrLUriqWeyMbLKlVLrN2k3i+/LkNI4qBinl30A7J69RlsjKyeLphU/z/ur3ubfLvWwds5XasbV9\njmsc15iR7Ucy8OyBdJzckX7N+tG5gf/0j4cegvnzlfO3fn3/1z+cfpiaMTVPf7/6arXCOHkSKlcO\n/D78lZ+wy15tacVgZ8e0IrJC6lh1aRMat5XMtuP/38mue1YJlRNbh5GWEsknk+nxQQ+2pGxh490b\nebr304ZKoSB1K9fl4e4P8/ry1/0e88EH8MMPkJhYvFIAyMzJJCYi5vT32FjlgN682cydqPITRVcM\nVhUDFP7lt5JlfGYMNoSr2hnh4rICcU4rGTeOyQ7c4ny2W05xaMVQChzJOELvj3ozuOVgvhv6HXVi\n6wR87pXnXsm8HfMM9x0+DI8+ClOnQrVqJcsymqiaN1ftL81gZEoCe7qB2f0G66YIl1D6PJxOAHTb\nmMqj89lJeUXRisFhpJSMmD6CQWcPYmzCWNNLzMZxjcnIzuBIxhGffVOmwODB0KZN4GMpujStVQuO\n+IouFiPFEFSlziLPxPKKwaG8gVBmGrsxNNRtY3Jb5JbbxmMFXV3VYeZum8u2I9uYdu00S+cLIWhW\nvRm7ju2iRqXC7S+/+ALefTdwWRLf1Pq4ODh2zOSYELaZkor+EdlhjgpGntP9jK3gxugdtyVwOVWi\nww0JblJKSvtx6xWDw0xcPpEnez1JRFgApUb9UL1SdY5mFi41deyYMgF1NlGS0GjFEB1tPtNYCGFL\n+YnT8mwIV1XnOfNHHeooIB/ZVsNnHczEtirPybdrtyW4WSFUZUP0isFmpISDB1WF0CPpx1iyeynf\nDf0uKJlVo6pyPLNwAaH16+H881U/YjMU/UWLiFBVUU3JQNgWsmjkY7Azj8Eus0SosqidXHWAfZNe\neVGeTlKWnM9aMdhAZqYqIPf11/Drr2qirV8fMuquIrPZBXRsG8MTT6gGNlaICoviVG7hfpcHDqgS\n1WYwmqgiI+HECXNy7HQaFjVvhTqPQZ1jw4rBxmfktraVbguhNao+6jbKmvO5wikGr/SyNWUr65LX\ncSj9EB7hoUlcE3o27kmVqCqmZGVkwJtvwiuvQ
KtWMHy4Shhr2lT9x722bD1bUtoy7Hq44w7YvRse\nf9z8mCPDIn0UQ3IyxMebk2NkSgoLg1zfxmcly3KZKclpO3yobfoFcUN/CLcpKzeV6CgLIbglUWEU\nw5KkJXy85mOmbZ5GbEQs7eu1Jz42Hq/08vWGr7l+2vU83ONhHunxSEBvH4sWwahRypzz44/Qtq3v\nMclpydSvUo+LLoKff4YLLlC9kM8OrFnaaYxqEx05AjVq+DnBD0bOZ6smADtyBtR59piS/BHKstt2\nyLLLtOW0/dzOZ2QX5SHBDUIT5VTuFcOyPct4ZP4j7Endw20dbmPF6BU0rdbU57idx3YyZOoQsnOz\near3U8XKfP99eOIJmDxZ1SPyx6G0Q1xYX3WtqVcPRoyAjz82XxnRyNmbm2vev2C0YlDbzY/HeLs5\nOSXJM4uTESUhnWQcysQONWUljyGkzucQRX+VW8WQnZvNo/Mf5Ys/v+DFfi9yfdvrCff4v92m1Zoy\nfeh02kxqw20dbyO+srGd5n//U1VJFy0q+c3/WNYxqkWfyTzr399auVyjN3SvV5mBTMsyWDGEsteA\nUbiqXc7nfHnBjMeKjOJw06Ru1/+Z3ZT1PAZ/aOdziMnIzuDKr67EIzz8eeefheoDFUeDqg34R8t/\nMG3TNO6KOov6AAAgAElEQVTsdKfP/p07YcwYWLgwMHOQV3oJ85yZvc87T/UsMItHeOzpTmZXo3sb\nTUn58gqOxyp2KSvbVjAO2qtD2WsC3JdbUdFajerMZ5NIKblp+k3UqFSDWcNnBawU8undpDdL9yw1\n3Pfkk3DvvcqvEOhYPOLMI65XT5WxMNuIxiihzOtV5bXNYGRKsqQYbAoNzR+TkSyz+JuoQlXnxsmo\nJLtCQ+3ErtwKOwnV/1lZWcEUR7lTDJ+t+4xtR7bx0RUfFWs68keLGi3YeWynz/bDh1WxujFjApfl\nld5Cf5Aej6pievKkuTEJYc8buj/nc8jfPouEq4ayHLTTJaXdZEqySlnI6A71/5ntmc+lTLlSDLne\nXJ5OfJq3Br1FVHiUJRkNqjRgT+oen+0//wy9epnrUCYpvGIAVdHUtGLAONM4ZCsGP6YkK9hWEsPO\nN3S7GuPY8IxsTbizMVzVWL49ckKVBOhGdOZzEOTmqkYxv+5ZQK2YWnRv1N2yrLjoOMN+xkuXKsVg\nBq/0+vzHxsZCWpo5OUYrBq/XvuStUJuSbCuJYUemsV1JcjZmGttSosPhSLJQr4TcVrvJCO18dpit\nW2HOHJVlvHq1anofEQEZPecTE3kZI5eo5jWtW5uXXTmyMmnZvjP3zp1w0UXmZBX1MYAaZ06OnxP8\n4G/FECrns5GsoJzPNmQ+Ox2DXo5fSkOKm/IYwDn/gHY+O8TJkyrLuFMn9ea+fj1cdZVSEKmpan/C\n8JW88XBXzj0XeveGmTPNXyc6PJrMnEyfX9jdu6FxY3OyvNLroxjCwy0oBj8+BjeZkvJlmcWuDm7q\nPGf8A6EsWmdnVJJdMuyaPN2Wx+CUCbEs4ooVQ9LxJBrFNTLcd/QoTJigykv37g3PPw99+hgnd+1O\n3U3Xc5tybk/o2xcGDoSNG6FO4H1x8AgPYSKMHG9OoYqoR4+a8y+Ar/MZ1GTu9e1xUyxGKwYrpiS7\nnM/+qqtaxY4Obk5HlIQyCdAus01ZCVd1Wx6DnaY7K4QiysnxFYMQYoAQYrMQ4i8hxCNGx3y32bf6\naE4OvP02nHuuqgu0dCl8841KEvOX8Zt0PInGceq1vlMntaL44APzYzaqTZSTo8xAZjByPoeFWVAM\nLlsxGMu2J1zV7jyGUNn01fWddfaawa3hqvZdv3z6GEKVL+KoYhBCeIA3gUuB84DhQohzix63aPei\nQt9/+knVFZo2DebNU53KWrQo/lq53lyyvdmFehpfcw3MnWt+3JFhkWR7CycbZGebVwxGzudQrhjA\nJuez3QluRU606w9IyTYnxy7TjdNlrkNpSrITN+UxqHO0jwGcNyV1BrZKKXcBCCG+BK4ACrWf/2Pf\nH4ByKj/4IGzYoCqWDh4c+AM4lXuKCE/hmbtDB1i3zvygjVYMVhSDkfPZ4zFfzdTOPAZf2faYkvJl\nmcWucFWw7w3dTRE3dpp/nOqhcWa7OTluMkepc3TQQT5Om5IaAEkFvu/J21aIv4/9zeVXnaJbN+ja\nVSmGyy8395+b7c0mMiyy0LbatZVT2mwjmoiwCFsUg5M+hlA7n31lm5NRnLxQ1kqyE9tKdNhgBrLb\n1OKk8nRbHoPbfB6lgSuczyyENZGdue22K+nWLYGoqATTIk7lnvIpGyGEmtAnTzaXsWzbisFBH0Mo\nnc9g/HZul48hlPHebnQ+G1FeTEl24bY8BicynxMTE0lMTMTrVYU8ncTpFcNeoGCQZ8O8bYXpA2kX\nJ/H88+NISEiwdKFZW2YZ5h8AvPaaOVmRYZFk5xb2MWRm2udjMG1KsiuPwcZaSW7LYwDnktNCOZmX\n9xIdbnLQO02wK+GEhATGjRuHEOMYO3acfQMzwGnFsAJoIYRoIoSIBIYBhtkFRzKOBHWhDYc2+N23\nfbs5WUVXDAcPqn/N9j8wymOwZEqyKSoJjCe9UDpW7cpjcDoGPZSTupuqvRYnx65VlV3oBDfrOKoY\npJS5wD3AT8AG4Esp5SZ/xwejHI5mHLV8blEiPIV9DPk9ka28oduiGGzMY/CRbaMpySq2ld22ybxV\nFqqZ2qXMreJk8yA3lTFxU2RTaeJ4HoOUcq6U8hwp5dlSyheLO/bZXy10sckjOjy62P1HTOicoisG\ns5nK+Rg5n630WLY1j8HBBLeQhqvaZNO3s2idHTKcbBxUDuav0zi5+gh1glsocEVJjDGdx7BwxEK+\n3vA1X2/42pKM2MhYnunzjN/9kyYFLqtoHsOxYyphzixGzudQrxicjEqyI5okGOezU1FAoc40tq3R\nvcP+CivYmUFfXimXmc+B8PrA10lomsAP1/3AvXPu5e0Vb5teji1OWkznBp0N911zDUycGLivoeiK\n4cABc2U18rErwc0jPD4RV5adzw5FJeXLChY7SzQEI68odtjP7Zz0QmnPL89JgG4r6lcuM5/NckHd\nC1g8ajHv/vEuAz8fyNaUwPpgbji4ga0pW+ndpLfh/pYtYexYlRuR70gujqJ5DJs2wTnnBDSUQjid\n4BYy57NNtZLsNtvY0v/AZVFATo4nKL+Qm5IAHXTyhzLhrjTlFcVVigGgeY3mrBy9kj5N+9Dt/W4M\n+2YY87bPI8drbOjfk7qHYd8OY3zCeL/NeaKi4J57YMgQ6NED1qwpfgxFVwzz5pnvxQD+fQxl3fls\nZxtNu8w2dvaIsK0gn02zipMFAkNtSrILJ539OsHNJUSERfBIz0e4s9OdvL/qfR5f8Dh/H/2brg27\ncn6d86kZU5OM7Az+PPQn83fM5+HuD3PHhXcYylq9+szb/rhx6vPFF8OIEfDEE1Cjhu85BfMY/vhD\nZWL3
72/+PvyZkuwqu+0mU5JdPoZ8WVZwe6OekJZ7sDPhqgx1lQsVbku4M4vrVgwFqRpVlQe6PcCK\n0StYe8dabml/C7GRsew/sZ/MnEwGNB/Ahrs28EjPR/zaUNu1g0qVznwfPhz+/FOFoDZvDqNHq6J9\nmZlnjomJiOF4Virz58OVV6oEudhY8+PP8eb4lOmoXl2V8DaDnY16nDIl5csyi12KytZGPXY5ex1S\nVHZhZxkTOyN33GAqcxOhCH915YrBiAZVG3BV1au4qtVVQcuqW1eVyXjmGfjwQ5VevmYNnHUW1K8P\nO+qdx7dpf9JgvWoOdMUV1q5zKveUj2KIj1dlxM1gtGKwWtTPyeqqZrHbtOEme7wbe1Abyi7bL7aA\nvb9HblPEuudzCIiPh0cfVT9paaq6a3IyrEntyP+Sn2D11OD+cIwqvsbHw19/mZNjtGLIyTGfiW0o\n2+KKoSh2leiwipN+AbdNnpYnvfLyCm2AHRNoWVLAFc75HCpiY5XZ6dJL4YGre7EnfRs7j/0dlMzU\nrFSqRlUttO3882HtWnNybFsx2GRKypdV6HsIs4yLl2/lHPcoGbc5sNV5NvoYykAeQ6j+z0KJVgwG\nRIZFcmuHW3l+0fOWZXill+NZx4mLjiu0vVMnpRhOngxcltGKIStLRVuZwciUZAW7ej4Xt2wP1STj\nRuezG8MonSwbUh58DDrzuZzySI9HmLNtDnO2zrF0/r4T+6gdU9vHx1ClCiQkwNSpgcsyWjEcPw5x\ncX5O8IOdzmcf2TY7et0UQmtlPH5lhzAJsDyEURphp/nPjW/7Bf/fSmt4WjH4oXql6nz5zy8ZMX0E\nv+761fT5Gw5uoGXNlob7/vUv5fjOyAhMltGK4cgRFeFkBlvDVW0qA26EnX/UZX0itmsyL40M2pC+\noTvUzCiU2eoQurBXrRiKoWfjnnxxzRdc8/U1TFw20acsRXHM3jqbPk37GO7r00d1qrvrrsB+WYqu\nGHJzYdcuaNIk4OEUI9ueqKR8Weavb9OKweGkNLetqkK5gqlohf1C+XIRKrRiKIGLm13MsluWMXXj\nVC6cfCFzt80tUUFsP7Kdz9Z/xqj2o/weM2UKrF+vMrJLKpFRNBN70yaoV898boXbTEn+3oZDXrTO\nQYd4eZlkykJhv1ApmfJgstOKIQCa12jObyN/47Gej/HI/Edo+UZLnl74NIt2LeLkqTNe5LRTaXz1\n51f0+bgPz/V9jkZxjfzKjI2FBQtUiGxCAmzb5v/6tWJqcTj98Onv334LAwaYvw+3mZLAeR+DWdyW\n1Wunzdtt9nMnGxCFWnmW9cznCp3HYAYhBEPOG8I/W/+T5XuXM33zdB748QE2HtpIpYhKhIkwTp46\nSdeGXXn/8vfp37zkGhpVq8LcuSqzuksXuPZauP9+34J9tWJqcSj9EACbN8Nbb8Gv5t0ehiuG6OjA\nfR35+Etws9PHEMocBFuyem2MuLFjhea0/dwJeWZwSsmE2sfghIxA0IrBJEIIujbsSteGXQEVlpqS\nnoJXeqlRqQYRYeaSCzweeOABuPFGmDABeveGpk1VPkXXrnl+hLSGbNq3i/HjlVL4v/+Dc881P/Zc\nby7hnsL/5XXqwKFD5uTY2RTHSJbbKmPaLStYQm3aKo/mH3D2Ld/OjP7SWA1pxRAkHuGhdmztoOXU\nqgUvvAD/+Q8sXKh+JkyApCTIym3D7uFbOXAsk19+iaZVK2vXyMrN8lFcdeoEVoq8KEXfrHJz7SkD\nflq+yyKlrOC25C07bd9lwQzkphBj69fWJTE0qGzmSy5RP2eIpscH7bm810JanT3QsuwTWSeoXqlw\njGvdunD4sDInFSw2WBxGpqRTpyAy0s8JfrDTlGRrHoMdyVs2rqoMt4fIB1McdozJ2nXd5WNwmy/H\nCo45n4UQY4UQe4QQq/J+LLhLNflc1+Y6pqyaEpSM1KxU4qIKZ8VFRUGrVqo8eaAYTXpWFIM/WVZx\nMhs7lCWl7epZYSjbZeGqVnFSyVhBZz4XzytSyg55P3Mdvla5ZmT7kfy+93dLyXb5pGSk+KwYAPr1\ng++/Nyer6OSQng4xMcHJyMdt2dihnojdiFP2eLc5sN3mpyovmc/l5M8g9MRExDB58GSu+/Y6dh/f\nbUnGXyl/0aJGC5/tN9+syo8HWr/JyJR07Jj5TOx8WT7bbMpjsGpKMpTtIjOJ2h68LDdYPNxmBjIi\n1LknRX+vS+P5OK0Y7hFCrBFCvCeEMFnZR1OUQWcP4qHuD3HRhxexPnm9qXNPZJ1g+9HttKrl67lu\n3Vp1tXv66cBkGTXq2b9fObLNUNyyPVQ+Blvr7tjkfHaTz6M47FjlWbquzb3Dg8VWJ3+I3q2Dcj4L\nIeYB8QU3ARJ4Angb+I+UUgohngVeAW4xkjNu3LjTnxMSEkhISAhmWOWa+7reR82YmvT9pC8PdX+I\ne7vcS3R4dInnfbXhKy5qchGxkcbp0q++Cp07Q5s2MMp/wjZg/Mv6118wdGhAt3Aar/QS5gnzlW8x\nj6Eodjmf82UFK0NtNz8et+HGcFW7stWdrE0VLImJiSxYkIjXq9oUO0lQikFKGWgn5CnALH87xzl9\nl+WMG9reQNeGXfn3T//m7RVvM6bzGEa0G0GtmFqGx69PXs+TC55kxrAZfmXWqgWzZ6v8ib174fHH\nIcx3zgYg3BNeqERHbi4sWgQTJ5q7D6/04hG+i1a7iuhZWjG4LHzSVme4jeGzTjpXQ6U8SyNyKxgS\nEhLo1SuB555TimH8+PGOXcuxcFUhRF0p5YG8r1cDfzp1rYpIixotmDFsBsv3LGfSykk0e60Z59U5\nj16Ne9GiRgsqR1bmeOZxFictZs62Obw58E26NOxSrMxzz4XFi1Wy3Zw58MorKsmuKDUq1SA1K5Xs\n3GwiwiKYOhVatIDGjc3dQ3GKoaT6Ub7n2Biuald5jXKaie0POxVfeaQsOZ+dzGP4rxCiHeAFdgK3\nO3itCkuXhl3o0rAL7/zjHRbvXsyyPctYvmc56TnpVI6oTPdG3Zk4YKLf1URRGjaEn3+GDz5QJTpa\ntlSmpcGDVS8JgDBPGLVjapOclsxfKxty333w3Xfmx+6VXsMJq3p15czOv16gOOV8tnMitgs3vclC\naJ+RnQ56I0LufC5Pmc9Sypuckq3xJTo8mn7N+tGvWb+gZXk8cOutcMMNMH06fPQR3HabWlGcc44y\nO3ljm5FwzV/kbG3IJ59A9+7mr5OZk2noH6lVS5XpaOS/BqEPZWGSCeWqw1i2xfMcfMt3Y4ixWexM\ncNOZzxrXER0Nw4apn8xM+OMP2LEDUlKgbU5Xmty0hEnD+xJu8bcoPTudmAjf5If4eBXlZJaif5C5\nuf79JMXhpCPTzogis9ipZNxaVyhYdIKbQpfd1gREdDT06KH8D/ffD49cPZAVad8QFmb9DykjJ4NK\nEb51ONq0gXXrzMky+kPMzlYlRsxgW1SKy5KlSmOiCnVZ8WApj
Qq0wVJeEtw05ZQ+Z/UhLTuN+Tvm\nW5ZxLPMYVaOq+mzv3Bl++828vKK9cdPSzGdjg7PRKW6KkrKTkK6EHM5jCLXprijlIcFNU07xCA8T\nLpnAXbPv4ljmMUsyko4n0TjON5Rp0CClGMyUAy86maenqzpQZlcMhrJtCp+1fH27/Cc2has6+ZYf\n6jwGp7Az87k00IpBY5nLz7mcgS0GctVXV5GalWr6/M2HN9Oium+JjipV4Prr4bnnzMkrOGEdPw5x\nFnLtbev57HBMvF0lOiz7GBw0u4TUx2DDJFweMp+1YtAExauXvso5Nc+h14e9+POguVSVn3b8xEVN\nLjLcN3YsfP01zA/QUlV0ovr7b/N5Ff5khbKIXlmoJQShzzS2AzuVeVlv7akVgyYowjxhTLpsEmM6\nj6HPx3146KeHOJRWsg3op+0/kZKeQkLTBMP98fHwxRdw3XWQmBjYWApOMuvXY6mhkRszjZ2KSgp1\nuGpZCDEGdyli7XzWlBmEENza4VbW3L6G9Ox0zn7jbK779jpmbJ7BkYwjhY7Nzs3m83Wfc8O0G5gy\neIphraR8EhLgyy9Vot2LL0JOjv8xVIuuVsjXMWMG9A+0YEsBjLKx7ay3ZKXLnRGhnvTcVnbbTXkM\n/gj1/5kZtGLQ2EaDqg1467K32HbvNro36s6bK96k6cSmNHq1EZ2mdOKCdy6g5n9r8u4f7zJj2IyA\nkvH69oWVK1U29vnnK/NSdrbvcXVj63LgpKrAMm+eWjFcfbX5e8j15hImCisrIcDrNS+r6Jus12te\nMbgtF6I4ytLEV9rYmflcGugEN43t1IqpxT2d7+GezveQnZvNvhP7OHDyAJFhkTSt1tSwWVBxNG4M\nP/0Ec+eqlcO998I116hS4RdcoMp41K5Ul7V/72H8HHjzTfjmm8BblRYkV+b6rGJq1FBJfWYweqO2\nohj8ybIDO+ebUEdu2YUd49GZzxpNCUSERdCkWhOaVGsSlBwhYOBA9bNlC8ycCe++C5s3w549IFte\nQFSvt7hBwtKlqqifFTKyM6gUXlij1K0LK1YENXzAumIwwq6CdZZae7osO9i2vth2dgEs48sebUrS\nlDnOOQceekitIHbuVL6HAyu6EXnWcl57K8OyUgBIy07z6VnRsCHsttA0r+iElZ2N6fIhTlVoDRan\nwlXdmMfgpgQ37XzWaExQO7Y2nRt05ttN3wYlJ+1Umk/9po4dlZ/DjJ/BaOK00hfbnyw7cIMpyWlZ\nbqGsPR+tGDTlhod7PMzTC58mIzvDsoyDaQepHVO70La6daF27eDNSVYVgxFuq9RqFTflMYD7Etyc\nkBcIWjFoyg0XN7uY7o26c9v3t1myg6dmpZKZk2nYu+Kmm+Cdd8zJK/oHffKkPYoh1FFJ5TWPwY0J\nbjrzWaOxgcmDJ7P9yHZGzhhZqP1oIKzYu4J2ddsZTqJ336262i1fHpgsoz/offugfn1TQ3Jl7wNw\nruSHG/MYQr2qCgVaMWjKFTERMcy7cR6pWal0ntKZ1ftXB3zuVxu+YtDZgwz3VasGkybBkCGQlGR+\nXFKqMh1NLARnuanaq9Ny7JblJrTzWaMJIbGRsXx77bfc1+U+Bn4+kKHfDGVJ0pJizUvzts9j5paZ\n3N7Rfwfaq66Cf/0LevaE1SXom6jwKDJzMk9/37wZKldW/opgKQ91ifJxWx6DU5Q1xanzGDTlEiEE\nI9uPZMh5Q3hn5TuMmjGKHG8O/Zv1p0vDLjSOa0yEJ4L9J/czZ9scZm+dzdQhU6kdW7tYufffD/Xq\nwaWXwl13wYMPqgm/KHVi65B8Mvn09/fes5aJ7ZVeW/wDdtqqbbPpO5zHYEmWyxLcnJAXCEEpBiHE\nP4FxQCugk5RyVYF9jwGjgBzgPinlT8FcS6OxQuXIyjzY/UH+3e3frDmwhsSdiczfMZ+9J/aSnZtN\nrZha9GrciwmXTKBGpRoByRw6VPW4fuQRaN4cbr9dOacL5k/Ex8aTnKYUw9Sp8PnnsGqVH4HFkJ2b\nTYSncFOJ6GgV4RQsXq99ZbdDHetvR16FGxPcymrm83rgKuDdghuFEK2Aa1EKoyEwXwhxtqwo60aN\n6xBC0L5ee9rXa2+LvEaNVPXXDRtgyhTV9jQuTimMpk0hLLYxW9N20aVbDkcOhzN7tnnHM0C2N5uI\nsMKKoW5dSE72c0IxFP3zy8mxp5GR3dgxF5Y1002glAkfg5Ryi5RyK/isU68AvpRS5kgpdwJbgc7B\nXEujcSPnnQcTJ8L+/ao+U48e6k386P444kQjbnxwPZs2QYcO1uQbrRhq1oRjx+CUiaArozfPnBzz\nmdhgs+nGhX4Pp3BbwEBxOOVjaAAsLfB9b942jaZc4vFA27bqJ5+cOf05XGkG4eHWVyknTp2gcmRh\nJ0ZYGDRrpmpGnX++ZdGWFQPYEyllV7SVnYYIpxSVm1YdgVDir4UQYh4QX3ATIIEnpJSz7BjEuHHj\nTn9OSEggISHBDrEaTUi5pf0tDPpiEA92f9Bncg+U45nHqRZdzWd7166weHFwiiEryz5TUqhNN65S\nVA5lPicmJjJvXiLZ2VBgynSEEhWDlNJCuxP2Ao0KfG+Yt82QcU7fpUYTAi6oewEJTRN4csGTTBww\n0ZKMvSf20ql+J5/tgwfDW2/BHXcELqvohJWaaq0vdlnAjYrK0rULyElISKBLlwReeUUphvHjx9ty\nDSPszGMo+CRmAsOEEJFCiLOAFsDvNl5LoykTvD7gdb7/63veXvG2pfO3pGyhRQ3fcrGDB8O2bWrV\nEAhGE9WxY9YUg62mGx2PYooy4XwWQlwphEgCugLfCyHmAEgpNwJfAxuB2cBdOiJJUxGpGVOTH2/4\nkVeWvsLD8x42VabjRNYJ1hxYQ5eGXXz2RUXByy/DrbeqN38r7NqlmiBZwU3hoW7LY/Av2x45rq+u\nKqWcLqVsJKWsJKWsJ6UcWGDfC1LKFlLKVjqHQVORaV6jOUtuWcLmw5vpOLkjs7fODmgCevP3NxnQ\nYoBf/8SwYdCnD1xxBRw/XrwsIQRe6S103Q0bVG8LOwi56cYhRWUXZc35rEtiaDSlQJ3YOswYNoPx\nCeN5aN5DdJjcgdeXv86uY7t8js3x5vDuyneZuHwiL/Z7sVi5b7wBrVur/IniynSEe8KJiYghNUst\nL7KyYP58pVjKI6FMuKvwmc8ajSZwhBBc3epqrjz3Shb+vZCP1n7Ec4ueIyosihY1WlAzpiapWams\n2r+KVrVasXDEQprXaF6szLAw1eP6889VmY6rr4YnnlAJeEXJz8aOi47jzTehfXuVjGcWI9ONxwO5\nufbIchuhLHNeVjOfNRqNSTzCQ79m/ejXrB9SSrYf3c7OYzs5knGE2IhY2sa3pVGcwczuByHghhtU\nP+z//lflUnTtCv/8J/TurXIePB6IrxzPur/389GElnz0UeCO64JIKf0m3R0+bE6WG/MY3E5p3apW\nDBpNCBFC0KJGC8PI
I7PUrAkvvQRjx8K0aTB7tgprPHgQatWCw93OZ8RnaxjerDe//656WZslV+bi\nER7CPGGFtsfHWyvTURQrSXcS6dochEKyy5DzWSsGjaacEROjVhA33KC+Z2bCoUMwfWdXftn/Pe9d\ne59l2Vk5WUSGRfpsr1/fWp+KomRnm0+6k1IaFvULZQc3H9na+azRaNxEdLTyOVx/4eXM2zGPlPQU\ny7KOZx0nLto3+aFDB/jjD/PyipqBrCgGcLYFptlJvSz4TUpCKwaNpoJQo1INhp03jGd/fdayjJT0\nFKpHV/fZ3rMnLFxozQFdkIwMqFTJ3DlunIht7X8RgvvTikGjqUA80/cZvtrwFXO2zrF0/t/H/uas\n6mf5bG/ZUpmTZs8OXJZRxM2xY6qNqhmMTElquzk5+bKcworoogqmTGQ+azSaskWd2DpMHTKVm6bf\nxOytJmbxPFbvX815tc8z3PfEE/DooypHwgrp6WrFERtr7jwj53MoQ0yNZbtTlj+0YtBoKhg9Gvdg\n1vBZ3DLzFh6d/2ih3tTFIaVk2uZpDGgxwHD/FVeoZLs77lA9KcyyfbsKrTVt0/ezYrADt606Sgut\nGDSaCkjXhl1Ze8dath/dztlvnM2kFZM4kXWi2HMmrZxEuCecPk2N06WFgA8/VH0ibr5ZRUMVR6Xw\nSqRlp53+vmyZSrqzgpucz+qcMhaGVAQdrqrRVFDyzUq/7/2dF357gcd+foxLml9Cn6Z9aFe3HfWr\n1CdX5rLtyDa+WP8FC3cuZN6N84qd9CpXVqU2Ro6Ejh1h8mTV1c6IWjG1TkdISQmffgp3323+Pvw5\nZ8vBiztQeAWiE9w0Gk2p0LlBZ74b+h0H0w7yw18/8MuuX/ho7UfsO7GPcE84jeMac2nzS3n10lep\nXsk3IqkoMTHw5Zfq57rrVPvTe+6BSy4pnLxWO6Y2h9IPIaWqFHvsGFxzjfnxG5mSYmNhzx4LslyW\n4GakhHWCm0ajKTXqxNZhZPuRjGw/MmhZQsDw4ap20yefwDPPKPNS795w4YUq63pf2ln8fHAZi8fB\niRMwc6a1VqPZXt8SHfHxsGqVyTHrBLfTaMWg0WgcIyoKRo9WP7t3wy+/wNq1qhLsycgO7Drr33z6\nkGTwYEFYWMnyjMjMySQqPKrQtjp17CnRkZtrrURHWUcrBo1GUyo0bgw33qh+AKRsTvPXI2jadS1h\nYWS+ZYYAAAwxSURBVO0sy03NSiUuqnA2dvPmygkeLKdOuS8TuzTQUUkajSYkCCG4se2NvLH8jaDk\npGalUjWqaqFtzZqpLOq9fjvNG1M01PTUKYj0LQ1VqhRcgegEN41GU+55oNsDfL/1e5YmLbUs42Da\nQWrH1i60TQhVhvybbwKXY+ToPXECqlSxPLRC2JX57PoENyHEP4UQfwohcoUQHQpsbyKESBdCrMr7\nsdYJXaPRlGuqRVfjvcHvMWTqELYd2Wb6/KycLPak7uGsar5lOm67DV57zXomNlgr0WGEnR3lXK8Y\ngPXAVcAvBvu2SSk75P3cFeR1NBpNOWXwOYN5uvfT9P6oN4t2LTJ17tI9S2lTpw0RYb6OgF69VNOi\nsWOtjy0pSdWAMoPT9ZZcH64qpdwCIIwzXsq290Wj0ZQat3W8jYZVG3LtN9dybetrear3U9SKqVXi\neW+teIvhbYb73T9lCnTurBzfd5l8PZUSNm1SBQLN4JVePMIZK31ZWTEUR9M8M9JCIURPB6+j0WjK\nAYPOHsTaO9bilV7OefMc7vz+TpbvWU6u17eWd443h2d/fZb1yeu548I7/MqsXRt+/hlefRXuv185\npP0RFRZFenb66e+bNql/mzUzdx92K4aimc+uWDEIIeYB8QU3ARJ4Qko5y89p+4DGUsqjeb6H6UKI\n1lLKk0GPWKPRlFvqxNbhjUFv8GjPR/lg9QeMmjmKAycP0Kl+J5pVb0ZUWBQH0w/yy85fOLfWucy/\naT6xkcWXY23WTNVhuusuuOAClWw3ZIjqg12QupXrcuDkgdPfn39elfaw0qinqGIID1dNiMxS1Bjj\nGsUgpexvVqiUMhs4mvd5lRBiO9ASMMxFHDdu3OnPCQkJJCQkmL2kRqMpRzSo2oCnej/FU72fYm/q\nXtYcWMPfx/4mOzebC+pewJO9nqRV7VYBy6tZE776Cn76CZ58Eh57DEaMgMsug3bt1MRdr0o99p/c\nz8GDkvHjBWvWwKRJ5see6831UQw1a0KK9cZ5ACQmJvLDD4lkZKhe3k4i7HCUCCEWAg9KKf/I+14L\nOCKl9AohmqGc0+dLKY8ZnCvLQ5lajUZTNpBSlcv45BNV8G/3buWDqFoVVl7UgMjPF3HDZc148UWo\nXnJpKB/mbJ3D67+/zpzrzzRDklLVkEpJUf8GSvz/xbPujnXEV1ZGm4MHVe2pQ4fUakJK6cj6ISjn\nsxDiSuANoBbwvRBijZRyIHAR8B8hxCnAC9xupBQ0Go2mtBFCVX7t2FF9P35cKYfUVBi3sRvDZ/zG\nqI4mHQsFSM9OJyai8OwvhIpuSkqCc86xPvbcXCyXDjFDsFFJ04HpBtunAdOCka3RaDSlQVwcnH++\n+jyy6tV8vPZjRnW8ybI8I8UAymS1Zo15xVAw87m0MrF15rNGo9HkcXWrq9lwcANLkpZYlpGSkUL1\naF8bVPfuKkLKDEUzn+3MxC4OrRg0Go0mj+jwaCZcMoGRM0ZyLNOa9XvXsV00jmvss33IEJg2DU4G\nEZuZmqp8IU6jFYNGo9EUYGiboQxsMZBBnw/iSMYR0+evTV5Lq1q+EVONG8OAAfDSS9bHdvSoPSU6\nSkIrBo1GoynCK5e+Qs/GPek0pRO/7vo14PNS0lP4Y/8f9GxsnNP70kvw3nswb561cW3bZj7hzgpa\nMWg0Gk0RPMLDf/v/l4mXTmT4t8O5duq1rE9eX+w5UkqeWvgU17a+lrjoOMNjGjRQ+RTXXQfffx/Y\nOApmfq9eDW3amLoVS2jFoNFoNH4YfM5g/rrnLy6sfyGXfnYpnaZ04sXfXmTx7sWkpKcgpSQjO4Ml\nSUsY+s1QFict5uVLXi5W5kUXqTamd94JDzyg/Ab+qBlTk5QMlRl38iT88IMqJ+40tiS4BTUAneCm\n0WjKALneXObvmM+cbXNYtHsRO47uIDUrlXBPOK1rt2ZI6yHc2+VeKkdWDkheSgr8+98wd66q4zRy\npOpVXZD+n/bnoe4P0a/pJdxxB2Rmwqefqn1OJrhpxaDRaDQWyfHmEO4JrkPyunWqb8S0adC+PfTt\nC61bQ716MH7dCGql9WLrV7cSHQ2zZp2JStKKQaPRaMo5J0/CL7/AwoWwdSskJ8PBs14nov4Gnu/+\nLldeWTjrWSsGjUajqYCsPbCWK7+6ku33bvcpzOekYtDOZ41Go3EpbePbUjWqKj9u+7FUr6sVg0aj\n0bgUIQTjeo/jwXkPkpmTWXrXDbUZR5uSNBqNxj9SSoZ/O5wwTxifXPkJYR7la
NCmJI1Go6mgCCH4\n4IoP2H9iP0O/GcqJrBOOX1MrBo1Go3E5MRExzLl+DtWjq9P2nbbM3jrb0etpU5JGo9GUIWZvnc2/\nf/o3m+/ZrMNVNRqNRqPI8eYQERahfQwajUajUQSbbV0SWjFoNBqNphBBKQYhxH+FEJuEEGuEEN8K\nIaoW2PeYEGJr3v5Lgh+qRqPRaEqDYFcMPwHnSSnbAVuBxwCEEK2Ba4FWwEDgbSGEI7aw8kRiYmKo\nh+Aa9LM4g34WZ9DPonQISjFIKedLKb15X5cBDfM+Xw58KaXMkVLuRCmNzsFcqyKgf+nPoJ/FGfSz\nOIN+FqWDnT6GUUB+cG0DIKnAvr152zQajUbjckp0bQsh5gEF20cIQAJPSCln5R3zBJAtpfyfI6PU\naDQaTakRdB6DEOJmYDTQV0qZlbftUUBKKV/K+z4XGCulXG5wvk5i0Gg0Ggu4MsFNCDEAmABcJKVM\nKbC9NfA50AVlQpoHnK0z2TQajcb9BJsl8QYQCczLCzpaJqW8S0q5UQjxNbARyAbu0kpBo9FoygYh\nL4mh0Wg0GncR0sxnIcQAIcRmIcRfQohHQjkWJxBCNBRCLBBCbBBCrBdC3Ju3vboQ4ichxBYhxI9C\niLgC5xgmBgohOggh1uU9q4mhuB87EEJ4hBCrhBAz875XyGchhIgTQkzNu7cNQoguFfhZPCCE+DPv\nPj4XQkRWlGchhHhfCJEshFhXYJtt9573LL/MO2epEKJxQAOTUobkB6WUtgFNgAhgDXBuqMbj0D3W\nBdrlfa4MbAHOBV4CHs7b/gjwYt7n1sBqlImvad7zyV/VLQc65X2eDVwa6vuz+EweAD4DZuZ9r5DP\nAvgIGJn3ORyIq4jPAqgP7AAi875/BYyoKM8C6Am0A9YV2GbbvQN3Am/nfR6Kyi8rcVyhXDF0BrZK\nKXdJKbOBL4ErQjge25FSHpBSrsn7fBLYhEoCvAL4OO+wj4Er8z4bJgYKIeoCVaSUK/KO+6TAOWUG\nIURDYBDwXoHNFe5Z5JWO6SWl/BAg7x6PUwGfRR5hQKwQIhyohMp7qhDPQkr5G3C0yGY7772grG+A\nfoGMK5SKoWgS3B7KcRKcEKIp6s1gGRAvpUwGpTyAOnmH+UsMbIB6PvmU1Wf1KvAQKg8mn4r4LM4C\nDgshPswzq00WQsRQAZ+FlHIfKrJxN+q+jksp51MBn0UB6th476fPkVLmAseEEDVKGoCurloKCCEq\no7T1fXkrh6Ie/3IfASCEuAxIzltBFRd7Xe6fBcoU0AF4S0rZAUgDHqVi/l5UQ73VNkGZlWKFENdT\nAZ9FMdh57wHlPYRSMewFCjpCGuZtK1fkLY+/AT6VUs7I25wshIjP218XOJi3fS/QqMDp+c/E3/ay\nRA/gciHEDuB/QF8hxKfAgQr4LPYASVLKlXnfv0Upior4e3ExsENKeSTvjfY7oDsV81nkY+e9n94n\nhAgDqkopj5Q0gFAqhhVACyFEEyFEJDAMmBnC8TjFB8BGKeVrBbbNBG7O+zwCmFFg+7C8SIKzgBbA\n73nLyeNCiM5CCAHcVOCcMoGU8nEpZWMpZTPU//UCKeWNwCwq3rNIBpKEEC3zNvUDNlABfy9QJqSu\nQojovHvoh8p/qkjPQlD4Td7Oe5+ZJwNgCLAgoBGF2CM/ABWpsxV4NJRjcej+egC5qIir1cCqvHuu\nAczPu/efgGoFznkMFW2wCbikwPaOwPq8Z/VaqO8tyOfSmzNRSRXyWQAXoF6O1gDTUFFJFfVZjM27\nr3UoR2lERXkWwBfAPiALpSRHAtXtuncgCvg6b/syoGkg49IJbhqNRqMphHY+azQajaYQWjFoNBqN\nphBaMWg0Go2mEFoxaDQajaYQWjFoNBqNphBaMWg0Go2mEFoxaDQajaYQWjFoNBqNphD/D1Hk2jNc\n0tsWAAAAAElFTkSuQmCC\n",
254 | "text/plain": [
255 | ""
256 | ]
257 | },
258 | "metadata": {},
259 | "output_type": "display_data"
260 | }
261 | ],
262 | "source": [
263 | "plt.plot(w_array[:, 6])\n",
264 | "plt.plot(w_array[:, 5])"
265 | ]
266 | },
267 | {
268 | "cell_type": "markdown",
269 | "metadata": {},
270 | "source": [
271 | "### Second Edition"
272 | ]
273 | },
274 | {
275 | "cell_type": "markdown",
276 | "metadata": {},
277 | "source": [
278 | "##### 9.1 \n",
279 | "$(N+1)^d$ since there are $N+1$ polynomial functions for each dimension $d$. Each basis function is a product of all states, thus there are $(N+1)^d$ functions."
280 | ]
281 | },
282 | {
283 | "cell_type": "markdown",
284 | "metadata": {},
285 | "source": [
286 | "##### 9.2\n",
287 | "\n",
288 | "$N=2$, $c_i$ are just the exponents on state 1 and 2 for each feature."
289 | ]
290 | },
291 | {
292 | "cell_type": "markdown",
293 | "metadata": {},
294 | "source": [
295 | "##### 9.3\n",
296 | "\n",
297 | "Because $N+1$ integer vectors can be assigned to each dimension $d$."
298 | ]
299 | },
300 | {
301 | "cell_type": "markdown",
302 | "metadata": {},
303 | "source": [
304 | "##### 9.4\n",
305 | "\n",
306 | "Same as 8.7 above."
307 | ]
308 | },
309 | {
310 | "cell_type": "markdown",
311 | "metadata": {},
312 | "source": [
313 | "#### 10.1\n",
314 | "\n",
315 | "Why have we not considered Monte Carlo methods in this chapter?\n",
316 | "\n",
317 | "Monte Carlo methods only update state-action value functions at the end of each episode. Semi-gradient methods are necessary for tasks such as the Mountain Car example, since the episodes take a long time to end."
318 | ]
319 | },
320 | {
321 | "cell_type": "markdown",
322 | "metadata": {},
323 | "source": [
324 | "##### 10.2\n",
325 | "Give pseudocode for semi-gradient one-step Expected Sarsa for control.\n",
326 | "\n",
327 | "\n",
328 | "Similar to n-step SARSA, but:\n",
329 | "- For each episode:\n",
330 | " - Initialize and store $S_0$\n",
331 | " - Select and store an action $A_0$, $\\epsilon$-greedy wrt to $\\hat{q}(S_0, \\theta)$\n",
332 | " - For each step $t$ in episode:\n",
333 | " - Take action $A_t$\n",
334 | " - Observe and store next reward as $R_{t+1}$ and next state as $S_{t+1}$\n",
335 | " - $G_t \\leftarrow R_{t+1} + \\gamma V_{t+1} = R_{t+1} + \\gamma \\sum_a \\pi(a|s_t) Q_{t-1}(s_t, a)$\n",
336 | " - $\\theta \\leftarrow \\theta + \\alpha [G_t - \\hat{q}(S_t, A_t, \\theta)] \\nabla \\hat{q}(S_t, A_t, \\theta)$\n",
337 | " - Select and store $A_{t+1}$, $\\epsilon$-greedy wrt to $\\hat{q}(S_{t+1}, \\theta)$"
338 | ]
339 | },
340 | {
341 | "cell_type": "markdown",
342 | "metadata": {},
343 | "source": [
344 | "##### *11.1\n",
345 | "\n",
346 | "Convert the equation of n-step off-policy TD (7.7) to semi-gradient\n",
347 | "form. Give accompanying definitions of the return for both the episodic and continuing\n",
348 | "cases.\n",
349 | "\n",
350 | "$\\theta_{t+n} = \\theta_{t+n-1} + \\alpha \\rho^{t+n}_t[G_t^{(n)} - \\hat{q}(S_t, A_t, \\theta_{t+n-1})] \\nabla\\hat{q}(S_t, A_t, \\theta_{t+n-1})$,\n",
351 | "\n",
352 | "where $\\rho_t^{t+n}$ is the importance sampling ratio ( is the relative probability under two policies of taking the n actions), and\n",
353 | "\n",
354 | "$G_t^{(n)} = R_{t+1} + ... + \\gamma^{n-1} R_{t+n} + \\gamma^n \\hat{q}(S_t, A_t, \\theta_{t+n-1})$, (episodic)\n",
355 | "\n",
356 | "$G_t^{(n)} = R_{t+1} - \\bar{R}_{t} + ... + R_{t+n} - \\bar{R}_{t+n+1} + \\hat{q}(S_t, A_t, \\theta_{t+n-1})$, (continuing)"
357 | ]
358 | },
359 | {
360 | "cell_type": "markdown",
361 | "metadata": {},
362 | "source": [
363 | "##### **11.2\n",
364 | "\n",
365 | "Convert the equations of n-step Q(σ) (7.9, 7.13, 7.14, and 7.15)\n",
366 | "to semi-gradient form. Give definitions that cover both the episodic and continuing\n",
367 | "cases.\n",
368 | "\n",
369 | "The semigradient update step would be:\n",
370 | "\n",
371 | "$\\theta_{t+n} = \\theta_{t+n-1} + \\alpha (1 - \\sigma_{t+1} + \\sigma_{t+1} \\rho_{t+1}) (1 - \\sigma_{t+2} + \\sigma_{t+2} \\rho_{t+2}) \\cdot \\cdot \\cdot (1 - \\sigma_{t+n-1} + \\sigma_{t+n-1} \\rho_{t+n-1}) [G_t^{(n)} - \\hat{q}(S_t, A_t, \\theta_{t+n-1})] \\nabla\\hat{q}(S_t, A_t, \\theta_{t+n-1})$,\n",
372 | "\n",
373 | "where $\\rho$ is defined in 7.15, and 7.14 is the n-step return for the episodic case. For the continuing task, $\\delta$ in 7.15 would need to be:\n",
374 | "\n",
375 | "$\\delta_t = R_{t+1} - \\bar{R_t} + \\gamma [\\sigma_{t+1} Q_t(S_{t+1}, A_{t+1}) + (1 - \\sigma_{t+1}V_{t+1})] - Q_{t-1}(S_t, A_t)$\n",
376 | "\n",
377 | "\n"
378 | ]
379 | },
380 | {
381 | "cell_type": "code",
382 | "execution_count": null,
383 | "metadata": {
384 | "collapsed": true
385 | },
386 | "outputs": [],
387 | "source": []
388 | }
389 | ],
390 | "metadata": {
391 | "kernelspec": {
392 | "display_name": "Python 2",
393 | "language": "python",
394 | "name": "python2"
395 | },
396 | "language_info": {
397 | "codemirror_mode": {
398 | "name": "ipython",
399 | "version": 2
400 | },
401 | "file_extension": ".py",
402 | "mimetype": "text/x-python",
403 | "name": "python",
404 | "nbconvert_exporter": "python",
405 | "pygments_lexer": "ipython2",
406 | "version": "2.7.11"
407 | }
408 | },
409 | "nbformat": 4,
410 | "nbformat_minor": 0
411 | }
412 |
--------------------------------------------------------------------------------