2019 edition - Gym intro, Genetics, CEM, Tabular, DQN

#### 0. Gym interface
- `00-gym.ipynb` [Open in Colab](https://colab.research.google.com/github/Scitator/rl-teaser/blob/master/2019/code/00-gym.ipynb)
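
As a quick orientation, here is a minimal sketch of the agent-environment loop that a Gym intro typically covers. It is not the notebook's code: `CorridorEnv` and `run_episode` are hypothetical names, and the toy environment only mimics the classic Gym convention from around 2019, where `reset()` returns an observation and `step(action)` returns `(observation, reward, done, info)`. Newer Gymnasium releases return a 5-tuple from `step` and `(observation, info)` from `reset`, so the loop would need adapting there.

```python
class CorridorEnv:
    """Hypothetical toy environment (not part of Gym) with a classic Gym-style API.

    The agent starts in cell 0 of a short corridor and is rewarded for
    reaching the last cell. Actions: 0 = left, 1 = right.
    """

    def __init__(self, length=5, max_steps=20):
        self.length = length
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        # Start a new episode and return the initial observation.
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action):
        # Apply the action, then report (observation, reward, done, info).
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        self.steps += 1
        reached_goal = self.position == self.length - 1
        reward = 1.0 if reached_goal else 0.0
        done = reached_goal or self.steps >= self.max_steps
        return self.position, reward, done, {}


def run_episode(env, policy):
    """Roll out one episode with `policy(obs) -> action`; return the total reward."""
    obs = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(obs)
        obs, reward, done, _info = env.step(action)
        total_reward += reward
    return total_reward
```

For example, `run_episode(CorridorEnv(), lambda obs: 1)` rolls out the "always move right" policy, which reaches the goal cell and ends the episode there.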


#### 1. Genetic algorithm
- [slides](./2019/slides/01-genetics.pdf)
- `01-genetics.ipynb` [Open in Colab](https://colab.research.google.com/github/Scitator/rl-teaser/blob/master/2019/code/01-genetics.ipynb)

##### Additional materials
* __[recommended]__ - awesome OpenAI post about evolution strategies - [blog post](https://blog.openai.com/evolution-strategies/), [article](https://arxiv.org/abs/1703.03864)
* Video on genetic algorithms - https://www.youtube.com/watch?v=ejxfTy4lI6I
* Another guide to genetic algorithms - https://www.youtube.com/watch?v=zwYV11a__HQ
* PDF on differential evolution - http://jvanderw.une.edu.au/DE_1.pdf
* Video on the ant colony algorithm - https://www.youtube.com/watch?v=D58nLNLkb0I
* Longer video on the ant colony algorithm - https://www.youtube.com/watch?v=xpyKmjJuqhk
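
To make the topic concrete, the following is a minimal sketch of a genetic algorithm, assuming a toy OneMax objective (maximize the number of 1-bits in a bit string). The function names, hyperparameters, and the choice of truncation selection with elitism are illustrative assumptions, not the notebook's implementation.

```python
import random


def fitness(genome):
    # OneMax toy objective (assumed for illustration): count the 1-bits.
    return sum(genome)


def mutate(genome, rate, rng):
    # Flip each bit independently with probability `rate`.
    return [1 - g if rng.random() < rate else g for g in genome]


def crossover(a, b, rng):
    # Single-point crossover: prefix of one parent, suffix of the other.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def evolve(genome_len=20, pop_size=30, generations=40, elite=5, rate=0.05, seed=0):
    """Return the best genome found and the best fitness seen in each generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        best_per_generation.append(fitness(population[0]))
        # Truncation selection: the top `elite` genomes survive unchanged
        # and act as parents for the rest of the next generation.
        parents = population[:elite]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rate, rng)
            for _ in range(pop_size - elite)
        ]
        population = parents + children
    population.sort(key=fitness, reverse=True)
    best_per_generation.append(fitness(population[0]))
    return population[0], best_per_generation
```

Because the elites are copied over unchanged, the best fitness in this sketch can never decrease from one generation to the next; dropping elitism trades that guarantee for more exploration.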


#### 2. Cross Entropy Method
- [slides](./2019/slides/02-cem.pdf)
- `02-cem.ipynb` [Open in Colab](https://colab.research.google.com/github/Scitator/rl-teaser/blob/master/2019/code/02-cem.ipynb)

##### Additional materials
* __[main]__ Video intro by David Silver - https://www.youtube.com/watch?v=2pWv7GOvuf0
* Optional lecture by David Silver - https://www.youtube.com/watch?v=lfHX2hHRMVQ
* __[recommended]__ - formal explanation of the cross-entropy method in [general](https://people.smp.uq.edu.au/DirkKroese/ps/CEEncycl.pdf) and for [optimization](https://people.smp.uq.edu.au/DirkKroese/ps/CEopt.pdf)
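
The optimization variant of the cross-entropy method can be sketched in a few lines. This is a minimal sketch under stated assumptions: a one-dimensional objective, a Gaussian sampling distribution, and hypothetical names and hyperparameters (`cem_maximize`, `objective`, `n_elite`, and so on). In an RL setting the samples would be policies or sessions and the objective would be the episode return.

```python
import random
import statistics


def cem_maximize(objective, mean=0.0, std=5.0, n_samples=50, n_elite=10, n_iters=30, seed=0):
    """Cross-entropy method for a 1-D objective with a Gaussian sampling distribution."""
    rng = random.Random(seed)
    for _ in range(n_iters):
        # 1. Sample candidates from the current distribution.
        samples = [rng.gauss(mean, std) for _ in range(n_samples)]
        # 2. Keep the elite candidates with the highest objective values.
        elites = sorted(samples, key=objective, reverse=True)[:n_elite]
        # 3. Refit the distribution to the elites.
        mean = statistics.mean(elites)
        std = statistics.pstdev(elites) + 1e-6  # small floor keeps std positive
    return mean


def objective(x):
    # Hypothetical toy objective with a single maximum at x = 3.
    return -(x - 3.0) ** 2
```

Calling `cem_maximize(objective)` should return a value close to 3. The elite fraction (here 10 of 50) controls how aggressively the distribution narrows: smaller fractions converge faster but are more prone to collapsing before the optimum is found.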


#### 3. Tabular
- [slides](./2019/slides/03-tabular.pdf)
- `03-tabular.ipynb` [Open in Colab](https://colab.research.google.com/github/Scitator/rl-teaser/blob/master/2019/code/03-tabular.ipynb)

##### Additional materials
* __[main]__ lecture by David Silver - [url](https://www.youtube.com/watch?v=Nd1-UUMVfz4)
* Alternative lecture by Pieter Abbeel: [part 1](https://www.youtube.com/watch?v=i0o-ui1N35U), [part 2](https://www.youtube.com/watch?v=Csiiv6WGzKM)
* Alternative lecture by John Schulman: https://www.youtube.com/watch?v=IL3gVyJMmhg
* Definitive guide to policy/value iteration from Sutton: start from page 81 [here](http://incompleteideas.net/sutton/book/bookdraft2017june19.pdf).
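
As a companion to the policy/value iteration material above, here is a minimal sketch of value iteration. The MDP is a hypothetical deterministic chain invented for illustration (it is not taken from the slides or the notebook): states lie on a line, the last one is terminal, and entering it pays a reward of 1.

```python
def value_iteration(n_states=4, gamma=0.9, tol=1e-8):
    """Value iteration on a hypothetical deterministic chain MDP.

    States 0..n_states-1 lie on a line; the last state is terminal.
    Actions move one cell left (-1) or right (+1); entering the
    terminal state pays +1, every other transition pays 0.
    """
    terminal = n_states - 1

    def step(state, action):
        # Deterministic transition model: returns (next_state, reward).
        next_state = min(max(state + action, 0), terminal)
        return next_state, (1.0 if next_state == terminal else 0.0)

    values = [0.0] * n_states
    while True:
        delta = 0.0
        for state in range(terminal):  # the terminal state keeps value 0
            # Bellman optimality backup: best one-step lookahead over actions.
            outcomes = (step(state, action) for action in (-1, 1))
            best = max(reward + gamma * values[nxt] for nxt, reward in outcomes)
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            return values
```

With the defaults, the values decay geometrically with distance from the goal: the state next to the terminal one is worth 1, the one before it `gamma`, and the one before that `gamma ** 2`.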


#### 4. DQN
- [slides](./2019/slides/04-dqn.pdf)
- `04-dqn.ipynb` [Open in Colab](https://colab.research.google.com/github/Scitator/rl-teaser/blob/master/2019/code/04-dqn.ipynb)

##### Additional materials
* Lecture by David Silver - [video part I](https://www.youtube.com/watch?v=PnHCvfgC_ZA), [video part II](https://www.youtube.com/watch?v=0g4j2k_Ggc4&t=43s)
* Alternative lecture by Pieter Abbeel - [video](https://www.youtube.com/watch?v=ifma8G7LegE)
* Alternative lecture by John Schulman - [video](https://www.youtube.com/watch?v=IL3gVyJMmhg)
* Blog post on Q-learning vs SARSA - [url](https://studywolf.wordpress.com/2013/07/01/reinforcement-learning-sarsa-vs-q-learning/)
* N-step temporal difference from Sutton's book - [suttonbook](http://incompleteideas.net/book/RLbook2018.pdf) __chapter 7__
* Eligibility traces from Sutton's book - [suttonbook](http://incompleteideas.net/book/RLbook2018.pdf) __chapter 12__
* Blog post on eligibility traces - [url](http://pierrelucbacon.com/traces/)
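
The Q-learning vs SARSA distinction mentioned above comes down to which next-state value the one-step target bootstraps from. The helper names below are hypothetical and the snippet is only a sketch of the two standard one-step targets, not the notebook's code. A DQN uses the Q-learning form, with `next_q_values` produced by a neural network instead of a table.

```python
def q_learning_target(reward, next_q_values, gamma=0.99, done=False):
    # Off-policy target: bootstrap from the greedy (max-value) next action.
    return reward if done else reward + gamma * max(next_q_values)


def sarsa_target(reward, next_q_values, next_action, gamma=0.99, done=False):
    # On-policy target: bootstrap from the action the agent actually takes next.
    return reward if done else reward + gamma * next_q_values[next_action]


def td_update(q_value, target, alpha=0.1):
    # Move the current estimate a fraction `alpha` toward the target.
    return q_value + alpha * (target - q_value)
```

When the next action happens to be the greedy one, the two targets coincide; they differ only on exploratory steps, which is why SARSA's estimates reflect the exploring policy while Q-learning's reflect the greedy one.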