# RL Winter Mentorship

This repository contains all the material/code required to get started with
the mentorship programme. A few points of administration:

1. The mentorship runs for around 5 weeks.

2. We assume you have some prior knowledge of programming.

3. For any help with the course, you can privately contact your mentor. A better option
   would be to [open an issue on this repository](https://help.github.com/en/github/managing-your-work-on-github/creating-an-issue),
   so that others can see your question, and it'll prevent any duplicated effort on the
   part of the mentor. All discussions related to code will happen over issues.

4. All your code will be pushed to GitHub, so if you haven't already, create
   a GitHub account. Create a **private** repository with the name
   `rl-winter-mentorship` and add **only your mentor** as a collaborator. The
   mentors' GitHub IDs are: `@squadrick` (Dheeraj), `@sahas00` (Sahas).

5. Create a `README.md` in your repository where you can keep track of your
   progress over the next month. The mentors will use the `README.md` as
   a progress tracker.

Don't be afraid to ask any questions (however irrelevant you think they may be).
The mentors are here to help you every step of the way.

For any issues with the GirlScript Winter Mentorship Programme, please contact
Arpith or Akshatha.

## Prerequisites

1. OS: Either a Linux-based OS (Ubuntu, Fedora, etc.) or macOS. Windows
   will not suffice. [Tutorial for installing Ubuntu](https://tutorials.ubuntu.com/tutorial/tutorial-install-ubuntu-desktop).

2. Language: We'll be using Python 3 throughout this course, so familiarise
   yourself with the language. Also learn to install packages using `pip`.

3. Libraries:

   a. [NumPy](https://numpy.org/): Used for matrix computations.

   b. [OpenAI Gym](https://gym.openai.com/): Has a host of training
      environments with an easy-to-use API (see the short sketch at the end of
      this section).

   c. [TensorFlow](https://www.tensorflow.org/) OR
      [PyTorch](https://pytorch.org/): Deep learning libraries that we'll use
      later on (weeks 4 and 5) in the course to train neural networks. The choice is
      left entirely up to the mentee, but you can contact your mentor to narrow
      down the choice.

4. Tools:

   a. Text Editor: You can use any editor of your choice. Recommendations: VSCode,
      Atom, Vim, Emacs. You can also use an IDE if you wish; PyCharm is [free for
      students](https://www.jetbrains.com/student/).

   b. `git`: You'll be using GitHub for all your code/assignment submissions,
      so learn the basics of `git`: `pull`, `push`, `add`, `commit`.
      [Here's](https://www.atlassian.com/git/tutorials) an excellent tutorial for `git`.

   c. [Google Colab](https://colab.research.google.com/): Free access to
      powerful GPUs for training your agents. This will be handy for weeks 4 and 5.
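
To sanity-check your installation, here's a minimal sketch of the Gym API: it
runs a single `CartPole-v1` episode with random actions. It assumes the classic
(pre-0.26) Gym interface, where `reset()` returns just the observation and
`step()` returns four values; newer releases differ slightly.

```python
import gym

# Minimal setup check: one episode of CartPole with random actions.
# Assumes the classic (pre-0.26) Gym API; newer releases return extra values.
env = gym.make("CartPole-v1")

obs = env.reset()
done = False
episode_return = 0.0
while not done:
    action = env.action_space.sample()           # uniformly random action
    obs, reward, done, info = env.step(action)   # observation, reward, done flag, debug info
    episode_return += reward

print("Episode return:", episode_return)
env.close()
```

If this prints a small positive return, your environment is ready.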
## Scope

The scope of this course will be rather narrow due to the time constraint, but we
hope you'll learn the foundations of reinforcement learning, which will
help you along the way when you decide to learn more advanced concepts.

1. Markov Decision Processes (MDPs): A formal mathematical framework for RL.

2. Tabular methods: Value iteration, policy iteration.

3. RL with function approximators: Building and training a perceptron from
   scratch to solve famous RL problems (CartPole, Mountain Car).

4. Imitation learning: You'll be competing against your peers to see who
   can perform the best in trying to imitate an expert controlling a robot.

5. Intro to Deep RL: Brief introduction to using deep learning with RL to create
   powerful general-purpose solvers.

## Resources

Since everyone prefers a different approach to learning, we'll try our
best to accommodate each style. Every topic has multiple levels of resources:

1. Intuitive: This will be a high-level, *hand-wavy* explanation of the concepts.
   This will not help you understand the core of the concept, but you will have a
   general understanding.

2. Code: If you prefer to learn by looking at code, we'll link open-source
   implementations of the algorithms (where appropriate).

3. Lectures: We'll link free online YouTube lectures.

4. Textbook: We'll link to chapters from this book -
   [Sutton and Barto](https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf) (SnB).

The recommendation would be to use either the Lectures or the Textbook to get a
solid grasp of the conceptual details, and to use the Code as a reference
during the assignments. Please note that we don't tolerate any plagiarism.

At the end of each week you will be given a set of deliverables to complete.
This could be either a report or a coding assignment. All submissions will
happen via GitHub.

## Detailed Breakdown

#### Week 0

Before the start of the course, we expect you to have completed all the administrative
work and prerequisites. Also, some *pop-sciency* knowledge never hurt anyone:

1. [Textbook] Chapter 1 from SnB: The Reinforcement Learning Problem

2. [Intuitive] [What is reinforcement learning?](https://deepsense.ai/what-is-reinforcement-learning-the-complete-guide/)

3. [Lecture] [David Silver Lecture 1](https://www.youtube.com/watch?v=2pWv7GOvuf0)

4. [Wikipedia article on RL](https://en.wikipedia.org/wiki/Reinforcement_learning)

#### Week 1

Mathematical foundation of RL - Markov Decision Processes.

1. [Textbook] Chapter 3 from SnB: Finite Markov Decision Processes

2. [Intuitive] [Reinforcement Learning Demystified: Markov Decision Processes](https://towardsdatascience.com/reinforcement-learning-demystified-markov-decision-processes-part-1-bf00dda41690)

3. [Lecture] [David Silver Lecture 2](https://www.youtube.com/watch?v=lfHX2hHRMVQ)

Deliverables:

1. Solve Exercises 3.1, 3.2 and 3.3 from SnB (Page 85). Write a report using
   [Markdown](https://www.markdownguide.org/getting-started/) or Google Docs. Please
   keep the answers as brief as possible; you won't be assessed on the length of
   the report.

#### Week 2

Tabular methods

1. [Textbook] Chapter 4 from SnB: Dynamic Programming

2. [Intuitive, Code] [Medium article](https://medium.com/@m.alzantot/deep-reinforcement-learning-demysitifed-episode-2-policy-iteration-value-iteration-and-q-978f9e89ddaa)

3. [Lecture] [David Silver Lecture 3](https://www.youtube.com/watch?v=Nd1-UUMVfz4)

Deliverables:

1. Solve all environments from [`gym-gridworlds`](https://github.com/podondra/gym-gridworlds)
   using value and policy iteration, using **only** NumPy (a starter sketch follows).
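
To give a feel for the shape of the solution, here's a minimal value-iteration
sketch over a tabular MDP. It assumes you've already extracted the dynamics
into NumPy arrays `P` (transition probabilities) and `R` (expected rewards);
how you build those from `gym-gridworlds` is part of the exercise.

```python
import numpy as np

def value_iteration(P, R, gamma=0.99, tol=1e-8):
    """Tabular value iteration.

    P: (S, A, S) array, P[s, a, s_next] = transition probability.
    R: (S, A) array, R[s, a] = expected immediate reward.
    Both are assumed to be extracted from the environment beforehand.
    """
    n_states, n_actions, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Bellman optimality backup: Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] V[s']
        Q = R + gamma * (P @ V)            # shape (S, A)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    policy = Q.argmax(axis=1)              # greedy policy w.r.t. the final Q
    return V, policy
```

Policy iteration follows the same pattern, alternating policy evaluation with
greedy policy improvement instead of the single `max` backup.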
#### Week 3

Function Approximators

1. [Textbook] Chapters 9, 10 and 11 from SnB
2. [Lecture] David Silver: [Lecture 6](https://www.youtube.com/watch?v=UoPei5o4fps), [Lecture 7](https://www.youtube.com/watch?v=KHZVXao4qXs)
3. [Code] [OpenAI Spinning Up](https://spinningup.openai.com/en/latest/)

Deliverables:

Build a single-layer neural network using **only** NumPy to solve `CartPole` using:

1. Q-Learning
2. Vanilla Policy Gradients

(A bare-bones policy-gradient skeleton is sketched at the end of this README.)

#### Week 4

Competition Week

We'll give you some data from an expert controlling a robot. Your task is to create
the best agent you can, with or without using the expert data. Throughout the week, we'll
maintain a leaderboard of scores, and each mentee can have multiple submissions.
The format for submissions will be announced later. (#TODO)

#### Week 5

Intro to Deep RL. The leap from the previous week to this one will be quite substantial.
The exact specifics of this week are open-ended; it's entirely up to the mentee
to decide what they want to pursue. A few potential options are:

1. Reimplementing a seminal research paper like [DQN](https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf),
   [PPO](https://arxiv.org/abs/1707.06347), etc.

2. Using [an existing library](https://github.com/openai/baselines) on new,
   unexplored environments like your favourite FPS game, or on more unconventional
   problems like [solving symbolic integration](https://en.wikipedia.org/wiki/Symbolic_integration).

Based on your progress, you'll discuss with your mentor to figure out what'll work
best, focusing on your area of interest.
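
---

Finally, the bare-bones skeleton promised in the Week 3 deliverables: a
single-layer softmax policy trained with vanilla policy gradients (REINFORCE)
on `CartPole`, using only NumPy. This is one possible structure, not a
reference solution; it assumes the pre-0.26 Gym API, and the hyperparameters
are placeholders you should tune yourself.

```python
import numpy as np
import gym

# One possible skeleton for the week-3 policy-gradient deliverable, not a
# reference solution. Assumes the pre-0.26 Gym API; hyperparameters are placeholders.
env = gym.make("CartPole-v1")
n_obs = env.observation_space.shape[0]     # 4 observation dims for CartPole
n_act = env.action_space.n                 # 2 discrete actions

W = np.zeros((n_obs, n_act))               # the single layer of weights
alpha, gamma = 0.01, 0.99                  # learning rate, discount factor

def softmax(logits):
    z = np.exp(logits - logits.max())      # subtract max for numerical stability
    return z / z.sum()

for episode in range(500):
    obs, done = env.reset(), False
    states, actions, rewards = [], [], []
    while not done:
        probs = softmax(obs @ W)           # action distribution for this state
        action = np.random.choice(n_act, p=probs)
        states.append(obs)
        actions.append(action)
        obs, reward, done, _ = env.step(action)
        rewards.append(reward)

    # Discounted return G_t for every timestep, computed backwards.
    G = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        G[t] = running

    # REINFORCE update. For a linear softmax policy,
    # grad log pi(a|s) = outer(s, one_hot(a) - pi(.|s)).
    for s, a, g in zip(states, actions, G):
        probs = softmax(s @ W)
        W += alpha * g * np.outer(s, np.eye(n_act)[a] - probs)
```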