# RL Winter Mentorship

This repository contains all the material/code required to get started with
the mentorship programme. A few points of administration:

1. The mentorship runs for around 5 weeks.

2. We assume you have some prior knowledge of programming.

3. For any help with the course, you can privately contact your mentor. A better option
   would be to [open an issue on this repository](https://help.github.com/en/github/managing-your-work-on-github/creating-an-issue),
   so that others can see your question, and it'll prevent any duplicated effort on the
   part of the mentor. All discussions related to code will happen over issues.

4. All your code will be pushed to GitHub, so if you haven't already, create
   a GitHub account. Create a **private** repository with the name
   `rl-winter-mentorship` and add **only your mentor** as a collaborator. The
   mentors' GitHub IDs are: `@squadrick` (Dheeraj), `@sahas00` (Sahas).

5. Create a `README.md` in your repository where you can keep track of your
   progress over the next month. The mentors will use the `README.md` as
   a progress tracker.

Don't be afraid to ask any questions (however irrelevant you think they may be).
The mentors are here to help you every step of the way.

For any issues with the GirlScript Winter Mentorship Programme, please contact
Arpith or Akshatha.

## Prerequisites

1. OS: Either a Linux-based OS (Ubuntu, Fedora, etc.) or macOS. Windows
   will not suffice. [Tutorial for installing Ubuntu](https://tutorials.ubuntu.com/tutorial/tutorial-install-ubuntu-desktop).

2. Language: We'll be using Python 3 throughout this course, so familiarise
   yourself with the language. Also learn to install packages using `pip`.

3. Libraries:

   a. [NumPy](https://numpy.org/): Used for matrix computations.

   b. [OpenAI Gym](https://gym.openai.com/): Has a host of training
      environments with an easy-to-use API (see the short sketch at the end of
      this section).

   c. [TensorFlow](https://www.tensorflow.org/) OR
      [PyTorch](https://pytorch.org/): Deep learning libraries that we'll use
      later on (weeks 4 and 5) in the course to train neural networks. The choice is
      left entirely up to the mentee, but you can contact your mentor to narrow
      down the choice.

4. Tools:

   a. Text Editor: You can use any editor of your choice. Recommendations: VSCode,
      Atom, Vim, Emacs. You can also use an IDE if you wish; PyCharm is [free for
      students](https://www.jetbrains.com/student/).

   b. `git`: You'll be using GitHub for all your code/assignment submissions,
      so learn the basics of `git`: `pull`, `push`, `add`, `commit`.
      [Here's](https://www.atlassian.com/git/tutorials) an excellent tutorial for `git`.

   c. [Google Colab](https://colab.research.google.com/): Free access to
      powerful GPUs for training your agents. This will be handy for weeks 4 and 5.
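
To sanity-check your installation, here's a minimal sketch of the Gym API: it
runs a single `CartPole-v1` episode with random actions. It assumes the classic
(pre-0.26) Gym interface, where `reset()` returns just the observation and
`step()` returns four values; newer releases differ slightly.

```python
import gym

# Minimal setup check: one episode of CartPole with random actions.
# Assumes the classic (pre-0.26) Gym API; newer releases return extra values.
env = gym.make("CartPole-v1")

obs = env.reset()
done = False
episode_return = 0.0
while not done:
    action = env.action_space.sample()           # uniformly random action
    obs, reward, done, info = env.step(action)   # observation, reward, done flag, debug info
    episode_return += reward

print("Episode return:", episode_return)
env.close()
```

If this prints a small positive return, your environment is ready.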
## Scope

The scope of this course will be rather narrow due to the time constraint, but we
hope you'll learn the foundations of reinforcement learning, which will
help you along the way when you decide to learn more advanced concepts.

1. Markov Decision Processes (MDPs): A formal mathematical framework for RL.

2. Tabular methods: Value iteration, policy iteration.

3. RL with function approximators: Building and training a perceptron from
   scratch to solve famous RL problems (CartPole, Mountain Car).

4. Imitation learning: You'll be competing against your peers to see who
   can perform the best in trying to imitate an expert controlling a robot.

5. Intro to Deep RL: Brief introduction to using deep learning with RL to create
   powerful general-purpose solvers.

## Resources

Since everyone prefers a different approach to learning, we'll try our
best to accommodate each style. Every topic has multiple levels of resources:

1. Intuitive: This will be a high-level, *hand-wavy* explanation of the concepts.
   This will not help you understand the core of the concept, but you will have a
   general understanding.

2. Code: If you prefer to learn by looking at code, we'll link open-source
   implementations of the algorithms (where appropriate).

3. Lectures: We'll link free online YouTube lectures.

4. Textbook: We'll link to chapters from this book -
   [Sutton and Barto](https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf) (SnB).

The recommendation would be to use either the Lectures or the Textbook to get a
solid grasp of the conceptual details, and to use the Code as a reference
during the assignments. Please note that we don't tolerate any plagiarism.

At the end of each week you will be given a set of deliverables to complete.
This could be either a report or a coding assignment. All submissions will
happen via GitHub.

## Detailed Breakdown

#### Week 0

Before the start of the course, we expect you to have completed all the administrative
work and prerequisites. Also, some *pop-sciency* knowledge never hurt anyone:

1. [Textbook] Chapter 1 from SnB: The Reinforcement Learning Problem

2. [Intuitive] [What is reinforcement learning?](https://deepsense.ai/what-is-reinforcement-learning-the-complete-guide/)

3. [Lecture] [David Silver Lecture 1](https://www.youtube.com/watch?v=2pWv7GOvuf0)

4. [Wikipedia article on RL](https://en.wikipedia.org/wiki/Reinforcement_learning)

#### Week 1

Mathematical foundation of RL - Markov Decision Processes.

1. [Textbook] Chapter 3 from SnB: Finite Markov Decision Processes

2. [Intuitive] [Reinforcement Learning Demystified: Markov Decision Processes](https://towardsdatascience.com/reinforcement-learning-demystified-markov-decision-processes-part-1-bf00dda41690)

3. [Lecture] [David Silver Lecture 2](https://www.youtube.com/watch?v=lfHX2hHRMVQ)

Deliverables:

1. Solve Exercises 3.1, 3.2 and 3.3 from SnB (Page 85). Write a report using
   [Markdown](https://www.markdownguide.org/getting-started/) or Google Docs. Please
   keep the answers as brief as possible; you won't be assessed on the length of
   the report.

#### Week 2

Tabular methods

1. [Textbook] Chapter 4 from SnB: Dynamic Programming

2. [Intuitive, Code] [Medium article](https://medium.com/@m.alzantot/deep-reinforcement-learning-demysitifed-episode-2-policy-iteration-value-iteration-and-q-978f9e89ddaa)

3. [Lecture] [David Silver Lecture 3](https://www.youtube.com/watch?v=Nd1-UUMVfz4)

Deliverables:

1. Solve all environments from [`gym-gridworlds`](https://github.com/podondra/gym-gridworlds)
   using value and policy iteration, using **only** NumPy (a starter sketch follows).
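
To give a feel for the shape of the solution, here's a minimal value-iteration
sketch over a tabular MDP. It assumes you've already extracted the dynamics
into NumPy arrays `P` (transition probabilities) and `R` (expected rewards);
how you build those from `gym-gridworlds` is part of the exercise.

```python
import numpy as np

def value_iteration(P, R, gamma=0.99, tol=1e-8):
    """Tabular value iteration.

    P: (S, A, S) array, P[s, a, s_next] = transition probability.
    R: (S, A) array, R[s, a] = expected immediate reward.
    Both are assumed to be extracted from the environment beforehand.
    """
    n_states, n_actions, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Bellman optimality backup: Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] V[s']
        Q = R + gamma * (P @ V)            # shape (S, A)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    policy = Q.argmax(axis=1)              # greedy policy w.r.t. the final Q
    return V, policy
```

Policy iteration follows the same pattern, alternating policy evaluation with
greedy policy improvement instead of the single `max` backup.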
#### Week 3

Function Approximators

1. [Textbook] Chapters 9, 10 and 11 from SnB
2. [Lecture] David Silver: [Lecture 6](https://www.youtube.com/watch?v=UoPei5o4fps), [Lecture 7](https://www.youtube.com/watch?v=KHZVXao4qXs)
3. [Code] [OpenAI Spinning Up](https://spinningup.openai.com/en/latest/)

Deliverables:

Build a single-layer neural network using **only** NumPy to solve `CartPole` using:

1. Q-Learning
2. Vanilla Policy Gradients

(A bare-bones policy-gradient skeleton is sketched at the end of this README.)

#### Week 4

Competition Week

We'll give you some data from an expert controlling a robot. Your task is to create
the best agent you can, with or without using the expert data. Throughout the week, we'll
maintain a leaderboard of scores, and each mentee can have multiple submissions.
The format for submissions will be announced later. (#TODO)

#### Week 5

Intro to Deep RL. The leap from the previous week to this one will be quite substantial.
The exact specifics of this week are open-ended; it's entirely up to the mentee
to decide what they want to pursue. A few potential options are:

1. Reimplementing a seminal research paper like [DQN](https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf),
   [PPO](https://arxiv.org/abs/1707.06347), etc.

2. Using [an existing library](https://github.com/openai/baselines) on new,
   unexplored environments like your favourite FPS game, or on more unconventional
   problems like [solving symbolic integration](https://en.wikipedia.org/wiki/Symbolic_integration).

Based on your progress, you'll discuss with your mentor to figure out what'll work
best, focusing on your area of interest.
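
---

Finally, the bare-bones skeleton promised in the Week 3 deliverables: a
single-layer softmax policy trained with vanilla policy gradients (REINFORCE)
on `CartPole`, using only NumPy. This is one possible structure, not a
reference solution; it assumes the pre-0.26 Gym API, and the hyperparameters
are placeholders you should tune yourself.

```python
import numpy as np
import gym

# One possible skeleton for the week-3 policy-gradient deliverable, not a
# reference solution. Assumes the pre-0.26 Gym API; hyperparameters are placeholders.
env = gym.make("CartPole-v1")
n_obs = env.observation_space.shape[0]     # 4 observation dims for CartPole
n_act = env.action_space.n                 # 2 discrete actions

W = np.zeros((n_obs, n_act))               # the single layer of weights
alpha, gamma = 0.01, 0.99                  # learning rate, discount factor

def softmax(logits):
    z = np.exp(logits - logits.max())      # subtract max for numerical stability
    return z / z.sum()

for episode in range(500):
    obs, done = env.reset(), False
    states, actions, rewards = [], [], []
    while not done:
        probs = softmax(obs @ W)           # action distribution for this state
        action = np.random.choice(n_act, p=probs)
        states.append(obs)
        actions.append(action)
        obs, reward, done, _ = env.step(action)
        rewards.append(reward)

    # Discounted return G_t for every timestep, computed backwards.
    G = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        G[t] = running

    # REINFORCE update. For a linear softmax policy,
    # grad log pi(a|s) = outer(s, one_hot(a) - pi(.|s)).
    for s, a, g in zip(states, actions, G):
        probs = softmax(s @ W)
        W += alpha * g * np.outer(s, np.eye(n_act)[a] - probs)
```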