├── .github ├── ISSUE_TEMPLATE │ ├── i-have-a-bug-with-a-hands-on.md │ ├── i-have-a-question.md │ └── i-want-to-improve-the-course.md └── workflows │ ├── build_documentation.yml │ ├── build_pr_documentation.yml │ └── upload_pr_documentation.yml ├── LICENSE.md ├── README.md ├── notebooks ├── bonus-unit1 │ ├── bonus-unit1.ipynb │ └── bonus_unit1.ipynb ├── unit1 │ ├── requirements-unit1.txt │ └── unit1.ipynb ├── unit2 │ ├── requirements-unit2.txt │ └── unit2.ipynb ├── unit3 │ └── unit3.ipynb ├── unit4 │ ├── requirements-unit4.txt │ └── unit4.ipynb ├── unit5 │ └── unit5.ipynb ├── unit6 │ ├── requirements-unit6.txt │ └── unit6.ipynb └── unit8 │ ├── unit8_part1.ipynb │ └── unit8_part2.ipynb └── units └── en ├── _toctree.yml ├── communication ├── certification.mdx └── conclusion.mdx ├── live1 └── live1.mdx ├── unit0 ├── discord101.mdx ├── introduction.mdx └── setup.mdx ├── unit1 ├── additional-readings.mdx ├── conclusion.mdx ├── deep-rl.mdx ├── exp-exp-tradeoff.mdx ├── glossary.mdx ├── hands-on.mdx ├── introduction.mdx ├── quiz.mdx ├── rl-framework.mdx ├── summary.mdx ├── tasks.mdx ├── two-methods.mdx └── what-is-rl.mdx ├── unit2 ├── additional-readings.mdx ├── bellman-equation.mdx ├── conclusion.mdx ├── glossary.mdx ├── hands-on.mdx ├── introduction.mdx ├── mc-vs-td.mdx ├── mid-way-quiz.mdx ├── mid-way-recap.mdx ├── q-learning-example.mdx ├── q-learning-recap.mdx ├── q-learning.mdx ├── quiz2.mdx ├── two-types-value-based-methods.mdx └── what-is-rl.mdx ├── unit3 ├── additional-readings.mdx ├── conclusion.mdx ├── deep-q-algorithm.mdx ├── deep-q-network.mdx ├── from-q-to-dqn.mdx ├── glossary.mdx ├── hands-on.mdx ├── introduction.mdx └── quiz.mdx ├── unit4 ├── additional-readings.mdx ├── advantages-disadvantages.mdx ├── conclusion.mdx ├── glossary.mdx ├── hands-on.mdx ├── introduction.mdx ├── pg-theorem.mdx ├── policy-gradient.mdx ├── quiz.mdx └── what-are-policy-based-methods.mdx ├── unit5 ├── bonus.mdx ├── conclusion.mdx ├── curiosity.mdx ├── hands-on.mdx ├── how-mlagents-works.mdx ├── introduction.mdx ├── pyramids.mdx ├── quiz.mdx └── snowball-target.mdx ├── unit6 ├── additional-readings.mdx ├── advantage-actor-critic.mdx ├── conclusion.mdx ├── hands-on.mdx ├── introduction.mdx ├── quiz.mdx └── variance-problem.mdx ├── unit7 ├── additional-readings.mdx ├── conclusion.mdx ├── hands-on.mdx ├── introduction-to-marl.mdx ├── introduction.mdx ├── multi-agent-setting.mdx ├── quiz.mdx └── self-play.mdx ├── unit8 ├── additional-readings.mdx ├── clipped-surrogate-objective.mdx ├── conclusion-sf.mdx ├── conclusion.mdx ├── hands-on-cleanrl.mdx ├── hands-on-sf.mdx ├── introduction-sf.mdx ├── introduction.mdx ├── intuition-behind-ppo.mdx └── visualize.mdx ├── unitbonus1 ├── conclusion.mdx ├── how-huggy-works.mdx ├── introduction.mdx ├── play.mdx └── train.mdx ├── unitbonus2 ├── hands-on.mdx ├── introduction.mdx └── optuna.mdx ├── unitbonus3 ├── curriculum-learning.mdx ├── decision-transformers.mdx ├── envs-to-try.mdx ├── generalisation.mdx ├── godotrl.mdx ├── introduction.mdx ├── language-models.mdx ├── learning-agents.mdx ├── model-based.mdx ├── offline-online.mdx ├── rl-documentation.mdx ├── rlhf.mdx └── student-works.mdx └── unitbonus5 ├── conclusion.mdx ├── customize-the-environment.mdx ├── getting-started.mdx ├── introduction.mdx ├── the-environment.mdx └── train-our-robot.mdx /.github/ISSUE_TEMPLATE/i-have-a-bug-with-a-hands-on.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: I have a bug with a hands-on 3 | about: You have encountered a bug during 
one of the hands-on 4 | title: "[HANDS-ON BUG]" 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | # Describe the bug 11 | 12 | A clear and concise description of what the bug is. 13 | **Please share your notebook link so that we can reproduce the error** 14 | 15 | # Material 16 | 17 | - Did you use Google Colab? 18 | 19 | If not: 20 | - Your Operating system (OS) 21 | - Version of your OS 22 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/i-have-a-question.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: I have a question 3 | about: You have a question about a part of the course 4 | title: "[QUESTION]" 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | 1. First, the **best way to get a response fast is to ask the community** it on #rl-study-group in our Discord server: https://www.hf.co/join/discord 11 | 12 | 2. If you prefer you can ask here, please **be specific**. 13 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/i-want-to-improve-the-course.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: I want to improve the course 3 | about: You found a typo, an error or you want to improve a part of the course 4 | title: "[UPDATE]" 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | # What do you want to improve? 11 | 12 | - Explain the typo/error or the part of the course you want to improve 13 | 14 | - **Also, don't hesitate to open a Pull Request with the update**. 15 | -------------------------------------------------------------------------------- /.github/workflows/build_documentation.yml: -------------------------------------------------------------------------------- 1 | name: Build documentation 2 | 3 | on: 4 | push: 5 | branches: 6 | - main 7 | 8 | jobs: 9 | build: 10 | uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main 11 | with: 12 | commit_sha: ${{ github.sha }} 13 | package: deep-rl-class 14 | package_name: deep-rl-course 15 | path_to_docs: deep-rl-class/units/ 16 | additional_args: --not_python_module 17 | languages: en 18 | secrets: 19 | hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }} 20 | -------------------------------------------------------------------------------- /.github/workflows/build_pr_documentation.yml: -------------------------------------------------------------------------------- 1 | name: Build PR Documentation 2 | 3 | on: 4 | pull_request: 5 | 6 | concurrency: 7 | group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }} 8 | cancel-in-progress: true 9 | 10 | jobs: 11 | build: 12 | uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main 13 | with: 14 | commit_sha: ${{ github.event.pull_request.head.sha }} 15 | pr_number: ${{ github.event.number }} 16 | package: deep-rl-class 17 | package_name: deep-rl-course 18 | path_to_docs: deep-rl-class/units/ 19 | additional_args: --not_python_module 20 | languages: en 21 | -------------------------------------------------------------------------------- /.github/workflows/upload_pr_documentation.yml: -------------------------------------------------------------------------------- 1 | name: Upload PR Documentation 2 | 3 | on: 4 | workflow_run: 5 | workflows: ["Build PR Documentation"] 6 | types: 7 | - completed 8 | 9 | jobs: 10 | build: 11 | uses: huggingface/doc-builder/.github/workflows/upload_pr_documentation.yml@main 12 | with: 13 | 
package_name: deep-rl-course 14 | hub_base_path: https://moon-ci-docs.huggingface.co 15 | secrets: 16 | hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }} 17 | comment_bot_token: ${{ secrets.COMMENT_BOT_TOKEN }} -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # [The Hugging Face Deep Reinforcement Learning Course 🤗 (v2.0)](https://huggingface.co/deep-rl-course/unit0/introduction) 2 | 3 | Thumbnail 4 | 5 | If you like the course, don't hesitate to **⭐ star this repository. This helps us 🤗**. 6 | 7 | This repository contains the Deep Reinforcement Learning Course mdx files and notebooks. **The website is here**: https://huggingface.co/deep-rl-course/unit0/introduction?fw=pt 8 | 9 | - The syllabus 📚: https://simoninithomas.github.io/deep-rl-course 10 | 11 | - The course 📚: https://huggingface.co/deep-rl-course/unit0/introduction?fw=pt 12 | 13 | - **Sign up here** ➡️➡️➡️ http://eepurl.com/ic5ZUD 14 | 15 | ## Course Maintenance Notice 🚧 16 | 17 | Please note that this **Deep Reinforcement Learning course is now in a low-maintenance state**. However, it **remains an excellent resource to learn both the theory and practical aspects of Deep Reinforcement Learning**. 18 | 19 | Keep in mind the following points: 20 | 21 | - *Unit 7 (AI vs AI)* : This feature is currently non-functional. However, you can still train your agent to play soccer and observe its performance. 22 | 23 | - *Leaderboard* : The leaderboard is no longer operational. 24 | 25 | Aside from these points, all theory content and practical exercises remain fully accessible and effective for learning. 26 | 27 | If you have any problem with one of the hands-on **please check the issue sections where the community give some solutions to bugs**. 
28 | 29 | ## Citing the project 30 | 31 | To cite this repository in publications: 32 | 33 | ```bibtex 34 | @misc{deep-rl-course, 35 | author = {Simonini, Thomas and Sanseviero, Omar}, 36 | title = {The Hugging Face Deep Reinforcement Learning Class}, 37 | year = {2023}, 38 | publisher = {GitHub}, 39 | journal = {GitHub repository}, 40 | howpublished = {\url{https://github.com/huggingface/deep-rl-class}}, 41 | } 42 | ``` 43 | -------------------------------------------------------------------------------- /notebooks/unit1/requirements-unit1.txt: -------------------------------------------------------------------------------- 1 | stable-baselines3==2.0.0a5 2 | swig 3 | gymnasium[box2d] 4 | huggingface_sb3 5 | -------------------------------------------------------------------------------- /notebooks/unit2/requirements-unit2.txt: -------------------------------------------------------------------------------- 1 | gymnasium 2 | pygame 3 | numpy 4 | 5 | huggingface_hub 6 | pickle5 7 | pyyaml==6.0 8 | imageio 9 | imageio_ffmpeg 10 | pyglet==1.5.1 11 | tqdm -------------------------------------------------------------------------------- /notebooks/unit4/requirements-unit4.txt: -------------------------------------------------------------------------------- 1 | git+https://github.com/ntasfi/PyGame-Learning-Environment.git 2 | git+https://github.com/simoninithomas/gym-games 3 | huggingface_hub 4 | imageio-ffmpeg 5 | pyyaml==6.0 6 | -------------------------------------------------------------------------------- /notebooks/unit6/requirements-unit6.txt: -------------------------------------------------------------------------------- 1 | stable-baselines3==2.0.0a4 2 | huggingface_sb3 3 | panda-gym 4 | huggingface_hub -------------------------------------------------------------------------------- /units/en/communication/certification.mdx: -------------------------------------------------------------------------------- 1 | # The certification process 2 | 3 | 4 | The certification process is **completely free**: 5 | 6 | - To get a *certificate of completion*: you need **to pass 80% of the assignments**. 7 | - To get a *certificate of excellence*: you need **to pass 100% of the assignments**. 8 | 9 | There's **no deadlines, the course is self-paced**. 10 | 11 | Course certification 12 | 13 | When we say pass, **we mean that your model must be pushed to the Hub and get a result equal or above the minimal requirement**. 14 | 15 | To check your progression and which unit you passed/not passed: https://huggingface.co/spaces/ThomasSimonini/Check-my-progress-Deep-RL-Course 16 | 17 | Now that you're ready for the certification process, you need to: 18 | 19 | 1. Go here: https://huggingface.co/spaces/huggingface-projects/Deep-RL-Course-Certification/ 20 | 2. Type your *hugging face username*, your *first name*, *last name* 21 | 22 | 3. Click on "Generate my certificate". 23 | - If you passed 80% of the assignments, **congratulations** you've just got the certificate of completion. 24 | - If you passed 100% of the assignments, **congratulations** you've just got the excellence certificate. 25 | - If you are below 80%, don't be discouraged! Check which units you need to do again to get your certificate. 26 | 27 | 4. You can download your certificate in pdf format and png format. 28 | 29 | Don't hesitate to share your certificate on Twitter (tag me @ThomasSimonini and @huggingface) and on Linkedin. 
30 | 31 | -------------------------------------------------------------------------------- /units/en/communication/conclusion.mdx: -------------------------------------------------------------------------------- 1 | # Congratulations 2 | 3 | Thumbnail 4 | 5 | 6 | **Congratulations on finishing this course!** With perseverance, hard work, and determination, **you've acquired a solid background in Deep Reinforcement Learning**. 7 | 8 | But finishing this course is **not the end of your journey**. It's just the beginning: don't hesitate to explore bonus unit 3, where we show you topics you may be interested in studying. And don't hesitate to **share what you're doing, and ask questions in the discord server** 9 | 10 | **Thank you** for being part of this course. **I hope you liked this course as much as I loved writing it**. 11 | 12 | Don't hesitate **to give us feedback on how we can improve the course** using [this form](https://forms.gle/BzKXWzLAGZESGNaE9) 13 | 14 | And don't forget **to check in the next section how you can get (if you pass) your certificate of completion ‎‍🎓.** 15 | 16 | One last thing, to keep in touch with the Reinforcement Learning Team and with me: 17 | 18 | - [Follow me on Twitter](https://twitter.com/thomassimonini) 19 | - [Follow Hugging Face Twitter account](https://twitter.com/huggingface) 20 | - [Join the Hugging Face Discord](https://www.hf.co/join/discord) 21 | 22 | ## Keep Learning, Stay Awesome 🤗 23 | 24 | Thomas Simonini, 25 | -------------------------------------------------------------------------------- /units/en/live1/live1.mdx: -------------------------------------------------------------------------------- 1 | # Live 1: How the course work, Q&A, and playing with Huggy 2 | 3 | In this first live stream, we explained how the course work (scope, units, challenges, and more) and answered your questions. 4 | 5 | And finally, we saw some LunarLander agents you've trained and play with your Huggies 🐶 6 | 7 | 8 | 9 | To know when the next live is scheduled **check the discord server**. We will also send **you an email**. If you can't participate, don't worry, we record the live sessions. -------------------------------------------------------------------------------- /units/en/unit0/discord101.mdx: -------------------------------------------------------------------------------- 1 | # Discord 101 [[discord-101]] 2 | 3 | Hey there! My name is Huggy, the dog 🐕, and I'm looking forward to train with you during this RL Course! 4 | Although I don't know much about fetching sticks (yet), I know one or two things about Discord. So I wrote this guide to help you learn about it! 5 | 6 | Huggy Logo 7 | 8 | Discord is a free chat platform. If you've used Slack, **it's quite similar**. There is a Hugging Face Community Discord server with 50000 members you can join with a single click here. So many humans to play with! 9 | 10 | Starting in Discord can be a bit intimidating, so let me take you through it. 11 | 12 | When you [sign-up to our Discord server](http://hf.co/join/discord), you'll choose your interests. Make sure to **click "Reinforcement Learning,"** and you'll get access to the Reinforcement Learning Category containing all the course-related channels. If you feel like joining even more channels, go for it! 🚀 13 | 14 | Then click next, you'll then get to **introduce yourself in the `#introduce-yourself` channel**. 15 | 16 | 17 | Discord 18 | 19 | They are in the reinforcement learning category. 
**Don't forget to sign up to these channels** by clicking on 🤖 Reinforcement Learning in `role-assigment`. 20 | - `rl-announcements`: where we give the **latest information about the course**. 21 | - `rl-discussions`: where you can **exchange about RL and share information**. 22 | - `rl-study-group`: where you can **ask questions and exchange with your classmates**. 23 | - `rl-i-made-this`: where you can **share your projects and models**. 24 | 25 | The HF Community Server has a thriving community of human beings interested in many areas, so you can also learn from those. There are paper discussions, events, and many other things. 26 | 27 | Was this useful? There are a couple of tips I can share with you: 28 | 29 | - There are **voice channels** you can use as well, although most people prefer text chat. 30 | - You can **use markdown style** for text chats. So if you're writing code, you can use that style. Sadly this does not work as well for links. 31 | - You can open threads as well! It's a good idea when **it's a long conversation**. 32 | 33 | I hope this is useful! And if you have questions, just ask! 34 | 35 | See you later! 36 | 37 | Huggy 🐶 38 | -------------------------------------------------------------------------------- /units/en/unit0/setup.mdx: -------------------------------------------------------------------------------- 1 | # Setup [[setup]] 2 | 3 | After all this information, it's time to get started. We're going to do two things: 4 | 5 | 1. **Create your Hugging Face account** if it's not already done 6 | 2. **Sign up to Discord and introduce yourself** (don't be shy 🤗) 7 | 8 | ### Let's create my Hugging Face account 9 | 10 | (If it's not already done) create an account to HF here 11 | 12 | ### Let's join our Discord server 13 | 14 | You can now sign up for our Discord Server. This is the place where you **can chat with the community and with us, create and join study groups to grow with each other and more** 15 | 16 | 👉🏻 Join our discord server here. 17 | 18 | When you join, remember to introduce yourself in #introduce-yourself and sign-up for reinforcement channels in #channels-and-roles. 19 | 20 | We have multiple RL-related channels: 21 | - `rl-announcements`: where we give the latest information about the course. 22 | - `rl-discussions`: where you can chat about RL and share information. 23 | - `rl-study-group`: where you can create and join study groups. 24 | - `rl-i-made-this`: where you can share your projects and models. 25 | 26 | If this is your first time using Discord, we wrote a Discord 101 to get the best practices. Check the next section. 27 | 28 | Congratulations! **You've just finished the on-boarding**. You're now ready to start to learn Deep Reinforcement Learning. Have fun! 29 | 30 | 31 | ### Keep Learning, stay awesome 🤗 32 | -------------------------------------------------------------------------------- /units/en/unit1/additional-readings.mdx: -------------------------------------------------------------------------------- 1 | # Additional Readings [[additional-readings]] 2 | 3 | These are **optional readings** if you want to go deeper. 4 | 5 | ## Deep Reinforcement Learning [[deep-rl]] 6 | 7 | - [Reinforcement Learning: An Introduction, Richard Sutton and Andrew G. 
Barto Chapter 1, 2 and 3](http://incompleteideas.net/book/RLbook2020.pdf) 8 | - [Foundations of Deep RL Series, L1 MDPs, Exact Solution Methods, Max-ent RL by Pieter Abbeel](https://youtu.be/2GwBez0D20A) 9 | - [Spinning Up RL by OpenAI Part 1: Key concepts of RL](https://spinningup.openai.com/en/latest/spinningup/rl_intro.html) 10 | 11 | ## Gym [[gym]] 12 | 13 | - [Getting Started With OpenAI Gym: The Basic Building Blocks](https://blog.paperspace.com/getting-started-with-openai-gym/) 14 | - [Make your own Gym custom environment](https://www.gymlibrary.dev/content/environment_creation/) 15 | -------------------------------------------------------------------------------- /units/en/unit1/conclusion.mdx: -------------------------------------------------------------------------------- 1 | # Conclusion [[conclusion]] 2 | 3 | Congrats on finishing this unit! **That was the biggest one**, and there was a lot of information. And congrats on finishing the tutorial. You’ve just trained your first Deep RL agents and shared them with the community! 🥳 4 | 5 | It's **normal if you still feel confused by some of these elements**. This was the same for me and for all people who studied RL. 6 | 7 | **Take time to really grasp the material** before continuing. It’s important to master these elements and have a solid foundation before entering the fun part. 8 | 9 | Naturally, during the course, we’re going to use and explain these terms again, but it’s better to understand them before diving into the next units. 10 | 11 | In the next (bonus) unit, we’re going to reinforce what we just learned by **training Huggy the Dog to fetch a stick**. 12 | 13 | You will then be able to play with him 🤗. 14 | 15 | Huggy 16 | 17 | Finally, we would love **to hear what you think of the course and how we can improve it**. If you have some feedback then, please 👉 [fill this form](https://forms.gle/BzKXWzLAGZESGNaE9) 18 | 19 | ### Keep Learning, stay awesome 🤗 20 | 21 | 22 | -------------------------------------------------------------------------------- /units/en/unit1/deep-rl.mdx: -------------------------------------------------------------------------------- 1 | # The “Deep” in Reinforcement Learning [[deep-rl]] 2 | 3 | 4 | What we've talked about so far is Reinforcement Learning. But where does the "Deep" come into play? 5 | 6 | 7 | Deep Reinforcement Learning introduces **deep neural networks to solve Reinforcement Learning problems** — hence the name “deep”. 8 | 9 | For instance, in the next unit, we’ll learn about two value-based algorithms: Q-Learning (classic Reinforcement Learning) and then Deep Q-Learning. 10 | 11 | You’ll see the difference is that, in the first approach, **we use a traditional algorithm** to create a Q table that helps us find what action to take for each state. 12 | 13 | In the second approach, **we will use a Neural Network** (to approximate the Q value). 14 | 15 |
16 | *Figure: Value based RL (schema inspired by the Q learning notebook by Udacity)* 17 | 18 | 19 |
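To make the difference concrete, here is a small illustrative sketch (it is not taken from the course notebooks; the state/action counts and the network layers are arbitrary assumptions chosen for readability):

```python
import numpy as np
import torch.nn as nn

# Classic Q-Learning: a lookup table with one Q-value per (state, action) pair.
n_states, n_actions = 16, 4                 # assumed sizes for a small grid world
q_table = np.zeros((n_states, n_actions))
best_action = int(q_table[3].argmax())      # best known action for state 3

# Deep Q-Learning: a neural network approximates Q(state) -> one value per action,
# so we no longer need one table row per state.
q_network = nn.Sequential(
    nn.Linear(8, 64),    # 8-dimensional observation, e.g. LunarLander
    nn.ReLU(),
    nn.Linear(64, n_actions),
)
```

The table only stays practical while the number of states is small; the network is what lets Deep Q-Learning scale to large or continuous observation spaces.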
20 | 21 | If you are not familiar with Deep Learning you should definitely watch [the FastAI Practical Deep Learning for Coders](https://course.fast.ai) (Free). 22 | -------------------------------------------------------------------------------- /units/en/unit1/exp-exp-tradeoff.mdx: -------------------------------------------------------------------------------- 1 | # The Exploration/Exploitation trade-off [[exp-exp-tradeoff]] 2 | 3 | Finally, before looking at the different methods to solve Reinforcement Learning problems, we must cover one more very important topic: *the exploration/exploitation trade-off.* 4 | 5 | - *Exploration* is exploring the environment by trying random actions in order to **find more information about the environment.** 6 | - *Exploitation* is **exploiting known information to maximize the reward.** 7 | 8 | Remember, the goal of our RL agent is to maximize the expected cumulative reward. However, **we can fall into a common trap**. 9 | 10 | Let’s take an example: 11 | 12 | Exploration 13 | 14 | In this game, our mouse can have an **infinite amount of small cheese** (+1 each). But at the top of the maze, there is a gigantic sum of cheese (+1000). 15 | 16 | However, if we only focus on exploitation, our agent will never reach the gigantic sum of cheese. Instead, it will only exploit **the nearest source of rewards,** even if this source is small (exploitation). 17 | 18 | But if our agent does a little bit of exploration, it can **discover the big reward** (the pile of big cheese). 19 | 20 | This is what we call the exploration/exploitation trade-off. We need to balance how much we **explore the environment** and how much we **exploit what we know about the environment.** 21 | 22 | Therefore, we must **define a rule that helps to handle this trade-off**. We’ll see the different ways to handle it in the future units. 23 | 24 | If it’s still confusing, **think of a real problem: the choice of picking a restaurant:** 25 | 26 | 27 |
28 | *Figure: Exploration (Source: Berkeley AI Course)* 29 | 30 | 31 |
32 | 33 | - *Exploitation*: You go to the same one that you know is good every day and **take the risk to miss another better restaurant.** 34 | - *Exploration*: Try restaurants you never went to before, with the risk of having a bad experience **but the probable opportunity of a fantastic experience.** 35 | 36 | To recap: 37 | Exploration Exploitation Tradeoff 38 | -------------------------------------------------------------------------------- /units/en/unit1/glossary.mdx: -------------------------------------------------------------------------------- 1 | # Glossary [[glossary]] 2 | 3 | This is a community-created glossary. Contributions are welcome! 4 | 5 | ### Agent 6 | 7 | An agent learns to **make decisions by trial and error, with rewards and punishments from the surroundings**. 8 | 9 | ### Environment 10 | 11 | An environment is a simulated world **where an agent can learn by interacting with it**. 12 | 13 | ### Markov Property 14 | 15 | It implies that the action taken by our agent is **conditional solely on the present state and independent of the past states and actions**. 16 | 17 | ### Observations/State 18 | 19 | - **State**: Complete description of the state of the world. 20 | - **Observation**: Partial description of the state of the environment/world. 21 | 22 | ### Actions 23 | 24 | - **Discrete Actions**: Finite number of actions, such as left, right, up, and down. 25 | - **Continuous Actions**: Infinite possibility of actions; for example, in the case of self-driving cars, the driving scenario has an infinite possibility of actions occurring. 26 | 27 | ### Rewards and Discounting 28 | 29 | - **Rewards**: Fundamental factor in RL. Tells the agent whether the action taken is good/bad. 30 | - RL algorithms are focused on maximizing the **cumulative reward**. 31 | - **Reward Hypothesis**: RL problems can be formulated as a maximisation of (cumulative) return. 32 | - **Discounting** is performed because rewards obtained at the start are more likely to happen as they are more predictable than long-term rewards. 33 | 34 | ### Tasks 35 | 36 | - **Episodic**: Has a starting point and an ending point. 37 | - **Continuous**: Has a starting point but no ending point. 38 | 39 | ### Exploration v/s Exploitation Trade-Off 40 | 41 | - **Exploration**: It's all about exploring the environment by trying random actions and receiving feedback/returns/rewards from the environment. 42 | - **Exploitation**: It's about exploiting what we know about the environment to gain maximum rewards. 43 | - **Exploration-Exploitation Trade-Off**: It balances how much we want to **explore** the environment and how much we want to **exploit** what we know about the environment. 44 | 45 | ### Policy 46 | 47 | - **Policy**: It is called the agent's brain. It tells us what action to take, given the state. 48 | - **Optimal Policy**: Policy that **maximizes** the **expected return** when an agent acts according to it. It is learned through *training*. 49 | 50 | ### Policy-based Methods: 51 | 52 | - An approach to solving RL problems. 53 | - In this method, the Policy is learned directly. 54 | - Will map each state to the best corresponding action at that state. Or a probability distribution over the set of possible actions at that state. 55 | 56 | ### Value-based Methods: 57 | 58 | - Another approach to solving RL problems. 59 | - Here, instead of training a policy, we train a **value function** that maps each state to the expected value of being in that state. 
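To make a few of the entries above concrete (rewards and discounting, the exploration/exploitation trade-off, and value-based methods), here is a small illustrative sketch; the discount factor, epsilon, reward values and table sizes are arbitrary assumptions used only for the example:

```python
import numpy as np

# Discounted return: G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
gamma = 0.99                               # assumed discount factor
rewards = [1.0, 0.0, 0.0, 10.0]            # made-up rewards from one episode
discounted_return = sum(gamma**k * r for k, r in enumerate(rewards))

# Exploration/exploitation with an epsilon-greedy rule over a toy Q-table
rng = np.random.default_rng(0)
epsilon = 0.1                              # assumed exploration rate: 10% of actions are random
n_states, n_actions = 10, 4                # assumed sizes for a toy environment
q_table = np.zeros((n_states, n_actions))  # the values a value-based method would learn

def epsilon_greedy_action(state: int) -> int:
    """Explore with probability epsilon, otherwise exploit the best known action."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))   # exploration: try a random action
    return int(q_table[state].argmax())       # exploitation: pick the greedy action
```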
60 | 61 | Contributions are welcome 🤗 62 | 63 | If you want to improve the course, you can [open a Pull Request.](https://github.com/huggingface/deep-rl-class/pulls) 64 | 65 | This glossary was made possible thanks to: 66 | 67 | - [@lucifermorningstar1305](https://github.com/lucifermorningstar1305) 68 | - [@daspartho](https://github.com/daspartho) 69 | - [@misza222](https://github.com/misza222) 70 | 71 | -------------------------------------------------------------------------------- /units/en/unit1/introduction.mdx: -------------------------------------------------------------------------------- 1 | # Introduction to Deep Reinforcement Learning [[introduction-to-deep-reinforcement-learning]] 2 | 3 | Unit 1 thumbnail 4 | 5 | 6 | Welcome to the most fascinating topic in Artificial Intelligence: **Deep Reinforcement Learning.** 7 | 8 | Deep RL is a type of Machine Learning where an agent learns **how to behave** in an environment **by performing actions** and **seeing the results.** 9 | 10 | In this first unit, **you'll learn the foundations of Deep Reinforcement Learning.** 11 | 12 | 13 | Then, you'll **train your Deep Reinforcement Learning agent, a lunar lander to land correctly on the Moon** using Stable-Baselines3 , a Deep Reinforcement Learning library. 14 | 15 | 16 | LunarLander 17 | 18 | And finally, you'll **upload this trained agent to the Hugging Face Hub 🤗, a free, open platform where people can share ML models, datasets, and demos.** 19 | 20 | It's essential **to master these elements** before diving into implementing Deep Reinforcement Learning agents. The goal of this chapter is to give you solid foundations. 21 | 22 | 23 | After this unit, in a bonus unit, you'll be **able to train Huggy the Dog 🐶 to fetch the stick and play with him 🤗**. 24 | 25 |
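As a rough preview of the Unit 1 hands-on, here is a minimal Stable-Baselines3 training sketch. It is a simplified assumption of the notebook's flow, not the exact notebook code; the timestep budget and algorithm settings are placeholders:

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Create the LunarLander environment (requires gymnasium[box2d], listed in requirements-unit1.txt)
env = gym.make("LunarLander-v2")

# Train a PPO agent with a simple multi-layer perceptron policy
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)   # placeholder training budget

model.save("ppo-LunarLander-v2")
```

The hands-on then goes further: it evaluates the trained agent and pushes it to the Hugging Face Hub (the `huggingface_sb3` package in `requirements-unit1.txt` exists for that purpose).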