├── .gitignore
├── README.md
└── ppo
    ├── colab_notebooks
    │   ├── PPO_v0.ipynb
    │   └── cleanrl_ppo.ipynb
    ├── ppo-readme.md
    └── ppo_resources.md

/.gitignore:
--------------------------------------------------------------------------------
.idea
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# rl-playground

This repo is for members of the [#rl-implementation channel](https://discord.gg/fm6aDxJSbw) on the MLC Discord to learn how to code and debug RL algorithms.

Please note: the content will change as we work.

Dec 2021: Implementation of PPO.
[Google Doc](https://docs.google.com/document/d/1mAIeRdorIHxT8rNYZ5x1AL4dFpLvWTd3U0dWKFKGTfM/edit?usp=sharing) with the structure, links and decisions taken.

Meetings: Mon, Wed and Fri for 2 hours (18:00-20:00 CET).
See the Discord channel for details.
--------------------------------------------------------------------------------
/ppo/ppo-readme.md:
--------------------------------------------------------------------------------
[PPO paper](https://arxiv.org/abs/1707.06347) on arXiv

MLC RL Reading Group's [review of the PPO paper](https://docs.google.com/presentation/d/1SyvJ-XJUbMboKup8xHXOdSHDPXLW95_NIMEMLR7ghhc/edit#slide=id.ge68055b75a_0_3)

RL Implementation Group's [Google Doc](https://docs.google.com/document/d/1mAIeRdorIHxT8rNYZ5x1AL4dFpLvWTd3U0dWKFKGTfM/edit?usp=sharing) with the structure, links and decisions taken.
--------------------------------------------------------------------------------
/ppo/ppo_resources.md:
--------------------------------------------------------------------------------
**Resources used during implementation sessions:**

* [Link](https://www.youtube.com/watch?v=5P7I-xPq8u8&t=952s) to a video on the basics of PPO
* [Link](https://math.stackexchange.com/questions/3108216/change-of-variables-apply-tanh-to-the-gaussian-samples) explaining the change of variables when tanh is applied to Gaussian samples, as used on the feed-forward output (see the first sketch after this list)
* [Implementation](https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_colab.ipynb) we referred to for how to code the MultivariateNormal policy (see the second sketch after this list)
* Paper: ["Benchmarking Deep Reinforcement Learning for Continuous Control"](https://arxiv.org/pdf/1604.06778.pdf), useful for the benchmarking standards adopted
* Multiprocessing with SB3 - [Tutorial](https://colab.research.google.com/github/Stable-Baselines-Team/rl-colab-notebooks/blob/sb3/multiprocessing_rl.ipynb#scrollTo=BIedd7Pz9sOs)
* SB3 - [SubprocVecEnv](https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#subprocvecenv)
* Research coding [best practices](https://goodresearch.dev/#the-good-research-code-handbook)
* Seeds and reproducibility in [PyTorch](https://pytorch.org/docs/stable/notes/randomness.html)
* CleanRL's [blog post](https://costa.sh/blog-the-32-implementation-details-of-ppo.html) on PPO implementation details
* GAE [blog post](https://danieltakeshi.github.io/2017/04/02/notes-on-the-generalized-advantage-estimation-paper/) from Seita's Place (see the third sketch after this list)
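
**Illustrative sketches:**

A minimal PyTorch sketch of the change of variables discussed in the math.stackexchange link: sampling from a Gaussian, squashing with tanh, and correcting the log-probability by `log p(a) = log p(u) - sum_i log(1 - tanh(u_i)^2)`. This is our own illustration (the helper name `tanh_gaussian_sample` is hypothetical), not code from the linked thread.

```python
import math

import torch
import torch.nn.functional as F
from torch.distributions import Normal


def tanh_gaussian_sample(mu, log_std):
    """Sample a = tanh(u), u ~ N(mu, std), and return the corrected log-probability."""
    dist = Normal(mu, log_std.exp())
    u = dist.rsample()                 # reparameterised pre-squash sample
    a = torch.tanh(u)                  # squashed action in (-1, 1)
    log_prob = dist.log_prob(u).sum(-1)
    # numerically stable form of log(1 - tanh(u)^2) = 2 * (log 2 - u - softplus(-2u))
    log_prob -= (2.0 * (math.log(2.0) - u - F.softplus(-2.0 * u))).sum(-1)
    return a, log_prob
```

For example, `a, logp = tanh_gaussian_sample(torch.zeros(3), torch.zeros(3))` returns a squashed action and its corrected log-probability.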
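
The PPO-PyTorch notebook linked above builds its actor around `torch.distributions.MultivariateNormal`; below is a simplified sketch of that idea with a learnable, state-independent diagonal covariance. It is our own illustration, not the code from the notebook.

```python
import torch
import torch.nn as nn
from torch.distributions import MultivariateNormal


class GaussianActor(nn.Module):
    """Maps observations to a MultivariateNormal over continuous actions."""

    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.mean_net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))  # state-independent log-std

    def forward(self, obs):
        mean = self.mean_net(obs)
        cov = torch.diag_embed(self.log_std.exp() ** 2)    # diagonal covariance matrix
        return MultivariateNormal(mean, covariance_matrix=cov)
```

Usage: `dist = actor(obs)`, then `action = dist.sample()` and `logp = dist.log_prob(action)` give the quantities needed for the PPO ratio.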
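
Finally, a rough sketch of Generalized Advantage Estimation as described in the Seita's Place post: `delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)` and `A_t = delta_t + gamma * lambda * (1 - done_t) * A_{t+1}`. The function name `compute_gae` and its exact signature are our own choices.

```python
import numpy as np


def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Return GAE advantages and value targets for one rollout of length T."""
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float32)
    gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        non_terminal = 1.0 - float(dones[t])
        delta = rewards[t] + gamma * next_value * non_terminal - values[t]
        gae = delta + gamma * lam * non_terminal * gae
        advantages[t] = gae
    returns = advantages + np.asarray(values, dtype=np.float32)
    return advantages, returns
```

--------------------------------------------------------------------------------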