├── .gitignore
├── README.md
└── ppo
    ├── colab_notebooks
    │   ├── PPO_v0.ipynb
    │   └── cleanrl_ppo.ipynb
    ├── ppo-readme.md
    └── ppo_resources.md

/.gitignore:
--------------------------------------------------------------------------------
.idea
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# rl-playground

This repo is for members of the [#rl-implementation channel](https://discord.gg/fm6aDxJSbw) on the MLC Discord to learn how to code and debug RL algorithms.

Please note: the content will change as we work.

Dec 2021: Implementation of PPO.
[Google Doc](https://docs.google.com/document/d/1mAIeRdorIHxT8rNYZ5x1AL4dFpLvWTd3U0dWKFKGTfM/edit?usp=sharing) with the structure, links and decisions taken.

Meetings: Mon, Wed and Fri for 2 hours (18:00-20:00 CET).
See the Discord channel for details.
--------------------------------------------------------------------------------
/ppo/ppo-readme.md:
--------------------------------------------------------------------------------
[PPO paper](https://arxiv.org/abs/1707.06347) on arXiv

MLC RL Reading Group's [review of the PPO paper](https://docs.google.com/presentation/d/1SyvJ-XJUbMboKup8xHXOdSHDPXLW95_NIMEMLR7ghhc/edit#slide=id.ge68055b75a_0_3)

RL Implementation Group's [Google Doc](https://docs.google.com/document/d/1mAIeRdorIHxT8rNYZ5x1AL4dFpLvWTd3U0dWKFKGTfM/edit?usp=sharing) with the structure, links and decisions taken.
--------------------------------------------------------------------------------
/ppo/ppo_resources.md:
--------------------------------------------------------------------------------
**Resources used during implementation sessions:**

* [Link](https://www.youtube.com/watch?v=5P7I-xPq8u8&t=952s) to a video on the basics of PPO
* [Link](https://math.stackexchange.com/questions/3108216/change-of-variables-apply-tanh-to-the-gaussian-samples) explaining the change of variables when tanh is applied to Gaussian samples, as used on the feed-forward output (see the first sketch after this list)
* [Implementation](https://github.com/nikhilbarhate99/PPO-PyTorch/blob/master/PPO_colab.ipynb) we referred to for how to code the MultivariateNormal policy (see the second sketch after this list)
* Paper: ["Benchmarking Deep Reinforcement Learning for Continuous Control"](https://arxiv.org/pdf/1604.06778.pdf), useful for the benchmarking standards adopted
* Multiprocessing with SB3 - [Tutorial](https://colab.research.google.com/github/Stable-Baselines-Team/rl-colab-notebooks/blob/sb3/multiprocessing_rl.ipynb#scrollTo=BIedd7Pz9sOs)
* SB3 - [SubprocVecEnv](https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#subprocvecenv)
* Research coding [best practices](https://goodresearch.dev/#the-good-research-code-handbook)
* Seeds and reproducibility in [PyTorch](https://pytorch.org/docs/stable/notes/randomness.html)
* CleanRL's [blog post](https://costa.sh/blog-the-32-implementation-details-of-ppo.html) on PPO implementation details
* GAE [blog post](https://danieltakeshi.github.io/2017/04/02/notes-on-the-generalized-advantage-estimation-paper/) from Seita's Place (see the third sketch after this list)
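
**Illustrative sketches:**

A minimal PyTorch sketch of the change of variables discussed in the math.stackexchange link: sampling from a Gaussian, squashing with tanh, and correcting the log-probability by `log p(a) = log p(u) - sum_i log(1 - tanh(u_i)^2)`. This is our own illustration (the helper name `tanh_gaussian_sample` is hypothetical), not code from the linked thread.

```python
import math

import torch
import torch.nn.functional as F
from torch.distributions import Normal


def tanh_gaussian_sample(mu, log_std):
    """Sample a = tanh(u), u ~ N(mu, std), and return the corrected log-probability."""
    dist = Normal(mu, log_std.exp())
    u = dist.rsample()                 # reparameterised pre-squash sample
    a = torch.tanh(u)                  # squashed action in (-1, 1)
    log_prob = dist.log_prob(u).sum(-1)
    # numerically stable form of log(1 - tanh(u)^2) = 2 * (log 2 - u - softplus(-2u))
    log_prob -= (2.0 * (math.log(2.0) - u - F.softplus(-2.0 * u))).sum(-1)
    return a, log_prob
```

For example, `a, logp = tanh_gaussian_sample(torch.zeros(3), torch.zeros(3))` returns a squashed action and its corrected log-probability.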
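
The PPO-PyTorch notebook linked above builds its actor around `torch.distributions.MultivariateNormal`; below is a simplified sketch of that idea with a learnable, state-independent diagonal covariance. It is our own illustration, not the code from the notebook.

```python
import torch
import torch.nn as nn
from torch.distributions import MultivariateNormal


class GaussianActor(nn.Module):
    """Maps observations to a MultivariateNormal over continuous actions."""

    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.mean_net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))  # state-independent log-std

    def forward(self, obs):
        mean = self.mean_net(obs)
        cov = torch.diag_embed(self.log_std.exp() ** 2)    # diagonal covariance matrix
        return MultivariateNormal(mean, covariance_matrix=cov)
```

Usage: `dist = actor(obs)`, then `action = dist.sample()` and `logp = dist.log_prob(action)` give the quantities needed for the PPO ratio.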
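
Finally, a rough sketch of Generalized Advantage Estimation as described in the Seita's Place post: `delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)` and `A_t = delta_t + gamma * lambda * (1 - done_t) * A_{t+1}`. The function name `compute_gae` and its exact signature are our own choices.

```python
import numpy as np


def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Return GAE advantages and value targets for one rollout of length T."""
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float32)
    gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        non_terminal = 1.0 - float(dones[t])
        delta = rewards[t] + gamma * next_value * non_terminal - values[t]
        gae = delta + gamma * lam * non_terminal * gae
        advantages[t] = gae
    returns = advantages + np.asarray(values, dtype=np.float32)
    return advantages, returns
```

--------------------------------------------------------------------------------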