├── .gitignore
├── LICENSE
├── README.md
├── docs
    └── Usage.md
├── examples
    ├── run_dqn_agent.py
    └── run_pg_agent.py
├── plot
    ├── .gitignore
    ├── README.md
    ├── dataio.py
    ├── dataproc.py
    └── plotter.py
└── rltf
    ├── __init__.py
    ├── agents
        ├── __init__.py
        ├── agent.py
        ├── base_agents.py
        ├── ddpg_agent.py
        ├── dqn_agent.py
        ├── pg_agent.py
        ├── ppo_agent.py
        ├── qlearn_agent.py
        └── trpo_agent.py
    ├── cmdutils
        ├── __init__.py
        ├── cmdargs.py
        ├── defaults.py
        └── override.py
    ├── envs
        ├── __init__.py
        ├── atari.py
        ├── common.py
        ├── utils.py
        └── wrappers.py
    ├── exploration
        ├── __init__.py
        ├── exploration.py
        └── random_noise.py
    ├── memory
        ├── __init__.py
        ├── base_buffer.py
        ├── pg_buffer.py
        └── replay_buffer.py
    ├── models
        ├── __init__.py
        ├── base_dqn.py
        ├── base_pg.py
        ├── bdqn.py
        ├── bstrap_dqn.py
        ├── c51.py
        ├── c51_ids.py
        ├── ddpg.py
        ├── ddqn.py
        ├── dqn.py
        ├── dqn_ensemble.py
        ├── dqn_ids.py
        ├── dqn_ucb.py
        ├── model.py
        ├── ppo.py
        ├── qr_dqn.py
        ├── qrdqn_ids.py
        ├── reinforce.py
        └── trpo.py
    ├── monitoring
        ├── __init__.py
        ├── monitor.py
        ├── stats.py
        ├── vplot.py
        └── vplot_manager.py
    ├── optimizers
        ├── __init__.py
        ├── grad_clip.py
        ├── natural_grad.py
        └── opt_conf.py
    ├── schedules
        ├── __init__.py
        ├── const_schedule.py
        ├── exponential_decay.py
        ├── linear_schedule.py
        ├── piecewise_schedule.py
        ├── schedule.py
        └── utils.py
    ├── tf_utils
        ├── __init__.py
        ├── blr.py
        ├── cg.py
        ├── distributions.py
        ├── inverse.py
        ├── ops.py
        └── tf_utils.py
    └── utils
        ├── __init__.py
        ├── layouts.py
        ├── maker.py
        ├── rltf_conf.py
        ├── rltf_log.py
        └── seeding.py

/.gitignore:
--------------------------------------------------------------------------------
.gym
trained_models/
/**/__pycache__
Notes.md
plot/conf
plot/restore.py
.directory
.pylintrc

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2017 Nikolay Nikolov

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# RLTF: Reinforcement Learning in TensorFlow

RLTF is a research framework that provides high-quality implementations of common Reinforcement Learning algorithms. It also enables fast prototyping and benchmarking of new methods.

**Status**: This work is under active development (breaking changes might occur).

## Implemented Algorithms

| Algorithm | Model | Agent |
| --- | --- | --- |
| [DQN](https://www.nature.com/articles/nature14236) | [DQN](rltf/models/dqn.py) | [AgentDQN](rltf/agents/dqn_agent.py) |
| [Double DQN](https://arxiv.org/abs/1509.06461) | [DDQN](rltf/models/ddqn.py) | [AgentDQN](rltf/agents/dqn_agent.py) |
| [Dueling DQN](https://arxiv.org/abs/1511.06581) | next | next |
| [Prioritized Experience Replay](https://arxiv.org/abs/1511.05952) | next | next |
| [C51](https://arxiv.org/abs/1707.06887) | [C51](rltf/models/c51.py) | [AgentDQN](rltf/agents/dqn_agent.py) |
| [QR-DQN](https://arxiv.org/abs/1710.10044) | [QRDQN](rltf/models/qr_dqn.py) | [AgentDQN](rltf/agents/dqn_agent.py) |
| [Bootstrapped DQN](https://arxiv.org/pdf/1602.04621.pdf) | [BstrapDQN](rltf/models/bstrap_dqn.py) | [AgentDQN](rltf/agents/dqn_agent.py) |
| [Bootstrapped UCB](https://arxiv.org/pdf/1706.01502.pdf) | [DQN_UCB](rltf/models/dqn_ucb.py) | [AgentDQN](rltf/agents/dqn_agent.py) |
| [DQN Ensemble](https://arxiv.org/pdf/1706.01502.pdf) | [DQN_Ensemble](rltf/models/dqn_ensemble.py) | [AgentDQN](rltf/agents/dqn_agent.py) |
| [BDQN](https://arxiv.org/abs/1802.04412) | [BDQN](rltf/models/bdqn.py) | [AgentBDQN](rltf/agents/dqn_agent.py) |
| [DQN-IDS](https://arxiv.org/abs/1812.07544) | [DQN-IDS](rltf/models/dqn_ids.py) | [AgentDQN](rltf/agents/dqn_agent.py) |
| [C51-IDS](https://arxiv.org/abs/1812.07544) | [C51-IDS](rltf/models/c51_ids.py) | [AgentDQN](rltf/agents/dqn_agent.py) |
| [DDPG](https://arxiv.org/abs/1509.02971) | [DDPG](rltf/models/ddpg.py) | [AgentDDPG](rltf/agents/ddpg_agent.py) |
| [REINFORCE](http://www-anw.cs.umass.edu/~barto/courses/cs687/williams92simple.pdf) | [REINFORCE](rltf/models/reinforce.py) | [AgentPG](rltf/agents/pg_agent.py) |
| [PPO](https://arxiv.org/abs/1707.06347) | [PPO](rltf/models/ppo.py) | [AgentPPO](rltf/agents/ppo_agent.py) |
| [TRPO](https://arxiv.org/abs/1502.05477) | [TRPO](rltf/models/trpo.py) | [AgentTRPO](rltf/agents/trpo_agent.py) |

Coming additions:
- MPI support for policy gradients
- Dueling DQN
- Prioritized Experience Replay
- n-step returns
- Rainbow

## Reproducibility and Known Issues
The implemented models achieve results comparable to those reported
in the corresponding papers. With minor exceptions, all implementations should be
equivalent to the ones described in the original papers.

Implementations known to misbehave:
- QR-DQN (in progress)

## About

The goal of this framework is to provide stable implementations of standard
RL algorithms and simultaneously enable fast prototyping of new methods.
Some important features include:
- Exact reimplementation of the original algorithms with competitive performance
- Unified and reusable modules
- Clear hierarchical structure and easy code control
- Efficient GPU utilization and fast training
- Detailed logs of hyperparameters, train and eval scores, git diff, and TensorBoard visualizations
- Episode video recordings with plots of network outputs
- Compatible with OpenAI gym, MuJoCo, PyBullet and Roboschool
- Restoring the training process from where it stopped, retraining on a new task, and fine-tuning


## Installation

### Dependencies
- Python >= 3.5
- TensorFlow >= 1.6.0
- OpenAI gym >= 0.9.6
- opencv-python (either the pip package or the OpenCV library with Python bindings)
- matplotlib (with the TkAgg backend)
- pybullet (optional)
- roboschool (optional)

### Install
```
git clone https://github.com/nikonikolov/rltf.git
```
A pip package is coming soon.

## Documentation
For brief documentation see [docs/](docs/).

If you use this repository for your research, please cite:
```
@misc{rltf,
  author = {Nikolay Nikolov},
  title = {RLTF: Reinforcement Learning in TensorFlow},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/nikonikolov/rltf}},
}
```

--------------------------------------------------------------------------------
/docs/Usage.md:
--------------------------------------------------------------------------------
## Structure Overview

All algorithms are composed of two parts: `Agent` and `Model`.

### Agent
- Should inherit from the [`Agent`](rltf/agents/agent.py) class
- Provides the communication interface between the [`Model`](rltf/models/model.py) and the environment
- Executes the exact training procedure
- Responsible for:
  - Stepping the environment
  - Running a training step
  - Storing experience for training

### Model
- Should inherit from the [`Model`](rltf/models/model.py) class
- A passive component which only implements the TensorFlow computation graph for the network
- Implements the forward and backward network passes and exposes useful input and output Tensors and Operations
- Controlled by the [`Agent`](rltf/agents/agent.py) during training and evaluation

A minimal sketch of this Agent/Model split is given at the end of this document.

-------------------------------------------------------------------------------

## Execution

The data for separate runs is stored on disk under the template directory path
`trained_models//__
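
## Agent/Model Sketch

To make the Agent/Model division above concrete, here is a minimal, self-contained sketch of the same pattern. It is an illustration only and not the rltf API: the class and method names (`TabularQModel`, `SketchAgent`, `action_values`, `train_step`) are assumptions, the model is a NumPy Q-table rather than a TensorFlow graph, and the environment is assumed to follow the classic gym `reset()`/`step()` interface. The real interfaces live in rltf/agents/agent.py and rltf/models/model.py.

```python
# Illustrative sketch only -- NOT the rltf API. The split of responsibilities
# mirrors the description above: the Model is passive and owns the parameters
# and the update rule; the Agent steps the environment, stores experience,
# and runs training steps on the Model.

import numpy as np


class TabularQModel:
    """Passive component: holds the value estimates and the update rule."""

    def __init__(self, n_states, n_actions, lr=0.1, gamma=0.99):
        self.q = np.zeros((n_states, n_actions))
        self.lr = lr
        self.gamma = gamma

    def action_values(self, state):
        # "Forward pass": value estimates for a single state
        return self.q[state]

    def train_step(self, batch):
        # "Backward pass": one Q-learning update per stored transition
        for s, a, r, s_next, done in batch:
            target = r if done else r + self.gamma * self.q[s_next].max()
            self.q[s, a] += self.lr * (target - self.q[s, a])


class SketchAgent:
    """Active component: drives the environment and controls the model."""

    def __init__(self, env, model, eps=0.1, batch_size=32):
        self.env = env                # assumed to follow the gym API
        self.model = model
        self.eps = eps
        self.batch_size = batch_size
        self.buffer = []              # stand-in for rltf/memory/replay_buffer.py

    def _select_action(self, state):
        # Simple epsilon-greedy exploration for the sketch
        if np.random.rand() < self.eps:
            return self.env.action_space.sample()
        return int(np.argmax(self.model.action_values(state)))

    def train(self, n_steps):
        state = self.env.reset()
        for _ in range(n_steps):
            action = self._select_action(state)
            next_state, reward, done, _ = self.env.step(action)
            self.buffer.append((state, action, reward, next_state, done))
            if len(self.buffer) >= self.batch_size:
                idx = np.random.randint(len(self.buffer), size=self.batch_size)
                self.model.train_step([self.buffer[i] for i in idx])
            state = self.env.reset() if done else next_state
```

In rltf itself the model would build TensorFlow ops instead of a NumPy table and the experience would live in a buffer from rltf/memory, but the division of responsibilities between the two classes is the same idea.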