├── .demo
│   ├── pong-v4_dist4c_nonsmooth.png
│   ├── pong-v4_dist4c_smooth.png
│   ├── pong-v4_double.png
│   ├── pong-v4_dueling.png
│   ├── pong-v4_origsingle.png
│   ├── pong-v4_prioritized.png
│   ├── pong-v4_various_smooth.png
│   ├── tensorboard_text.png
│   └── test_agent.gif
├── .gitignore
├── LICENSE
├── README.md
├── dev_setup.py
├── examples
│   ├── README.md
│   ├── deep_q_learning
│   │   ├── README.md
│   │   ├── atari_play.py
│   │   ├── gym_workout.py
│   │   ├── run_project
│   │   │   ├── ATARI.sh
│   │   │   ├── GYM.sh
│   │   │   ├── TEST.sh
│   │   │   ├── atari_config.yaml
│   │   │   └── gym_config.yaml
│   │   └── test.py
│   ├── gorila_dqn
│   │   ├── README.md
│   │   ├── client.py
│   │   ├── launcher.py
│   │   ├── run_project
│   │   │   ├── ATARI.sh
│   │   │   ├── TEST.sh
│   │   │   └── config.yaml
│   │   ├── server.py
│   │   └── test.py
│   └── prioritized_dqn
│       ├── README.md
│       ├── learn.py
│       ├── run_project
│       │   ├── ATARI.sh
│       │   ├── TEST.sh
│       │   └── config.yaml
│       └── test.py
└── pytorl
    ├── README.md
    ├── __init__.py
    ├── agents
    │   ├── DQN.py
    │   ├── __init__.py
    │   ├── _base_agent.py
    │   └── dist_DQN.py
    ├── distributed
    │   ├── __init__.py
    │   ├── _slurm.py
    │   ├── async_ops.py
    │   ├── initialize.py
    │   ├── param_server.py
    │   └── sync_ops.py
    ├── envs
    │   ├── __init__.py
    │   ├── _base_env.py
    │   ├── ale_atari.py
    │   └── gym_ctrl.py
    ├── lib
    │   ├── __init__.py
    │   ├── _tree.py
    │   ├── explore.py
    │   └── replay.py
    ├── networks
    │   ├── __init__.py
    │   ├── atari_conv.py
    │   ├── ctrl_mlp.py
    │   └── io.py
    ├── settings
    │   ├── __init__.py
    │   └── entries.py
    └── utils
        ├── __init__.py
        ├── config.py
        ├── decorators.py
        └── recorder.py

/.demo/pong-v4_dist4c_nonsmooth.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kareido/pytorl/2f2f5258425166b8bfbde985a229fecdef3752d9/.demo/pong-v4_dist4c_nonsmooth.png
--------------------------------------------------------------------------------
/.demo/pong-v4_dist4c_smooth.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kareido/pytorl/2f2f5258425166b8bfbde985a229fecdef3752d9/.demo/pong-v4_dist4c_smooth.png
--------------------------------------------------------------------------------
/.demo/pong-v4_double.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kareido/pytorl/2f2f5258425166b8bfbde985a229fecdef3752d9/.demo/pong-v4_double.png
--------------------------------------------------------------------------------
/.demo/pong-v4_dueling.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kareido/pytorl/2f2f5258425166b8bfbde985a229fecdef3752d9/.demo/pong-v4_dueling.png
--------------------------------------------------------------------------------
/.demo/pong-v4_origsingle.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kareido/pytorl/2f2f5258425166b8bfbde985a229fecdef3752d9/.demo/pong-v4_origsingle.png
--------------------------------------------------------------------------------
/.demo/pong-v4_prioritized.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kareido/pytorl/2f2f5258425166b8bfbde985a229fecdef3752d9/.demo/pong-v4_prioritized.png
--------------------------------------------------------------------------------
/.demo/pong-v4_various_smooth.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kareido/pytorl/2f2f5258425166b8bfbde985a229fecdef3752d9/.demo/pong-v4_various_smooth.png
--------------------------------------------------------------------------------
/.demo/tensorboard_text.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kareido/pytorl/2f2f5258425166b8bfbde985a229fecdef3752d9/.demo/tensorboard_text.png
--------------------------------------------------------------------------------
/.demo/test_agent.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kareido/pytorl/2f2f5258425166b8bfbde985a229fecdef3752d9/.demo/test_agent.gif
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
# jupyter notebook for testing
test_playground*.ipynb
debug_playground*.ipynb
visualization.ipynb
.ipynb_checkpoints

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# network state_dict
*.pth

# train log
log.txt

# tensorboard files
events.out.tfevents.*

# setting file(s):
pytorl.yaml

# others:
.DS_Store

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2019 Zhe Huang

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# PyToRL: PyTorch Toolbox for Reinforcement Learning
### [PROJECT CURRENTLY UNDER DEVELOPMENT]

**Simple Description:**

This project, named pytorl, is intended to be an RL toolbox for PyTorch, and it
also contains RL algorithm implementations built with this toolbox. As I am
currently learning RL, I plan to update the project soon with other agents,
algorithms, and faster or more efficient implementations.

**Current Progress:**

Implemented four DQN algorithms (DQN and its variants) and Gorila, a distributed
DQN learning algorithm built on a parameter-server architecture. Next, I will
move on to A2C, A3C, TRPO, PPO, ...

Note that I use Slurm for distributed RL training because my work is done on
clusters, but I also provide a "local run" option so the project can run
without Slurm :).

**Some Dependencies Currently Used for Development:**
> gym == 0.10.11 (with Atari)
> numpy == 1.14.3
> python == 3.6.5
> pytorch == 1.0.1
> tensorboard == 1.9.0
> tensorboardX == 1.4

**Simple Setup from Scratch:**
```bash
# 1. clone this repo locally
$ git clone