├── .gitignore ├── ES for RL.md ├── LFHF.md ├── MARL-basics.md ├── README.md ├── RL-Metalearning.md ├── RL-basics.md ├── RL4Control.md ├── RL4DrugDiscovery.md ├── RL4Game.md ├── RL4IIoT.md ├── RL4IL.md ├── RL4Policy-Diversity.md ├── RL4QD.md ├── RL4Robot.md └── Tools.md /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store -------------------------------------------------------------------------------- /ES for RL.md: -------------------------------------------------------------------------------- 1 | # Combining Evolution and Deep Reinforcement Learning for Policy Search: a Survey 2 | 3 | Contributors: 4 | 5 | ## **Policy Search** 6 | 7 | 1. **PEPG:** [Parameter-exploring policy gradients](https://www.sciencedirect.com/science/article/abs/pii/S0893608009003220), Sehnke F et al, 2010, Neural Networks. 8 | 2. **NES:** [Natural evolution strategies](https://www.jmlr.org/papers/volume15/wierstra14a/wierstra14a.pdf), Wierstra D et al, 2014, The Journal of Machine Learning Research. 9 | 3. **OpenAI-ES:** [Evolution strategies as a scalable alternative to reinforcement learning](https://arxiv.org/abs/1703.03864), Salimans T et al, 2017. 10 | 4. **GA:** [Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning](https://arxiv.org/abs/1712.06567), Such F P et al, 2017. 11 | 5. **NS-ES/NSR-ES/NSRA-ES:** [Improving exploration in evolution strategies for deep reinforcement learning via a population of novelty-seeking agents](https://proceedings.neurips.cc/paper/2018/hash/b1301141feffabac455e1f90a7de2054-Abstract.html), Conti E et al, 2018, NeurIPS. 12 | 6. **TRES:** [Trust region evolution strategies](https://ojs.aaai.org/index.php/AAAI/article/view/4345), Liu G et al, 2019, AAAI. 13 | 7. **Guided ES:** [Guided evolutionary strategies: Augmenting random search with surrogate gradients](http://proceedings.mlr.press/v97/maheswaranathan19a.html), Maheswaranathan N et al, 2019, ICML. 14 | 8. **PBT:** [Population based training of neural networks](https://arxiv.org/abs/1711.09846), Jaderberg M et al, 2017. 15 | 9. **PB2:** [Provably efficient online hyperparameter optimization with population-based bandits](https://proceedings.neurips.cc/paper/2020/hash/c7af0926b294e47e52e46cfebe173f20-Abstract.html), Parker-Holder J et al, 2020, Advances in Neural Information Processing Systems. 16 | 10. **SEARL:** [Sample-efficient automated deep reinforcement learning](https://arxiv.org/abs/2009.01555), Franke J K H et al, 2020. 17 | 11. **DERL:** [Embodied intelligence via learning and evolution](https://www.nature.com/articles/s41467-021-25874-z), Gupta A et al, 2021, Nature communications. 18 | 19 | ****** 20 | ## **Experience-guided** 21 | 22 | 1. **ERQL:** [Bootstrapping $ q $-learning for robotics from neuro-evolution results](https://ieeexplore.ieee.org/abstract/document/7879193), Zimmer M et al, 2017, IEEE. 23 | 2. **GRP-PG:** [Gep-pg: Decoupling exploration and exploitation in deep reinforcement learning algorithms](https://proceedings.mlr.press/v80/colas18a.html), Colas C et al, 2018, ICML. 24 | 3. **ERL:** [Evolution-guided policy gradient in reinforcement learning](https://proceedings.neurips.cc/paper/2018/hash/85fc37b18c57097425b52fc7afbb6969-Abstract.html), Khadka S et al, 2018, NeurIPS. 25 | 4. **CEM-RL:** [CEM-RL: Combining evolutionary and gradient-based methods for policy search](https://arxiv.org/abs/1810.01222), Pourchot A et al, 2018. 26 | 5. 
**CERL:** [Collaborative evolutionary reinforcement learning](https://proceedings.mlr.press/v97/khadka19a.html), Khadka S et al, 2019, ICML. 27 | 6. **PDERL:** [Proximal distilled evolutionary reinforcement learning](https://ojs.aaai.org/index.php/AAAI/article/view/5728), Bodnar C et al, 2020, AAAI. 28 | 7. **RIM:** [Recruitment-imitation mechanism for evolutionary reinforcement learning](https://www.sciencedirect.com/science/article/abs/pii/S0020025520311828), Lü S et al, 2021, Information Sciences. 29 | 8. **ESAC:** [Maximum mutation reinforcement learning for scalable control](https://arxiv.org/abs/2007.13690), Suri K et al, 2020. 30 | 9. **QD-RL:** [Qd-rl: Efficient mixing of quality and diversity in reinforcement learning](https://arxiv.org/abs/2006.08505), Cideron G et al, 2020. 31 | 10. **SUPE-RL:** [Genetic soft updates for policy evolution in deep reinforcement learning](https://openreview.net/forum?id=TGFO0DbD_pk), Marchesini E et al, 2020, ICLR. 32 | 33 | ****** 34 | ## **Modules-embedded** 35 | 36 | 1. **PPO-CMA:** [PPO-CMA: Proximal policy optimization with covariance matrix adaptation](https://ieeexplore.ieee.org/abstract/document/9231618), Hämäläinen P et al, 2020, IEEE. 37 | 2. **EPG:** [Evolved policy gradients](https://proceedings.neurips.cc/paper/2018/hash/7876acb66640bad41f1e1371ef30c180-Abstract.html), Houthooft R et al, 2018, NeurIPS. 38 | 3. **CGP:** [Q-learning for continuous actions with cross-entropy guided policies](https://arxiv.org/abs/1903.10605), Simmons-Edler R et al, 2019. 39 | 4. **GRAC:** [Grac: Self-guided and self-regularized actor-critic](https://proceedings.mlr.press/v164/shao22a.html), Shao L et al, 2022, CoRL. -------------------------------------------------------------------------------- /LFHF.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | * [Learning to summarize from human feedback](https://arxiv.org/abs/2009.01325) 4 | > OpenAI's initial attempt at applying LFHF to NLP. 5 | * Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback 6 | > Anthropic's attempt to use LFHF to build a harmless assistant. 7 | * [InstructGPT: Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155) 8 | > OpenAI's follow-up application of LFHF to NLP, and the predecessor of ChatGPT. 9 | * Constitutional AI: Harmlessness from AI Feedback 10 | > Proposes AI feedback to address the low efficiency of human feedback. 11 | * Scaling Laws for Reward Model Overoptimization 12 | > An analysis of the details of reward models. -------------------------------------------------------------------------------- /MARL-basics.md: -------------------------------------------------------------------------------- 1 | # Paper Collection of MARL 2 | 3 | Contributors: 4 | ## MARL Basics 5 | 6 | ### CTDE : Centralized Training, Decentralized Execution 7 | 1. VDN : [Value-decomposition networks for cooperative multi-agent learning](https://arxiv.org/pdf/1706.05296.pdf), Sunehag P, et al 2017. 8 | 2. QMIX : [Qmix: Monotonic value function factorisation for deep multi-agent reinforcement learning](http://proceedings.mlr.press/v80/rashid18a/rashid18a.pdf) Rashid T, et al 2018, ICML 9 | 3. QTRAN : [Qtran: Learning to factorize with transformation for cooperative multi-agent reinforcement learning](http://proceedings.mlr.press/v97/son19a/son19a.pdf) Son K, et al 2019, ICML 10 | 4. QATTEN : [Qatten: A general framework for cooperative multiagent reinforcement learning](https://arxiv.org/pdf/2002.03939.pdf) Yang Y, et al 2020. 11 | 5. 
MADDPG : [Multi-agent actor-critic for mixed cooperative-competitive environments](https://proceedings.neurips.cc/paper/2017/file/68a9750337a418a86fe06c1991a1d64c-Paper.pdf) Lowe R et al 2017, NIPS 12 | 6. COMA : [Counterfactual multi-agent policy gradients](https://ojs.aaai.org/index.php/AAAI/article/download/11794/11653) Foerster J, et al 2018, AAAI 13 | 7. MAPPO : [The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games](https://github.com/cr-bh/on-policy) Yu C, et al 2021 14 | 8. HATRPO & HAPPO : [Trust region policy optimisation in multi-agent reinforcement learning](https://arxiv.org/pdf/2109.11251.pdf) Kuba J G, et al 2021 15 | 9. MA3C : [Asynchronous Actor-Critic for Multi-Agent Reinforcement Learning](https://arxiv.org/pdf/2209.10113.pdf) Xiao Y, et al 2022 16 | 17 | ### DTDE : Decentralized Training, Decentralized Execution 18 | IPPO : [Is independent learning all you need in the starcraft multi-agent challenge?](https://arxiv.org/pdf/2011.09533) de Witt C S, et al 2020 19 | 20 | ### Communication 21 | 1. RIAL & DIAL: [Learning to communicate with deep multi-agent reinforcement learning](https://proceedings.neurips.cc/paper/2016/file/c7635bfd99248a2cdef8249ef7bfbef4-Paper.pdf) Foerster J, et al 2016, NIPS 22 | 2. CommNet : [Learning multiagent communication with backpropagation](https://proceedings.neurips.cc/paper/2016/file/55b1927fdafef39c48e5b73b5d61ea60-Paper.pdf) Sukhbaatar S, et al 2016, NIPS 23 | 3. BicNet : [Multiagent bidirectionally-coordinated nets: Emergence of human-level coordination in learning to play starcraft combat games](https://arxiv.org/pdf/1703.10069.pdf) Peng P, et al 2017. 24 | 4. ATOC : [Learning attentional communication for multi-agent cooperation](https://proceedings.neurips.cc/paper/2018/file/6a8018b3a00b69c008601b8becae392b-Paper.pdf) Jiang J, et al 2018, NIPS 25 | 5. IC3Net : [Learning when to communicate at scale in multiagent cooperative and competitive tasks](https://arxiv.org/pdf/1812.09755.pdf) Singh A, et al 2018 26 | 6. TarMAC : [TarMAC: Targeted multi-agent communication](http://proceedings.mlr.press/v97/das19a/das19a.pdf) Das A, et al 2019, ICML 27 | 7. NDQ : [Learning nearly decomposable value functions via communication minimization](https://arxiv.org/pdf/1910.05366.pdf) Wang T, et al 2019 28 | 8. SchedNet : [Learning to schedule communication in multi-agent reinforcement learning](https://arxiv.org/pdf/1902.01554.pdf) Kim D, et al 2019 29 | 9. [Social influence as intrinsic motivation for multi-agent deep reinforcement learning](http://proceedings.mlr.press/v97/jaques19a/jaques19a.pdf) Jaques N, et al 2019, ICML 30 | 10. InfoBot : [InfoBot: Transfer and exploration via the information bottleneck](https://arxiv.org/pdf/1901.10902.pdf) Goyal A, et al 2019 31 | 32 | ## MARL FOR MAPF 33 | 34 | 1. PRIMAL : [PRIMAL: Pathfinding via reinforcement and imitation multi-agent learning](https://ieeexplore.ieee.org/ielaam/7083369/8668830/8661608-aam.pdf) Sartoretti G, et al 2019, ICRA 35 | 2. MARLSP : [Learning to cooperate: Application of deep reinforcement learning for online AGV path finding](https://ifaamas.org/Proceedings/aamas2020/pdfs/p2077.pdf) Zhang Y, et al 2020, AAMAS 36 | 3. MAPPER : [Mapper: Multi-agent path planning with evolutionary reinforcement learning in mixed dynamic environments](https://arxiv.org/pdf/2007.15724) Liu Z, et al 2020, IROS 37 | 4. 
G2RL : [Mobile robot path planning in dynamic environments through globally guided reinforcement learning](https://arxiv.org/pdf/2005.05420) Wang B, et al 2020 38 | 5. PRIMAL2 : [PRIMAL2: Pathfinding via reinforcement and imitation multi-agent learning-lifelong](https://arxiv.org/pdf/2010.08184) Damani M, et al 2021, ICRA 39 | 6. DHC : [Distributed heuristic multi-agent path finding with communication](https://arxiv.org/pdf/2106.11365) Ma Z, et al 2021, ICRA 40 | 7. PICO : [Multi-Agent Path Finding with Prioritized Communication Learning](https://arxiv.org/pdf/2202.03634) Li W, et al 2022 -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | # Paper Collection of RL and Its Applications 4 | This repo mainly collects papers on RL (Reinforcement Learning) and its applications, as well as tools (datasets, environments and frameworks) commonly used in RL. 5 | 6 | ## Paper lists 7 | 8 | [RL-basics](./RL-basics.md): basic papers of RL; if you want to learn RL, you must not miss these. 9 | 10 | [MARL-basics](./MARL-basics.md): basic papers of multi-agent reinforcement learning (MARL); if you want to learn MARL, you must not miss these. 11 | 12 | [RL4RS](): RL for recommendation systems 13 | 14 | [RL4Game](./RL4Game.md): RL for game theory 15 | 16 | [RL4Traffics]() 17 | 18 | [RL4Policy-Diversity](./RL4Policy-Diversity.md) 19 | 20 | [RL4DrugDiscovery](./RL4DrugDiscovery.md): Drug discovery is a challenging multi-objective optimization problem where multiple pharmaceutical objectives need to be satisfied. Recently, utilizing reinforcement learning to generate molecules with desired physicochemical properties such as solubility has been acknowledged as a promising strategy for drug design. 21 | 22 | [RL4QD](./RL4QD.md): Quality-Diversity methods are evolution-based algorithms that return a collection of several working solutions while also handling the exploration-exploitation trade-off. 23 | 24 | [RL4IL](./RL4IL.md): RL for imitation learning 25 | 26 | [RL4Robot](./RL4Robot.md): RL for robots. Papers are grouped by robot type and arranged chronologically within each category, with preference given to work that has been verified on physical hardware. 27 | 28 | [RL4IIoT](./RL4IIoT.md): With the technological breakthrough of 5G, more and more Internet of Things (IoT) technologies are being used in industrial scenarios. Industrial IoT (IIoT), which refers to the integration of industrial manufacturing systems with the Internet of Things (IoT), has received increasing attention. These emerging IIoT applications have higher requirements on quality of experience (QoE), which cannot be easily satisfied by heuristic algorithms. Recently, some research uses RL to learn algorithms for IIoT tasks by exploiting the structure of the IIoT environment. 29 | 30 | [LFHF](./LFHF.md): Learn From Human Feedback, one of the core techniques behind ChatGPT. 31 | 32 | ## Tools 33 | 34 | [Tools](./Tools.md): including datasets, environments and frameworks 35 | 36 | 37 | 38 | ## Main Contributors 39 | 
* Ariel Chen (THU): MARL-basics
* Yongqi Li (SUSTech): RL4Robotics&MRS
* Erlong Liu (NJU): QD&ERL
* Wen Qiu (KIT): DQN&PG&Exploration
* Kejian Shi (IC): RL&Robotics
* John Jim (PKU): offline RL
76 | -------------------------------------------------------------------------------- /RL-Metalearning.md: -------------------------------------------------------------------------------- 1 | # Meta-learning 2 | 3 | Authors: [Tienyu Zuo](https://github.com/TienyuZuo) 4 | * [Meta-learning from Learning Curves: Challenge Design and Baseline Results](https://ieeexplore.ieee.org/document/9892534), Nguyen et al, 2022, IJCNN. 5 | 6 | * [Exploration With Task Information for Meta Reinforcement Learning](https://ieeexplore.ieee.org/document/9604770), Peng et al, 2021, IEEE Trans. NNLS. 7 | 8 | * [Meta-Learning-Based Deep Reinforcement Learning for Multiobjective Optimization Problems](https://ieeexplore.ieee.org/document/9714721), Zhang et al, 2022, IEEE Trans. NNLS. 9 | 10 | * [Meta-Reinforcement Learning With Dynamic Adaptiveness Distillation](https://ieeexplore.ieee.org/document/9525812), Hu et al, 2021, IEEE Trans. NNLS. 11 | 12 | * [Meta-Reinforcement Learning in Non-Stationary and Dynamic Environments](https://ieeexplore.ieee.org/document/9804728), Bing et al, 2022, IEEE Trans. PAMI. 13 | 14 | * [MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning](https://proceedings.mlr.press/v139/li21g.html), Gupta et al, 2021, ICML. 15 | 16 | 17 | 18 | # Multi-task 19 | 20 | * [Prioritized Sampling with Intrinsic Motivation in Multi-Task Reinforcement Learning](https://ieeexplore.ieee.org/document/9892973), D'Eramo et al, 2022, IJCNN. 21 | 22 | * [A Multi-Task Learning Framework for Head Pose Estimation under Target Motion](https://ieeexplore.ieee.org/document/7254213), Yan et al, 2015, IEEE Trans. PAMI. 23 | 24 | * [Multi-Task Reinforcement Learning in Reproducing Kernel Hilbert Spaces via Cross-Learning](https://ieeexplore.ieee.org/document/9585424), Cerviño et al, 2021, IEEE Trans. SP. 25 | 26 | * [Multi-Task Reinforcement Learning with Soft Modularization](https://proceedings.neurips.cc/paper/2020/hash/32cfdce9631d8c7906e8e9d6e68b514b-Abstract.html), Yang et al, 2020, NIPS. 27 | 28 | * [Provably efficient multi-task reinforcement learning with model transfer](https://proceedings.neurips.cc/paper/2021/hash/a440a3d316c5614c7a9310e902f4a43e-Abstract.html), Zhang et al, 2021, NIPS. 29 | 30 | * [Multi-Task Deep Reinforcement Learning with PopArt](https://ojs.aaai.org/index.php/AAAI/article/view/4266), Hessel et al, 2019, AAAI. 31 | 32 | 33 | 34 | # Hierarchical RL 35 | 36 | * H-DQN: [Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation](https://proceedings.neurips.cc/paper/2016/file/f442d33fa06832082290ad8544a8da27-Paper.pdf), Kulkarni et al, 2016, NIPS. 37 | 38 | * [HierRL: Hierarchical Reinforcement Learning for Task Scheduling in Distributed Systems](https://ieeexplore.ieee.org/document/9892507), Guan et al, 2022, IJCNN. 39 | * [Data-Efficient Hierarchical Reinforcement Learning](https://proceedings.neurips.cc/paper/2018/hash/e6384711491713d29bc63fc5eeb5ba4f-Abstract.html), Nachum et al, 2018, NIPS. 40 | * [FeUdal Networks for Hierarchical Reinforcement Learning](http://proceedings.mlr.press/v70/vezhnevets17a.html), Vezhnevets et al, 2017, ICML. 41 | * [Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards](https://proceedings.neurips.cc/paper/2019/hash/81e74d678581a3bb7a720b019f4f1a93-Abstract.html), Li et al, 2019, NIPS. 
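Most of the hierarchical RL entries above (h-DQN, the data-efficient HRL of Nachum et al., FeUdal Networks) share the same goal-conditioned, two-level structure: a high-level policy emits a subgoal every few steps, and a low-level policy is rewarded for reaching it while the high level is trained on the environment reward. Below is a minimal sketch of that interaction loop; the `high_policy`/`low_policy` interfaces and the negative-distance intrinsic reward are illustrative assumptions (roughly in the spirit of h-DQN/HIRO), not the implementation of any specific paper.

```python
import numpy as np

def hierarchical_rollout(env, high_policy, low_policy, goal_horizon=10, max_steps=200):
    """Generic two-level (manager/worker) rollout for goal-conditioned HRL."""
    state = env.reset()
    total_extrinsic = 0.0
    goal = None
    for t in range(max_steps):
        if t % goal_horizon == 0:
            goal = high_policy.select_goal(state)                # assumed interface
        action = low_policy.act(np.concatenate([state, goal]))   # goal-conditioned worker
        next_state, extrinsic_reward, done, _ = env.step(action)

        # Worker is rewarded for approaching the subgoal (intrinsic reward),
        # here assumed to live in the same space as the state.
        intrinsic_reward = -np.linalg.norm(next_state - goal)
        low_policy.store(state, goal, action, intrinsic_reward, next_state, done)

        # Manager is trained on the environment (extrinsic) reward at its own timescale.
        high_policy.store(state, goal, extrinsic_reward, next_state, done)

        total_extrinsic += extrinsic_reward
        state = next_state
        if done:
            break
    return total_extrinsic
```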
42 | 43 | 44 | 45 | # Order dispatching 46 | 47 | * [A Hierarchical Reinforcement Learning Based Optimization Framework for Large-scale Dynamic Pickup and Delivery Problems](https://proceedings.neurips.cc/paper/2021/hash/c6a01432c8138d46ba39957a8250e027-Abstract.html), Ma et al, 2021, NIPS. 48 | 49 | * [Deep Reinforcement Learning for Ride-sharing Dispatching and Repositioning](https://www.ijcai.org/proceedings/2019/0958.pdf), Qin et al, 2019, IJCAI. 50 | 51 | * [A City-Wide Crowdsourcing Delivery System with Reinforcement Learning](https://dl.acm.org/doi/abs/10.1145/3478117), Ding et al, 2021. 52 | 53 | * [Deep Reinforcement Learning with Knowledge Transfer for Online Rides Order Dispatching](https://ieeexplore.ieee.org/document/8594886), Wang et al, 2018, ICDM. 54 | 55 | * [Combinatorial Optimization Meets Reinforcement Learning: Effective Taxi Order Dispatching at Large-Scale](https://ieeexplore.ieee.org/document/9611023), Tong et al, 2021, IEEE Trans. KDE. 56 | 57 | * [An Integrated Reinforcement Learning and Centralized Programming Approach for Online Taxi Dispatching](https://ieeexplore.ieee.org/document/9366995), Liang et al, 2021, IEEE Trans. NNLS. 58 | 59 | * [Context-Aware Taxi Dispatching at City-Scale Using Deep Reinforcement Learning](https://ieeexplore.ieee.org/document/9247444), Liu et al, 2020, IEEE Trans. ITS. 60 | 61 | * [A Learning and Operation Planning Method for Uber Energy Storage System: Order Dispatch](https://ieeexplore.ieee.org/document/9868255), Tao et al, 2022, IEEE Trans. ITS. 62 | 63 | * [Distributed Q -Learning-Based Online Optimization Algorithm for Unit Commitment and Dispatch in Smart Grid](https://ieeexplore.ieee.org/document/8746822), Li et al, 2019, IEEE Trans. Cyb. 64 | 65 | * [Efficient Large-Scale Fleet Management via Multi-Agent Deep Reinforcement Learning](https://dl.acm.org/doi/abs/10.1145/3219819.3219993), Lin et al, 2018, KDD. 66 | 67 | * [PassGoodPool: Joint Passengers and Goods Fleet Management With Reinforcement Learning Aided Pricing, Matching, and Route Planning](https://ieeexplore.ieee.org/abstract/document/9655445), Manchella et al, 2021, IEEE Trans. ITS. 68 | 69 | * [Deep Reinforcement Learning for Multi-driver Vehicle Dispatching and Repositioning Problem](https://ieeexplore.ieee.org/abstract/document/8970873), Holler et al, 2018, ICDM. 70 | 71 | * [Supply-Demand-aware Deep Reinforcement Learning for Dynamic Fleet Management](https://dl.acm.org/doi/full/10.1145/3467979), Zheng et al, 2022. 72 | 73 | * [AdaPool: A Diurnal-Adaptive Fleet Management Framework Using Model-Free Deep Reinforcement Learning and Change Point Detection](https://ieeexplore.ieee.org/abstract/document/9565816), Haliem et al, 2021, IEEE Trans. ITS. 74 | 75 | * [CoRide: Joint Order Dispatching and Fleet Management for Multi-Scale Ride-Hailing Platforms](https://dl.acm.org/doi/abs/10.1145/3357384.3357978), Jin et al, 2019, CIKM. 76 | 77 | 78 | 79 | # Auction 80 | 81 | * [Reinforcement-Learning- and Belief-Learning-Based Double Auction Mechanism for Edge Computing Resource Allocation](https://ieeexplore.ieee.org/document/8896972), Li et al, 2019, IEEE IoT Journal. 82 | * [Intelligent EV Charging for Urban Prosumer Communities: An Auction and Multi-Agent Deep Reinforcement Learning Approach](https://ieeexplore.ieee.org/document/9737233), Zou et al, 2022, IEEE Trans. NSM. 83 | * [Comparisons of Auction Designs through Multi-Agent Learning in Peer-to-Peer Energy Trading](https://ieeexplore.ieee.org/document/9828543), Zhao et al, 2022, IEEE Trans. SG. 
84 | * [Coordination for Multi-Energy Microgrids Using Multi-Agent Reinforcement Learning](https://ieeexplore.ieee.org/document/9760021), Qiu et al, 2022, IEEE Trans. II. 85 | * [Multi-Agent Reinforcement Learning for Automated Peer-to-Peer Energy Trading in Double-Side Auction Market](https://www.ijcai.org/proceedings/2021/0401.pdf), Qiu et al, 2021, IJCAI. 86 | -------------------------------------------------------------------------------- /RL-basics.md: -------------------------------------------------------------------------------- 1 | 9 | ## Paper Collection of RL basics 10 | 11 | Contributors: 12 | 13 | ### Review papers 14 | 15 | * Offline RL: [A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems](https://arxiv.org/abs/2203.01387), Rafael Figueiredo Prudencio et al, 2022 16 | 17 | ### DQN Related 18 | 19 | * DQN: [Playing Atari with Deep Reinforcement Learning](https://arxiv.org/pdf/1312.5602v1.pdf), V. Mnih et al 2013. 20 | > The original DQN paper 21 | * DQN: [Human-level control through deep reinforcement learning](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf), V. Mnih et al 2015, Nature. 22 | > Nature DQN. Compared to the original DQN paper, it proposes a periodically updated target Q-network to address instabilities; this is the version more commonly used today 23 | * DoubleDQN: [Deep Reinforcement Learning with Double Q-Learning](https://ojs.aaai.org/index.php/AAAI/article/view/10295), H Van Hasselt et al 2016, AAAI. 24 | * DuelingDQN: [Dueling Network Architectures for Deep Reinforcement Learning](http://proceedings.mlr.press/v48/wangf16.pdf), Z Wang et al 2015. 25 | * PER: [Prioritized Experience Replay](https://arxiv.org/pdf/1511.05952.pdf), T Schaul et al 2015, ICLR. 26 | * Rainbow DQN: [Rainbow: Combining Improvements in Deep Reinforcement Learning](https://arxiv.org/pdf/1710.02298.pdf), M Hessel et al 2017, AAAI. 27 | * DRQN: [Deep Recurrent Q-Learning for Partially Observable MDPs](https://www.aaai.org/ocs/index.php/FSS/FSS15/paper/view/11673/11503), M Hausknecht et al 2015, AAAI. 28 | * Noisy DQN: [Noisy Networks for Exploration](https://arxiv.org/pdf/1706.10295.pdf), M Fortunato et al 2017, ICLR. 29 | * Averaged-DQN: [Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning](http://proceedings.mlr.press/v70/anschel17a/anschel17a.pdf), O Anschel et al 2016, ICML. 30 | * C51: [A Distributional Perspective on Reinforcement Learning](https://arxiv.org/abs/1707.06887), MG Bellemare et al 2017, ICML 31 | * [Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning](https://arxiv.org/pdf/1708.02596.pdf), A Nagabandi et al 2017, ICRA. 32 | * [Deep Reinforcement Learning and the Deadly Triad](https://arxiv.org/pdf/1812.02648.pdf), H. V. Hasselt et al 2018. 33 | 34 | ### Policy gradient and related 35 | 36 | * REINFORCE: [Policy Gradient Methods for Reinforcement Learning with Function Approximation](https://proceedings.neurips.cc/paper/1999/file/464d828b85b0bed98e80ade0a5c43b0f-Paper.pdf), Sutton et al, 1999, NIPS 37 | * A3C: [Asynchronous Methods for Deep Reinforcement Learning](http://proceedings.mlr.press/v48/mniha16.pdf), Mnih et al, 2016, ICML. 38 | * TRPO: [Trust Region Policy Optimization](http://proceedings.mlr.press/v37/schulman15.pdf), J. Schulman et al 2015, ICML. 39 | * GAE: [High-Dimensional Continuous Control Using Generalized Advantage Estimation](https://arxiv.org/pdf/1506.02438.pdf), J. Schulman et al 2015, ICLR. 
40 | * PPO: [Proximal Policy Optimization Algorithms](https://arxiv.org/pdf/1707.06347.pdf), J. Schulman et al 2017. 41 | > Update in small batches, solve the problem that step size in Policy Gradient algorithm is difficult to determine, and KL divergence as Penalty is easier to solve than TRPO 42 | * Distributed PPO: [Emergence of Locomotion Behaviours in Rich Environments](https://arxiv.org/pdf/1707.02286.pdf), N. Heess et al 2017. 43 | * ACKTR: [Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation](https://proceedings.neurips.cc/paper/2017/file/361440528766bbaaaa1901845cf4152b-Paper.pdf), Y Wu et al 2017, NIPS. 44 | * ACER: [Sample Efficient Actor-Critic with Experience Replay](https://arxiv.org/pdf/1611.01224.pdf), Z Wang et al 2016, ICLR. 45 | * DPG: [Deterministic Policy Gradient Algorithms](http://proceedings.mlr.press/v32/silver14.pdf), D Silver et al 2014, ICML. 46 | * DDPG: [Continuous control with deep reinforcement learning](https://arxiv.org/pdf/1509.02971.pdf), TP Lillicrap et al 2016, ICLR. 47 | * TD3: [Addressing Function Approximation Error in Actor-Critic Methods](http://proceedings.mlr.press/v80/fujimoto18a/fujimoto18a.pdf), S Fujimoto et al 2018, ICML. 48 | * C51: [A Distributional Perspective on Reinforcement Learning](http://proceedings.mlr.press/v70/bellemare17a/bellemare17a.pdf), MG Bellemare et al 2017, ICML. 49 | * Q-Prop:[Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic](https://arxiv.org/pdf/1611.02247.pdf), S Gu et al 2016, ICLR. 50 | * [Action-dependent Control Variates for Policy Optimization via Stein’s Identity](https://arxiv.org/pdf/1710.11198.pdf), H Liu et al 2017, ICLR. 51 | * [The Mirage of Action-Dependent Baselines in Reinforcement Learning](http://proceedings.mlr.press/v80/tucker18a/tucker18a.pdf), G Tucker et al 2018, ICML. 52 | * PCL:[Bridging the Gap Between Value and Policy Based Reinforcement Learning](https://proceedings.neurips.cc/paper/2017/file/facf9f743b083008a894eee7baa16469-Paper.pdf), O Nachum et al 2017, NIPS. 53 | * Trust-PCL:[Trust-PCL: An Off-Policy Trust Region Method for Continuous Control](https://arxiv.org/pdf/1707.01891.pdf), O Nachum et al 2017, CoRR. 54 | * PGQL:[Combining Policy Gradient and Q-learning](https://arxiv.org/pdf/1611.01626.pdf), B O'Donoghue et al 2016, ICLR. 55 | * [The Reactor: A Fast and Sample-Efficient Actor-Critic Agent for Reinforcement Learning](https://arxiv.org/pdf/1704.04651.pdf), A Gruslys et al 2017, ICLR. 56 | * IPG:[Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning](https://arxiv.org/pdf/1706.00387.pdf), S Gu et al 2017, NIPS. 57 | * [Equivalence Between Policy Gradients and Soft Q-Learning](https://arxiv.org/pdf/1704.06440.pdf), J Schulman et al 2017. 58 | * IQN:[Implicit Quantile Networks for Distributional Reinforcement Learning](http://proceedings.mlr.press/v80/dabney18a/dabney18a.pdf), W Dabney et al 2018, ICML. 59 | * [Dopamine: A Research Framework for Deep Reinforcement Learning](https://arxiv.org/pdf/1812.06110.pdf), PS Castro et al 2018. 60 | 61 | ### Exploration and related 62 | 63 | * VIME:[VIME: Variational Information Maximizing Exploration](https://proceedings.neurips.cc/paper/2016/file/abd815286ba1007abfbb8415b83ae2cf-Paper.pdf), R Houthooft et al 2017, NIPS. 64 | * [Unifying Count-Based Exploration and Intrinsic Motivation](https://proceedings.neurips.cc/paper/2016/file/afda332245e2af431fb7b672a68b659d-Paper.pdf), MG Bellemare et al 2016, NIPS. 
65 | * [Count-Based Exploration with Neural Density Models](http://proceedings.mlr.press/v70/ostrovski17a/ostrovski17a.pdf), G Ostrovski et al 2017, ICML. 66 | * [#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning](https://proceedings.neurips.cc/paper/2017/file/3a20f62a0af1aa152670bab3c602feed-Paper.pdf), H Tang et al 2016, NIPS. 67 | * EX2:[EX2: Exploration with Exemplar Models for Deep Reinforcement Learning](https://proceedings.neurips.cc/paper/2017/file/1baff70e2669e8376347efd3a874a341-Paper.pdf), J Fu et al 2017, NIPS. 68 | * ICM:[Curiosity-driven Exploration by Self-supervised Prediction](http://proceedings.mlr.press/v70/pathak17a/pathak17a.pdf), D Pathak et al 2017, ICML. 69 | * [Large-Scale Study of Curiosity-Driven Learning](https://arxiv.org/pdf/1808.04355.pdf), Y Burda et al 2018, ICLR. 70 | * RND:[Exploration by Random Network Distillation](https://arxiv.org/pdf/1810.12894.pdf%20http://arxiv.org/abs/1810.12894.pdf), Y Burda et al 2018, ICLR. 71 | 72 | ### Maximum Entropy RL 73 | 74 | * SAC_V: [Soft Actor-Critic:Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor](http://proceedings.mlr.press/v80/haarnoja18b/haarnoja18b.pdf), T Haarnoja et al 2018, ICML. 75 | 76 | * SAC: [Soft Actor-Critic Algorithms and Applications ](https://arxiv.org/pdf/1812.05905.pdf), T Haarnoja et al 2018, CoRR 77 | 78 | > SAC_V suffers from brittleness to the temperature hyperparameter, thus SAC solves it by automatic gradient-based temperature. 79 | 80 | 81 | ### Distributed RL 82 | 83 | * Distributed DQN:[Massively Parallel Methods for Deep Reinforcement Learning](https://arxiv.org/pdf/1507.04296.pdf), A Nair et al 2015. 84 | * [Distributed Prioritized Experience Replay](https://arxiv.org/pdf/1803.00933.pdf), D Horgan et al 2018, ICLR. 85 | * QR-DQN:[Distributional Reinforcement Learning with Quantile Regression](https://ojs.aaai.org/index.php/AAAI/article/view/11791), W Dabney et al 2017, AAAI. 86 | 87 | 88 | ### Offline RL 89 | 90 | * REM: [An Optimistic Perspective on Offline Reinforcement Learning](https://arxiv.org/abs/1907.04543), Rishabh Agarwal et al 2016. 91 | * AWR: [Simple and Scalable Off-Policy Reinforcement Learning](https://arxiv.org/abs/1910.00177), Xue Bin Peng et al 2019, CoRR. 92 | * AWAC: [AWAC: Accelerating Online Reinforcement Learning with Offline Datasets](https://arxiv.org/abs/2006.09359), Ashvin Nair et al 2020, CoRR 93 | * TD3+BC: [A Minimalist Approach to Offline Reinforcement Learning](https://arxiv.org/abs/2106.06860), Scott Fujimoto et al 2020. 94 | * CQL: [Conservative Q-Learning for Offline Reinforcement Learning](https://arxiv.org/abs/2006.04779), Aviral Kumar et al 2020, CoRR. 95 | * IQL: [Offline Reinforcement Learning with Implicit Q-Learning](https://arxiv.org/abs/2110.06169), Ilya Kostrikov et al 2021. 96 | 97 | 98 | *** IRL 99 | * App: [Apprenticeship Learning via Inverse Reinforcement Learning](https://www.cs.utexas.edu/~sniekum/classes/RLFD-F15/papers/Abbeel04.pdf), P Abbeel et al 2004. 100 | * [Maximum Entropy Inverse Reinforcement Learning](https://www.aaai.org/Papers/AAAI/2008/AAAI08-227.pdf?source=post_page---------------------------), BD Ziebart et al 2008, AAAI. 101 | * [Relative Entropy Inverse Reinforcement Learning](http://proceedings.mlr.press/v15/boularias11a/boularias11a.pdf), A Boularias et al 2011, AISTATS. 
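The IRL entries above all revolve around matching the expert's feature expectations: with a linear reward r(s, a) = w·φ(s, a), the MaxEnt IRL log-likelihood gradient is the gap between the demonstrators' average features and the features expected under the (soft-)optimal policy for the current weights. The sketch below shows one such update, with the policy-side expectation approximated by samples from the learner's current policy; that sampling approximation, the toy data, and the learning rate are illustrative assumptions rather than the exact procedure of any of the papers listed.

```python
import numpy as np

def irl_feature_matching_step(w, demo_features, policy_features, lr=0.05):
    """One gradient-ascent step on a linear reward r = w . phi.

    demo_features:   (N_demo, d) features of expert state-action pairs
    policy_features: (N_samp, d) features sampled from the learner's current policy
    """
    grad = demo_features.mean(axis=0) - policy_features.mean(axis=0)
    return w + lr * grad

# Toy usage: expert features are shifted relative to the learner's, so the
# learned reward weights drift toward the features the expert visits more often.
rng = np.random.default_rng(0)
w = np.zeros(4)
for _ in range(200):
    w = irl_feature_matching_step(
        w,
        demo_features=rng.normal(1.0, 1.0, size=(64, 4)),
        policy_features=rng.normal(0.0, 1.0, size=(64, 4)),
    )
print(w)  # roughly positive weight on every feature dimension
```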
102 | 103 | 104 | 105 | ## Refs 106 | 107 | * https://spinningup.openai.com/en/latest/spinningup/keypapers.html 108 | -------------------------------------------------------------------------------- /RL4Control.md: -------------------------------------------------------------------------------- 1 | # RL4Control 2 | 3 | Contributors: 4 | 5 | ## model-free 6 | 7 | * [Enhanced model-Free deep Q-Learning Based control](https://www.iosrjournals.org/iosr-jce/papers/Vol20-issue1/Version-3/E2001032332.pdf), S. Mohamed et al 2018. -------------------------------------------------------------------------------- /RL4DrugDiscovery.md: -------------------------------------------------------------------------------- 1 | # RL4DrugDiscovery 2 | 3 | 4 | 5 | * **GCPN**: [Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation](https://arxiv.org/abs/1806.02473), You et al 2018, NIPS. 6 | * **MolDQN**: [Optimization of Molecules via Deep Reinforcement Learning](https://arxiv.org/abs/1810.08678), Zhou et al 2018, Sci. Rep. 7 | * **MolGAN**: [MolGAN: An implicit generative model for small molecular graphs](https://arxiv.org/abs/1805.11973), De Cao et al 2018, ICML. 8 | * **MolGym**: [Reinforcement Learning for Molecular Design Guided by Quantum Mechanics](https://arxiv.org/abs/2002.07717), N. C. Simm et al 2020, ICML. 9 | * **REINVENT**: [REINVENT 2.0: An AI Tool for De Novo Drug Design](https://pubs.acs.org/doi/10.1021/acs.jcim.0c00915), Blaschke et al 2020, J Chem Inf Model. 10 | * **RationaleRL**: [Multi-Objective Molecule Generation using Interpretable Substructures](https://arxiv.org/abs/2002.03244), Jin et al 2020, ICML. 11 | * **MCMG**: [Multi-constraint molecular generation based on conditional transformer, knowledge distillation and reinforcement learning](https://www.nature.com/articles/s42256-021-00403-1), Wang et al 2020, Nat. Mach. Intell. 12 | * **DeepLigBuilder**: [Structure-based de novo drug design using 3D deep generative models](https://pubs.rsc.org/en/content/articlelanding/2021/sc/d1sc04444c), Li et al 2021, Chem. Sci. 13 | * **GEGL**: [Guiding Deep Molecular Optimization with Genetic Exploration](https://proceedings.neurips.cc//paper/2020/hash/8ba6c657b03fc7c8dd4dff8e45defcd2-Abstract.html), Ahn et al 2020, NIPS. 14 | * **MOLER**: [MOLER: Incorporate Molecule-Level Reward to Enhance Deep Generative Model for Molecule Optimization](https://ieeexplore.ieee.org/document/9330796), Fu et al 2022, IEEE Trans Knowl Data Eng. 15 | * **PROTAC-RL**: [Accelerated rational PROTAC design via deep learning and molecular simulations](https://www.nature.com/articles/s42256-022-00527-y), Zheng et al 2022, Nat. Mach. Intell. 
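A common thread in the molecule-generation papers above is a reward assembled from cheminformatics property scores computed on each generated molecule. The sketch below shows one such reward, assuming RDKit is available and using QED (drug-likeness) plus a logP penalty; the property set, weights, and target value are illustrative choices, not taken from any of the listed papers, which typically combine many more objectives (synthesizability, docking, toxicity filters, ...).

```python
from rdkit import Chem
from rdkit.Chem import QED, Crippen

def molecule_reward(smiles: str, logp_target: float = 2.5, logp_weight: float = 0.1) -> float:
    """Score a generated SMILES string: 0 for invalid molecules, otherwise
    drug-likeness (QED, in [0, 1]) minus a penalty for straying from a target logP."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:                      # invalid molecule -> no reward
        return 0.0
    qed = QED.qed(mol)                   # drug-likeness score in [0, 1]
    logp_penalty = logp_weight * abs(Crippen.MolLogP(mol) - logp_target)
    return qed - logp_penalty

print(molecule_reward("CCO"))            # ethanol: valid, but not very drug-like
print(molecule_reward("not a molecule")) # 0.0
```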
16 | -------------------------------------------------------------------------------- /RL4Game.md: -------------------------------------------------------------------------------- 1 | # RL for game theory 2 | 3 | 4 | * Nash Q: [Deep Q-Learning for Nash Equilibria: Nash-DQN](https://arxiv.org/abs/1904.10554), Casgrain P, et al 2019, CoRR 5 | * Nash Q: [Nash Q-Learning for General-Sum Stochastic Games](https://www.jmlr.org/papers/volume4/hu03a/hu03a.pdf), 2003 6 | * SNQ2: [Learning Nash Equilibria in Zero-Sum Stochastic Games via Entropy-Regularized Policy Approximation](https://arxiv.org/abs/2009.00162), 2021 7 | 8 | * DeepNash: [Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning](https://arxiv.org/abs/2206.15378), 2022, DeepMind -------------------------------------------------------------------------------- /RL4IIoT.md: -------------------------------------------------------------------------------- 1 | # RL-IIoT-related 2 | 3 | Contributors: [Wenhao Wu](https://github.com/wenhao0214) 4 | 5 | ### Network Scheduling 6 | 7 | [Cellular network traffic scheduling with deep reinforcement learning](https://dl.acm.org/doi/10.5555/3504035.3504129), Sandeep Chinchali et al 2018, AAAI. 8 | 9 | > Presents a reinforcement learning (RL) based scheduler that can dynamically 10 | > adapt to traffic variation. 11 | 12 | [Learning Scheduling Algorithms for Data Processing Clusters](https://arxiv.org/abs/1810.01963), Hongzi Mao et al 2019, SIGCOMM. 13 | 14 | > Develops new representations for job dependencies and conducts real experiments on Spark. 15 | 16 | [ReLeS: A Neural Adaptive Multipath Scheduler based on Deep Reinforcement Learning](https://dl.acm.org/doi/abs/10.1109/INFOCOM.2019.8737649), Han Zhang et al 2019, IEEE INFOCOM. 17 | 18 | > A scheduler with a training algorithm that enables parallel execution of packet scheduling, data collection, and neural network training. 19 | 20 | [Deep Reinforcement Learning for User Association and Resource Allocation in Heterogeneous Cellular Networks](https://ieeexplore.ieee.org/document/8796358), Nan Zhao et al 2019, IEEE Transactions on Wireless Communications. 21 | 22 | > Develops a distributed optimization method based on multi-agent RL. 23 | 24 | [Adaptive Video Streaming for Massive MIMO Networks via Approximate MDP and Reinforcement Learning](https://ieeexplore.ieee.org/document/9103310), Qiao Lan et al 2020, IEEE Transactions on Wireless Communications. 25 | 26 | > Considers an MDP with random user arrivals and departures. 27 | 28 | ### Workshop Scheduling 29 | 30 | [Relative value function approximation for the capacitated re-entrant line scheduling problem](https://ieeexplore.ieee.org/abstract/document/1458721), Jin Young Choi et al 2005, IEEE Transactions on Automation Science and Engineering. 31 | 32 | [A Reinforcement Learning Approach to Robust Scheduling of Semiconductor Manufacturing Facilities](https://ieeexplore.ieee.org/document/8946870), In-Beom Park et al 2020, IEEE Transactions on Automation Science and Engineering. 33 | 34 | [Deep reinforcement learning in production systems: a systematic literature review](https://arxiv.org/abs/2109.03540), Xiaocong Chen et al 2021, International Journal of Production Research. 35 | 36 | [Dynamic job-shop scheduling in smart manufacturing using deep reinforcement learning](https://www.sciencedirect.com/science/article/abs/pii/S1389128621001031), Libing Wang et al 2021, Computer Networks. 
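The scheduling papers above (both the network and the workshop settings) share one formulation: the agent observes queue or machine state, chooses which job or resource to serve next, and receives a reward that penalizes delay or makespan. Below is a toy, gym-style sketch of that MDP interface; the state, action, and reward choices are illustrative only and do not reproduce any of the listed papers' environments.

```python
import numpy as np

class ToyJobShopEnv:
    """Toy dispatching MDP: each step a job of random size arrives and the agent
    assigns it to one of `n_machines`; the reward is the negative load of the
    chosen machine, a proxy for the new job's waiting time."""

    def __init__(self, n_machines: int = 3, horizon: int = 50, seed: int = 0):
        self.n_machines = n_machines
        self.horizon = horizon
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.loads = np.zeros(self.n_machines)      # remaining work per machine
        self.t = 0
        self.job = self.rng.uniform(1.0, 5.0)       # size of the incoming job
        return np.append(self.loads, self.job)      # observation = loads + job size

    def step(self, action: int):
        reward = -self.loads[action]                # waiting time of the dispatched job
        self.loads[action] += self.job              # enqueue it on the chosen machine
        self.loads = np.maximum(self.loads - 1.0, 0.0)  # each machine does one unit of work
        self.t += 1
        self.job = self.rng.uniform(1.0, 5.0)
        done = self.t >= self.horizon
        return np.append(self.loads, self.job), reward, done, {}

# Greedy baseline that an RL dispatcher would try to beat: pick the least-loaded machine.
env = ToyJobShopEnv()
obs, total, done = env.reset(), 0.0, False
while not done:
    obs, r, done, _ = env.step(int(np.argmin(obs[:-1])))
    total += r
print(total)
```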
37 | 38 | ### Other Scheduling Problem 39 | 40 | [A deep q-network for the beer game: A reinforcement learning algorithm to solve inventory optimization problems](https://arxiv.org/abs/1708.05924), Afshin Oroojlooyjadid et al 2017. 41 | 42 | > Use RL in a simply case of the supply chain. 43 | 44 | [Efficient Large-Scale Fleet Management via Multi-Agent Deep Reinforcement Learning](https://dl.acm.org/doi/10.1145/3219819.3219993), Kaixiang Lin et al 2018, KDD. 45 | 46 | > A special case of bin packing problem solved by RL. 47 | 48 | [ORL Reinforcement Learning Benchmarks for Online Stochastic Optimization](https://arxiv.org/abs/1911.10641v2), Bharathan Balaji et al 2018, Amazon Report. 49 | 50 | >Applying RL algorithms to a range of practical applications. 51 | -------------------------------------------------------------------------------- /RL4IL.md: -------------------------------------------------------------------------------- 1 | # RL for imitation learning 2 | 3 | Contributors: [johnjim0816](https://github.com/JohnJim0816) 4 | 5 | ## Surveys 6 | 7 | [A Imitation Learning: A Survey of Learning Methods](https://core.ac.uk/download/pdf/141207521.pdf) 8 | 9 | * [Imitation Learning by Estimating Expertise of Demonstrators](https://arxiv.org/abs/2202.01288), 2022, ICML 10 | * BC: [Integrating Behavior Cloning and Reinforcement Learning for Improved Performance in Dense and Sparse Reward Environments](https://arxiv.org/abs/1910.04281) 11 | * BC+TD3: [Twin Delayed DDPG with Behavior Cloning ](https://arxiv.org/pdf/2106.06860.pdf) -------------------------------------------------------------------------------- /RL4Policy-Diversity.md: -------------------------------------------------------------------------------- 1 | * [Diversity Driven Exploration Strategy for Deep Reinforcement Learning](https://arxiv.org/abs/1802.04564), Zhang Wei Hong et al, 2018, NIPS 2 | 3 | ### Reward Shaping 4 | 5 | * Continuously Discovering Novel Strategies via Reward-Switching Policy Optimization, 2022, ICLR -------------------------------------------------------------------------------- /RL4QD.md: -------------------------------------------------------------------------------- 1 | # QD 2 | 3 | ## Survey: 4 | 5 | - QD: [Quality Diversity: A New Frontier for Evolutionary Computation](http://eplex.cs.ucf.edu/papers/pugh_frontiers16.pdf), Justin Pugh et al 2016, Frontiers in Robotics and AI 6 | - QD opt: [Quality and diversity optimization: A unifying modular framework](https://arxiv.org/pdf/1708.09251.pdf) , Antoine Cully et al 2018 , TEC 7 | - [Policy search in continuous action domains: An overview](), Oliver Sigaud et al 2019, Neural Network 8 | - [Quality-Diversity Optimization: a novel branch of stochastic optimization](https://arxiv.org/pdf/2012.04322.pdf), Konstantinos Chatzilygeroudis et al 2021 9 | 10 | ## QD Analysis: 11 | 12 | - [An Extended Study of Quality Diversity Algorithms](http://delivery.acm.org/10.1145/2910000/2909000/p19-pugh.pdf?ip=129.31.142.189&id=2909000&acc=CHORUS&key=BF07A2EE685417C5%2EF5014A9D3D5CC2D9%2E4D4702B0C3E38B35%2E6D218144511F3437&__acm__=1572805067_7b82b5509b1bbc9081c7a3dd591b28b7) , Justin Pugh et al 2016 , Gecco 13 | - [Gaining insight into quality diversity](https://infoscience.epfl.ch/record/220676/files/p1061-auerbach.pdf) , Joshua Auerbach et al 2016, Gecco 14 | - [Searching for quality diversity when diversity is unaligned with quality](https://eplex.cs.ucf.edu/papers/pugh_ppsn16.pdf) , Justin Pugh et al 2016 , PPSN 15 | - QD-suite: [Towards QD-suite: developing a set of 
benchmarks for Quality-Diversity algorithms](https://arxiv.org/pdf/2205.03207.pdf) 16 | - [ANALYSIS OF QUALITY DIVERSITY ALGORITHMS FOR THE 17 | KNAPSACK PROBLEM](https://arxiv.org/pdf/2207.14037.pdf), Adel Nikfarjam et al 2022, PPSN 18 | 19 | 20 | 21 | ## QD-RL: 22 | 23 | - QD-RL: [QD-RL: Efficient Mixing of Quality and Diversity in Reinforcement Learning](https://arxiv.org/pdf/2006.08505.pdf), Geoffrey Cideron et al 2020 24 | - DQD: [Differentiable Quality Diversity](https://arxiv.org/pdf/2106.03894.pdf), Matthew C. Fontaine et al 2021, NIPS 25 | - GUSS: [Guided Safe Shooting: model based reinforcement learning with safety constraints](https://arxiv.org/abs/2206.09743), Giuseppe Paolo et al 2022 26 | - EDO-CS: [Evolutionary Diversity Optimization with Clustering-based Selection for Reinforcement Learning](https://openreview.net/pdf?id=74x5BXs4bWD), Yutong Wang et al 2022, ICLR 27 | - [Deep Surrogate Assisted Generation of Environments](https://arxiv.org/pdf/2206.04199.pdf), Varun Bhatt et al 2022, NIPS 28 | - HTSE: [Promoting Quality and Diversity in Population-based Reinforcement 29 | Learning via Hierarchical Trajectory Space Exploration](https://ieeexplore.ieee.org/document/9811888/), Jiayu Miao et al 2022, ICRA 30 | - DA-QD: [Dynamics-Aware Quality-Diversity for Efficient Learning of Skill 31 | Repertoires](https://arxiv.org/pdf/2109.08522.pdf), Bryan Lim et al 2022, ICRA 32 | - [Approximating Gradients for Differentiable Quality Diversity in Reinforcement Learning](https://dl.acm.org/doi/10.1145/3512290.3528705), Bryon Tjanaka et al 2022, Gecco 33 | - QD-PG: [Diversity Policy Gradient for Sample Efficient Quality-Diversity Optimization](https://arxiv.org/pdf/2006.08505.pdf), Thomas Pierrot et al 2022, Gecco 34 | 35 | ## QD-Evolution: 36 | 37 | - [Discovering evolutionary stepping stones through behavior domination](https://arxiv.org/pdf/1704.05554.pdf) , Elliot Meyerson 2017, Gecco 38 | - [The surprising creativity of digital evolution: A collection of anecdotes from the evolutionary computation and artificial life research communities](https://arxiv.org/abs/1803.03453) , Joel Lehman et al 2018 39 | - [Open-ended evolution with multi-containers QD](https://dl.acm.org/doi/abs/10.1145/3205651.3205705) , Stephane Doncieux et al 2018 , Gecco 40 | - [Mapping structural diversity in networks sharing a given degree distribution and global clustering: Adaptive resolution grid search evolution with Diophantine equation-based mutations](https://arxiv.org/pdf/1809.06293.pdf) , Peter Overbury et al 2018 41 | - [Hierarchical Behavioral Repertoires with Unsupervised Descriptors](https://arxiv.org/pdf/1804.07127.pdf) , Antoine Cully et a; 2018, Gecco 42 | - mEDEA: [Evolution of a Functionally Diverse Swarm via a Novel Decentralised Quality-Diversity Algorithm](https://arxiv.org/pdf/1804.07655.pdf), Emma Hart 2018, Gecco 43 | - [An approach to evolve and exploit repertoires of general robot behaviours](https://repositorio.iscte-iul.pt/bitstream/10071/16255/5/1-s2.0-S2210650217308556-main.pdf),Jorge Gomes et al 2018, Swarm and Evolutionary Computation 44 | - POET:[POET: open-ended coevolution of environments and their optimized solutions](),Rui Wang et al 2019, Gecco 45 | - [Modeling user selection in quality diversity](https://arxiv.org/pdf/1907.06912.pdf), Alexander Hagg et al 2019, Gecco 46 | - [Exploration and Exploitation in Symbolic Regression using Quality-Diversity and Evolutionary Strategies Algorithms](https://arxiv.org/pdf/1906.03959.pdf), J.-P,Bruneton et al 2019 47 | - [Designing neural 
networks through neuroevolution](http://www.evolvingai.org/files/s42256-018-0006-z.pdf), Kenneth Stanley et al 2019, Nature Machine Intelligence 48 | - GAPN: [Behavioral Repertoire via Generative Adversarial Policy Networks](http://homepages.inf.ed.ac.uk/thospeda/papers/jegorova2019gpn.pdf), Marija Jegorova et al 2019 49 | - [Autonomous Skill Discovery with Quality-diversity and Unsupervised Descriptors](https://arxiv.org/pdf/1905.11874.pdf), Antoine Cully 2019, Gecco 50 | - [Scaling MAP-Elites to Deep Neuroevolution](https://arxiv.org/pdf/2003.01825.pdf), Cedric Colas et al 2020 51 | - [Quality Diversity for Multi-task Optimization](https://arxiv.org/pdf/2003.04407.pdf), Jean-Baptiste Mouret et al 2020, Gecco 52 | - [Policy Manifold Search for Improving Diversity-based Neuroevolution](https://arxiv.org/pdf/2012.08676), Nemanja Rakicevic et al 2020, NIPS Workshop 53 | - [Learning behaviour-performance maps with meta-evolution](https://hal.inria.fr/hal-02555231/document), David Bossens et al 2020, Gecco 54 | - [Exploring the Evolution of GANs through Quality Diversity](https://arxiv.org/abs/2007.06251), Victor Costa et al 2020, Gecco 55 | - Enhance POET: [Enhanced POET: Open-ended reinforcement learning through unbounded invention of learning challenges and their solutions](https://arxiv.org/pdf/2003.08536), Rui Wang et al 2020, ICML 56 | - [Effective Diversity in Population Based Reinforcement Learning](https://papers.nips.cc/paper/2020/file/d1dc3a8270a6f9394f88847d7f0050cf-Paper.pdf), Jack Parker-Holder et al 2020, NIPS 57 | - [Discovering Representations for Black-box Optimization](https://hal.inria.fr/hal-02555221/document), Adam Gaier et al 2020, Gecco 58 | - [Competitiveness of MAP-Elites against Proximal Policy Optimization on locomotion tasks in deterministic simulations](https://arxiv.org/pdf/2009.08438), Szymon Brych et al 2020 59 | - CPPN2GAN: [CPPN2GAN: Combining Compositional Pattern Producing Networks and GANs for Large-scale Pattern Generation](https://arxiv.org/pdf/2004.01703.pdf), Jacob Schrum et al 2020, Gecco 60 | - Bop-Elites: [Bop-elites, a bayesian optimisation algorithm for quality-diversity search](https://arxiv.org/pdf/2005.04320), Paul Kent et al 2020 61 | - [Unsupervised Behaviour Discovery with Quality-Diversity Optimisation](https://arxiv.org/pdf/2106.05648), Luca Grillotti et al 2021 62 | - [Sparse Reward Exploration via Novelty Search and Emitters](https://arxiv.org/pdf/2102.03140), Giuseppe Paolo et al 2021, Gecco 63 | - PMS: [Policy Manifold Search: Exploring the Manifold Hypothesis for Diversity-based Neuroevolution](https://arxiv.org/pdf/2104.13424), Nemanja Rakicevic et al 2021, Gecco 64 | - [On the use of feature-maps and parameter control for improved quality-diversity meta-evolution](https://arxiv.org/pdf/2105.10317), David M. 
Bossens et al 2021, Gecco 65 | - [Illuminating the Space of Beatable Lode Runner Levels Produced By Various Generative Adversarial Networks](https://arxiv.org/pdf/2101.07868), Kirby Steckel et al 2021 66 | - [Expressivity of Parameterized and Data-driven Representations in Quality Diversity Search](https://arxiv.org/pdf/2105.04247.pdf), Alexander Hagg et al 2021, Gecco 67 | - [Ensemble Feature Extraction for Multi-Container Quality-Diversity Algorithms](https://arxiv.org/pdf/2105.00682), Leo Cazenille et al 2021, Gecco 68 | - AutoAlpha: [AutoAlpha: an Efficient Hierarchical Evolutionary Algorithm for Mining Alpha Factors in Quantitative Investment](https://arxiv.org/pdf/2002.08245), Tianping Zhang et al 2021 69 | 70 | 71 | 72 | ## Novelty Search: 73 | 74 | - [Novelty-based multiobjectivization](https://hal.archives-ouvertes.fr/hal-01300711/file/2011COS1944.pdf) , Jean-Baptiste Mouret 2011 75 | - [Evolving a diversity of virtual creatures through novelty search and local competition](https://pdfs.semanticscholar.org/6d45/9da1ff73ec7225e92842341605e2b90d0da2.pdf) , Joel Lehman et al 2011, Gecco 76 | - [Abandoning objectives: Evolution through the search for novelty alone](http://eplex.cs.ucf.edu/papers/lehman_ecj11.pdf) , Joel Lehman et al 2011, Evolutionary Computation 77 | - [Constrained novelty search: A study on game content generation](http://antoniosliapis.com/papers/constrained_novelty_search.pdf) , Antonios Liapis et al 2015, Evolutionary Computation 78 | - [Understanding innovation engines: Automated creativity and improved stochastic optimization via deep learning](http://www.evolvingai.org/files/2016_NguyenEtAl_UnderstandingInnovationEngines.pdf) , Anh Nguyen et al 2016, Evolutionary Computation 79 | - [Bayesian optimization with automatic prior selection for data-efficient direct policy search](https://arxiv.org/pdf/1709.06919.pdf), Remi Pautrat et al 2018, ICRA 80 | - [Novelty search: a theoretical perspective](https://hal.archives-ouvertes.fr/hal-02561846/document),Stephane Doncieux et al 2019, Gecco 81 | - BR-NS:[BR-NS: an Archive-less Approach to Novelty Search](https://arxiv.org/pdf/2104.03936.pdf), Achkan Salehi et al 2021, Gecco 82 | - [Geodesics, Non-linearities and the Archive of Novelty Search](https://arxiv.org/pdf/2205.03162.pdf), Achkan Salehi et al 2022, Gecco 83 | 84 | 85 | 86 | ## MAP-Elites Family: 87 | 88 | - [Illuminating search spaces by mapping elites](https://arxiv.org/pdf/1504.04909.pdf) , Jean-Baptiste Mouret et al 2015 89 | - MAP-Elites: [Robots that can adapt like animals](https://arxiv.org/pdf/1407.3501.pdf) , Antoine Cully et al 2015 , Nature 90 | - [How Do Different Encodings Influence the Performance of the MAP-Elites Algorithm?](https://hal.inria.fr/hal-01302658/file/gecco_map_elites.pdf) , Danesh Tarapore et al 2016, Gecco 91 | - SAIL: [Feature space modeling through surrogate illumination](https://pdfs.semanticscholar.org/d305/ba9a5ee3089de3f2e03d6fa53c90aba89d9c.pdf) , Adam Gaier et al 2017, Gecco 92 | - SAIL2: [Data-efficient exploration, optimization, and modeling of diverse designs through surrogate-assisted illumination](https://hal.inria.fr/hal-01518698/file/sail2017.pdf) , Adam Gaier et al 2017, Gecco 93 | - [Comparing multimodal optimization and illumination](https://hal.inria.fr/hal-01518802/document) , Vassilis Vassiliades et al 2017 , Gecco 94 | - [A comparison of illumination algorithms in unbounded spaces](https://hal.inria.fr/hal-01518814/document) , Vassilis Vassiliades et al 2017 , Gecco 95 | - CVT-MAP-Elites: [Using Centroidal 
Voronoi Tessellations to Scale Up the Multidimensional Archive of Phenotypic Elites Algorithm](https://hal.inria.fr/hal-01630627/file/ieee_tec_voronoi_map_elites.pdf) , Vassilis Vassiliades et al 2018 , TEC 96 | - Talakat: [Talakat: Bullet Hell Generation through Constrained Map-Elites](https://arxiv.org/pdf/1806.04718.pdf) , Ahmed Khalifa et al 2018 , Gecco 97 | - RTE: [Reset-free trial-and-error learning for robot damage recovery](https://arxiv.org/pdf/1610.04213) , Konstantinos Chatzilygeroudis et al 2018 , RAS 98 | - [Optimisation and Illumination of a Real-World Workforce Scheduling and Routing Application (WSRP) via Map-Elites](https://arxiv.org/pdf/1805.11555.pdf) , Neil Urquhart et al 2018, PPSN 99 | - [Multi-objective Analysis of MAP-Elites Performance](https://arxiv.org/pdf/1803.05174.pdf) , Eivind Samuelsen et al 2018 100 | - SAIL3: [Data-Efficient Design Exploration through Surrogate-Assisted Illumination](https://arxiv.org/pdf/1806.05865.pdf), Adam Gaier et al 2018, Evolutionary Computation 101 | - MESB: [Mapping Hearthstone Deck Spaces with Map-Elites with Sliding Boundaries](https://arxiv.org/pdf/1904.10656.pdf), Matthew Fontaine et al 2019, Gecco 102 | - [MAP-Elites for noisy domains by adaptive sampling](http://sebastianrisi.com/wp-content/uploads/justesen_gecco19.pdf), Niels Justesen et al 2019, Gecco 103 | - [Evaluating MAP-Elites on Constrained Optimization Problems](https://arxiv.org/pdf/1902.00703.pdf), Stefano Fioravanzo et al 2019 104 | - [Empowering Quality Diversity in Dungeon Design with Interactive Constrained MAP-Elites](https://arxiv.org/pdf/1906.05175.pdf), Alberto Alvarez et al 2019 105 | - [An illumination algorithm approach to solving the micro-depot routing problem](https://dl.acm.org/doi/10.1145/3321707.3321767), Neil Urquhart et al 2019, Gecco 106 | - [Using MAP-Elites to support policy making around Workforce Scheduling and Routing](https://www.napier.ac.uk/~/media/worktribe/output-2296970/using-map-elites-to-support-policy-making-around-workforce-scheduling-and-routing.pdf), Neil Urquhart et al 2020 107 | - [Exploring the BipedalWalker benchmark with MAP-Elites and Curiosity-driven A3C](https://dl.acm.org/doi/pdf/10.1145/3377929.3389921), Vikas Gupta et al 2020, Gecco 108 | - [Covariance Matrix Adaptation for the Rapid Illumination of Behavior Space](https://arxiv.org/pdf/1912.02400.pdf), Matthew C. 
Fontaine et al 2020, Gecco 109 | - PGA-MAP-Elites: [Policy Gradient Assisted MAP-Elites](https://hal.archives-ouvertes.fr/hal-03135723/document), Olle Nilsson et al 2021, Gecco 110 | - [Multi-Emitter MAP-Elites: Improving quality, diversity and convergence speed with heterogeneous sets of emitters](https://arxiv.org/pdf/2007.05352), Antoine Cully 2021, Gecco 111 | - [Minimize Surprise MAP-Elites: A Task-Independent MAP-Elites Variant for Swarms](https://dl.acm.org/doi/10.1145/3520304.3528773), Tanja Katharina Kaiser et al 2022, Gecco 112 | - [Illuminating Diverse Neural Cellular Automata for Level Generation](https://dl.acm.org/doi/10.1145/3512290.3528754), Sam Earle et al 2022, Gecco 113 | - [Deep Surrogate Assisted MAP-Elites for Automated Hearthstone Deckbuilding](https://dl.acm.org/doi/10.1145/3512290.3528718), Yulun Zhang et al 2022, Gecco 114 | - [Accelerated Quality-Diversity through Massive 115 | Parallelism](https://arxiv.org/pdf/2202.01258.pdf), Bryan Lim et al 2022 116 | 117 | 118 | 119 | ## Refs 120 | 121 | [Quality-Diversity optimisation algorithms](https://quality-diversity.github.io/papers.html) 122 | 123 | -------------------------------------------------------------------------------- /RL4Robot.md: -------------------------------------------------------------------------------- 1 | # RL-Robot-related 2 | 3 | Contributors: [Yongqi Li](https://github.com/L3Y1Q2) 4 | 5 | ### legged robot 6 | 7 | #### quadrupedal robot 8 | 9 | * [Creating a Dynamic Quadrupedal Robotic Goalkeeper with Reinforcement Learning](https://arxiv.org/abs/2210.04435), Huang, Xiaoyu et al 2022, arXiv 10 | 11 | > A hierarchical reinforcement learning framework that combines highly dynamic quadrupedal locomotion with an object perception method to intercept the ball. 
12 | 13 | * [Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning](https://proceedings.mlr.press/v164/rudin22a.html), Rudin N et al 2022, CoRL 14 | 15 | * [Learning robust perceptive locomotion for quadrupedal robots in the wild](https://www.science.org/doi/abs/10.1126/scirobotics.abk2822), Takahiro Miki et al 2022, Science Robotics 16 | 17 | * [Legged Robots that Keep on Learning: Fine-Tuning Locomotion Policies in the Real World](https://www.semanticscholar.org/paper/35efc3a4c5f64d96ded6daea692f3935c96f0415), Laura Smith et al 2022, ICRA 18 | 19 | * [Cat-Like Jumping and Landing of Legged Robots in Low Gravity Using Deep Reinforcement Learning](https://www.semanticscholar.org/paper/7e8186146b95337d24d28dda05cab886621cdf8c), Rudin N et al 2021, TRO 20 | 21 | * [Learning quadrupedal locomotion over challenging terrain](https://www.semanticscholar.org/paper/eadbe2e4f9de47dd357589cf59e3d1f0199e5075), Joonho Lee et al 2020, Science Robotics 22 | 23 | * [Learning Agile Robotic Locomotion Skills by Imitating Animals](https://arxiv.org/abs/2004.00784), Xue Bin Peng et al 2020, RSS 24 | 25 | * [Learning agile and dynamic motor skills for legged robots](https://www.science.org/doi/full/10.1126/scirobotics.aau5872), Hwangbo et al 2019, Science Robotics 26 | 27 | * [Sim-to-Real: Learning Agile Locomotion For Quadruped Robots](https://www.semanticscholar.org/paper/4d3b69bdcd1d325d29badc6a38f2d6cc504fe7d1), Jie Tan et al 2018, RSS 28 | 29 | * [Learning to Walk via Deep Reinforcement Learning](https://www.semanticscholar.org/paper/2ed619fbc7902155d54f6f21da16ad6c120eac63), Tuomas Haarnoja et al 2018, RSS 30 | 31 | * [Robust Rough-Terrain Locomotion with a Quadrupedal Robot](https://ieeexplore.ieee.org/abstract/document/8460731), Peter Fankhauser et al 2018, ICRA 32 | 33 | #### bipedal robot 34 | 35 | * [Towards Real Robot Learning in the Wild: A Case Study in Bipedal Locomotion](https://proceedings.mlr.press/v164/bloesch22a.html), Bloesch M et al 2022, CoRL 36 | * [Sim-to-Real Learning for Bipedal Locomotion Under Unsensed Dynamic Loads](https://arxiv.org/abs/2204.04340), Jeremy Dao et al 2022, ICRA 37 | * [Sim-to-Real Learning of Footstep-Constrained Bipedal Dynamic Walking](https://arxiv.org/abs/2203.07589), Helei Duan et al 2022, ICRA 38 | * [Blind bipedal stair traversal via sim-to-real reinforcement learning](https://arxiv.org/abs/2105.08328), Jonah Siekmann et al 2021, RSS 39 | * [Reinforcement learning for robust parameterized locomotion control of bipedal robots](https://ieeexplore.ieee.org/abstract/document/9560769), Zhongyu Li et al 2021, ICRA 40 | * [DeepWalk: Omnidirectional bipedal gait by deep reinforcement learning](https://ieeexplore.ieee.org/abstract/document/9561717), Diego Rodriguez et al 2021, ICRA 41 | * [Learning Memory-Based Control for Human-Scale Bipedal Locomotion](https://arxiv.org/abs/2006.02402), Jonah Siekmann et al 2020, RSS 42 | * [Learning Locomotion Skills for Cassie: Iterative Design and Sim-to-Real](https://www.semanticscholar.org/paper/719068eb8b8c9ab8552ec3e82c1b1088a9eacdce), Zhaoming Xie et al 2019, CoRL 43 | * [Feedback Control For Cassie With Deep Reinforcement Learning](https://www.semanticscholar.org/paper/e3bcefbcba308934dd1d843102e2b82c7239d56d), Zhaoming Xie et al 2018, IROS 44 | 45 | * [DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills](https://www.semanticscholar.org/paper/1b9ce6abc0f3024b88fcd4dbd0c10cf5bcf7d38d), Xue Bin Peng et al 2018, TOG 46 | 47 | ### UAV 48 | 49 | * [Learning 
Minimum-Time Flight in Cluttered Environments](https://rpg.ifi.uzh.ch/docs/RAL_IROS22_Penicka.pdf), Robert Penicka et al 2022, RAL 50 | 51 | > Builds on [previous work](https://rpg.ifi.uzh.ch/docs/IROS21_Yunlong.pdf) by additionally taking obstacles into account. 52 | 53 | * [Learning High-Speed Flight in the Wild](https://rpg.ifi.uzh.ch/AgileAutonomy.html), A. Loquercio et al 2021, Science Robotics 54 | 55 | > This paper proposes an end-to-end approach that can autonomously fly quadrotors through complex natural and man-made environments at high speeds, with purely onboard sensing and computation. 56 | 57 | * [Autonomous Drone Racing with Deep Reinforcement Learning](https://rpg.ifi.uzh.ch/docs/IROS21_Yunlong.pdf), Yunlong Song et al 2021, IROS 58 | 59 | > This paper presents a learning-based method for autonomous drone racing. 60 | 61 | * [A Two-Stage Reinforcement Learning Approach for Multi-UAV Collision Avoidance Under Imperfect Sensing](https://ieeexplore.ieee.org/abstract/document/9001167), Wang D et al 2020, RAL 62 | 63 | > A two-stage DDPG-based training method for collision avoidance that generates time-efficient, collision-free paths under imperfect sensing. 64 | 65 | * [Low-level autonomous control and tracking of quadrotor using reinforcement learning](https://www.sciencedirect.com/science/article/pii/S0967066119301923), Chen Huan Pi et al 2020, CEP 66 | 67 | > A model-free DRL low-level control algorithm for quadrotors, applied to hovering and trajectory tracking. 68 | 69 | * [Low-level control of a quadrotor with deep model-based reinforcement learning](https://ieeexplore.ieee.org/abstract/document/8769882/), Lambert N O et al 2019, RAL 70 | 71 | > A model-based DRL low-level control algorithm for quadrotors. 72 | 73 | * [Deterministic Policy Gradient With Integral Compensator for Robust Quadrotor Control](https://ieeexplore.ieee.org/abstract/document/8600717/), Wang Y et al 2019, TSMC 74 | 75 | > DPG-IC: a DRL-based robust control strategy for quadrotors. 76 | 77 | * [Reinforcement learning for UAV attitude control](https://dl.acm.org/doi/abs/10.1145/3301273), William Koch et al 2019, TCPS 78 | 79 | > This paper replaces the inner-loop PID attitude controller with reinforcement learning. 80 | 81 | * [Autonomous UAV Navigation Using Reinforcement Learning](https://arxiv.org/abs/1801.05086), Huy X. Pham et al 2018, IJMLC 82 | 83 | > This paper presents a technique that trains a quadrotor to navigate to a target point in an unknown environment using a combined PID + Q-learning approach (see the minimal sketch after this list). 84 | 85 | * [Control of a quadrotor with reinforcement learning](https://ieeexplore.ieee.org/abstract/document/7961277/), Hwangbo J et al 2017, RAL 86 | 87 | > The paper proposes reinforcement-learning-based stability control for autonomous UAVs.
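
A minimal, self-contained sketch of the PID + Q-learning navigation idea mentioned above: tabular Q-learning picks the next waypoint on a small hypothetical grid, while in the cited paper a separate PID loop (not shown here) flies the quadrotor to each chosen waypoint. The grid size, reward values, and hyperparameters below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

# Hypothetical 5x5 grid standing in for a discretized flight area.
GRID = 5
GOAL = (4, 4)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # waypoint moves: up, down, left, right

Q = np.zeros((GRID, GRID, len(ACTIONS)))
alpha, gamma, eps = 0.1, 0.95, 0.1            # illustrative hyperparameters

def step(state, a):
    """Pick the next waypoint; on a real vehicle a PID controller would track it."""
    r, c = state
    dr, dc = ACTIONS[a]
    nxt = (min(max(r + dr, 0), GRID - 1), min(max(c + dc, 0), GRID - 1))
    reward = 1.0 if nxt == GOAL else -0.1     # small step penalty encourages short paths
    return nxt, reward, nxt == GOAL

for episode in range(500):
    s, done = (0, 0), False
    while not done:
        # epsilon-greedy action selection over waypoint moves
        a = np.random.randint(len(ACTIONS)) if np.random.rand() < eps else int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        # tabular Q-learning update with terminal bootstrapping handled explicitly
        Q[s][a] += alpha * (r + gamma * (0.0 if done else np.max(Q[s2])) - Q[s][a])
        s = s2

print(np.argmax(Q, axis=-1))  # greedy waypoint choice per cell after training
```

The learned greedy policy yields a sequence of waypoints toward the goal, while low-level tracking of each waypoint is left to a conventional controller, which is the division of labor described in the paper.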
88 | 89 | ### UGV&USV 90 | 91 | * [Goal-Driven Autonomous Exploration Through Deep Reinforcement Learning](https://ieeexplore.ieee.org/document/9645287/), Cimurs R et al 2021, RAL 92 | * [Path Planning Algorithms for USVs via Deep Reinforcement Learning](https://www.semanticscholar.org/paper/Path-Planning-Algorithms-for-USVs-via-Deep-Learning-Zhai-Wang/b7d3afecf5ea672621b1f96d28ea7542c02afc1a), Haoran Zhai et al 2021, CAC 93 | * [Mobile robot path planning in dynamic environments through globally guided reinforcement learning](https://ieeexplore.ieee.org/abstract/document/9205217/), B Wang et al 2020, RAL 94 | * [Deep reinforcement learning for indoor mobile robot path planning](https://www.mdpi.com/838810), Gao J et al 2020, Sensors 95 | * [PRM-RL: Long-range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-Based Planning](https://www.semanticscholar.org/paper/PRM-RL%3A-Long-range-Robotic-Navigation-Tasks-by-and-Faust-Ramirez/551c60bd9178a199c20723122cd26ddd9c0c93b6), Aleksandra Faust et al 2018, ICRA 96 | * [Target-driven visual navigation in indoor scenes using deep reinforcement learning](https://www.semanticscholar.org/paper/Target-driven-visual-navigation-in-indoor-scenes-Zhu-Mottaghi/7af7f2f539cd3479faae4c66bbef49b0f66202fa), Yuke Zhu et al 2017, ICRA 97 | * [Virtual-to-real deep reinforcement learning: Continuous control of mobile robots for mapless navigation](https://www.semanticscholar.org/paper/Virtual-to-real-deep-reinforcement-learning%3A-of-for-Tai-Paolo/799c0e461332570ecde97e13266fecde8476efe3), L Tai et al 2017, IROS 98 | * [Towards Monocular Vision based Obstacle Avoidance through Deep Reinforcement Learning](https://www.semanticscholar.org/paper/Towards-Monocular-Vision-based-Obstacle-Avoidance-Xie-Wang/eab2c0bb3eda3b2c37b379e574a645d52ec264ef), Linhai Xie et al 2017, RSS 99 | * [Socially aware motion planning with deep reinforcement learning](https://www.semanticscholar.org/paper/Socially-aware-motion-planning-with-deep-learning-Chen-Everett/fe2ef22089712fcff33a77761860a10b7834da47), Yu Fan Chen et al 2017, IROS 100 | * [From perception to decision: A data-driven approach to end-to-end motion planning for autonomous ground robots](https://www.semanticscholar.org/paper/From-perception-to-decision%3A-A-data-driven-approach-Pfeiffer-Schaeuble/aa0b2517c1555fc5b3885723959f7ac950ba1626), Mark Pfeiffer et al 2017, ICRA 101 | 102 | ### manipulator 103 | 104 | * [Learning dexterous in-hand manipulation](https://journals.sagepub.com/doi/abs/10.1177/0278364919887447), Andrychowicz et al 2020, IJRR 105 | * [Dynamics Learning with Cascaded Variational Inference for Multi-Step Manipulation](https://www.semanticscholar.org/paper/Dynamics-Learning-with-Cascaded-Variational-for-Fang-Zhu/1674008abd47f1ce1e894c672074a47ee6c3288c), Kuan Fang et al 2019, CoRL 106 | * [Towards Practical Multi-Object Manipulation using Relational Reinforcement Learning](https://link.zhihu.com/?target=https%3A//richardrl.github.io/relational-rl/), R. 
Li et al 2019, ICRA 107 | * [Solving rubik's cube with a robot hand](https://arxiv.org/abs/1910.07113), I Akkaya et al 2019, arXiv 108 | * [Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience](https://www.semanticscholar.org/paper/b2174399c04a7d894bcd2dc7848a35aed4c67f80), Chebotar et al 2019, ICRA 109 | * [Sim-to-Real Transfer of Robotic Control with Dynamics Randomization](https://www.semanticscholar.org/paper/0af8cdb71ce9e5bf37ad2a11f05af293cfe62172), Xue Bin Peng et al 2018, ICRA 110 | * [Reinforcement and Imitation Learning for Diverse Visuomotor Skills](https://www.semanticscholar.org/paper/d356a5603f14c7a6873272774782d7812871f952), Yuke Zhu et al 2018, RSS 111 | * [Asymmetric Actor Critic for Image-Based Robot Learning](https://www.semanticscholar.org/paper/Asymmetric-Actor-Critic-for-Image-Based-Robot-Pinto-Andrychowicz/cee949487d13d0b64c4ef21b66ece96eb08472b3), Lerrel Pinto et al 2018, RSS 112 | * [Learning Synergies Between Pushing and Grasping with Self-Supervised Deep Reinforcement Learning](https://ieeexplore.ieee.org/document/8593986/), Andy Zeng et al 2018, IROS 113 | * [Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations](https://www.semanticscholar.org/paper/Learning-Complex-Dexterous-Manipulation-with-Deep-Rajeswaran-Kumar/e010ba3ff5744604cdbfe44a733e2a98649ee907), A. Rajeswaran et al 2018, RSS 114 | * [Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates](https://www.semanticscholar.org/paper/Deep-reinforcement-learning-for-robotic-with-Gu-Holly/e37b999f0c96d7136db07b0185b837d5decd599a), S. Gu et al 2017, ICRA 115 | 116 | ### MRS 117 | 118 | * [MAMBPO: Sample-efficient multi-robot reinforcement learning using learned world models](https://www.semanticscholar.org/paper/MAMBPO%3A-Sample-efficient-multi-robot-reinforcement-Willemsen-Coppola/b7cb2bb1c116efd825d391c6e17028f51770cac7), Daniel Willemsen et al 2021, IROS 119 | * [Adaptive and extendable control of unmanned surface vehicle formations using distributed deep reinforcement learning](https://www.semanticscholar.org/paper/Adaptive-and-extendable-control-of-unmanned-surface-Wang-Ma/f1880a9a9d3080c516be80b0bc7a6d4c9fcdd137), Shuwu Wang et al 2021 120 | * [Voronoi-based multi-robot autonomous exploration in unknown environments via deep reinforcement learning](https://ieeexplore.ieee.org/abstract/document/9244647/), Hu J et al 2020, TVT 121 | * [Distributed multi-robot collision avoidance via deep reinforcement learning for navigation in complex scenarios](https://www.semanticscholar.org/paper/Distributed-multi-robot-collision-avoidance-via-for-Fan-Long/3c7a22a6e60a8adbfff34bc55cb07f6429b9e522), Tingxiang Fan et al 2020, IJRR 122 | * [Glas: Global-to-local safe autonomy synthesis for multi-robot motion planning with end-to-end learning](https://ieeexplore.ieee.org/abstract/document/9091314/), B Riviere et al 2020, RAL 123 | * [Distributed Non-Communicating Multi-Robot Collision Avoidance via Map-Based Deep Reinforcement Learning](https://www.semanticscholar.org/paper/Distributed-Non-Communicating-Multi-Robot-Collision-Chen-Yao/b013a5f87a6966dc7fdb0f62bab2c88c52e3f9f5), Guangda Chen et al 2020, Sensors 124 | * [A Two-Stage Reinforcement Learning Approach for Multi-UAV Collision Avoidance Under Imperfect Sensing](https://www.semanticscholar.org/paper/A-Two-Stage-Reinforcement-Learning-Approach-for-Wang-Fan/b6a2741002714cbe3e069ab21e74cdea8ba35806), Dawei Wang et al 2020, RAL 125 | * [Towards optimally decentralized 
multi-robot collision avoidance via deep reinforcement learning](https://ieeexplore.ieee.org/abstract/document/8461113/), P Long et al 2018, ICRA 126 | * [Motion Planning Among Dynamic, Decision-Making Agents with Deep Reinforcement Learning](https://www.semanticscholar.org/paper/Motion-Planning-Among-Dynamic%2C-Decision-Making-with-Everett-Chen/f3161b75de1e37b0591f250068b676ea72d1ba22), Michael Everett et al 2018, IROS 127 | * [Decentralized non-communicating multiagent collision avoidance with deep reinforcement learning](https://ieeexplore.ieee.org/abstract/document/7989037/), YF Chen et al 2017, ICRA 128 | 129 | ### simulator-related 130 | 131 | * [Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning](https://www.semanticscholar.org/paper/49142e3e381c0dc7fee0049ea41d2ef02c0340d7), Viktor Makoviychuk et al 2021, NeurIPS 132 | 133 | > Currently one of the most widely used simulators for large-scale robot reinforcement learning: thousands of environment copies are stepped in parallel on a single GPU (a toy batched-simulation sketch illustrating the idea appears at the end of this file). 134 | 135 | * [Flightmare: A Flexible Quadrotor Simulator](https://rpg.ifi.uzh.ch/docs/CoRL20_Yunlong.pdf), Yunlong Song et al 2020, CoRL 136 | 137 | > A quadrotor simulator developed by the [Robotics and Perception Group](https://rpg.ifi.uzh.ch/) for its own UAV reinforcement learning research. 138 | 139 | * [Leveraging Deep Reinforcement Learning For Active Shooting Under Open-World Setting](https://link.zhihu.com/?target=https%3A//ieeexplore.ieee.org/abstract/document/9102966), A. Tzimas et al 2020, ICME 140 | 141 | * [FlightGoggles: A Modular Framework for Photorealistic Camera, Exteroceptive Sensor, and Dynamics Simulation](https://ieeexplore.ieee.org/abstract/document/8968116/), Winter Guerra et al 2019, IROS 142 | 143 | * [AirSim Drone Racing Lab](http://proceedings.mlr.press/v123/madaan20a.html), Ratnesh Madaan et al 2019, NeurIPS 144 | 145 | > A simulation framework for autonomous drone racing. 146 | 147 | ### research groups & institutes 148 | 149 | * [Robotics and Perception Group, University of Zurich](https://rpg.ifi.uzh.ch/) 150 | 151 | * [Dynamic Robotics Laboratory, Oregon State University](https://mime.oregonstate.edu/research/drl/) 152 | 153 | * [Robotic Systems Lab, ETH Zurich](https://rsl.ethz.ch/) 154 | 155 | * [UC Berkeley's Robot Learning Lab](https://rll.berkeley.edu/) 156 | 157 | * [Robotic AI & Learning Lab](http://rail.eecs.berkeley.edu/) 158 | 159 | * [Learning Agents Research Group](https://www.cs.utexas.edu/~pstone/index.shtml) 160 | 161 | 162 | 163 | --- 164 | 165 | ## References 166 | 167 | 1. https://zhuanlan.zhihu.com/p/508916024 168 | 2. https://www.zhihu.com/question/516672871/answer/2409132149 169 |
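
A toy illustration of the batched-simulation idea behind GPU simulators such as Isaac Gym: thousands of tiny independent environments are stepped with one vectorized call, so a single step yields thousands of transitions. This is not the Isaac Gym API; the environment, numbers, and names below are made up purely for illustration.

```python
import numpy as np

NUM_ENVS = 4096                      # illustrative; GPU simulators run thousands of copies
dt = 0.02                            # simulation timestep

# Toy 1-D point-mass "environments": state is (position, velocity), goal is to reach x = 1.
pos = np.zeros(NUM_ENVS)
vel = np.zeros(NUM_ENVS)

def step(actions):
    """One batched physics step applied to every environment at once."""
    global pos, vel
    vel = vel + actions * dt
    pos = pos + vel * dt
    obs = np.stack([pos, vel], axis=1)
    reward = -np.abs(1.0 - pos)      # dense reward: negative distance to the goal
    return obs, reward

# A single rollout step with a random batched policy produces NUM_ENVS transitions.
actions = np.random.uniform(-1.0, 1.0, size=NUM_ENVS)
obs, rew = step(actions)
print(obs.shape, rew.shape)          # (4096, 2) (4096,)
```

Because samples are collected in large batches like this, on-policy methods such as PPO can learn locomotion policies in minutes of wall-clock time, which is the main point of the massively parallel simulation papers listed above.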
-------------------------------------------------------------------------------- /Tools.md: -------------------------------------------------------------------------------- 1 | # RL Tools 2 | 3 | Contributors: [johnjim0816](https://github.com/JohnJim0816) 4 | 5 | ## Frameworks 6 | 7 | [OpenSpiel](https://github.com/deepmind/open_spiel): A Framework for Reinforcement Learning in Games, including **DeepNash** 8 | 9 | ### RL-basics 10 | 11 | * [OpenAI Gym](https://github.com/openai/gym) (a minimal usage sketch is included at the end of this document) 12 | 13 | ### Offline RL 14 | 15 | * [D4RL](https://sites.google.com/view/d4rl/home) 16 | 17 | ### Robotic Platforms 18 | - [Habitat](https://aihabitat.org/): A simulator of real-world indoor environments for embodied AI research 19 | - [AI2-THOR](https://github.com/allenai/ai2thor) 20 | - [Meta-World](https://github.com/rlworkgroup/metaworld) 21 | - [CoppeliaSim](https://www.coppeliarobotics.com/) + [PyRep](https://github.com/stepjam/PyRep) 22 | - [MuJoCo](https://github.com/deepmind/mujoco) (physics engine used by the OpenAI Gym MuJoCo environments) 23 | - [OpenAI robogym](https://github.com/openai/robogym) 24 | ### Other RL platforms 25 | - [Gym Retro](https://github.com/openai/retro) 26 | - [MineRL](https://minerl.readthedocs.io/en/v1.0.0/tutorials/index.html) 27 | --------------------------------------------------------------------------------
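
A minimal usage sketch of the Gym interface listed under RL-basics above, assuming the Gym >= 0.26 reset/step API (older versions return slightly different tuples); the environment id and the random policy are just placeholders.

```python
import gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)        # Gym >= 0.26: reset returns (observation, info)
episode_return, done = 0.0, False

while not done:
    action = env.action_space.sample()                    # placeholder random policy
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward
    done = terminated or truncated                        # task end or time limit

env.close()
print(f"episode return: {episode_return}")
```

Most of the robotic platforms listed above (Meta-World, robogym, the Gym MuJoCo tasks) expose the same reset/step interface, so an agent written against it transfers across them with little change.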