├── .gitignore ├── ES for RL.md ├── LFHF.md ├── MARL-basics.md ├── README.md ├── RL-Metalearning.md ├── RL-basics.md ├── RL4Control.md ├── RL4DrugDiscovery.md ├── RL4Game.md ├── RL4IIoT.md ├── RL4IL.md ├── RL4Policy-Diversity.md ├── RL4QD.md ├── RL4Robot.md └── Tools.md /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store -------------------------------------------------------------------------------- /ES for RL.md: -------------------------------------------------------------------------------- 1 | # Combining Evolution and Deep Reinforcement Learning for Policy Search: a Survey 2 | 3 | Contributors: 4 | 5 | ## **Policy Search** 6 | 7 | 1. **PEPG:** [Parameter-exploring policy gradients](https://www.sciencedirect.com/science/article/abs/pii/S0893608009003220), Sehnke F et al, 2010, Neural Networks. 8 | 2. **NES:** [Natural evolution strategies](https://www.jmlr.org/papers/volume15/wierstra14a/wierstra14a.pdf), Wierstra D et al, 2014, The Journal of Machine Learning Research. 9 | 3. **OpenAI-ES:** [Evolution strategies as a scalable alternative to reinforcement learning](https://arxiv.org/abs/1703.03864), Salimans T et al, 2017. 10 | 4. **GA:** [Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning](https://arxiv.org/abs/1712.06567), Such F P et al, 2017. 11 | 5. **NS-ES/NSR-ES/NSRA-ES:** [Improving exploration in evolution strategies for deep reinforcement learning via a population of novelty-seeking agents](https://proceedings.neurips.cc/paper/2018/hash/b1301141feffabac455e1f90a7de2054-Abstract.html), Conti E et al, 2018, NeurIPS. 12 | 6. **TRES:** [Trust region evolution strategies](https://ojs.aaai.org/index.php/AAAI/article/view/4345), Liu G et al, 2019, AAAI. 13 | 7. **Guided ES:** [Guided evolutionary strategies: Augmenting random search with surrogate gradients](http://proceedings.mlr.press/v97/maheswaranathan19a.html), Maheswaranathan N et al, 2019, ICML. 14 | 8. **PBT:** [Population based training of neural networks](https://arxiv.org/abs/1711.09846), Jaderberg M et al, 2017. 15 | 9. **PB2:** [Provably efficient online hyperparameter optimization with population-based bandits](https://proceedings.neurips.cc/paper/2020/hash/c7af0926b294e47e52e46cfebe173f20-Abstract.html), Parker-Holder J et al, 2020, Advances in Neural Information Processing Systems. 16 | 10. **SEARL:** [Sample-efficient automated deep reinforcement learning](https://arxiv.org/abs/2009.01555), Franke J K H et al, 2020. 17 | 11. **DERL:** [Embodied intelligence via learning and evolution](https://www.nature.com/articles/s41467-021-25874-z), Gupta A et al, 2021, Nature communications. 18 | 19 | ****** 20 | ## **Experience-guided** 21 | 22 | 1. **ERQL:** [Bootstrapping $ q $-learning for robotics from neuro-evolution results](https://ieeexplore.ieee.org/abstract/document/7879193), Zimmer M et al, 2017, IEEE. 23 | 2. **GRP-PG:** [Gep-pg: Decoupling exploration and exploitation in deep reinforcement learning algorithms](https://proceedings.mlr.press/v80/colas18a.html), Colas C et al, 2018, ICML. 24 | 3. **ERL:** [Evolution-guided policy gradient in reinforcement learning](https://proceedings.neurips.cc/paper/2018/hash/85fc37b18c57097425b52fc7afbb6969-Abstract.html), Khadka S et al, 2018, NeurIPS. 25 | 4. **CEM-RL:** [CEM-RL: Combining evolutionary and gradient-based methods for policy search](https://arxiv.org/abs/1810.01222), Pourchot A et al, 2018. 26 | 5. 
**CERL:** [Collaborative evolutionary reinforcement learning](https://proceedings.mlr.press/v97/khadka19a.html), Khadka S et al, 2019, ICML. 27 | 6. **PDERL:** [Proximal distilled evolutionary reinforcement learning](https://ojs.aaai.org/index.php/AAAI/article/view/5728), Bodnar C et al, 2020, AAAI. 28 | 7. **RIM:** [Recruitment-imitation mechanism for evolutionary reinforcement learning](https://www.sciencedirect.com/science/article/abs/pii/S0020025520311828), Lü S et al, 2021, Information Sciences. 29 | 8. **ESAC:** [Maximum mutation reinforcement learning for scalable control](https://arxiv.org/abs/2007.13690), Suri K et al, 2020. 30 | 9. **QD-RL:** [Qd-rl: Efficient mixing of quality and diversity in reinforcement learning](https://arxiv.org/abs/2006.08505), Cideron G et al, 2020. 31 | 10. **SUPE-RL:** [Genetic soft updates for policy evolution in deep reinforcement learning](https://openreview.net/forum?id=TGFO0DbD_pk), Marchesini E et al, 2020, ICLR. 32 | 33 | ****** 34 | ## **Modules-embedded** 35 | 36 | 1. **PPO-CMA:** [PPO-CMA: Proximal policy optimization with covariance matrix adaptation](https://ieeexplore.ieee.org/abstract/document/9231618), Hämäläinen P et al, 2020, IEEE. 37 | 2. **EPG:** [Evolved policy gradients](https://proceedings.neurips.cc/paper/2018/hash/7876acb66640bad41f1e1371ef30c180-Abstract.html), Houthooft R et al, 2018, NeurIPS. 38 | 3. **CGP:** [Q-learning for continuous actions with cross-entropy guided policies](https://arxiv.org/abs/1903.10605), Simmons-Edler R et al, 2019. 39 | 4. **GRAC:** [Grac: Self-guided and self-regularized actor-critic](https://proceedings.mlr.press/v164/shao22a.html), Shao L et al, 2022, CoRL. -------------------------------------------------------------------------------- /LFHF.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | * [Learning to summarize from human feedback](https://arxiv.org/abs/2009.01325) 4 | > OpenAI's initial attempt at applying LFHF to NLP. 5 | * Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback 6 | > Anthropic's attempt to use LFHF to build a harmless assistant. 7 | * [InstructGPT: Training language models to follow instructions with human feedback](https://arxiv.org/abs/2203.02155) 8 | > OpenAI's follow-up application of LFHF to NLP, and the predecessor of ChatGPT. 9 | * Constitutional AI: Harmlessness from AI Feedback 10 | > Proposes AI feedback to address the low efficiency of human feedback. 11 | * Scaling Laws for Reward Model Overoptimization 12 | > An analysis of the details of reward models. -------------------------------------------------------------------------------- /MARL-basics.md: -------------------------------------------------------------------------------- 1 | # Paper Collection of MARL 2 | 3 | Contributors: 4 | ## MARL Basics 5 | 6 | ### CTDE : Centralized Training, Decentralized Execution 7 | 1. VDN : [Value-decomposition networks for cooperative multi-agent learning](https://arxiv.org/pdf/1706.05296.pdf), Sunehag P, et al 2017. 8 | 2. QMIX : [Qmix: Monotonic value function factorisation for deep multi-agent reinforcement learning](http://proceedings.mlr.press/v80/rashid18a/rashid18a.pdf) Rashid T, et al 2018, ICML 9 | 3. QTRAN : [Qtran: Learning to factorize with transformation for cooperative multi-agent reinforcement learning](http://proceedings.mlr.press/v97/son19a/son19a.pdf) Son K, et al 2019, ICML 10 | 4. QATTEN : [Qatten: A general framework for cooperative multiagent reinforcement learning](https://arxiv.org/pdf/2002.03939.pdf) Yang Y, et al 2020. 11 | 5. 
MADDPG : [Multi-agent actor-critic for mixed cooperative-competitive environments](https://proceedings.neurips.cc/paper/2017/file/68a9750337a418a86fe06c1991a1d64c-Paper.pdf) Lowe R et al 2017, NIPS 12 | 6. COMA : [Counterfactual multi-agent policy gradients](https://ojs.aaai.org/index.php/AAAI/article/download/11794/11653) Foerster J, et al 2018, AAAI 13 | 7. MAPPO : [The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games](https://github.com/cr-bh/on-policy) Yu C, et al 2021 14 | 8. HATRPO & HAPPO : [Trust region policy optimisation in multi-agent reinforcement learning](https://arxiv.org/pdf/2109.11251.pdf) Kuba J G, et al 2021 15 | 9. MA3C : [Asynchronous Actor-Critic for Multi-Agent Reinforcement Learning](https://arxiv.org/pdf/2209.10113.pdf) Xiao Y, et al 2022 16 | 17 | ### DTDE : Decentralized Training, Decentralized Execution 18 | IPPO : [Is independent learning all you need in the starcraft multi-agent challenge?](https://arxiv.org/pdf/2011.09533) de Witt C S, et al 2020 19 | 20 | ### Communication 21 | 1. RIAL & DIAL: [Learning to communicate with deep multi-agent reinforcement learning](https://proceedings.neurips.cc/paper/2016/file/c7635bfd99248a2cdef8249ef7bfbef4-Paper.pdf) Foerster J, et al 2016, NIPS 22 | 2. CommNet : [Learning multiagent communication with backpropagation](https://proceedings.neurips.cc/paper/2016/file/55b1927fdafef39c48e5b73b5d61ea60-Paper.pdf) Sukhbaatar S, et al 2016, NIPS 23 | 3. BicNet : [Multiagent bidirectionally-coordinated nets: Emergence of human-level coordination in learning to play starcraft combat games](https://arxiv.org/pdf/1703.10069.pdf) Peng P, et al 2017. 24 | 4. ATOC : [Learning attentional communication for multi-agent cooperation](https://proceedings.neurips.cc/paper/2018/file/6a8018b3a00b69c008601b8becae392b-Paper.pdf) Jiang J, et al 2018, NIPS 25 | 5. IC3Net : [Learning when to communicate at scale in multiagent cooperative and competitive tasks](https://arxiv.org/pdf/1812.09755.pdf) Singh A, et al 2018 26 | 6. TarMAC : [TarMAC: Targeted multi-agent communication](http://proceedings.mlr.press/v97/das19a/das19a.pdf) Das A, et al 2019, ICML 27 | 7. NDQ : [Learning nearly decomposable value functions via communication minimization](https://arxiv.org/pdf/1910.05366.pdf) Wang T, et al 2019 28 | 8. SchedNet : [Learning to schedule communication in multi-agent reinforcement learning](https://arxiv.org/pdf/1902.01554.pdf) Kim D, et al 2019 29 | 9. [Social influence as intrinsic motivation for multi-agent deep reinforcement learning](http://proceedings.mlr.press/v97/jaques19a/jaques19a.pdf) Jaques N, et al 2019, ICML 30 | 10. InfoBot : [InfoBot: Transfer and exploration via the information bottleneck](https://arxiv.org/pdf/1901.10902.pdf) Goyal A, et al 2019 31 | 32 | ## MARL FOR MAPF 33 | 34 | 1. PRIMAL : [PRIMAL: Pathfinding via reinforcement and imitation multi-agent learning](https://ieeexplore.ieee.org/ielaam/7083369/8668830/8661608-aam.pdf) Sartoretti G, et al 2019, ICRA 35 | 2. MARLSP : [Learning to cooperate: Application of deep reinforcement learning for online AGV path finding](https://ifaamas.org/Proceedings/aamas2020/pdfs/p2077.pdf) Zhang Y, et al 2020, AAMAS 36 | 3. MAPPER : [Mapper: Multi-agent path planning with evolutionary reinforcement learning in mixed dynamic environments](https://arxiv.org/pdf/2007.15724) Liu Z, et al 2020, IROS 37 | 4. 
G2RL : [Mobile robot path planning in dynamic environments through globally guided reinforcement learning](https://arxiv.org/pdf/2005.05420) Wang B, et al 2020 38 | 5. PRIMAL2 : [PRIMAL2: Pathfinding via reinforcement and imitation multi-agent learning-lifelong](https://arxiv.org/pdf/2010.08184) Damani M, et al 2021, ICRA 39 | 6. DHC : [Distributed heuristic multi-agent path finding with communication](https://arxiv.org/pdf/2106.11365) Ma Z, et al 2021, ICRA 40 | 7. PICO : [Multi-Agent Path Finding with Prioritized Communication Learning](https://arxiv.org/pdf/2202.03634) Li W, et al 2022 -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | # Paper Collection of RL and Its Applications 4 | This repo mainly collects papers on RL (Reinforcement Learning) and its applications, as well as tools (datasets, environments and frameworks) commonly used in RL. 5 | 6 | ## Paper lists 7 | 8 | [RL-basics](./RL-basics.md): basic papers of RL; if you want to learn RL, you must not miss these. 9 | 10 | [MARL-basics](./MARL-basics.md): basic papers of multi-agent reinforcement learning (MARL); if you want to learn MARL, you must not miss these. 11 | 12 | [RL4RS](): RL for recommendation systems 13 | 14 | [RL4Game](./RL4Game.md): RL for game theory 15 | 16 | [RL4Traffics]() 17 | 18 | [RL4Policy-Diversity](./RL4Policy-Diversity.md) 19 | 20 | [RL4DrugDiscovery](./RL4DrugDiscovery.md): Drug discovery is a challenging multi-objective optimization problem where multiple pharmaceutical objectives need to be satisfied. Recently, utilizing reinforcement learning to generate molecules with desired physicochemical properties such as solubility has been acknowledged as a promising strategy for drug design. 21 | 22 | [RL4QD](./RL4QD.md): Quality-Diversity methods are evolution-based algorithms that return a collection of several working solutions while also handling the exploration-exploitation trade-off. 23 | 24 | [RL4IL](./RL4IL.md): RL for imitation learning 25 | 26 | [RL4Robot](./RL4Robot.md): RL for robots. Papers are grouped by robot type and arranged chronologically within each category, with preference given to work that has been verified on physical hardware. 27 | 28 | [RL4IIoT](./RL4IIoT.md): With the technological breakthrough of 5G, more and more Internet of Things (IoT) technologies are being used in industrial scenarios. Industrial IoT (IIoT), which refers to the integration of industrial manufacturing systems with the Internet of Things (IoT), has received increasing attention. These emerging IIoT applications have higher requirements on quality of experience (QoE), which cannot be easily satisfied by heuristic algorithms. Recently, some research uses RL to learn algorithms for IIoT tasks by exploiting the structure of the IIoT environment. 29 | 30 | [LFHF](./LFHF.md): Learn From Human Feedback, one of the core techniques behind ChatGPT. 31 | 32 | ## Tools 33 | 34 | [Tools](./Tools.md): including datasets, environments and frameworks 35 | 36 | 37 | 38 | ## Main Contributors 39 | 
* Ariel Chen (THU): MARL-basics
* Yongqi Li (SUSTech): RL4Robotics&MRS
* Erlong Liu (NJU): QD&ERL
* Wen Qiu (KIT): DQN&PG&Exploration
* Kejian Shi (IC): RL&Robotics
* John Jim (PKU): offline RL
76 | -------------------------------------------------------------------------------- /RL-Metalearning.md: -------------------------------------------------------------------------------- 1 | # Meta-learning 2 | 3 | Authors: [Tienyu Zuo](https://github.com/TienyuZuo) 4 | * [Meta-learning from Learning Curves: Challenge Design and Baseline Results](https://ieeexplore.ieee.org/document/9892534), Nguyen et al, 2022, IJCNN. 5 | 6 | * [Exploration With Task Information for Meta Reinforcement Learning](https://ieeexplore.ieee.org/document/9604770), Peng et al, 2021, IEEE Trans. NNLS. 7 | 8 | * [Meta-Learning-Based Deep Reinforcement Learning for Multiobjective Optimization Problems](https://ieeexplore.ieee.org/document/9714721), Zhang et al, 2022, IEEE Trans. NNLS. 9 | 10 | * [Meta-Reinforcement Learning With Dynamic Adaptiveness Distillation](https://ieeexplore.ieee.org/document/9525812), Hu et al, 2021, IEEE Trans. NNLS. 11 | 12 | * [Meta-Reinforcement Learning in Non-Stationary and Dynamic Environments](https://ieeexplore.ieee.org/document/9804728), Bing et al, 2022, IEEE Trans. PAMI. 13 | 14 | * [MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning](https://proceedings.mlr.press/v139/li21g.html), Gupta et al, 2021, ICML. 15 | 16 | 17 | 18 | # Multi-task 19 | 20 | * [Prioritized Sampling with Intrinsic Motivation in Multi-Task Reinforcement Learning](https://ieeexplore.ieee.org/document/9892973), D'Eramo et al, 2022, IJCNN. 21 | 22 | * [A Multi-Task Learning Framework for Head Pose Estimation under Target Motion](https://ieeexplore.ieee.org/document/7254213), Yan et al, 2015, IEEE Trans. PAMI. 23 | 24 | * [Multi-Task Reinforcement Learning in Reproducing Kernel Hilbert Spaces via Cross-Learning](https://ieeexplore.ieee.org/document/9585424), Cerviño et al, 2021, IEEE Trans. SP. 25 | 26 | * [Multi-Task Reinforcement Learning with Soft Modularization](https://proceedings.neurips.cc/paper/2020/hash/32cfdce9631d8c7906e8e9d6e68b514b-Abstract.html), Yang et al, 2020, NIPS. 27 | 28 | * [Provably efficient multi-task reinforcement learning with model transfer](https://proceedings.neurips.cc/paper/2021/hash/a440a3d316c5614c7a9310e902f4a43e-Abstract.html), Zhang et al, 2021, NIPS. 29 | 30 | * [Multi-Task Deep Reinforcement Learning with PopArt](https://ojs.aaai.org/index.php/AAAI/article/view/4266), Hessel et al, 2019, AAAI. 31 | 32 | 33 | 34 | # Hierarchical RL 35 | 36 | * H-DQN: [Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation](https://proceedings.neurips.cc/paper/2016/file/f442d33fa06832082290ad8544a8da27-Paper.pdf), Kulkarni et al, 2016, NIPS. 37 | 38 | * [HierRL: Hierarchical Reinforcement Learning for Task Scheduling in Distributed Systems](https://ieeexplore.ieee.org/document/9892507), Guan et al, 2022, IJCNN. 39 | * [Data-Efficient Hierarchical Reinforcement Learning](https://proceedings.neurips.cc/paper/2018/hash/e6384711491713d29bc63fc5eeb5ba4f-Abstract.html), Nachum et al, 2018, NIPS. 40 | * [FeUdal Networks for Hierarchical Reinforcement Learning](http://proceedings.mlr.press/v70/vezhnevets17a.html), Vezhnevets et al, 2017, ICML. 41 | * [Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards](https://proceedings.neurips.cc/paper/2019/hash/81e74d678581a3bb7a720b019f4f1a93-Abstract.html), Li et al, 2019, NIPS. 
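Most of the hierarchical RL entries above (h-DQN, the data-efficient HRL of Nachum et al., FeUdal Networks) share the same goal-conditioned, two-level structure: a high-level policy emits a subgoal every few steps, and a low-level policy is rewarded for reaching it while the high level is trained on the environment reward. Below is a minimal sketch of that interaction loop; the `high_policy`/`low_policy` interfaces and the negative-distance intrinsic reward are illustrative assumptions (roughly in the spirit of h-DQN/HIRO), not the implementation of any specific paper.

```python
import numpy as np

def hierarchical_rollout(env, high_policy, low_policy, goal_horizon=10, max_steps=200):
    """Generic two-level (manager/worker) rollout for goal-conditioned HRL."""
    state = env.reset()
    total_extrinsic = 0.0
    goal = None
    for t in range(max_steps):
        if t % goal_horizon == 0:
            goal = high_policy.select_goal(state)                # assumed interface
        action = low_policy.act(np.concatenate([state, goal]))   # goal-conditioned worker
        next_state, extrinsic_reward, done, _ = env.step(action)

        # Worker is rewarded for approaching the subgoal (intrinsic reward),
        # here assumed to live in the same space as the state.
        intrinsic_reward = -np.linalg.norm(next_state - goal)
        low_policy.store(state, goal, action, intrinsic_reward, next_state, done)

        # Manager is trained on the environment (extrinsic) reward at its own timescale.
        high_policy.store(state, goal, extrinsic_reward, next_state, done)

        total_extrinsic += extrinsic_reward
        state = next_state
        if done:
            break
    return total_extrinsic
```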
42 | 43 | 44 | 45 | # Order dispatching 46 | 47 | * [A Hierarchical Reinforcement Learning Based Optimization Framework for Large-scale Dynamic Pickup and Delivery Problems](https://proceedings.neurips.cc/paper/2021/hash/c6a01432c8138d46ba39957a8250e027-Abstract.html), Ma et al, 2021, NIPS. 48 | 49 | * [Deep Reinforcement Learning for Ride-sharing Dispatching and Repositioning](https://www.ijcai.org/proceedings/2019/0958.pdf), Qin et al, 2019, IJCAI. 50 | 51 | * [A City-Wide Crowdsourcing Delivery System with Reinforcement Learning](https://dl.acm.org/doi/abs/10.1145/3478117), Ding et al, 2021. 52 | 53 | * [Deep Reinforcement Learning with Knowledge Transfer for Online Rides Order Dispatching](https://ieeexplore.ieee.org/document/8594886), Wang et al, 2018, ICDM. 54 | 55 | * [Combinatorial Optimization Meets Reinforcement Learning: Effective Taxi Order Dispatching at Large-Scale](https://ieeexplore.ieee.org/document/9611023), Tong et al, 2021, IEEE Trans. KDE. 56 | 57 | * [An Integrated Reinforcement Learning and Centralized Programming Approach for Online Taxi Dispatching](https://ieeexplore.ieee.org/document/9366995), Liang et al, 2021, IEEE Trans. NNLS. 58 | 59 | * [Context-Aware Taxi Dispatching at City-Scale Using Deep Reinforcement Learning](https://ieeexplore.ieee.org/document/9247444), Liu et al, 2020, IEEE Trans. ITS. 60 | 61 | * [A Learning and Operation Planning Method for Uber Energy Storage System: Order Dispatch](https://ieeexplore.ieee.org/document/9868255), Tao et al, 2022, IEEE Trans. ITS. 62 | 63 | * [Distributed Q -Learning-Based Online Optimization Algorithm for Unit Commitment and Dispatch in Smart Grid](https://ieeexplore.ieee.org/document/8746822), Li et al, 2019, IEEE Trans. Cyb. 64 | 65 | * [Efficient Large-Scale Fleet Management via Multi-Agent Deep Reinforcement Learning](https://dl.acm.org/doi/abs/10.1145/3219819.3219993), Lin et al, 2018, KDD. 66 | 67 | * [PassGoodPool: Joint Passengers and Goods Fleet Management With Reinforcement Learning Aided Pricing, Matching, and Route Planning](https://ieeexplore.ieee.org/abstract/document/9655445), Manchella et al, 2021, IEEE Trans. ITS. 68 | 69 | * [Deep Reinforcement Learning for Multi-driver Vehicle Dispatching and Repositioning Problem](https://ieeexplore.ieee.org/abstract/document/8970873), Holler et al, 2018, ICDM. 70 | 71 | * [Supply-Demand-aware Deep Reinforcement Learning for Dynamic Fleet Management](https://dl.acm.org/doi/full/10.1145/3467979), Zheng et al, 2022. 72 | 73 | * [AdaPool: A Diurnal-Adaptive Fleet Management Framework Using Model-Free Deep Reinforcement Learning and Change Point Detection](https://ieeexplore.ieee.org/abstract/document/9565816), Haliem et al, 2021, IEEE Trans. ITS. 74 | 75 | * [CoRide: Joint Order Dispatching and Fleet Management for Multi-Scale Ride-Hailing Platforms](https://dl.acm.org/doi/abs/10.1145/3357384.3357978), Jin et al, 2019, CIKM. 76 | 77 | 78 | 79 | # Auction 80 | 81 | * [Reinforcement-Learning- and Belief-Learning-Based Double Auction Mechanism for Edge Computing Resource Allocation](https://ieeexplore.ieee.org/document/8896972), Li et al, 2019, IEEE IoT Journal. 82 | * [Intelligent EV Charging for Urban Prosumer Communities: An Auction and Multi-Agent Deep Reinforcement Learning Approach](https://ieeexplore.ieee.org/document/9737233), Zou et al, 2022, IEEE Trans. NSM. 83 | * [Comparisons of Auction Designs through Multi-Agent Learning in Peer-to-Peer Energy Trading](https://ieeexplore.ieee.org/document/9828543), Zhao et al, 2022, IEEE Trans. SG. 
84 | * [Coordination for Multi-Energy Microgrids Using Multi-Agent Reinforcement Learning](https://ieeexplore.ieee.org/document/9760021), Qiu et al, 2022, IEEE Trans. II. 85 | * [Multi-Agent Reinforcement Learning for Automated Peer-to-Peer Energy Trading in Double-Side Auction Market](https://www.ijcai.org/proceedings/2021/0401.pdf), Qiu et al, 2021, IJCAI. 86 | -------------------------------------------------------------------------------- /RL-basics.md: -------------------------------------------------------------------------------- 1 | 9 | ## Paper Collection of RL basics 10 | 11 | Contributors: 12 | 13 | ### Review papers 14 | 15 | * Offline RL: [A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems](https://arxiv.org/abs/2203.01387), Rafael Figueiredo Prudencio et al, 2022 16 | 17 | ### DQN Related 18 | 19 | * DQN: [Playing Atari with Deep Reinforcement Learning](https://arxiv.org/pdf/1312.5602v1.pdf), V. Mnih et al 2013. 20 | > The original DQN paper 21 | * DQN: [Human-level control through deep reinforcement learning](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf), V. Mnih et al 2015, Nature. 22 | > Nature DQN. Compared to the original DQN paper, it proposes a periodically updated target Q-network to address instabilities; this is the version more commonly used today 23 | * DoubleDQN: [Deep Reinforcement Learning with Double Q-Learning](https://ojs.aaai.org/index.php/AAAI/article/view/10295), H Van Hasselt et al 2016, AAAI. 24 | * DuelingDQN: [Dueling Network Architectures for Deep Reinforcement Learning](http://proceedings.mlr.press/v48/wangf16.pdf), Z Wang et al 2015. 25 | * PER: [Prioritized Experience Replay](https://arxiv.org/pdf/1511.05952.pdf), T Schaul et al 2015, ICLR. 26 | * Rainbow DQN: [Rainbow: Combining Improvements in Deep Reinforcement Learning](https://arxiv.org/pdf/1710.02298.pdf), M Hessel et al 2017, AAAI. 27 | * DRQN: [Deep Recurrent Q-Learning for Partially Observable MDPs](https://www.aaai.org/ocs/index.php/FSS/FSS15/paper/view/11673/11503), M Hausknecht et al 2015, AAAI. 28 | * Noisy DQN: [Noisy Networks for Exploration](https://arxiv.org/pdf/1706.10295.pdf), M Fortunato et al 2017, ICLR. 29 | * Averaged-DQN: [Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning](http://proceedings.mlr.press/v70/anschel17a/anschel17a.pdf), O Anschel et al 2016, ICML. 30 | * C51: [A Distributional Perspective on Reinforcement Learning](https://arxiv.org/abs/1707.06887), MG Bellemare et al 2017, ICML 31 | * [Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning](https://arxiv.org/pdf/1708.02596.pdf), A Nagabandi et al 2017, ICRA. 32 | * [Deep Reinforcement Learning and the Deadly Triad](https://arxiv.org/pdf/1812.02648.pdf), H. V. Hasselt et al 2018. 33 | 34 | ### Policy gradient and related 35 | 36 | * REINFORCE: [Policy Gradient Methods for Reinforcement Learning with Function Approximation](https://proceedings.neurips.cc/paper/1999/file/464d828b85b0bed98e80ade0a5c43b0f-Paper.pdf), Sutton et al, 1999, NIPS 37 | * A3C: [Asynchronous Methods for Deep Reinforcement Learning](http://proceedings.mlr.press/v48/mniha16.pdf), Mnih et al, 2016, ICML. 38 | * TRPO: [Trust Region Policy Optimization](http://proceedings.mlr.press/v37/schulman15.pdf), J. Schulman et al 2015, ICML. 39 | * GAE: [High-Dimensional Continuous Control Using Generalized Advantage Estimation](https://arxiv.org/pdf/1506.02438.pdf), J. Schulman et al 2015, ICLR. 
40 | * PPO: [Proximal Policy Optimization Algorithms](https://arxiv.org/pdf/1707.06347.pdf), J. Schulman et al 2017. 41 | > Update in small batches, solve the problem that step size in Policy Gradient algorithm is difficult to determine, and KL divergence as Penalty is easier to solve than TRPO 42 | * Distributed PPO: [Emergence of Locomotion Behaviours in Rich Environments](https://arxiv.org/pdf/1707.02286.pdf), N. Heess et al 2017. 43 | * ACKTR: [Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation](https://proceedings.neurips.cc/paper/2017/file/361440528766bbaaaa1901845cf4152b-Paper.pdf), Y Wu et al 2017, NIPS. 44 | * ACER: [Sample Efficient Actor-Critic with Experience Replay](https://arxiv.org/pdf/1611.01224.pdf), Z Wang et al 2016, ICLR. 45 | * DPG: [Deterministic Policy Gradient Algorithms](http://proceedings.mlr.press/v32/silver14.pdf), D Silver et al 2014, ICML. 46 | * DDPG: [Continuous control with deep reinforcement learning](https://arxiv.org/pdf/1509.02971.pdf), TP Lillicrap et al 2016, ICLR. 47 | * TD3: [Addressing Function Approximation Error in Actor-Critic Methods](http://proceedings.mlr.press/v80/fujimoto18a/fujimoto18a.pdf), S Fujimoto et al 2018, ICML. 48 | * C51: [A Distributional Perspective on Reinforcement Learning](http://proceedings.mlr.press/v70/bellemare17a/bellemare17a.pdf), MG Bellemare et al 2017, ICML. 49 | * Q-Prop:[Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic](https://arxiv.org/pdf/1611.02247.pdf), S Gu et al 2016, ICLR. 50 | * [Action-dependent Control Variates for Policy Optimization via Stein’s Identity](https://arxiv.org/pdf/1710.11198.pdf), H Liu et al 2017, ICLR. 51 | * [The Mirage of Action-Dependent Baselines in Reinforcement Learning](http://proceedings.mlr.press/v80/tucker18a/tucker18a.pdf), G Tucker et al 2018, ICML. 52 | * PCL:[Bridging the Gap Between Value and Policy Based Reinforcement Learning](https://proceedings.neurips.cc/paper/2017/file/facf9f743b083008a894eee7baa16469-Paper.pdf), O Nachum et al 2017, NIPS. 53 | * Trust-PCL:[Trust-PCL: An Off-Policy Trust Region Method for Continuous Control](https://arxiv.org/pdf/1707.01891.pdf), O Nachum et al 2017, CoRR. 54 | * PGQL:[Combining Policy Gradient and Q-learning](https://arxiv.org/pdf/1611.01626.pdf), B O'Donoghue et al 2016, ICLR. 55 | * [The Reactor: A Fast and Sample-Efficient Actor-Critic Agent for Reinforcement Learning](https://arxiv.org/pdf/1704.04651.pdf), A Gruslys et al 2017, ICLR. 56 | * IPG:[Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning](https://arxiv.org/pdf/1706.00387.pdf), S Gu et al 2017, NIPS. 57 | * [Equivalence Between Policy Gradients and Soft Q-Learning](https://arxiv.org/pdf/1704.06440.pdf), J Schulman et al 2017. 58 | * IQN:[Implicit Quantile Networks for Distributional Reinforcement Learning](http://proceedings.mlr.press/v80/dabney18a/dabney18a.pdf), W Dabney et al 2018, ICML. 59 | * [Dopamine: A Research Framework for Deep Reinforcement Learning](https://arxiv.org/pdf/1812.06110.pdf), PS Castro et al 2018. 60 | 61 | ### Exploration and related 62 | 63 | * VIME:[VIME: Variational Information Maximizing Exploration](https://proceedings.neurips.cc/paper/2016/file/abd815286ba1007abfbb8415b83ae2cf-Paper.pdf), R Houthooft et al 2017, NIPS. 64 | * [Unifying Count-Based Exploration and Intrinsic Motivation](https://proceedings.neurips.cc/paper/2016/file/afda332245e2af431fb7b672a68b659d-Paper.pdf), MG Bellemare et al 2016, NIPS. 
65 | * [Count-Based Exploration with Neural Density Models](http://proceedings.mlr.press/v70/ostrovski17a/ostrovski17a.pdf), G Ostrovski et al 2017, ICML. 66 | * [#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning](https://proceedings.neurips.cc/paper/2017/file/3a20f62a0af1aa152670bab3c602feed-Paper.pdf), H Tang et al 2016, NIPS. 67 | * EX2:[EX2: Exploration with Exemplar Models for Deep Reinforcement Learning](https://proceedings.neurips.cc/paper/2017/file/1baff70e2669e8376347efd3a874a341-Paper.pdf), J Fu et al 2017, NIPS. 68 | * ICM:[Curiosity-driven Exploration by Self-supervised Prediction](http://proceedings.mlr.press/v70/pathak17a/pathak17a.pdf), D Pathak et al 2017, ICML. 69 | * [Large-Scale Study of Curiosity-Driven Learning](https://arxiv.org/pdf/1808.04355.pdf), Y Burda et al 2018, ICLR. 70 | * RND:[Exploration by Random Network Distillation](https://arxiv.org/pdf/1810.12894.pdf%20http://arxiv.org/abs/1810.12894.pdf), Y Burda et al 2018, ICLR. 71 | 72 | ### Maximum Entropy RL 73 | 74 | * SAC_V: [Soft Actor-Critic:Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor](http://proceedings.mlr.press/v80/haarnoja18b/haarnoja18b.pdf), T Haarnoja et al 2018, ICML. 75 | 76 | * SAC: [Soft Actor-Critic Algorithms and Applications ](https://arxiv.org/pdf/1812.05905.pdf), T Haarnoja et al 2018, CoRR 77 | 78 | > SAC_V suffers from brittleness to the temperature hyperparameter, thus SAC solves it by automatic gradient-based temperature. 79 | 80 | 81 | ### Distributed RL 82 | 83 | * Distributed DQN:[Massively Parallel Methods for Deep Reinforcement Learning](https://arxiv.org/pdf/1507.04296.pdf), A Nair et al 2015. 84 | * [Distributed Prioritized Experience Replay](https://arxiv.org/pdf/1803.00933.pdf), D Horgan et al 2018, ICLR. 85 | * QR-DQN:[Distributional Reinforcement Learning with Quantile Regression](https://ojs.aaai.org/index.php/AAAI/article/view/11791), W Dabney et al 2017, AAAI. 86 | 87 | 88 | ### Offline RL 89 | 90 | * REM: [An Optimistic Perspective on Offline Reinforcement Learning](https://arxiv.org/abs/1907.04543), Rishabh Agarwal et al 2016. 91 | * AWR: [Simple and Scalable Off-Policy Reinforcement Learning](https://arxiv.org/abs/1910.00177), Xue Bin Peng et al 2019, CoRR. 92 | * AWAC: [AWAC: Accelerating Online Reinforcement Learning with Offline Datasets](https://arxiv.org/abs/2006.09359), Ashvin Nair et al 2020, CoRR 93 | * TD3+BC: [A Minimalist Approach to Offline Reinforcement Learning](https://arxiv.org/abs/2106.06860), Scott Fujimoto et al 2020. 94 | * CQL: [Conservative Q-Learning for Offline Reinforcement Learning](https://arxiv.org/abs/2006.04779), Aviral Kumar et al 2020, CoRR. 95 | * IQL: [Offline Reinforcement Learning with Implicit Q-Learning](https://arxiv.org/abs/2110.06169), Ilya Kostrikov et al 2021. 96 | 97 | 98 | *** IRL 99 | * App: [Apprenticeship Learning via Inverse Reinforcement Learning](https://www.cs.utexas.edu/~sniekum/classes/RLFD-F15/papers/Abbeel04.pdf), P Abbeel et al 2004. 100 | * [Maximum Entropy Inverse Reinforcement Learning](https://www.aaai.org/Papers/AAAI/2008/AAAI08-227.pdf?source=post_page---------------------------), BD Ziebart et al 2008, AAAI. 101 | * [Relative Entropy Inverse Reinforcement Learning](http://proceedings.mlr.press/v15/boularias11a/boularias11a.pdf), A Boularias et al 2011, AISTATS. 
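The IRL entries above all revolve around matching the expert's feature expectations: with a linear reward r(s, a) = w·φ(s, a), the MaxEnt IRL log-likelihood gradient is the gap between the demonstrators' average features and the features expected under the (soft-)optimal policy for the current weights. The sketch below shows one such update, with the policy-side expectation approximated by samples from the learner's current policy; that sampling approximation, the toy data, and the learning rate are illustrative assumptions rather than the exact procedure of any of the papers listed.

```python
import numpy as np

def irl_feature_matching_step(w, demo_features, policy_features, lr=0.05):
    """One gradient-ascent step on a linear reward r = w . phi.

    demo_features:   (N_demo, d) features of expert state-action pairs
    policy_features: (N_samp, d) features sampled from the learner's current policy
    """
    grad = demo_features.mean(axis=0) - policy_features.mean(axis=0)
    return w + lr * grad

# Toy usage: expert features are shifted relative to the learner's, so the
# learned reward weights drift toward the features the expert visits more often.
rng = np.random.default_rng(0)
w = np.zeros(4)
for _ in range(200):
    w = irl_feature_matching_step(
        w,
        demo_features=rng.normal(1.0, 1.0, size=(64, 4)),
        policy_features=rng.normal(0.0, 1.0, size=(64, 4)),
    )
print(w)  # roughly positive weight on every feature dimension
```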
102 | 103 | 104 | 105 | ## Refs 106 | 107 | * https://spinningup.openai.com/en/latest/spinningup/keypapers.html 108 | -------------------------------------------------------------------------------- /RL4Control.md: -------------------------------------------------------------------------------- 1 | # RL4Control 2 | 3 | Contributors: 4 | 5 | ## model-free 6 | 7 | * [Enhanced model-Free deep Q-Learning Based control](https://www.iosrjournals.org/iosr-jce/papers/Vol20-issue1/Version-3/E2001032332.pdf), S. Mohamed et al 2018. -------------------------------------------------------------------------------- /RL4DrugDiscovery.md: -------------------------------------------------------------------------------- 1 | # RL4DrugDiscovery 2 | 3 | 4 | 5 | * **GCPN**: [Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation](https://arxiv.org/abs/1806.02473), You et al 2018, NIPS. 6 | * **MolDQN**: [Optimization of Molecules via Deep Reinforcement Learning](https://arxiv.org/abs/1810.08678), Zhou et al 2018, Sci. Rep. 7 | * **MolGAN**: [MolGAN: An implicit generative model for small molecular graphs](https://arxiv.org/abs/1805.11973), De Cao et al 2018, ICML. 8 | * **MolGym**: [Reinforcement Learning for Molecular Design Guided by Quantum Mechanics](https://arxiv.org/abs/2002.07717), N. C. Simm et al 2020, ICML. 9 | * **REINVENT**: [REINVENT 2.0: An AI Tool for De Novo Drug Design](https://pubs.acs.org/doi/10.1021/acs.jcim.0c00915), Blaschke et al 2020, J Chem Inf Model. 10 | * **RationaleRL**: [Multi-Objective Molecule Generation using Interpretable Substructures](https://arxiv.org/abs/2002.03244), Jin et al 2020, ICML. 11 | * **MCMG**: [Multi-constraint molecular generation based on conditional transformer, knowledge distillation and reinforcement learning](https://www.nature.com/articles/s42256-021-00403-1), Wang et al 2020, Nat. Mach. Intell. 12 | * **DeepLigBuilder**: [Structure-based de novo drug design using 3D deep generative models](https://pubs.rsc.org/en/content/articlelanding/2021/sc/d1sc04444c), Li et al 2021, Chem. Sci. 13 | * **GEGL**: [Guiding Deep Molecular Optimization with Genetic Exploration](https://proceedings.neurips.cc//paper/2020/hash/8ba6c657b03fc7c8dd4dff8e45defcd2-Abstract.html), Ahn et al 2020, NIPS. 14 | * **MOLER**: [MOLER: Incorporate Molecule-Level Reward to Enhance Deep Generative Model for Molecule Optimization](https://ieeexplore.ieee.org/document/9330796), Fu et al 2022, IEEE Trans Knowl Data Eng. 15 | * **PROTAC-RL**: [Accelerated rational PROTAC design via deep learning and molecular simulations](https://www.nature.com/articles/s42256-022-00527-y), Zheng et al 2022, Nat. Mach. Intell. 
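A common thread in the molecule-generation papers above is a reward assembled from cheminformatics property scores computed on each generated molecule. The sketch below shows one such reward, assuming RDKit is available and using QED (drug-likeness) plus a logP penalty; the property set, weights, and target value are illustrative choices, not taken from any of the listed papers, which typically combine many more objectives (synthesizability, docking, toxicity filters, ...).

```python
from rdkit import Chem
from rdkit.Chem import QED, Crippen

def molecule_reward(smiles: str, logp_target: float = 2.5, logp_weight: float = 0.1) -> float:
    """Score a generated SMILES string: 0 for invalid molecules, otherwise
    drug-likeness (QED, in [0, 1]) minus a penalty for straying from a target logP."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:                      # invalid molecule -> no reward
        return 0.0
    qed = QED.qed(mol)                   # drug-likeness score in [0, 1]
    logp_penalty = logp_weight * abs(Crippen.MolLogP(mol) - logp_target)
    return qed - logp_penalty

print(molecule_reward("CCO"))            # ethanol: valid, but not very drug-like
print(molecule_reward("not a molecule")) # 0.0
```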
16 | -------------------------------------------------------------------------------- /RL4Game.md: -------------------------------------------------------------------------------- 1 | # RL for game theory 2 | 3 | 4 | * Nash Q: [Deep Q-Learning for Nash Equilibria: Nash-DQN](https://arxiv.org/abs/1904.10554), Casgrain P, et al 2019, CoRR 5 | * Nash Q: [Nash Q-Learning for General-Sum Stochastic Games](https://www.jmlr.org/papers/volume4/hu03a/hu03a.pdf), 2003 6 | * SNQ2: [Learning Nash Equilibria in Zero-Sum Stochastic Games via Entropy-Regularized Policy Approximation](https://arxiv.org/abs/2009.00162), 2021 7 | 8 | * DeepNash: [Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning](https://arxiv.org/abs/2206.15378), 2022, DeepMind -------------------------------------------------------------------------------- /RL4IIoT.md: -------------------------------------------------------------------------------- 1 | # RL-IIoT-related 2 | 3 | Contributors: [Wenhao Wu](https://github.com/wenhao0214) 4 | 5 | ### Network Scheduling 6 | 7 | [Cellular network traffic scheduling with deep reinforcement learning](https://dl.acm.org/doi/10.5555/3504035.3504129), Sandeep Chinchali et al 2018, AAAI. 8 | 9 | > Presents a reinforcement learning (RL) based scheduler that can dynamically 10 | > adapt to traffic variation. 11 | 12 | [Learning Scheduling Algorithms for Data Processing Clusters](https://arxiv.org/abs/1810.01963), Hongzi Mao et al 2019, SIGCOMM. 13 | 14 | > Develops new representations for job dependencies and conducts real experiments on Spark. 15 | 16 | [ReLeS: A Neural Adaptive Multipath Scheduler based on Deep Reinforcement Learning](https://dl.acm.org/doi/abs/10.1109/INFOCOM.2019.8737649), Han Zhang et al 2019, IEEE INFOCOM. 17 | 18 | > A scheduler with a training algorithm that enables parallel execution of packet scheduling, data collection, and neural network training. 19 | 20 | [Deep Reinforcement Learning for User Association and Resource Allocation in Heterogeneous Cellular Networks](https://ieeexplore.ieee.org/document/8796358), Nan Zhao et al 2019, IEEE Transactions on Wireless Communications. 21 | 22 | > Develops a distributed optimization method based on multi-agent RL. 23 | 24 | [Adaptive Video Streaming for Massive MIMO Networks via Approximate MDP and Reinforcement Learning](https://ieeexplore.ieee.org/document/9103310), Qiao Lan et al 2020, IEEE Transactions on Wireless Communications. 25 | 26 | > Considers an MDP with random user arrivals and departures. 27 | 28 | ### Workshop Scheduling 29 | 30 | [Relative value function approximation for the capacitated re-entrant line scheduling problem](https://ieeexplore.ieee.org/abstract/document/1458721), Jin Young Choi et al 2005, IEEE Transactions on Automation Science and Engineering. 31 | 32 | [A Reinforcement Learning Approach to Robust Scheduling of Semiconductor Manufacturing Facilities](https://ieeexplore.ieee.org/document/8946870), In-Beom Park et al 2020, IEEE Transactions on Automation Science and Engineering. 33 | 34 | [Deep reinforcement learning in production systems: a systematic literature review](https://arxiv.org/abs/2109.03540), Xiaocong Chen et al 2021, International Journal of Production Research. 35 | 36 | [Dynamic job-shop scheduling in smart manufacturing using deep reinforcement learning](https://www.sciencedirect.com/science/article/abs/pii/S1389128621001031), Libing Wang et al 2021, Computer Networks. 
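The scheduling papers above (both the network and the workshop settings) share one formulation: the agent observes queue or machine state, chooses which job or resource to serve next, and receives a reward that penalizes delay or makespan. Below is a toy, gym-style sketch of that MDP interface; the state, action, and reward choices are illustrative only and do not reproduce any of the listed papers' environments.

```python
import numpy as np

class ToyJobShopEnv:
    """Toy dispatching MDP: each step a job of random size arrives and the agent
    assigns it to one of `n_machines`; the reward is the negative load of the
    chosen machine, a proxy for the new job's waiting time."""

    def __init__(self, n_machines: int = 3, horizon: int = 50, seed: int = 0):
        self.n_machines = n_machines
        self.horizon = horizon
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.loads = np.zeros(self.n_machines)      # remaining work per machine
        self.t = 0
        self.job = self.rng.uniform(1.0, 5.0)       # size of the incoming job
        return np.append(self.loads, self.job)      # observation = loads + job size

    def step(self, action: int):
        reward = -self.loads[action]                # waiting time of the dispatched job
        self.loads[action] += self.job              # enqueue it on the chosen machine
        self.loads = np.maximum(self.loads - 1.0, 0.0)  # each machine does one unit of work
        self.t += 1
        self.job = self.rng.uniform(1.0, 5.0)
        done = self.t >= self.horizon
        return np.append(self.loads, self.job), reward, done, {}

# Greedy baseline that an RL dispatcher would try to beat: pick the least-loaded machine.
env = ToyJobShopEnv()
obs, total, done = env.reset(), 0.0, False
while not done:
    obs, r, done, _ = env.step(int(np.argmin(obs[:-1])))
    total += r
print(total)
```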
37 | 38 | ### Other Scheduling Problem 39 | 40 | [A deep q-network for the beer game: A reinforcement learning algorithm to solve inventory optimization problems](https://arxiv.org/abs/1708.05924), Afshin Oroojlooyjadid et al 2017. 41 | 42 | > Use RL in a simply case of the supply chain. 43 | 44 | [Efficient Large-Scale Fleet Management via Multi-Agent Deep Reinforcement Learning](https://dl.acm.org/doi/10.1145/3219819.3219993), Kaixiang Lin et al 2018, KDD. 45 | 46 | > A special case of bin packing problem solved by RL. 47 | 48 | [ORL Reinforcement Learning Benchmarks for Online Stochastic Optimization](https://arxiv.org/abs/1911.10641v2), Bharathan Balaji et al 2018, Amazon Report. 49 | 50 | >Applying RL algorithms to a range of practical applications. 51 | -------------------------------------------------------------------------------- /RL4IL.md: -------------------------------------------------------------------------------- 1 | # RL for imitation learning 2 | 3 | Contributors: [johnjim0816](https://github.com/JohnJim0816) 4 | 5 | ## Surveys 6 | 7 | [A Imitation Learning: A Survey of Learning Methods](https://core.ac.uk/download/pdf/141207521.pdf) 8 | 9 | * [Imitation Learning by Estimating Expertise of Demonstrators](https://arxiv.org/abs/2202.01288), 2022, ICML 10 | * BC: [Integrating Behavior Cloning and Reinforcement Learning for Improved Performance in Dense and Sparse Reward Environments](https://arxiv.org/abs/1910.04281) 11 | * BC+TD3: [Twin Delayed DDPG with Behavior Cloning ](https://arxiv.org/pdf/2106.06860.pdf) -------------------------------------------------------------------------------- /RL4Policy-Diversity.md: -------------------------------------------------------------------------------- 1 | * [Diversity Driven Exploration Strategy for Deep Reinforcement Learning](https://arxiv.org/abs/1802.04564), Zhang Wei Hong et al, 2018, NIPS 2 | 3 | ### Reward Shaping 4 | 5 | * Continuously Discovering Novel Strategies via Reward-Switching Policy Optimization, 2022, ICLR -------------------------------------------------------------------------------- /RL4QD.md: -------------------------------------------------------------------------------- 1 | # QD 2 | 3 | ## Survey: 4 | 5 | - QD: [Quality Diversity: A New Frontier for Evolutionary Computation](http://eplex.cs.ucf.edu/papers/pugh_frontiers16.pdf), Justin Pugh et al 2016, Frontiers in Robotics and AI 6 | - QD opt: [Quality and diversity optimization: A unifying modular framework](https://arxiv.org/pdf/1708.09251.pdf) , Antoine Cully et al 2018 , TEC 7 | - [Policy search in continuous action domains: An overview](), Oliver Sigaud et al 2019, Neural Network 8 | - [Quality-Diversity Optimization: a novel branch of stochastic optimization](https://arxiv.org/pdf/2012.04322.pdf), Konstantinos Chatzilygeroudis et al 2021 9 | 10 | ## QD Analysis: 11 | 12 | - [An Extended Study of Quality Diversity Algorithms](http://delivery.acm.org/10.1145/2910000/2909000/p19-pugh.pdf?ip=129.31.142.189&id=2909000&acc=CHORUS&key=BF07A2EE685417C5%2EF5014A9D3D5CC2D9%2E4D4702B0C3E38B35%2E6D218144511F3437&__acm__=1572805067_7b82b5509b1bbc9081c7a3dd591b28b7) , Justin Pugh et al 2016 , Gecco 13 | - [Gaining insight into quality diversity](https://infoscience.epfl.ch/record/220676/files/p1061-auerbach.pdf) , Joshua Auerbach et al 2016, Gecco 14 | - [Searching for quality diversity when diversity is unaligned with quality](https://eplex.cs.ucf.edu/papers/pugh_ppsn16.pdf) , Justin Pugh et al 2016 , PPSN 15 | - QD-suite: [Towards QD-suite: developing a set of 
benchmarks for Quality-Diversity algorithms](https://arxiv.org/pdf/2205.03207.pdf) 16 | - [ANALYSIS OF QUALITY DIVERSITY ALGORITHMS FOR THE 17 | KNAPSACK PROBLEM](https://arxiv.org/pdf/2207.14037.pdf), Adel Nikfarjam et al 2022, PPSN 18 | 19 | 20 | 21 | ## QD-RL: 22 | 23 | - QD-RL: [QD-RL: Efficient Mixing of Quality and Diversity in Reinforcement Learning](https://arxiv.org/pdf/2006.08505.pdf), Geoffrey Cideron et al 2020 24 | - DQD: [Differentiable Quality Diversity](https://arxiv.org/pdf/2106.03894.pdf), Matthew C. Fontaine et al 2021, NIPS 25 | - GUSS: [Guided Safe Shooting: model based reinforcement learning with safety constraints](https://arxiv.org/abs/2206.09743), Giuseppe Paolo et al 2022 26 | - EDO-CS: [Evolutionary Diversity Optimization with Clustering-based Selection for Reinforcement Learning](https://openreview.net/pdf?id=74x5BXs4bWD), Yutong Wang et al 2022, ICLR 27 | - [Deep Surrogate Assisted Generation of Environments](https://arxiv.org/pdf/2206.04199.pdf), Varun Bhatt et al 2022, NIPS 28 | - HTSE: [Promoting Quality and Diversity in Population-based Reinforcement 29 | Learning via Hierarchical Trajectory Space Exploration](https://ieeexplore.ieee.org/document/9811888/), Jiayu Miao et al 2022, ICRA 30 | - DA-QD: [Dynamics-Aware Quality-Diversity for Efficient Learning of Skill 31 | Repertoires](https://arxiv.org/pdf/2109.08522.pdf), Bryan Lim et al 2022, ICRA 32 | - [Approximating Gradients for Differentiable Quality Diversity in Reinforcement Learning](https://dl.acm.org/doi/10.1145/3512290.3528705), Bryon Tjanaka et al 2022, Gecco 33 | - QD-PG: [Diversity Policy Gradient for Sample Efficient Quality-Diversity Optimization](https://arxiv.org/pdf/2006.08505.pdf), Thomas Pierrot et al 2022, Gecco 34 | 35 | ## QD-Evolution: 36 | 37 | - [Discovering evolutionary stepping stones through behavior domination](https://arxiv.org/pdf/1704.05554.pdf) , Elliot Meyerson 2017, Gecco 38 | - [The surprising creativity of digital evolution: A collection of anecdotes from the evolutionary computation and artificial life research communities](https://arxiv.org/abs/1803.03453) , Joel Lehman et al 2018 39 | - [Open-ended evolution with multi-containers QD](https://dl.acm.org/doi/abs/10.1145/3205651.3205705) , Stephane Doncieux et al 2018 , Gecco 40 | - [Mapping structural diversity in networks sharing a given degree distribution and global clustering: Adaptive resolution grid search evolution with Diophantine equation-based mutations](https://arxiv.org/pdf/1809.06293.pdf) , Peter Overbury et al 2018 41 | - [Hierarchical Behavioral Repertoires with Unsupervised Descriptors](https://arxiv.org/pdf/1804.07127.pdf) , Antoine Cully et a; 2018, Gecco 42 | - mEDEA: [Evolution of a Functionally Diverse Swarm via a Novel Decentralised Quality-Diversity Algorithm](https://arxiv.org/pdf/1804.07655.pdf), Emma Hart 2018, Gecco 43 | - [An approach to evolve and exploit repertoires of general robot behaviours](https://repositorio.iscte-iul.pt/bitstream/10071/16255/5/1-s2.0-S2210650217308556-main.pdf),Jorge Gomes et al 2018, Swarm and Evolutionary Computation 44 | - POET:[POET: open-ended coevolution of environments and their optimized solutions](),Rui Wang et al 2019, Gecco 45 | - [Modeling user selection in quality diversity](https://arxiv.org/pdf/1907.06912.pdf), Alexander Hagg et al 2019, Gecco 46 | - [Exploration and Exploitation in Symbolic Regression using Quality-Diversity and Evolutionary Strategies Algorithms](https://arxiv.org/pdf/1906.03959.pdf), J.-P,Bruneton et al 2019 47 | - [Designing neural 
networks through neuroevolution](http://www.evolvingai.org/files/s42256-018-0006-z.pdf), Kenneth Stanley et al 2019, Nature Machine Intelligence 48 | - GAPN: [Behavioral Repertoire via Generative Adversarial Policy Networks](http://homepages.inf.ed.ac.uk/thospeda/papers/jegorova2019gpn.pdf), Marija Jegorova et al 2019 49 | - [Autonomous Skill Discovery with Quality-diversity and Unsupervised Descriptors](https://arxiv.org/pdf/1905.11874.pdf), Antoine Cully 2019, Gecco 50 | - [Scaling MAP-Elites to Deep Neuroevolution](https://arxiv.org/pdf/2003.01825.pdf), Cedric Colas et al 2020 51 | - [Quality Diversity for Multi-task Optimization](https://arxiv.org/pdf/2003.04407.pdf), Jean-Baptiste Mouret et al 2020, Gecco 52 | - [Policy Manifold Search for Improving Diversity-based Neuroevolution](https://arxiv.org/pdf/2012.08676), Nemanja Rakicevic et al 2020, NIPS Workshop 53 | - [Learning behaviour-performance maps with meta-evolution](https://hal.inria.fr/hal-02555231/document), David Bossens et al 2020, Gecco 54 | - [Exploring the Evolution of GANs through Quality Diversity](https://arxiv.org/abs/2007.06251), Victor Costa et al 2020, Gecco 55 | - Enhance POET: [Enhanced POET: Open-ended reinforcement learning through unbounded invention of learning challenges and their solutions](https://arxiv.org/pdf/2003.08536), Rui Wang et al 2020, ICML 56 | - [Effective Diversity in Population Based Reinforcement Learning](https://papers.nips.cc/paper/2020/file/d1dc3a8270a6f9394f88847d7f0050cf-Paper.pdf), Jack Parker-Holder et al 2020, NIPS 57 | - [Discovering Representations for Black-box Optimization](https://hal.inria.fr/hal-02555221/document), Adam Gaier et al 2020, Gecco 58 | - [Competitiveness of MAP-Elites against Proximal Policy Optimization on locomotion tasks in deterministic simulations](https://arxiv.org/pdf/2009.08438), Szymon Brych et al 2020 59 | - CPPN2GAN: [CPPN2GAN: Combining Compositional Pattern Producing Networks and GANs for Large-scale Pattern Generation](https://arxiv.org/pdf/2004.01703.pdf), Jacob Schrum et al 2020, Gecco 60 | - Bop-Elites: [Bop-elites, a bayesian optimisation algorithm for quality-diversity search](https://arxiv.org/pdf/2005.04320), Paul Kent et al 2020 61 | - [Unsupervised Behaviour Discovery with Quality-Diversity Optimisation](https://arxiv.org/pdf/2106.05648), Luca Grillotti et al 2021 62 | - [Sparse Reward Exploration via Novelty Search and Emitters](https://arxiv.org/pdf/2102.03140), Giuseppe Paolo et al 2021, Gecco 63 | - PMS: [Policy Manifold Search: Exploring the Manifold Hypothesis for Diversity-based Neuroevolution](https://arxiv.org/pdf/2104.13424), Nemanja Rakicevic et al 2021, Gecco 64 | - [On the use of feature-maps and parameter control for improved quality-diversity meta-evolution](https://arxiv.org/pdf/2105.10317), David M. 
Bossens et al 2021, Gecco 65 | - [Illuminating the Space of Beatable Lode Runner Levels Produced By Various Generative Adversarial Networks](https://arxiv.org/pdf/2101.07868), Kirby Steckel et al 2021 66 | - [Expressivity of Parameterized and Data-driven Representations in Quality Diversity Search](https://arxiv.org/pdf/2105.04247.pdf), Alexander Hagg et al 2021, Gecco 67 | - [Ensemble Feature Extraction for Multi-Container Quality-Diversity Algorithms](https://arxiv.org/pdf/2105.00682), Leo Cazenille et al 2021, Gecco 68 | - AutoAlpha: [AutoAlpha: an Efficient Hierarchical Evolutionary Algorithm for Mining Alpha Factors in Quantitative Investment](https://arxiv.org/pdf/2002.08245), Tianping Zhang et al 2021 69 | 70 | 71 | 72 | ## Novelty Search: 73 | 74 | - [Novelty-based multiobjectivization](https://hal.archives-ouvertes.fr/hal-01300711/file/2011COS1944.pdf) , Jean-Baptiste Mouret 2011 75 | - [Evolving a diversity of virtual creatures through novelty search and local competition](https://pdfs.semanticscholar.org/6d45/9da1ff73ec7225e92842341605e2b90d0da2.pdf) , Joel Lehman et al 2011, Gecco 76 | - [Abandoning objectives: Evolution through the search for novelty alone](http://eplex.cs.ucf.edu/papers/lehman_ecj11.pdf) , Joel Lehman et al 2011, Evolutionary Computation 77 | - [Constrained novelty search: A study on game content generation](http://antoniosliapis.com/papers/constrained_novelty_search.pdf) , Antonios Liapis et al 2015, Evolutionary Computation 78 | - [Understanding innovation engines: Automated creativity and improved stochastic optimization via deep learning](http://www.evolvingai.org/files/2016_NguyenEtAl_UnderstandingInnovationEngines.pdf) , Anh Nguyen et al 2016, Evolutionary Computation 79 | - [Bayesian optimization with automatic prior selection for data-efficient direct policy search](https://arxiv.org/pdf/1709.06919.pdf), Remi Pautrat et al 2018, ICRA 80 | - [Novelty search: a theoretical perspective](https://hal.archives-ouvertes.fr/hal-02561846/document),Stephane Doncieux et al 2019, Gecco 81 | - BR-NS:[BR-NS: an Archive-less Approach to Novelty Search](https://arxiv.org/pdf/2104.03936.pdf), Achkan Salehi et al 2021, Gecco 82 | - [Geodesics, Non-linearities and the Archive of Novelty Search](https://arxiv.org/pdf/2205.03162.pdf), Achkan Salehi et al 2022, Gecco 83 | 84 | 85 | 86 | ## MAP-Elites Family: 87 | 88 | - [Illuminating search spaces by mapping elites](https://arxiv.org/pdf/1504.04909.pdf) , Jean-Baptiste Mouret et al 2015 89 | - MAP-Elites: [Robots that can adapt like animals](https://arxiv.org/pdf/1407.3501.pdf) , Antoine Cully et al 2015 , Nature 90 | - [How Do Different Encodings Influence the Performance of the MAP-Elites Algorithm?](https://hal.inria.fr/hal-01302658/file/gecco_map_elites.pdf) , Danesh Tarapore et al 2016, Gecco 91 | - SAIL: [Feature space modeling through surrogate illumination](https://pdfs.semanticscholar.org/d305/ba9a5ee3089de3f2e03d6fa53c90aba89d9c.pdf) , Adam Gaier et al 2017, Gecco 92 | - SAIL2: [Data-efficient exploration, optimization, and modeling of diverse designs through surrogate-assisted illumination](https://hal.inria.fr/hal-01518698/file/sail2017.pdf) , Adam Gaier et al 2017, Gecco 93 | - [Comparing multimodal optimization and illumination](https://hal.inria.fr/hal-01518802/document) , Vassilis Vassiliades et al 2017 , Gecco 94 | - [A comparison of illumination algorithms in unbounded spaces](https://hal.inria.fr/hal-01518814/document) , Vassilis Vassiliades et al 2017 , Gecco 95 | - CVT-MAP-Elites: [Using Centroidal 
Voronoi Tessellations to Scale Up the Multidimensional Archive of Phenotypic Elites Algorithm](https://hal.inria.fr/hal-01630627/file/ieee_tec_voronoi_map_elites.pdf) , Vassilis Vassiliades et al 2018 , TEC 96 | - Talakat: [Talakat: Bullet Hell Generation through Constrained Map-Elites](https://arxiv.org/pdf/1806.04718.pdf) , Ahmed Khalifa et al 2018 , Gecco 97 | - RTE: [Reset-free trial-and-error learning for robot damage recovery](https://arxiv.org/pdf/1610.04213) , Konstantinos Chatzilygeroudis et al 2018 , RAS 98 | - [Optimisation and Illumination of a Real-World Workforce Scheduling and Routing Application (WSRP) via Map-Elites](https://arxiv.org/pdf/1805.11555.pdf) , Neil Urquhart et al 2018, PPSN 99 | - [Multi-objective Analysis of MAP-Elites Performance](https://arxiv.org/pdf/1803.05174.pdf) , Eivind Samuelsen et al 2018 100 | - SAIL3: [Data-Efficient Design Exploration through Surrogate-Assisted Illumination](https://arxiv.org/pdf/1806.05865.pdf), Adam Gaier et al 2018, Evolutionary Computation 101 | - MESB: [Mapping Hearthstone Deck Spaces with Map-Elites with Sliding Boundaries](https://arxiv.org/pdf/1904.10656.pdf), Matthew Fontaine et al 2019, Gecco 102 | - [MAP-Elites for noisy domains by adaptive sampling](http://sebastianrisi.com/wp-content/uploads/justesen_gecco19.pdf), Niels Justesen et al 2019, Gecco 103 | - [Evaluating MAP-Elites on Constrained Optimization Problems](https://arxiv.org/pdf/1902.00703.pdf), Stefano Fioravanzo et al 2019 104 | - [Empowering Quality Diversity in Dungeon Design with Interactive Constrained MAP-Elites](https://arxiv.org/pdf/1906.05175.pdf), Alberto Alvarez et al 2019 105 | - [An illumination algorithm approach to solving the micro-depot routing problem](https://dl.acm.org/doi/10.1145/3321707.3321767), Neil Urquhart et al 2019, Gecco 106 | - [Using MAP-Elites to support policy making around Workforce Scheduling and Routing](https://www.napier.ac.uk/~/media/worktribe/output-2296970/using-map-elites-to-support-policy-making-around-workforce-scheduling-and-routing.pdf), Neil Urquhart et al 2020 107 | - [Exploring the BipedalWalker benchmark with MAP-Elites and Curiosity-driven A3C](https://dl.acm.org/doi/pdf/10.1145/3377929.3389921), Vikas Gupta et al 2020, Gecco 108 | - [Covariance Matrix Adaptation for the Rapid Illumination of Behavior Space](https://arxiv.org/pdf/1912.02400.pdf), Matthew C. 
Fontaine et al 2020, Gecco 109 | - PGA-MAP-Elites: [Policy Gradient Assisted MAP-Elites](https://hal.archives-ouvertes.fr/hal-03135723/document), Olle Nilsson et al 2021, Gecco 110 | - [Multi-Emitter MAP-Elites: Improving quality, diversity and convergence speed with heterogeneous sets of emitters](https://arxiv.org/pdf/2007.05352), Antoine Cully 2021, Gecco 111 | - [Minimize Surprise MAP-Elites: A Task-Independent MAP-Elites Variant for Swarms](https://dl.acm.org/doi/10.1145/3520304.3528773), Tanja Katharina Kaiser et al 2022, Gecco 112 | - [Illuminating Diverse Neural Cellular Automata for Level Generation](https://dl.acm.org/doi/10.1145/3512290.3528754), Sam Earle et al 2022, Gecco 113 | - [Deep Surrogate Assisted MAP-Elites for Automated Hearthstone Deckbuilding](https://dl.acm.org/doi/10.1145/3512290.3528718), Yulun Zhang et al 2022, Gecco 114 | - [Accelerated Quality-Diversity through Massive 115 | Parallelism](https://arxiv.org/pdf/2202.01258.pdf), Bryan Lim et al 2022 116 | 117 | 118 | 119 | ## Refs 120 | 121 | [Quality-Diversity optimisation algorithms](https://quality-diversity.github.io/papers.html) 122 | 123 | -------------------------------------------------------------------------------- /RL4Robot.md: -------------------------------------------------------------------------------- 1 | # RL-Robot-related 2 | 3 | Contributors: [Yongqi Li](https://github.com/L3Y1Q2) 4 | 5 | ### legged robot 6 | 7 | #### quadrupedal robot 8 | 9 | * [Creating a Dynamic Quadrupedal Robotic Goalkeeper with Reinforcement Learning](https://arxiv.org/abs/2210.04435), Huang, Xiaoyu et al 2022, arXiv 10 | 11 | > A hierarchical reinforcement learning framework that combines highly dynamic quadrupedal locomotion with an object perception method to intercept the ball. 
12 | 13 | * [Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning](https://proceedings.mlr.press/v164/rudin22a.html), Rudin N et al 2022, CoRL 14 | 15 | * [Learning robust perceptive locomotion for quadrupedal robots in the wild](https://www.science.org/doi/abs/10.1126/scirobotics.abk2822), Takahiro Miki et al 2022, Science Robotics 16 | 17 | * [Legged Robots that Keep on Learning: Fine-Tuning Locomotion Policies in the Real World](https://www.semanticscholar.org/paper/35efc3a4c5f64d96ded6daea692f3935c96f0415), Laura Smith et al 2022, ICRA 18 | 19 | * [Cat-Like Jumping and Landing of Legged Robots in Low Gravity Using Deep Reinforcement Learning](https://www.semanticscholar.org/paper/7e8186146b95337d24d28dda05cab886621cdf8c), Rudin N et al 2021, TRO 20 | 21 | * [Learning quadrupedal locomotion over challenging terrain](https://www.semanticscholar.org/paper/eadbe2e4f9de47dd357589cf59e3d1f0199e5075), Joonho Lee et al 2020, Science Robotics 22 | 23 | * [Learning Agile Robotic Locomotion Skills by Imitating Animals](https://arxiv.org/abs/2004.00784), Xue Bin Peng et al 2020, RSS 24 | 25 | * [Learning agile and dynamic motor skills for legged robots](https://www.science.org/doi/full/10.1126/scirobotics.aau5872), Hwangbo et al 2019, Science Robotics 26 | 27 | * [Sim-to-Real: Learning Agile Locomotion For Quadruped Robots](https://www.semanticscholar.org/paper/4d3b69bdcd1d325d29badc6a38f2d6cc504fe7d1), Jie Tan et al 2018, RSS 28 | 29 | * [Learning to Walk via Deep Reinforcement Learning](https://www.semanticscholar.org/paper/2ed619fbc7902155d54f6f21da16ad6c120eac63), Tuomas Haarnoja et al 2018, RSS 30 | 31 | * [Robust Rough-Terrain Locomotion with a Quadrupedal Robot](https://ieeexplore.ieee.org/abstract/document/8460731), Peter Fankhauser et al 2018, ICRA 32 | 33 | #### bipedal robot 34 | 35 | * [Towards Real Robot Learning in the Wild: A Case Study in Bipedal Locomotion](https://proceedings.mlr.press/v164/bloesch22a.html), Bloesch M et al 2022, CoRL 36 | * [Sim-to-Real Learning for Bipedal Locomotion Under Unsensed Dynamic Loads](https://arxiv.org/abs/2204.04340), Jeremy Dao et al 2022, ICRA 37 | * [Sim-to-Real Learning of Footstep-Constrained Bipedal Dynamic Walking](https://arxiv.org/abs/2203.07589), Helei Duan et al 2022, ICRA 38 | * [Blind bipedal stair traversal via sim-to-real reinforcement learning](https://arxiv.org/abs/2105.08328), Jonah Siekmann et al 2021, RSS 39 | * [Reinforcement learning for robust parameterized locomotion control of bipedal robots](https://ieeexplore.ieee.org/abstract/document/9560769), Zhongyu Li et al 2021, ICRA 40 | * [DeepWalk: Omnidirectional bipedal gait by deep reinforcement learning](https://ieeexplore.ieee.org/abstract/document/9561717), Diego Rodriguez et al 2021, ICRA 41 | * [Learning Memory-Based Control for Human-Scale Bipedal Locomotion](https://arxiv.org/abs/2006.02402), Jonah Siekmann et al 2020, RSS 42 | * [Learning Locomotion Skills for Cassie: Iterative Design and Sim-to-Real](https://www.semanticscholar.org/paper/719068eb8b8c9ab8552ec3e82c1b1088a9eacdce), Zhaoming Xie et al 2019, CoRL 43 | * [Feedback Control For Cassie With Deep Reinforcement Learning](https://www.semanticscholar.org/paper/e3bcefbcba308934dd1d843102e2b82c7239d56d), Zhaoming Xie et al 2018, IROS 44 | 45 | * [DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills](https://www.semanticscholar.org/paper/1b9ce6abc0f3024b88fcd4dbd0c10cf5bcf7d38d), Xue Bin Peng et al 2018, TOG 46 | 47 | ### UAV 48 | 49 | * [Learning 
Minimum-Time Flight in Cluttered Environments](https://rpg.ifi.uzh.ch/docs/RAL_IROS22_Penicka.pdf), Robert Penicka et al 2022, RAL 50 | 51 | > Builds on [previous work](https://rpg.ifi.uzh.ch/docs/IROS21_Yunlong.pdf) by additionally taking obstacles into account. 52 | 53 | * [Learning High-Speed Flight in the Wild](https://rpg.ifi.uzh.ch/AgileAutonomy.html), A. Loquercio et al 2021, Science Robotics 54 | 55 | > This paper proposes an end-to-end approach that can autonomously fly quadrotors through complex natural and man-made environments at high speeds, with purely onboard sensing and computation. 56 | 57 | * [Autonomous Drone Racing with Deep Reinforcement Learning](https://rpg.ifi.uzh.ch/docs/IROS21_Yunlong.pdf), Yunlong Song et al 2021, IROS 58 | 59 | > This paper presents a learning-based method for autonomous drone racing. 60 | 61 | * [A Two-Stage Reinforcement Learning Approach for Multi-UAV Collision Avoidance Under Imperfect Sensing](https://ieeexplore.ieee.org/abstract/document/9001167), Wang D et al 2020, RAL 62 | 63 | > A two-stage DDPG-based training method for collision avoidance that generates time-efficient, collision-free paths under imperfect sensing. 64 | 65 | * [Low-level autonomous control and tracking of quadrotor using reinforcement learning](https://www.sciencedirect.com/science/article/pii/S0967066119301923), Chen Huan Pi et al 2020, CEP 66 | 67 | > A model-free DRL low-level control algorithm for quadrotors, applied to hovering and trajectory tracking. 68 | 69 | * [Low-level control of a quadrotor with deep model-based reinforcement learning](https://ieeexplore.ieee.org/abstract/document/8769882/), Lambert N O et al 2019, RAL 70 | 71 | > A model-based DRL low-level control algorithm for quadrotors. 72 | 73 | * [Deterministic Policy Gradient With Integral Compensator for Robust Quadrotor Control](https://ieeexplore.ieee.org/abstract/document/8600717/), Wang Y et al 2019, TSMC 74 | 75 | > DPG-IC: a DRL-based robust control strategy for quadrotors. 76 | 77 | * [Reinforcement learning for UAV attitude control](https://dl.acm.org/doi/abs/10.1145/3301273), William Koch et al 2019, TCPS 78 | 79 | > This paper replaces the inner-loop PID attitude controller with reinforcement learning. 80 | 81 | * [Autonomous UAV Navigation Using Reinforcement Learning](https://arxiv.org/abs/1801.05086), Huy X. Pham et al 2018, IJMLC 82 | 83 | > This paper presents a technique that trains a quadrotor to navigate to a target point in an unknown environment using a combined PID + Q-learning approach (see the minimal sketch after this list). 84 | 85 | * [Control of a quadrotor with reinforcement learning](https://ieeexplore.ieee.org/abstract/document/7961277/), Hwangbo J et al 2017, RAL 86 | 87 | > The paper proposes reinforcement-learning-based stability control for autonomous UAVs.
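
A minimal, self-contained sketch of the PID + Q-learning navigation idea mentioned above: tabular Q-learning picks the next waypoint on a small hypothetical grid, while in the cited paper a separate PID loop (not shown here) flies the quadrotor to each chosen waypoint. The grid size, reward values, and hyperparameters below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

# Hypothetical 5x5 grid standing in for a discretized flight area.
GRID = 5
GOAL = (4, 4)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # waypoint moves: up, down, left, right

Q = np.zeros((GRID, GRID, len(ACTIONS)))
alpha, gamma, eps = 0.1, 0.95, 0.1            # illustrative hyperparameters

def step(state, a):
    """Pick the next waypoint; on a real vehicle a PID controller would track it."""
    r, c = state
    dr, dc = ACTIONS[a]
    nxt = (min(max(r + dr, 0), GRID - 1), min(max(c + dc, 0), GRID - 1))
    reward = 1.0 if nxt == GOAL else -0.1     # small step penalty encourages short paths
    return nxt, reward, nxt == GOAL

for episode in range(500):
    s, done = (0, 0), False
    while not done:
        # epsilon-greedy action selection over waypoint moves
        a = np.random.randint(len(ACTIONS)) if np.random.rand() < eps else int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        # tabular Q-learning update with terminal bootstrapping handled explicitly
        Q[s][a] += alpha * (r + gamma * (0.0 if done else np.max(Q[s2])) - Q[s][a])
        s = s2

print(np.argmax(Q, axis=-1))  # greedy waypoint choice per cell after training
```

The learned greedy policy yields a sequence of waypoints toward the goal, while low-level tracking of each waypoint is left to a conventional controller, which is the division of labor described in the paper.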
88 | 89 | ### UGV&USV 90 | 91 | * [Goal-Driven Autonomous Exploration Through Deep Reinforcement Learning](https://ieeexplore.ieee.org/document/9645287/), Cimurs R et al 2021, RAL 92 | * [Path Planning Algorithms for USVs via Deep Reinforcement Learning](https://www.semanticscholar.org/paper/Path-Planning-Algorithms-for-USVs-via-Deep-Learning-Zhai-Wang/b7d3afecf5ea672621b1f96d28ea7542c02afc1a), Haoran Zhai et al 2021, CAC 93 | * [Mobile robot path planning in dynamic environments through globally guided reinforcement learning](https://ieeexplore.ieee.org/abstract/document/9205217/), B Wang et al 2020, RAL 94 | * [Deep reinforcement learning for indoor mobile robot path planning](https://www.mdpi.com/838810), Gao J et al 2020, Sensors 95 | * [PRM-RL: Long-range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-Based Planning](https://www.semanticscholar.org/paper/PRM-RL%3A-Long-range-Robotic-Navigation-Tasks-by-and-Faust-Ramirez/551c60bd9178a199c20723122cd26ddd9c0c93b6), Aleksandra Faust et al 2018, ICRA 96 | * [Target-driven visual navigation in indoor scenes using deep reinforcement learning](https://www.semanticscholar.org/paper/Target-driven-visual-navigation-in-indoor-scenes-Zhu-Mottaghi/7af7f2f539cd3479faae4c66bbef49b0f66202fa), Yuke Zhu et al 2017, ICRA 97 | * [Virtual-to-real deep reinforcement learning: Continuous control of mobile robots for mapless navigation](https://www.semanticscholar.org/paper/Virtual-to-real-deep-reinforcement-learning%3A-of-for-Tai-Paolo/799c0e461332570ecde97e13266fecde8476efe3), L Tai et al 2017, IROS 98 | * [Towards Monocular Vision based Obstacle Avoidance through Deep Reinforcement Learning](https://www.semanticscholar.org/paper/Towards-Monocular-Vision-based-Obstacle-Avoidance-Xie-Wang/eab2c0bb3eda3b2c37b379e574a645d52ec264ef), Linhai Xie et al 2017, RSS 99 | * [Socially aware motion planning with deep reinforcement learning](https://www.semanticscholar.org/paper/Socially-aware-motion-planning-with-deep-learning-Chen-Everett/fe2ef22089712fcff33a77761860a10b7834da47), Yu Fan Chen et al 2017, IROS 100 | * [From perception to decision: A data-driven approach to end-to-end motion planning for autonomous ground robots](https://www.semanticscholar.org/paper/From-perception-to-decision%3A-A-data-driven-approach-Pfeiffer-Schaeuble/aa0b2517c1555fc5b3885723959f7ac950ba1626), Mark Pfeiffer et al 2017, ICRA 101 | 102 | ### manipulator 103 | 104 | * [Learning dexterous in-hand manipulation](https://journals.sagepub.com/doi/abs/10.1177/0278364919887447), Andrychowicz et al 2020, IJRR 105 | * [Dynamics Learning with Cascaded Variational Inference for Multi-Step Manipulation](https://www.semanticscholar.org/paper/Dynamics-Learning-with-Cascaded-Variational-for-Fang-Zhu/1674008abd47f1ce1e894c672074a47ee6c3288c), Kuan Fang et al 2019, CoRL 106 | * [Towards Practical Multi-Object Manipulation using Relational Reinforcement Learning](https://link.zhihu.com/?target=https%3A//richardrl.github.io/relational-rl/), R. 
Li et al 2019, ICRA 107 | * [Solving rubik's cube with a robot hand](https://arxiv.org/abs/1910.07113), I Akkaya et al 2019, arXiv 108 | * [Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience](https://www.semanticscholar.org/paper/b2174399c04a7d894bcd2dc7848a35aed4c67f80), Chebotar et al 2019, ICRA 109 | * [Sim-to-Real Transfer of Robotic Control with Dynamics Randomization](https://www.semanticscholar.org/paper/0af8cdb71ce9e5bf37ad2a11f05af293cfe62172), Xue Bin Peng et al 2018, ICRA 110 | * [Reinforcement and Imitation Learning for Diverse Visuomotor Skills](https://www.semanticscholar.org/paper/d356a5603f14c7a6873272774782d7812871f952), Yuke Zhu et al 2018, RSS 111 | * [Asymmetric Actor Critic for Image-Based Robot Learning](https://www.semanticscholar.org/paper/Asymmetric-Actor-Critic-for-Image-Based-Robot-Pinto-Andrychowicz/cee949487d13d0b64c4ef21b66ece96eb08472b3), Lerrel Pinto et al 2018, RSS 112 | * [Learning Synergies Between Pushing and Grasping with Self-Supervised Deep Reinforcement Learning](https://ieeexplore.ieee.org/document/8593986/), Andy Zeng et al 2018, IROS 113 | * [Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations](https://www.semanticscholar.org/paper/Learning-Complex-Dexterous-Manipulation-with-Deep-Rajeswaran-Kumar/e010ba3ff5744604cdbfe44a733e2a98649ee907), A. Rajeswaran et al 2018, RSS 114 | * [Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates](https://www.semanticscholar.org/paper/Deep-reinforcement-learning-for-robotic-with-Gu-Holly/e37b999f0c96d7136db07b0185b837d5decd599a), S. Gu et al 2017, ICRA 115 | 116 | ### MRS 117 | 118 | * [MAMBPO: Sample-efficient multi-robot reinforcement learning using learned world models](https://www.semanticscholar.org/paper/MAMBPO%3A-Sample-efficient-multi-robot-reinforcement-Willemsen-Coppola/b7cb2bb1c116efd825d391c6e17028f51770cac7), Daniel Willemsen et al 2021, IROS 119 | * [Adaptive and extendable control of unmanned surface vehicle formations using distributed deep reinforcement learning](https://www.semanticscholar.org/paper/Adaptive-and-extendable-control-of-unmanned-surface-Wang-Ma/f1880a9a9d3080c516be80b0bc7a6d4c9fcdd137), Shuwu Wang et al 2021 120 | * [Voronoi-based multi-robot autonomous exploration in unknown environments via deep reinforcement learning](https://ieeexplore.ieee.org/abstract/document/9244647/), Hu J et al 2020, TVT 121 | * [Distributed multi-robot collision avoidance via deep reinforcement learning for navigation in complex scenarios](https://www.semanticscholar.org/paper/Distributed-multi-robot-collision-avoidance-via-for-Fan-Long/3c7a22a6e60a8adbfff34bc55cb07f6429b9e522), Tingxiang Fan et al 2020, IJRR 122 | * [Glas: Global-to-local safe autonomy synthesis for multi-robot motion planning with end-to-end learning](https://ieeexplore.ieee.org/abstract/document/9091314/), B Riviere et al 2020, RAL 123 | * [Distributed Non-Communicating Multi-Robot Collision Avoidance via Map-Based Deep Reinforcement Learning](https://www.semanticscholar.org/paper/Distributed-Non-Communicating-Multi-Robot-Collision-Chen-Yao/b013a5f87a6966dc7fdb0f62bab2c88c52e3f9f5), Guangda Chen et al 2020, Sensors 124 | * [A Two-Stage Reinforcement Learning Approach for Multi-UAV Collision Avoidance Under Imperfect Sensing](https://www.semanticscholar.org/paper/A-Two-Stage-Reinforcement-Learning-Approach-for-Wang-Fan/b6a2741002714cbe3e069ab21e74cdea8ba35806), Dawei Wang et al 2020, RAL 125 | * [Towards optimally decentralized 
multi-robot collision avoidance via deep reinforcement learning](https://ieeexplore.ieee.org/abstract/document/8461113/), P Long et al 2018, ICRA 126 | * [Motion Planning Among Dynamic, Decision-Making Agents with Deep Reinforcement Learning](https://www.semanticscholar.org/paper/Motion-Planning-Among-Dynamic%2C-Decision-Making-with-Everett-Chen/f3161b75de1e37b0591f250068b676ea72d1ba22), Michael Everett et al 2018, IROS 127 | * [Decentralized non-communicating multiagent collision avoidance with deep reinforcement learning](https://ieeexplore.ieee.org/abstract/document/7989037/), YF Chen et al 2017, ICRA 128 | 129 | ### simulator-related 130 | 131 | * [Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning](https://www.semanticscholar.org/paper/49142e3e381c0dc7fee0049ea41d2ef02c0340d7), Viktor Makoviychuk et al 2021, NeurIPS 132 | 133 | > Currently one of the most widely used simulators for large-scale robot reinforcement learning: thousands of environment copies are stepped in parallel on a single GPU (a toy batched-simulation sketch illustrating the idea appears at the end of this file). 134 | 135 | * [Flightmare: A Flexible Quadrotor Simulator](https://rpg.ifi.uzh.ch/docs/CoRL20_Yunlong.pdf), Yunlong Song et al 2020, CoRL 136 | 137 | > A quadrotor simulator developed by the [Robotics and Perception Group](https://rpg.ifi.uzh.ch/) for its own UAV reinforcement learning research. 138 | 139 | * [Leveraging Deep Reinforcement Learning For Active Shooting Under Open-World Setting](https://link.zhihu.com/?target=https%3A//ieeexplore.ieee.org/abstract/document/9102966), A. Tzimas et al 2020, ICME 140 | 141 | * [FlightGoggles: A Modular Framework for Photorealistic Camera, Exteroceptive Sensor, and Dynamics Simulation](https://ieeexplore.ieee.org/abstract/document/8968116/), Winter Guerra et al 2019, IROS 142 | 143 | * [AirSim Drone Racing Lab](http://proceedings.mlr.press/v123/madaan20a.html), Ratnesh Madaan et al 2019, NeurIPS 144 | 145 | > A simulation framework for autonomous drone racing. 146 | 147 | ### research groups & institutes 148 | 149 | * [Robotics and Perception Group, University of Zurich](https://rpg.ifi.uzh.ch/) 150 | 151 | * [Dynamic Robotics Laboratory, Oregon State University](https://mime.oregonstate.edu/research/drl/) 152 | 153 | * [Robotic Systems Lab, ETH Zurich](https://rsl.ethz.ch/) 154 | 155 | * [UC Berkeley's Robot Learning Lab](https://rll.berkeley.edu/) 156 | 157 | * [Robotic AI & Learning Lab](http://rail.eecs.berkeley.edu/) 158 | 159 | * [Learning Agents Research Group](https://www.cs.utexas.edu/~pstone/index.shtml) 160 | 161 | 162 | 163 | --- 164 | 165 | ## References 166 | 167 | 1. https://zhuanlan.zhihu.com/p/508916024 168 | 2. https://www.zhihu.com/question/516672871/answer/2409132149 169 |
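
A toy illustration of the batched-simulation idea behind GPU simulators such as Isaac Gym: thousands of tiny independent environments are stepped with one vectorized call, so a single step yields thousands of transitions. This is not the Isaac Gym API; the environment, numbers, and names below are made up purely for illustration.

```python
import numpy as np

NUM_ENVS = 4096                      # illustrative; GPU simulators run thousands of copies
dt = 0.02                            # simulation timestep

# Toy 1-D point-mass "environments": state is (position, velocity), goal is to reach x = 1.
pos = np.zeros(NUM_ENVS)
vel = np.zeros(NUM_ENVS)

def step(actions):
    """One batched physics step applied to every environment at once."""
    global pos, vel
    vel = vel + actions * dt
    pos = pos + vel * dt
    obs = np.stack([pos, vel], axis=1)
    reward = -np.abs(1.0 - pos)      # dense reward: negative distance to the goal
    return obs, reward

# A single rollout step with a random batched policy produces NUM_ENVS transitions.
actions = np.random.uniform(-1.0, 1.0, size=NUM_ENVS)
obs, rew = step(actions)
print(obs.shape, rew.shape)          # (4096, 2) (4096,)
```

Because samples are collected in large batches like this, on-policy methods such as PPO can learn locomotion policies in minutes of wall-clock time, which is the main point of the massively parallel simulation papers listed above.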
-------------------------------------------------------------------------------- /Tools.md: -------------------------------------------------------------------------------- 1 | # RL Tools 2 | 3 | Contributors: [johnjim0816](https://github.com/JohnJim0816) 4 | 5 | ## Frameworks 6 | 7 | [OpenSpiel](https://github.com/deepmind/open_spiel): A Framework for Reinforcement Learning in Games, including **DeepNash** 8 | 9 | ### RL-basics 10 | 11 | * [OpenAI Gym](https://github.com/openai/gym) (a minimal usage sketch is included at the end of this document) 12 | 13 | ### Offline RL 14 | 15 | * [D4RL](https://sites.google.com/view/d4rl/home) 16 | 17 | ### Robotic Platforms 18 | - [Habitat](https://aihabitat.org/): A simulator of real-world indoor environments for embodied AI research 19 | - [AI2-THOR](https://github.com/allenai/ai2thor) 20 | - [Meta-World](https://github.com/rlworkgroup/metaworld) 21 | - [CoppeliaSim](https://www.coppeliarobotics.com/) + [PyRep](https://github.com/stepjam/PyRep) 22 | - [MuJoCo](https://github.com/deepmind/mujoco) (physics engine used by the OpenAI Gym MuJoCo environments) 23 | - [OpenAI robogym](https://github.com/openai/robogym) 24 | ### Other RL platforms 25 | - [Gym Retro](https://github.com/openai/retro) 26 | - [MineRL](https://minerl.readthedocs.io/en/v1.0.0/tutorials/index.html) 27 | --------------------------------------------------------------------------------
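
A minimal usage sketch of the Gym interface listed under RL-basics above, assuming the Gym >= 0.26 reset/step API (older versions return slightly different tuples); the environment id and the random policy are just placeholders.

```python
import gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)        # Gym >= 0.26: reset returns (observation, info)
episode_return, done = 0.0, False

while not done:
    action = env.action_space.sample()                    # placeholder random policy
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward
    done = terminated or truncated                        # task end or time limit

env.close()
print(f"episode return: {episode_return}")
```

Most of the robotic platforms listed above (Meta-World, robogym, the Gym MuJoCo tasks) expose the same reset/step interface, so an agent written against it transfers across them with little change.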