# Uncertainty-in-RL

This repository collects research on uncertainty in reinforcement learning (RL), organized by the factor that is uncertain: the reward, the transition dynamics, the state, the observation, and the constraint.

### 1. Uncertainty in Reward

#### 1.1. Inverse Reinforcement Learning
- Apprenticeship Learning via Inverse Reinforcement Learning, [Paper](https://www.cs.utexas.edu/~sniekum/classes/RLFD-F15/papers/Abbeel04.pdf) (2004)
- Maximum Entropy Inverse Reinforcement Learning, [Paper](https://www.aaai.org/Papers/AAAI/2008/AAAI08-227.pdf) (2008)
- Learning Robust Rewards with Adversarial Inverse Reinforcement Learning, [Paper](https://arxiv.org/pdf/1710.11248.pdf) (2018)
- Inverse Reward Design, [Paper](https://proceedings.neurips.cc/paper/2017/file/32fdab6559cdfa4f167f8c31b9199643-Paper.pdf) (2017)

#### 1.2. Generative Adversarial Imitation Learning
- Generative Adversarial Imitation Learning, [Paper](https://proceedings.neurips.cc/paper/2016/file/cc7e2b878868cbae992d1fb743995d8f-Paper.pdf) (2016)
- A Connection Between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models, [Paper](https://arxiv.org/pdf/1611.03852.pdf) (2016)
- Triple-GAIL: A Multi-Modal Imitation Learning Framework with Generative Adversarial Nets, [Paper](https://arxiv.org/pdf/2005.10622.pdf) (2020)

#### 1.3. Preference-based RL
- Deep Reinforcement Learning from Human Preferences, [Paper](https://proceedings.neurips.cc/paper/2017/file/d5e2c0adad503c91f91df240d0cd4e49-Paper.pdf) (2017)
- Reward Learning from Human Preferences and Demonstrations in Atari, [Paper](https://proceedings.neurips.cc/paper/2018/file/8cbe9ce23f42628c98f80fa0fac8b19a-Paper.pdf) (2018)
- End-to-End Robotic Reinforcement Learning without Reward Engineering, [Paper](https://arxiv.org/pdf/1904.07854.pdf) (2019)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training, [Paper](https://arxiv.org/pdf/2106.05091.pdf) (2021)
- Reward Uncertainty for Exploration in Preference-based RL, [Paper](https://arxiv.org/pdf/2205.12401.pdf) (2022)

#### 1.4. Meta-Learning
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning, [Paper](http://proceedings.mlr.press/v139/li21g/li21g.pdf) (2021)

### 2. Uncertainty in Transition

#### 2.1. Gaussian Processes and Bayesian Neural Networks
- PILCO: A Model-Based and Data-Efficient Approach to Policy Search, [Paper](https://mlg.eng.cam.ac.uk/pub/pdf/DeiRas11.pdf) (2011)
- Improving PILCO with Bayesian Neural Network Dynamics Models, [Paper](http://mlg.eng.cam.ac.uk/yarin/website/PDFs/DeepPILCO.pdf) (2016)
- Weight Uncertainty in Neural Networks, [Paper](http://proceedings.mlr.press/v37/blundell15.pdf) (2015)
- Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, [Paper](http://proceedings.mlr.press/v48/gal16.pdf) (2016)
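As a companion to the Bayesian dropout line of work above (Gal & Ghahramani, 2016; Deep PILCO), here is a minimal, illustrative sketch of Monte Carlo dropout for estimating epistemic uncertainty in a learned dynamics model. The network architecture, dimensions, and sample count are placeholder choices, not the setup of any particular paper.

```python
# Minimal sketch: Monte Carlo dropout for epistemic uncertainty in a learned
# dynamics model. Architecture, shapes, and sample count are illustrative.
import torch
import torch.nn as nn


class DropoutDynamicsModel(nn.Module):
    """Predicts the next state from (state, action); dropout is kept on at test time."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128, p: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))


@torch.no_grad()
def mc_dropout_prediction(model, state, action, num_samples: int = 20):
    """Sample the dropout mask repeatedly; the mean is the prediction, the std the uncertainty."""
    model.train()  # keep dropout active while sampling
    samples = torch.stack([model(state, action) for _ in range(num_samples)])
    return samples.mean(dim=0), samples.std(dim=0)


if __name__ == "__main__":
    model = DropoutDynamicsModel(state_dim=4, action_dim=2)
    s, a = torch.randn(1, 4), torch.randn(1, 2)
    mean, std = mc_dropout_prediction(model, s, a)
    print("predicted next state:", mean, "epistemic std:", std)
```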
#### 2.2. Model-Ensemble
- Deep Exploration via Bootstrapped DQN, [Paper](https://arxiv.org/pdf/1602.04621.pdf) (2016)
- Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles, [Paper](https://proceedings.neurips.cc/paper/2017/file/9ef2ed4b7fd2c810847ffa5fa85bce38-Paper.pdf) (2017)
- Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models, [Paper](https://proceedings.neurips.cc/paper/2018/file/3de568f8597b94bda53149c7d7f5958c-Paper.pdf) (2018) [Code](https://github.com/kchua/handful-of-trials)
- Model-Ensemble Trust-Region Policy Optimization, [Paper](https://arxiv.org/pdf/1802.10592.pdf) (2018) [Code](https://github.com/thanard/me-trpo)

#### 2.3. Variational RL
- Auto-Encoding Variational Bayes, [Paper](https://arxiv.org/pdf/1312.6114.pdf) (2013)
- Exploring State Transition Uncertainty in Variational Reinforcement Learning, [Paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9287440) (2020)
- UMBRELLA: Uncertainty-Aware Model-Based Offline Reinforcement Learning Leveraging Planning, [Paper](https://arxiv.org/pdf/2111.11097.pdf) (2022)

#### 2.4. Robust RL
- Robust Control of Markov Decision Processes with Uncertain Transition Matrices, [Paper](http://people.eecs.berkeley.edu/~elghaoui/Pubs/RobMDP_OR2005.pdf) (2005)
- Reinforcement Learning in Robust Markov Decision Processes, [Paper](https://proceedings.neurips.cc/paper/2013/file/0deb1c54814305ca9ad266f53bc82511-Paper.pdf) (2013)
- Robust Adversarial Reinforcement Learning, [Paper](http://proceedings.mlr.press/v70/pinto17a/pinto17a.pdf) (2017)
- Robust Analysis of Discounted Markov Decision Processes with Uncertain Transition Probabilities, [Paper](http://www.amjcu.zju.edu.cn/amjcub/2020-2029/202004/417-436.pdf) (2020)
- Robust Multi-Agent Reinforcement Learning with Model Uncertainty, [Paper](https://proceedings.neurips.cc/paper/2020/file/774412967f19ea61d448977ad9749078-Paper.pdf) (2020)
- RAMBO-RL: Robust Adversarial Model-Based Offline Reinforcement Learning, [Paper](https://arxiv.org/pdf/2204.12581.pdf) (2022) [Code](https://github.com/marc-rigter/rambo)

### 3. Uncertainty in State

#### 3.1. Approximation of Belief States with Bayesian Filtering
- Deep Reinforcement Learning with POMDP, [Paper](http://cs229.stanford.edu/proj2015/363_report.pdf) (2015)
- QMDP-Net: Deep Learning for Planning under Partial Observability, [Paper](https://proceedings.neurips.cc/paper/2017/file/e9412ee564384b987d086df32d4ce6b7-Paper.pdf) (2017)
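The Bayesian-filtering methods above approximate or amortize the exact POMDP belief update. As a reference point, here is a minimal sketch of that exact update for a small discrete POMDP; the array names, shapes, and the toy models are illustrative only.

```python
# Minimal sketch: exact Bayes-filter belief update for a discrete POMDP,
# b'(s') ∝ O(o | s') * sum_s T(s' | s, a) * b(s). Arrays are illustrative.
import numpy as np


def belief_update(belief: np.ndarray, action: int, observation: int,
                  T: np.ndarray, O: np.ndarray) -> np.ndarray:
    """belief: (S,), T: (A, S, S) with T[a, s, s'], O: (S, Z) with O[s', o]."""
    predicted = belief @ T[action]           # prediction step: sum_s T(s'|s,a) b(s)
    updated = O[:, observation] * predicted  # correction step: weight by observation likelihood
    return updated / updated.sum()           # renormalize to a probability vector


if __name__ == "__main__":
    S, A, Z = 3, 2, 2
    rng = np.random.default_rng(0)
    T = rng.random((A, S, S)); T /= T.sum(axis=2, keepdims=True)   # random transition model
    O = rng.random((S, Z));    O /= O.sum(axis=1, keepdims=True)   # random observation model
    b = np.full(S, 1.0 / S)                                        # uniform prior belief
    b = belief_update(b, action=0, observation=1, T=T, O=O)
    print("posterior belief:", b)
```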
#### 3.2. Approximation of Belief States as Vector Representations with RNNs
- Deep Recurrent Q-Learning for Partially Observable MDPs, [Paper](https://arxiv.org/pdf/1507.06527.pdf) (2015)
- On Improving Deep Reinforcement Learning for POMDPs, [Paper](https://arxiv.org/pdf/1704.07978.pdf) (2018) [Code](https://github.com/bit1029public/ADRQN)
- Shaping Belief States with Generative Environment Models for RL, [Paper](https://proceedings.neurips.cc/paper/2019/file/2c048d74b3410237704eb7f93a10c9d7-Paper.pdf) (2019)
- Recurrent Model-Free RL Can Be a Strong Baseline for Many POMDPs, [Paper](https://arxiv.org/pdf/2110.05038.pdf) (2021)
- Memory-based Deep Reinforcement Learning for POMDPs, [Paper](https://arxiv.org/pdf/2102.12344.pdf) (2021)

#### 3.3. Approximation of Belief States with Variational Inference
- Deep Kalman Filters, [Paper](https://arxiv.org/pdf/1511.05121.pdf) (2015)
- A Recurrent Latent Variable Model for Sequential Data, [Paper](https://proceedings.neurips.cc/paper/2015/file/b618c3210e934362ac261db280128c22-Paper.pdf) (2015)
- Temporal Difference Variational Auto-Encoder, [Paper](https://arxiv.org/pdf/1806.03107.pdf) (2018)
- Variational Recurrent Models for Solving Partially Observable Control Tasks, [Paper](https://openreview.net/pdf?id=r1lL4a4tDB) (2020) [Code](https://github.com/oist-cnru/Variational-Recurrent-Models)
- Stochastic Latent Actor-Critic: Deep Reinforcement Learning with a Latent Variable Model, [Paper](https://proceedings.neurips.cc/paper/2020/file/08058bf500242562c0d031ff830ad094-Paper.pdf) (2020)
- Flow-based Recurrent Belief State Learning for POMDPs, [Paper](https://proceedings.mlr.press/v162/chen22q/chen22q.pdf) (2022)

#### 3.4. Approximation of Belief States with Particle Filters
- DESPOT: Online POMDP Planning with Regularization, [Paper](https://proceedings.neurips.cc/paper/2013/file/c2aee86157b4a40b78132f1e71a9e6f1-Paper.pdf) (2013)
- Intention-Aware Online POMDP Planning for Autonomous Driving in a Crowd, [Paper](https://bigbird.comp.nus.edu.sg/m2ap/wordpress/wp-content/uploads/2016/01/icra15.pdf) (2015)
- Deep Variational Reinforcement Learning for POMDPs, [Paper](http://proceedings.mlr.press/v80/igl18a/igl18a.pdf) (2018)
- Particle Filter Recurrent Neural Networks, [Paper](https://ojs.aaai.org/index.php/AAAI/article/view/5952) (2020)
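For the particle-filter approaches in Section 3.4, here is a minimal sketch of the bootstrap particle-filter belief update they build on, with user-supplied `transition_fn` and `obs_likelihood` functions standing in for the environment model; all names and the toy 1-D problem are illustrative.

```python
# Minimal sketch: bootstrap particle filter that approximates a POMDP belief
# with a set of state particles. transition_fn and obs_likelihood are
# illustrative stand-ins for the (learned or known) environment model.
import numpy as np


def particle_filter_step(particles, action, observation,
                         transition_fn, obs_likelihood, rng):
    """particles: (N, state_dim). Propagate, weight by the observation, resample."""
    # 1. Propagate every particle through the (stochastic) transition model.
    propagated = np.array([transition_fn(p, action, rng) for p in particles])
    # 2. Weight each propagated particle by how well it explains the observation.
    weights = np.array([obs_likelihood(observation, p) for p in propagated])
    weights /= weights.sum()
    # 3. Resample in proportion to the weights (bootstrap resampling).
    idx = rng.choice(len(propagated), size=len(propagated), p=weights)
    return propagated[idx]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy 1-D problem: the state drifts by the action plus noise; the observation is the noisy state.
    transition_fn = lambda s, a, r: s + a + r.normal(0.0, 0.1, size=s.shape)
    obs_likelihood = lambda o, s: np.exp(-0.5 * ((o - s[0]) / 0.2) ** 2)
    belief = np.zeros((200, 1))  # initial belief: all particles at state 0
    belief = particle_filter_step(belief, action=1.0, observation=1.05,
                                  transition_fn=transition_fn,
                                  obs_likelihood=obs_likelihood, rng=rng)
    print("belief mean:", belief.mean(), "belief std:", belief.std())
```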
### 4. Uncertainty in Observation
- A Bayesian Method for Learning POMDP Observation Parameters for Robot Interaction Management Systems, [Paper](http://users.isr.ist.utl.pt/~mtjspaan/POMDPPractioners/pomdp2010_submission_16.pdf) (2010)
- Modeling Humans as Observation Providers using POMDPs, [Paper](https://ieeexplore.ieee.org/abstract/document/6005272) (2011)
- Adversarial Attacks on Neural Network Policies, [Paper](https://arxiv.org/pdf/1702.02284.pdf) (2017)
- Vulnerability of Deep Reinforcement Learning to Policy Induction Attacks, [Paper](https://arxiv.org/pdf/1701.04143.pdf) (2017)
- Robust Deep Reinforcement Learning with Adversarial Attacks, [Paper](https://arxiv.org/pdf/1712.03632.pdf) (2017)
- Whatever Does Not Kill Deep Reinforcement Learning, Makes It Stronger, [Paper](https://arxiv.org/pdf/1712.09344.pdf) (2017)
- Multi-Agent Connected Autonomous Driving using Deep Reinforcement Learning, [Paper](https://arxiv.org/pdf/1911.04175.pdf) (2019)
- Certified Adversarial Robustness for Deep Reinforcement Learning, [Paper](https://arxiv.org/pdf/1910.12908.pdf) (2020)
- A Two-Stage Reinforcement Learning Approach for Multi-UAV Collision Avoidance Under Imperfect Sensing, [Paper](https://www.researchgate.net/profile/Dawei-Wang-34/publication/339344692_A_Two-Stage_Reinforcement_Learning_Approach_for_Multi-UAV_Collision_Avoidance_Under_Imperfect_Sensing/links/5ed8b2ba299bf1c67d3bd2ab/A-Two-Stage-Reinforcement-Learning-Approach-for-Multi-UAV-Collision-Avoidance-Under-Imperfect-Sensing.pdf) (2020)
- Adversarial Policies: Attacking Deep Reinforcement Learning, [Paper](https://openreview.net/attachment?id=HJgEMpVFwB&name=original_pdf) (2020)
- Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations, [Paper](https://arxiv.org/pdf/2003.08938.pdf) (2021) [Code](https://github.com/chenhongge/StateAdvDRL)
- Robust Reinforcement Learning on State Observations with Learned Optimal Adversary, [Paper](https://arxiv.org/pdf/2101.08452.pdf) (2021)
- Certifiable Robustness to Adversarial State Uncertainty in Deep Reinforcement Learning, [Paper](https://arxiv.org/pdf/2004.06496.pdf) (2021)
- Incorporating Observation Uncertainty into Reinforcement Learning-Based Spacecraft Guidance Schemes, [Paper](https://arc.aiaa.org/doi/pdf/10.2514/6.2022-1765) (2022)
- Policy Smoothing for Provably Robust Reinforcement Learning, [Paper](https://arxiv.org/pdf/2106.11420.pdf) (2022)
- RORL: Robust Offline Reinforcement Learning via Conservative Smoothing, [Paper](https://arxiv.org/pdf/2206.02829v3.pdf) (2022)
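Many of the papers above study adversarial perturbations of the agent's observations. Here is a minimal, illustrative sketch of an FGSM-style observation attack against a Q-network, in the spirit of Adversarial Attacks on Neural Network Policies; the toy network and the perturbation budget `epsilon` are placeholders.

```python
# Minimal sketch: FGSM-style adversarial perturbation of an observation that
# lowers the Q-value of the action the clean policy would take.
# The Q-network and the epsilon budget are illustrative placeholders.
import torch
import torch.nn as nn


def fgsm_observation_attack(q_net: nn.Module, obs: torch.Tensor, epsilon: float = 0.01):
    """Return obs + epsilon * sign(grad), decreasing the chosen action's Q-value."""
    obs = obs.clone().detach().requires_grad_(True)
    q_values = q_net(obs)
    chosen_action = q_values.argmax(dim=-1)
    # Loss = -Q(s, a*): ascending it with FGSM pushes Q(s, a*) down.
    loss = -q_values.gather(-1, chosen_action.unsqueeze(-1)).sum()
    loss.backward()
    perturbed = obs + epsilon * obs.grad.sign()
    return perturbed.detach()


if __name__ == "__main__":
    q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))  # toy Q-network
    clean_obs = torch.randn(1, 4)
    adv_obs = fgsm_observation_attack(q_net, clean_obs, epsilon=0.05)
    print("clean action:", q_net(clean_obs).argmax().item(),
          "attacked action:", q_net(adv_obs).argmax().item())
```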
### 5. Uncertainty in Constraint
- Budget Allocation using Weakly Coupled, Constrained Markov Decision Processes, [Paper](https://storage.googleapis.com/pub-tools-public-publication-data/pdf/45291.pdf) (2016)
- A Fitted-Q Algorithm for Budgeted MDPs, [Paper](https://www.microsoft.com/en-us/research/uploads/prod/2019/04/ewrl_14_2018_paper_67.pdf) (2018)
- Inferring Geometric Constraints in Human Demonstrations, [Paper](http://proceedings.mlr.press/v87/subramani18a/subramani18a.pdf) (2018)
- Learning Constraints from Demonstrations, [Paper](https://arxiv.org/pdf/1812.07084.pdf) (2018)
- Safe Exploration in Continuous Action Spaces, [Paper](https://arxiv.org/pdf/1801.08757.pdf) (2018)
- Learner-aware Teaching: Inverse Reinforcement Learning with Preferences and Constraints, [Paper](https://proceedings.neurips.cc/paper/2019/file/3de568f8597b94bda53149c7d7f5958c-Paper.pdf) (2019)
- Budgeted Reinforcement Learning in Continuous State Space, [Paper](https://proceedings.neurips.cc/paper/2019/file/4fe5149039b52765bde64beb9f674940-Paper.pdf) (2019)
- Learning Parametric Constraints in High Dimensions from Demonstrations, [Paper](https://arxiv.org/pdf/1910.03477.pdf) (2019)
- Maximum Likelihood Constraint Inference for Inverse Reinforcement Learning, [Paper](https://arxiv.org/pdf/1909.05477.pdf) (2020)
- Learning Constraints from Locally-Optimal Demonstrations under Cost Function Uncertainty, [Paper](https://arxiv.org/pdf/2001.09336.pdf) (2020)
- WCSAC: Worst-Case Soft Actor Critic for Safety-Constrained Reinforcement Learning, [Paper](https://www.st.ewi.tudelft.nl/~mtjspaan/pub/Yang21aaai.pdf) (2021)
- Inverse Constrained Reinforcement Learning, [Paper](https://arxiv.org/pdf/2011.09999.pdf) (2021) [Code](https://github.com/shehryar-malik/icrl)
- Safety-Constrained Reinforcement Learning with a Distributional Safety Critic, [Paper](https://link.springer.com/content/pdf/10.1007/s10994-022-06187-8.pdf?pdf=button) (2022)
- Learning Behavioral Soft Constraints from Demonstrations, [Paper](https://arxiv.org/pdf/2202.10407.pdf) (2022)
- Constrained Markov Decision Processes with Uncertain Costs, [Paper](https://www.sciencedirect.com/science/article/abs/pii/S0167637722000268) (2022)
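Most constrained-RL methods above build on (or harden against uncertainty) the Lagrangian relaxation of the constrained MDP objective, max over policies of expected return minus lambda times the constraint violation. Here is a minimal, illustrative sketch of the dual-ascent update on the Lagrange multiplier; the cost estimates, budget, and learning rate are placeholders.

```python
# Minimal sketch: Lagrangian relaxation of a constrained MDP objective,
# max_pi E[return] - lambda * (E[cost] - budget), with dual ascent on lambda.
# Real methods (e.g. WCSAC) pair this with an actor-critic inner loop and
# uncertainty-aware cost critics; the numbers below are illustrative.


def lagrange_multiplier_step(lmbda: float, avg_episode_cost: float,
                             cost_budget: float, lr: float = 0.05) -> float:
    """Dual ascent: grow lambda when the cost constraint is violated, shrink it otherwise."""
    lmbda = lmbda + lr * (avg_episode_cost - cost_budget)
    return max(lmbda, 0.0)  # the multiplier must stay non-negative


def penalized_reward(reward: float, cost: float, lmbda: float) -> float:
    """Reward actually optimized by the (unconstrained) policy-update inner loop."""
    return reward - lmbda * cost


if __name__ == "__main__":
    lmbda, budget = 0.0, 5.0
    for epoch, avg_cost in enumerate([9.0, 7.5, 6.0, 5.2, 4.8]):  # toy cost estimates
        lmbda = lagrange_multiplier_step(lmbda, avg_cost, budget)
        print(f"epoch {epoch}: avg cost {avg_cost:.1f}, lambda {lmbda:.3f}")
```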