# Multi-Agent Reinforcement Learning papers
This is a collection of Multi-Agent Reinforcement Learning (MARL) papers. Each category is a potential starting point for your research. Some papers are listed more than once because they belong to multiple categories.

For MARL papers with code and MARL resources, please refer to [MARL Papers with Code](https://github.com/TimeBreaker/MARL-papers-with-code) and [MARL Resources Collection](https://github.com/TimeBreaker/MARL-resources-collection).

I will continually update this repository, and suggestions are welcome (missing important papers, missing categories, invalid links, etc.). This is only a first draft so far; I'll add more resources in the next few months.

This repository is not for commercial purposes.

My email: chenhao915@mails.ucas.ac.cn


## Overview
* [Reviews](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#reviews)
* [Environments](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#environments)
* [Dealing With Credit Assignment Issue](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#dealing-with-credit-assignment-issue)
* [Policy Gradient](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#policy-gradient)
* [Communication](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#communication)
* [Emergent](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#emergent)
* [Opponent Modeling](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#opponent-modeling)
* [Game Theoretic](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#game-theoretic)
* [Hierarchical](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#hierarchical)
* [Ad Hoc Teamwork](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#ad-hoc-teamwork)
* [League Training](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#league-training)
* [Curriculum Learning](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#curriculum-learning)
* [Mean Field](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#mean-field)
* [Transfer Learning](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#transfer-learning)
* [Meta Learning](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#meta-learning)
* [Fairness](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#fairness)
* [Exploration](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#exploration)
* [Graph Neural Network](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#graph-neural-network)
* [Model-based](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#model-based)
* [NAS](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#nas)
* [Safe Multi-Agent Reinforcement Learning](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#safe-multi-agent-reinforcement-learning)
* [From Single-Agent to Multi-Agent](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#from-single-agent-to-multi-agent)
* [Discrete-Continuous Hybrid Action Space / Parameterized Action Space](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#discrete-continuous-hybrid-action-space--parameterized-action-space)
* [Role](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#role)
* [Diversity](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#diversity)
* [Sparse Reward](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#sparse-reward)
* [Large Scale](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#large-scale)
* [DTDE](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#dtde)
* [Decision Transformer](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#decision-transformer)
* [Offline MARL](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#offline-marl)
* [Generalization](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#generalization)
* [Adversarial](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#adversarial)
* [Multi-Agent Path Finding](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#multi-agent-path-finding)
* [To be Categorized](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#to-be-categorized)
* [TODO](https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers#todo)


## Reviews
### Recent Reviews (Since 2019)
* [A Survey and Critique of Multiagent Deep Reinforcement Learning](https://arxiv.org/pdf/1810.05587v3)
* [An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective](https://arxiv.org/abs/2011.00583v2)
* [Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms](https://arxiv.org/abs/1911.10635v1)
* [A Review of Cooperative Multi-Agent Deep Reinforcement Learning](https://arxiv.org/abs/1908.03963)
* [Dealing with Non-Stationarity in Multi-Agent Deep Reinforcement Learning](https://arxiv.org/abs/1906.04737)
* [A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity](https://arxiv.org/abs/1707.09183v1)
* [Deep Reinforcement Learning for Multi-Agent Systems: A Review of Challenges, Solutions and Applications](https://arxiv.org/pdf/1812.11794.pdf)
* [A Survey on Transfer Learning for Multiagent Reinforcement Learning Systems](https://www.researchgate.net/publication/330752409_A_Survey_on_Transfer_Learning_for_Multiagent_Reinforcement_Learning_Systems)

### Other Reviews (Before 2019)
* [If multi-agent learning is the answer, what is the question?](https://ai.stanford.edu/people/shoham/www%20papers/LearningInMAS.pdf)
* [Multiagent learning is not the answer. It is the question](https://core.ac.uk/download/pdf/82595758.pdf)
* [Is multiagent deep reinforcement learning the answer or the question? A brief survey](https://arxiv.org/abs/1810.05587v1) Note that [A Survey and Critique of Multiagent Deep Reinforcement Learning](https://arxiv.org/pdf/1810.05587v3) is an updated version of this paper by the same authors.
* [Evolutionary Dynamics of Multi-Agent Learning: A Survey](https://www.researchgate.net/publication/280919379_Evolutionary_Dynamics_of_Multi-Agent_Learning_A_Survey)
* (These are worth reading even though they are not recent reviews.)


## Environments
Environment|Paper|Code|Accepted at|Year
--|:--:|:--:|:--:|--:
StarCraft|[The StarCraft Multi-Agent Challenge](https://arxiv.org/pdf/1902.04043)|https://github.com/oxwhirl/smac|NIPS|2019
StarCraft|[SMACv2: A New Benchmark for Cooperative Multi-Agent Reinforcement Learning](https://openreview.net/pdf?id=pcBnes02t3u)|https://github.com/oxwhirl/smacv2||2022
StarCraft|[Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks](https://arxiv.org/pdf/2006.07869)|https://github.com/uoe-agents/epymarl|NIPS|2021
Football|[Google Research Football: A Novel Reinforcement Learning Environment](https://ojs.aaai.org/index.php/AAAI/article/view/5878/5734)|https://github.com/google-research/football|AAAI|2020
PettingZoo|[PettingZoo: Gym for Multi-Agent Reinforcement Learning](https://proceedings.neurips.cc/paper/2021/file/7ed2d3454c5eea71148b11d0c25104ff-Paper.pdf)|https://github.com/Farama-Foundation/PettingZoo|NIPS|2021
Melting Pot|[Scalable Evaluation of Multi-Agent Reinforcement Learning with Melting Pot](http://proceedings.mlr.press/v139/leibo21a/leibo21a.pdf)|https://github.com/deepmind/meltingpot|ICML|2021
MuJoCo|[MuJoCo: A physics engine for model-based control](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.296.6848&rep=rep1&type=pdf)|https://github.com/deepmind/mujoco|IROS|2012
MALib|[MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning](https://arxiv.org/pdf/2106.07551)|https://github.com/sjtu-marl/malib||2021
MAgent|[MAgent: A many-agent reinforcement learning platform for artificial collective intelligence](https://ojs.aaai.org/index.php/AAAI/article/download/11371/11230)|https://github.com/Farama-Foundation/MAgent|AAAI|2018
Neural MMO|[Neural MMO: A Massively Multiagent Game Environment for Training and Evaluating Intelligent Agents](https://arxiv.org/pdf/1903.00784)|https://github.com/openai/neural-mmo||2019
MPE|[Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments](https://proceedings.neurips.cc/paper/2017/file/68a9750337a418a86fe06c1991a1d64c-Paper.pdf)|https://github.com/openai/multiagent-particle-envs|NIPS|2017
Pommerman|[Pommerman: A multi-agent playground](https://arxiv.org/pdf/1809.07124)|https://github.com/MultiAgentLearning/playground||2018
HFO|[Half Field Offense: An Environment for Multiagent Learning and Ad Hoc Teamwork](https://www.cse.iitb.ac.in/~shivaram/papers/hmsks_ala_2016.pdf)|https://github.com/LARG/HFO|AAMAS Workshop|2016
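As a quick orientation, here is a minimal random-rollout sketch using PettingZoo's parallel API. This is illustrative only: the exact module path, the `simple_spread_v3` version suffix, and the `reset` return signature depend on the installed PettingZoo release.

```python
# Minimal random rollout with PettingZoo's parallel API (a sketch;
# adjust the version suffix, e.g. simple_spread_v2/v3, to your release).
from pettingzoo.mpe import simple_spread_v3

env = simple_spread_v3.parallel_env(max_cycles=25)
observations, infos = env.reset(seed=0)  # older releases return obs only

while env.agents:  # the agent list empties once every agent is done
    # one action per live agent, here sampled uniformly at random
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)
env.close()
```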


## Dealing With Credit Assignment Issue

### Value Decomposition
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[VDN: Value-Decomposition Networks For Cooperative Multi-Agent Learning](https://arxiv.org/pdf/1706.05296)|https://github.com/oxwhirl/pymarl|AAMAS|2017
[QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning](http://proceedings.mlr.press/v80/rashid18a/rashid18a.pdf)|https://github.com/oxwhirl/pymarl|ICML|2018
[QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning](https://arxiv.org/abs/1905.05408)|https://github.com/oxwhirl/pymarl|ICML|2019
[NDQ: Learning Nearly Decomposable Value Functions Via Communication Minimization](https://arxiv.org/abs/1910.05366v1)|https://github.com/TonghanWang/NDQ|ICLR|2020
[CollaQ: Multi-Agent Collaboration via Reward Attribution Decomposition](https://arxiv.org/abs/2010.08531)|https://github.com/facebookresearch/CollaQ||2020
[SQDDPG: Shapley Q-Value: A Local Reward Approach to Solve Global Reward Games](https://arxiv.org/abs/1907.05707)|https://github.com/hsvgbkhgbv/SQDDPG|AAAI|2020
[QPD: Q-value Path Decomposition for Deep Multiagent Reinforcement Learning](http://proceedings.mlr.press/v119/yang20d/yang20d.pdf)||ICML|2020
[Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning](https://arxiv.org/abs/2006.10800)|https://github.com/oxwhirl/wqmix|NIPS|2020
[QTRAN++: Improved Value Transformation for Cooperative Multi-Agent Reinforcement Learning](https://arxiv.org/abs/2006.12010v2)|||2020
[QPLEX: Duplex Dueling Multi-Agent Q-Learning](https://arxiv.org/abs/2008.01062)|https://github.com/wjh720/QPLEX|ICLR|2021
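The common thread in this family is factoring a joint value from per-agent utilities under centralized training. Below is a minimal VDN-style sketch of the idea (my own illustration, not code from any of the papers); QMIX generalizes the sum to a state-conditioned monotonic mixing network.

```python
import torch
import torch.nn as nn

class VDNMixer(nn.Module):
    """VDN-style additive mixing: Q_tot = sum_i Q_i(tau_i, a_i).
    QMIX instead mixes with a state-conditioned network whose weights
    are forced non-negative, so argmax over Q_tot still decomposes
    into per-agent argmaxes."""
    def forward(self, agent_qs):                    # (batch, n_agents)
        return agent_qs.sum(dim=1, keepdim=True)    # (batch, 1)

# Hypothetical shapes: per-agent Q-values of the chosen actions.
batch, n_agents, gamma = 32, 4, 0.99
chosen_qs = torch.randn(batch, n_agents)
target_qs = torch.randn(batch, n_agents)  # from target networks
rewards = torch.randn(batch, 1)           # shared team reward

mixer = VDNMixer()
q_tot = mixer(chosen_qs)
td_target = rewards + gamma * mixer(target_qs).detach()
loss = nn.functional.mse_loss(q_tot, td_target)
```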

### Other Methods
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[COMA: Counterfactual Multi-Agent Policy Gradients](https://arxiv.org/abs/1705.08926)|https://github.com/oxwhirl/pymarl|AAAI|2018
[LICA: Learning Implicit Credit Assignment for Cooperative Multi-Agent Reinforcement Learning](https://arxiv.org/abs/2007.02529v2)|https://github.com/mzho7212/LICA|NIPS|2020


## Policy Gradient
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[MADDPG: Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments](https://arxiv.org/abs/1706.02275v3)|https://github.com/openai/maddpg|NIPS|2017
[COMA: Counterfactual Multi-Agent Policy Gradients](https://arxiv.org/abs/1705.08926)|https://github.com/oxwhirl/pymarl|AAAI|2018
[IPPO: Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge?](https://arxiv.org/abs/2011.09533)|||2020
[MAPPO: The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games](https://arxiv.org/abs/2103.01955)|https://github.com/marlbenchmark/on-policy||2021
[MAAC: Actor-Attention-Critic for Multi-Agent Reinforcement Learning](https://arxiv.org/abs/1810.02912)|https://github.com/shariqiqbal2810/MAAC|ICML|2019
[DOP: Off-Policy Multi-Agent Decomposed Policy Gradients](https://arxiv.org/abs/2007.12322)|https://github.com/TonghanWang/DOP|ICLR|2021
[M3DDPG: Robust Multi-Agent Reinforcement Learning via Minimax Deep Deterministic Policy Gradient](https://ojs.aaai.org/index.php/AAAI/article/view/4327/4205)||AAAI|2019
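Most of these methods follow centralized training with decentralized execution: a critic sees joint information during training while each actor acts from its own observation. A minimal sketch of such a centralized critic is below; dimensions and names are illustrative, not taken from any particular paper.

```python
import torch
import torch.nn as nn

class CentralizedCritic(nn.Module):
    """Critic over every agent's observation and action (training only);
    at execution time each actor conditions only on its own observation."""
    def __init__(self, n_agents, obs_dim, act_dim, hidden=128):
        super().__init__()
        joint_dim = n_agents * (obs_dim + act_dim)
        self.net = nn.Sequential(
            nn.Linear(joint_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, all_obs, all_actions):
        # all_obs: (batch, n_agents, obs_dim); all_actions: (batch, n_agents, act_dim)
        joint = torch.cat([all_obs.flatten(1), all_actions.flatten(1)], dim=-1)
        return self.net(joint)  # (batch, 1) joint value estimate

critic = CentralizedCritic(n_agents=3, obs_dim=10, act_dim=2)
q = critic(torch.randn(8, 3, 10), torch.randn(8, 3, 2))
```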


## Communication
### Communication Without Bandwidth Constraint
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[CommNet: Learning Multiagent Communication with Backpropagation](https://arxiv.org/abs/1605.07736)|https://github.com/facebookarchive/CommNet|NIPS|2016
[BiCNet: Multiagent Bidirectionally-Coordinated Nets: Emergence of Human-level Coordination in Learning to Play StarCraft Combat Games](https://arxiv.org/abs/1703.10069)|https://github.com/Coac/CommNet-BiCnet||2017
[VAIN: Attentional Multi-agent Predictive Modeling](https://arxiv.org/abs/1706.06122)||NIPS|2017
[IC3Net: Learning when to Communicate at Scale in Multiagent Cooperative and Competitive Tasks](https://arxiv.org/abs/1812.09755)|https://github.com/IC3Net/IC3Net||2018
[VBC: Efficient Communication in Multi-Agent Reinforcement Learning via Variance Based Control](https://arxiv.org/abs/1909.02682v1)||NIPS|2019
[Graph Convolutional Reinforcement Learning for Multi-Agent Cooperation](https://arxiv.org/abs/1810.09202v1)|||2018
[NDQ: Learning Nearly Decomposable Value Functions Via Communication Minimization](https://arxiv.org/abs/1910.05366v1)|https://github.com/TonghanWang/NDQ|ICLR|2020
[RIAL/DIAL: Learning to Communicate with Deep Multi-Agent Reinforcement Learning](https://arxiv.org/abs/1605.06676)|https://github.com/iassael/learning-to-communicate|NIPS|2016
[ATOC: Learning Attentional Communication for Multi-Agent Cooperation](https://arxiv.org/abs/1805.07733)||NIPS|2018
[Fully decentralized multi-agent reinforcement learning with networked agents](http://proceedings.mlr.press/v80/zhang18n/zhang18n.pdf)|https://github.com/cts198859/deeprl_network|ICML|2018
[TarMAC: Targeted Multi-Agent Communication](http://proceedings.mlr.press/v97/das19a/das19a.pdf)||ICML|2019

### Communication Under Limited Bandwidth
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[SchedNet: Learning to Schedule Communication in Multi-Agent Reinforcement Learning](https://arxiv.org/abs/1902.01554)|||2019
[Learning Multi-agent Communication under Limited-bandwidth Restriction for Internet Packet Routing](https://arxiv.org/abs/1903.05561)|||2019
[Gated-ACML: Learning Agent Communication under Limited Bandwidth by Message Pruning](https://arxiv.org/abs/1912.05304v1)||AAAI|2020
[Learning Efficient Multi-agent Communication: An Information Bottleneck Approach](https://arxiv.org/abs/1911.06992)||ICML|2020
[Coordinating Multi-Agent Reinforcement Learning with Limited Communication](http://aamas.csc.liv.ac.uk/Proceedings/aamas2013/docs/p1101.pdf)||AAMAS|2013
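To give a feel for the simplest learned-communication scheme, here is a sketch of one CommNet-style communication round: each agent's hidden state is updated from its own state plus the mean of the other agents' states. This is my own illustration of the idea, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CommNetStep(nn.Module):
    """One CommNet-style round: each agent's message is the mean of the
    *other* agents' hidden states (a sketch of the idea)."""
    def __init__(self, hidden):
        super().__init__()
        self.h_proj = nn.Linear(hidden, hidden)
        self.c_proj = nn.Linear(hidden, hidden)

    def forward(self, h):            # h: (batch, n_agents, hidden)
        n = h.size(1)
        # mean over the other agents: (sum - self) / (n - 1)
        comm = (h.sum(dim=1, keepdim=True) - h) / max(n - 1, 1)
        return torch.tanh(self.h_proj(h) + self.c_proj(comm))

step = CommNetStep(hidden=64)
h = torch.randn(8, 5, 64)
h = step(h)  # apply for a few rounds before the per-agent action heads
```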


## Emergent
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[Multiagent Cooperation and Competition with Deep Reinforcement Learning](https://arxiv.org/abs/1511.08779v1)||PLoS ONE|2017
[Multi-agent Reinforcement Learning in Sequential Social Dilemmas](https://arxiv.org/abs/1702.03037)|||2017
[Emergent preeminence of selfishness: an anomalous Parrondo perspective](https://kanghaocheong.files.wordpress.com/2020/02/koh-cheong2019_article_emergentpreeminenceofselfishne.pdf)||Nonlinear Dynamics|2019
[Emergent Coordination Through Competition](https://arxiv.org/abs/1902.07151v2)|||2019
[Biases for Emergent Communication in Multi-agent Reinforcement Learning](https://arxiv.org/abs/1912.05676)||NIPS|2019
[Towards Graph Representation Learning in Emergent Communication](https://arxiv.org/abs/2001.09063)|||2020
[Emergent Tool Use From Multi-Agent Autocurricula](https://arxiv.org/abs/1909.07528)|https://github.com/openai/multi-agent-emergence-environments|ICLR|2020
[On Emergent Communication in Competitive Multi-Agent Teams](https://arxiv.org/abs/2003.01848)||AAMAS|2020
[QED: Quasi-Equivalence Discovery for Zero-Shot Emergent Communication](https://arxiv.org/abs/2103.08067)|||2021
[Incorporating Pragmatic Reasoning Communication into Emergent Language](https://arxiv.org/abs/2006.04109)||NIPS|2020


## Opponent Modeling
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[Bayesian Opponent Exploitation in Imperfect-Information Games](https://arxiv.org/abs/1603.03491v1)||IEEE Conference on Computational Intelligence and Games|2018
[LOLA: Learning with Opponent-Learning Awareness](https://arxiv.org/abs/1709.04326)||AAMAS|2018
[Variational Autoencoders for Opponent Modeling in Multi-Agent Systems](https://arxiv.org/abs/2001.10829)|||2020
[Stable Opponent Shaping in Differentiable Games](https://arxiv.org/abs/1811.08469)|||2018
[Opponent Modeling in Deep Reinforcement Learning](https://arxiv.org/abs/1609.05559)|https://github.com/hhexiy/opponent|ICML|2016
[Game Theory-Based Opponent Modeling in Large Imperfect-Information Games](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.385.6032&rep=rep1&type=pdf)||AAMAS|2011
[Agent Modelling under Partial Observability for Deep Reinforcement Learning](https://proceedings.neurips.cc/paper/2021/file/a03caec56cd82478bf197475b48c05f9-Paper.pdf)||NIPS|2021



## Game Theoretic
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[α-Rank: Multi-Agent Evaluation by Evolution](https://arxiv.org/abs/1903.01373)||Scientific Reports|2019
[α^α-Rank: Practically Scaling α-Rank through Stochastic Optimisation](https://arxiv.org/abs/1909.11628)||AAMAS|2020
[A Game Theoretic Framework for Model Based Reinforcement Learning](https://arxiv.org/abs/2004.07804)||ICML|2020
[Fictitious Self-Play in Extensive-Form Games](http://proceedings.mlr.press/v37/heinrich15.pdf)||ICML|2015
[Combining Deep Reinforcement Learning and Search for Imperfect-Information Games](https://arxiv.org/pdf/2007.13544)||NIPS|2020
[Real World Games Look Like Spinning Tops](https://arxiv.org/pdf/2004.09468)||NIPS|2020
[PSRO: A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning](https://arxiv.org/pdf/1711.00832)||NIPS|2017
[Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games](https://arxiv.org/pdf/2006.08555)||NIPS|2020
[A Game-Theoretic Model and Best-Response Learning Method for Ad Hoc Coordination in Multiagent Systems](https://arxiv.org/pdf/1506.01170)||AAMAS|2013
[Neural Replicator Dynamics: Multiagent Learning via Hedging Policy Gradients](http://www.ifaamas.org/Proceedings/aamas2020/pdfs/p492.pdf)||AAMAS|2020



## Hierarchical
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[Hierarchical multi-agent reinforcement learning](https://apps.dtic.mil/sti/pdfs/ADA440418.pdf)||AAMAS|2006
[Hierarchical Cooperative Multi-Agent Reinforcement Learning with Skill Discovery](https://arxiv.org/pdf/1912.03558)||AAMAS|2020
[Hierarchical Critics Assignment for Multi-agent Reinforcement Learning](https://arxiv.org/pdf/1902.03079)|||2019
[Hierarchical Reinforcement Learning for Multi-agent MOBA Game](https://arxiv.org/pdf/1901.08004)|||2019
[Hierarchical Deep Multiagent Reinforcement Learning with Temporal Abstraction](https://arxiv.org/pdf/1809.09332)|||2018
[HAMA: Multi-Agent Actor-Critic with Hierarchical Graph Attention Network](https://ojs.aaai.org/index.php/AAAI/article/download/6214/6070)||AAAI|2020


## Ad Hoc Teamwork
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[CollaQ: Multi-Agent Collaboration via Reward Attribution Decomposition](https://arxiv.org/pdf/2010.08531)|https://github.com/facebookresearch/CollaQ||2020
[A Game-Theoretic Model and Best-Response Learning Method for Ad Hoc Coordination in Multiagent Systems](https://arxiv.org/pdf/1506.01170)||AAMAS|2013
[Half Field Offense: An Environment for Multiagent Learning and Ad Hoc Teamwork](https://www.cse.iitb.ac.in/~shivaram/papers/hmsks_ala_2016.pdf)|https://github.com/LARG/HFO|AAMAS Workshop|2016
[Towards Open Ad Hoc Teamwork Using Graph-based Policy Learning](http://proceedings.mlr.press/v139/rahman21a/rahman21a.pdf)|https://github.com/uoe-agents/GPL|ICML|2021
[A Survey of Ad Hoc Teamwork: Definitions, Methods, and Open Problems](https://arxiv.org/pdf/2202.10450)|||2022
[Learning with generated teammates to achieve type-free ad-hoc teamwork](https://www.ijcai.org/proceedings/2021/0066.pdf)||IJCAI|2021
[Online ad hoc teamwork under partial observability](https://openreview.net/pdf?id=18Ys0-PzyPI)||ICLR|2022


## League Training
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[AlphaStar: Grandmaster level in StarCraft II using multi-agent reinforcement learning](https://www.gwern.net/docs/rl/2019-vinyals.pdf)||Nature|2019


## Curriculum Learning
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems](https://arxiv.org/abs/2102.07659)||AAMAS|2021
[From Few to More: Large-Scale Dynamic Multiagent Curriculum Learning](https://arxiv.org/abs/1909.02790)|https://github.com/starry-sky6688/MARL-Algorithms|AAAI|2020
[EPC: Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning](https://arxiv.org/pdf/2003.10423)|https://github.com/qian18long/epciclr2020|ICLR|2020
[Emergent Tool Use From Multi-Agent Autocurricula](https://arxiv.org/pdf/1909.07528)|https://github.com/openai/multi-agent-emergence-environments|ICLR|2020
[Learning to Teach in Cooperative Multiagent Reinforcement Learning](https://ojs.aaai.org/index.php/AAAI/article/download/4570/4448)||AAAI|2019
[StarCraft Micromanagement with Reinforcement Learning and Curriculum Transfer Learning](https://arxiv.org/pdf/1804.00810)||IEEE Transactions on Emerging Topics in Computational Intelligence|2018
[Cooperative Multi-agent Control using deep reinforcement learning](http://ala2017.it.nuigalway.ie/papers/ALA2017_Gupta.pdf)|https://github.com/sisl/MADRL|AAMAS|2017
[Variational Automatic Curriculum Learning for Sparse-Reward Cooperative Multi-Agent Problems](https://proceedings.neurips.cc/paper/2021/file/503e7dbbd6217b9a591f3322f39b5a6c-Paper.pdf)||NIPS|2021


## Mean Field
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[Mean Field Multi-Agent Reinforcement Learning](http://proceedings.mlr.press/v80/yang18d/yang18d.pdf)||ICML|2018
[Efficient Ridesharing Order Dispatching with Mean Field Multi-Agent Reinforcement Learning](https://arxiv.org/pdf/1901.11454)||The World Wide Web Conference|2019
[Bayesian Multi-type Mean Field Multi-agent Imitation Learning](https://www.researchgate.net/profile/Wen_Dong5/publication/347240659_Bayesian_Multi-type_Mean_Field_Multi-agent_Imitation_Learning/links/5fd8c3b245851553a0bb78b1/Bayesian-Multi-type-Mean-Field-Multi-agent-Imitation-Learning.pdf)||NIPS|2020
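The mean-field idea is to replace the intractable joint interaction Q(s, a_1, ..., a_N) with a per-agent Q(s, a_i, ā) that conditions on the mean action of the agent's neighbours. A small sketch of computing that mean action (names and shapes are illustrative):

```python
import numpy as np

# Mean-field approximation sketch: summarize neighbours by their mean
# (one-hot) action, reducing a many-agent interaction to a pairwise one.
n_agents, n_actions = 100, 5
actions = np.random.randint(n_actions, size=n_agents)
one_hot = np.eye(n_actions)[actions]       # (n_agents, n_actions)

i = 0                                      # focal agent
neighbour_mask = np.ones(n_agents, dtype=bool)
neighbour_mask[i] = False
mean_action = one_hot[neighbour_mask].mean(axis=0)   # (n_actions,)
# Agent i's value is then a function of (obs_i, a_i, mean_action).
```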


## Transfer Learning
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[A Survey on Transfer Learning for Multiagent Reinforcement Learning Systems](https://www.jair.org/index.php/jair/article/download/11396/26482)||Journal of Artificial Intelligence Research|2019
[Parallel Knowledge Transfer in Multi-Agent Reinforcement Learning](https://arxiv.org/pdf/2003.13085)|||2020


## Meta Learning
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning](https://arxiv.org/pdf/2011.00382)||ICML|2021
[Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments](https://arxiv.org/pdf/1710.03641)|||2017


## Fairness
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[FEN: Learning Fairness in Multi-Agent Systems](https://arxiv.org/pdf/1910.14472)||NIPS|2019
[Fairness in Multiagent Resource Allocation with Dynamic and Partial Observations](https://hal.archives-ouvertes.fr/hal-01808984/file/aamas-distrib-fairness-final.pdf)||AAMAS|2018
[Fairness in Multi-agent Reinforcement Learning for Stock Trading](https://arxiv.org/pdf/2001.00918)|||2019


## Exploration

### Dense Reward Exploration
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[MAVEN: Multi-Agent Variational Exploration](https://arxiv.org/pdf/1910.07483)|https://github.com/starry-sky6688/MARL-Algorithms|NIPS|2019
[Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning](http://proceedings.mlr.press/v97/jaques19a/jaques19a.pdf)||ICML|2019
[Episodic Multi-agent Reinforcement Learning with Curiosity-driven Exploration](https://proceedings.neurips.cc/paper/2021/file/1e8ca836c962598551882e689265c1c5-Paper.pdf)||NIPS|2021
[Celebrating Diversity in Shared Multi-Agent Reinforcement Learning](https://proceedings.neurips.cc/paper/2021/file/20aee3a5f4643755a79ee5f6a73050ac-Paper.pdf)|https://github.com/lich14/CDS|NIPS|2021

### Sparse Reward Exploration
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[EITI/EDTI: Influence-Based Multi-Agent Exploration](https://arxiv.org/pdf/1910.05512)|https://github.com/TonghanWang/EITI-EDTI|ICLR|2020
[Cooperative Exploration for Multi-Agent Deep Reinforcement Learning](http://proceedings.mlr.press/v139/liu21j/liu21j.pdf)||ICML|2021
[Centralized Model and Exploration Policy for Multi-Agent](https://arxiv.org/pdf/2107.06434)|||2021
[REMAX: Relational Representation for Multi-Agent Exploration](https://dl.acm.org/doi/abs/10.5555/3535850.3535977)||AAMAS|2022

### Uncategorized
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[CM3: Cooperative Multi-goal Multi-stage Multi-agent Reinforcement Learning](https://arxiv.org/pdf/1809.05188)||ICLR|2020
[Coordinated Exploration via Intrinsic Rewards for Multi-Agent Reinforcement Learning](https://arxiv.org/pdf/1905.12127)|||2019
[Exploration by Maximizing Renyi Entropy for Reward-Free RL Framework](https://arxiv.org/abs/2006.06193v3)||AAAI|2021
[Exploration-Exploitation in Multi-Agent Learning: Catastrophe Theory Meets Game Theory](https://arxiv.org/abs/2012.03083v2)||AAAI|2021
[LIIR: Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning](http://papers.neurips.cc/paper/8691-liir-learning-individual-intrinsic-reward-in-multi-agent-reinforcement-learning.pdf)|https://github.com/yalidu/liir|NIPS|2019


## Graph Neural Network
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[Multi-Agent Game Abstraction via Graph Attention Neural Network](https://ojs.aaai.org/index.php/AAAI/article/view/6211/6067)|https://github.com/starry-sky6688/MARL-Algorithms|AAAI|2020
[Graph Convolutional Reinforcement Learning for Multi-Agent Cooperation](https://arxiv.org/abs/1810.09202v1)||ICLR|2020
[Multi-Agent Reinforcement Learning with Graph Clustering](https://arxiv.org/pdf/2008.08808)|||2020
[Learning to Coordinate with Coordination Graphs in Repeated Single-Stage Multi-Agent Decision Problems](http://proceedings.mlr.press/v80/bargiacchi18a/bargiacchi18a.pdf)||ICML|2018


## Model-based
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[Model-based Multi-Agent Reinforcement Learning with Cooperative Prioritized Sweeping](https://arxiv.org/pdf/2001.07527)|||2020


## NAS
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[MANAS: Multi-Agent Neural Architecture Search](https://arxiv.org/pdf/1909.01051)|||2019


## Safe Multi-Agent Reinforcement Learning
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[MAMPS: Safe Multi-Agent Reinforcement Learning via Model Predictive Shielding](https://arxiv.org/pdf/1910.12639)|||2019
[Safer Deep RL with Shallow MCTS: A Case Study in Pommerman](https://arxiv.org/pdf/1904.05759)|||2019


## From Single-Agent to Multi-Agent
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[IQL: Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.84.3701&rep=rep1&type=pdf)|https://github.com/oxwhirl/pymarl|ICML|1993
[IPPO: Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge?](https://arxiv.org/pdf/2011.09533)|||2020
[MAPPO: The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games](https://arxiv.org/pdf/2103.01955)|https://github.com/marlbenchmark/on-policy||2021
[MADDPG: Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments](https://arxiv.org/pdf/1706.02275)|https://github.com/openai/maddpg|NIPS|2017
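The simplest bridge from single-agent to multi-agent is independent learning (IQL/IPPO): each agent runs an ordinary single-agent algorithm and treats everyone else as part of the environment. A toy tabular sketch (sizes and environment are hypothetical):

```python
import numpy as np

# Independent Q-learning (IQL) sketch: one Q-table per agent, standard
# single-agent updates; the other agents are just environment dynamics
# from each agent's point of view.
n_agents, n_states, n_actions = 2, 10, 4
alpha, gamma, eps = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)
Q = [np.zeros((n_states, n_actions)) for _ in range(n_agents)]

def act(i, s):
    # epsilon-greedy over agent i's own table
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[i][s]))

def update(i, s, a, r, s_next):
    # ordinary TD update; the target drifts as the *other* agents learn,
    # which is exactly the non-stationarity the surveys above discuss
    td = r + gamma * Q[i][s_next].max() - Q[i][s, a]
    Q[i][s, a] += alpha * td
```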


## Discrete-Continuous Hybrid Action Space / Parameterized Action Space
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[Deep Reinforcement Learning in Parameterized Action Space](https://arxiv.org/pdf/1511.04143)|||2015
[DMAPQN: Deep Multi-Agent Reinforcement Learning with Discrete-Continuous Hybrid Action Spaces](https://arxiv.org/pdf/1903.04959)||IJCAI|2019
[H-PPO: Hybrid actor-critic reinforcement learning in parameterized action space](https://arxiv.org/pdf/1903.01344)||IJCAI|2019
[P-DQN: Parametrized Deep Q-Networks Learning: Reinforcement Learning with Discrete-Continuous Hybrid Action Space](https://arxiv.org/pdf/1810.06394)|||2018
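In a parameterized action space an agent picks a discrete action plus the continuous parameters attached to that choice (e.g., "kick" with power and angle in HFO). A minimal structural sketch, with hypothetical names and per-action dimensions:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class HybridAction:
    """A parameterized action: a discrete choice plus the continuous
    parameters of that choice. Roughly, P-DQN scores each discrete
    action given its predicted parameters, while H-PPO uses parallel
    discrete and continuous policy heads."""
    discrete: int        # which action, e.g. 0=dash, 1=turn, 2=kick
    params: np.ndarray   # parameters of the chosen action only

rng = np.random.default_rng(0)
param_dims = {0: 2, 1: 1, 2: 2}   # hypothetical per-action param sizes
a = int(rng.integers(len(param_dims)))
action = HybridAction(discrete=a, params=rng.uniform(-1, 1, param_dims[a]))
```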


## Role
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[ROMA: Multi-Agent Reinforcement Learning with Emergent Roles](https://openreview.net/pdf?id=RQP2wq-dbkz)|https://github.com/TonghanWang/ROMA|ICML|2020
[RODE: Learning Roles to Decompose Multi-Agent Tasks](https://arxiv.org/pdf/2010.01523)|https://github.com/TonghanWang/RODE|ICLR|2021
[Scaling Multi-Agent Reinforcement Learning with Selective Parameter Sharing](http://proceedings.mlr.press/v139/christianos21a/christianos21a.pdf)|https://github.com/uoe-agents/seps|ICML|2021


## Diversity
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems](https://arxiv.org/pdf/2102.07659)||AAMAS|2021
[Q-DPP: Multi-Agent Determinantal Q-Learning](http://proceedings.mlr.press/v119/yang20i/yang20i.pdf)|https://github.com/QDPP-GitHub/QDPP|ICML|2020
[Diversity is All You Need: Learning Skills without a Reward Function](https://arxiv.org/pdf/1802.06070)|||2018
[Modelling Behavioural Diversity for Learning in Open-Ended Games](https://arxiv.org/pdf/2103.07927)||ICML|2021
[Diverse Agents for Ad-Hoc Cooperation in Hanabi](https://arxiv.org/pdf/1907.03840)||CoG|2019
[Generating Behavior-Diverse Game AIs with Evolutionary Multi-Objective Deep Reinforcement Learning](https://nos.netease.com/mg-file/mg/neteasegamecampus/art_works/20200812/202008122020238603.pdf)||IJCAI|2020
[Quantifying environment and population diversity in multi-agent reinforcement learning](https://arxiv.org/pdf/2102.08370)|||2021


## Sparse Reward
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[Variational Automatic Curriculum Learning for Sparse-Reward Cooperative Multi-Agent Problems](https://proceedings.neurips.cc/paper/2021/file/503e7dbbd6217b9a591f3322f39b5a6c-Paper.pdf)||NIPS|2021
[Individual Reward Assisted Multi-Agent Reinforcement Learning](https://proceedings.mlr.press/v162/wang22ao/wang22ao.pdf)|https://github.com/MDrW/ICML2022-IRAT|ICML|2022


## Large Scale
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[From Few to More: Large-Scale Dynamic Multiagent Curriculum Learning](https://arxiv.org/abs/1909.02790)|https://github.com/starry-sky6688/MARL-Algorithms|AAAI|2020
[PooL: Pheromone-inspired Communication Framework for Large Scale Multi-Agent Reinforcement Learning](https://arxiv.org/pdf/2202.09722)|||2022
[Factorized Q-learning for large-scale multi-agent systems](https://dl.acm.org/doi/pdf/10.1145/3356464.3357707)||ICDAI|2019
[EPC: Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning](https://arxiv.org/pdf/2003.10423)|https://github.com/qian18long/epciclr2020|ICLR|2020
[Mean Field Multi-Agent Reinforcement Learning](http://proceedings.mlr.press/v80/yang18d/yang18d.pdf)||ICML|2018
[A Study of AI Population Dynamics with Million-agent Reinforcement Learning](https://arxiv.org/pdf/1709.04511)||AAMAS|2018


## DTDE
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[Networked Multi-Agent Reinforcement Learning in Continuous Spaces](https://ieeexplore.ieee.org/abstract/document/8619581)||IEEE Conference on Decision and Control|2018
[Value Propagation for Decentralized Networked Deep Multi-agent Reinforcement Learning](https://proceedings.neurips.cc/paper/2019/file/8a0e1141fd37fa5b98d5bb769ba1a7cc-Paper.pdf)||NIPS|2019
[Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents](http://proceedings.mlr.press/v80/zhang18n/zhang18n.pdf)||ICML|2018


## Decision Transformer
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Conquers All StarCraftII Tasks](https://arxiv.org/pdf/2112.02845)|||2021
[Multi-Agent Reinforcement Learning is a Sequence Modeling Problem](https://arxiv.org/pdf/2205.14953)|https://github.com/PKU-MARL/Multi-Agent-Transformer||2022
[Towards Flexible Inference in Sequential Decision Problems via Bidirectional Transformers](https://arxiv.org/abs/2204.13326)|||2022
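The sequence-modeling view (as in Multi-Agent Transformer) decodes a joint action autoregressively: agents act one after another, each conditioning on the actions already chosen this step. The toy sketch below conditions only on the previous agent's action to keep it short; a full model would attend over all earlier agents, and all names and sizes here are illustrative rather than taken from the paper.

```python
import torch
import torch.nn as nn

# Toy autoregressive joint-action decoder (sequence-modeling sketch).
n_agents, obs_dim, n_actions, hidden = 3, 8, 4, 32

obs_enc = nn.Linear(obs_dim, hidden)
act_enc = nn.Embedding(n_actions + 1, hidden)  # index n_actions = "start"
head = nn.Linear(2 * hidden, n_actions)

obs = torch.randn(n_agents, obs_dim)
prev = torch.tensor(n_actions)                 # start token
joint_action = []
for i in range(n_agents):
    # agent i sees its own observation plus the previous agent's action
    ctx = torch.cat([obs_enc(obs[i]), act_enc(prev)], dim=-1)
    a = torch.distributions.Categorical(logits=head(ctx)).sample()
    joint_action.append(int(a))
    prev = a
```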


## Offline MARL
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Conquers All StarCraftII Tasks](https://arxiv.org/pdf/2112.02845)|||2021
[Believe what you see: Implicit constraint approach for offline multi-agent reinforcement learning](https://proceedings.neurips.cc/paper/2021/file/550a141f12de6341fba65b0ad0433500-Paper.pdf)||NIPS|2021


## Generalization
* TODO


## Adversarial
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[Certifiably Robust Policy Learning against Adversarial Communication in Multi-agent Systems](https://arxiv.org/pdf/2206.10158)|||2022
[Distributed Multi-Agent Deep Reinforcement Learning for Robust Coordination against Noise](https://arxiv.org/pdf/2205.09705)|||2022
[On the Robustness of Cooperative Multi-Agent Reinforcement Learning](https://arxiv.org/pdf/2003.03722)||IEEE Security and Privacy Workshops|2020
[Towards Comprehensive Testing on the Robustness of Cooperative Multi-agent Reinforcement Learning](https://openaccess.thecvf.com/content/CVPR2022W/ArtOfRobust/papers/Guo_Towards_Comprehensive_Testing_on_the_Robustness_of_Cooperative_Multi-Agent_Reinforcement_CVPRW_2022_paper.pdf)||CVPR Workshop|2022
[Robust Multi-Agent Reinforcement Learning via Minimax Deep Deterministic Policy Gradient](https://ojs.aaai.org/index.php/AAAI/article/view/4327/4205)||AAAI|2019
[Multi-agent Deep Reinforcement Learning with Extremely Noisy Observations](https://arxiv.org/pdf/1812.00922)||NIPS Deep Reinforcement Learning Workshop|2018
[Policy Regularization via Noisy Advantage Values for Cooperative Multi-agent Actor-Critic methods](https://arxiv.org/pdf/2106.14334)|||2021


## Multi-Agent Path Finding
* TODO


## To be Categorized
Paper|Code|Accepted at|Year
--|:--:|:--:|--:
[Mind-aware Multi-agent Management Reinforcement Learning](https://arxiv.org/pdf/1810.00147)|https://github.com/facebookresearch/M3RL|ICLR|2019
[Emergence of grounded compositional language in multi-agent populations](https://ojs.aaai.org/index.php/AAAI/article/download/11492/11351)|https://github.com/bkgoksel/emergent-language|AAAI|2018
[Emergent Complexity via Multi-Agent Competition](https://arxiv.org/pdf/1710.03748)|https://github.com/openai/multiagent-competition|ICLR|2018
[TLeague: A Framework for Competitive Self-Play based Distributed Multi-Agent Reinforcement Learning](https://arxiv.org/pdf/2011.12895)|https://github.com/tencent-ailab/TLeague||2020
[UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers](https://openreview.net/forum?id=v9c7hr9ADKx)|https://github.com/hhhusiyi-monash/UPDeT|ICLR|2021
[SIDE: State Inference for Partially Observable Cooperative Multi-Agent Reinforcement Learning](https://www.ifaamas.org/Proceedings/aamas2022/pdfs/p1400.pdf)|https://github.com/deligentfool/SIDE|AAMAS|2022
[UNMAS: Multiagent Reinforcement Learning for Unshaped Cooperative Scenarios](https://arxiv.org/pdf/2203.14477)|https://github.com/James0618/unmas|TNNLS|2021
[Context-Aware Sparse Deep Coordination Graphs](https://arxiv.org/pdf/2106.02886)|https://github.com/TonghanWang/CASEC-MACO-benchmark|ICLR|2022


## TODO
* Multi-Agent Path Finding
* Generalization in MARL


## Citation

If you find this repository useful, please cite our repo:
```
@misc{chen2021multi,
  author = {Chen, Hao},
  title = {Multi-Agent Reinforcement Learning Papers},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub Repository},
  howpublished = {\url{https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers}}
}
```