├── README.md
└── assets
    ├── app&eval.png
    ├── architecture-1.png
    ├── architecture.png
    ├── capability.png
    ├── planning.png
    ├── table.png
    └── trend.png
/README.md:
--------------------------------------------------------------------------------
1 | # A Survey on LLM-based Autonomous Agents
2 |
3 | 
4 |
5 | Autonomous agents are designed to achieve specific objectives through self-guided instructions. With the emergence and growth of large language models (LLMs), there is a growing trend in utilizing LLMs as fundamental controllers for these autonomous agents. While previous studies in this field have achieved remarkable successes, they remain independent proposals with little effort devoted to a systematic analysis. To bridge this gap, we conduct a comprehensive survey study, focusing on the construction, application, and evaluation of LLM-based autonomous agents. In particular, we first explore the essential components of an AI agent, including a profile module, a memory module, a planning module, and an action module. We further investigate the application of LLM-based autonomous agents in the domains of natural sciences, social sciences, and engineering. Subsequently, we delve into a discussion of the evaluation strategies employed in this field, encompassing both subjective and objective methods. Our survey aims to serve as a resource for researchers and practitioners, providing insights, related references, and continuous updates on this exciting and rapidly evolving field.
6 |
7 | **📍 This is the first released and published survey paper in the field of LLM-based autonomous agents.**
8 |
9 | Paper link: [A Survey on Large Language Model based Autonomous Agents](https://arxiv.org/abs/2308.11432)
10 |
11 |
12 | ## Update Records
13 | - 🔥 [25/3/2024] Our survey paper has been accepted by Frontiers of Computer Science, making it the first published survey paper in the field of LLM-based agents.
14 |
15 | - 🔥 [9/8/2023] The second version of our survey has been released on arXiv.
16 | Updated contents
18 |
19 | - **📚 Additional References**
20 | - We have added 31 new works (as of 9/1/2023) to make the survey more comprehensive and up to date.
21 |
22 | - **📊 New Figures**
23 | - **Figure 3:** A new figure illustrating the differences and similarities between various planning approaches, which makes the methods easier to compare.
24 | 
25 | - **Figure 4:** This is a new figure that describes the evolutionary path of model capability acquisition from the "Machine Learning era" to the "Large Language Model era" and then to the "Agent era." Specifically, a new concept, "mechanism engineering," has been introduced, which, along with "parameter learning" and "prompt engineering," forms part of this evolutionary path.
26 | 
27 |
28 | - **🔍 Optimized Classification System**
29 | - We have slightly modified the classification system in our survey to make it more logical and organized.
30 |
33 |
34 |
35 |
36 | ## Table of Contents
37 |
38 |
39 | - [🤖 Construction of LLM-based Autonomous Agent](#-construction-of-llm-based-autonomous-agent)
40 | - [📍 Applications of LLM-based Autonomous Agent](#-applications-of-llm-based-autonomous-agent)
41 | - [📊 Evaluation on LLM-based Autonomous Agent](#-evaluation-on-llm-based-autonomous-agent)
42 | - [🌐 More Comprehensive Summarization](#-more-comprehensive-summarization)
43 | - [👨👨👧👦 Maintainers](#-maintainers)
44 | - [📚 Citation](#-citation)
45 | - [💪 How to Contribute](#-how-to-contribute)
46 | - [🫡 Acknowledgement](#-acknowledgement)
47 | - [📧 Contact Us](#-contact-us)
48 |
49 |
50 |
51 | ## 🤖 Construction of LLM-based Autonomous Agent
52 | 
53 |
54 |
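The four modules in the architecture above (profile, memory, planning, action) can be pictured as a small control loop. The sketch below is only an illustration under stated assumptions, not code from the survey or from any listed project: the class names, `stub_planner`, and the `lookup` tool are hypothetical, and a rule-based stub stands in for the LLM so the example runs deterministically.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Profile:
    """Profile module: the role the agent plays (handcrafted in this sketch)."""
    role: str
    goal: str


@dataclass
class Memory:
    """Memory module supporting read, write, and reflection operations."""
    records: List[str] = field(default_factory=list)

    def write(self, item: str) -> None:
        self.records.append(item)

    def read(self, k: int = 3) -> List[str]:
        # Return the k most recent records (a stand-in for real retrieval).
        return self.records[-k:]

    def reflect(self) -> str:
        # Condense past records into a higher-level note and store it.
        note = f"reflection over {len(self.records)} records"
        self.write(note)
        return note


def stub_planner(profile: Profile, context: List[str], feedback: Optional[str]) -> str:
    """Planning module 'w/ feedback'. A real agent would prompt an LLM here;
    this hypothetical stub uses fixed rules so the example is deterministic."""
    if feedback is None:
        return "lookup"
    return "finish"


@dataclass
class Agent:
    profile: Profile
    memory: Memory
    planner: Callable[[Profile, List[str], Optional[str]], str]
    tools: Dict[str, Callable[[str], str]]  # action module 'w/ tools'

    def run(self, max_steps: int = 5) -> List[str]:
        trace: List[str] = []
        feedback: Optional[str] = None
        for _ in range(max_steps):
            # Plan the next action from the profile, recent memory, and feedback.
            action = self.planner(self.profile, self.memory.read(), feedback)
            trace.append(action)
            if action == "finish":
                break
            # Act by calling the chosen tool, then write the result to memory.
            feedback = self.tools[action](self.profile.goal)
            self.memory.write(f"{action} -> {feedback}")
        self.memory.reflect()
        return trace


agent = Agent(
    profile=Profile(role="research assistant", goal="LLM agents"),
    memory=Memory(),
    planner=stub_planner,
    tools={"lookup": lambda query: f"notes about {query}"},
)
trace = agent.run()
print(trace)  # ['lookup', 'finish']
print(agent.memory.records)
```

The column values in the table that follows (for example "Read/Write/Reflection", "w/ feedback", "w/ tools") correspond loosely to which of these pieces a given system implements.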
| Model | Profile | Memory (Operation) | Memory (Structure) | Planning | Action | CA | Paper | Code |
|---|---|---|---|---|---|---|---|---|
| WebGPT | - | - | - | - | w/ tools | w/ fine-tuning | Paper | - |
| SayCan | - | - | - | w/o feedback | w/o tools | w/o fine-tuning | Paper | Code |
| MRKL | - | - | - | w/o feedback | w/ tools | - | Paper | - |
| Inner Monologue | - | - | - | w/ feedback | w/o tools | w/o fine-tuning | Paper | Code |
| Social Simulacra | GPT-Generated | - | - | - | w/o tools | - | Paper | - |
| ReAct | - | - | - | w/ feedback | w/ tools | w/ fine-tuning | Paper | Code |
| LLM Planner | - | - | - | w/ feedback | w/o tools | Environment feedback | Paper | Code |
| MALLM | - | Read/Write | Hybrid | - | w/o tools | - | Paper | - |
| aiflows | - | Read/Write/Reflection | Hybrid | w/ feedback | w/ tools | - | Paper | Code |
| DEPS | - | - | - | w/ feedback | w/o tools | w/o fine-tuning | Paper | Code |
| Toolformer | - | - | - | w/o feedback | w/ tools | w/ fine-tuning | Paper | Code |
| Reflexion | - | Read/Write/Reflection | Hybrid | w/ feedback | w/o tools | w/o fine-tuning | Paper | Code |
| CAMEL | Handcrafting & GPT-Generated | - | - | w/ feedback | w/o tools | - | Paper | Code |
| API-Bank | - | - | - | w/ feedback | w/ tools | w/o fine-tuning | Paper | - |
| Chameleon | - | - | - | w/o feedback | w/ tools | - | Paper | Code |
| ViperGPT | - | - | - | - | w/ tools | - | Paper | Code |
| HuggingGPT | - | - | Unified | w/o feedback | w/ tools | - | Paper | Code |
| Generative Agents | Handcrafting | Read/Write/Reflection | Hybrid | w/ feedback | w/o tools | - | Paper | Code |
| LLM+P | - | - | - | w/o feedback | w/o tools | - | Paper | - |
| ChemCrow | - | - | - | w/ feedback | w/ tools | - | Paper | Code |
| OpenAGI | - | - | - | w/ feedback | w/ tools | w/ fine-tuning | Paper | Code |
| AutoGPT | - | Read/Write | Hybrid | w/ feedback | w/ tools | w/o fine-tuning | - | Code |
| SCM | - | Read/Write | Hybrid | - | w/o tools | - | Paper | Code |
| Socially Alignment | - | Read/Write | Hybrid | - | w/o tools | Example | Paper | Code |
| GITM | - | Read/Write/Reflection | Hybrid | w/ feedback | w/o tools | w/ fine-tuning | Paper | Code |
| Voyager | - | Read/Write/Reflection | Hybrid | w/ feedback | w/o tools | w/o fine-tuning | Paper | Code |
| Introspective Tips | - | - | - | w/ feedback | w/o tools | w/o fine-tuning | Paper | - |
| RET-LLM | - | Read/Write | Hybrid | - | w/o tools | w/ fine-tuning | Paper | - |
| ChatDB | - | Read/Write | Hybrid | w/ feedback | w/ tools | - | Paper | - |
| S3 | Dataset alignment | Read/Write/Reflection | Hybrid | - | w/o tools | w/ fine-tuning | Paper | - |
| ChatDev | Handcrafting | Read/Write/Reflection | Hybrid | w/ feedback | w/o tools | w/o fine-tuning | Paper | Code |
| ToolLLM | - | - | - | w/ feedback | w/ tools | w/ fine-tuning | Paper | Code |
| MemoryBank | - | Read/Write/Reflection | Hybrid | - | w/o tools | - | Paper | Code |
| MetaGPT | Handcrafting | Read/Write/Reflection | Hybrid | w/ feedback | w/ tools | - | Paper | Code |
| L2MAC | Handcrafting | Read/Write/Reflection | Hybrid | w/ feedback | w/ tools | - | Paper | Code |
| LEO | - | - | - | w/ feedback | w/o tools | w/ fine-tuning | Paper | Code |
| JARVIS-1 | - | Read/Write/Reflection | Hybrid | w/ feedback | w/ tools | w/o fine-tuning | Paper | Code |
| CLOVA | - | Read/Write/Reflection | Hybrid | w/ feedback | w/ tools | w/ fine-tuning | Paper | Code |
| LearnAct | - | - | - | w/ feedback | w/ tools | w/ fine-tuning | Paper | Code |
| AgentSquare | - | Read/Write | Hybrid | w/ feedback | w/ tools | - | Paper | Code |

## 📍 Applications of LLM-based Autonomous Agent

| Title | Social Science | Natural Science | Engineering | Paper | Code |
|---|---|---|---|---|---|
| Drori et al. | - | Science Education | - | Paper | - |
| SayCan | - | - | Robotics & Embodied AI | Paper | Code |
| Inner monologue | - | - | Robotics & Embodied AI | Paper | Code |
| Language-Planners | - | - | Robotics & Embodied AI | Paper | Code |
| Social Simulacra | Social Simulation | - | - | Paper | - |
| TE | Psychology | - | - | Paper | Code |
| Out of One | Political Science and Economy | - | - | Paper | - |
| LIBRO | CS&SE | - | - | Paper | - |
| Blind Judgement | Jurisprudence | - | - | Paper | - |
| Horton | Political Science and Economy | - | - | Paper | - |
| DECKARD | - | - | Robotics & Embodied AI | Paper | Code |
| Planner-Actor-Reporter | - | - | Robotics & Embodied AI | Paper | - |
| DEPS | - | - | Robotics & Embodied AI | Paper | - |
| RCI | - | - | CS&SE | Paper | Code |
| Generative Agents | Social Simulation | - | - | Paper | Code |
| SCG | - | - | CS&SE | Paper | - |
| IGLU | - | - | Civil Engineering | Paper | - |
| IELLM | - | - | Industrial Automation | Paper | - |
| ChemCrow | - | Document and Data Management; Documentation, Data Management; Science Education | - | Paper | - |
| Boiko et al. | - | Document and Data Management; Documentation, Data Management; Science Education | - | Paper | - |
| GPT4IA | - | - | Industrial Automation | Paper | Code |
| Self-collaboration | - | - | CS&SE | Paper | - |
| E2WM | - | - | Robotics & Embodied AI | Paper | Code |
| Akata et al. | Psychology | - | - | Paper | - |
| Ziems et al. | Psychology; Political Science and Economy; Research Assistant | - | - | Paper | - |
| AgentVerse | Social Simulation | - | - | Paper | Code |
| SmolModels | - | - | CS&SE | - | Code |
| TidyBot | - | - | Robotics & Embodied AI | Paper | Code |
| PET | - | - | Robotics & Embodied AI | Paper | - |
| Voyager | - | - | Robotics & Embodied AI | Paper | Code |
| GITM | - | - | Robotics & Embodied AI | Paper | Code |
| NLSOM | - | Science Education | - | Paper | - |
| LLM4RL | - | - | Robotics & Embodied AI | Paper | - |
| GPT Engineer | - | - | CS&SE | - | Code |
| Grossman et al. | - | Experiment Assistant; Science Education | - | Paper | - |
| SQL-PALM | - | - | CS&SE | Paper | - |
| REMEMBER | - | - | Robotics & Embodied AI | Paper | - |
| DemoGPT | - | - | CS&SE | - | Code |
| Chatlaw | Jurisprudence | - | - | Paper | Code |
| RestGPT | - | - | CS&SE | Paper | Code |
| Dialogue shaping | - | - | Robotics & Embodied AI | Paper | - |
| TaPA | - | - | Robotics & Embodied AI | Paper | - |
| Ma et al. | Psychology | - | - | Paper | - |
| Math Agents | - | Science Education | - | Paper | - |
| SocialAI School | Social Simulation | - | - | Paper | - |
| Unified Agent | - | - | Robotics & Embodied AI | Paper | - |
| Wiliams et al. | Social Simulation | - | - | Paper | - |
| Li et al. | Social Simulation | - | - | Paper | - |
| S3 | Social Simulation | - | - | Paper | - |
| Dialogue Shaping | - | - | Robotics & Embodied AI | Paper | - |
| RoCo | - | - | Robotics & Embodied AI | Paper | Code |
| Sayplan | - | - | Robotics & Embodied AI | Paper | Code |
| aiflows | - | - | CS&SE | Paper | Code |
| ToolLLM | - | - | CS&SE | Paper | Code |
| ChatDev | - | - | CS&SE | Paper | - |
| Chao et al. | Social Simulation | - | - | Paper | - |
| AgentSims | Social Simulation | - | - | Paper | Code |
| ChatMOF | - | Document and Data Management; Science Education | - | Paper | - |
| MetaGPT | - | - | CS&SE | Paper | Code |
| L2MAC | - | - | CS&SE | Paper | Code |
| Codehelp | - | Science Education | CS&SE | Paper | - |
| AutoGen | - | Science Education | - | Paper | - |
| RAH | - | - | CS&SE | Paper | - |
| DB-GPT | - | - | CS&SE | Paper | Code |
| RecMind | - | - | CS&SE | Paper | - |
| ChatEDA | - | - | CS&SE | Paper | - |
| InteRecAgent | - | - | CS&SE | Paper | - |
| PentestGPT | - | - | CS&SE | Paper | - |
| Codehelp | - | - | CS&SE | Paper | - |
| ProAgent | - | - | Robotics & Embodied AI | Paper | - |
| MindAgent | - | - | Robotics & Embodied AI | Paper | - |
| LEO | - | - | Robotics & Embodied AI | Paper | - |
| JARVIS-1 | - | - | Robotics & Embodied AI | Paper | - |
| CLOVA | - | - | CS&SE | Paper | - |
| AgentTrust | - | Social Simulation | - | Paper | Code |
| embodied-agents | - | - | Robotics & Embodied AI | - | Code |
| AgentOccam | - | - | CS&SE | Paper | - |

## 📊 Evaluation on LLM-based Autonomous Agent

| Model | Subjective | Objective | Benchmark | Paper | Code |
|---|---|---|---|---|---|
| WebShop | - | Environment Simulation; Multi-task Evaluation | ✓ | Paper | Code |
| Social Simulacra | Human Annotation | Social Evaluation | - | Paper | - |
| TE | - | Social Evaluation | - | Paper | Code |
| LIBRO | - | Software Testing | - | Paper | - |
| ReAct | - | Environment Simulation | ✓ | Paper | Code |
| Out of One, Many | Turing Test | Social Evaluation; Multi-task Evaluation | - | Paper | - |
| DEPS | - | Environment Simulation | ✓ | Paper | - |
| Jalil et al. | - | Software Testing | - | Paper | Code |
| Reflexion | - | Environment Simulation; Multi-task Evaluation | - | Paper | Code |
| IGLU | - | Environment Simulation | ✓ | Paper | - |
| Generative Agents | Human Annotation; Turing Test | - | - | Paper | Code |
| ToolBench | Human Annotation | Multi-task Evaluation | ✓ | Paper | Code |
| GITM | - | Environment Simulation | ✓ | Paper | Code |
| Two-Failures | - | Multi-task Evaluation | - | Paper | - |
| Voyager | - | Environment Simulation | ✓ | Paper | Code |
| SocKET | - | Social Evaluation; Multi-task Evaluation | ✓ | Paper | - |
| Mobile-Env | - | Environment Simulation; Multi-task Evaluation | ✓ | Paper | Code |
| Clembench | - | Environment Simulation; Multi-task Evaluation | ✓ | Paper | Code |
| Mind2Web | - | Environment Simulation; Multi-task Evaluation | ✓ | Paper | Code |
| Dialop | - | Social Evaluation | ✓ | Paper | Code |
| Feldt et al. | - | Software Testing | - | Paper | - |
| CO-LLM | Human Annotation | Environment Simulation | - | Paper | Code |
| Tachikuma | Human Annotation | Environment Simulation | ✓ | Paper | - |
| WebArena | - | Environment Simulation | ✓ | Paper | Code |
| RocoBench | - | Environment Simulation; Social Evaluation; Multi-task Evaluation | ✓ | Paper | Code |
| AgentSims | - | Social Evaluation | - | Paper | Code |
| AgentBench | - | Multi-task Evaluation | ✓ | Paper | Code |
| BOLAA | - | Environment Simulation; Multi-task Evaluation; Software Testing | ✓ | Paper | Code |
| Gentopia | - | Isolated Reasoning; Multi-task Evaluation | ✓ | Paper | Code |
| EmotionBench | Human Annotation | - | ✓ | Paper | Code |
| PTB | - | Software Testing | ✓ | Paper | - |
| MintBench | - | Multi-task Evaluation | ✓ | Paper | Code |
| MindAgent | - | Environment Simulation; Multi-task Evaluation | ✓ | Paper | - |
| JARVIS-1 | - | Environment Simulation | - | Paper | - |
| TimeCharac | GPT Annotation | - | ✓ | Paper | Code |
| AppWorld | - | Environment Simulation | ✓ | Paper | Code |