├── CONTRIBUTING.md
├── LICENSE
├── README.md
└── assets
    ├── mobile.jpg
    └── pc.jpg
/CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing to Awesome UI Agent 2 | 3 | Anyone interested in UI Agent is welcome to contribute to this repo: 4 | 5 | - You can add classic or the latest publications / tutorials directly to `README.md` and `your_create.md`. 6 | 7 | - You are welcome to update anything helpful. 8 | 9 | 10 | ## Pull Requests 11 | 12 | In general, we follow the "fork-and-pull" Git workflow. 13 | 14 | 1. Fork this repo to your personal GitHub account. 15 | 16 | 2. Clone the fork to your own machine. 17 | ``` 18 | git clone https://github.com/<your_username>/awesome-ui-agents.git 19 | ``` 20 | 21 | 3. Make the necessary changes and commit them. 22 | 23 | - If you go to the project directory and run `git status`, you'll see your changes. 24 | 25 | - Stage those changes with the `git add` command: 26 | ``` 27 | git add <changed_files> 28 | ``` 29 | - Now commit those changes using the `git commit` command: 30 | ``` 31 | git commit -m "add(<committer_name>): commit message" 32 | ``` 33 | * Commit messages should follow this convention: 34 | * `Template:` add/feature/polish(committer_name or project_name): commit message 35 | * `For example:` add(jrn): add one paper about UI-Agent 36 | 37 | 38 | 4. Push your work back up to your fork. 39 | ``` 40 | git push origin <branch_name> 41 | ``` 42 | 43 | 5. Submit a pull request so that we can review your changes. 44 | 45 | - If you go to your repository on GitHub, you'll see a `Contribute` button. Click it, then click the `Open pull request` and `Create pull request` buttons in turn. 46 | 47 | - We will then merge your changes into the main branch of this repo, and you will get a notification email once they have been merged. 48 | 49 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types.
34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 
202 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Awesome UI Agent 2 | 3 | [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome) 4 | ![visitor badge](https://visitor-badge.lithub.cc/badge?page_id=opendilab.awesome-ui-agents&left_text=Visitors) 5 | [![Twitter](https://img.shields.io/twitter/url?style=social&url=https%3A%2F%2Ftwitter.com%2Fopendilab)](https://twitter.com/opendilab) 6 | [![GitHub stars](https://img.shields.io/github/stars/opendilab/awesome-ui-agents)](https://github.com/opendilab/awesome-ui-agents/stargazers) 7 | [![GitHub forks](https://img.shields.io/github/forks/opendilab/awesome-ui-agents)](https://github.com/opendilab/awesome-ui-agents/network) 8 | ![GitHub commit activity](https://img.shields.io/github/commit-activity/m/opendilab/awesome-ui-agents) 9 | [![GitHub issues](https://img.shields.io/github/issues/opendilab/awesome-ui-agents)](https://github.com/opendilab/awesome-ui-agents/issues) 10 | [![GitHub pulls](https://img.shields.io/github/issues-pr/opendilab/awesome-ui-agents)](https://github.com/opendilab/awesome-ui-agents/pulls) 11 | [![Contributors](https://img.shields.io/github/contributors/opendilab/awesome-ui-agents)](https://github.com/opendilab/awesome-ui-agents/graphs/contributors) 12 | [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) 13 | 14 | This is a collection of research papers for **UI Agent**, including models, tools, and datasets. 15 | The repository will be continuously updated to track the frontier of UI Agent and related fields. 16 | 17 | Welcome to follow and star! 18 | 19 | ## Table of Contents 20 | 21 | - [Awesome UI Agent](#awesome-ui-agent) 22 | - [Table of Contents](#table-of-contents) 23 | - [Overview of UI Agent](#overview-of-ui-agent) 24 | - [Papers](#papers) 25 | - [Models](#models) 26 | - [2025](#2025) 27 | - [2024](#2024) 28 | - [2023](#2023) 29 | - [Tools](#tools) 30 | - [Datasets](#datasets) 31 | - [Related Repositories](#related-repositories) 32 | - [Contributing](#contributing) 33 | - [License](#license) 34 | 35 | ## Overview of UI Agent 36 | 37 | UI Agent aims to build a generalist agent that can interact with various user interfaces (UIs) in different environments, such as mobile apps, web pages, and PC applications. Such an agent understands UIs through vision-language models and interacts with them to complete tasks, and can be applied to scenarios such as mobile device operation, web browsing, and game playing. It can be trained in a simulated environment or with real-world data, and is evaluated in terms of task completion rate, efficiency, and generalization ability. 38 | 39 |
<div align="center">
40 | <img src="./assets/mobile.jpg" alt="Image Description 1"> 41 |
</div>
42 | 43 | Research on UI Agent is still at an early stage, and many challenges remain to be addressed, such as the scalability, robustness, and interpretability of these agents. The field is interdisciplinary, involving computer vision, natural language processing, reinforcement learning, human-computer interaction, and software engineering, and it has the potential to revolutionize the way we interact with computers and improve the efficiency and usability of computer systems. 44 | 45 |
<div align="center">
46 | <img src="./assets/pc.jpg" alt="Image Description 1"> 47 |
</div>
48 | 49 | ## Papers 50 | 51 | ``` 52 | format: 53 | - [title](paper link) [links] 54 | - author1, author2, and author3... 55 | - year 56 | - publisher 57 | - key 58 | - code 59 | - experiment environment 60 | ``` 61 | 62 | ### Models 63 | 64 | #### 2025 65 | 66 | - [VSC-RL: Advancing Autonomous Vision-Language Agents with Variational Subgoal-Conditioned Reinforcement Learning](https://arxiv.org/abs/2502.07949) 67 | - Qingyuan Wu, Jianheng Liu, Jianye Hao, Jun Wang, Kun Shao 68 | - Key: framework, reinforcement learning, subgoal generation, VSC-RL, learning efficiency 69 | - ExpEnv: Android in the Wild 70 | 71 | - [AppVLM: A Lightweight Vision Language Model for Online App Control](https://arxiv.org/abs/2502.06395) 72 | - Georgios Papoudakis, Thomas Coste, Zhihao Wu, Jianye Hao, Jun Wang, Kun Shao 73 | - Key: vision-language model, multi-modal, AppVLM, on-device control 74 | - ExpEnv: two open-source mobile control datasets 75 | 76 | - [DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents](https://arxiv.org/abs/2410.14803) 77 | - Taiyi Wang, Zhihao Wu, Jianheng Liu, Jianye Hao, Jun Wang, Kun Shao 78 | - Key: framework, reinforcement learning, distributed training, A-RIDE, on-device control 79 | - ExpEnv: Android in the Wild 80 | - [code](https://ai-agents-2030.github.io/DistRL/) 81 | 82 | - [OpenAI Operator](https://openai.com/index/introducing-operator/) 83 | - OpenAI 84 | - Key: a research preview of an agent that can use its own browser to perform tasks for you 85 | - ExpEnv: OSWorld, WebArena, WebVoyager 86 | 87 | - [OpenAI Computer-Using Agent](https://openai.com/index/computer-using-agent/) 88 | - OpenAI 89 | - Key: a universal interface for AI to interact with the digital world 90 | - ExpEnv: OSWorld, WebArena, WebVoyager 91 | 92 | - [Claude computer use](https://www.anthropic.com/news/developing-computer-use) 93 | - Anthropic 94 | - Key: emulating the way people interact with their own computer
95 | - ExpEnv: OSWorld 96 | 97 | - [Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage](https://openreview.net/forum?id=0bmGL4q7vJ) 98 | - Zhi Gao, Bofei Zhang, Pengxiang Li, Xiaojian Ma, Tao Yuan, Yue Fan, Yuwei Wu, Yunde Jia, Song-Chun Zhu, Qing Li 99 | - Key: Multimodal Agents, Vision-language Model, Tool usage 100 | - ExpEnv: GTA, GAIA benchmarks 101 | 102 | - [Lightweight Neural App Control](https://openreview.net/forum?id=BL4WBIfyrz) 103 | - Filippos Christianos, Georgios Papoudakis, Thomas Coste, Jianye Hao, Jun Wang, Kun Shao 104 | - Key: vision-language model, multi-modal, android control, app agent 105 | - ExpEnv: two open-source mobile control datasets 106 | 107 | - [Enhancing Software Agents with Monte Carlo Tree Search and Hindsight Feedback](https://openreview.net/forum?id=G7sIFXugTX) 108 | - Antonis Antoniades, Albert Örwall, Kexun Zhang, Yuxi Xie, Anirudh Goyal, William Yang Wang 109 | - Key: agents, LLM, SWE-agents, SWE-bench, search, planning, reasoning, self-improvement, open-ended 110 | - ExpEnv: SWE-bench 111 | 112 | #### 2024 113 | 114 | - [On the Effects of Data Scale on UI Control Agents](https://arxiv.org/abs/2406.03679) 115 | - Wei Li, William Bishop, Alice Li, Chris Rawles, Folawiyo Campbell-Ajala, Divya Tyamagundlu, Oriana Riva 116 | - Key: Autonomous agents, UI control, AndroidControl dataset, fine-tuning, in-domain vs out-of-domain performance 117 | - 2024 118 | - [code](https://github.com/google-research/google-research/tree/master/android_control) 119 | 120 | - [SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering](https://arxiv.org/abs/2405.15793) 121 | - John Yang, Carlos E. Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, Ofir Press 122 | - Key: Language model agents, agent-computer interface (ACI), automated software engineering, SWE-bench, HumanEvalFix 123 | - 2024 124 | - [code](https://github.com/SWE-agent/SWE-agent) 125 | 126 | - [Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making](https://arxiv.org/abs/2410.07166) 127 | - Manling Li, Shiyu Zhao, Qineng Wang, Kangrui Wang, Yu Zhou, Sanjana Srivastava, Cem Gokmen, Tony Lee, Li Erran Li, Ruohan Zhang, Weiyu Liu, Percy Liang, Li Fei-Fei, Jiayuan Mao, Jiajun Wu 128 | - Key: Large language models, embodied decision making, generalized interface, fine-grained metrics, subgoal decomposition, action sequencing 129 | - 2024 130 | - [code](https://github.com/embodied-agent-eval/embodied-agent-eval) 131 | 132 | - [Cradle: Empowering Foundation Agents Towards General Computer Control](https://arxiv.org/abs/2403.03186) 133 | - Weihao Tan and Wentao Zhang and Xinrun Xu and Haochong Xia et al.
134 | - Key: various virtual scenarios, General Computer Control 135 | - 2024 136 | - [code](https://github.com/BAAI-Agents/Cradle) 137 | 138 | - [Lightweight Neural App Control](https://arxiv.org/abs/2410.17883) 139 | - Filippos Christianos and Georgios Papoudakis and Thomas Coste and Jianye Hao and Jun Wang and Kun Shao 140 | - Key: app agents, Android apps, Action Transformer 141 | - 2024 142 | 143 | - [SeeAct: GPT-4V(ision) is a Generalist Web Agent, if Grounded](https://arxiv.org/abs/2401.01614) 144 | - Boyuan Zheng and Boyu Gou and Jihyung Kil and Huan Sun and Yu Su 145 | - Key: live websites, grounding, image captioning, visual question answering 146 | - 2024 147 | - [code](https://osu-nlp-group.github.io/SeeAct) 148 | 149 | - [MMAC-Copilot: Multi-modal Agent Collaboration Operating System Copilot](https://arxiv.org/abs/2404.18074) 150 | - Zirui Song and Yaohang Li and Meng Fang and Zhenhao Chen and Zecheng Shi and Yuan Huang and Ling Chen 151 | - Key: Autonomous virtual agents, Multi-Modal Agent Collaboration 152 | - 2024 153 | 154 | - [SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents](https://arxiv.org/abs/2401.10935) 155 | - Kanzhi Cheng and Qiushi Sun and Yougang Chu and Fangzhi Xu and Yantao Li and Jianbing Zhang and Zhiyong Wu 156 | - Key: Graphical User Interface, screenshots 157 | - 2024 158 | - [code](https://github.com/njucckevin/SeeClick) 159 | 160 | - [OS-ATLAS: A Foundation Action Model for Generalist GUI Agents](https://arxiv.org/abs/2410.23218) 161 | - Zhiyong Wu and Zhenyu Wu and Fangzhi Xu and Yian Wang and Qiushi Sun and Chengyou Jia and Kanzhi Cheng and Zichen Ding and Liheng Chen and Paul Pu Liang and Yu Qiao 162 | - Key: Out-Of-Distribution, GUI grounding, language agent 163 | - 2024 164 | - [code](https://github.com/OS-Copilot/OS-Atlas) 165 | 166 | - [Proposer-Agent-Evaluator (PAE): Autonomous Skill Discovery For Foundation Model Internet Agents](https://arxiv.org/abs/2412.13194) 167 | - Yifei Zhou and Qianlan Yang and Kaixiang Lin and Min Bai and Xiong Zhou and Yu-Xiong Wang and Sergey Levine and Erran Li 168 | - Key: Large language models, Internet-browsing agent, autonomous task proposal 169 | - 2024 170 | - [code](https://yanqval.github.io/PAE/) 171 | 172 | - [AutoWebGLM: Bootstrap and Reinforce a Large Language Model-based Web Navigating Agent](https://arxiv.org/abs/2404.03648) 173 | - Hanyu Lai and Xiao Liu and Iat Long Iong and Shuntian Yao and Yuxuan Chen and Pengbo Shen and Hao Yu and Hanchen Zhang and Xiaohan Zhang and Yuxiao Dong and Jie Tang 174 | - Key: Large language models, real-world web navigation, bilingual benchmark 175 | - 2024 176 | - [code](https://github.com/THUDM/AutoWebGLM) 177 | 178 | - [Dual-View Visual Contextualization for Web Navigation](https://arxiv.org/abs/2402.04476) 179 | - Jihyung Kil and Chan Hee Song and Boyuan Zheng and Xiang Deng and Yu Su and Wei-Lun Chao 180 | - Key: Automatic web navigation, language instructions, HTML elements 181 | - 2024 182 | 183 | - [Agent-E: From Autonomous Web Navigation to Foundational Design Principles in Agentic Systems](https://arxiv.org/abs/2407.13032) 184 | - Tamer Abuelsaad and Deepak Akkil and Prasenjit Dey and Ashish Jagmohan and Aditya Vempaty and Ravi Kokku 185 | - Key: hierarchical architecture, flexible DOM distillation, denoising method 186 | - 2024 187 | - [code](https://github.com/EmergenceAI/Agent-E) 188 | 189 | - [Tree Search for Language Model Agents](https://arxiv.org/abs/2407.01476) 190 | -
Jing Yu Koh and Stephen McAleer and Daniel Fried and Ruslan Salakhutdinov 191 | - Key: multi-step reasoning, planning, best-first tree search 192 | - 2024 193 | - [code](https://github.com/kohjingyu/search-agents) 194 | 195 | - [Agent S: An Open Agentic Framework that Uses Computers Like a Human](https://arxiv.org/abs/2410.08164) 196 | - Saaket Agashe and Jiuzhou Han and Shuyu Gan and Jiachen Yang and Ang Li and Xin Eric Wang 197 | - Key: Multimodal Large Language Models, Graphical User Interface, Agent-Computer Interface 198 | - 2024 199 | - [code](https://github.com/simular-ai/Agent-S) 200 | 201 | - [Apple Intelligence Foundation Language Models](https://arxiv.org/pdf/2407.21075) 202 | - Apple 203 | - Key: Vision-Language Model, Private Cloud Compute 204 | - 2024 205 | 206 | - [CoCo-Agent: A Comprehensive Cognitive MLLM Agent for Smartphone GUI Automation](https://arxiv.org/abs/2402.11941) 207 | - Xinbei Ma and Zhuosheng Zhang and Hai Zhao 208 | - Key: Vision-Language Model, Phone 209 | - 2024 210 | - [code](https://github.com/xbmxb/CoCo-Agent) 211 | 212 | - [SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents](https://arxiv.org/abs/2401.10935) 213 | - Kanzhi Cheng and Qiushi Sun and Yougang Chu and Fangzhi Xu and Yantao Li and Jianbing Zhang and Zhiyong Wu 214 | - Key: Vision-Language Model, PC 215 | - 2024 216 | - [code](https://github.com/njucckevin/SeeClick) 217 | 218 | - [Intention-in-Interaction (IN3): Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents](https://arxiv.org/abs/2402.09205) 219 | - Cheng Qian and Bingxiang He and Zhong Zhuang and Jia Deng and Yujia Qin and Xin Cong and Zhong Zhang and Jie Zhou and Yankai Lin and Zhiyuan Liu and Maosong Sun 220 | - Key: Language Model, User Intention 221 | - 2024 222 | - [code](https://github.com/OpenBMB/Tell_Me_More) 223 | 224 | - [LATS: Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models](https://arxiv.org/abs/2310.04406) 225 | - Andy Zhou and Kai Yan and Michal Shlapentokh-Rothman and Haohan Wang and Yu-Xiong Wang 226 | - Key: Tree Search, Language Model 227 | - 2024 228 | - [code](https://github.com/lapisrocks/LanguageAgentTreeSearch) 229 | 230 | - [DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning](https://arxiv.org/abs/2406.11896) 231 | - Hao Bai and Yifei Zhou and Mert Cemri and Jiayi Pan and Alane Suhr and Sergey Levine and Aviral Kumar 232 | - Key: Vision-Language Model, Android, Reinforcement Learning 233 | - 2024 234 | - [code](https://github.com/DigiRL-agent/digirl) 235 | 236 | - [ScreenAI: A Vision-Language Model for UI and Infographics Understanding](https://arxiv.org/abs/2402.04615) 237 | - Gilles Baechler and Srinivas Sunkara and Maria Wang and Fedir Zubach and Hassan Mansoor and Vincent Etter and Victor Cărbune and Jason Lin and Jindong Chen and Abhanshu Sharma 238 | - Key: Vision-Language Model, Mobile, Infographics 239 | - 2024 240 | - [code](https://github.com/kyegomez/ScreenAI) 241 | 242 | - [ScreenAgent: A Vision Language Model-driven Computer Control Agent](https://arxiv.org/abs/2402.07945) 243 | - Runliang Niu and Jindong Li and Shiqi Wang and Yali Fu and Xiyu Hu and Xueyuan Leng and He Kong and Yi Chang and Qi Wang 244 | - Key: Vision Language Model, PC 245 | - 2024 246 | - [code](https://github.com/niuzaisheng/ScreenAgent) 247 | 248 | - [Android in the Zoo: Chain-of-Action-Thought for GUI Agents](https://arxiv.org/abs/2403.02713) 249 | - Jiwen Zhang and Jihao Wu and Yihua Teng and Minghui
Liao and Nuo Xu and Xiao Xiao and Zhongyu Wei and Duyu Tang 250 | - Key: Vision-Language Model, Android, Chain-of-Action-Thought 251 | - 2024 252 | - [code](https://github.com/IMNearth/CoAT) 253 | 254 | 255 | - [Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration](https://arxiv.org/abs/2406.01014) 256 | - Junyang Wang and Haiyang Xu and Haitao Jia and Xi Zhang and Ming Yan and Weizhou Shen and Ji Zhang and Fei Huang and Jitao Sang 257 | - Key: Vision-Language Model, Android 258 | - 2024 259 | - [code](https://github.com/X-PLUG/MobileAgent) 260 | 261 | - [Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception](https://arxiv.org/abs/2401.16158) 262 | - Junyang Wang and Haiyang Xu and Jiabo Ye and Ming Yan and Weizhou Shen and Ji Zhang and Fei Huang and Jitao Sang 263 | - Key: Vision-Language Model, Android 264 | - 2024 265 | - [code](https://github.com/X-PLUG/MobileAgent) 266 | 267 | - [WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models](https://arxiv.org/abs/2401.13919) 268 | - Hongliang He and Wenlin Yao and Kaixin Ma and Wenhao Yu and Yong Dai and Hongming Zhang and Zhenzhong Lan and Dong Yu 269 | - Key: Vision-Language Model, Web 270 | - 2024 271 | - [code](https://github.com/MinorJerry/WebVoyager) 272 | 273 | - [OS-Copilot: Towards Generalist Computer Agents with Self-Improvement](https://arxiv.org/abs/2402.07456) 274 | - Zhiyong Wu and Chengcheng Han and Zichen Ding and Zhenmin Weng and Zhoumianze Liu and Shunyu Yao and Tao Yu and Lingpeng Kong 275 | - Key: Vision-Language Model, PC 276 | - 2024 277 | - [code](https://github.com/OS-Copilot/OS-Copilot) 278 | 279 | - [UFO: A UI-Focused Agent for Windows OS Interaction](https://arxiv.org/abs/2402.07939) 280 | - Chaoyun Zhang and Liqun Li and Shilin He and Xu Zhang and Bo Qiao and Si Qin and Minghua Ma and Yu Kang and Qingwei Lin and Saravan Rajmohan and Dongmei Zhang and Qi Zhang 281 | - Key: Vision-Language Model, PC, Windows OS 282 | - 2024 283 | - [code](https://github.com/microsoft/UFO) 284 | 285 | - [Octopus v2: On-device language model for super agent](https://arxiv.org/abs/2404.01744) 286 | - Wei Chen and Zhiyuan Li 287 | - Key: Vision-Language Model, Android, iOS 288 | - 2024 289 | 290 | 291 | #### 2023 292 | - [OpenAgents: An Open Platform for Language Agents in the Wild](https://arxiv.org/abs/2310.10634) 293 | - Tianbao Xie and Fan Zhou and Zhoujun Cheng et al. 294 | - Key: language agents, the wild of everyday life, real-world evaluations 295 | - 2023 296 | - [code](https://github.com/xlang-ai/OpenAgents) 297 | 298 | - [LASER: LLM Agent with State-Space Exploration for Web Navigation](https://arxiv.org/abs/2309.08172) 299 | - Kaixin Ma and Hongming Zhang and Hongwei Wang and Xiaoman Pan and Wenhao Yu and Dong Yu 300 | - Key: Large language models, web navigation, interactive task 301 | - 2023 302 | - [code](https://github.com/Mayer123/LASER) 303 | 304 | - [AppAgent: Multimodal Agents as Smartphone Users](https://arxiv.org/abs/2312.13771) 305 | - Chi Zhang and Zhao Yang and Jiaxuan Liu and Yucheng Han and Xin Chen and Zebiao Huang and Bin Fu and Gang Yu 306 | - Key: Vision-Language Model, Android 307 | - 2023 308 | - [code](https://github.com/mnotgod96/AppAgent) 309 | 310 | - [CogAgent: A Visual Language Model for GUI Agents](https://arxiv.org/abs/2312.08914) 311 | - Wenyi Hong and Weihan Wang and Qingsong Lv and Jiazheng Xu and Wenmeng Yu and Junhui Ji and Yan Wang and Zihan Wang and Yuxuan Zhang and
Juanzi Li and Bin Xu and Yuxiao Dong and Ming Ding and Jie Tang 312 | - Key: Vision-Language Model, PC, Android, screenshots 313 | - 2023 314 | - [code](https://github.com/THUDM/CogVLM) 315 | 316 | - [Octopus: Embodied Vision-Language Programmer from Environmental Feedback](https://arxiv.org/abs/2310.08588) 317 | - Jingkang Yang and Yuhao Dong and Shuai Liu and Bo Li and Ziyue Wang and Chencheng Jiang and Haoran Tan and Jiamu Kang and Yuanhan Zhang and Kaiyang Zhou and Ziwei Liu 318 | - Key: Vision-Language Model, Android, iOS 319 | - 2023 320 | - [code](https://github.com/dongyh20/Octopus) 321 | 322 | - [You Only Look at Screens: Multimodal Chain-of-Action Agents](https://arxiv.org/abs/2309.11436) 323 | - Zhuosheng Zhang and Aston Zhang 324 | - Key: Vision-Language Model, Android, Chain-of-Action-Thought 325 | - 2023 326 | - [code](https://github.com/cooelf/Auto-GUI) 327 | 328 | - [LASER: LLM Agent with State-Space Exploration for Web Navigation](https://arxiv.org/abs/2309.08172) 329 | - Kaixin Ma and Hongming Zhang and Hongwei Wang and Xiaoman Pan and Wenhao Yu and Dong Yu 330 | - Key: Vision-Language Model, Web, State-Space Exploration 331 | - 2023 332 | - [code](https://github.com/Mayer123/LASER) 333 | 334 | - [A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis](https://arxiv.org/abs/2307.12856) 335 | - Izzeddin Gur and Hiroki Furuta and Austin Huang and Mustafa Safdari and Yutaka Matsuo and Douglas Eck and Aleksandra Faust 336 | - Key: Vision-Language Model, Web, Planning, Program Synthesis 337 | - 2023 338 | 339 | - [Augmenting Autotelic Agents with Large Language Models](https://arxiv.org/abs/2305.12487) 340 | - Cédric Colas and Laetitia Teodorescu and Pierre-Yves Oudeyer and Xingdi Yuan and Marc-Alexandre Côté 341 | - Key: Language Model 342 | - 2023 343 | 344 | - [Language Models can Solve Computer Tasks](https://arxiv.org/abs/2303.17491) 345 | - Geunwoo Kim and Pierre Baldi and Stephen McAleer 346 | - Key: Language Model 347 | - 2023 348 | - [code](https://github.com/posgnu/rci-agent) 349 | 350 | ### Tools 351 | - [Opera Browser Operator: AI-based Agentic Browsing](https://press.opera.com/2025/03/03/opera-browser-operator-ai-agentics/) 352 | - Opera Software 353 | - Key: AI agent, agentic browsing, native client-side solution, privacy-focused 354 | - 2025 355 | - [code](https://press.opera.com) 356 | 357 | - [OmniParser: Screen Parsing tool for Pure Vision Based GUI Agent](https://arxiv.org/abs/2408.00203) 358 | - Yadong Lu and Jianwei Yang and Yelong Shen and Ahmed Awadallah 359 | - Key: UI parsing, vision-based agent, GPT-4V, structured elements 360 | - 2024 361 | - [code](https://github.com/microsoft/OmniParser) 362 | 363 | - [Make Websites Accessible for Agents](https://browser-use.com) 364 | - Magnus Müller and Gregor Žunič 365 | - Key: websites, Agents 366 | - 2024 367 | - [code](https://github.com/browser-use/browser-use) 368 | 369 | - [ToolGen: Unified Tool Retrieval and Calling via Generation](https://openreview.net/forum?id=XLMAMmowdY) 370 | - Renxi Wang, Xudong Han, Lei Ji, Shu Wang, Timothy Baldwin, Haonan Li 371 | - Key: Agent, Tool Learning, Virtual Token 372 | - 2024 373 | - [code](https://github.com/Reason-Wang/ToolGen) 374 | 375 | - [OpenWebAgent: An Open Toolkit to Enable Web Agents on Large Language Models](https://aclanthology.org/2024.acl-demos.8/) 376 | - Iat Long Iong and Xiao Liu and Yuxuan Chen and Hanyu Lai and Shuntian Yao and
Pengbo Shen and Hao Yu and Yuxiao Dong and Jie Tang 377 | - Key: Webpage, deployment 378 | - 2024 379 | - [code](https://github.com/boxworld18/OpenWebAgent) 380 | 381 | - [LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Automation Task Evaluation](https://arxiv.org/abs/2404.16054) 382 | - Li Zhang and Shihe Wang and Xianqing Jia and Zhihan Zheng and Yunhe Yan and Longxi Gao and Yuanchun Li and Mengwei Xu 383 | - Key: Mobile UI, Simulator 384 | - 2024 385 | - [code](https://github.com/llamatouch/llamatouch) 386 | 387 | - [WebArena: A Realistic Web Environment for Building Autonomous Agents](https://arxiv.org/abs/2307.13854) 388 | - Shuyan Zhou and Frank F. Xu and Hao Zhu and Xuhui Zhou and Robert Lo and Abishek Sridhar and Xianyi Cheng and Tianyue Ou and Yonatan Bisk and Daniel Fried and Uri Alon and Graham Neubig 389 | - Key: Web, Simulator 390 | - 2023 391 | - [code](https://github.com/web-arena-x/webarena) 392 | 393 | - [Mobile-Env: Building Qualified Evaluation Benchmarks for LLM-GUI Interaction](https://arxiv.org/abs/2305.08144) 394 | - Danyang Zhang and Zhennan Shen and Rui Xie and Situo Zhang and Tianbao Xie and Zihan Zhao and Siyuan Chen and Lu Chen and Hongshen Xu and Ruisheng Cao and Kai Yu 395 | - Key: Android, Simulator 396 | - 2023 397 | - [code](https://github.com/X-LANCE/Mobile-Env) 398 | 399 | - [AndroidEnv: A Reinforcement Learning Platform for Android](https://arxiv.org/abs/2105.13231) 400 | - Daniel Toyama and Philippe Hamel and Anita Gergely and Gheorghe Comanici and Amelia Glaese and Zafarali Ahmed and Tyler Jackson and Shibl Mourad and Doina Precup 401 | - Key: Android, Reinforcement Learning, Simulator 402 | - 2021 403 | - [code](https://github.com/google-deepmind/android_env) 404 | 405 | ### Datasets 406 | - [SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation](https://arxiv.org/abs/2410.15164) 407 | - Jingxuan Chen, Derek Yuen, Bin Xie, Yuhao Yang, Gongwei Chen, Zhihao Wu, Li Yixing, Xurui Zhou, Weiwen Liu, Shuai Wang, Rui Shao, Liqiang Nie, Yasheng Wang, Jianye Hao, Jun Wang, Kun Shao 408 | - Key: Two Languages, Interactive Environment, Plug-and-play Framework, 11 Agents, Diverse Metrics 409 | - 2025 410 | - [code](https://ai-agents-2030.github.io/SPA-Bench/) 411 | 412 | - [AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?](https://arxiv.org/abs/2407.15711) 413 | - Ori Yoran and Samuel Joseph Amouyal and Chaitanya Malaviya and Ben Bogin and Ofir Press and Jonathan Berant 414 | - Key: Web, Realistic, Time-Consuming, Benchmark 415 | - 2024 416 | - [code](https://assistantbench.github.io) 417 | 418 | - [WebCanvas: Benchmarking Web Agents in Online Environments](https://arxiv.org/abs/2406.12373) 419 | - Yichen Pan and Dehan Kong and Sida Zhou and Cheng Cui and Yifei Leng and Bing Jiang and Hangyu Liu and Yanyi Shang and Shuyan Zhou and Tongshuang Wu and Zhengyang Wu 420 | - Key: Web, Online Environments, Benchmark 421 | - 2024 422 | - [code](https://www.imean.ai/web-canvas) 423 | 424 | - [MobileAgentBench: An Efficient and User-Friendly Benchmark for Mobile LLM Agents](https://arxiv.org/abs/2406.08184) 425 | - Luyuan Wang and Yongyu Deng and Yiwei Zha and Guodong Mao and Qinmin Wang and Tianchen Min and Wei Chen and Shoufa Chen 426 | - Key: Mobile, Benchmark 427 | - 2024 428 | - [code](https://github.com/MobileAgentBench/mobile-agent-bench) 429 | 430 | - [VillagerBench/VillagerAgent: A Graph-Based Multi-Agent Framework for Coordinating Complex Task Dependencies in Minecraft](https://arxiv.org/abs/2406.05720) 431 | -
Yubo Dong and Xukun Zhu and Zhengzhe Pan and Linchao Zhu and Yi Yang 432 | - Key: Vision-Language Model, Game 433 | - 2024 434 | - [code](https://github.com/cnsdqd-dyb/VillagerAgent) 435 | 436 | - [CToolEval: A Chinese Benchmark for LLM-Powered Agent Evaluation in Real-World API Interactions](https://aclanthology.org/2024.findings-acl.928/) 437 | - Guo, Zishan and Huang, Yufei and Xiong, Deyi 438 | - Key: Vision-Language Model, Phone 439 | - 2024 440 | - [code](https://github.com/tjunlp-lab/CToolEval) 441 | 442 | - [Multi-Turn Mind2Web: On the Multi-turn Instruction Following for Conversational Web Agents](https://arxiv.org/pdf/2402.15057) 443 | - Yang Deng and Xuan Zhang and Wenxuan Zhang and Yifei Yuan and See-Kiong Ng and Tat-Seng Chua 444 | - Key: Vision-Language Model, Web Tasks 445 | - 2024 446 | - [code](https://github.com/magicgh/self-map) 447 | 448 | - [VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks](https://arxiv.org/abs/2401.13649) 449 | - Jing Yu Koh and Robert Lo and Lawrence Jang and Vikram Duvvur and Ming Chong Lim and Po-Yu Huang and Graham Neubig and Shuyan Zhou and Ruslan Salakhutdinov and Daniel Fried 450 | - Key: Vision-Language Model, Web Tasks 451 | - 2024 452 | - [code](https://github.com/web-arena-x/visualwebarena) 453 | 454 | - [Android in the Zoo: Chain-of-Action-Thought for GUI Agents](https://arxiv.org/abs/2403.02713) 455 | - Jiwen Zhang and Jihao Wu and Yihua Teng and Minghui Liao and Nuo Xu and Xiao Xiao and Zhongyu Wei and Duyu Tang 456 | - Key: Vision-Language Model, Android, Chain-of-Action-Thought 457 | - 2024 458 | - [code](https://github.com/IMNearth/CoAT) 459 | 460 | - [Android in the Wild: A Large-Scale Dataset for Android Device Control](https://arxiv.org/abs/2307.10088) 461 | - Christopher Rawles and Alice Li and Daniel Rodriguez and Oriana Riva and Timothy Lillicrap 462 | - Key: Android, datasets 463 | - 2023 464 | - [code](https://github.com/google-research/google-research/blob/master/android_in_the_wild/README.md) 465 | 466 | - [Mind2Web: Towards a Generalist Agent for the Web](https://arxiv.org/abs/2306.06070) 467 | - Xiang Deng and Yu Gu and Boyuan Zheng and Shijie Chen and Samuel Stevens and Boshi Wang and Huan Sun and Yu Su 468 | - Key: Web, datasets 469 | - 2023 470 | - [code](https://github.com/OSU-NLP-Group/Mind2Web) 471 | 472 | - [WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents](https://arxiv.org/abs/2207.01206) 473 | - Shunyu Yao and Howard Chen and John Yang and Karthik Narasimhan 474 | - Key: Web, datasets 475 | - 2022 476 | - [code](https://github.com/princeton-nlp/WebShop) 477 | 478 | - [Rico: A Mobile App Dataset for Building Data-Driven Design Applications](https://dl.acm.org/doi/10.1145/3126594.3126651) 479 | - Deka, Biplab and Huang, Zifeng and Franzen, Chad and Hibschman, Joshua and Afergan, Daniel and Li, Yang and Nichols, Jeffrey and Kumar, Ranjitha 480 | - Key: mobile app, datasets 481 | - 2017 482 | 483 | ## Related Repositories 484 | 485 | - [awesome-llm-powered-agent](https://github.com/hyp1231/awesome-llm-powered-agent) 486 | - [Awesome-LLM-based-Web-Agent-and-Tools](https://github.com/albzni/Awesome-LLM-based-Web-Agent-and-Tools) 487 | - [Awesome-GUI-Agent](https://github.com/showlab/Awesome-GUI-Agent) 488 | - [computer-control-agent-knowledge-base](https://github.com/James4Ever0/computer_control_agent_knowledge_base) 489 | 490 | ## Contributing 491 | 492 | Our purpose is to make this repo even better. 
If you are interested in contributing, please refer to [HERE](CONTRIBUTING.md) for contribution instructions. 493 | 494 | ## License 495 | 496 | This repository is released under the Apache 2.0 license. 497 | -------------------------------------------------------------------------------- /assets/mobile.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/opendilab/awesome-ui-agents/3dc02de3889057db6c63017a19cc4f55d02194d1/assets/mobile.jpg -------------------------------------------------------------------------------- /assets/pc.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/opendilab/awesome-ui-agents/3dc02de3889057db6c63017a19cc4f55d02194d1/assets/pc.jpg --------------------------------------------------------------------------------